Safe modern for loops
I had the opportunity to accompany many students while they were learning C. Looking at their coding style, I noticed some bad habits that could lead to hard-to-detect bugs in real-world problems. I’ll show the tools that Modern C++ brings that can prevent that kind of problems.
Pre-C++11 for loops
Before discussing the problems themselves, let’s review how for loops work in C++.
In a very basic way, the code
for (/* init_statement */; /* condition */; /* iteration_expression */) {
/* statement */
}
is translated to
{
/* init_statement */;
while ( /* condition */ ) {
/* statement */
/* iteration_expression */;
}
}
Thus, we can call an arbitrary function do_something
for each integer between
0 (inclusive) and 10 (non-inclusive) by using
for (int i = 0; i < 10; ++i)
do_something(i);
We can also use for loops to iterate over containers.
std::vector<int> values( /* ... */ );
for (typename std::vector<int>::iterator it = values.begin();
it != values.end();
++it)
{
do_something(*it);
}
For a better review, I suggest reading this page.
C++11 range-based for loops
Modern C++ revisions (after C++11) bring a much better way to iterate over containers: the range-based for loops.
The last piece of code is equivalent to
std::vector<int> values( /* ... */ );
for (int i : values)
do_something(i);
For a better review, I suggest reading this page.
Problem: Value range
Now, let’s discuss the bad habits that I have seen. Imagine we want to iterate over a vector and call a user-defined function that receives both the index and the value at that position. A faulty solution would be
template <typename T, typename F>
void my_solution(const std::vector<T>& values, const F& f) {
for (int i = 0; i < values.size(); ++i)
f(i, values[i]);
}
Can you spot the bug? Many of us are so used to iterate using int
s that we
forget that it can’t represent all possible index values.
The problem lays in the fact that most likely
std::numeric_limits<int>::max() < std::numeric_limits<typename std::vector<T>::size_type>::max()
std::vector<T>::size_type
is the type that can hold any index or size of vectors.
Although many implementations use std::size_t
, there is no such guarantee.
Thus, a better generic solution is
template <typename T, typename F>
void my_solution(const std::vector<T>& values, const F& f) {
for (typename std::vector<T>::size_type i = 0; i < values.size(); ++i)
f(i, values[i]);
}
Problem: Iterating two containers
Another problem that can happen is regarding the const-correctness of the indices. It doesn’t come exactly from a bad habit, but from a limitation in the traditional for loop.
Consider the same situation as before, but now we want to have a callback for every combination of values in two containers. Based on what we have discussed, a common mistake is
template <typename T, typename F>
void my_solution(const std::vector<T>& a, const std::vector<T>& b, const F& f) {
for (typename std::vector<T>::size_type i = 0; i < a.size(); ++i)
for (typename std::vector<T>::size_type j = 0; j < b.size(); ++i)
f(i, j, a[i], b[j]);
}
In the second for loop, instead of incrementing j
we increment i
causing
a bug that is often hard to detect. A possible cause of the problem here is that
variables have no meaningful name. However, it is so common to iterate using
variables called i
and j
that long meaningful names would be awkward.
An ideal solution would forbid modifications of the iteration variable inside the
for loop. Thus, if we tried to modify the variable i
in the second loop, a compiler
error would be emitted. However, just using const
is not enough. The code
template <typename T, typename F>
void my_solution(const std::vector<T>& a, const std::vector<T>& b, const F& f) {
for (const typename std::vector<T>::size_type i = 0; i < a.size(); ++i)
for (const typename std::vector<T>::size_type j = 0; j < b.size(); ++j)
f(i, j, a[i], b[j]);
}
doesn’t compile since ++i
and ++j
tries to modify the variables.
Solution: cool::indices utility
By using range-based for loops, cool provides a
utility that solves the two mentioned problems. The function cool::indices(n, m)
creates a lazy-evaluated list of indices in the interval \([n, m)\) whose type is
big enough to hold m
and n
(or their own type if they have the same type.)
If only one value is provided, that is cool::indices(m)
is called, the
range goes from 0 (inclusive) to n
(exclusive.)
That brings the solution
template <typename T, typename F>
void my_solution(const std::vector<T>& a, const std::vector<T>& b, const F& f) {
for (const auto i : cool::indices(a.size()))
for (const auto j : cool::indices(b.size()))
f(i, j, a[i], b[j]);
}
which has several advantages:
i
andj
have typestd::vector<T>::size_type
without explicitly writing so;- the compiler would emit an error if one tries to modify
i
andj
; and - there are much fewer occurrences of the variables (no explicit comparison and increment), reducing the chances of mistyping.