Iterative methods done right (life's too short to write for-loops)
By Lorenzo Stella — Wednesday July 25, 2018 (updated: Tuesday July 31, 2018)
Iterative methods are a class of numerical algorithms that produce a sequence of (hopefully) better and better approximations to a solution of a problem, starting from an initial guess. Function minimization, linear and nonlinear systems of equations, are very often solved with iterative methods (especially when the problem is too large for direct methods to kick in).
On paper, iterative methods are commonly described as loops that somehow generate a sequence of approximations to the problem solution: in fact, that’s the immediate way of translating them into running pieces of code using the programming language of choice. Include a bit of additional logic (some stopping condition to halt the iterations, a few lines to display the algorithm’s status or log it to a file) and one has a pretty decent utility to run experiments.
However, that’s not necessarily the best way of doing the job, especially if you need to code this type of loops over and over again: off-by-one-errors are around the corner, and adding options and features one on top of the other quickly results in spaghetti code (which may not necessarily be a bad thing).
Instead, common patterns that show up when implementing iterative methods are much better exploited using iterables. I’ll illustrate this by examples in Julia, using the conjugate gradient method for positive (semi)definite linear systems as guinea pig.
Note: the examples that follow run on Julia 0.7 (as well as 1.0, but that’s not out yet). It goes without saying that similar things one can do in Python, Java, C++, or whatever language is preferred: in fact, I would highly encourage you to do so in case you have months (or years) ahead of experimenting with iterative methods.
Iterables in Julia
Iterables are objects one can iterate on, like lists or other types of collections.
Unlike collections however, iterables do not hold all elements in memory: instead,
they only need to be able to generate them in sequence, one after the other.
They’re like lazy collections.
In order to make
custom iterable types in Julia,
it is sufficient to identify what the state of the iteration is,
and define the
iterate function returning the next element in the sequence
and updated state.
For instance, suppose we want to compute the following generalized Fibonacci sequence:
The sequence is uniquely determined by and (with , yielding the standard Fibonacci sequence). So our iterable is
In order to unroll the sequence and compute each element, one only needs to keep
track of the pair of the two most recent elements: that
will be the state of our iteration. Then we need to define two
methods for the
iterate(iter::FibonacciIterable)returning the pair
(F0, state0)containing the first element in the sequence and initial state respectively;
iterate(iter::FibonacciIterable, state)returning the pair
(F, newstate)of the next element
Fin the sequence (given
state) and the updated state
As soon as the sequence is over,
iterate should return
nothing (Julia’s “none” value). This is never the case for our Fibonacci
sequences, since they’re infinite.
FibonacciIterable objects are immutable: we would like them not
to change as we iterate, since they identify the sequence which is being produced.
Instead, all mutations should occur in state updates.
The following definitions give exactly the above described generalized Fibonacci sequence:
We can now loop any
The conjugate gradient method
The conjugate gradient (CG) method solves linear systems
where is a positive semidefinite, symmetric matrix. It is particularly useful when is very large and is sparse, in which case direct methods (Cholesky factorization) are computationally prohibitive. Instead, CG works by only applying matrix-vector products with . Given an initial guess , the method produces a sequence of solution approximations according to the following recurrence:
- Initialize .
- For any do
See here for a detailed derivation of the method.
Let’s now translate CG into a Julia iterable type. The iteration is completely determined by matrix , vector , and the initial guess , so those will compose our iterable objects:
cgiterable constructor we allow for
x0 to be
nothing: in this case
we will assume the initial point to be the zero vector, and save ourselves one
The state of the iteration is composed of vectors , , and .
In addition to that, we will make room for additional stuff, so as to reuse all
possible memory and avoid allocations: vector and scalars
Overall, the state for CG is a bit more complex than in the Fibonacci example,
and keeping it in a
Tuple would be impractical. So let’s give things a name
by defining a custom type for the state:
CGState is defined as
mutable: this is because we will overwrite
the iteration state rather than allocate new objects, again for efficiency reasons.
The actual computation is carried out by the
Rather than the sequence of , we yield the sequence of states of the algorithm, since that contains all information that may be needed when experimenting (including itself).
Given the simplicity of the method, it is easy to check that the iterable type we just defined is a correct CG implementation. However, there are some apparent shortcomings:
- The produced sequence is infinite, and iterating over it will result in an infinite loop.
- As (that is,
state.rsprev) approaces zero, numerical issues appear due to divisions.
- No useful information is displayed onto the screen.
For the first problem, we can simply impose a maximum number of iterations by doing
This is not best practice: if
k is touched elsewhere in the loop,
then it can be mishandled, change value, and the counting logic be broken.
And in any case: counting iterations and stopping at a prescribed limit
is something one always wants to do, so we better have a correct, robust,
reusable way of doing it.
A cleaner solution is
or, even better
Here we are using some of Julia’s built-in iteration utilities:
enumeratetakes an iterable producing a sequence of
s, and returns an iterable producing pairs
(k, s), where
kis the (1-based) index of
sin the original sequence;
taketakes an additional integer
nand returns a truncated iterable that only yields the first
nelements in the original sequence.
These are iterable wrappers: given an iterable (i.e. a sequence) they wrap a new one around it that somehow extends its behaviour. What’s most important here is that we get to add features on top of our original iteration without ever touching its code.
In this spirit, let us now define more of these iterable wrappers that are useful when working with iterative methods, and nicely address problems 2 and 3 above (among other things) without getting in the way of our CG implementation.
Aside from imposing a maximum number of iterations, one usually wants to stop
the computation as soon as some condition is met. Here the
halt wrapper takes an
iter and a boolean function
fun: a new iterable is returned that
fun to each element of
true is returned, at which point
the iteration stops.
This is a simple but powerful one: for a given sequence, apply some function to
each of its elements, and yield the same original sequence.
I’m calling this wrapper
tee, like the
standard Unix T-pipe command,
because of the apparent analogy.
This can be used to display some summary of the algorithm’s state (number of iterations, most relevant quantities, maybe a progress bar). Or, in the case of distributed algorithms, one can use this to encapsulate a synchronization mechanism through which agents communicate and exchange information. Note that this is conceptually different from callbacks.
Not always you want to do something with all elements in a sequence. Sometimes you want to consider, say, every tenth element and do something about it. For instance, displaying information on the screen at every iteration may be too much overhead if each iteration takes 20 microseconds, or if there is five thousands of them. Or maybe you need to compute some relatively expensive stopping criterion, and you only want to do that every once in a while.
sample wrapper does exactly this. It is pretty similar to the
with a difference: with
sample, the last element in the original sequence
is always included in the filtered sequence. This is very important, since one
is precisely interested in the final state of an iterative algorithm—the
one that triggered the prescribed stopping criterion.
enumerate counts the elements as a sequence unfolds,
stopwatch measures time elapsed from the beginning
Measuring time in the context of numerical algorithms needs no justification, I believe.
Putting it all together
There’s another piece of code that we can factor out and put in our toolbox:
for loop. The following function takes whatever iterable, loops
over it until it’s finished (so it better be finite), and returns the last
We now have all pieces to assemble our CG solver routine.
In the following snippet, the
cg function instantiates a
for the given problem, buries it under the wrappers we described above,
then loops the resulting iterable to get the last state.
For simplicity, the composition of iterables is fixed, but one could think of
parsing here all sorts of options the user may provide (e.g. verbose/silent,
measure time/do not measure time, using multiple termination criteria…)
and cook up the iterable accordingly.
A quick test on a random, small, dense linear system shows the routine in action:
Implementing iterative methods as iterables has some advantages with respect to writing explicit, monolithic for-loops: above all, “core” computations are well isolated from all additional logics and can be modified, optimized, checked for correctness (and unit tested) much more easily. Additional functionalities (stopping criteria, I/O, …) can be wrapped around the core method, as well as tested, and stored for future usage. Ultimately, writing a solver amounts to designing its interface, options, and coding the logic that builds the right iterable.
In this post I’ve outlined a rather minimal set of iterables, and only explored the simple example of CG. However, using this design one should be able to implement and glue together many other kinds of pieces:
- Computing preconditioners (e.g. quasi-Newton or L-BFGS).
- Step-size selection rules (including line-searches).
- Nesterov acceleration (fast gradient methods).
- Restart (e.g. for GMRES or fast gradient methods).
Intuitively, all of these could be implemented using the
tee wrapper defined
above (when, like in the case of
CGIterable, the “elements” that
handles are the mutable states of our iteration) or could be implemented as
separate building blocks. In any case, that’s material for future thinking.