Rust's iterators optimize nicely—and contain a footgun

Monday, May 20, 2024

I saw a claim recently that in functional programming using "map/filter iterates over the list twice, while the foreach loop iterates only once." The author continued that "Haskell can fuse maps together as an optimization but I don't think you safely fuse arbitrary map/filters? I dunno."

There are really two claims here:

  • in functional programming, map/filter will do two iterations
  • there is an optimization in Haskell to combine maps, but this may not generalize to arbitrary maps/filters

The first claim is generally linked to whether your language has lazy iterators or not. In Rust (and Python and many others), there are lazy iterators, so we can generally avoid two iterations while doing map/filter.

The second claim gets trickier. In Rust, it appears that this does generalize to arbitrary maps/filters. It depends on the compiler, but it certainly seems doable since it looks like there are clear rewriting rules.

The upshot of all of these is that there's a footgun involved in all of this. The lazy iterator behavior is unintuitive for many when they encounter it, and there's a trap here that has bit many Rust users I've talked to1.

Avoiding multiple passes through lazy iterators and composition

Testing whether or not we have multiple passes is pretty straightforward. We can write a program that prints where it is as it goes.

let xs: Vec<i32> = (1..5).collect();

let ys: Vec<i32> =
    .map(|a| {
        println!("map: {}", a);
    .filter(|a| {
        println!("filter: {}", a);
        a % 2 == 0

println!("ys = {:?}", ys);

If this takes multiple passes over the list, we would expect to see it print all the maps, then print all the filters. If it takes one pass over the list, then we would expect to see it print map and filter steps interleaved.

It turns out, it prints the interleaved version:

map: 1
filter: 1
map: 2
filter: 2
map: 3
filter: 3
map: 4
filter: 4
ys = [2, 4]

The more interesting question though is why this is the case? It's a common thing I run into, the expectation that map will go through the list in full, then again for filter, etc. The fundamental reason is because iterators are lazy. We don't iterate through anything at all until we need it in the end result. So in this example, if we didn't call collect and println! to materialize the list, we would have zero iterations through the list.

Because iterators are lazy, we can do some really cool things. We can iterate over an infinite list, taking only until a condition is met. We can apply a map and filter and other operations to things like network streams!

These hold true in other languages, too: you can go confirm the same thing is true in Python, and it's common across languages as far as I can tell.

Can compilers optimize composed iterators into one for loop?

Now we come to a really fun, meaty question: what can compilers optimize here? In Haskell, as Hillel points out, GHC has an optimization called fusion. What this does, essentially, is transform the program at compile time such that intermediate states of iteration are removed.

Imagine, for example, that you have an expression like this in Rust:

(1..10).filter(|x| x%2 == 0).map(|x| x+1)

The question is: can we take arbitrary compositions of iterators like these and perform an optimization like GHC does with fusion of maps, filters, etc?

Here are three separate programs. The first uses filter and map. The second uses filter_map, a concise form of filtering and mapping2. And the third is what we would write by hand with a for loop.

fn main() {
    let xs = [1, 2, 3, 4, 5];
        .filter(|x| *x % 2 == 0)
        .map(|x| x + 1)
        .for_each(|x| println!("{}", x));
fn main() {
    let xs = [1, 2, 3, 4, 5];
        .filter_map(|x| if *x % 2 == 0 { Some(x + 1) } else { None })
        .for_each(|x| println!("{}", x));
fn main() {
    let xs = [1, 2, 3, 4, 5];
    for x in xs.iter() {
        if x % 2 == 0 {
            println!("{}", x+1);

If Rust is doing optimization similar to GHC's fusion, then these should output similar code without different performance characteristics. If it's not, then we'd expect to see some extra overhead inside each of the iterator-based versions that's not present in the for loop one.

To test this, I used Rust's playground and compiled each to assembly using the release target and the stable channel. Each of them output... the exact same assembly. The end result of each of these programs is the exact same binary.

So: yes. Rust will optimize iterator usage in much the same way that Haskell does. It will combine arbitrary iterator usage and reduce it down to a for loop3. That's pretty neat!

Now, how does it do it? That's beyond my expertise. It happens somewhere in the compiler. I'd love to find that out, though!

What's that footgun you mentioned?

Just like Chekhov's gun, any mentioned footgun will return at the end. This one is a footgun I've seen quite a few Rust programmers run into when they try to parallelize code. I've written about this one previously after I helped someone debug why their code got slower when using iterators.

Basically, they'd changed their code from this for-loop version:

fn main() {
    let mut handles = Vec::new();

    for i in 0..10 {
        let handle = do_work(i);

    for handle in handles {

Into this iterator version:

fn main() {
        .for_each(|handle| {

The rationale was along the lines of "using iterators is more idiomatic in Rust." The footgun here was that since iterators compose together, we changed the semantics of the program: Where we previously had two iterations through the list, one to populate it with thread handles and one to join those handles, we now have just one iteration to create and immediately join the handles.

The equivalent code instead would be this:

fn main() {
    let handles: Vec<_> = (0..10).map(do_work).collect();
    handles.for_each(|handle| { handle.join() });

Composition of iterators is incredibly powerful, and iterators are a fantastic tool in the Rust programmer's tool belt. And you have to remember that they do compose, so if you want multiple passes through a list, you have to ask for that explicitly.

Huge thanks to Erika and Hillel Wayne for feedback and encouragement on a draft of this post!


The footgun is also possible in other languages, to be clear. However, it seems to be more prevalent in Rust than, say, Python or Go, because Rust's affordances for functional programming patterns make the footgun code particularly easy to write, but hard in those other languages.


Notably, this isn't provided for a performance optimization but to let us write more concise code.


I tried this out on a few other examples, too. Here's one using fold or the same as a for loop. It also held true in the string and stdin-based examples I tried. If you find one where it doesn't, let me know!

If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email my personal email. To get new posts and support my work, subscribe to the newsletter. There is also an RSS feed.

Want to become a better programmer? Join the Recurse Center!
Want to hire great programmers? Hire via Recurse Center!