A second post on Go silliness (Miguel, aren’t you a C++ programmer?): in 1.23, Go finally added custom iterators. Now, back when I was at Google and involved in the Go compiler as “the annoying Rust guy who gets lunch with us”, there were proposals suggesting adding something like this, implemented as either an interface or a func:
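Something with roughly this flavor; this is my reconstruction of the general shape of those proposals, not any specific proposal's text:

```go
// Hypothetical sketch only: neither of these types exists in the standard library.
type Iterator[T any] interface {
	Next() (T, bool) // the next element, or false once the sequence is exhausted
}

// ...or, as a func instead of an interface:
type IterFunc[T any] func() (T, bool)
```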
This is not what Go did. No, Go did something really weird. And the implementation is incredible.
What’s an Iterator?
An iterator, in the context of programming language design, is a special type of value that can be used to walk through a sequence of values, without necessarily materializing the sequence as whatever the language’s array type is.
But a proper iterator must fit with the language's looping construct. An iterable type is one which can be used in a for-each loop, such as C++'s for (T x : y) or Python's for x in y (modern languages usually have a for-each loop as their only for loop, because C-style for loops are not in anymore).
C++ Iterator Pairs
Every language defines a desugaring that specifies how custom iteration works in terms of the more primitive loops. For example, in C++, for (T x : y) { ... } (called a range-based for loop, added in C++11) desugars as follows1:
break, continue, and return inside of the loop body require no special handling: they Just Work, because this is just a plain ol' for loop.
This begin and end weirdness is because, if the iterator backs an actual array, begin and end can just be pointers to the first element and one-past-the-end and this will Just Work. Before C++11, the convention for C++ iterators was to construct types that imitated pointers; you would usually write loops over non-array types like this:
C++ simply codified common (if gross) practice. It is very tedious to implement C++ iterators, though. You need to provide a dummy end iterator, you need to provide some kind of comparison operator, and iterators that don't return a reference out of operator*() are… weird.
Begin and end can be different types (which is how C++20 ranges pretend to be iterable), but being able to query done-ness separately from the next value makes implementation annoying: it means that an iterator that has not begun iteration (i.e., ++ has not been executed yet, because it occurs in the loop's latch, not its header2) needs to do extra work to answer != end, which usually means an extra bool to keep track of whether iteration has started or not.
Here’s what writing an iterator (that is also an iterable usable in a range for-loop) over the non-zero elements of a std::span<const int> might look like.
In this case, operator== is not const, which is a bit naughty. Purists might argue that this type should have a constructor, which adjusts ints to point to the first non-zero element on construction, and operator++ to perform the mutation. That would look like this:
std::sentinel_for (C++’s iterator concepts are terribly named) really wants operator== to be const, but I could have also just marked ints as mutable to avoid that. If it’s not already clear, I really dislike this pattern. See here for some faffing about with C++ iterators on my part.
Java Also Got This Wrong
At least Java provides a standard iterable interface, thankfully.
The desugaring of for (T x : y) { ... } is then:
Do you see the problem here? Although Java now provides a standard interface, doesn’t require annoying equality comparisons, and doesn’t require an end value, these things are still a pain to implement! You still need to be able to query if you’re done before you’ve had a chance to step through the iterator.
Like before, suppose we have an int[], and we want to yield every non-zero value in it. How do we construct an iterator for that?
What a pain. Java’s anonymous classes being wordy aside, it’s annoying and error-prone to do this: it’s tempting to accidentally implement hasNext by simply checking if the array is empty. (Aside, I hate that xs.length throws on null arrays. Just return zero like in Go, c’mon).
Also, it’s not a single-abstract-method interface, so I can’t use a lambda to create an iterator.
At least break, continue, and return Just Work, because the underlying operation is a for loop like before.
Rust Does It Better
Rust also has a standard iterable interface.
The desugaring for for x in y { ... } is reasonably straightforward, like in Java:
This is so straightforward that it’s not so unusual to write it yourself, when you don’t plan on consuming the entire iterator. Alternatively, you can partially iterate over an iterator by taking a mutable reference to it. This is useful for iterators that can yield their remainder.
break, continue, and return work in the obvious way.
The interface solves the problems C++ and Java had very cleanly: next both computes the next item and whether the iterator has more elements. Rust even allows iterators to resume yielding Some after yielding None, although few algorithms will make use of this.
Implementing the non-zero iterator we’ve been writing so far is quite simple:
However, this can be written far more simply3 using iterator combinators:
It requires a little bit of effort to implement some iterators, but most of the common cases are easy to put together with composition.
Python iterators are basically the same thing, but there’s no interface to implement (because Python doesn’t believe in type safety). Lua iterators are similar. The Rust pattern of a function that returns the next item (or a special end-of-sequence value) is relatively popular because of this simplicity and composability, and because they can model a lot of iteration strategies.
So, What Did Go Do?
Well. Go has a range for syntax like many other languages. The syntax looks like this:
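```go
for x := range y {
	// ...
}
```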
The x can be a list of places, and the := can be plain assignment, =. You can also write for range y { ... } if the iteration values aren't needed.
The behavior of this construct, like many others in Go, depends explicitly on the type after range. Each range iteration can yield zero or more values, and how many depends on that type. These are:

- For []T, [n]T, and *[n]T, each step yields an index of the slice or array and the value at that offset, in order.
- For map[K]V, each step yields a key and a value, in a random order.
- For <-chan T, each step yields a value received from the channel, until it is closed; the desugaring is sketched below.
- Starting in Go 1.22, ranging on an integer type desugars into a counting loop, also sketched below.
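Hand-waving a little (these are sketches, not the compiler's literal output), those last two desugarings look roughly like this:

```go
// Ranging over a channel receives values until the channel is closed:
for {
	x, ok := <-ch
	if !ok {
		break
	}
	// ... loop body ...
}

// Ranging over an integer n (Go 1.22+) counts from 0 to n-1:
for x := 0; x < n; x++ {
	// ... loop body ...
}
```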
All of these desugarings are essentially still just loops, so break, continue, goto, and return all work as expected.
But, how do custom types, like weird map types, implement iteration? The usual4 implementation is sync.Map.Range, which looks like this:
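```go
// Range calls f sequentially for each key and value present in the map.
// If f returns false, Range stops the iteration.
func (m *Map) Range(f func(key, value any) bool)
```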
This function will call yield for each element in the map. If the function returns false, iteration will stop. This pattern is not uncommon, but sometimes libraries omit the bool return (like container/ring.Ring.Do). Some, like filepath.WalkDir, have a more complex interface involving errors.
This is the template for what became rangefuncs, a mechanism for using the for-range syntax with certain function values.
Rangefuncs
The word “rangefunc” does not appear in Go’s specification. It is a term used to refer to them in some documentation, within the compiler, and in the runtime.
A rangefunc is any function with one of the following signatures:
func(yield func() bool)
func(yield func(V) bool)
func(yield func(K, V) bool)
They work like sync.Map.Range does: the function calls yield (hereafter simply called “the yield”) for each element, and stops early if the yield returns false. The iter package contains types for the second and third of these:
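```go
package iter

// Seq is an iterator over sequences of individual values.
type Seq[V any] func(yield func(V) bool)

// Seq2 is an iterator over sequences of pairs, such as key-value pairs.
type Seq2[K, V any] func(yield func(K, V) bool)
```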
For example, the slices package provides an adaptor for converting a slice into an iterator that ranges over it.
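```go
// slices.Values, from Go 1.23's slices package (slices.All is the two-value
// version, yielding index-value pairs):
func Values[Slice ~[]E, E any](s Slice) iter.Seq[E]
```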
So. These things are actually pretty nuts. They break my brain somewhat, because this is the opposite of how iterators usually work. Go calls what I’ve described all the other languages do a “pull iterator”, whereas rangefuncs are “push iterators”.
They have a few obvious limitations. For one, you can’t do smart sizing like with Rust or C++ iterators5. Another is that you can’t easily “pause” iteration.
But they do have one advantage, which I think is the real reason Go went to so much trouble to implement them (and yes, I will dig into how insane that part is). Using push iterators by default means that users “only” need to write an ordinary for loop packaged into a function. Given that Go makes major performance sacrifices in order to be easy to learn6, trying to make it so that an iterator packages the actual looping construct it represents makes quite a bit of sense.
Rangefuncs are actually really cool in some respects, because they enable unusual patterns. For example, you can use a rangefunc to provide RAII blocks.
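Here is a hedged sketch of that idea; Locked is a hypothetical helper of my own, not anything from the standard library:

```go
package sketch

import (
	"iter"
	"sync"
)

// Locked yields exactly once, with the mutex held; the deferred unlock is the
// "epilog" that runs no matter how the block exits.
func Locked(mu *sync.Mutex) iter.Seq[struct{}] {
	return func(yield func(struct{}) bool) {
		mu.Lock()
		defer mu.Unlock()
		yield(struct{}{})
	}
}

func critical(mu *sync.Mutex) {
	for range Locked(mu) {
		// ... critical section: break, return, or panic still releases the lock ...
	}
}
```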
Being a block that you can put an epilog onto after yielding a single element is quite powerful! You can also use a nilary rangefunc to simply create a block that you can break out of, instead of having to use goto.
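A sketch of that, too (block is my name for it; nothing standard):

```go
package sketch

// A nilary rangefunc: the loop body runs at most once, and break exits the "block".
func block(yield func() bool) { yield() }

func example(cond bool) {
	for range block {
		if cond {
			break // skips the rest of the block, no label or goto required
		}
		// ... the code we wanted to be able to skip ...
	}
}
```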
So wait. You can return out of rangefunc loops. That means that… Go has non-local returns?!
Go Now Has Non-Local Returns
The desugaring for rangefuncs is very complicated. This is because break, continue, goto, and return all work in a rangefunc! How does this work? Let’s Godbolt it.
Let’s start with something really basic: a loop body that just calls a function.
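```go
package sketch

import "iter"

// A hypothetical example of mine (the post's exact listing may differ): the
// loop body does nothing but call an opaque function f.
func run(s iter.Seq[int], f func(int)) {
	for x := range s {
		f(x)
	}
}
```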
This produces the following assembly output (which I’ve reformatted into Intel syntax, and removed some extraneous ABI things, including a write barrier where (*) is below).
This is a lot to take in, but if we look carefully, we decompile this function into a Go function:
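What follows is my own hand-written approximation of that shape, with plain panics standing in for the calls to runtime.panicrangestate, rather than the exact decompilation:

```go
package sketch

import "iter"

// States the synthesized yield tracks; the real compiler uses constants from
// internal/abi, and these names are mine.
const (
	stateReady     = iota // ready to run the next iteration of the body
	stateBodyRan          // the body is running, or panicked out of the yield
	stateDone             // the body asked to stop (break, return, etc.)
	stateExhausted        // the whole loop has exited
)

func run(s iter.Seq[int], f func(int)) {
	__state := stateReady
	s(func(x int) bool {
		if __state != stateReady {
			panic("range function continued iteration after exit") // runtime.panicrangestate in the real code
		}
		__state = stateBodyRan // assume the worst; cleared if the body finishes normally
		f(x)
		__state = stateReady
		return true
	})
	if __state == stateBodyRan {
		panic("range function recovered a panic from the loop body") // ditto
	}
	__state = stateExhausted
}
```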
Go will actually enforce invariants on the yield it synthesizes in a range for, in order to catch buggy code. In particular, __state escapes because s is an arbitrary function, so it gets spilled to the heap.
So, what happens when the loop body contains a break? Consider:
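```go
package sketch

import "iter"

// Another hypothetical example of mine: stop at the first zero.
func runBreak(s iter.Seq[int], f func(int)) {
	for x := range s {
		if x == 0 {
			break
		}
		f(x)
	}
}
```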
I’ll spare you the assembly listing, since it’s very similar, so I’ll just reverse-engineer the output directly:
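Again approximately, and with the state bookkeeping from the previous sketch elided: the key point is that break becomes a return false out of the synthesized yield.

```go
package sketch

import "iter"

func runBreak(s iter.Seq[int], f func(int)) {
	s(func(x int) bool {
		if x == 0 {
			return false // break: tell the rangefunc to stop calling us
		}
		f(x)
		return true // keep iterating
	})
}
```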
Non-local returns are much more complicated. Consider:
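```go
package sketch

import "iter"

// A hypothetical example of mine: return the first positive element.
func first(s iter.Seq[int]) int {
	for x := range s {
		if x > 0 {
			return x // returns from first, not merely from the loop body
		}
	}
	return -1
}
```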
The resulting assembly is something like this, with some irrelevant code, such as write barriers, removed:
Try to reverse engineer this yourself, if you like! If you write this out as Go, here’s what you get:
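Roughly the following; this is my simplified reconstruction (state checks elided), not the compiler's literal output:

```go
package sketch

import "iter"

func first(s iter.Seq[int]) int {
	var (
		__ret  int // the pending return value, spilled to a local
		__next int // which exit to take once s returns; 0 means "fell off the loop"
	)
	s(func(x int) bool {
		if x > 0 {
			__ret = x
			__next = -1 // -1: "execute return __ret" after unwinding out of s
			return false
		}
		return true
	})
	if __next == -1 {
		return __ret
	}
	return -1
}
```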
The reason __next is an int is that it is also used when exiting the loop via goto or a break/continue with a label. It specifies where to jump to after the call into the rangefunc returns. Each potential control flow out of the loop is assigned some negative number.
The precise details of the lowering have been exquisitely documented by Russ Cox and David Chase, the primary implementers of the feature.
You might be curious what runtime.panicrangestate does. It’s pretty simple, and it lives in runtime/panic.go:
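Paraphrasing it from memory rather than quoting it verbatim (so treat the names and messages below as approximate), it maps each bad state to a canned runtime error:

```go
package sketch

// Placeholder state values; the real ones live in internal/abi.
const (
	rfDone = iota
	rfPanic
	rfExhausted
	rfMissingPanic
)

func panicrangestate(state int) {
	switch state {
	case rfDone: // yield called again after the loop body said to stop
		panic("range function continued iteration after function for loop body returned false")
	case rfPanic: // yield called again after the loop body panicked
		panic("range function continued iteration after loop body panic")
	case rfExhausted: // yield called after the whole loop already exited
		panic("range function continued iteration after whole loop exit")
	case rfMissingPanic: // the rangefunc swallowed a panic that belonged to the loop body
		panic("range function recovered a panic from the loop body")
	}
}
```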
If you visit this function in runtime/panic.go, you will be greeted by this extremely terrifying comment from Russ Cox immediately after it.
This raises one more thing that works in rangefuncs, seamlessly: defer. Yes, despite the yield executing multiple call stacks away, possibly on a different goroutine… defer still gets attached to the calling function.
Go Now Has Non-Local Defer
The way defer works is that each G (the goroutine struct, runtime.g) holds a linked list of defer records, of type _defer. Each call to defer sticks one of these onto this list. On function return, Go calls runtime.deferreturn(), which essentially executes and pops defers off of the list until it finds one whose stack pointer is not the current function’s stack pointer (so, it must belong to another function).
Rangefuncs throw a wrench in that mix: if myFunc.range-n defers, that defer has to be attached to myFunc’s defer records somehow. So the list must have a way of inserting in the middle.
This is what this comment is about: when defer occurs in the loop body, that defer gets attached to a defer record for that function, using a token that the yield captures; this is later canonicalized when walking the defer list on the way out of myFunc. Because the yield can escape onto another goroutine, this part of the defer chain has to be atomic.
Incredibly, this approach is extremely robust. For example, if we spawn the yield as a goroutine, and carefully synchronize between it and the outer function, we can force the runtime to hard-crash when deferring onto a function that has already returned.
This gets us fatal error: defer after range func returned. Pretty sick! It accomplishes this by poisoning the token the yield func uses to defer.
I have tried various other attempts at causing memory unsafety with rangefuncs, but Go actually does a really good job of avoiding this. The only thing I’ve managed to do that’s especially interesting is to tear the return slot on a function without named returns, but that’s no worse than tearing any other value (which is still really bad, because you can tear interface values, but it’s not worse).
Pull Iterators and Coroutines
Of course we’re not done. Go provides a mechanism for converting push iterators into pull iterators. Essentially, there is a function that looks like this:
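```go
// From the iter package: next returns the next value and whether it exists;
// stop ends iteration early.
func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())

// Pull2 is the equivalent for two-value sequences.
func Pull2[K, V any](seq Seq2[K, V]) (next func() (K, V, bool), stop func())
```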
You can request values with next(), and stop() can be used if you finish early. At first blush, this looks like it must spawn a whole goroutine and use channels to communicate and synchronize, which feels very unnecessary.
The implementation doesn’t use goroutines. It uses coroutines.
Giving Up on Goroutines
Spawning a goroutine is expensive. Doing so expends scheduler and memory resources. It’s overkill for a helper like this (ironic, because the original premise of Go was that goroutines would be cheap enough to allocate willy-nilly).
Go instead implements this using “coroutines”, a mechanism for concurrency without parallelism. This is intended to make context switching very cheap, because it does not need to go through the scheduler: instead, it uses cooperative multitasking.
The coroutine interface is something like the following. My “userland” implementation will not be very efficient, because it relies on the scheduler to transfer control. The goroutines may run on different CPUs, so synchronization is necessary for communication, even if they are not running concurrently.
When we create a coroutine with coro.New(), it spawns a goroutine that waits on a mutex. Another goroutine can “take its place” as the mutex holder by calling c.Resume(), which allows the coroutine spawned by coro.New to resume and enter f().
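To make the shape concrete, here is a rough stand-in of my own: it uses two channels instead of the mutex handoff described above, and every name other than New and Resume is invented, so treat it as a cartoon rather than a faithful copy.

```go
package coro

// Coro is a toy coroutine: In flows from Resume into the coroutine, Out flows back.
type Coro[In, Out any] struct {
	in  chan In
	out chan Out
}

// New creates a paused coroutine. f does not run until the first Resume.
func New[In, Out any](f func(first In, yield func(Out) In)) *Coro[In, Out] {
	c := &Coro[In, Out]{in: make(chan In), out: make(chan Out)}
	go func() {
		first := <-c.in // park until the first Resume
		f(first, func(o Out) In {
			c.out <- o    // hand control (and a value) back to the caller
			return <-c.in // park until the next Resume
		})
		close(c.out) // the coroutine has finished
	}()
	return c
}

// Resume hands v to the coroutine and blocks until it yields or finishes.
// Resuming a finished coroutine deadlocks in this toy; the real one panics.
func (c *Coro[In, Out]) Resume(v In) (Out, bool) {
	c.in <- v
	out, ok := <-c.out
	return out, ok
}
```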
Using the coroutine as a rendezvous point, two goroutines can perform concurrent work: in the case of iter.Pull, one can be deep inside of whatever loops the iterator wants to do, and the other can request values. Here’s what using my coro.Coro to implement iter.Pull might look like:
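Using the toy above, a Pull-shaped function might look like the following. To be clear, this is my approximation of the idea, not the standard library's code or the post's original listing.

```go
package coro

import "iter"

// Pull drives seq with a coroutine: each next() resumes the coroutine until it
// yields one value; stop() resumes it one last time so it can unwind.
func Pull[V any](seq iter.Seq[V]) (next func() (V, bool), stop func()) {
	c := New(func(keepGoing bool, yield func(V) bool) {
		if !keepGoing {
			return // stopped before the first value was ever requested
		}
		seq(func(v V) bool {
			return yield(v) // parks here until the caller asks for another value
		})
	})
	done := false
	next = func() (V, bool) {
		if done {
			var zero V
			return zero, false
		}
		v, ok := c.Resume(true)
		done = !ok
		return v, ok
	}
	stop = func() {
		if !done {
			done = true
			c.Resume(false) // tell the coroutine to unwind and finish
		}
	}
	return next, stop
}
```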
If you look at the implementation in iter.go, it’s basically this, but with a lot of error checking and race detection, to prevent misuse, such as if next or stop escape to other goroutines.
Now, the main thing that runtime support brings here is that Resume() is immediate: it does not go to the scheduler, which might not decide to immediately run the goroutine that last called Resume(), for a variety of reasons (for example, to ensure wakeup fairness). Coroutines sidestep fairness by making Resume() little more than a jump to the last Resume() (with registers fixed up accordingly).
This is not going to be that cheap: a goroutine still needs to be allocated, and switching needs to poke and prod the underlying Gs a little bit. But it’s a cool optimization, and I hope coroutines eventually make their way into more things in Go, hopefully as a language or sync primitive.
Conclusion
Congratulations, you have survived over 3000 words of me going on about iterators. Go’s push iterators are a unique approach to a common language design problem (even if it took a decade for them to materialize).
I encountered rangefuncs for the first time earlier this year and have found them absolutely fascinating, both from an “oh my god they actually did that” perspective and from a “how do we express iteration” perspective. I don’t think the result was perfect by any means, and it is unsuitable for languages that need the performance you can only get from pull iterators. I think they would be a great match for a language like Python or Java, though.
I’d like to thank David Chase, an old colleague, for tolerating my excited contrived questions about the guts of this feature.
1. Ugh, ok. This is the C++20 desugaring, and there are cases where we do not just call std::begin(). In particular, array references and class type references with .begin() and .end() do not call std::begin() and are open-coded. This means that you can’t use ADL to override these types’ iterator. ↩
2. In compiler jargon, a loop is broken up into three parts: the header, which is where the loop is entered, the body, which is one step of iteration, and the latch, which is the part that jumps back to the start of the body. This is where incrementation in a C-style for loop happens. ↩
3. And with better performance. Rust’s iterators can provide a size hint to help size containers before a call to collect(), via the FromIterator trait. ↩
4. Some people observed that you can use a channel as a custom iterator, by having a parallel goroutine run a for loop to feed the channel. Do not do this. It is slow: it has to transit each element through the heap, forcing anything it points to to escape. It takes up an extra M and a P in the scheduler, and requires potentially allocating a stack for a G. It’s probably faster to just build a slice and return that, especially for small iterations. ↩
5. For this reason, I wish that Go had instead defined something along these lines. This is functionally identical to what they did, but it would have permitted future extensions such as the following interface: This would mean that slices.Collect could be enhanced into something like this. I don’t think there’s an easy way to patch this up, at this point. ↩
6. Disclaimer: I am not going to dig into Go’s rationale for rangefuncs. Knowing how the sausage is made, most big Go proposals are a mix of understandable reasoning and less reasonable veiled post-hoc justification to compensate for either Google planning/approvals weirdness or because the design was some principal engineer’s pony. This isn’t even a Go thing, it’s a Google culture problem. I say this as the architect of Protobuf Editions, the biggest change to Protobuf since Rob’s misguided proto37 experiment. I have written this kind of language proposal, on purpose, because bad culture mandated it.

The purpose of a system is what it does. It is easier to understand a system by observing its response to stimuli, rather than what it says on the tin. So let’s use that lens.

Go wants to be easy to learn. It intended to replace C++ at Google (lol, lmao), which, of course, failed disastrously, because performance of the things already written in C++ is tied to revenue. They have successfully pivoted to being an easy-to-learn language that makes it easy to onboard programmers regardless of what they already use, as opposed to onboarding them to C++.

This does not mean that Go is user-friendly. In fact, user-friendliness is clearly not a core value. Rob and his greybeard crowd didn’t seem to care about the human aspect of interacting with a toolchain, so Go tooling rarely provides good diagnostics, nor did the language, until the last few years, try to reduce toil. After all, if it is tedious to use but simple, that does make it easy to onboard new programmers.

Rust is the opposite: it is very difficult to learn, with a famously steep learning curve; however, it is very accessible, because the implementors have sanded down every corner and sharp edge using diagnostics, error messages, and tooling. C++ is neither of these things. It is very difficult to learn, and most compilers are pretty unhelpful (if they diagnose anything at all).

I think that Go has at least realized the language can be a pain to use in some situations, which is fueled in part by legitimate UX research. This is why Go has generics and other recent advanced language features, like being able to use the for syntax with integers or with custom iterators. I think that rangefuncs are easy to learn in the way Go needs them to be. If you expect more users to want to write rangefuncs than users who want to write complicated uses of rangefuncs, I think push iterators are the easiest to learn how to use.

I think this is a much more important reason for all the trouble that rangefuncs generate for the compiler and runtime than, say, compatibility with existing code; I have not seen many cases in the wild or in the standard library that conform to the rangefunc signatures. ↩
7. But please don’t use proto3. I’m telling you that as the guy who maintained the compiler. Just don’t. ↩