rfcs/text/0000-gen-fn.md

Summary

Reserve the gen keyword in the 2024 edition.

Add gen {} blocks to the language. These implement Iterator by yielding elements. This is simpler and more intuitive than creating a custom type and manually implementing Iterator for that type, which requires writing an explicit Iterator::next method body. This is a change similar to adding async {} blocks that implement Future instead of having to manually write futures and their state machines.

Motivation

The main motivation of this RFC is to reserve a new keyword in the 2024 edition. We will discuss the semantic questions of generators in this document, but we do not have to settle them with this RFC. We'll describe current thinking on the semantics, but some questions will be left open to be answered at a later time after we gain more experience with the implementation.

Writing iterators manually can be very painful. Many iterators can be written by chaining Iterator methods, but some need to be written as a struct with a hand-written Iterator implementation. The effort involved pushes some people to avoid iterators altogether and instead write a for loop that eagerly pushes values into mutable state. With this RFC, one can write the for loop and still get a lazy iterator of values.

As an example, here are several ways to write an iterator that takes an iterator of integers, keeps only the odd ones, and multiplies each by 2:

// `Iterator` methods
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    values.filter(|value| value % 2 == 1).map(|value| value * 2)
}

// `std::iter::from_fn`
fn odd_dup(mut values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    std::iter::from_fn(move || {
        loop {
            let value = values.next()?;
            if value % 2 == 1 {
                return Some(value * 2);
            }
        }
    })
}

// `struct` and manual `impl`
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    struct Foo<T>(T);
    impl<T: Iterator<Item = u32>> Iterator for Foo<T> {
        type Item = u32;
        fn next(&mut self) -> Option<u32> {
            loop {
                let value = self.0.next()?;
                if value % 2 == 1 {
                    return Some(value * 2);
                }
            }
        }
    }
    Foo(values)
}

// `gen` block
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    gen {
        for value in values {
            if value % 2 == 1 {
                yield value * 2;
            }
        }
    }
}

Iterators created with gen return None once they return (implicitly at the end of the scope or explicitly with return). gen iterators are fused, so after returning None once, they will keep returning None forever.
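To make the "keep returning None" rule concrete, here is a rough hand-written equivalent of the gen block in odd_dup above. It is a sketch under our own assumptions, not the compiler's actual lowering: OddDup and its done flag are hypothetical names, and the flag is what makes the iterator fused.

```rust
/// Hypothetical hand-written equivalent of the `gen` block in `odd_dup`.
/// The `done` flag models "the block has returned": once it is set,
/// `next` keeps returning `None` without touching the inner iterator.
pub struct OddDup<I> {
    values: I,
    done: bool,
}

impl<I: Iterator<Item = u32>> Iterator for OddDup<I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.done {
            return None;
        }
        // Resume the `for` loop until the next value that would be yielded.
        for value in self.values.by_ref() {
            if value % 2 == 1 {
                return Some(value * 2);
            }
        }
        // The loop finished, so the block has implicitly returned.
        self.done = true;
        None
    }
}

pub fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    OddDup { values, done: false }
}
```

Calling next after the first None hits the done check, so the iterator stays exhausted.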

Guide-level explanation

New keyword

Starting in the 2024 edition, gen is a keyword that cannot be used for naming any items or bindings. This means during the migration to the 2024 edition, all variables, functions, modules, types, etc. named gen must be renamed or be referred to via r#gen.
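As a rough illustration of that migration, this sketch spells a trait method, a free function, and a local binding named gen as raw identifiers. Source and Counter are hypothetical names, loosely modeled on the rand situation discussed under Drawbacks; raw identifiers are accepted on every edition since 2018, so code written this way compiles both before and after the keyword is reserved.

```rust
/// Hypothetical trait with a method named `gen`, spelled as a raw identifier.
pub trait Source {
    fn r#gen(&mut self) -> u32;
}

pub struct Counter(pub u32);

impl Source for Counter {
    fn r#gen(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

/// A free function and a local binding that are both named `gen`.
pub fn r#gen(counter: &mut Counter) -> u32 {
    // Method call sites also need the raw spelling.
    let r#gen = counter.r#gen();
    r#gen * 10
}
```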

Returning/finishing an iterator

gen blocks must diverge or return the unit type. Specifically, the trailing expression must be of the unit or ! type, and any return statements in the block must either be given no argument at all or given an argument of the unit or ! type.

Diverging iterators

For example, a gen block that produces the infinite sequence 0, 1, 0, 1, ... never returns None from next and only drops its captured data when the iterator is dropped:

gen {
    loop {
        yield 0;
        yield 1;
    }
}
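Written by hand in today's Rust, the same behaviour needs an explicit state field recording which yield the block is suspended at. The sketch below is our own approximation (ZeroOne and zero_one are hypothetical names), not the compiler's lowering:

```rust
/// Hypothetical stable-Rust equivalent of `loop { yield 0; yield 1; }`.
/// `next` never returns `None`; the state is only dropped when the
/// iterator itself is dropped.
pub struct ZeroOne {
    // false: suspended before `yield 0`; true: suspended before `yield 1`.
    at_one: bool,
}

impl Iterator for ZeroOne {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let value = if self.at_one { 1 } else { 0 };
        // Advance to the other `yield` for the next call.
        self.at_one = !self.at_one;
        Some(value)
    }
}

pub fn zero_one() -> ZeroOne {
    ZeroOne { at_one: false }
}
```

Because the sequence is infinite, callers bound it themselves, for example with zero_one().take(5).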

If a gen block panics, the behavior is very similar to return, except that next unwinds instead of returning None.

Error handling

Within gen blocks, the ? operator desugars as follows. When its operand evaluates to a value indicating "do not short circuit" (e.g. Option::Some(..), Result::Ok(..), ControlFlow::Continue(..)), the wrapped value becomes the result of the expression as usual. When its operand evaluates to a value indicating that short-circuiting is desired (e.g. Option::None, Result::Err(..), ControlFlow::Break(..)), that value is first yielded (after being converted by From::from as usual), and then the block returns immediately.

Even when ? is used within a gen block, the block must return a value of type unit or !. That is, unlike a try block, it does not wrap its result in Some(..), Ok(..), or Continue(..).

However, note that when ? is used within a gen block, all yield statements will need to be given an argument of a compatible type. For example, if None? is used in an expression, then all yield statements will need to be given arguments of type Option.
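As a sketch of these rules, imagine a gen block that parses strings and yields each number doubled, using ? on the parse result. Because every yield must then produce a Result, the item type becomes Result<u32, ParseIntError>. The hand-written iterator below approximates such a block on stable Rust; DoubleParsed and double_parsed are hypothetical names, and the From::from conversion is the identity here because the error type does not change.

```rust
use std::num::ParseIntError;

/// Approximates a hypothetical block such as
///     gen { for s in inputs { let n: u32 = s.parse()?; yield Ok(n * 2); } }
/// On `Ok` the loop continues; on the first `Err` the error is yielded
/// once and the iterator then stays finished.
pub struct DoubleParsed<I> {
    inputs: I,
    done: bool,
}

impl<'a, I: Iterator<Item = &'a str>> Iterator for DoubleParsed<I> {
    type Item = Result<u32, ParseIntError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let Some(s) = self.inputs.next() else {
            // The `for` loop ran out of inputs: the block returns.
            self.done = true;
            return None;
        };
        match s.parse::<u32>() {
            // "Do not short circuit": the value flows on and is yielded.
            Ok(n) => Some(Ok(n * 2)),
            // Short circuit: yield the error, then behave as if returned.
            Err(e) => {
                self.done = true;
                Some(Err(e))
            }
        }
    }
}

pub fn double_parsed<'a, I: Iterator<Item = &'a str>>(inputs: I) -> DoubleParsed<I> {
    DoubleParsed { inputs, done: false }
}
```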

Fusing

Iterators produced by gen keep returning None when invoked again after they have returned None once. They do not implement FusedIterator, as that is not a language item, but may implement it in the future.
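For an ordinary iterator, the same guarantee has to be requested explicitly with Iterator::fuse. The sketch below uses a deliberately misbehaving, hypothetical Flaky iterator that resumes after returning None, to show the difference that gen iterators avoid by construction:

```rust
/// Hypothetical non-fused iterator: odd-numbered calls to `next` return a
/// value and even-numbered calls return `None`, forever.
pub struct Flaky {
    calls: u32,
}

impl Iterator for Flaky {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.calls += 1;
        if self.calls % 2 == 1 {
            Some(self.calls)
        } else {
            None
        }
    }
}

pub fn flaky() -> Flaky {
    Flaky { calls: 0 }
}
```

Wrapped as flaky().fuse(), the iterator returns Some(1) and then None on every later call, which is the behaviour a gen iterator has without the wrapper.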

Reference-level explanation

New keyword

In the 2024 edition we reserve gen as a keyword. Previous editions will use k#gen to get the same features (r#gen cannot serve this purpose, since it denotes the plain identifier gen).

Error handling

foo? in gen blocks will stop iteration after the first error by desugaring to:

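match foo.branch() {
    // Short circuit: yield the residual, converted into the block's
    // item type `R`, and then stop iterating.
    ControlFlow::Break(err) => {
        yield R::from_residual(err);
        return;
    }
    // Continue: the `?` expression evaluates to the unwrapped value.
    ControlFlow::Continue(val) => val,
}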

This is the same behaviour that collect::<Result<_, _>>() performs on iterators over Results.
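A quick way to check that comparison on stable Rust is to count how many inputs collect::<Result<_, _>>() actually pulls from its source. In this sketch (parse_all is a hypothetical helper), collection stops right after the first Err, just as the gen desugaring stops after yielding the first error:

```rust
use std::num::ParseIntError;

/// Parses all inputs with `collect::<Result<_, _>>()` and also reports
/// how many inputs were pulled from the slice before collection stopped.
pub fn parse_all(inputs: &[&str]) -> (Result<Vec<u32>, ParseIntError>, usize) {
    let mut pulled = 0;
    let result = inputs
        .iter()
        // Runs once for every element requested from the slice.
        .inspect(|_| pulled += 1)
        .map(|s| s.parse::<u32>())
        .collect::<Result<Vec<u32>, ParseIntError>>();
    (result, pulled)
}
```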

Implementation

This feature is mostly implemented via existing generators, though there are some special cases.

gen blocks

gen blocks are the same as an unstable generator...

  • ...without arguments,
  • ...with an additional check forbidding holding borrows across yield points,
  • ...with an automatic Iterator implementation,
  • ...and that does not panic if invoked again after returning.

Drawbacks

It's another language feature for something that can already be written entirely in user code.

In contrast to Generator, gen blocks that produce Iterators cannot hold references across yield points. See std::iter::from_generator, which has an Unpin bound on the generator it takes to produce an Iterator.
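Hand-written iterators face the same constraint: a struct cannot hold both a Vec and a slice iterator borrowing from that Vec. The usual workaround, which mirrors the rewrite a gen block needs when it cannot keep &v alive across a yield, is to own the data and track a position. A minimal sketch with a hypothetical Owned type:

```rust
/// Owns its data and stores an index instead of a `std::slice::Iter`
/// borrowing from `self.data` (which would make the struct self-referential).
pub struct Owned {
    data: Vec<u32>,
    pos: usize,
}

impl Iterator for Owned {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // `self.data` is borrowed only for the duration of this call.
        let value = *self.data.get(self.pos)?;
        self.pos += 1;
        Some(value)
    }
}

pub fn owned(data: Vec<u32>) -> Owned {
    Owned { data, pos: 0 }
}
```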

The gen keyword causes some fallout in the community, mostly around the rand crate, which has gen methods on its traits.

Rationale and alternatives

Keyword

We could use iter as the keyword. I prefer iter because I connect generators with a more powerful scheme than plain Iterators. The Generator trait can do everything that iter blocks and async blocks can do and more. I believe connecting the Iterator trait with iter blocks is the right choice, but that would require us to carve out many exceptions for this keyword as iter is used for module names and method names everywhere (including libstd/libcore). It may not be much worse than gen (see also the unresolved questions). We may want to use gen for full-fledged generators in the future.

Do not do this

One alternative is to keep adding more helper methods to Iterator. It is already hard for new Rustaceans to be aware of all the capabilities of Iterator. Some of these new methods would need to be very generic. While it's not an Iterator example, array::try_map is something that has very complex diagnostics that are hard to improve, even if it's nice once it works.

Users can use crates like genawaiter or propane instead. genawaiter works on stable and provides gen! macro blocks that behave like gen blocks, but don't have compiler support for nice diagnostics or language support for the ? operator. The propane crate uses the Generator trait from nightly and works mostly like gen would.

The standard library includes std::iter::from_fn, which can be used in some cases, but as we saw in the example above, often the improvement over writing out a manual implementation of Iterator is limited.

return statements yield one last element

Similarly to try blocks, trailing expressions could yield their element.

There would then be no way to terminate iteration as return statements would have to have a value that is yielded before terminating iteration.

We could do something magical where returning () terminates the iteration, so this code...

fn foo() -> impl Iterator<Item = i32> {
    gen { 42 }
}

...could be a way to specify std::iter::once(42). The issue I see with this is that this...

fn foo() -> impl Iterator<Item = i32> {
    gen { 42; } // note the semicolon
}

...would then not return a value.

Furthermore this would make it unclear what the behaviour of this...

fn foo() -> impl Iterator<Item = ()> { gen {} }

...is supposed to be, as it could be either std::iter::once(()) or std::iter::empty::<()>().
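The two candidate meanings are observably different, as this small sketch shows (as_once and as_empty are hypothetical names):

```rust
use std::iter;

/// Reading 1: the trailing `()` of `gen {}` is yielded once.
pub fn as_once() -> impl Iterator<Item = ()> {
    iter::once(())
}

/// Reading 2: `gen {}` yields nothing at all.
pub fn as_empty() -> impl Iterator<Item = ()> {
    iter::empty::<()>()
}
```

Counting the elements gives 1 for the first and 0 for the second, so code relying on either reading would silently misbehave under the other.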

Prior art

CLU, Alphard

The idea of generators that yield their values goes back at least as far as the Alphard language from circa 1975 (see "Alphard: Form and Content", Mary Shaw, 1981). This was later refined into the idea of iterators in the CLU language (see "A History of CLU", Barbara Liskov, 1992 and "CLU Reference Manual", Liskov et al., 1979).

The CLU language opened an iterator context with the iter keyword and produced values with yield statements. E.g.:

odds = iter () yields (int)
  x: int := 1
  while x <= 20 do
    yield x
    x := x + 2
  end
end odds

Icon

In Icon (introduced circa 1977), generators are woven deeply into the language, and any function can return a sequence of values. When done explicitly, the suspend keyword is used. E.g.:

procedure range(i, j)
  while i < j do {
    suspend i
    i +:= 1
  }
  fail
end

Python

In Python, any function that contains a yield statement returns a generator. E.g.:

def odd_dup(xs):
  for x in xs:
    if x % 2 == 1:
      yield x * 2

ECMAScript / JavaScript

In JavaScript, yield can be used within function* generator functions. E.g.:

function* oddDupUntilNegative(xs) {
  for (const x of xs) {
    if (x < 0) {
      return;
    } else if (x % 2 == 1) {
      yield x * 2;
    }
  }
}

These generator functions are general coroutines. yield forms an expression that returns the value passed to next. E.g.:

function* dup(x) {
  while (true) {
    x = yield x * 2;
  }
}

const g = dup(2);
console.assert(g.next().value === 4);
console.assert(g.next(3).value === 6);

Ruby

In Ruby, yield can be used with the Enumerator class to implement an iterator. E.g.:

def odd_dup_until_negative xs
  Enumerator.new do |y|
    xs.each do |x|
      if x < 0
        return
      elsif x % 2 == 1
        y.yield x * 2
      end
    end
  end
end

Ruby also uses yield for a general coroutine mechanism with the Fiber class. E.g.:

def dup
  Fiber.new do |x|
    while true
      x = Fiber.yield x * 2
    end
  end
end

g = dup
4 == (g.resume 2)
6 == (g.resume 3)

Kotlin

In Kotlin, a lazy Sequence can be built using the sequence builder and yield. E.g.:

fun oddDup(xs: Iterable<Int>): Sequence<Int> {
    return sequence {
        for (x in xs) {
            if (x % 2 == 1) {
                yield(x * 2);
            }
        }
    };
}

fun main() {
    for (x in oddDup(listOf(1, 2, 3, 4, 5))) {
        println(x);
    }
}

Swift

In Swift, AsyncStream is used with yield to produce asynchronous generators. E.g.:

import Foundation

let sequence = AsyncStream { k in
    for x in 0..<20 {
        if x % 2 == 1 {
            k.yield(x * 2)
        }
    }
    k.finish()
}

let semaphore = DispatchSemaphore(value: 0)
Task {
    for await elem in sequence {
        print(elem)
    }
    semaphore.signal()
}
semaphore.wait()

Synchronous generators are not yet available in Swift, though they may be added in the future.

C#

In C#, within an iterator, the yield statement is used to either yield the next value or to stop iteration. E.g.:

IEnumerable<int> OddDupUntilNegative(IEnumerable<int> xs)
{
    foreach (int x in xs)
    {
        if (x < 0)
        {
            yield break;
        }
        else if (x % 2 == 1)
        {
            yield return x * 2;
        }
    }
}

Analogously with this RFC and with async blocks in Rust (but unlike async Task in C#), execution of C# iterators does not start until they are iterated.

D

In D, yield is used when constructing a Generator. E.g.:

import std.concurrency;
import std.stdio: writefln;

auto odd_dup(int[] xs) {
    return new Generator!int({
        foreach(x; xs) {
            if (x % 2 == 1) {
                yield(x * 2);
            }
        }
    });
}

void main() {
    auto xs = odd_dup([1, 2, 3, 4, 5]);
    foreach (x; xs) {
        writefln("%d", x);
    }
}

As in Ruby, generators in D are built on top of a more general Fiber class that also uses yield.

Dart

In Dart, there are both synchronous and asynchronous generator functions. Synchronous generator functions return an Iterable. E.g.:

Iterable<int> oddDup(Iterable<int> xs) sync* {
    for (final x in xs) {
        if (x % 2 == 1) {
            yield x * 2;
        }
    }
}

void main() {
    oddDup(List<int>.generate(20, (x) => x + 1)).forEach(print);
}

Asynchronous generator functions return a Stream object. E.g.:

Stream<int> oddDup(Iterable<int> xs) async* {
    for (final x in xs) {
        if (x % 2 == 1) {
            yield x * 2;
        }
    }
}

void main() {
    oddDup(List<int>.generate(20, (x) => x + 1)).forEach(print);
}

F#

In F#, generators can be expressed with sequence expressions using yield. E.g.:

let oddDup xs = seq {
  for x in xs do
    if x % 2 = 1 then
      yield x * 2 }

for x in oddDup (seq { 1 .. 20 }) do
  printfn "%d" x

Racket

In Racket, generators can be built using generator and yield. E.g.:

#lang racket
(require racket/generator)

(define (odd-dup xs)
  (generator ()
    (for ([x xs])
      (when (odd? x)
        (yield (* 2 x))))))

(define g (odd-dup '(1 2 3 4 5)))
(= (g) 2)
(= (g) 6)
(= (g) 10)

Note that because of the expressive power of call/cc (and continuations in general), generators can be written in Racket as a normal library.

Haskell, Idris, Clean, etc.

In Haskell (and in similar languages such as Idris, Clean, etc.), all functions are lazy unless specially annotated. Consequently, Haskell does not need a special yield operator. Any function can be a generator by recursively building a list of elements that will be lazily returned one at a time. E.g.:

oddDup :: (Integral x) => [x] -> [x]
oddDup [] = []
oddDup (x:xs)
  | odd x = x * 2 : oddDup xs
  | otherwise = oddDup xs

main :: IO ()
main = putStrLn $ show $ take 5 $ oddDup [1..20]

Unresolved questions

Keyword

Should we use iter as the keyword, as we're producing Iterators? We could use gen as proposed in this RFC and later extend its abilities to more powerful generators.

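For comparison, on nightly (at the time of writing) an iterator built from an unstable generator with std::iter::from_generator panics if next is called again after it has already returned None: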

#![feature(generators)]
#![feature(iter_from_generator)]

fn main() {
    let mut it = std::iter::from_generator(|| {
        yield 1
    });

    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), None);
    it.next(); // panics
}

Contextual keyword

Popular crates (like rand) have methods called gen. If we forbid those, we are forcing those crates to make a major version bump when they update their edition, and we are requiring any users of those crates to use r#gen instead of gen when calling that method.

We could choose to use a contextual keyword and only forbid gen in:

  • bindings
  • field names (due to destructuring bindings)
  • enum variants
  • type names

This should avoid any parsing issues around gen followed by { in expressions.
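Field names are on that list because struct-pattern shorthand turns a field name directly into a binding name. With gen reserved, such a field has to be written as r#gen both in the definition and in the pattern. A sketch with a hypothetical Config type, which compiles on any edition since 2018:

```rust
/// Hypothetical struct with a field named `gen`.
pub struct Config {
    pub r#gen: u32,
    pub seed: u64,
}

pub fn generation(config: &Config) -> u32 {
    // Shorthand destructuring introduces a local binding named `gen`,
    // so restricting bindings also affects field names.
    let Config { r#gen, .. } = config;
    *r#gen
}
```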

Iterator::size_hint

Should we try to compute a conservative size_hint? This will reveal information from the body of a generator, but at least for simple cases users will likely expect size_hint to not just be the default. It is backwards compatible to later add support for opportunistically implementing size_hint.
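For comparison, the combinator version of odd_dup already reports a conservative hint derived from its input, while a hand-written iterator that does not override size_hint gets the uninformative default. A small sketch (combinator_hint, Countdown, and default_hint are hypothetical names):

```rust
/// `Filter` reports a lower bound of 0 (every item might be dropped) and
/// keeps the inner upper bound; `Map` passes the hint through unchanged.
pub fn combinator_hint() -> (usize, Option<usize>) {
    (1..=5u32)
        .filter(|v| v % 2 == 1)
        .map(|v| v * 2)
        .size_hint()
}

/// Hand-written iterator that does not override `size_hint`.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

/// The default `size_hint` implementation returns `(0, None)`.
pub fn default_hint() -> (usize, Option<usize>) {
    Countdown(3).size_hint()
}
```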

Implement other Iterator traits

Is it possible to implement traits like DoubleEndedIterator or ExactSizeIterator at all?

Future possibilities

yield from (forwarding operation)

Python has the ability to yield from an iterator. Effectively this is syntax sugar for looping over all elements of the iterator and yielding them individually. There are many possible designs if we want such a feature, so I'm listing general ideas:

Do nothing, just use loops

for x in iter {
    yield x
}

Language support

We could do something like postfix yield:

iter.yield

Or we could use an entirely new keyword.

stdlib macro

We could add a macro to the standard library and prelude. The macro would expand to a for loop + yield.

yield_all!(iter)
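Until something like this exists, the closest stable-Rust analogue to forwarding a whole iterator is splicing it into a combinator chain. The sketch below (bracketed is a hypothetical name) produces what a generator yielding 0, then every element of inner, then 99 would produce:

```rust
use std::iter;

/// Roughly `gen { yield 0; /* yield every item of inner */ yield 99; }`,
/// written with `chain` instead of a forwarding operation.
pub fn bracketed(inner: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    iter::once(0).chain(inner).chain(iter::once(99))
}
```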

Complete Generator support

We already have a Generator trait on nightly that is more powerful than the Iterator API could possibly be:

  1. It uses Pin<&mut Self>, allowing self-references in the generator across yield points.
  2. It has arguments (yield returns the arguments passed to it in the subsequent invocations).

Similar to the ideas around async closures, I think we could argue for Generators to be gen closures while gen blocks are a simpler concept that has no arguments and only captures variables.

Either way, support for full Generators should be discussed and implemented separately, as there are many more open questions around them beyond a simpler way to write Iterators.

async interactions

We could support using await in gen async blocks, similar to how we support ? being used within gen blocks. We would have the same kind of limitations on holding references across await points as we have for yield points. The solution space for gen async is large enough that I will not explore it here. This RFC's design is forward compatible with anything we decide on.

At present, a gen block can yield futures but cannot await within its body, just as one can write an iterator whose next returns futures but cannot await inside next.

Self-referential gen blocks

We can allow gen blocks to hold borrows across yield points in the future.

There are a few ways forward (though this list is probably not complete):

  • Add a separate trait for pinned iteration that is also usable with gen and for.
    • Downside: We would have very similar traits for the same thing.
  • Backward-compatibly add a way to change the argument type of Iterator::next.
    • Downside: It's unclear whether this is possible.
  • Implement Iterator for Pin<&mut G> instead of for G directly (whatever G is here, but it could be a gen block).
    • Downside: The thing being iterated over must now be pinned for the entire iteration, instead of for each invocation of next.

This RFC is forward compatible with any such designs, so I will not explore it here.

try interactions

We could allow gen try fn foo() -> i32 to mean something akin to gen fn foo() -> Result<i32, E>. Whatever we do here, it should mirror whatever try fn means in the future.

gen fn

This RFC does not introduce gen fn. The syntax design space for them is fairly large, and there are open questions around the difference between returning and yielding a type. Candidate spellings include:

fn foo(args) yield item
fn foo(args) yields item
fn foo(args) => item
fn* foo(args) -> item // or any of the `fn foo` variants for the item type
gen fn foo(args) // or any of the above variants for the item type
gen foo(args) // or any of the above variants for the item type
generator fn foo(args) // or any of the above variants for the item type