book/nostarch/chapter13.md

1253 lines
48 KiB
Markdown
Raw Normal View History

2022-04-25 00:14:02 +00:00
<!-- DO NOT EDIT THIS FILE.
This file is periodically generated from the content in the `/src/`
directory, so all fixes need to be made in `/src/`.
-->
[TOC]
# Functional Language Features: Iterators and Closures
Rusts design has taken inspiration from many existing languages and
techniques, and one significant influence is *functional programming*.
Programming in a functional style often includes using functions as values by
passing them in arguments, returning them from other functions, assigning them
to variables for later execution, and so forth.
In this chapter, we wont debate the issue of what functional programming is or
isnt but will instead discuss some features of Rust that are similar to
features in many languages often referred to as functional.
More specifically, well cover:
* *Closures*, a function-like construct you can store in a variable
* *Iterators*, a way of processing a series of elements
2022-06-09 01:10:54 +00:00
* How to use closures and iterators to improve the I/O project in Chapter 12
* The performance of these closures and iterators (Spoiler alert: theyre
faster than you might think!)
2022-04-25 00:14:02 +00:00
2022-06-09 01:10:54 +00:00
We've already covered some other Rust features, such as pattern matching and
enums, that are also influenced by the functional style as well, but since
mastering closures and iterators is an important part of writing idiomatic,
fast Rust code, well devote this entire chapter to them.
2022-04-25 00:14:02 +00:00
2022-06-09 01:10:54 +00:00
## Closures: Anonymous Functions that Capture Their Environment
2022-04-25 00:14:02 +00:00
Rusts closures are anonymous functions you can save in a variable or pass as
arguments to other functions. You can create the closure in one place and then
2022-06-09 01:10:54 +00:00
call the closure elsewhere to evaluate it in a different context. Unlike
functions, closures can capture values from the scope in which theyre defined.
Well demonstrate how these closure features allow for code reuse and behavior
2022-04-25 00:14:02 +00:00
customization.
### Capturing the Environment with Closures
2022-06-09 01:10:54 +00:00
We'll first examine how we can use closures to capture values from the
environment theyre defined in for later use. Heres the scenario: Every so
often, our t-shirt company gives away a free shirt to someone on our mailing
list. People on the mailing list can optionally add their favorite color to
their profile. If the person chosen for a free shirt has their favorite color
set, they get that color shirt. If the person hasnt specified a favorite
color, they get whatever color the company currently has the most of.
2022-04-25 00:14:02 +00:00
There are many ways to implement this. For this example, were going to use an
2022-06-09 01:10:54 +00:00
enum called `ShirtColor` that has the variants `Red` and `Blue`.
<!-- are we saying the company only offers shirts in red or blue, or are we just
starting with these two colors? Likely not important for the code, but good to
clarify for the narrative! /LC -->
We represent the companys inventory with an `Inventory` struct that has a
field named `shirts` that contains a `Vec<ShirtColor>` representing the shirt
colors currently in stock. The method `shirt_giveaway` defined on `Inventory`
gets the optional shirt color preference of the free shirt winner, and returns
the shirt color the person will get. This setup is shown in Listing 13-1:
2022-04-25 00:14:02 +00:00
Filename: src/main.rs
```
#[derive(Debug, PartialEq, Copy, Clone)]
enum ShirtColor {
Red,
Blue,
}
struct Inventory {
shirts: Vec<ShirtColor>,
}
impl Inventory {
fn giveaway(&self, user_preference: Option<ShirtColor>) -> ShirtColor {
user_preference.unwrap_or_else(|| self.most_stocked())
}
fn most_stocked(&self) -> ShirtColor {
let mut num_red = 0;
let mut num_blue = 0;
for color in &self.shirts {
match color {
ShirtColor::Red => num_red += 1,
ShirtColor::Blue => num_blue += 1,
}
}
if num_red > num_blue {
ShirtColor::Red
} else {
ShirtColor::Blue
}
}
}
fn main() {
let store = Inventory {
shirts: vec![ShirtColor::Blue, ShirtColor::Red, ShirtColor::Blue],
};
let user_pref1 = Some(ShirtColor::Red);
let giveaway1 = store.giveaway(user_pref1);
println!(
"The user with preference {:?} gets {:?}",
user_pref1, giveaway1
);
let user_pref2 = None;
let giveaway2 = store.giveaway(user_pref2);
println!(
"The user with preference {:?} gets {:?}",
user_pref2, giveaway2
);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-1: Framework of the shirt company giveaway situation
2022-04-25 00:14:02 +00:00
The `store` defined in `main` has two blue shirts and one red shirt in stock.
2022-06-09 01:10:54 +00:00
We then call the `giveaway` method for a user with a preference for a red
2022-04-25 00:14:02 +00:00
shirt and a user without any preference. Running this code prints:
2022-06-09 01:10:54 +00:00
<!-- Again... I know this is just a toy example, but it seems jarring for a
tshirt company to only have three shirts in stock. I think it's fine if we add
a line earlier that says something like "for the sake of simplicity, we'll deal
with just two shirt colors, and a small volume of stock" /LC -->
2022-04-25 00:14:02 +00:00
```
$ cargo run
Compiling shirt-company v0.1.0 (file:///projects/shirt-company)
Finished dev [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/shirt-company`
The user with preference Some(Red) gets Red
The user with preference None gets Blue
```
2022-06-09 01:10:54 +00:00
Again, this code could be implemented in many ways, and here, to focus on
closures, we've stuck to concepts youve already learned except for the body of
the `giveaway` method that uses a closure. Let's examine that closure. The
`giveaway` method takes the user preference `Option<ShirtColor>` and calls
`unwrap_or_else` on it. The `unwrap_or_else` method on `Option<T>` is defined
by the standard library. It takes one argument: a closure without any arguments
that returns a value `T` (the same type stored in the `Some` variant of the
`Option<T>`, in this case `ShirtColor`). If the `Option<T>` is the `Some`
variant, `unwrap_or_else` returns the value from within the `Some`. If the
`Option<T>` is the `None` variant, `unwrap_or_else` calls the closure and
returns the value returned by the closure.
<!-- can you show us the code that here counts as they closure? is it, for
example, this whole section: (&self, user_preference: Option<ShirtColor>) ->
ShirtColor { user_preference.unwrap_or_else(|| self.most_stocked()) ? And what
indicates to Rust that it's a closure? Or do we not need to indicate that, Rust
doesn't care? I'm thinking about the earlier closure definition "You can create
the closure in one place and then call the closure elsewhere to evaluate it in
a different context" -- are we using this aspect of the closure here? Can you
highligt that in the text? /LC -->
One interesting aspect here is that weve passed a closure that calls
2022-04-25 00:14:02 +00:00
`self.most_stocked()` on the current `Inventory` instance. The standard library
didnt need to know anything about the `Inventory` or `ShirtColor` types we
2022-06-09 01:10:54 +00:00
defined, or the logic we want to use in this scenario. The closure captures an
immutable reference to the `self` `Inventory` instance and passes it with the
code we specify to the `unwrap_or_else` method. Functions, on the other hand,
are not able to capture their environment in this way.
2022-04-25 00:14:02 +00:00
### Closure Type Inference and Annotation
There are more differences between functions and closures. Closures dont
usually require you to annotate the types of the parameters or the return value
like `fn` functions do. Type annotations are required on functions because
2022-06-09 01:10:54 +00:00
functions are part of an explicit interface exposed to your users. Defining this
<!-- functions are part of the explicit interface, or type annotations are? /LC -->
2022-04-25 00:14:02 +00:00
interface rigidly is important for ensuring that everyone agrees on what types
2022-06-09 01:10:54 +00:00
of values a function uses and returns. Closures, on the other hand, arent used
in an exposed interface like this: theyre stored in variables and used without
naming them and exposing them to users of our library.
2022-04-25 00:14:02 +00:00
Closures are typically short and relevant only within a narrow context rather
than in any arbitrary scenario. Within these limited contexts, the compiler can
infer the types of the parameters and the return type, similar to how its able
to infer the types of most variables (there are rare cases where the compiler
needs closure type annotations too).
As with variables, we can add type annotations if we want to increase
explicitness and clarity at the cost of being more verbose than is strictly
necessary. Annotating the types for a closure would look like the definition
2022-06-09 01:10:54 +00:00
shown in Listing 13-x.
2022-04-25 00:14:02 +00:00
Filename: src/main.rs
```
let expensive_closure = |num: u32| -> u32 {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
num
};
```
2022-06-09 01:10:54 +00:00
Listing 13-x: Adding optional type annotations of the parameter and return
2022-04-25 00:14:02 +00:00
value types in the closure
2022-06-09 01:10:54 +00:00
<!-- Interestng, so is this another way to define a closure: with the let
keywork like a variable? Earlier we defined in (I think!) in the function
definition: fn giveaway(&self,... Is it worth pointing out different ways they
can be defined, or should that be obvious to the reader? /LC -->
2022-04-25 00:14:02 +00:00
With type annotations added, the syntax of closures looks more similar to the
2022-06-09 01:10:54 +00:00
syntax of functions. Here we define a function that adds 1 to its parameter and
a closure that has the same behavior, for comparison. Weve added some spaces
to line up the relevant parts. This illustrates how closure syntax is similar
to function syntax except for the use of pipes and the amount of syntax that is
optional:
2022-04-25 00:14:02 +00:00
```
fn add_one_v1 (x: u32) -> u32 { x + 1 }
let add_one_v2 = |x: u32| -> u32 { x + 1 };
let add_one_v3 = |x| { x + 1 };
let add_one_v4 = |x| x + 1 ;
```
The first line shows a function definition, and the second line shows a fully
2022-06-09 01:10:54 +00:00
annotated closure definition. In the third line we remove the type annotations
from the closure definition, and in the fourth line we remove the brackets,
which are optional because the closure body has only one expression. These are
all valid definitions that will produce the same behavior when theyre called.
Calling the closures is required for `add_one_v3` and `add_one_v4` to be able
to compile because the types will be inferred from their usage.
<!-- I wasn't clear what was meant by "Calling the closures is required for
`add_one_v3` and `add_one_v4`..." -- I thought these were closures? /LC -->
For closure definitions, the compiler will infer one concrete type for each of
their parameters and for their return value. For instance, Listing 13-x shows
the definition of a short closure that just returns the value it receives as a
2022-04-25 00:14:02 +00:00
parameter. This closure isnt very useful except for the purposes of this
example. Note that we havent added any type annotations to the definition: if
2022-06-09 01:10:54 +00:00
<!-- if we did add type annotations, you mean? Or because we haven't? /LC -->
2022-04-25 00:14:02 +00:00
we then try to call the closure twice, using a `String` as an argument the
first time and a `u32` the second time, well get an error.
Filename: src/main.rs
```
let example_closure = |x| x;
let s = example_closure(String::from("hello"));
let n = example_closure(5);
```
2022-06-09 01:10:54 +00:00
Listing 13-x: Attempting to call a closure whose types are inferred with two
2022-04-27 01:31:24 +00:00
different types
2022-04-25 00:14:02 +00:00
The compiler gives us this error:
```
error[E0308]: mismatched types
--> src/main.rs:5:29
|
5 | let n = example_closure(5);
| ^- help: try using a conversion method: `.to_string()`
| |
| expected struct `String`, found integer
```
The first time we call `example_closure` with the `String` value, the compiler
infers the type of `x` and the return type of the closure to be `String`. Those
types are then locked into the closure in `example_closure`, and we get a type
2022-06-09 01:10:54 +00:00
error when we next try to use a different type with the same closure.
2022-04-25 00:14:02 +00:00
### Capturing References or Moving Ownership
Closures can capture values from their environment in three ways, which
directly map to the three ways a function can take a parameter: borrowing
immutably, borrowing mutably, and taking ownership. The closure will decide
which of these to use based on what the body of the function does with the
captured values.
2022-06-09 01:10:54 +00:00
In Listing 13-x we define a closure that captures an immutable borrow to the
vector named `list` because it only needs an immutable borrow to print the
value. This example also illustrates that a variable can bind to a closure
definition, and we can later call the closure by using the variable name and
parentheses as if the variable name were a function name:
<!-- That's cool. I'm changing to the active voice here, but I wanted to make
sure I'm not changing meaning: it is us calling the closure later
intentionally, right? It's not happening automatically behind the scenes? /LC
-->
2022-04-25 00:14:02 +00:00
Filename: src/main.rs
```
fn main() {
let list = vec![1, 2, 3];
println!("Before defining closure: {:?}", list);
let only_borrows = || println!("From closure: {:?}", list);
println!("Before calling closure: {:?}", list);
only_borrows();
println!("After calling closure: {:?}", list);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-x: Defining and calling a closure that captures an immutable borrow
2022-04-25 00:14:02 +00:00
2022-06-09 01:10:54 +00:00
Because we can have multiple immutable borrows of `list` at the same time,
`list` is still accessible from the code before the closure definition, after
2022-04-25 00:14:02 +00:00
the closure definition but before the closure is called, and after the closure
2022-06-09 01:10:54 +00:00
is called. This code compiles, runs, and prints:
2022-04-25 00:14:02 +00:00
```
Before defining closure: [1, 2, 3]
Before calling closure: [1, 2, 3]
From closure: [1, 2, 3]
After calling closure: [1, 2, 3]
```
2022-06-09 01:10:54 +00:00
Next, in Listing 13-x we change the closure body so that it adds an element to
the `list` vector, which means we also need to change the closure definition to
need a mutable borrow:
2022-04-25 00:14:02 +00:00
Filename: src/main.rs
```
fn main() {
let mut list = vec![1, 2, 3];
println!("Before defining closure: {:?}", list);
let mut borrows_mutably = || list.push(7);
borrows_mutably();
println!("After calling closure: {:?}", list);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-x: Defining and calling a closure that captures a mutable borrow
2022-04-25 00:14:02 +00:00
This code compiles, runs, and prints:
```
Before defining closure: [1, 2, 3]
After calling closure: [1, 2, 3, 7]
```
Note that theres no longer a `println!` between the definition and the call of
the `borrows_mutably` closure: when `borrows_mutably` is defined, it captures a
2022-06-09 01:10:54 +00:00
mutable reference to `list`. We don't use the closure again after the closure
is called, so the mutable borrow ends. Between the closure definition and the
closure call, an immutable borrow to print isnt allowed because no other
borrows are allowed when theres a mutable borrow. Try adding a `println!`
there to see what error message you get!
2022-04-25 00:14:02 +00:00
If you want to force the closure to take ownership of the values it uses in the
environment even though the body of the closure doesnt strictly need
ownership, you can use the `move` keyword before the parameter list. This
technique is mostly useful when passing a closure to a new thread to move the
2022-06-09 01:10:54 +00:00
data so that its owned by the new thread. Well have more examples of `move`
2022-04-25 00:14:02 +00:00
closures in Chapter 16 when we talk about concurrency.
2022-06-09 01:10:54 +00:00
### Moving Captured Values Out of Closures and the `Fn` Traits
Once a closure has captured a reference or we've moved a value into the
closure, the code in the body of the function affects what happens to the
references or
<!-- which function does this refer to -- is a closure always tied to a
function? I'm thinking of the let closure created earlier. Is "function" here
just a way to refer to the functionality of the closure? I'm wary of mixing the
two terms /LC -->
values as a result of calling the function.
<!-- do we mean "the references and values that are a result of calling the
function"? This line confused me a little. Surely it's self-evident that the
code in the function body affects the value or reference it's called on; I
think I'm missing something! /LC -->
A closure body can do any of the following: move a captured value out of the
closure, mutate the captured value, neither move nor mutate the value, or
capture nothing from the environment to begin with.
The way a closure captures and handles values from the environment affects
which traits
<!-- so the closure will automatically implement the traits depending on how we
set it to handle the values? /LC -->
the closure implements, and traits are how functions and structs can specify
what kinds of closures they can use. Closures will automatically implement one,
two, or all three of these `Fn` traits, in an additive fashion:
2022-04-25 00:14:02 +00:00
1. `FnOnce` applies to closures that can be called at least once. All closures
2022-06-09 01:10:54 +00:00
implement at least this trait, because all closures can be called. A closure
that moves captured values out of its body will only implement `FnOnce`
and none of the other `Fn` traits, because it can only be called once.
2022-04-25 00:14:02 +00:00
2. `FnMut` applies to closures that dont move captured values out of their
body, but that might mutate the captured values. These closures can be
called more than once.
3. `Fn` applies to closures that dont move captured values out of their body
and that dont mutate captured values. These closures can be called more
than once without mutating their environment, which is important in cases
such as calling a closure multiple times concurrently. Closures that dont
capture anything from their environment implement `Fn`.
2022-06-09 01:10:54 +00:00
<!-- so there isn't a trait for the first action listed above: moving a
capture value out of the closure? /LC -->
2022-04-25 00:14:02 +00:00
Lets look at the definition of the `unwrap_or_else` method on `Option<T>` that
2022-06-09 01:10:54 +00:00
we used in Listing 13-x:
2022-04-25 00:14:02 +00:00
```
impl<T> Option<T> {
pub fn unwrap_or_else<F>(self, f: F) -> T
where
F: FnOnce() -> T
{
match self {
Some(x) => x,
None => f(),
}
}
}
```
Recall that `T` is the generic type representing the type of the value in the
`Some` variant of an `Option`. That type `T` is also the return type of the
`unwrap_or_else` function: code that calls `unwrap_or_else` on an
`Option<String>`, for example, will get a `String`.
2022-06-09 01:10:54 +00:00
Next, notice that the `unwrap_or_else` function has the additional generic type
parameter `F`. The `F` type is the type of the parameter named `f`, which is
2022-04-25 00:14:02 +00:00
the closure we provide when calling `unwrap_or_else`.
The trait bound specified on the generic type `F` is `FnOnce() -> T`, which
means `F` must be able to be called at least once, take no arguments, and
return a `T`. Using `FnOnce` in the trait bound expresses the constraint that
`unwrap_or_else` is only going to call `f` at most one time. In the body of
`unwrap_or_else`, we can see that if the `Option` is `Some`, `f` wont be
called. If the `Option` is `None`, `f` will be called once. Because all
closures implement `FnOnce`, `unwrap_or_else` accepts the most different kinds
of closures and is as flexible as it can be.
> Note: Functions can implement all three of the `Fn` traits too. If what we
> want to do doesnt require capturing a value from the environment, we can use
> the name of a function rather than a closure where we need something that
> implements one of the `Fn` traits. For example, on an `Option<Vec<T>>` value,
> we could call `unwrap_or_else(Vec::new)` to get a new, empty vector if the
> value is `None`.
Now lets look at the standard library method `sort_by_key` defined on slices,
2022-06-09 01:10:54 +00:00
to see how that differs.
<!-- can you tell us what about this method makes it a good comparison: what's
the difference we're focusing on? /LC -->
It takes a closure that implements `FnMut`. The
2022-04-25 00:14:02 +00:00
closure gets one argument, a reference to the current item in the slice being
considered, and returns a value of type `K` that can be ordered. This function
is useful when you want to sort a slice by a particular attribute of each item.
2022-06-09 01:10:54 +00:00
In Listing 13-x, we have a list of `Rectangle` instances and we use
2022-04-25 00:14:02 +00:00
`sort_by_key` to order them by their `width` attribute from low to high:
```
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let mut list = [
2022-06-09 01:10:54 +00:00
Rectangle { width: 10, height: 1 },
Rectangle { width: 3, height: 5 },
Rectangle { width: 7, height: 12 },
2022-04-25 00:14:02 +00:00
];
list.sort_by_key(|r| r.width);
println!("{:#?}", list);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-x: Using `sort_by_key` to order rectangles by width
2022-04-27 01:31:24 +00:00
2022-04-25 00:14:02 +00:00
This code prints:
```
[
Rectangle {
width: 3,
height: 5,
},
Rectangle {
width: 7,
height: 12,
},
Rectangle {
width: 10,
height: 1,
},
]
```
The reason `sort_by_key` is defined to take an `FnMut` closure is that it calls
the closure multiple times: once for each item in the slice. The closure `|r|
r.width` doesnt capture, mutate, or move out anything from its environment, so
it meets the trait bound requirements.
2022-06-09 01:10:54 +00:00
In contrast, heres an example of a closure that implements just the `FnOnce`
trait, because it moves a value out of the environment. The compiler wont let
us use this closure with `sort_by_key`:
2022-04-25 00:14:02 +00:00
```
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let mut list = [
2022-06-09 01:10:54 +00:00
Rectangle { width: 10, height: 1 },
Rectangle { width: 3, height: 5 },
Rectangle { width: 7, height: 12 },
2022-04-25 00:14:02 +00:00
];
let mut sort_operations = vec![];
let value = String::from("by key called");
list.sort_by_key(|r| {
sort_operations.push(value);
r.width
});
println!("{:#?}", list);
}
```
This is a contrived, convoluted way (that doesnt work) to try and count the
number of times `sort_by_key` gets called when sorting `list`. This code
2022-06-09 01:10:54 +00:00
attempts to do this counting by pushing `value`--a `String` from the closures
environment--into the `sort_operations` vector. The closure captures `value`
2022-04-25 00:14:02 +00:00
then moves `value` out of the closure by transferring ownership of `value` to
the `sort_operations` vector. This closure can be called once; trying to call
it a second time wouldnt work because `value` would no longer be in the
environment to be pushed into `sort_operations` again! Therefore, this closure
only implements `FnOnce`. When we try to compile this code, we get this error
that `value` cant be moved out of the closure because the closure must
implement `FnMut`:
```
error[E0507]: cannot move out of `value`, a captured variable in an `FnMut` closure
2022-06-09 01:10:54 +00:00
--> src/main.rs:18:30
2022-04-25 00:14:02 +00:00
|
2022-06-09 01:10:54 +00:00
15 | let value = String::from("by key called");
2022-04-25 00:14:02 +00:00
| ----- captured outer variable
2022-06-09 01:10:54 +00:00
16 |
17 | list.sort_by_key(|r| {
2022-04-25 00:14:02 +00:00
| ______________________-
2022-06-09 01:10:54 +00:00
18 | | sort_operations.push(value);
2022-04-25 00:14:02 +00:00
| | ^^^^^ move occurs because `value` has type `String`, which does not implement the `Copy` trait
2022-06-09 01:10:54 +00:00
19 | | r.width
20 | | });
2022-04-25 00:14:02 +00:00
| |_____- captured by this `FnMut` closure
```
The error points to the line in the closure body that moves `value` out of the
environment. To fix this, we need to change the closure body so that it doesnt
move values out of the environment. If were interested in the number of times
`sort_by_key` is called, keeping a counter in the environment and incrementing
its value in the closure body is a more straightforward way to calculate that.
2022-06-09 01:10:54 +00:00
<!-- are we interested in this? Are we saying this is basically what we're
doing, but we're going about it in a roundabout way? /LC -->
This closure works with `sort_by_key` because it is only capturing a mutable
reference to the `num_sort_operations` counter and can therefore be called more
than once:
2022-04-25 00:14:02 +00:00
```
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let mut list = [
2022-06-09 01:10:54 +00:00
Rectangle { width: 10, height: 1 },
Rectangle { width: 3, height: 5 },
Rectangle { width: 7, height: 12 },
2022-04-25 00:14:02 +00:00
];
let mut num_sort_operations = 0;
list.sort_by_key(|r| {
num_sort_operations += 1;
r.width
});
println!("{:#?}, sorted in {num_sort_operations} operations", list);
}
```
The `Fn` traits are important when defining or using functions or types that
2022-06-09 01:10:54 +00:00
make use of closures. In the next section we'll discuss iterators. Many iterator
methods take closure arguments, so keep these closure details in mind as we
continue!
2022-04-25 00:14:02 +00:00
## Processing a Series of Items with Iterators
The iterator pattern allows you to perform some task on a sequence of items in
turn. An iterator is responsible for the logic of iterating over each item and
determining when the sequence has finished. When you use iterators, you dont
have to reimplement that logic yourself.
In Rust, iterators are *lazy*, meaning they have no effect until you call
methods that consume the iterator to use it up. For example, the code in
2022-06-09 01:10:54 +00:00
Listing 13-13 creates an iterator over the items in the vector `v1` by calling
2022-04-25 00:14:02 +00:00
the `iter` method defined on `Vec<T>`. This code by itself doesnt do anything
useful.
```
let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();
```
2022-06-09 01:10:54 +00:00
Listing 13-13: Creating an iterator
2022-04-25 00:14:02 +00:00
2022-06-09 01:10:54 +00:00
The iterator is stored in the `v1_iter` variable. Once weve created an
iterator, we can use it in a variety of ways. In Listing 3-5 in Chapter 3, we
iterated over an array using a `for` loop to execute some code on each of its
items. Under the hood this implicitly created and then consumed an iterator,
but we glossed over how exactly that works until now.
2022-04-25 00:14:02 +00:00
2022-06-09 01:10:54 +00:00
In the example in Listing 13-14 we separate the creation of the iterator from
the use of the iterator in the `for` loop. When the `for` loop is called using
the iterator in `v1_iter`, each element in the iterator is used in one
iteration of the loop, which prints out each value.
2022-04-25 00:14:02 +00:00
```
let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();
for val in v1_iter {
println!("Got: {}", val);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-14: Using an iterator in a `for` loop
2022-04-25 00:14:02 +00:00
In languages that dont have iterators provided by their standard libraries,
you would likely write this same functionality by starting a variable at index
0, using that variable to index into the vector to get a value, and
incrementing the variable value in a loop until it reached the total number of
items in the vector.
Iterators handle all that logic for you, cutting down on repetitive code you
could potentially mess up. Iterators give you more flexibility to use the same
logic with many different kinds of sequences, not just data structures you can
index into, like vectors. Lets examine how iterators do that.
### The `Iterator` Trait and the `next` Method
All iterators implement a trait named `Iterator` that is defined in the
standard library. The definition of the trait looks like this:
```
pub trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
// methods with default implementations elided
}
```
Notice this definition uses some new syntax: `type Item` and `Self::Item`,
which are defining an *associated type* with this trait. Well talk about
associated types in depth in Chapter 19. For now, all you need to know is that
this code says implementing the `Iterator` trait requires that you also define
an `Item` type, and this `Item` type is used in the return type of the `next`
method. In other words, the `Item` type will be the type returned from the
iterator.
The `Iterator` trait only requires implementors to define one method: the
`next` method, which returns one item of the iterator at a time wrapped in
`Some` and, when iteration is over, returns `None`.
2022-06-09 01:10:54 +00:00
We can call the `next` method on iterators directly; Listing 13-15 demonstrates
2022-04-25 00:14:02 +00:00
what values are returned from repeated calls to `next` on the iterator created
from the vector.
Filename: src/lib.rs
```
#[test]
fn iterator_demonstration() {
let v1 = vec![1, 2, 3];
let mut v1_iter = v1.iter();
assert_eq!(v1_iter.next(), Some(&1));
assert_eq!(v1_iter.next(), Some(&2));
assert_eq!(v1_iter.next(), Some(&3));
assert_eq!(v1_iter.next(), None);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-15: Calling the `next` method on an iterator
2022-04-25 00:14:02 +00:00
Note that we needed to make `v1_iter` mutable: calling the `next` method on an
iterator changes internal state that the iterator uses to keep track of where
it is in the sequence. In other words, this code *consumes*, or uses up, the
iterator. Each call to `next` eats up an item from the iterator. We didnt need
to make `v1_iter` mutable when we used a `for` loop because the loop took
ownership of `v1_iter` and made it mutable behind the scenes.
Also note that the values we get from the calls to `next` are immutable
references to the values in the vector. The `iter` method produces an iterator
over immutable references. If we want to create an iterator that takes
ownership of `v1` and returns owned values, we can call `into_iter` instead of
`iter`. Similarly, if we want to iterate over mutable references, we can call
`iter_mut` instead of `iter`.
### Methods that Consume the Iterator
The `Iterator` trait has a number of different methods with default
implementations provided by the standard library; you can find out about these
methods by looking in the standard library API documentation for the `Iterator`
trait. Some of these methods call the `next` method in their definition, which
is why youre required to implement the `next` method when implementing the
`Iterator` trait.
Methods that call `next` are called *consuming adaptors*, because calling them
uses up the iterator. One example is the `sum` method, which takes ownership of
the iterator and iterates through the items by repeatedly calling `next`, thus
consuming the iterator. As it iterates through, it adds each item to a running
2022-06-09 01:10:54 +00:00
total and returns the total when iteration is complete. Listing 13-16 has a
2022-04-25 00:14:02 +00:00
test illustrating a use of the `sum` method:
Filename: src/lib.rs
```
#[test]
fn iterator_sum() {
let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();
let total: i32 = v1_iter.sum();
assert_eq!(total, 6);
}
```
2022-06-09 01:10:54 +00:00
Listing 13-16: Calling the `sum` method to get the total of all items in the
2022-04-27 01:31:24 +00:00
iterator
2022-04-25 00:14:02 +00:00
We arent allowed to use `v1_iter` after the call to `sum` because `sum` takes
ownership of the iterator we call it on.
### Methods that Produce Other Iterators
Other methods defined on the `Iterator` trait, known as *iterator adaptors*,
2022-06-09 01:10:54 +00:00
<!-- are all methods defined on Iterator know as iterator adaptors? If so,
should that term be introduced in the previous section? /LC -->
allow you to change iterators into different kinds of iterators.
<!-- is there a quick example of a different kind of iterator you could give? I
wasn't aware there were different kinds, unless you just mean change them into
iterators that act on something else? /LC -->
You can chain multiple calls to iterator adaptors to perform complex actions in
a readable way. But because all iterators are lazy, you have to call one of the
consuming adaptor methods to get results from calls to iterator adaptors.
Listing 13-17 shows an example of calling the iterator adaptor method `map`,
2022-04-25 00:14:02 +00:00
which takes a closure to call on each item to produce a new iterator. The
closure here creates a new iterator in which each item from the vector has been
incremented by 1. However, this code produces a warning:
Filename: src/main.rs
```
let v1: Vec<i32> = vec![1, 2, 3];
v1.iter().map(|x| x + 1);
```
2022-06-09 01:10:54 +00:00
Listing 13-17: Calling the iterator adaptor `map` to
create a new iterator
2022-04-25 00:14:02 +00:00
The warning we get is this:
```
warning: unused `Map` that must be used
--> src/main.rs:4:5
|
4 | v1.iter().map(|x| x + 1);
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: iterators are lazy and do nothing unless consumed
```
2022-06-09 01:10:54 +00:00
The code in Listing 13-17 doesnt do anything; the closure weve specified
2022-04-25 00:14:02 +00:00
never gets called. The warning reminds us why: iterator adaptors are lazy, and
we need to consume the iterator here.
To fix this and consume the iterator, well use the `collect` method, which we
used in Chapter 12 with `env::args` in Listing 12-1. This method consumes the
iterator and collects the resulting values into a collection data type.
2022-06-09 01:10:54 +00:00
In Listing 13-18, we collect the results of iterating over the iterator thats
2022-04-25 00:14:02 +00:00
returned from the call to `map` into a vector. This vector will end up
containing each item from the original vector incremented by 1.
Filename: src/main.rs
```
let v1: Vec<i32> = vec![1, 2, 3];
let v2: Vec<_> = v1.iter().map(|x| x + 1).collect();
assert_eq!(v2, vec![2, 3, 4]);
```
2022-06-09 01:10:54 +00:00
Listing 13-18: Calling the `map` method to create a new iterator and then
2022-04-25 00:14:02 +00:00
calling the `collect` method to consume the new iterator and create a vector
Because `map` takes a closure, we can specify any operation we want to perform
on each item. This is a great example of how closures let you customize some
behavior while reusing the iteration behavior that the `Iterator` trait
provides.
### Using Closures that Capture Their Environment
Now that weve introduced iterators, we can demonstrate a common use of
2022-06-09 01:10:54 +00:00
closures that capture their environment
<!-- are we saying we use filter to demonstrate some common use, or that using
filter is the common use? If the former, can you specify what the common usage
is? /LC -->
by using the `filter` iterator adaptor.
2022-04-25 00:14:02 +00:00
The `filter` method on an iterator takes a closure that takes each item from
the iterator and returns a Boolean. If the closure returns `true`, the value
will be included in the iterator produced by `filter`. If the closure returns
`false`, the value wont be included in the resulting iterator.
2022-06-09 01:10:54 +00:00
In Listing 13-19, we use `filter` with a closure that captures the `shoe_size`
2022-04-25 00:14:02 +00:00
variable from its environment to iterate over a collection of `Shoe` struct
instances. It will return only shoes that are the specified size.
Filename: src/lib.rs
```
#[derive(PartialEq, Debug)]
struct Shoe {
size: u32,
style: String,
}
fn shoes_in_size(shoes: Vec<Shoe>, shoe_size: u32) -> Vec<Shoe> {
shoes.into_iter().filter(|s| s.size == shoe_size).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn filters_by_size() {
let shoes = vec![
Shoe {
size: 10,
style: String::from("sneaker"),
},
Shoe {
size: 13,
style: String::from("sandal"),
},
Shoe {
size: 10,
style: String::from("boot"),
},
];
let in_my_size = shoes_in_size(shoes, 10);
assert_eq!(
in_my_size,
vec![
Shoe {
size: 10,
style: String::from("sneaker")
},
Shoe {
size: 10,
style: String::from("boot")
},
]
);
}
}
```
2022-06-09 01:10:54 +00:00
Listing 13-19: Using the `filter` method with a closure that captures
2022-04-25 00:14:02 +00:00
`shoe_size`
The `shoes_in_size` function takes ownership of a vector of shoes and a shoe
size as parameters. It returns a vector containing only shoes of the specified
size.
In the body of `shoes_in_size`, we call `into_iter` to create an iterator
that takes ownership of the vector. Then we call `filter` to adapt that
iterator into a new iterator that only contains elements for which the closure
returns `true`.
The closure captures the `shoe_size` parameter from the environment and
compares the value with each shoes size, keeping only shoes of the size
specified. Finally, calling `collect` gathers the values returned by the
adapted iterator into a vector thats returned by the function.
The test shows that when we call `shoes_in_size`, we get back only shoes
that have the same size as the value we specified.
## Improving Our I/O Project
With this new knowledge about iterators, we can improve the I/O project in
Chapter 12 by using iterators to make places in the code clearer and more
concise. Lets look at how iterators can improve our implementation of the
`Config::new` function and the `search` function.
### Removing a `clone` Using an Iterator
In Listing 12-6, we added code that took a slice of `String` values and created
an instance of the `Config` struct by indexing into the slice and cloning the
2022-06-09 01:10:54 +00:00
values, allowing the `Config` struct to own those values. In Listing 13-24,
2022-04-25 00:14:02 +00:00
weve reproduced the implementation of the `Config::new` function as it was in
Listing 12-23:
Filename: src/lib.rs
```
impl Config {
pub fn new(args: &[String]) -> Result<Config, &'static str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let filename = args[2].clone();
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();
Ok(Config {
query,
filename,
case_sensitive,
})
}
}
```
2022-06-09 01:10:54 +00:00
Listing 13-24: Reproduction of the `Config::new` function from Listing 12-23
2022-04-25 00:14:02 +00:00
At the time, we said not to worry about the inefficient `clone` calls because
we would remove them in the future. Well, that time is now!
We needed `clone` here because we have a slice with `String` elements in the
parameter `args`, but the `new` function doesnt own `args`. To return
ownership of a `Config` instance, we had to clone the values from the `query`
and `filename` fields of `Config` so the `Config` instance can own its values.
With our new knowledge about iterators, we can change the `new` function to
take ownership of an iterator as its argument instead of borrowing a slice.
Well use the iterator functionality instead of the code that checks the length
of the slice and indexes into specific locations. This will clarify what the
`Config::new` function is doing because the iterator will access the values.
Once `Config::new` takes ownership of the iterator and stops using indexing
operations that borrow, we can move the `String` values from the iterator into
`Config` rather than calling `clone` and making a new allocation.
#### Using the Returned Iterator Directly
Open your I/O projects *src/main.rs* file, which should look like this:
Filename: src/main.rs
```
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::new(&args).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {}", err);
process::exit(1);
});
// --snip--
}
```
2022-06-09 01:10:54 +00:00
Well first change the start of the `main` function that we had in Listing
12-24 to the code in Listing 13-25, which this time uses an iterator. This
wont compile until we update `Config::new` as well.
2022-04-25 00:14:02 +00:00
Filename: src/main.rs
```
fn main() {
let config = Config::new(env::args()).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {}", err);
process::exit(1);
});
// --snip--
}
```
2022-06-09 01:10:54 +00:00
Listing 13-25: Passing the return value of `env::args` to `Config::new`
2022-04-25 00:14:02 +00:00
The `env::args` function returns an iterator! Rather than collecting the
iterator values into a vector and then passing a slice to `Config::new`, now
were passing ownership of the iterator returned from `env::args` to
`Config::new` directly.
Next, we need to update the definition of `Config::new`. In your I/O projects
*src/lib.rs* file, lets change the signature of `Config::new` to look like
2022-06-09 01:10:54 +00:00
Listing 13-26. This still wont compile because we need to update the function
2022-04-25 00:14:02 +00:00
body.
Filename: src/lib.rs
```
impl Config {
pub fn new(
mut args: impl Iterator<Item = String>,
) -> Result<Config, &'static str> {
// --snip--
```
2022-06-09 01:10:54 +00:00
Listing 13-26: Updating the signature of `Config::new` to expect an iterator
2022-04-25 00:14:02 +00:00
The standard library documentation for the `env::args` function shows that the
type of the iterator it returns is `std::env::Args`, and that type implements
the `Iterator` trait and returns `String` values.
Weve updated the signature of the `Config::new` function so the parameter
`args` has a generic type with the trait bounds `impl Iterator<Item = String>`
instead of `&[String]`. This usage of the `impl Trait` syntax we discussed in
2022-06-09 01:10:54 +00:00
the “Traits as Parameters” section of Chapter 10 means that `args` can be any
type that implements the `Iterator` type and returns `String` items.
2022-04-25 00:14:02 +00:00
Because were taking ownership of `args` and well be mutating `args` by
iterating over it, we can add the `mut` keyword into the specification of the
`args` parameter to make it mutable.
#### Using `Iterator` Trait Methods Instead of Indexing
Next, well fix the body of `Config::new`. Because `args` implements the
2022-06-09 01:10:54 +00:00
`Iterator` trait, we know we can call the `next` method on it! Listing 13-27
2022-04-25 00:14:02 +00:00
updates the code from Listing 12-23 to use the `next` method:
Filename: src/lib.rs
```
impl Config {
pub fn new(
mut args: impl Iterator<Item = String>,
) -> Result<Config, &'static str> {
args.next();
let query = match args.next() {
Some(arg) => arg,
None => return Err("Didn't get a query string"),
};
let filename = match args.next() {
Some(arg) => arg,
None => return Err("Didn't get a file name"),
};
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();
Ok(Config {
query,
filename,
case_sensitive,
})
}
}
```
2022-06-09 01:10:54 +00:00
Listing 13-27: Changing the body of `Config::new` to use iterator methods
2022-04-25 00:14:02 +00:00
Remember that the first value in the return value of `env::args` is the name of
the program. We want to ignore that and get to the next value, so first we call
`next` and do nothing with the return value. Second, we call `next` to get the
value we want to put in the `query` field of `Config`. If `next` returns a
`Some`, we use a `match` to extract the value. If it returns `None`, it means
not enough arguments were given and we return early with an `Err` value. We do
the same thing for the `filename` value.
### Making Code Clearer with Iterator Adaptors
We can also take advantage of iterators in the `search` function in our I/O
2022-06-09 01:10:54 +00:00
project, which is reproduced here in Listing 13-28 as it was in Listing 12-19:
2022-04-25 00:14:02 +00:00
Filename: src/lib.rs
```
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
```
2022-06-09 01:10:54 +00:00
Listing 13-28: The implementation of the `search` function from Listing 12-19
2022-04-25 00:14:02 +00:00
We can write this code in a more concise way using iterator adaptor methods.
Doing so also lets us avoid having a mutable intermediate `results` vector. The
functional programming style prefers to minimize the amount of mutable state to
make code clearer. Removing the mutable state might enable a future enhancement
to make searching happen in parallel, because we wouldnt have to manage
2022-06-09 01:10:54 +00:00
concurrent access to the `results` vector. Listing 13-29 shows this change:
2022-04-25 00:14:02 +00:00
Filename: src/lib.rs
```
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
contents
.lines()
.filter(|line| line.contains(query))
.collect()
}
```
2022-06-09 01:10:54 +00:00
Listing 13-29: Using iterator adaptor methods in the implementation of the
2022-04-25 00:14:02 +00:00
`search` function
Recall that the purpose of the `search` function is to return all lines in
`contents` that contain the `query`. Similar to the `filter` example in Listing
2022-06-09 01:10:54 +00:00
13-19, this code uses the `filter` adaptor to keep only the lines that
2022-04-25 00:14:02 +00:00
`line.contains(query)` returns `true` for. We then collect the matching lines
into another vector with `collect`. Much simpler! Feel free to make the same
change to use iterator methods in the `search_case_insensitive` function as
well.
2022-06-09 01:10:54 +00:00
### What to Use When
2022-04-25 00:14:02 +00:00
The next logical question is which style you should choose in your own code and
2022-06-09 01:10:54 +00:00
why: the original implementation in Listing 13-28 or the version using
iterators in Listing 13-29. Most Rust programmers prefer to use the iterator
2022-04-25 00:14:02 +00:00
style. Its a bit tougher to get the hang of at first, but once you get a feel
for the various iterator adaptors and what they do, iterators can be easier to
understand. Instead of fiddling with the various bits of looping and building
new vectors, the code focuses on the high-level objective of the loop. This
abstracts away some of the commonplace code so its easier to see the concepts
that are unique to this code, such as the filtering condition each element in
the iterator must pass.
But are the two implementations truly equivalent? The intuitive assumption
might be that the more low-level loop will be faster. Lets talk about
performance.
## Comparing Performance: Loops vs. Iterators
To determine whether to use loops or iterators, you need to know which
implementation is faster: the version of the `search` function with an explicit
`for` loop or the version with iterators.
We ran a benchmark by loading the entire contents of *The Adventures of
Sherlock Holmes* by Sir Arthur Conan Doyle into a `String` and looking for the
word *the* in the contents. Here are the results of the benchmark on the
version of `search` using the `for` loop and the version using iterators:
```
test bench_search_for ... bench: 19,620,300 ns/iter (+/- 915,700)
test bench_search_iter ... bench: 19,234,900 ns/iter (+/- 657,200)
```
The iterator version was slightly faster! We wont explain the benchmark code
here, because the point is not to prove that the two versions are equivalent
but to get a general sense of how these two implementations compare
performance-wise.
For a more comprehensive benchmark, you should check using various texts of
various sizes as the `contents`, different words and words of different lengths
as the `query`, and all kinds of other variations. The point is this:
iterators, although a high-level abstraction, get compiled down to roughly the
same code as if youd written the lower-level code yourself. Iterators are one
of Rusts *zero-cost abstractions*, by which we mean using the abstraction
imposes no additional runtime overhead. This is analogous to how Bjarne
Stroustrup, the original designer and implementor of C++, defines
*zero-overhead* in “Foundations of C++” (2012):
> In general, C++ implementations obey the zero-overhead principle: What you
> dont use, you dont pay for. And further: What you do use, you couldnt hand
> code any better.
As another example, the following code is taken from an audio decoder. The
decoding algorithm uses the linear prediction mathematical operation to
estimate future values based on a linear function of the previous samples. This
code uses an iterator chain to do some math on three variables in scope: a
`buffer` slice of data, an array of 12 `coefficients`, and an amount by which
to shift data in `qlp_shift`. Weve declared the variables within this example
but not given them any values; although this code doesnt have much meaning
outside of its context, its still a concise, real-world example of how Rust
translates high-level ideas to low-level code.
```
let buffer: &mut [i32];
let coefficients: [i64; 12];
let qlp_shift: i16;
for i in 12..buffer.len() {
let prediction = coefficients.iter()
.zip(&buffer[i - 12..i])
.map(|(&c, &s)| c * s as i64)
.sum::<i64>() >> qlp_shift;
let delta = buffer[i];
buffer[i] = prediction as i32 + delta;
}
```
To calculate the value of `prediction`, this code iterates through each of the
12 values in `coefficients` and uses the `zip` method to pair the coefficient
values with the previous 12 values in `buffer`. Then, for each pair, we
multiply the values together, sum all the results, and shift the bits in the
sum `qlp_shift` bits to the right.
Calculations in applications like audio decoders often prioritize performance
most highly. Here, were creating an iterator, using two adaptors, and then
consuming the value. What assembly code would this Rust code compile to? Well,
as of this writing, it compiles down to the same assembly youd write by hand.
Theres no loop at all corresponding to the iteration over the values in
`coefficients`: Rust knows that there are 12 iterations, so it “unrolls” the
loop. *Unrolling* is an optimization that removes the overhead of the loop
controlling code and instead generates repetitive code for each iteration of
the loop.
All of the coefficients get stored in registers, which means accessing the
values is very fast. There are no bounds checks on the array access at runtime.
All these optimizations that Rust is able to apply make the resulting code
extremely efficient. Now that you know this, you can use iterators and closures
without fear! They make code seem like its higher level but dont impose a
runtime performance penalty for doing so.
## Summary
Closures and iterators are Rust features inspired by functional programming
language ideas. They contribute to Rusts capability to clearly express
high-level ideas at low-level performance. The implementations of closures and
iterators are such that runtime performance is not affected. This is part of
Rusts goal to strive to provide zero-cost abstractions.
Now that weve improved the expressiveness of our I/O project, lets look at
some more features of `cargo` that will help us share the project with the
world.