book/nostarch/chapter08.md

1100 lines
42 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!-- DO NOT EDIT THIS FILE.
This file is periodically generated from the content in the `/src/`
directory, so all fixes need to be made in `/src/`.
-->
[TOC]
# Common Collections
Rusts standard library includes a number of very useful data structures called
*collections*. Most other data types represent one specific value, but
collections can contain multiple values. Unlike the built-in array and tuple
types, the data these collections point to is stored on the heap, which means
the amount of data does not need to be known at compile time and can grow or
shrink as the program runs. Each kind of collection has different capabilities
and costs, and choosing an appropriate one for your current situation is a
skill youll develop over time. In this chapter, well discuss three
collections that are used very often in Rust programs:
* A *vector* allows you to store a variable number of values next to each other.
* A *string* is a collection of characters. Weve mentioned the `String` type
previously, but in this chapter well talk about it in depth.
* A *hash map* allows you to associate a value with a specific key. Its a
particular implementation of the more general data structure called a *map*.
To learn about the other kinds of collections provided by the standard library,
see the documentation at *../std/collections/index.html*.
Well discuss how to create and update vectors, strings, and hash maps, as well
as what makes each special.
## Storing Lists of Values with Vectors
The first collection type well look at is `Vec<T>`, also known as a *vector*.
Vectors allow you to store more than one value in a single data structure that
puts all the values next to each other in memory. Vectors can only store values
of the same type. They are useful when you have a list of items, such as the
lines of text in a file or the prices of items in a shopping cart.
### Creating a New Vector
To create a new empty vector, we call the `Vec::new` function, as shown in
Listing 8-1.
```
let v: Vec<i32> = Vec::new();
```
Listing 8-1: Creating a new, empty vector to hold values
of type `i32`
Note that we added a type annotation here. Because we arent inserting any
values into this vector, Rust doesnt know what kind of elements we intend to
store. This is an important point. Vectors are implemented using generics;
well cover how to use generics with your own types in Chapter 10. For now,
know that the `Vec<T>` type provided by the standard library can hold any type.
When we create a vector to hold a specific type, we can specify the type within
angle brackets. In Listing 8-1, weve told Rust that the `Vec<T>` in `v` will
hold elements of the `i32` type.
More often, youll create a `Vec<T>` with initial values and Rust will infer
the type of value you want to store, so you rarely need to do this type
annotation. Rust conveniently provides the `vec!` macro, which will create a
new vector that holds the values you give it. Listing 8-2 creates a new
`Vec<i32>` that holds the values `1`, `2`, and `3`. The integer type is `i32`
because thats the default integer type, as we discussed in the “Data
Types” section of Chapter 3.
```
let v = vec![1, 2, 3];
```
Listing 8-2: Creating a new vector containing
values
Because weve given initial `i32` values, Rust can infer that the type of `v`
is `Vec<i32>`, and the type annotation isnt necessary. Next, well look at how
to modify a vector.
### Updating a Vector
To create a vector and then add elements to it, we can use the `push` method,
as shown in Listing 8-3.
```
let mut v = Vec::new();
v.push(5);
v.push(6);
v.push(7);
v.push(8);
```
Listing 8-3: Using the `push` method to add values to a
vector
As with any variable, if we want to be able to change its value, we need to
make it mutable using the `mut` keyword, as discussed in Chapter 3. The numbers
we place inside are all of type `i32`, and Rust infers this from the data, so
we dont need the `Vec<i32>` annotation.
### Reading Elements of Vectors
There are two ways to reference a value stored in a vector: via indexing or by
using the `get` method. In the following examples, weve annotated the types of
the values that are returned from these functions for extra clarity.
Listing 8-4 shows both methods of accessing a value in a vector, with indexing
syntax and the `get` method.
```
let v = vec![1, 2, 3, 4, 5];
let third: &i32 = &v[2];
println!("The third element is {third}");
let third: Option<&i32> = v.get(2);
match third {
Some(third) => println!("The third element is {third}"),
None => println!("There is no third element."),
}
```
Listing 8-4: Using indexing syntax and using the `get`
method to access an item in a vector
Note a few details here. We use the index value of `2` to get the third element
because vectors are indexed by number, starting at zero. Using `&` and `[]`
gives us a reference to the element at the index value. When we use the `get`
method with the index passed as an argument, we get an `Option<&T>` that we can
use with `match`.
Rust provides these two ways to reference an element so you can choose how the
program behaves when you try to use an index value outside the range of
existing elements. As an example, lets see what happens when we have a vector
of five elements and then we try to access an element at index 100 with each
technique, as shown in Listing 8-5.
```
let v = vec![1, 2, 3, 4, 5];
let does_not_exist = &v[100];
let does_not_exist = v.get(100);
```
Listing 8-5: Attempting to access the element at index
100 in a vector containing five elements
When we run this code, the first `[]` method will cause the program to panic
because it references a nonexistent element. This method is best used when you
want your program to crash if theres an attempt to access an element past the
end of the vector.
When the `get` method is passed an index that is outside the vector, it returns
`None` without panicking. You would use this method if accessing an element
beyond the range of the vector may happen occasionally under normal
circumstances. Your code will then have logic to handle having either
`Some(&element)` or `None`, as discussed in Chapter 6. For example, the index
could be coming from a person entering a number. If they accidentally enter a
number thats too large and the program gets a `None` value, you could tell the
user how many items are in the current vector and give them another chance to
enter a valid value. That would be more user-friendly than crashing the program
due to a typo!
When the program has a valid reference, the borrow checker enforces the
ownership and borrowing rules (covered in Chapter 4) to ensure this reference
and any other references to the contents of the vector remain valid. Recall the
rule that states you cant have mutable and immutable references in the same
scope. That rule applies in Listing 8-6, where we hold an immutable reference
to the first element in a vector and try to add an element to the end. This
program wont work if we also try to refer to that element later in the
function.
```
let mut v = vec![1, 2, 3, 4, 5];
let first = &v[0];
v.push(6);
println!("The first element is: {first}");
```
Listing 8-6: Attempting to add an element to a vector
while holding a reference to an item
Compiling this code will result in this error:
```
$ cargo run
Compiling collections v0.1.0 (file:///projects/collections)
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
--> src/main.rs:6:5
|
4 | let first = &v[0];
| - immutable borrow occurs here
5 |
6 | v.push(6);
| ^^^^^^^^^ mutable borrow occurs here
7 |
8 | println!("The first element is: {first}");
| ------- immutable borrow later used here
For more information about this error, try `rustc --explain E0502`.
error: could not compile `collections` (bin "collections") due to 1 previous error
```
The code in Listing 8-6 might look like it should work: why should a reference
to the first element care about changes at the end of the vector? This error is
due to the way vectors work: because vectors put the values next to each other
in memory, adding a new element onto the end of the vector might require
allocating new memory and copying the old elements to the new space, if there
isnt enough room to put all the elements next to each other where the vector
is currently stored. In that case, the reference to the first element would be
pointing to deallocated memory. The borrowing rules prevent programs from
ending up in that situation.
> Note: For more on the implementation details of the `Vec<T>` type, see “The
> Rustonomicon” at *../nomicon/vec/vec.html*.
### Iterating Over the Values in a Vector
To access each element in a vector in turn, we would iterate through all of the
elements rather than use indices to access one at a time. Listing 8-7 shows how
to use a `for` loop to get immutable references to each element in a vector of
`i32` values and print them.
```
let v = vec![100, 32, 57];
for i in &v {
println!("{i}");
}
```
Listing 8-7: Printing each element in a vector by
iterating over the elements using a `for` loop
We can also iterate over mutable references to each element in a mutable vector
in order to make changes to all the elements. The `for` loop in Listing 8-8
will add `50` to each element.
```
let mut v = vec![100, 32, 57];
for i in &mut v {
*i += 50;
}
```
Listing 8-8: Iterating over mutable references to
elements in a vector
To change the value that the mutable reference refers to, we have to use the
`*` dereference operator to get to the value in `i` before we can use the `+=`
operator. Well talk more about the dereference operator in the “Following the
Pointer to the Value with the Dereference Operator”
section of Chapter 15.
Iterating over a vector, whether immutably or mutably, is safe because of the
borrow checkers rules. If we attempted to insert or remove items in the `for`
loop bodies in Listing 8-7 and Listing 8-8, we would get a compiler error
similar to the one we got with the code in Listing 8-6. The reference to the
vector that the `for` loop holds prevents simultaneous modification of the
whole vector.
### Using an Enum to Store Multiple Types
Vectors can only store values that are of the same type. This can be
inconvenient; there are definitely use cases for needing to store a list of
items of different types. Fortunately, the variants of an enum are defined
under the same enum type, so when we need one type to represent elements of
different types, we can define and use an enum!
For example, say we want to get values from a row in a spreadsheet in which
some of the columns in the row contain integers, some floating-point numbers,
and some strings. We can define an enum whose variants will hold the different
value types, and all the enum variants will be considered the same type: that
of the enum. Then we can create a vector to hold that enum and so, ultimately,
hold different types. Weve demonstrated this in Listing 8-9.
```
enum SpreadsheetCell {
Int(i32),
Float(f64),
Text(String),
}
let row = vec![
SpreadsheetCell::Int(3),
SpreadsheetCell::Text(String::from("blue")),
SpreadsheetCell::Float(10.12),
];
```
Listing 8-9: Defining an `enum` to store values of
different types in one vector
Rust needs to know what types will be in the vector at compile time so it knows
exactly how much memory on the heap will be needed to store each element. We
must also be explicit about what types are allowed in this vector. If Rust
allowed a vector to hold any type, there would be a chance that one or more of
the types would cause errors with the operations performed on the elements of
the vector. Using an enum plus a `match` expression means that Rust will ensure
at compile time that every possible case is handled, as discussed in Chapter 6.
If you dont know the exhaustive set of types a program will get at runtime to
store in a vector, the enum technique wont work. Instead, you can use a trait
object, which well cover in Chapter 17.
Now that weve discussed some of the most common ways to use vectors, be sure
to review the API documentation for all of the many
useful methods defined on `Vec<T>` by the standard library. For example, in
addition to `push`, a `pop` method removes and returns the last element.
### Dropping a Vector Drops Its Elements
Like any other `struct`, a vector is freed when it goes out of scope, as
annotated in Listing 8-10.
```
{
let v = vec![1, 2, 3, 4];
// do stuff with v
} // <- v goes out of scope and is freed here
```
Listing 8-10: Showing where the vector and its elements
are dropped
When the vector gets dropped, all of its contents are also dropped, meaning the
integers it holds will be cleaned up. The borrow checker ensures that any
references to contents of a vector are only used while the vector itself is
valid.
Lets move on to the next collection type: `String`!
## Storing UTF-8 Encoded Text with Strings
We talked about strings in Chapter 4, but well look at them in more depth now.
New Rustaceans commonly get stuck on strings for a combination of three
reasons: Rusts propensity for exposing possible errors, strings being a more
complicated data structure than many programmers give them credit for, and
UTF-8. These factors combine in a way that can seem difficult when youre
coming from other programming languages.
We discuss strings in the context of collections because strings are
implemented as a collection of bytes, plus some methods to provide useful
functionality when those bytes are interpreted as text. In this section, well
talk about the operations on `String` that every collection type has, such as
creating, updating, and reading. Well also discuss the ways in which `String`
is different from the other collections, namely how indexing into a `String` is
complicated by the differences between how people and computers interpret
`String` data.
### What Is a String?
Well first define what we mean by the term *string*. Rust has only one string
type in the core language, which is the string slice `str` that is usually seen
in its borrowed form `&str`. In Chapter 4, we talked about *string slices*,
which are references to some UTF-8 encoded string data stored elsewhere. String
literals, for example, are stored in the programs binary and are therefore
string slices.
The `String` type, which is provided by Rusts standard library rather than
coded into the core language, is a growable, mutable, owned, UTF-8 encoded
string type. When Rustaceans refer to “strings” in Rust, they might be
referring to either the `String` or the string slice `&str` types, not just one
of those types. Although this section is largely about `String`, both types are
used heavily in Rusts standard library, and both `String` and string slices
are UTF-8 encoded.
### Creating a New String
Many of the same operations available with `Vec<T>` are available with `String`
as well because `String` is actually implemented as a wrapper around a vector
of bytes with some extra guarantees, restrictions, and capabilities. An example
of a function that works the same way with `Vec<T>` and `String` is the `new`
function to create an instance, shown in Listing 8-11.
```
let mut s = String::new();
```
Listing 8-11: Creating a new, empty `String`
This line creates a new, empty string called `s`, into which we can then load
data. Often, well have some initial data with which we want to start the
string. For that, we use the `to_string` method, which is available on any type
that implements the `Display` trait, as string literals do. Listing 8-12 shows
two examples.
```
let data = "initial contents";
let s = data.to_string();
// the method also works on a literal directly:
let s = "initial contents".to_string();
```
Listing 8-12: Using the `to_string` method to create a
`String` from a string literal
This code creates a string containing `initial contents`.
We can also use the function `String::from` to create a `String` from a string
literal. The code in Listing 8-13 is equivalent to the code in Listing 8-12
that uses `to_string`.
```
let s = String::from("initial contents");
```
Listing 8-13: Using the `String::from` function to create
a `String` from a string literal
Because strings are used for so many things, we can use many different generic
APIs for strings, providing us with a lot of options. Some of them can seem
redundant, but they all have their place! In this case, `String::from` and
`to_string` do the same thing, so which one you choose is a matter of style and
readability.
Remember that strings are UTF-8 encoded, so we can include any properly encoded
data in them, as shown in Listing 8-14.
```
let hello = String::from("السلام عليكم");
let hello = String::from("Dobrý den");
let hello = String::from("Hello");
let hello = String::from("שלום");
let hello = String::from("नमस्ते");
let hello = String::from("こんにちは");
let hello = String::from("안녕하세요");
let hello = String::from("你好");
let hello = String::from("Olá");
let hello = String::from("Здравствуйте");
let hello = String::from("Hola");
```
Listing 8-14: Storing greetings in different languages in
strings
All of these are valid `String` values.
### Updating a String
A `String` can grow in size and its contents can change, just like the contents
of a `Vec<T>`, if you push more data into it. In addition, you can conveniently
use the `+` operator or the `format!` macro to concatenate `String` values.
#### Appending to a String with `push_str` and `push`
We can grow a `String` by using the `push_str` method to append a string slice,
as shown in Listing 8-15.
```
let mut s = String::from("foo");
s.push_str("bar");
```
Listing 8-15: Appending a string slice to a `String`
using the `push_str` method
After these two lines, `s` will contain `foobar`. The `push_str` method takes a
string slice because we dont necessarily want to take ownership of the
parameter. For example, in the code in Listing 8-16, we want to be able to use
`s2` after appending its contents to `s1`.
```
let mut s1 = String::from("foo");
let s2 = "bar";
s1.push_str(s2);
println!("s2 is {s2}");
```
Listing 8-16: Using a string slice after appending its
contents to a `String`
If the `push_str` method took ownership of `s2`, we wouldnt be able to print
its value on the last line. However, this code works as wed expect!
The `push` method takes a single character as a parameter and adds it to the
`String`. Listing 8-17 adds the letter *l* to a `String` using the `push`
method.
```
let mut s = String::from("lo");
s.push('l');
```
Listing 8-17: Adding one character to a `String` value
using `push`
As a result, `s` will contain `lol`.
#### Concatenation with the `+` Operator or the `format!` Macro
Often, youll want to combine two existing strings. One way to do so is to use
the `+` operator, as shown in Listing 8-18.
```
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
```
Listing 8-18: Using the `+` operator to combine two
`String` values into a new `String` value
The string `s3` will contain `Hello, world!`. The reason `s1` is no longer
valid after the addition, and the reason we used a reference to `s2`, has to do
with the signature of the method thats called when we use the `+` operator.
The `+` operator uses the `add` method, whose signature looks something like
this:
```
fn add(self, s: &str) -> String {
```
In the standard library, youll see `add` defined using generics and associated
types. Here, weve substituted in concrete types, which is what happens when we
call this method with `String` values. Well discuss generics in Chapter 10.
This signature gives us the clues we need in order to understand the tricky
bits of the `+` operator.
First, `s2` has an `&`, meaning that were adding a *reference* of the second
string to the first string. This is because of the `s` parameter in the `add`
function: we can only add a `&str` to a `String`; we cant add two `String`
values together. But wait—the type of `&s2` is `&String`, not `&str`, as
specified in the second parameter to `add`. So why does Listing 8-18 compile?
The reason were able to use `&s2` in the call to `add` is that the compiler
can *coerce* the `&String` argument into a `&str`. When we call the `add`
method, Rust uses a *deref coercion*, which here turns `&s2` into `&s2[..]`.
Well discuss deref coercion in more depth in Chapter 15. Because `add` does
not take ownership of the `s` parameter, `s2` will still be a valid `String`
after this operation.
Second, we can see in the signature that `add` takes ownership of `self`
because `self` does *not* have an `&`. This means `s1` in Listing 8-18 will be
moved into the `add` call and will no longer be valid after that. So, although
`let s3 = s1 + &s2;` looks like it will copy both strings and create a new one,
this statement actually takes ownership of `s1`, appends a copy of the contents
of `s2`, and then returns ownership of the result. In other words, it looks
like its making a lot of copies, but it isnt; the implementation is more
efficient than copying.
If we need to concatenate multiple strings, the behavior of the `+` operator
gets unwieldy:
```
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = s1 + "-" + &s2 + "-" + &s3;
```
At this point, `s` will be `tic-tac-toe`. With all of the `+` and `"`
characters, its difficult to see whats going on. For combining strings in
more complicated ways, we can instead use the `format!` macro:
```
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = format!("{s1}-{s2}-{s3}");
```
This code also sets `s` to `tic-tac-toe`. The `format!` macro works like
`println!`, but instead of printing the output to the screen, it returns a
`String` with the contents. The version of the code using `format!` is much
easier to read, and the code generated by the `format!` macro uses references
so that this call doesnt take ownership of any of its parameters.
### Indexing into Strings
In many other programming languages, accessing individual characters in a
string by referencing them by index is a valid and common operation. However,
if you try to access parts of a `String` using indexing syntax in Rust, youll
get an error. Consider the invalid code in Listing 8-19.
```
let s1 = String::from("hello");
let h = s1[0];
```
Listing 8-19: Attempting to use indexing syntax with a
String
This code will result in the following error:
```
$ cargo run
Compiling collections v0.1.0 (file:///projects/collections)
error[E0277]: the type `String` cannot be indexed by `{integer}`
--> src/main.rs:3:16
|
3 | let h = s1[0];
| ^ `String` cannot be indexed by `{integer}`
|
= help: the trait `Index<{integer}>` is not implemented for `String`
= help: the following other types implement trait `Index<Idx>`:
<String as Index<RangeFull>>
<String as Index<std::ops::Range<usize>>>
<String as Index<RangeFrom<usize>>>
<String as Index<RangeTo<usize>>>
<String as Index<RangeInclusive<usize>>>
<String as Index<RangeToInclusive<usize>>>
For more information about this error, try `rustc --explain E0277`.
error: could not compile `collections` (bin "collections") due to 1 previous error
```
The error and the note tell the story: Rust strings dont support indexing. But
why not? To answer that question, we need to discuss how Rust stores strings in
memory.
#### Internal Representation
A `String` is a wrapper over a `Vec<u8>`. Lets look at some of our properly
encoded UTF-8 example strings from Listing 8-14. First, this one:
```
let hello = String::from("Hola");
```
In this case, `len` will be `4`, which means the vector storing the string
`"Hola"` is 4 bytes long. Each of these letters takes one byte when encoded in
UTF-8. The following line, however, may surprise you (note that this string
begins with the capital Cyrillic letter *Ze*, not the number 3):
```
let hello = String::from("Здравствуйте");
```
If you were asked how long the string is, you might say 12. In fact, Rusts
answer is 24: thats the number of bytes it takes to encode “Здравствуйте” in
UTF-8, because each Unicode scalar value in that string takes 2 bytes of
storage. Therefore, an index into the strings bytes will not always correlate
to a valid Unicode scalar value. To demonstrate, consider this invalid Rust
code:
```
let hello = "Здравствуйте";
let answer = &hello[0];
```
You already know that `answer` will not be `З`, the first letter. When encoded
in UTF-8, the first byte of `З` is `208` and the second is `151`, so it would
seem that `answer` should in fact be `208`, but `208` is not a valid character
on its own. Returning `208` is likely not what a user would want if they asked
for the first letter of this string; however, thats the only data that Rust
has at byte index 0. Users generally dont want the byte value returned, even
if the string contains only Latin letters: if `&"hello"[0]` were valid code
that returned the byte value, it would return `104`, not `h`.
The answer, then, is that to avoid returning an unexpected value and causing
bugs that might not be discovered immediately, Rust doesnt compile this code
at all and prevents misunderstandings early in the development process.
#### Bytes and Scalar Values and Grapheme Clusters! Oh My!
Another point about UTF-8 is that there are actually three relevant ways to
look at strings from Rusts perspective: as bytes, scalar values, and grapheme
clusters (the closest thing to what we would call *letters*).
If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is
stored as a vector of `u8` values that looks like this:
```
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164,
224, 165, 135]
```
Thats 18 bytes and is how computers ultimately store this data. If we look at
them as Unicode scalar values, which are what Rusts `char` type is, those
bytes look like this:
```
['न', 'म', 'स', '्', 'त', 'े']
```
There are six `char` values here, but the fourth and sixth are not letters:
theyre diacritics that dont make sense on their own. Finally, if we look at
them as grapheme clusters, wed get what a person would call the four letters
that make up the Hindi word:
```
["न", "म", "स्", "ते"]
```
Rust provides different ways of interpreting the raw string data that computers
store so that each program can choose the interpretation it needs, no matter
what human language the data is in.
A final reason Rust doesnt allow us to index into a `String` to get a
character is that indexing operations are expected to always take constant time
(O(1)). But it isnt possible to guarantee that performance with a `String`,
because Rust would have to walk through the contents from the beginning to the
index to determine how many valid characters there were.
### Slicing Strings
Indexing into a string is often a bad idea because its not clear what the
return type of the string-indexing operation should be: a byte value, a
character, a grapheme cluster, or a string slice. If you really need to use
indices to create string slices, therefore, Rust asks you to be more specific.
Rather than indexing using `[]` with a single number, you can use `[]` with a
range to create a string slice containing particular bytes:
```
let hello = "Здравствуйте";
let s = &hello[0..4];
```
Here, `s` will be a `&str` that contains the first four bytes of the string.
Earlier, we mentioned that each of these characters was two bytes, which means
`s` will be `Зд`.
If we were to try to slice only part of a characters bytes with something like
`&hello[0..1]`, Rust would panic at runtime in the same way as if an invalid
index were accessed in a vector:
```
$ cargo run
Compiling collections v0.1.0 (file:///projects/collections)
Finished dev [unoptimized + debuginfo] target(s) in 0.43s
Running `target/debug/collections`
thread 'main' panicked at src/main.rs:4:19:
byte index 1 is not a char boundary; it is inside 'З' (bytes 0..2) of `Здравствуйте`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
You should use caution when creating string slices with ranges, because doing
so can crash your program.
### Methods for Iterating Over Strings
The best way to operate on pieces of strings is to be explicit about whether
you want characters or bytes. For individual Unicode scalar values, use the
`chars` method. Calling `chars` on “Зд” separates out and returns two values of
type `char`, and you can iterate over the result to access each element:
```
for c in "Зд".chars() {
println!("{c}");
}
```
This code will print the following:
```
З
д
```
Alternatively, the `bytes` method returns each raw byte, which might be
appropriate for your domain:
```
for b in "Зд".bytes() {
println!("{b}");
}
```
This code will print the four bytes that make up this string:
```
208
151
208
180
```
But be sure to remember that valid Unicode scalar values may be made up of more
than one byte.
Getting grapheme clusters from strings, as with the Devanagari script, is
complex, so this functionality is not provided by the standard library. Crates
are available on crates.io if this is the
functionality you need.
### Strings Are Not So Simple
To summarize, strings are complicated. Different programming languages make
different choices about how to present this complexity to the programmer. Rust
has chosen to make the correct handling of `String` data the default behavior
for all Rust programs, which means programmers have to put more thought into
handling UTF-8 data up front. This trade-off exposes more of the complexity of
strings than is apparent in other programming languages, but it prevents you
from having to handle errors involving non-ASCII characters later in your
development life cycle.
The good news is that the standard library offers a lot of functionality built
off the `String` and `&str` types to help handle these complex situations
correctly. Be sure to check out the documentation for useful methods like
`contains` for searching in a string and `replace` for substituting parts of a
string with another string.
Lets switch to something a bit less complex: hash maps!
## Storing Keys with Associated Values in Hash Maps
The last of our common collections is the *hash map*. The type `HashMap<K, V>`
stores a mapping of keys of type `K` to values of type `V` using a *hashing
function*, which determines how it places these keys and values into memory.
Many programming languages support this kind of data structure, but they often
use a different name, such as *hash*, *map*, *object*, *hash table*,
*dictionary*, or *associative array*, just to name a few.
Hash maps are useful when you want to look up data not by using an index, as
you can with vectors, but by using a key that can be of any type. For example,
in a game, you could keep track of each teams score in a hash map in which
each key is a teams name and the values are each teams score. Given a team
name, you can retrieve its score.
Well go over the basic API of hash maps in this section, but many more goodies
are hiding in the functions defined on `HashMap<K, V>` by the standard library.
As always, check the standard library documentation for more information.
### Creating a New Hash Map
One way to create an empty hash map is to use `new` and to add elements with
`insert`. In Listing 8-20, were keeping track of the scores of two teams whose
names are *Blue* and *Yellow*. The Blue team starts with 10 points, and the
Yellow team starts with 50.
```
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
```
Listing 8-20: Creating a new hash map and inserting some
keys and values
Note that we need to first `use` the `HashMap` from the collections portion of
the standard library. Of our three common collections, this one is the least
often used, so its not included in the features brought into scope
automatically in the prelude. Hash maps also have less support from the
standard library; theres no built-in macro to construct them, for example.
Just like vectors, hash maps store their data on the heap. This `HashMap` has
keys of type `String` and values of type `i32`. Like vectors, hash maps are
homogeneous: all of the keys must have the same type, and all of the values
must have the same type.
### Accessing Values in a Hash Map
We can get a value out of the hash map by providing its key to the `get`
method, as shown in Listing 8-21.
```
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
let team_name = String::from("Blue");
let score = scores.get(&team_name).copied().unwrap_or(0);
```
Listing 8-21: Accessing the score for the Blue team
stored in the hash map
Here, `score` will have the value thats associated with the Blue team, and the
result will be `10`. The `get` method returns an `Option<&V>`; if theres no
value for that key in the hash map, `get` will return `None`. This program
handles the `Option` by calling `copied` to get an `Option<i32>` rather than an
`Option<&i32>`, then `unwrap_or` to set `score` to zero if `scores` doesnt
have an entry for the key.
We can iterate over each keyvalue pair in a hash map in a similar manner as we
do with vectors, using a `for` loop:
```
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
for (key, value) in &scores {
println!("{key}: {value}");
}
```
This code will print each pair in an arbitrary order:
```
Yellow: 50
Blue: 10
```
### Hash Maps and Ownership
For types that implement the `Copy` trait, like `i32`, the values are copied
into the hash map. For owned values like `String`, the values will be moved and
the hash map will be the owner of those values, as demonstrated in Listing 8-22.
```
use std::collections::HashMap;
let field_name = String::from("Favorite color");
let field_value = String::from("Blue");
let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point, try using them and
// see what compiler error you get!
```
Listing 8-22: Showing that keys and values are owned by
the hash map once theyre inserted
We arent able to use the variables `field_name` and `field_value` after
theyve been moved into the hash map with the call to `insert`.
If we insert references to values into the hash map, the values wont be moved
into the hash map. The values that the references point to must be valid for at
least as long as the hash map is valid. Well talk more about these issues in
the “Validating References with
Lifetimes” section in
Chapter 10.
### Updating a Hash Map
Although the number of key and value pairs is growable, each unique key can
only have one value associated with it at a time (but not vice versa: for
example, both the Blue team and the Yellow team could have the value `10`
stored in the `scores` hash map).
When you want to change the data in a hash map, you have to decide how to
handle the case when a key already has a value assigned. You could replace the
old value with the new value, completely disregarding the old value. You could
keep the old value and ignore the new value, only adding the new value if the
key *doesnt* already have a value. Or you could combine the old value and the
new value. Lets look at how to do each of these!
#### Overwriting a Value
If we insert a key and a value into a hash map and then insert that same key
with a different value, the value associated with that key will be replaced.
Even though the code in Listing 8-23 calls `insert` twice, the hash map will
only contain one keyvalue pair because were inserting the value for the Blue
teams key both times.
```
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);
println!("{scores:?}");
```
Listing 8-23: Replacing a value stored with a particular
key
This code will print `{"Blue": 25}`. The original value of `10` has been
overwritten.
<!-- Old headings. Do not remove or links may break. -->
<a id="only-inserting-a-value-if-the-key-has-no-value"></a>
#### Adding a Key and Value Only If a Key Isnt Present
Its common to check whether a particular key already exists in the hash map
with a value and then to take the following actions: if the key does exist in
the hash map, the existing value should remain the way it is; if the key
doesnt exist, insert it and a value for it.
Hash maps have a special API for this called `entry` that takes the key you
want to check as a parameter. The return value of the `entry` method is an enum
called `Entry` that represents a value that might or might not exist. Lets say
we want to check whether the key for the Yellow team has a value associated
with it. If it doesnt, we want to insert the value `50`, and the same for the
Blue team. Using the `entry` API, the code looks like Listing 8-24.
```
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);
println!("{scores:?}");
```
Listing 8-24: Using the `entry` method to only insert if
the key does not already have a value
The `or_insert` method on `Entry` is defined to return a mutable reference to
the value for the corresponding `Entry` key if that key exists, and if not, it
inserts the parameter as the new value for this key and returns a mutable
reference to the new value. This technique is much cleaner than writing the
logic ourselves and, in addition, plays more nicely with the borrow checker.
Running the code in Listing 8-24 will print `{"Yellow": 50, "Blue": 10}`. The
first call to `entry` will insert the key for the Yellow team with the value
`50` because the Yellow team doesnt have a value already. The second call to
`entry` will not change the hash map because the Blue team already has the
value `10`.
#### Updating a Value Based on the Old Value
Another common use case for hash maps is to look up a keys value and then
update it based on the old value. For instance, Listing 8-25 shows code that
counts how many times each word appears in some text. We use a hash map with
the words as keys and increment the value to keep track of how many times weve
seen that word. If its the first time weve seen a word, well first insert
the value `0`.
```
use std::collections::HashMap;
let text = "hello world wonderful world";
let mut map = HashMap::new();
for word in text.split_whitespace() {
let count = map.entry(word).or_insert(0);
*count += 1;
}
println!("{map:?}");
```
Listing 8-25: Counting occurrences of words using a hash
map that stores words and counts
This code will print `{"world": 2, "hello": 1, "wonderful": 1}`. You might see
the same keyvalue pairs printed in a different order: recall from the
“Accessing Values in a Hash Map” section that
iterating over a hash map happens in an arbitrary order.
The `split_whitespace` method returns an iterator over subslices, separated by
whitespace, of the value in `text`. The `or_insert` method returns a mutable
reference (`&mut V`) to the value for the specified key. Here, we store that
mutable reference in the `count` variable, so in order to assign to that value,
we must first dereference `count` using the asterisk (`*`). The mutable
reference goes out of scope at the end of the `for` loop, so all of these
changes are safe and allowed by the borrowing rules.
### Hashing Functions
By default, `HashMap` uses a hashing function called *SipHash* that can provide
resistance to denial-of-service (DoS) attacks involving hash
tables^siphash at *[https://en.wikipedia.org/wiki/SipHash](https://en.wikipedia.org/wiki/SipHash)*<!-- ignore -->. This is not the fastest hashing algorithm
available, but the trade-off for better security that comes with the drop in
performance is worth it. If you profile your code and find that the default
hash function is too slow for your purposes, you can switch to another function
by specifying a different hasher. A *hasher* is a type that implements the
`BuildHasher` trait. Well talk about traits and how to implement them in
Chapter 10. You dont necessarily have to implement
your own hasher from scratch; crates.io
has libraries shared by other Rust users that provide hashers implementing many
common hashing algorithms.
## Summary
Vectors, strings, and hash maps will provide a large amount of functionality
necessary in programs when you need to store, access, and modify data. Here are
some exercises you should now be equipped to solve:
1. Given a list of integers, use a vector and return the median (when sorted,
the value in the middle position) and mode (the value that occurs most
often; a hash map will be helpful here) of the list.
1. Convert strings to pig latin. The first consonant of each word is moved to
the end of the word and *ay* is added, so *first* becomes *irst-fay*. Words
that start with a vowel have *hay* added to the end instead (*apple* becomes
*apple-hay*). Keep in mind the details about UTF-8 encoding!
1. Using a hash map and vectors, create a text interface to allow a user to add
employee names to a department in a company; for example, “Add Sally to
Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all
people in a department or all people in the company by department, sorted
alphabetically.
The standard library API documentation describes methods that vectors, strings,
and hash maps have that will be helpful for these exercises!
Were getting into more complex programs in which operations can fail, so its
a perfect time to discuss error handling. Well do that next!