## The Slice Type

Another data type that does not have ownership is the *slice*. Slices let you
reference a contiguous sequence of elements in a collection rather than the
whole collection.

Here’s a small programming problem: write a function that takes a string and
returns the first word it finds in that string. If the function doesn’t find a
space in the string, the whole string must be one word, so the entire string
should be returned.

Let’s think about the signature of this function:

```rust,ignore
fn first_word(s: &String) -> ?
```

This function, `first_word`, has a `&String` as a parameter. We don’t want
ownership, so this is fine. But what should we return? We don’t really have a
way to talk about *part* of a string. However, we could return the index of the
end of the word. Let’s try that, as shown in Listing 4-7:

<span class="filename">Filename: src/main.rs</span>

```rust
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}
```

<span class="caption">Listing 4-7: The `first_word` function that returns a
byte index value into the `String` parameter</span>

Because we need to go through the `String` element by element and check whether
a value is a space, we’ll convert our `String` to an array of bytes using the
`as_bytes` method:

```rust,ignore
let bytes = s.as_bytes();
```

Next, we create an iterator over the array of bytes using the `iter` method:

```rust,ignore
for (i, &item) in bytes.iter().enumerate() {
```

We’ll discuss iterators in more detail in Chapter 13. For now, know that `iter`
is a method that returns each element in a collection and that `enumerate`
wraps the result of `iter` and returns each element as part of a tuple instead.
The first element of the tuple returned from `enumerate` is the index, and the
second element is a reference to the element. This is a bit more convenient
than calculating the index ourselves.

Because the `enumerate` method returns a tuple, we can use patterns to
destructure that tuple, just like everywhere else in Rust. So in the `for`
loop, we specify a pattern that has `i` for the index in the tuple and `&item`
for the single byte in the tuple. Because we get a reference to the element
from `.iter().enumerate()`, we use `&` in the pattern.

Inside the `for` loop, we search for the byte that represents the space by
using the byte literal syntax. If we find a space, we return the position.
Otherwise, we return the length of the string by using `s.len()`:

```rust,ignore
    if item == b' ' {
        return i;
    }
}
s.len()
```
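
To make the pieces walked through above a little more concrete, here is a
minimal sketch of what `iter().enumerate()` yields and how the `&item` pattern
and the byte literal behave. The `indexed_bytes` helper and the `"hi yo"` input
are hypothetical and exist only for illustration; they are not part of
Listing 4-7.

```rust
// Hypothetical helper for illustration: collects the (index, byte) pairs
// that `enumerate` yields while walking the bytes of the string.
fn indexed_bytes(s: &String) -> Vec<(usize, u8)> {
    let mut pairs = Vec::new();

    // `enumerate` yields `(usize, &u8)`; the `&item` pattern copies the
    // byte out of the reference, so `item` is a plain `u8`.
    for (i, &item) in s.as_bytes().iter().enumerate() {
        pairs.push((i, item));
    }

    pairs
}

fn main() {
    let s = String::from("hi yo");
    let pairs = indexed_bytes(&s);

    assert_eq!(pairs.len(), 5);
    assert_eq!(pairs[0], (0, b'h'));
    // The space sits at index 2, and `b' '` is the byte literal for it.
    assert_eq!(pairs[2], (2, b' '));
    assert_eq!(b' ', 32u8);
}
```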

We now have a way to find out the index of the end of the first word in the
string, but there’s a problem. We’re returning a `usize` on its own, but it’s
only a meaningful number in the context of the `&String`. In other words,
because it’s a separate value from the `String`, there’s no guarantee that it
will still be valid in the future. Consider the program in Listing 4-8 that
uses the `first_word` function from Listing 4-7:

<span class="filename">Filename: src/main.rs</span>

```rust
# fn first_word(s: &String) -> usize {
#     let bytes = s.as_bytes();
#
#     for (i, &item) in bytes.iter().enumerate() {
#         if item == b' ' {
#             return i;
#         }
#     }
#
#     s.len()
# }
#
fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s); // word will get the value 5

    s.clear(); // This empties the String, making it equal to ""

    // word still has the value 5 here, but there's no more string that
    // we could meaningfully use the value 5 with. word is now totally invalid!
}
```

<span class="caption">Listing 4-8: Storing the result from calling the
`first_word` function and then changing the `String` contents</span>

This program compiles without any errors and would also do so if we used `word`
after calling `s.clear()`. Because `word` isn’t connected to the state of `s`
at all, `word` still contains the value `5`. We could use that value `5` with
the variable `s` to try to extract the first word out, but this would be a bug
because the contents of `s` have changed since we saved `5` in `word`.
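
As a rough sketch of how that bug would surface, the program below reuses
`first_word` from Listing 4-7 and then tries to use the stale index. It relies
on the standard `str::get` method, which returns `None` for an out-of-range
request; indexing with `&s[..word]` in the same situation would panic instead.
The assertions are only there to make the behavior visible.

```rust
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

fn main() {
    let mut s = String::from("hello world");
    let word = first_word(&s);

    // While `s` is unchanged, the index still describes the first word.
    assert_eq!(word, 5);
    assert_eq!(s.get(..word), Some("hello"));

    s.clear();

    // The index is still 5, but `s` is now empty, so the range no longer
    // fits. `get` reports that with `None`.
    assert_eq!(word, 5);
    assert_eq!(s.get(..word), None);
}
```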

Having to worry about the index in `word` getting out of sync with the data in
`s` is tedious and error-prone! Managing these indices is even more brittle if
we write a `second_word` function. Its signature would have to look like this:

```rust,ignore
fn second_word(s: &String) -> (usize, usize) {
```

Now we’re tracking a starting *and* an ending index, and we have even more
values that were calculated from data in a particular state but aren’t tied to
that state at all. We now have three unrelated variables floating around that
need to be kept in sync.

Luckily, Rust has a solution to this problem: string slices.

### String Slices

A *string slice* is a reference to part of a `String`, and it looks like this:

```rust
let s = String::from("hello world");

let hello = &s[0..5];
let world = &s[6..11];
```

This is similar to taking a reference to the whole `String` but with the extra
`[0..5]` bit. Rather than a reference to the entire `String`, it’s a reference
to a portion of the `String`. The `start..end` syntax is a range that begins at
`start` and continues up to, but not including, `end`. If we want to include
`end`, we can use `..=` instead of `..`:

```rust
let s = String::from("hello world");

let hello = &s[0..=4];
let world = &s[6..=10];
```

The `=` means that we’re including the last number, if that helps you remember
the difference between `..` and `..=`.

We can create slices using a range within brackets by specifying
`[starting_index..ending_index]`, where `starting_index` is the first position
in the slice and `ending_index` is one more than the last position in the
slice. Internally, the slice data structure stores the starting position and
the length of the slice, which corresponds to `ending_index` minus
`starting_index`. So in the case of `let world = &s[6..11];`, `world` would be
a slice that contains a pointer to the byte at index 6 of `s` (its 7th byte)
and a length value of 5.

Figure 4-6 shows this in a diagram.

<img alt="world containing a pointer to the byte at index 6 of String s and a length 5" src="img/trpl04-06.svg" class="center" style="width: 50%;" />

<span class="caption">Figure 4-6: String slice referring to part of a
`String`</span>
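
A small sketch can check the pointer-plus-length picture directly. It uses the
standard `as_ptr` and `len` methods to compare where the slice starts relative
to the start of the `String`'s buffer; the `offset` variable is introduced here
only for illustration.

```rust
fn main() {
    let s = String::from("hello world");
    let world = &s[6..11];

    // Distance, in bytes, from the start of the String's buffer to the
    // first byte of the slice.
    let offset = world.as_ptr() as usize - s.as_ptr() as usize;

    assert_eq!(offset, 6); // starts at index 6
    assert_eq!(world.len(), 5); // ending_index - starting_index = 11 - 6
    assert_eq!(world, "world");

    // A `&str` is two words wide: a pointer and a length.
    assert_eq!(
        std::mem::size_of::<&str>(),
        2 * std::mem::size_of::<usize>()
    );
}
```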

With Rust’s `..` range syntax, if you want to start at the first index (zero),
you can drop the value before the two periods. In other words, these are equal:

```rust
let s = String::from("hello");

let slice = &s[0..2];
let slice = &s[..2];
```

By the same token, if your slice includes the last byte of the `String`, you
can drop the trailing number. That means these are equal:

```rust
let s = String::from("hello");

let len = s.len();

let slice = &s[3..len];
let slice = &s[3..];
```

You can also drop both values to take a slice of the entire string. So these
are equal:

```rust
let s = String::from("hello");

let len = s.len();

let slice = &s[0..len];
let slice = &s[..];
```
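
The equivalences above can be checked mechanically. The following sketch simply
asserts that each shorthand produces the same slice as its spelled-out form,
using the same `"hello"` string as the examples.

```rust
fn main() {
    let s = String::from("hello");
    let len = s.len();

    // Dropping the leading zero.
    assert_eq!(&s[0..2], &s[..2]);
    assert_eq!(&s[..2], "he");

    // Dropping the trailing length.
    assert_eq!(&s[3..len], &s[3..]);
    assert_eq!(&s[3..], "lo");

    // Dropping both to take the whole string.
    assert_eq!(&s[0..len], &s[..]);
    assert_eq!(&s[..], "hello");

    // The inclusive form covers the same bytes as the exclusive one.
    assert_eq!(&s[0..=1], &s[0..2]);
}
```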

> Note: String slice range indices must occur at valid UTF-8 character
> boundaries. If you attempt to create a string slice in the middle of a
> multibyte character, your program will panic at runtime. For the purposes
> of introducing string slices, we are assuming ASCII only in this section; a
> more thorough discussion of UTF-8 handling is in the “Strings” section of
> Chapter 8.

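
To illustrate the note without crashing, here is a sketch that uses the
standard `is_char_boundary` and `get` methods, which report a bad boundary
instead of panicking. The sample string, `"h\u{e9}llo"` (an "h", then "é"
encoded as two bytes, then "llo"), is an assumption chosen for the example.

```rust
fn main() {
    // "é" (U+00E9) takes two bytes in UTF-8, so this string is six bytes
    // long even though it has five characters.
    let s = String::from("h\u{e9}llo");
    assert_eq!(s.len(), 6);

    // Index 1 is where "é" starts; index 2 falls in the middle of it.
    assert!(s.is_char_boundary(1));
    assert!(!s.is_char_boundary(2));

    // `get` returns `None` for a range that splits a character, where
    // `&s[0..2]` would panic.
    assert_eq!(s.get(0..2), None);
    assert_eq!(s.get(0..3), Some("h\u{e9}"));
}
```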

With all this information in mind, let’s rewrite `first_word` to return a
slice. The type that signifies “string slice” is written as `&str`:

<span class="filename">Filename: src/main.rs</span>

```rust
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}
```

We get the index for the end of the word in the same way as we did in Listing
4-7, by looking for the first occurrence of a space. When we find a space, we
return a string slice using the start of the string and the index of the space
as the starting and ending indices.

Now when we call `first_word`, we get back a single value that is tied to the
underlying data. The value is made up of a reference to the starting point of
the slice and the number of elements in the slice.

Returning a slice would also work for a `second_word` function:

```rust,ignore
fn second_word(s: &String) -> &str {
```
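
The text gives only the signature of `second_word`, so the body below is a
hypothetical sketch of one way it might be filled in, following the same
byte-scanning approach as `first_word`. It assumes words are separated by
single spaces and returns an empty slice when there is no second word; both
choices are assumptions made for illustration.

```rust
// Hypothetical implementation: returns the text between the first and
// second space, or "" if the string contains no space at all.
fn second_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    // Index just past the first space, once we have seen one.
    let mut start = None;

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            match start {
                // First space: the second word begins right after it.
                None => start = Some(i + 1),
                // Second space: the second word ends here.
                Some(begin) => return &s[begin..i],
            }
        }
    }

    match start {
        // Only one space: the second word runs to the end of the string.
        Some(begin) => &s[begin..],
        None => "",
    }
}

fn main() {
    assert_eq!(second_word(&String::from("hello world")), "world");
    assert_eq!(second_word(&String::from("one two three")), "two");
    assert_eq!(second_word(&String::from("hello")), "");
}
```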

We now have a straightforward API that’s much harder to mess up, because the
compiler will ensure the references into the `String` remain valid. Remember
the bug in the program in Listing 4-8, when we got the index to the end of the
first word but then cleared the string so our index was invalid? That code was
logically incorrect but didn’t show any immediate errors. The problems would
show up later if we kept trying to use the first word index with an emptied
string. Slices make this bug impossible and let us know we have a problem with
our code much sooner. Using the slice version of `first_word` will produce a
compile-time error:

<span class="filename">Filename: src/main.rs</span>

```rust,ignore,does_not_compile
fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s);

    s.clear(); // Error!

    println!("the first word is: {}", word);
}
```

Here’s the compiler error:

```text
error[E0502]: cannot borrow `s` as mutable because it is also borrowed as immutable
  --> src/main.rs:10:5
   |
8  |     let word = first_word(&s);
   |                           -- immutable borrow occurs here
9  |
10 |     s.clear(); // Error!
   |     ^^^^^^^^^ mutable borrow occurs here
11 |
12 |     println!("the first word is: {}", word);
   |                                       ---- borrow later used here
```

Recall from the borrowing rules that if we have an immutable reference to
something, we cannot also take a mutable reference. Because `clear` needs to
truncate the `String`, it tries to take a mutable reference, which fails. Not
only has Rust made our API easier to use, but it has also eliminated an entire
class of errors at compile time!
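
For contrast, here is one possible way to restructure the program so that it
compiles, sketched under the assumption that we only need the word's text and
not a view into `s`. Copying the slice into an owned `String` with `to_string`
ends the immutable borrow before `clear` runs. This is an illustration of the
borrowing rule, not a recommendation from the text.

```rust
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

fn main() {
    let mut s = String::from("hello world");

    // `to_string` copies the bytes, so `word` owns its data and no
    // borrow of `s` is still alive after this line.
    let word = first_word(&s).to_string();

    s.clear(); // Fine: no immutable reference to `s` is in use.

    assert_eq!(s, "");
    assert_eq!(word, "hello");
    println!("the first word is: {}", word);
}
```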

#### String Literals Are Slices

Recall that we talked about string literals being stored inside the binary. Now
that we know about slices, we can properly understand string literals:

```rust
let s = "Hello, world!";
```

The type of `s` here is `&str`: it’s a slice pointing to that specific point of
the binary. This is also why string literals are immutable; `&str` is an
immutable reference.

#### String Slices as Parameters

Knowing that you can take slices of literals and `String`s leads us to one more
improvement on `first_word`, and that’s its signature:

```rust,ignore
fn first_word(s: &String) -> &str {
```

A more experienced Rustacean would write the following line instead because it
allows us to use the same function on both `String`s and `&str`s:

```rust,ignore
fn first_word(s: &str) -> &str {
```

If we have a string slice, we can pass that directly. If we have a `String`, we
can pass a slice of the entire `String`. Defining a function to take a string
slice instead of a reference to a `String` makes our API more general and useful
without losing any functionality:

<span class="filename">Filename: src/main.rs</span>

```rust
# fn first_word(s: &str) -> &str {
#     let bytes = s.as_bytes();
#
#     for (i, &item) in bytes.iter().enumerate() {
#         if item == b' ' {
#             return &s[0..i];
#         }
#     }
#
#     &s[..]
# }
fn main() {
    let my_string = String::from("hello world");

    // first_word works on slices of `String`s
    let word = first_word(&my_string[..]);

    let my_string_literal = "hello world";

    // first_word works on slices of string literals
    let word = first_word(&my_string_literal[..]);

    // Because string literals *are* string slices already,
    // this works too, without the slice syntax!
    let word = first_word(my_string_literal);
}
```

### Other Slices

String slices, as you might imagine, are specific to strings. But there’s a
more general slice type, too. Consider this array:

```rust
let a = [1, 2, 3, 4, 5];
```

Just as we might want to refer to a part of a string, we might want to refer
to part of an array. We’d do so like this:

```rust
let a = [1, 2, 3, 4, 5];

let slice = &a[1..3];
```

This slice has the type `&[i32]`. It works the same way as string slices do, by
storing a reference to the first element and a length. You’ll use this kind of
slice for all sorts of other collections. We’ll discuss these collections in
detail when we talk about vectors in Chapter 8.
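
As a brief sketch of why the general slice type is handy, the hypothetical
`sum` function below accepts a `&[i32]`, so the same code works on part of an
array and on part of a vector. The function name and the sample values are
assumptions for illustration.

```rust
// Hypothetical helper: adds up any contiguous run of `i32` values.
fn sum(values: &[i32]) -> i32 {
    let mut total = 0;
    for &v in values {
        total += v;
    }
    total
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    let slice = &a[1..3];

    // The slice covers indices 1 and 2 of the array.
    assert_eq!(slice, &[2, 3]);
    assert_eq!(slice.len(), 2);
    assert_eq!(sum(slice), 5);

    // The same function works on a slice of a vector.
    let v = vec![10, 20, 30];
    assert_eq!(sum(&v[..2]), 30);
}
```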

## Summary

The concepts of ownership, borrowing, and slices ensure memory safety in Rust
programs at compile time. The Rust language gives you control over your memory
usage in the same way as other systems programming languages, but having the
owner of data automatically clean up that data when the owner goes out of scope
means you don’t have to write and debug extra code to get this control.

Ownership affects how lots of other parts of Rust work, so we’ll talk about
these concepts further throughout the rest of the book. Let’s move on to
Chapter 5 and look at grouping pieces of data together in a `struct`.