<!-- DO NOT EDIT THIS FILE.
This file is periodically generated from the content in the `/src/`
directory, so all fixes need to be made in `/src/`.
-->
[TOC]
# Advanced Features
By now, you’ve learned the most commonly used parts of the Rust programming
language. Before we do one more project, in Chapter 20, we’ll look at a few
aspects of the language you might run into every once in a while, but may not
use every day. You can use this chapter as a reference for when you encounter
any unknowns. The features covered here are useful in very specific situations.
Although you might not reach for them often, we want to make sure you have a
grasp of all the features Rust has to offer.

In this chapter, we’ll cover:

* Unsafe Rust: how to opt out of some of Rust’s guarantees and take
  responsibility for manually upholding those guarantees
* Advanced traits: associated types, default type parameters, fully qualified
  syntax, supertraits, and the newtype pattern in relation to traits
* Advanced types: more about the newtype pattern, type aliases, the never type,
  and dynamically sized types
* Advanced functions and closures: function pointers and returning closures
* Macros: ways to define code that defines more code at compile time

It’s a panoply of Rust features with something for everyone! Let’s dive in!
## Unsafe Rust
All the code we’ve discussed so far has had Rust’s memory safety guarantees
enforced at compile time. However, Rust has a second language hidden inside it
that doesn’t enforce these memory safety guarantees: it’s called *unsafe Rust*
and works just like regular Rust, but gives us extra superpowers.

Unsafe Rust exists because, by nature, static analysis is conservative. When
the compiler tries to determine whether or not code upholds the guarantees,
it’s better for it to reject some valid programs than to accept some invalid
programs. Although the code *might* be okay, if the Rust compiler doesn’t have
enough information to be confident, it will reject the code. In these cases,
you can use unsafe code to tell the compiler, “Trust me, I know what I’m
doing.” Be warned, however, that you use unsafe Rust at your own risk: if you
use unsafe code incorrectly, problems can occur due to memory unsafety, such as
null pointer dereferencing.

Another reason Rust has an unsafe alter ego is that the underlying computer
hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you
couldn’t do certain tasks. Rust needs to allow you to do low-level systems
programming, such as directly interacting with the operating system or even
writing your own operating system. Working with low-level systems programming
is one of the goals of the language. Let’s explore what we can do with unsafe
Rust and how to do it.
### Unsafe Superpowers
To switch to unsafe Rust, use the `unsafe` keyword and then start a new block
that holds the unsafe code. You can take five actions in unsafe Rust that you
can’t in safe Rust, which we call *unsafe superpowers*. Those superpowers
include the ability to:

1. Dereference a raw pointer
1. Call an unsafe function or method
1. Access or modify a mutable static variable
1. Implement an unsafe trait
1. Access fields of `union`s

It’s important to understand that `unsafe` doesn’t turn off the borrow checker
or disable any of Rust’s other safety checks: if you use a reference in unsafe
code, it will still be checked. The `unsafe` keyword only gives you access to
these five features that are then not checked by the compiler for memory
safety. You’ll still get some degree of safety inside an unsafe block.

In addition, `unsafe` does not mean the code inside the block is necessarily
dangerous or that it will definitely have memory safety problems: the intent is
that as the programmer, you’ll ensure the code inside an `unsafe` block will
access memory in a valid way.

People are fallible and mistakes will happen, but by requiring these five
unsafe operations to be inside blocks annotated with `unsafe`, you’ll know that
any errors related to memory safety must be within an `unsafe` block. Keep
`unsafe` blocks small; you’ll be thankful later when you investigate memory
bugs.

To isolate unsafe code as much as possible, it’s best to enclose such code
within a safe abstraction and provide a safe API, which we’ll discuss later in
the chapter when we examine unsafe functions and methods. Parts of the standard
library are implemented as safe abstractions over unsafe code that has been
audited. Wrapping unsafe code in a safe abstraction prevents uses of `unsafe`
from leaking out into all the places that you or your users might want to use
the functionality implemented with `unsafe` code, because using a safe
abstraction is safe.

Let’s look at each of the five unsafe superpowers in turn. We’ll also look at
some abstractions that provide a safe interface to unsafe code.
### Dereferencing a Raw Pointer
In “Dangling References” on page XX, we mentioned that the compiler ensures
references are always valid. Unsafe Rust has two new types called *raw
pointers* that are similar to references. As with references, raw pointers can
be immutable or mutable and are written as `*const T` and `*mut T`,
respectively. The asterisk isn’t the dereference operator; it’s part of the
type name. In the context of raw pointers, *immutable* means that the pointer
can’t be directly assigned to after being dereferenced.

Different from references and smart pointers, raw pointers:

* Are allowed to ignore the borrowing rules by having both immutable and
  mutable pointers or multiple mutable pointers to the same location
* Aren’t guaranteed to point to valid memory
* Are allowed to be null
* Don’t implement any automatic cleanup

By opting out of having Rust enforce these guarantees, you can give up
guaranteed safety in exchange for greater performance or the ability to
interface with another language or hardware where Rust’s guarantees don’t apply.
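As a small sketch of the “allowed to be null” point, the following code creates one valid and one null raw pointer in safe code and checks each before reading through it. The helper name `read_or_default` is hypothetical and exists only for this illustration.

```rust
use std::ptr;

/// Hypothetical helper: returns the pointed-to value, or `default` when the
/// pointer is null.
///
/// # Safety
///
/// A non-null `p` must point to a valid, initialized `i32`.
unsafe fn read_or_default(p: *const i32, default: i32) -> i32 {
    if p.is_null() {
        default
    } else {
        // Dereferencing the raw pointer is the unsafe operation here.
        *p
    }
}

fn main() {
    let num = 5;
    let valid = &num as *const i32;
    let null: *const i32 = ptr::null();

    // Creating and inspecting raw pointers needs no `unsafe`.
    assert!(!valid.is_null());
    assert!(null.is_null());

    unsafe {
        println!("valid pointer: {}", read_or_default(valid, 0));
        println!("null pointer: {}", read_or_default(null, 0));
    }
}
```

A null check only rules out one kind of invalid pointer; a non-null pointer can still dangle, which is why the helper itself is marked `unsafe`.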
Listing 19-1 shows how to create an immutable and a mutable raw pointer from
references.
```
let mut num = 5;
let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;
```
Listing 19-1: Creating raw pointers from references

Notice that we don’t include the `unsafe` keyword in this code. We can create
raw pointers in safe code; we just can’t dereference raw pointers outside an
unsafe block, as you’ll see in a bit.

We’ve created raw pointers by using `as` to cast an immutable and a mutable
reference into their corresponding raw pointer types. Because we created them
directly from references guaranteed to be valid, we know these particular raw
pointers are valid, but we can’t make that assumption about just any raw
pointer.

To demonstrate this, next we’ll create a raw pointer whose validity we can’t be
so certain of. Listing 19-2 shows how to create a raw pointer to an arbitrary
location in memory. Trying to use arbitrary memory is undefined: there might be
data at that address or there might not, the compiler might optimize the code
so there is no memory access, or the program might terminate with a
segmentation fault. Usually, there is no good reason to write code like this,
but it is possible.
```
let address = 0x012345usize;
let r = address as *const i32;
```
Listing 19-2: Creating a raw pointer to an arbitrary memory address

Recall that we can create raw pointers in safe code, but we can’t *dereference*
raw pointers and read the data being pointed to. In Listing 19-3, we use the
dereference operator `*` on a raw pointer, which requires an `unsafe` block.
```
let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;

unsafe {
    println!("r1 is: {}", *r1);
    println!("r2 is: {}", *r2);
}
```
Listing 19-3: Dereferencing raw pointers within an `unsafe` block

Creating a pointer does no harm; it’s only when we try to access the value that
it points at that we might end up dealing with an invalid value.

Note also that in Listings 19-1 and 19-3, we created `*const i32` and `*mut
i32` raw pointers that both pointed to the same memory location, where `num` is
stored. If we instead tried to create an immutable and a mutable reference to
`num`, the code would not have compiled because Rust’s ownership rules don’t
allow a mutable reference at the same time as any immutable references. With
raw pointers, we can create a mutable pointer and an immutable pointer to the
same location and change data through the mutable pointer, potentially creating
a data race. Be careful!
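To make the aliasing concrete, here is a minimal single-threaded sketch that writes through a `*mut i32` and observes the change through a `*const i32` to the same location. It departs from Listing 19-3 in one respect, which is an assumption of this sketch rather than something the listing does: both raw pointers are derived from a single mutable raw pointer. The function name `bump_and_observe` is hypothetical.

```rust
// Hypothetical function for illustration: returns the value seen through the
// immutable raw pointer before and after a write through the mutable one.
fn bump_and_observe() -> (i32, i32) {
    let mut num = 5;

    // Derive both raw pointers from one mutable raw pointer so that creating
    // the second does not involve taking a fresh reference to `num`.
    let r2 = &mut num as *mut i32;
    let r1 = r2 as *const i32;

    unsafe {
        let before = *r1;
        *r2 += 1; // write through the mutable raw pointer
        let after = *r1; // the immutable raw pointer sees the new value
        (before, after)
    }
}

fn main() {
    let (before, after) = bump_and_observe();
    println!("before: {before}, after: {after}");
}
```

Because everything happens on one thread, there is no data race here; the compiler simply no longer checks that for us.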
With all of these dangers, why would you ever use raw pointers? One major use
case is when interfacing with C code, as you’ll see in “Calling an Unsafe
Function or Method” on page XX. Another case is when building up safe
abstractions that the borrow checker doesn’t understand. We’ll introduce unsafe
functions and then look at an example of a safe abstraction that uses unsafe
code.
### Calling an Unsafe Function or Method
The second type of operation you can perform in an unsafe block is calling
unsafe functions. Unsafe functions and methods look exactly like regular
functions and methods, but they have an extra `unsafe` before the rest of the
definition. The `unsafe` keyword in this context indicates the function has
requirements we need to uphold when we call this function, because Rust can’t
guarantee we’ve met these requirements. By calling an unsafe function within an
`unsafe` block, we’re saying that we’ve read this function’s documentation and
we take responsibility for upholding the function’s contracts.

Here is an unsafe function named `dangerous` that doesn’t do anything in its
body:
```
unsafe fn dangerous() {}

unsafe {
    dangerous();
}
```
We must call the `dangerous` function within a separate `unsafe` block. If we
try to call `dangerous` without the `unsafe` block, we’ll get an error:
```
error[E0133]: call to unsafe function is unsafe and requires
unsafe function or block
 --> src/main.rs:4:5
  |
4 |     dangerous();
  |     ^^^^^^^^^^^ call to unsafe function
  |
  = note: consult the function's documentation for information on
how to avoid undefined behavior
```
With the `unsafe` block, we’re asserting to Rust that we’ve read the function’s
documentation, we understand how to use it properly, and we’ve verified that
we’re fulfilling the contract of the function.

Bodies of unsafe functions are effectively `unsafe` blocks, so to perform other
unsafe operations within an unsafe function, we don’t need to add another
`unsafe` block.
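The following sketch shows what such a contract can look like in practice. The function `get_unchecked_i32` is hypothetical; it documents its requirement in a `# Safety` section, and the caller checks that requirement before entering the `unsafe` block.

```rust
/// Hypothetical function for illustration: returns the element at `index`
/// without a bounds check.
///
/// # Safety
///
/// `index` must be less than `values.len()`.
unsafe fn get_unchecked_i32(values: &[i32], index: usize) -> i32 {
    // The body of an unsafe function can perform unsafe operations directly:
    // `add` and the dereference are both unsafe.
    *values.as_ptr().add(index)
}

fn main() {
    let values = [10, 20, 30];
    let index = 1;

    // Uphold the documented contract before making the call.
    assert!(index < values.len());

    let second = unsafe { get_unchecked_i32(&values, index) };
    println!("second element: {second}");
}
```

The compiler checks none of this; the `# Safety` section is a convention that tells callers what they are promising when they write `unsafe`.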
#### Creating a Safe Abstraction over Unsafe Code
Just because a function contains unsafe code doesn’t mean we need to mark the
entire function as unsafe. In fact, wrapping unsafe code in a safe function is
a common abstraction. As an example, let’s study the `split_at_mut` function
from the standard library, which requires some unsafe code. We’ll explore how
we might implement it. This safe method is defined on mutable slices: it takes
one slice and makes it two by splitting the slice at the index given as an
argument. Listing 19-4 shows how to use `split_at_mut`.
```
let mut v = vec![1, 2, 3, 4, 5, 6];
let r = &mut v[..];
let (a, b) = r.split_at_mut(3);
assert_eq!(a, &mut [1, 2, 3]);
assert_eq!(b, &mut [4, 5, 6]);
```
Listing 19-4: Using the safe `split_at_mut` function

We can’t implement this function using only safe Rust. An attempt might look
something like Listing 19-5, which won’t compile. For simplicity, we’ll
implement `split_at_mut` as a function rather than a method and only for slices
of `i32` values rather than for a generic type `T`.
```
fn split_at_mut(
    values: &mut [i32],
    mid: usize,
) -> (&mut [i32], &mut [i32]) {
    let len = values.len();

    assert!(mid <= len);

    (&mut values[..mid], &mut values[mid..])
}
```
Listing 19-5: An attempted implementation of `split_at_mut` using only safe Rust

This function first gets the total length of the slice. Then it asserts that
the index given as a parameter is within the slice by checking whether it’s
less than or equal to the length. The assertion means that if we pass an index
that is greater than the length to split the slice at, the function will panic
before it attempts to use that index.

Then we return two mutable slices in a tuple: one from the start of the
original slice to the `mid` index and another from `mid` to the end of the
slice.

When we try to compile the code in Listing 19-5, we’ll get an error:
```
error[E0499]: cannot borrow `*values` as mutable more than once at a time
 --> src/main.rs:9:31
  |
2 |     values: &mut [i32],
  |             - let's call the lifetime of this reference `'1`
...
9 |     (&mut values[..mid], &mut values[mid..])
  |     --------------------------^^^^^^--------
  |     |     |                   |
  |     |     |                   second mutable borrow occurs here
  |     |     first mutable borrow occurs here
  |     returning this value requires that `*values` is borrowed for `'1`
```
Rust’s borrow checker can’t understand that we’re borrowing different parts of
the slice; it only knows that we’re borrowing from the same slice twice.
Borrowing different parts of a slice is fundamentally okay because the two
slices aren’t overlapping, but Rust isn’t smart enough to know this. When we
know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.

Listing 19-6 shows how to use an `unsafe` block, a raw pointer, and some calls
to unsafe functions to make the implementation of `split_at_mut` work.
```
use std::slice;

fn split_at_mut(
    values: &mut [i32],
    mid: usize,
) -> (&mut [i32], &mut [i32]) {
    let len = values.len(); // [1]
    let ptr = values.as_mut_ptr(); // [2]

    assert!(mid <= len); // [3]

    unsafe { // [4]
        (
            slice::from_raw_parts_mut(ptr, mid), // [5]
            slice::from_raw_parts_mut(ptr.add(mid), len - mid), // [6]
        )
    }
}
```
Listing 19-6: Using unsafe code in the implementation of the `split_at_mut`
function

Recall from “The Slice Type” on page XX that a slice is a pointer to some data
and the length of the slice. We use the `len` method to get the length of a
slice [1] and the `as_mut_ptr` method to access the raw pointer of a slice [2].
In this case, because we have a mutable slice to `i32` values, `as_mut_ptr`
returns a raw pointer with the type `*mut i32`, which we’ve stored in the
variable `ptr`.

We keep the assertion that the `mid` index is within the slice [3]. Then we get
to the unsafe code [4]: the `slice::from_raw_parts_mut` function takes a raw
pointer and a length, and it creates a slice. We use it to create a slice that
starts from `ptr` and is `mid` items long [5]. Then we call the `add` method on
`ptr` with `mid` as an argument to get a raw pointer that starts at `mid`, and
we create a slice using that pointer and the remaining number of items after
`mid` as the length [6].

The function `slice::from_raw_parts_mut` is unsafe because it takes a raw
pointer and must trust that this pointer is valid. The `add` method on raw
pointers is also unsafe because it must trust that the offset location is also
a valid pointer. Therefore, we had to put an `unsafe` block around our calls to
`slice::from_raw_parts_mut` and `add` so we could call them. By looking at the
code and by adding the assertion that `mid` must be less than or equal to
`len`, we can tell that all the raw pointers used within the `unsafe` block
will be valid pointers to data within the slice. This is an acceptable and
appropriate use of `unsafe`.

Note that we don’t need to mark the resultant `split_at_mut` function as
`unsafe`, and we can call this function from safe Rust. We’ve created a safe
abstraction to the unsafe code with an implementation of the function that uses
`unsafe` code in a safe way, because it creates only valid pointers from the
data this function has access to.
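To see the safe abstraction in use, the sketch below repeats the implementation from Listing 19-6 and calls it from a `main` function that contains no `unsafe` at all. The `main` function and the values written into the slices are additions for illustration only.

```rust
use std::slice;

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    assert!(mid <= len);

    // The assertion above guarantees that both ranges lie inside the
    // original slice and do not overlap.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];

    // Both halves are mutable at the same time, in entirely safe code.
    let (a, b) = split_at_mut(&mut v, 3);
    a[0] = 10;
    b[0] = 40;

    assert_eq!(v, [10, 2, 3, 40, 5, 6]);
    println!("{v:?}");
}
```

Callers never see the raw pointers, so the only place a memory-safety bug could hide is the small `unsafe` block inside the function.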
In contrast, the use of `slice::from_raw_parts_mut` in Listing 19-7 would
likely crash when the slice is used. This code takes an arbitrary memory
location and creates a slice 10,000 items long.
```
use std::slice;

let address = 0x01234usize;
let r = address as *mut i32;

let values: &[i32] = unsafe {
    slice::from_raw_parts_mut(r, 10000)
};
```
Listing 19-7: Creating a slice from an arbitrary memory location

We don’t own the memory at this arbitrary location, and there is no guarantee
that the slice this code creates contains valid `i32` values. Attempting to use
`values` as though it’s a valid slice results in undefined behavior.
#### Using extern Functions to Call External Code
Sometimes your Rust code might need to interact with code written in another
language. For this, Rust has the keyword `extern` that facilitates the creation
and use of a *Foreign Function Interface (FFI)*, which is a way for a
programming language to define functions and enable a different (foreign)
programming language to call those functions.

Listing 19-8 demonstrates how to set up an integration with the `abs` function
from the C standard library. Functions declared within `extern` blocks are
always unsafe to call from Rust code. The reason is that other languages don’t
enforce Rust’s rules and guarantees, and Rust can’t check them, so
responsibility falls on the programmer to ensure safety.

Filename: src/main.rs
```
extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!(
            "Absolute value of -3 according to C: {}",
            abs(-3)
        );
    }
}
```
Listing 19-8: Declaring and calling an `extern` function defined in another
language

Within the `extern "C"` block, we list the names and signatures of external
functions from another language we want to call. The `"C"` part defines which
*application binary interface (ABI)* the external function uses: the ABI
defines how to call the function at the assembly level. The `"C"` ABI is the
most common and follows the C programming language’s ABI.
> ### Calling Rust Functions from Other Languages
>
> We can also use `extern` to create an interface that allows other languages
> to call Rust functions. Instead of creating a whole `extern` block, we add the
> `extern` keyword and specify the ABI to use just before the `fn` keyword for
> the relevant function. We also need to add a `#[no_mangle]` annotation to tell
> the Rust compiler not to mangle the name of this function. *Mangling* is when a
> compiler changes the name we’ve given a function to a different name that
> contains more information for other parts of the compilation process to consume
> but is less human readable. Every programming language compiler mangles names
> slightly differently, so for a Rust function to be nameable by other languages,
> we must disable the Rust compiler’s name mangling.
>
> In the following example, we make the `call_from_c` function accessible from
> C code, after it’s compiled to a shared library and linked from C:
>
> ```
> #[no_mangle]
> pub extern "C" fn call_from_c() {
>     println!("Just called a Rust function from C!");
> }
> ```
>
> This usage of `extern` does not require `unsafe`.
### Accessing or Modifying a Mutable Static Variable
In this book, we’ve not yet talked about global variables, which Rust does
support but can be problematic with Rust’s ownership rules. If two threads are
accessing the same mutable global variable, it can cause a data race.

In Rust, global variables are called *static* variables. Listing 19-9 shows an
example declaration and use of a static variable with a string slice as a value.

Filename: src/main.rs
```
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("value is: {HELLO_WORLD}");
}
```
Listing 19-9: Defining and using an immutable static variable

Static variables are similar to constants, which we discussed in “Constants” on
page XX. The names of static variables are in `SCREAMING_SNAKE_CASE` by
convention. Static variables can only store references with the `'static`
lifetime, which means the Rust compiler can figure out the lifetime and we
aren’t required to annotate it explicitly. Accessing an immutable static
variable is safe.

A subtle difference between constants and immutable static variables is that
values in a static variable have a fixed address in memory. Using the value
will always access the same data. Constants, on the other hand, are allowed to
duplicate their data whenever they’re used. Another difference is that static
variables can be mutable. Accessing and modifying mutable static variables is
*unsafe*. Listing 19-10 shows how to declare, access, and modify a mutable
static variable named `COUNTER`.

Filename: src/main.rs
```
static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    unsafe {
        println!("COUNTER: {COUNTER}");
    }
}
```
Listing 19-10: Reading from or writing to a mutable static variable is unsafe.

As with regular variables, we specify mutability using the `mut` keyword. Any
code that reads from or writes to `COUNTER` must be within an `unsafe` block.
This code compiles and prints `COUNTER: 3` as we would expect because it’s
single threaded. Having multiple threads access `COUNTER` would likely result
in data races.

With mutable data that is globally accessible, it’s difficult to ensure there
are no data races, which is why Rust considers mutable static variables to be
unsafe. Where possible, it’s preferable to use the concurrency techniques and
thread-safe smart pointers we discussed in Chapter 16 so the compiler checks
that data access from different threads is done safely.
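As one possible safe alternative to Listing 19-10, the sketch below keeps the counter in an immutable `static` holding an `AtomicU32` from `std::sync::atomic`. Choosing an atomic here is an assumption of this sketch, not something the listing prescribes; a `Mutex` would be another option. The thread count of four is arbitrary.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// The static itself is immutable; the atomic provides thread-safe interior
// mutability, so no `unsafe` block is needed.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

fn main() {
    // Several threads update the counter concurrently without a data race.
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| add_to_count(3)))
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("COUNTER: {}", COUNTER.load(Ordering::SeqCst));
}
```

With this design the compiler can verify thread safety, because `AtomicU32` is `Sync` and every access goes through its atomic operations.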
### Implementing an Unsafe Trait

We can use `unsafe` to implement an unsafe trait. A trait is unsafe when at
least one of its methods has some invariant that the compiler can’t verify. We
declare that a trait is `unsafe` by adding the `unsafe` keyword before `trait`
and marking the implementation of the trait as `unsafe` too, as shown in
Listing 19-11.
```
unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}
```
Listing 19-11: Defining and implementing an unsafe trait

By using `unsafe impl`, we’re promising that we’ll uphold the invariants that
the compiler can’t verify.

As an example, recall the `Send` and `Sync` marker traits we discussed in
“Extensible Concurrency with the Send and Sync Traits” on page XX: the compiler
implements these traits automatically if our types are composed entirely of
`Send` and `Sync` types. If we implement a type that contains a type that is
not `Send` or `Sync`, such as raw pointers, and we want to mark that type as
`Send` or `Sync`, we must use `unsafe`. Rust can’t verify that our type upholds
the guarantees that it can be safely sent across threads or accessed from
multiple threads; therefore, we need to do those checks manually and indicate
as such with `unsafe`.
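The following is a minimal sketch of that situation. The wrapper type `OwnedPtr` is hypothetical: it owns a heap allocation through a raw pointer, so the compiler does not implement `Send` for it automatically, and we assert `Send` ourselves with a comment recording why we believe that is sound.

```rust
use std::thread;

// Hypothetical wrapper that uniquely owns a heap-allocated `i32` through a
// raw pointer. Raw pointers are not `Send`, so neither is this type by default.
struct OwnedPtr(*mut i32);

// SAFETY (our manual check): `OwnedPtr` is the only owner of its allocation
// and never shares the pointer, so moving it to another thread is sound.
unsafe impl Send for OwnedPtr {}

impl OwnedPtr {
    fn new(value: i32) -> Self {
        OwnedPtr(Box::into_raw(Box::new(value)))
    }

    fn get(&self) -> i32 {
        // SAFETY: the pointer came from `Box::into_raw` and is freed only in `drop`.
        unsafe { *self.0 }
    }
}

impl Drop for OwnedPtr {
    fn drop(&mut self) {
        // SAFETY: reconstructs the `Box` exactly once to free the allocation.
        unsafe { drop(Box::from_raw(self.0)) }
    }
}

fn main() {
    let owned = OwnedPtr::new(42);

    // This compiles only because of the `unsafe impl Send` above.
    let handle = thread::spawn(move || owned.get());
    println!("read on another thread: {}", handle.join().unwrap());
}
```

If the reasoning in the `SAFETY` comment were wrong, for example if the type handed out copies of the pointer, the compiler would not notice; that is the responsibility `unsafe impl` transfers to us.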
### Accessing Fields of a Union

The final action that works only with `unsafe` is accessing fields of a union.
A `union` is similar to a `struct`, but only one declared field is used in a
particular instance at one time. Unions are primarily used to interface with
unions in C code. Accessing union fields is unsafe because Rust can’t guarantee
the type of the data currently being stored in the union instance. You can
learn more about unions in the Rust Reference at
*https://doc.rust-lang.org/reference/items/unions.html*.
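As a brief sketch, the hypothetical union `IntOrFloat` below stores either a `u32` or an `f32` in the same four bytes. Writing a field is safe; reading one requires `unsafe` because the compiler doesn’t track which field was written last.

```rust
// Hypothetical union for illustration. `#[repr(C)]` gives it the layout a C
// compiler would use for the equivalent union.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Returns the raw bit pattern of an `f32` by writing one field and reading
// the other.
fn float_bits(value: f32) -> u32 {
    let u = IntOrFloat { f: value };
    // SAFETY: both fields are four bytes, and every bit pattern is a valid `u32`.
    unsafe { u.i }
}

fn main() {
    println!("bits of 1.0: {:#x}", float_bits(1.0));
}
```

In ordinary Rust code, `f32::to_bits` does the same job safely; the union form mainly matters when matching a C definition.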
### When to Use Unsafe Code

Using `unsafe` to use one of the five superpowers just discussed isn’t wrong or
even frowned upon, but it is trickier to get `unsafe` code correct because the
compiler can’t help uphold memory safety. When you have a reason to use
`unsafe` code, you can do so, and having the explicit `unsafe` annotation makes
it easier to track down the source of problems when they occur.
## Advanced Traits
We first covered traits in “Traits: Defining Shared Behavior” on page XX, but
we didn’t discuss the more advanced details. Now that you know more about Rust,
we can get into the nitty-gritty.

### Associated Types
*Associated types* connect a type placeholder with a trait such that the trait
method definitions can use these placeholder types in their signatures. The
implementor of a trait will specify the concrete type to be used instead of the
placeholder type for the particular implementation. That way, we can define a
trait that uses some types without needing to know exactly what those types are
until the trait is implemented.

We’ve described most of the advanced features in this chapter as being rarely
needed. Associated types are somewhere in the middle: they’re used more rarely
than features explained in the rest of the book but more commonly than many of
the other features discussed in this chapter.

One example of a trait with an associated type is the `Iterator` trait that the
standard library provides. The associated type is named `Item` and stands in
for the type of the values the type implementing the `Iterator` trait is
iterating over. The definition of the `Iterator` trait is as shown in Listing
19-12.
```
pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}
```
Listing 19-12: The definition of the `Iterator` trait that has an associated
type `Item`

The type `Item` is a placeholder, and the `next` method’s definition shows that
it will return values of type `Option<Self::Item>`. Implementors of the
`Iterator` trait will specify the concrete type for `Item`, and the `next`
method will return an `Option` containing a value of that concrete type.

Associated types might seem like a similar concept to generics, in that the
latter allow us to define a function without specifying what types it can
handle. To examine the difference between the two concepts, we’ll look at an
implementation of the `Iterator` trait on a type named `Counter` that specifies
the `Item` type is `u32`:

Filename: src/lib.rs
```
impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        --snip--
```
This syntax seems comparable to that of generics. So why not just define the
`Iterator` trait with generics, as shown in Listing 19-13?
```
pub trait Iterator<T> {
    fn next(&mut self) -> Option<T>;
}
```
Listing 19-13: A hypothetical definition of the `Iterator` trait using generics

The difference is that when using generics, as in Listing 19-13, we must
annotate the types in each implementation; because we can also implement
`Iterator<String> for Counter` or any other type, we could have multiple
implementations of `Iterator` for `Counter`. In other words, when a trait has a
generic parameter, it can be implemented for a type multiple times, changing
the concrete types of the generic type parameters each time. When we use the
`next` method on `Counter`, we would have to provide type annotations to
indicate which implementation of `Iterator` we want to use.
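The sketch below illustrates that annotation burden. The trait is named `GenericIterator` here (a hypothetical name, chosen to avoid clashing with the standard library’s `Iterator`), and the `Counter` struct and both method bodies are assumptions made for the example.

```rust
// Hypothetical generic counterpart of `Iterator`, following Listing 19-13.
trait GenericIterator<T> {
    fn next(&mut self) -> Option<T>;
}

struct Counter {
    count: u32,
}

// Because the trait is generic, it can be implemented for `Counter` twice.
impl GenericIterator<u32> for Counter {
    fn next(&mut self) -> Option<u32> {
        self.count += 1;
        Some(self.count)
    }
}

impl GenericIterator<String> for Counter {
    fn next(&mut self) -> Option<String> {
        self.count += 1;
        Some(self.count.to_string())
    }
}

fn main() {
    let mut counter = Counter { count: 0 };

    // A bare `counter.next()` would be ambiguous, so each call has to say
    // which implementation it means.
    let number: Option<u32> = counter.next();
    let text = GenericIterator::<String>::next(&mut counter);

    println!("{number:?} {text:?}");
}
```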
With associated types, we don’t need to annotate types because we can’t
implement a trait on a type multiple times. In Listing 19-12 with the
definition that uses associated types, we can choose what the type of `Item`
will be only once because there can be only one `impl Iterator for Counter`. We
don’t have to specify that we want an iterator of `u32` values everywhere we
call `next` on `Counter`.
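For comparison, here is a complete sketch using the standard library’s `Iterator` trait. The `Counter` struct and its behavior of counting from 1 to 5 are assumptions filled in for this example; the elided body in the snippet above may differ.

```rust
struct Counter {
    count: u32,
}

impl Iterator for Counter {
    // Chosen once; every use of `Counter` as an iterator yields `u32`.
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

fn main() {
    let mut counter = Counter { count: 0 };

    // No annotation is needed to pick an implementation: there is only one.
    let first = counter.next();
    let rest: Vec<u32> = counter.collect();

    println!("{first:?} {rest:?}");
}
```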
Associated types also become part of the trait’s contract: implementors of the
trait must provide a type to stand in for the associated type placeholder.
Associated types often have a name that describes how the type will be used,
and documenting the associated type in the API documentation is a good practice.
### Default Generic Type Parameters and Operator Overloading
When we use generic type parameters, we can specify a default concrete type for
the generic type. This eliminates the need for implementors of the trait to
specify a concrete type if the default type works. You specify a default type
when declaring a generic type with the `<PlaceholderType=ConcreteType>`
syntax.
A great example of a situation where this technique is useful is with *operator
overloading*, in which you customize the behavior of an operator (such as `+`)
in particular situations.
Rust doesn’t allow you to create your own operators or overload arbitrary
operators. But you can overload the operations and corresponding traits listed
in `std::ops` by implementing the traits associated with the operator. For
example, in Listing 19-14 we overload the `+` operator to add two `Point`
instances together. We do this by implementing the `Add` trait on a `Point`
struct.
Filename: src/main.rs
```
use std::ops::Add;

#[derive(Debug, Copy, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Add for Point {
    type Output = Point;

    fn add(self, other: Point) -> Point {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    assert_eq!(
        Point { x: 1, y: 0 } + Point { x: 2, y: 3 },
        Point { x: 3, y: 3 }
    );
}
```
Listing 19-14: Implementing the `Add` trait to overload the `+` operator for
`Point` instances
The `add` method adds the `x` values of two `Point` instances and the `y`
values of two `Point` instances to create a new `Point`. The `Add` trait has an
associated type named `Output` that determines the type returned from the `add`
method.
The default generic type in this code is within the `Add` trait. Here is its
definition:
```
trait Add<Rhs=Self> {
    type Output;

    fn add(self, rhs: Rhs) -> Self::Output;
}
```
This code should look generally familiar: a trait with one method and an
associated type. The new part is `Rhs=Self`: this syntax is called *default
type parameters*. The `Rhs` generic type parameter (short for “right-hand
side”) defines the type of the `rhs` parameter in the `add` method. If we don’t
specify a concrete type for `Rhs` when we implement the `Add` trait, the type
of `Rhs` will default to `Self`, which will be the type we’re implementing
`Add` on.
When we implemented `Add` for `Point`, we used the default for `Rhs` because we
wanted to add two `Point` instances. Let’s look at an example of implementing
the `Add` trait where we want to customize the `Rhs` type rather than using the
default.
We have two structs, `Millimeters` and `Meters`, holding values in different
units. This thin wrapping of an existing type in another struct is known as the
*newtype pattern*, which we describe in more detail in “Using the Newtype
Pattern to Implement External Traits on External Types” on page XX. We want to
add values in millimeters to values in meters and have the implementation of
`Add` do the conversion correctly. We can implement `Add` for `Millimeters`
with `Meters` as the `Rhs`, as shown in Listing 19-15.
Filename: src/lib.rs
```
use std::ops::Add;

struct Millimeters(u32);
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + (other.0 * 1000))
    }
}
```
Listing 19-15: Implementing the `Add` trait on `Millimeters` to add
`Millimeters` and `Meters`
To add `Millimeters` and `Meters`, we specify `impl Add<Meters>` to set the
value of the `Rhs` type parameter instead of using the default of `Self`.
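Because `Rhs` is a generic parameter, one type can implement `Add` more than once. The following sketch puts the default form and the `Add<Meters>` form side by side; the extra derives and the `Millimeters + Millimeters` implementation are additions for illustration and are not part of Listing 19-15.

```rust
use std::ops::Add;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Millimeters(u32);

#[derive(Debug, PartialEq, Clone, Copy)]
struct Meters(u32);

// Uses the default `Rhs = Self`: Millimeters + Millimeters.
impl Add for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Millimeters) -> Millimeters {
        Millimeters(self.0 + other.0)
    }
}

// Overrides the default: Millimeters + Meters, converting meters first.
impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + (other.0 * 1000))
    }
}

fn main() {
    let same_units = Millimeters(500) + Millimeters(250);
    let mixed_units = Millimeters(500) + Meters(2);
    println!("{same_units:?} {mixed_units:?}");
}
```

The compiler picks the implementation from the type of the right-hand operand, so both expressions use the same `+` operator.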
You’ll use default type parameters in two main ways:

1. To extend a type without breaking existing code
2. To allow customization in specific cases most users won’t need

The standard library’s `Add` trait is an example of the second purpose:
usually, you’ll add two like types, but the `Add` trait provides the ability to
customize beyond that. Using a default type parameter in the `Add` trait
definition means you don’t have to specify the extra parameter most of the
time. In other words, a bit of implementation boilerplate isn’t needed, making
it easier to use the trait.
The first purpose is similar to the second but in reverse: if you want to add a
type parameter to an existing trait, you can give it a default to allow
extension of the functionality of the trait without breaking the existing
implementation code.
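As a rough sketch of the first purpose, suppose a trait originally had no type parameter and later gained one. Every name below (`Describe`, `Plain`, `Metric`, `Length`, `show`, `show_metric`) is hypothetical; the point is only that the older `impl Describe for Length` keeps compiling because the new parameter has a default.

```rust
// Marker types used as the type parameter; both are hypothetical.
struct Plain;
struct Metric;

// Imagine `Describe` first shipped without `Unit`. Giving the new
// parameter a default means older implementations need no changes.
trait Describe<Unit = Plain> {
    fn describe(&self) -> String;
}

struct Length(u32);

// Written as if the parameter did not exist: this means `Describe<Plain>`.
impl Describe for Length {
    fn describe(&self) -> String {
        format!("{}", self.0)
    }
}

// Newer code can opt in to the extended form.
impl Describe<Metric> for Length {
    fn describe(&self) -> String {
        format!("{} mm", self.0)
    }
}

// In a bound, `Describe` also means `Describe<Plain>`.
fn show<T: Describe>(value: &T) -> String {
    value.describe()
}

fn show_metric<T: Describe<Metric>>(value: &T) -> String {
    value.describe()
}

fn main() {
    let length = Length(5);
    println!("{}", show(&length));
    println!("{}", show_metric(&length));
}
```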
### Disambiguating Between Methods with the Same Name
Nothing in Rust prevents a trait from having a method with the same name as
another trait’s method, nor does Rust prevent you from implementing both traits
on one type. It’s also possible to implement a method directly on the type with
the same name as methods from traits.
When calling methods with the same name, you’ll need to tell Rust which one you
want to use. Consider the code in Listing 19-16 where we’ve defined two traits,
`Pilot` and `Wizard`, that both have a method called `fly`. We then implement
both traits on a type `Human` that already has a method named `fly` implemented
on it. Each `fly` method does something different.
Filename: src/main.rs
```
trait Pilot {
    fn fly(&self);
}

trait Wizard {
    fn fly(&self);
}

struct Human;

impl Pilot for Human {
    fn fly(&self) {
        println!("This is your captain speaking.");
    }
}

impl Wizard for Human {
    fn fly(&self) {
        println!("Up!");
    }
}

impl Human {
    fn fly(&self) {
        println!("*waving arms furiously*");
    }
}
```
Listing 19-16: Two traits are defined to have a `fly` method and are
implemented on the `Human` type, and a `fly` method is implemented on `Human`
directly.
When we call `fly` on an instance of `Human`, the compiler defaults to calling
the method that is directly implemented on the type, as shown in Listing 19-17.
Filename: src/main.rs
```
fn main() {
    let person = Human;
    person.fly();
}
```
Listing 19-17: Calling `fly` on an instance of `Human`
Running this code will print `*waving arms furiously*`, showing that Rust
called the `fly` method implemented on `Human` directly.
To call the `fly` methods from either the `Pilot` trait or the `Wizard` trait,
we need to use more explicit syntax to specify which `fly` method we mean.
Listing 19-18 demonstrates this syntax.
Filename: src/main.rs
```
fn main() {
    let person = Human;
    Pilot::fly(&person);
    Wizard::fly(&person);
    person.fly();
}
```
Listing 19-18: Specifying which trait’s `fly` method we want to call
Specifying the trait name before the method name clarifies to Rust which
implementation of `fly` we want to call. We could also write
`Human::fly(&person)`, which is equivalent to the `person.fly()` that we used
in Listing 19-18, but this is a bit longer to write if we don’t need to
disambiguate.
Running this code prints the following:
```
This is your captain speaking.
Up!
*waving arms furiously*
```
Because the `fly` method takes a `self` parameter, if we had two *types* that
both implement one *trait*, Rust could figure out which implementation of a
trait to use based on the type of `self`.
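To illustrate, here is a small sketch with two types implementing one trait. The `Robot` type and its message are hypothetical, and `fly` returns a string slice instead of printing so the result is easy to inspect.

```rust
trait Pilot {
    fn fly(&self) -> &'static str;
}

struct Human;
struct Robot; // hypothetical second implementor

impl Pilot for Human {
    fn fly(&self) -> &'static str {
        "This is your captain speaking."
    }
}

impl Pilot for Robot {
    fn fly(&self) -> &'static str {
        "Autopilot engaged."
    }
}

fn main() {
    // The same path, `Pilot::fly`, resolves to a different implementation
    // depending on the type of the argument passed as `self`.
    println!("{}", Pilot::fly(&Human));
    println!("{}", Pilot::fly(&Robot));
}
```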
However, associated functions that are not methods don’t have a `self`
parameter. When there are multiple types or traits that define non-method
functions with the same function name, Rust doesn’t always know which type you
mean unless you use fully qualified syntax. For example, in Listing 19-19 we
create a trait for an animal shelter that wants to name all baby dogs Spot. We
make an `Animal` trait with an associated non-method function `baby_name`. The
`Animal` trait is implemented for the struct `Dog`, on which we also provide an
associated non-method function `baby_name` directly.
Filename: src/main.rs
```
trait Animal {
    fn baby_name() -> String;
}

struct Dog;

impl Dog {
    fn baby_name() -> String {
        String::from("Spot")
    }
}

impl Animal for Dog {
    fn baby_name() -> String {
        String::from("puppy")
    }
}

fn main() {
    println!("A baby dog is called a {}", Dog::baby_name());
}
```
Listing 19-19: A trait with an associated function and a type with an
associated function of the same name that also implements the trait
We implement the code for naming all puppies Spot in the `baby_name` associated
function that is defined on `Dog`. The `Dog` type also implements the trait
`Animal`, which describes characteristics that all animals have. Baby dogs are
called puppies, and that is expressed in the implementation of the `Animal`
trait on `Dog` in the `baby_name` function associated with the `Animal` trait.
In `main`, we call the `Dog::baby_name` function, which calls the associated
function defined on `Dog` directly. This code prints the following:
```
A baby dog is called a Spot
```
This output isn’t what we wanted. We want to call the `baby_name` function that
is part of the `Animal` trait that we implemented on `Dog` so the code prints
`A baby dog is called a puppy`. The technique of specifying the trait name that
we used in Listing 19-18 doesn’t help here; if we change `main` to the code in
Listing 19-20, we’ll get a compilation error.
Filename: src/main.rs
```
fn main() {
    println!("A baby dog is called a {}", Animal::baby_name());
}
```
Listing 19-20: Attempting to call the `baby_name` function from the `Animal`
trait, but Rust doesn’t know which implementation to use
Because `Animal::baby_name` doesn’t have a `self` parameter, and there could be
other types that implement the `Animal` trait, Rust can’t figure out which
implementation of `Animal::baby_name` we want. We’ll get this compiler error:
```
error[E0283]: type annotations needed
  --> src/main.rs:20:43
   |
20 |     println!("A baby dog is called a {}", Animal::baby_name());
   |                                           ^^^^^^^^^^^^^^^^^ cannot infer type
   |
   = note: cannot satisfy `_: Animal`
```
To disambiguate and tell Rust that we want to use the implementation of
`Animal` for `Dog` as opposed to the implementation of `Animal` for some other
type, we need to use fully qualified syntax. Listing 19-21 demonstrates how to
use fully qualified syntax.
Filename: src/main.rs
```
fn main() {
    println!(
        "A baby dog is called a {}",
        <Dog as Animal>::baby_name()
    );
}
```
Listing 19-21: Using fully qualified syntax to specify that we want to call the
`baby_name` function from the `Animal` trait as implemented on `Dog`
We’re providing Rust with a type annotation within the angle brackets, which
indicates we want to call the `baby_name` method from the `Animal` trait as
implemented on `Dog` by saying that we want to treat the `Dog` type as an
`Animal` for this function call. This code will now print what we want:
```
A baby dog is called a puppy
```
In general, fully qualified syntax is defined as follows:
```
<Type as Trait>::function(receiver_if_method, next_arg, ...);
```
For associated functions that aren’t methods, there would not be a `receiver`:
there would only be the list of other arguments. You could use fully qualified
syntax everywhere that you call functions or methods. However, you’re allowed
to omit any part of this syntax that Rust can figure out from other information
in the program. You only need to use this more verbose syntax in cases where
there are multiple implementations that use the same name and Rust needs help
to identify which implementation you want to call.
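For a method, the receiver is passed as the first argument of the fully qualified form. The sketch below revisits the `Pilot`, `Wizard`, and `Human` example; as an assumption for illustration, `fly` returns a string slice here rather than printing, so the different call forms can be compared directly.

```rust
trait Pilot {
    fn fly(&self) -> &'static str;
}

trait Wizard {
    fn fly(&self) -> &'static str;
}

struct Human;

impl Pilot for Human {
    fn fly(&self) -> &'static str {
        "This is your captain speaking."
    }
}

impl Wizard for Human {
    fn fly(&self) -> &'static str {
        "Up!"
    }
}

impl Human {
    fn fly(&self) -> &'static str {
        "*waving arms furiously*"
    }
}

fn main() {
    let person = Human;

    // Fully qualified: `<Type as Trait>::function(receiver)`.
    println!("{}", <Human as Pilot>::fly(&person));
    println!("{}", <Human as Wizard>::fly(&person));

    // Shorter forms work when Rust can infer the omitted parts.
    println!("{}", Wizard::fly(&person));
    println!("{}", person.fly());
}
```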
### Using Supertraits
Sometimes you might write a trait definition that depends on another trait: for
a type to implement the first trait, you want to require that type to also
implement the second trait. You would do this so that your trait definition can
make use of the associated items of the second trait. The trait your trait
definition is relying on is called a *supertrait* of your trait.
For example, let’s say we want to make an `OutlinePrint` trait with an
`outline_print` method that will print a given value formatted so that it’s
framed in asterisks. That is, given a `Point` struct that implements the
standard library trait `Display` to result in `(x, y)`, when we call
`outline_print` on a `Point` instance that has `1` for `x` and `3` for `y`, it
should print the following:
```
**********
*        *
* (1, 3) *
*        *
**********
```
In the implementation of the `outline_print` method, we want to use the
`Display` trait’s functionality. Therefore, we need to specify that the
`OutlinePrint` trait will work only for types that also implement `Display` and
provide the functionality that `OutlinePrint` needs. We can do that in the
trait definition by specifying `OutlinePrint: Display`. This technique is
similar to adding a trait bound to the trait. Listing 19-22 shows an
implementation of the `OutlinePrint` trait.
Filename: src/main.rs
```
use std::fmt;

trait OutlinePrint: fmt::Display {
    fn outline_print(&self) {
        let output = self.to_string();
        let len = output.len();
        println!("{}", "*".repeat(len + 4));
        println!("*{}*", " ".repeat(len + 2));
        println!("* {} *", output);
        println!("*{}*", " ".repeat(len + 2));
        println!("{}", "*".repeat(len + 4));
    }
}
```
Listing 19-22: Implementing the `OutlinePrint` trait that requires the
functionality from `Display`
Because we’ve specified that `OutlinePrint` requires the `Display` trait, we
can use the `to_string` function that is automatically implemented for any type
that implements `Display`. If we tried to use `to_string` without adding a
colon and specifying the `Display` trait after the trait name, we’d get an
error saying that no method named `to_string` was found for the type `&Self` in
the current scope.
Let’s see what happens when we try to implement `OutlinePrint` on a type that
doesn’t implement `Display`, such as the `Point` struct:
Filename: src/main.rs
```
struct Point {
    x: i32,
    y: i32,
}

impl OutlinePrint for Point {}
```
We get an error saying that `Display` is required but not implemented:
```
error[E0277]: `Point` doesn't implement `std::fmt::Display`
  --> src/main.rs:20:6
   |
20 | impl OutlinePrint for Point {}
   |      ^^^^^^^^^^^^ `Point` cannot be formatted with the default formatter
   |
   = help: the trait `std::fmt::Display` is not implemented for `Point`
   = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
note: required by a bound in `OutlinePrint`
  --> src/main.rs:3:21
   |
3  | trait OutlinePrint: fmt::Display {
   |                     ^^^^^^^^^^^^ required by this bound in `OutlinePrint`
```
To fix this, we implement `Display` on `Point` and satisfy the constraint that
`OutlinePrint` requires, like so:
Filename: src/main.rs
```
use std::fmt;

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}
```
Then, implementing the `OutlinePrint` trait on `Point` will compile
successfully, and we can call `outline_print` on a `Point` instance to display
it within an outline of asterisks.
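Putting the pieces together, a complete program might look like the following sketch. The `outline_string` helper is a hypothetical addition that builds the framed text as a `String` so it can be inspected; `outline_print` then prints it, producing the same five lines as the version in Listing 19-22.

```rust
use std::fmt;

trait OutlinePrint: fmt::Display {
    // Hypothetical helper (not in the listing): returns the framed text.
    fn outline_string(&self) -> String {
        let output = self.to_string();
        let len = output.len();
        let border = "*".repeat(len + 4);
        let padding = format!("*{}*", " ".repeat(len + 2));
        format!("{border}\n{padding}\n* {output} *\n{padding}\n{border}")
    }

    fn outline_print(&self) {
        println!("{}", self.outline_string());
    }
}

struct Point {
    x: i32,
    y: i32,
}

// The supertrait requirement: `Point` must implement `Display` first.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// The default method bodies are enough, so the impl block is empty.
impl OutlinePrint for Point {}

fn main() {
    let point = Point { x: 1, y: 3 };
    point.outline_print();
}
```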
### Using the Newtype Pattern to Implement External Traits
In “Implementing a Trait on a Type” on page XX, we mentioned the orphan rule
that states we’re only allowed to implement a trait on a type if either the
trait or the type, or both, are local to our crate. It’s possible to get around
this restriction using the *newtype pattern*, which involves creating a new
type in a tuple struct. (We covered tuple structs in “Using Tuple Structs
Without Named Fields to Create Different Types” on page XX.) The tuple struct
will have one field and be a thin wrapper around the type for which we want to
implement a trait. Then the wrapper type is local to our crate, and we can
implement the trait on the wrapper. *Newtype* is a term that originates from
the Haskell programming language. There is no runtime performance penalty for
using this pattern, and the wrapper type is elided at compile time.
As an example, let’s say we want to implement `Display` on `Vec<T>`, which the
orphan rule prevents us from doing directly because the `Display` trait and the
`Vec<T>` type are defined outside our crate. We can make a `Wrapper` struct
that holds an instance of `Vec<T>`; then we can implement `Display` on
`Wrapper` and use the `Vec<T>` value, as shown in Listing 19-23.
Filename: src/main.rs
```
use std::fmt;

struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![
        String::from("hello"),
        String::from("world"),
    ]);
    println!("w = {w}");
}
```
Listing 19-23: Creating a `Wrapper` type around `Vec<String>` to implement
`Display`
The implementation of `Display` uses `self.0` to access the inner `Vec<T>`
because `Wrapper` is a tuple struct and `Vec<T>` is the item at index 0 in the
tuple. Then we can use the functionality of the `Display` trait on `Wrapper`.
The downside of using this technique is that `Wrapper` is a new type, so it
doesn’t have the methods of the value it’s holding. We would have to implement
all the methods of `Vec<T>` directly on `Wrapper` such that the methods
delegate to `self.0`, which would allow us to treat `Wrapper` exactly like a
`Vec<T>`. If we wanted the new type to have every method the inner type has,
implementing the `Deref` trait on the `Wrapper` to return the inner type would
be a solution (we discussed implementing the `Deref` trait in “Treating Smart
Pointers Like Regular References with Deref” on page XX). If we didn’t want the
`Wrapper` type to have all the methods of the inner type (for example, to
restrict the `Wrapper` type’s behavior), we would have to implement just the
methods we do want manually.
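A minimal sketch of the `Deref` approach, assuming we are happy to expose every read-only method of the inner `Vec<String>`:

```rust
use std::fmt;
use std::ops::Deref;

struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

// Dereferencing a `Wrapper` yields the inner vector, so method calls on
// a `Wrapper` can fall through to `Vec<String>` via deref coercion.
impl Deref for Wrapper {
    type Target = Vec<String>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let w = Wrapper(vec![
        String::from("hello"),
        String::from("world"),
    ]);

    // `len` and `iter` are defined on the vector (or its slice), not on `Wrapper`.
    println!("{} items, first = {:?}", w.len(), w.iter().next());
    println!("w = {w}");
}
```

`Deref` alone only provides shared access; methods that mutate the vector, such as `push`, would additionally need `DerefMut`.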
This newtype pattern is also useful even when traits are not involved. Let’s
switch focus and look at some advanced ways to interact with Rust’s type system.
## Advanced Types
The Rust type system has some features that we’ve so far mentioned but haven’t
yet discussed. We’ll start by discussing newtypes in general as we examine why
newtypes are useful as types. Then we’ll move on to type aliases, a feature
similar to newtypes but with slightly different semantics. We’ll also discuss
the `!` type and dynamically sized types.
### Using the Newtype Pattern for Type Safety and Abstraction
> Note: This section assumes you’ve read the earlier section “Using the Newtype
Pattern to Implement External Traits” on page XX.

The newtype pattern is also useful for tasks beyond those we’ve discussed so
far, including statically enforcing that values are never confused and
indicating the units of a value. You saw an example of using newtypes to
indicate units in Listing 19-15: recall that the `Millimeters` and `Meters`
structs wrapped `u32` values in a newtype. If we wrote a function with a
parameter of type `Millimeters`, we wouldn’t be able to compile a program that
accidentally tried to call that function with a value of type `Meters` or a
plain `u32`.
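As a sketch of that guarantee, consider a hypothetical `cut_to_length` function that accepts only `Millimeters`. The function and the `From<Meters>` conversion are assumptions added for illustration.

```rust
struct Millimeters(u32);
struct Meters(u32);

// Hypothetical function: accepts lengths in millimeters only.
fn cut_to_length(length: Millimeters) -> String {
    format!("cutting at {} mm", length.0)
}

// An explicit conversion keeps the unit change visible at the call site.
impl From<Meters> for Millimeters {
    fn from(meters: Meters) -> Self {
        Millimeters(meters.0 * 1000)
    }
}

fn main() {
    println!("{}", cut_to_length(Millimeters(1500)));

    // Neither of these lines would compile (mismatched types):
    // cut_to_length(Meters(2));
    // cut_to_length(1500);

    println!("{}", cut_to_length(Meters(2).into()));
}
```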
We can also use the newtype pattern to abstract away some implementation
details of a type: the new type can expose a public API that is different from
the API of the private inner type.
Newtypes can also hide internal implementation. For example, we could provide a
`People` type to wrap a `HashMap<i32, String>` that stores a person’s ID
associated with their name. Code using `People` would only interact with the
public API we provide, such as a method to add a name string to the `People`
collection; that code wouldn’t need to know that we assign an `i32` ID to names
internally. The newtype pattern is a lightweight way to achieve encapsulation
to hide implementation details, which we discussed in “Encapsulation That Hides
Implementation Details” on page XX.
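One possible shape for such a `People` type is sketched below. The method names (`new`, `add`, `len`, `contains`) and the ID scheme are assumptions for illustration; the text only says that callers add names and never see the `i32` IDs.

```rust
use std::collections::HashMap;

// The inner map is private, so callers cannot depend on it.
pub struct People(HashMap<i32, String>);

impl People {
    pub fn new() -> Self {
        People(HashMap::new())
    }

    // Assigns the next ID internally; the caller only supplies a name.
    // Using the current length as the ID works here because this sketch
    // never removes entries.
    pub fn add(&mut self, name: &str) {
        let id = self.0.len() as i32;
        self.0.insert(id, name.to_string());
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn contains(&self, name: &str) -> bool {
        self.0.values().any(|stored| stored.as_str() == name)
    }
}

fn main() {
    let mut people = People::new();
    people.add("Ferris");
    people.add("Corro");
    println!("{} people, Ferris known: {}", people.len(), people.contains("Ferris"));
}
```

Because the map is hidden, the ID type or the storage could change later without affecting code that uses `People`.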
### Creating Type Synonyms with Type Aliases
Rust provides the ability to declare a *type alias* to give an existing type
another name. For this we use the `type` keyword. For example, we can create
the alias `Kilometers` to `i32` like so:
```
type Kilometers = i32;
```
Now the alias `Kilometers` is a *synonym* for `i32`; unlike the `Millimeters`
and `Meters` types we created in Listing 19-15, `Kilometers` is not a separate,
new type. Values that have the type `Kilometers` will be treated the same as
values of type `i32`:
```
type Kilometers = i32;

let x: i32 = 5;
let y: Kilometers = 5;

println!("x + y = {}", x + y);
```
Because `Kilometers` and `i32` are the same type, we can add values of both
types and we can pass `Kilometers` values to functions that take `i32`
parameters. However, using this method, we don’t get the type-checking benefits
that we get from the newtype pattern discussed earlier. In other words, if we
mix up `Kilometers` and `i32` values somewhere, the compiler will not give us
an error.
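A short sketch makes the point concrete. The `double` function is hypothetical and takes a plain `i32`, yet it accepts a `Kilometers` value without complaint.

```rust
type Kilometers = i32;

// Hypothetical function that knows nothing about `Kilometers`.
fn double(n: i32) -> i32 {
    n * 2
}

fn main() {
    let distance: Kilometers = 5;
    let count: i32 = 3;

    // Both lines compile: the alias and `i32` are the same type, so the
    // compiler cannot catch an accidental mix-up of the two.
    println!("{}", double(distance));
    println!("{}", distance + count);
}
```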
The main use case for type synonyms is to reduce repetition. For example, we
might have a lengthy type like this:
```
Box<dyn Fn() + Send + 'static>
```
Writing this lengthy type in function signatures and as type annotations all
over the code can be tiresome and error prone. Imagine having a project full of
code like that in Listing 19-24.
```
let f: Box<dyn Fn() + Send + 'static> = Box::new(|| {
    println!("hi");
});

fn takes_long_type(f: Box<dyn Fn() + Send + 'static>) {
    --snip--
}

fn returns_long_type() -> Box<dyn Fn() + Send + 'static> {
    --snip--
}
```
Listing 19-24: Using a long type in many places
A type alias makes this code more manageable by reducing the repetition. In
Listing 19-25, we’ve introduced an alias named `Thunk` for the verbose type and
can replace all uses of the type with the shorter alias `Thunk`.
```
type Thunk = Box<dyn Fn() + Send + 'static>;

let f: Thunk = Box::new(|| println!("hi"));

fn takes_long_type(f: Thunk) {
    --snip--
}

fn returns_long_type() -> Thunk {
    --snip--
}
```
Listing 19-25: Introducing a type alias `Thunk` to reduce repetition
This code is much easier to read and write! Choosing a meaningful name for a
type alias can help communicate your intent as well (*thunk* is a word for code
to be evaluated at a later time, so it’s an appropriate name for a closure that
gets stored).
Type aliases are also commonly used with the `Result<T, E>` type for reducing
repetition. Consider the `std::io` module in the standard library. I/O
operations often return a `Result<T, E>` to handle situations when operations
fail to work. This library has a `std::io::Error` struct that represents all
possible I/O errors. Many of the functions in `std::io` will be returning
`Result<T, E>` where the `E` is `std::io::Error`, such as these functions in
the `Write` trait:
```
use std::fmt;
use std::io::Error;

pub trait Write {
    fn write(&mut self, buf: &[u8]) -> Result<usize, Error>;
    fn flush(&mut self) -> Result<(), Error>;

    fn write_all(&mut self, buf: &[u8]) -> Result<(), Error>;
    fn write_fmt(
        &mut self,
        fmt: fmt::Arguments,
    ) -> Result<(), Error>;
}
```
The `Result<..., Error>` is repeated a lot. As such, `std::io` has this type
alias declaration:
```
type Result<T> = std::result::Result<T, std::io::Error>;
```
Because this declaration is in the `std::io` module, we can use the fully
qualified alias `std::io::Result<T>`; that is, a `Result<T, E>` with the `E`
filled in as `std::io::Error`. The `Write` trait function signatures end up
looking like this:
```
pub trait Write {
    fn write(&mut self, buf: &[u8]) -> Result<usize>;
    fn flush(&mut self) -> Result<()>;

    fn write_all(&mut self, buf: &[u8]) -> Result<()>;
    fn write_fmt(&mut self, fmt: fmt::Arguments) -> Result<()>;
}
```
The type alias helps in two ways: it makes code easier to write *and* it gives
us a consistent interface across all of `std::io`. Because it’s an alias, it’s
just another `Result<T, E>`, which means we can use any methods that work on
`Result<T, E>` with it, as well as special syntax like the `?` operator.
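For instance, a function of our own can return `io::Result<T>` and use `?` on calls to the `Write` trait. The `write_greeting` helper below is hypothetical; it writes to any `Write` implementor, and the example uses an in-memory `Vec<u8>` so no files are touched.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes a greeting and returns the byte count.
// `io::Result<usize>` is shorthand for `Result<usize, io::Error>`.
fn write_greeting<W: Write>(out: &mut W, name: &str) -> io::Result<usize> {
    let message = format!("Hello, {name}!\n");
    // `?` works on the alias exactly as it does on `Result<T, E>`.
    out.write_all(message.as_bytes())?;
    out.flush()?;
    Ok(message.len())
}

fn main() {
    let mut buffer: Vec<u8> = Vec::new();

    // Ordinary `Result` methods such as `map` are available too.
    let doubled = write_greeting(&mut buffer, "Ferris").map(|n| n * 2);
    println!("{doubled:?}");
    println!("{}", String::from_utf8_lossy(&buffer));
}
```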
### The Never Type That Never Returns
Rust has a special type named `!` that’s known in type theory lingo as the
*empty type* because it has no values. We prefer to call it the *never type*
because it stands in the place of the return type when a function will never
return. Here is an example:
```
fn bar() -> ! {
    --snip--
}
```
This code is read as “the function `bar` returns never.” Functions that return
never are called *diverging functions*. We can’t create values of the type `!`,
so `bar` can never possibly return.
But what use is a type you can never create values for? Recall the code from
Listing 2-5, part of the number-guessing game; we’ve reproduced a bit of it
here in Listing 19-26.
```
let guess: u32 = match guess.trim().parse() {
    Ok(num) => num,
    Err(_) => continue,
};
```
Listing 19-26: A `match` with an arm that ends in `continue`
At the time, we skipped over some details in this code. In “The match Control
Flow Construct” on page XX, we discussed that `match` arms must all return the
same type. So, for example, the following code doesn’t work:
```
let guess = match guess.trim().parse() {
    Ok(_) => 5,
    Err(_) => "hello",
};
```
The type of `guess` in this code would have to be an integer *and* a string,
and Rust requires that `guess` have only one type. So what does `continue`
return? How were we allowed to return a `u32` from one arm and have another arm
that ends with `continue` in Listing 19-26?
As you might have guessed, `continue` has a `!` value. That is, when Rust
computes the type of `guess`, it looks at both match arms, the former with a
value of `u32` and the latter with a `!` value. Because `!` can never have a
value, Rust decides that the type of `guess` is `u32`.
The formal way of describing this behavior is that expressions of type `!` can
be coerced into any other type. We’re allowed to end this `match` arm with
`continue` because `continue` doesn’t return a value; instead, it moves control
back to the top of the loop, so in the `Err` case, we never assign a value to
`guess`.
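The same pattern can be tried outside the guessing game. In this sketch, the hypothetical `parse_all` function keeps only the inputs that parse as `u32`; the `continue` arm has type `!`, so the `match` as a whole still has type `u32`.

```rust
// Hypothetical helper: collects the inputs that parse as `u32`.
fn parse_all(inputs: &[&str]) -> Vec<u32> {
    let mut numbers = Vec::new();

    for input in inputs {
        let number: u32 = match input.trim().parse() {
            Ok(num) => num,
            // `continue` never produces a value, so this arm does not
            // conflict with the `u32` produced by the arm above.
            Err(_) => continue,
        };
        numbers.push(number);
    }

    numbers
}

fn main() {
    println!("{:?}", parse_all(&["1", "not a number", " 42 "]));
}
```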
The never type is useful with the `panic!` macro as well. Recall the `unwrap`
function that we call on `Option<T>` values to produce a value or panic with
this definition:
```
impl<T> Option<T> {
    pub fn unwrap(self) -> T {
        match self {
            Some(val) => val,
            None => panic!(
                "called `Option::unwrap()` on a `None` value"
            ),
        }
    }
}
```
In this code, the same thing happens as in the `match` in Listing 19-26: Rust
sees that `val` has the type `T` and `panic!` has the type `!`, so the result
of the overall `match` expression is `T`. This code works because `panic!`
doesn’t produce a value; it ends the program. In the `None` case, we won’t be
returning a value from `unwrap`, so this code is valid.
One final expression that has the type `!` is a `loop`:
```
print!("forever ");
loop {
    print!("and ever ");
}
```
Here, the loop never ends, so `!` is the value of the expression. However, this
wouldn’t be true if we included a `break`, because the loop would terminate
when it got to the `break`.
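By way of contrast, here is a sketch of a `loop` that does contain a `break`. The `first_power_of_two_above` function is a hypothetical example; because the loop exits with `break n`, the loop expression has type `u32` instead of `!`.

```rust
// Hypothetical example: returns the smallest power of two greater than `limit`.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut n = 1;

    // The value given to `break` becomes the value of the `loop`
    // expression, which is also the function's return value here.
    loop {
        if n > limit {
            break n;
        }
        n *= 2;
    }
}

fn main() {
    println!("{}", first_power_of_two_above(10));
}
```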
### Dynamically Sized Types and the Sized Trait
Rust needs to know certain details about its types, such as how much space to
allocate for a value of a particular type. This leaves one corner of its type
system a little confusing at first: the concept of *dynamically sized types*.
Sometimes referred to as *DSTs* or *unsized types*, these types let us write
code using values whose size we can know only at runtime.
Let’s dig into the details of a dynamically sized type called `str`, which
we’ve been using throughout the book. That’s right, not `&str`, but `str` on
its own, is a DST. We can’t know how long the string is until runtime, meaning
we can’t create a variable of type `str`, nor can we take an argument of type
`str`. Consider the following code, which does not work:
```
let s1: str = "Hello there!";
let s2: str = "How's it going?";
```
Rust needs to know how much memory to allocate for any value of a particular
type, and all values of a type must use the same amount of memory. If Rust
allowed us to write this code, these two `str` values would need to take up the
same amount of space. But they have different lengths: `s1` needs 12 bytes of
storage and `s2` needs 15. This is why it's not possible to create a variable
holding a dynamically sized type.

So what do we do? In this case, you already know the answer: we make the type
of `s1` and `s2` `&str` rather than `str`. Recall from “String Slices” on
page XX that the slice data structure just stores the starting position and the
length of the slice. So, although a `&T` is a single value that stores the
memory address of where the `T` is located, a `&str` is *two* values: the
address of the `str` and its length. As such, we can know the size of a `&str`
value at compile time: it's twice the size of a `usize`. That is, we always
know the size of a `&str`, no matter how long the string it refers to is. In
general, this is the way in which dynamically sized types are used in Rust:
they have an extra bit of metadata that stores the size of the dynamic
information. The golden rule of dynamically sized types is that we must always
put values of dynamically sized types behind a pointer of some kind.
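The size claims above can be checked with the standard library's `std::mem::size_of` and `std::mem::size_of_val`. This is only a sketch; the helper names `str_ref_size` and `str_data_size` are made up for the illustration:

```rust
use std::mem::{size_of, size_of_val};

// Size of the `&str` value itself: an address plus a length.
fn str_ref_size() -> usize {
    size_of::<&str>()
}

// Size in bytes of the `str` data that a string slice refers to.
// This is only known at runtime, from the length stored in the `&str`.
fn str_data_size(s: &str) -> usize {
    size_of_val(s)
}
```

On any target, `str_ref_size()` should equal `2 * size_of::<usize>()`, while `str_data_size` returns 12 for `"Hello there!"` and 15 for `"How's it going?"`.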
We can combine `str` with all kinds of pointers: for example, `Box<str>` or
`Rc<str>`. In fact, you've seen this before but with a different dynamically
sized type: traits. Every trait is a dynamically sized type we can refer to by
using the name of the trait. In “Using Trait Objects That Allow for Values of
Different Types” on page XX, we mentioned that to use traits as trait objects,
we must put them behind a pointer, such as `&dyn Trait` or `Box<dyn Trait>`
(`Rc<dyn Trait>` would work too).
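To make the pointer combinations concrete, here is a small sketch that builds a `Box<str>` and an `Rc<str>` using the standard `From<&str>` conversions. The function names `shout` and `boxed_and_shared` are hypothetical and exist only for this example:

```rust
use std::rc::Rc;

// Hypothetical helper that accepts any `&str`, wherever the `str` lives.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn boxed_and_shared() -> (Box<str>, Rc<str>) {
    // `str` itself is unsized, but `Box<str>` and `Rc<str>` are pointers
    // with a known size, so they can be stored in variables and returned.
    let boxed: Box<str> = Box::from("hello");
    let shared: Rc<str> = Rc::from("there");
    (boxed, shared)
}
```

Both pointer types dereference to `str`, so `shout(&boxed)` and `shout(&shared)` work through deref coercion.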
To work with DSTs, Rust provides the `Sized` trait to determine whether or not
a type's size is known at compile time. This trait is automatically implemented
for everything whose size is known at compile time. In addition, Rust
implicitly adds a bound on `Sized` to every generic function. That is, a
generic function definition like this:
```
fn generic<T>(t: T) {
    --snip--
}
```
is actually treated as though we had written this:
```
fn generic<T: Sized>(t: T) {
    --snip--
}
```
By default, generic functions will work only on types that have a known size at
compile time. However, you can use the following special syntax to relax this
restriction:
```
fn generic<T: ?Sized>(t: &T) {
    --snip--
}
```
A trait bound on `?Sized` means “`T` may or may not be `Sized`,” and this
notation overrides the default that generic types must have a known size at
compile time. The `?Trait` syntax with this meaning is only available for
`Sized`, not any other traits.

Also note that we switched the type of the `t` parameter from `T` to `&T`.
Because the type might not be `Sized`, we need to use it behind some kind of
pointer. In this case, we've chosen a reference.
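A short sketch of what the relaxed bound allows in practice: the hypothetical function `debug_string` below accepts both sized and unsized types, as long as they arrive behind a reference. The extra `Debug` bound is our assumption, added so the function has something to do:

```rust
use std::fmt::Debug;

// Hypothetical function: `T: ?Sized` lets callers pass unsized types such
// as `str` or `[i32]`, as long as they do so behind a reference.
fn debug_string<T: ?Sized + Debug>(t: &T) -> String {
    format!("{:?}", t)
}
```

Calling `debug_string("hi")` infers `T` as `str`, and calling it with a slice such as `&[1, 2, 3][..]` infers `T` as `[i32]`. Without `?Sized`, neither call would compile.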
Next, we'll talk about functions and closures!

## Advanced Functions and Closures

This section explores some advanced features related to functions and closures,
including function pointers and returning closures.

### Function Pointers

We've talked about how to pass closures to functions; you can also pass regular
functions to functions! This technique is useful when you want to pass a
function you've already defined rather than defining a new closure. Functions
coerce to the type `fn` (with a lowercase *f*), not to be confused with the
`Fn` closure trait. The `fn` type is called a *function pointer*. Passing
functions with function pointers will allow you to use functions as arguments
to other functions.
The syntax for specifying that a parameter is a function pointer is similar to
that of closures, as shown in Listing 19-27, where we've defined a function
`add_one` that adds 1 to its parameter. The function `do_twice` takes two
parameters: a function pointer to any function that takes an `i32` parameter
and returns an `i32`, and one `i32` value. The `do_twice` function calls the
function `f` twice, passing it the `arg` value, then adds the two function call
results together. The `main` function calls `do_twice` with the arguments
`add_one` and `5`.
Filename: src/main.rs
```
fn add_one(x: i32) -> i32 {
    x + 1
}

fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32 {
    f(arg) + f(arg)
}

fn main() {
    let answer = do_twice(add_one, 5);

    println!("The answer is: {answer}");
}
```

Listing 19-27: Using the `fn` type to accept a function pointer as an argument
This code prints `The answer is: 12`. We specify that the parameter `f` in
`do_twice` is an `fn` that takes one parameter of type `i32` and returns an
`i32`. We can then call `f` in the body of `do_twice`. In `main`, we can pass
the function name `add_one` as the first argument to `do_twice`.
Unlike closures, `fn` is a type rather than a trait, so we specify `fn` as the
parameter type directly rather than declaring a generic type parameter with one
of the `Fn` traits as a trait bound.
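One consequence of `fn` being a concrete type is that function pointers can be stored in ordinary collections. The sketch below assumes a second function, `double`, and a helper, `apply_all`, neither of which appears in the listing:

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

// Hypothetical second function with the same signature as `add_one`.
fn double(x: i32) -> i32 {
    x * 2
}

// Because `fn(i32) -> i32` is a concrete type, values of it can be stored
// in a slice without generics or boxing.
fn apply_all(ops: &[fn(i32) -> i32], arg: i32) -> Vec<i32> {
    ops.iter().map(|op| op(arg)).collect()
}
```

For example, with `let ops: [fn(i32) -> i32; 2] = [add_one, double];`, the call `apply_all(&ops, 5)` applies each function to `5` in turn.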
Function pointers implement all three of the closure traits (`Fn`, `FnMut`, and
`FnOnce`), meaning you can always pass a function pointer as an argument for a
function that expects a closure. It's best to write functions using a generic
type and one of the closure traits so your functions can accept either
functions or closures.
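As a sketch of that advice, here is a generic variant of `do_twice` from Listing 19-27. The name `do_twice_generic` is ours; the only change is that the parameter is bounded by the `Fn` trait instead of being the `fn` type:

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

// Generic variant of `do_twice`: `F` can be a named function, a function
// pointer, or a closure, including one that captures its environment.
fn do_twice_generic<F: Fn(i32) -> i32>(f: F, arg: i32) -> i32 {
    f(arg) + f(arg)
}
```

`do_twice_generic(add_one, 5)` behaves like the original, and a capturing closure such as `|x| x + offset` is accepted as well, which the `fn(i32) -> i32` parameter in Listing 19-27 would reject.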
That said, one example of where you would want to only accept `fn` and not
closures is when interfacing with external code that doesn't have closures: C
functions can accept functions as arguments, but C doesn't have closures.
As an example of where you could use either a closure defined inline or a named
function, let's look at a use of the `map` method provided by the `Iterator`
trait in the standard library. To use the `map` method to turn a vector of
numbers into a vector of strings, we could use a closure, like this:
```
let list_of_numbers = vec![1, 2, 3];
let list_of_strings: Vec<String> = list_of_numbers
    .iter()
    .map(|i| i.to_string())
    .collect();
```
Or we could name a function as the argument to `map` instead of the closure,
like this:
```
let list_of_numbers = vec![1, 2, 3];
let list_of_strings: Vec<String> = list_of_numbers
    .iter()
    .map(ToString::to_string)
    .collect();
```
Note that we must use the fully qualified syntax that we talked about in
“Advanced Traits” on page XX because there are multiple functions available
named `to_string`.

Here, we're using the `to_string` function defined in the `ToString` trait,
which the standard library has implemented for any type that implements
`Display`.
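The same technique works for your own types: implementing `Display` is enough to make `ToString::to_string` usable as the argument to `map`. In this sketch, the `Celsius` type and the `labels` function are hypothetical:

```rust
use std::fmt;

// Hypothetical type used only for this illustration.
struct Celsius(i32);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} degrees", self.0)
    }
}

// `ToString::to_string` is available because `Celsius` implements `Display`.
fn labels(temps: &[Celsius]) -> Vec<String> {
    temps.iter().map(ToString::to_string).collect()
}
```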
Recall from “Enum Values” on page XX that the name of each enum variant that we
define also becomes an initializer function. We can use these initializer
functions as function pointers that implement the closure traits, which means
we can specify the initializer functions as arguments for methods that take
closures, like so:
```
enum Status {
    Value(u32),
    Stop,
}

let list_of_statuses: Vec<Status> = (0u32..20)
    .map(Status::Value)
    .collect();
```
Here, we create `Status::Value` instances using each `u32` value in the range
that `map` is called on by using the initializer function of `Status::Value`.
Some people prefer this style and some people prefer to use closures. They
compile to the same code, so use whichever style is clearer to you.
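Tuple structs behave the same way, and so does the standard library's `Some` variant. The following sketch uses a hypothetical `Meters` type and two made-up helper functions:

```rust
// Hypothetical tuple struct: like an enum variant, its name doubles as an
// initializer function, here with the signature `fn(u32) -> Meters`.
#[derive(Debug, PartialEq)]
struct Meters(u32);

fn to_meters(values: Vec<u32>) -> Vec<Meters> {
    values.into_iter().map(Meters).collect()
}

// `Some` is the initializer of the `Option::Some` variant, so it works too.
fn wrap_all(values: Vec<u32>) -> Vec<Option<u32>> {
    values.into_iter().map(Some).collect()
}
```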
### Returning Closures

Closures are represented by traits, which means you can't return closures
directly. In most cases where you might want to return a trait, you can instead
use the concrete type that implements the trait as the return value of the
function. However, you can't do that with closures because each closure has an
anonymous concrete type that can't be written out in a function signature; the
function pointer type `fn` works as a return type only for closures that don't
capture their environment.

The following code tries to return a closure directly, but it won't compile:
```
fn returns_closure() -> dyn Fn(i32) -> i32 {
    |x| x + 1
}
```
The compiler error is as follows:
```
error[E0746]: return type cannot have an unboxed trait object
 --> src/lib.rs:1:25
  |
1 | fn returns_closure() -> dyn Fn(i32) -> i32 {
  |                         ^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
  |
  = note: for information on `impl Trait`, see <https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits>
help: use `impl Fn(i32) -> i32` as the return type, as all return paths are of type `[closure@src/lib.rs:2:5: 2:14]`, which implements `Fn(i32) -> i32`
  |
1 | fn returns_closure() -> impl Fn(i32) -> i32 {
  |                         ~~~~~~~~~~~~~~~~~~~
```
The error references the `Sized` trait again! Rust doesn't know how much space
it will need to store the closure. We saw a solution to this problem earlier.
We can use a trait object:
```
fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    Box::new(|x| x + 1)
}
```
This code will compile just fine. For more about trait objects, refer to “Using
Trait Objects That Allow for Values of Different Types” on page XX.
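The compiler's help message points at a second option, `impl Fn(i32) -> i32`. The sketch below shows both forms side by side; the function names `returns_closure_impl` and `make_op` are ours. The `impl Fn` form fits when every return path yields the same closure, while the boxed trait object is needed when a function can return different closures:

```rust
// The `impl Fn` form suggested by the compiler's help message: the closure's
// anonymous type is returned directly, with no heap allocation.
fn returns_closure_impl() -> impl Fn(i32) -> i32 {
    |x| x + 1
}

// Hypothetical function that returns one of two different closures. Each
// closure has its own type, so a trait object is needed here.
fn make_op(double: bool) -> Box<dyn Fn(i32) -> i32> {
    if double {
        Box::new(|x| x * 2)
    } else {
        Box::new(|x| x + 1)
    }
}
```

Either return value can be called like a function, for example `make_op(true)(5)`.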
Next, let's look at macros!

## Macros

We've used macros like `println!` throughout this book, but we haven't fully
explored what a macro is and how it works. The term *macro* refers to a family
of features in Rust: *declarative* macros with `macro_rules!` and three kinds
of *procedural* macros:

* Custom `#[derive]` macros that specify code added with the `derive` attribute
  used on structs and enums
* Attribute-like macros that define custom attributes usable on any item
* Function-like macros that look like function calls but operate on the tokens
  specified as their argument

We'll talk about each of these in turn, but first, let's look at why we even
need macros when we already have functions.
### The Difference Between Macros and Functions

Fundamentally, macros are a way of writing code that writes other code, which
is known as *metaprogramming*. In Appendix C, we discuss the `derive`
attribute, which generates an implementation of various traits for you. We've
also used the `println!` and `vec!` macros throughout the book. All of these
macros *expand* to produce more code than the code you've written manually.

Metaprogramming is useful for reducing the amount of code you have to write and
maintain, which is also one of the roles of functions. However, macros have
some additional powers that functions don't have.
A function signature must declare the number and type of parameters the
function has. Macros, on the other hand, can take a variable number of
parameters: we can call `println!("hello")` with one argument or
`println!("hello {}", name)` with two arguments. Also, macros are expanded
before the compiler interprets the meaning of the code, so a macro can, for
example, implement a trait on a given type. A function can't, because it gets
called at runtime and a trait needs to be implemented at compile time.

The downside to implementing a macro instead of a function is that macro
definitions are more complex than function definitions because you're writing
Rust code that writes Rust code. Due to this indirection, macro definitions are
generally more difficult to read, understand, and maintain than function
definitions.

Another important difference between macros and functions is that you must
define macros or bring them into scope *before* you call them in a file, as
opposed to functions you can define anywhere and call anywhere.
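These points can be seen together in one small sketch. It uses `macro_rules!`, which the next section covers in detail; the `TypeLabel` trait and the `impl_type_label!` macro are hypothetical names chosen for the illustration. The macro takes a variable number of types, implements a trait for each of them, and is defined before the line that invokes it:

```rust
// Hypothetical trait used only for this illustration.
trait TypeLabel {
    fn label() -> &'static str;
}

// The macro is defined before its first use, as macros must be. It accepts
// any number of types and generates one `impl` block per type.
macro_rules! impl_type_label {
    ( $( $t:ty ),* ) => {
        $(
            impl TypeLabel for $t {
                fn label() -> &'static str {
                    stringify!($t)
                }
            }
        )*
    };
}

// One invocation produces three trait implementations at compile time.
impl_type_label!(u8, i32, String);
```

A function could not do this job: the `impl` blocks have to exist at compile time, before any function runs.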
### Declarative Macros with macro_rules! for General Metaprogramming
The most widely used form of macros in Rust is the *declarative macro*. These
are also sometimes referred to as “macros by example,” “`macro_rules!` macros,”
or just plain “macros.” At their core, declarative macros allow you to write
something similar to a Rust `match` expression. As discussed in Chapter 6,
`match` expressions are control structures that take an expression, compare the
resultant value of the expression to patterns, and then run the code associated
with the matching pattern. Macros also compare a value to patterns that are
associated with particular code: in this situation, the value is the literal
Rust source code passed to the macro; the patterns are compared with the
structure of that source code; and the code associated with each pattern, when
matched, replaces the code passed to the macro. This all happens during
compilation.
To define a macro, you use the `macro_rules!` construct. Let's explore how to
use `macro_rules!` by looking at how the `vec!` macro is defined. Chapter 8
covered how we can use the `vec!` macro to create a new vector with particular
values. For example, the following macro invocation creates a new vector
containing three integers:
```
let v: Vec<u32> = vec![1, 2, 3];
```
We could also use the `vec!` macro to make a vector of two integers or a vector
of five string slices. We wouldn't be able to use a function to do the same
because we wouldn't know the number or type of values up front.

Listing 19-28 shows a slightly simplified definition of the `vec!` macro.

Filename: src/lib.rs
```
#[macro_export] // [1]
macro_rules! vec { // [2]
    ( $( $x:expr ),* ) => { // [3]
        {
            let mut temp_vec = Vec::new();
            $( // [4]
                temp_vec.push($x); // [5] the `push` call, [6] `$x`
            )*
            temp_vec // [7]
        }
    };
}
```

Listing 19-28: A simplified version of the `vec!` macro definition

> Note: The actual definition of the `vec!` macro in the standard library
> includes code to pre-allocate the correct amount of memory up front. That code
> is an optimization that we don't include here, to make the example simpler.
The `#[macro_export]` annotation [1] indicates that this macro should be made
available whenever the crate in which the macro is defined is brought into
scope. Without this annotation, the macro can't be brought into scope.

We then start the macro definition with `macro_rules!` and the name of the
macro we're defining *without* the exclamation mark [2]. The name, in this case
`vec`, is followed by curly brackets denoting the body of the macro definition.

The structure in the `vec!` body is similar to the structure of a `match`
expression. Here we have one arm with the pattern `( $( $x:expr ),* )`,
followed by `=>` and the block of code associated with this pattern [3]. If the
pattern matches, the associated block of code will be emitted. Given that this
is the only pattern in this macro, there is only one valid way to match; any
input that doesn't match it will result in an error. More complex macros will
have more than one arm.
Valid pattern syntax in macro definitions is different from the pattern syntax
covered in Chapter 18 because macro patterns are matched against Rust code
structure rather than values. Let's walk through what the pattern pieces in
Listing 19-28 mean; for the full macro pattern syntax, see the Rust Reference
at *https://doc.rust-lang.org/reference/macros-by-example.html*.

First we use a set of parentheses to encompass the whole pattern. We use a
dollar sign (`$`) to declare a variable in the macro system that will contain
the Rust code matching the pattern. The dollar sign makes it clear this is a
macro variable as opposed to a regular Rust variable. Next comes a set of
parentheses that captures values that match the pattern within the parentheses
for use in the replacement code. Within `$()` is `$x:expr`, which matches any
Rust expression and gives the expression the name `$x`.
The comma following `$()` indicates that a literal comma separator character
must appear between each instance of the code that matches the code in `$()`.
The `*` specifies that the pattern matches zero or more of whatever precedes
the `*`.

When we call this macro with `vec![1, 2, 3];`, the `$x` pattern matches three
times with the three expressions `1`, `2`, and `3`.
Now let's look at the pattern in the body of the code associated with this arm:
`temp_vec.push()` [5] within `$()*` at [4] and [7] is generated for each part
that matches `$()` in the pattern zero or more times depending on how many
times the pattern matches. The `$x` [6] is replaced with each expression
matched. When we call this macro with `vec![1, 2, 3];`, the code generated that
replaces this macro call will be the following:
```
{
    let mut temp_vec = Vec::new();
    temp_vec.push(1);
    temp_vec.push(2);
    temp_vec.push(3);
    temp_vec
}
```
We've defined a macro that can take any number of arguments of any type and can
generate code to create a vector containing the specified elements.
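As a small extension, here is a sketch of a variant that also accepts a trailing comma, which the pattern in Listing 19-28 rejects. The name `my_vec` is hypothetical (chosen so it doesn't shadow the standard `vec!`), and the `$(,)?` fragment is the optional-repetition syntax described in the Rust Reference:

```rust
// Hypothetical variant of the simplified `vec!` from Listing 19-28. The extra
// `$(,)?` fragment accepts an optional trailing comma after the last element.
macro_rules! my_vec {
    ( $( $x:expr ),* $(,)? ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}
```

With this definition, both `my_vec![1, 2, 3]` and `my_vec![1, 2, 3,]` expand to the same sequence of `push` calls.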
To learn more about how to write macros, consult the online documentation or
other resources, such as “The Little Book of Rust Macros” at
*https://veykril.github.io/tlborm*, started by Daniel Keep and continued by
Lukas Wirth.
### Procedural Macros for Generating Code from Attributes
The second form of macros is the procedural macro, which acts more like a
function (and is a type of procedure). *Procedural macros* accept some code as
an input, operate on that code, and produce some code as an output rather than
matching against patterns and replacing the code with other code as declarative
macros do. The three kinds of procedural macros are custom `derive`,
attribute-like, and function-like, and all work in a similar fashion.
When creating procedural macros, the definitions must reside in their own crate
with a special crate type. This is for complex technical reasons that we hope
to eliminate in the future. In Listing 19-29, we show how to define a
procedural macro, where `some_attribute` is a placeholder for using a specific
macro variety.
Filename: src/lib.rs
```
use proc_macro::TokenStream;

#[some_attribute]
pub fn some_name(input: TokenStream) -> TokenStream {
}
```

Listing 19-29: An example of defining a procedural macro
The function that defines a procedural macro takes a `TokenStream` as an input
and produces a `TokenStream` as an output. The `TokenStream` type is defined by
the `proc_macro` crate that is included with Rust and represents a sequence of
tokens. This is the core of the macro: the source code that the macro is
operating on makes up the input `TokenStream`, and the code the macro produces
is the output `TokenStream`. The function also has an attribute attached to it
that specifies which kind of procedural macro we're creating. We can have
multiple kinds of procedural macros in the same crate.

Let's look at the different kinds of procedural macros. We'll start with a
custom `derive` macro and then explain the small dissimilarities that make the
other forms different.

### How to Write a Custom derive Macro
Let's create a crate named `hello_macro` that defines a trait named
`HelloMacro` with one associated function named `hello_macro`. Rather than
making our users implement the `HelloMacro` trait for each of their types,
we'll provide a procedural macro so users can annotate their type with
`#[derive(HelloMacro)]` to get a default implementation of the `hello_macro`
function. The default implementation will print `Hello, Macro! My name is
TypeName!` where `TypeName` is the name of the type on which this trait has
been defined. In other words, we'll write a crate that enables another
programmer to write code like Listing 19-30 using our crate.

Filename: src/main.rs
```
use hello_macro::HelloMacro;
use hello_macro_derive::HelloMacro;

#[derive(HelloMacro)]
struct Pancakes;

fn main() {
    Pancakes::hello_macro();
}
```

Listing 19-30: The code a user of our crate will be able to write when using
our procedural macro
This code will print `Hello, Macro! My name is Pancakes!` when we're done. The
first step is to make a new library crate, like this:
```
$ cargo new hello_macro --lib
```
Next, we'll define the `HelloMacro` trait and its associated function:

Filename: src/lib.rs

```
pub trait HelloMacro {
    fn hello_macro();
}
```
We have a trait and its function. At this point, our crate user could implement
the trait to achieve the desired functionality, like so:
```
use hello_macro::HelloMacro;

struct Pancakes;

impl HelloMacro for Pancakes {
    fn hello_macro() {
        println!("Hello, Macro! My name is Pancakes!");
    }
}

fn main() {
    Pancakes::hello_macro();
}
```
However, they would need to write the implementation block for each type they
wanted to use with `hello_macro`; we want to spare them from having to do this
work.
Additionally, we can't yet provide the `hello_macro` function with a default
implementation that will print the name of the type the trait is implemented
on: Rust doesn't have reflection capabilities, so it can't look up the type's
name at runtime. We need a macro to generate code at compile time.

The next step is to define the procedural macro. At the time of this writing,
procedural macros need to be in their own crate. Eventually, this restriction
might be lifted. The convention for structuring crates and macro crates is as
follows: for a crate named `foo`, a custom `derive` procedural macro crate is
called `foo_derive`. Let's start a new crate called `hello_macro_derive` inside
our `hello_macro` project:
```
$ cargo new hello_macro_derive --lib
```
Our two crates are tightly related, so we create the procedural macro crate
within the directory of our `hello_macro` crate. If we change the trait
definition in `hello_macro`, we'll have to change the implementation of the
procedural macro in `hello_macro_derive` as well. The two crates will need to
be published separately, and programmers using these crates will need to add
both as dependencies and bring them both into scope. We could instead have the
`hello_macro` crate use `hello_macro_derive` as a dependency and re-export the
procedural macro code. However, the way we've structured the project makes it
possible for programmers to use `hello_macro` even if they don't want the
`derive` functionality.

We need to declare the `hello_macro_derive` crate as a procedural macro crate.
We'll also need functionality from the `syn` and `quote` crates, as you'll see
in a moment, so we need to add them as dependencies. Add the following to the
*Cargo.toml* file for `hello_macro_derive`:

Filename: hello_macro_derive/Cargo.toml
```
[lib]
proc-macro = true

[dependencies]
syn = "1.0"
quote = "1.0"
```
To start defining the procedural macro, place the code in Listing 19-31 into
your *src/lib.rs* file for the `hello_macro_derive` crate. Note that this code
won't compile until we add a definition for the `impl_hello_macro` function.

Filename: hello_macro_derive/src/lib.rs
```
use proc_macro::TokenStream;
use quote::quote;
use syn;

#[proc_macro_derive(HelloMacro)]
pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
    // Construct a representation of Rust code as a syntax tree
    // that we can manipulate
    let ast = syn::parse(input).unwrap();

    // Build the trait implementation
    impl_hello_macro(&ast)
}
```

Listing 19-31: Code that most procedural macro crates will require in order to
process Rust code
Notice that we've split the code into the `hello_macro_derive` function, which
is responsible for parsing the `TokenStream`, and the `impl_hello_macro`
function, which is responsible for transforming the syntax tree: this makes
writing a procedural macro more convenient. The code in the outer function
(`hello_macro_derive` in this case) will be the same for almost every
procedural macro crate you see or create. The code you specify in the body of
the inner function (`impl_hello_macro` in this case) will be different
depending on your procedural macro's purpose.

We've introduced three new crates: `proc_macro`, `syn` (available from
*https://crates.io/crates/syn*), and `quote` (available from
*https://crates.io/crates/quote*). The `proc_macro` crate comes with Rust, so
we didn't need to add that to the dependencies in *Cargo.toml*. The
`proc_macro` crate is the compiler's API that allows us to read and manipulate
Rust code from our code.

The `syn` crate parses Rust code from a string into a data structure that we
can perform operations on. The `quote` crate turns `syn` data structures back
into Rust code. These crates make it much simpler to parse any sort of Rust
code we might want to handle: writing a full parser for Rust code is no simple
task.
The `hello_macro_derive` function will be called when a user of our library
specifies `#[derive(HelloMacro)]` on a type. This is possible because we've
annotated the `hello_macro_derive` function here with `proc_macro_derive` and
specified the name `HelloMacro`, which matches our trait name; this is the
convention most procedural macros follow.

The `hello_macro_derive` function first converts the `input` from a
`TokenStream` to a data structure that we can then interpret and perform
operations on. This is where `syn` comes into play. The `parse` function in
`syn` takes a `TokenStream` and returns a `DeriveInput` struct representing the
parsed Rust code. Listing 19-32 shows the relevant parts of the `DeriveInput`
struct we get from parsing the `struct Pancakes;` string.
```
DeriveInput {
    --snip--

    ident: Ident {
        ident: "Pancakes",
        span: #0 bytes(95..103)
    },
    data: Struct(
        DataStruct {
            struct_token: Struct,
            fields: Unit,
            semi_token: Some(
                Semi
            )
        }
    )
}
```

Listing 19-32: The `DeriveInput` instance we get when parsing the code that has
the macro's attribute in Listing 19-30
The fields of this struct show that the Rust code we've parsed is a unit struct
with the `ident` (*identifier*, meaning the name) of `Pancakes`. There are more
fields on this struct for describing all sorts of Rust code; check the `syn`
documentation for `DeriveInput` at
*https://docs.rs/syn/1.0/syn/struct.DeriveInput.html* for more information.

Soon we'll define the `impl_hello_macro` function, which is where we'll build
the new Rust code we want to include. But before we do, note that the output
for our `derive` macro is also a `TokenStream`. The returned `TokenStream` is
added to the code that our crate users write, so when they compile their crate,
they'll get the extra functionality that we provide in the modified
`TokenStream`.

You might have noticed that we're calling `unwrap` to cause the
`hello_macro_derive` function to panic if the call to the `syn::parse` function
fails here. It's necessary for our procedural macro to panic on errors because
`proc_macro_derive` functions must return `TokenStream` rather than `Result` to
conform to the procedural macro API. We've simplified this example by using
`unwrap`; in production code, you should provide more specific error messages
about what went wrong by using `panic!` or `expect`.
Now that we have the code to turn the annotated Rust code from a `TokenStream`
into a `DeriveInput` instance, let's generate the code that implements the
`HelloMacro` trait on the annotated type, as shown in Listing 19-33.

Filename: hello_macro_derive/src/lib.rs
```
fn impl_hello_macro(ast: &syn::DeriveInput) -> TokenStream {
    let name = &ast.ident;
    let gen = quote! {
        impl HelloMacro for #name {
            fn hello_macro() {
                println!(
                    "Hello, Macro! My name is {}!",
                    stringify!(#name)
                );
            }
        }
    };
    gen.into()
}
```

Listing 19-33: Implementing the `HelloMacro` trait using the parsed Rust code
We get an `Ident` struct instance containing the name (identifier) of the
annotated type using `ast.ident`. The struct in Listing 19-32 shows that when
we run the `impl_hello_macro` function on the code in Listing 19-30, the
`ident` we get will have the `ident` field with a value of `"Pancakes"`. Thus
the `name` variable in Listing 19-33 will contain an `Ident` struct instance
that, when printed, will be the string `"Pancakes"`, the name of the struct in
Listing 19-30.

The `quote!` macro lets us define the Rust code that we want to return. The
compiler expects something different from the direct result of the `quote!`
macro's execution, so we need to convert it to a `TokenStream`. We do this by
calling the `into` method, which consumes this intermediate representation and
returns a value of the required `TokenStream` type.

The `quote!` macro also provides some very cool templating mechanics: we can
enter `#name`, and `quote!` will replace it with the value in the variable
`name`. You can even do some repetition similar to the way regular macros work.
Check out the `quote` crate's docs at *https://docs.rs/quote* for a thorough
introduction.

We want our procedural macro to generate an implementation of our `HelloMacro`
trait for the type the user annotated, which we can get by using `#name`. The
trait implementation has the one function `hello_macro`, whose body contains
the functionality we want to provide: printing `Hello, Macro! My name is` and
then the name of the annotated type.
The `stringify!` macro used here is built into Rust. It takes a Rust
expression, such as `1 + 2`, and at compile time turns the expression into a
2022-08-27 23:23:23 +00:00
string literal, such as `"1 + 2"`. This is different from `format!` or
2022-03-05 02:24:35 +00:00
`println!`, macros which evaluate the expression and then turn the result into
a `String`. There is a possibility that the `#name` input might be an
expression to print literally, so we use `stringify!`. Using `stringify!` also
saves an allocation by converting `#name` to a string literal at compile time.
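
To make the difference concrete, here is a minimal sketch that uses only the standard library. The function name `stringify_vs_format` is our own, chosen for illustration; it isn’t part of the `hello_macro` crates.

```rust
// Hypothetical helper (not part of the `hello_macro` crates) that contrasts
// `stringify!` with `format!`.
fn stringify_vs_format() -> (&'static str, String) {
    // `stringify!` does not evaluate its argument: the tokens `1 + 2` become
    // a string literal at compile time, with no allocation.
    let literal: &'static str = stringify!(1 + 2);

    // `format!` evaluates `1 + 2` at runtime and allocates a `String` that
    // holds the formatted result.
    let evaluated: String = format!("{}", 1 + 2);

    (literal, evaluated)
}
```

Calling `stringify_vs_format()` should return the unevaluated text `"1 + 2"` alongside the evaluated result `"3"`, which is the same distinction the generated `hello_macro` body relies on.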

At this point, `cargo build` should complete successfully in both `hello_macro`
and `hello_macro_derive`. Let’s hook up these crates to the code in Listing
19-30 to see the procedural macro in action! Create a new binary project in
your *projects* directory using `cargo new pancakes`. We need to add
`hello_macro` and `hello_macro_derive` as dependencies in the `pancakes`
crate’s *Cargo.toml*. If you’re publishing your versions of `hello_macro` and
`hello_macro_derive` to *https://crates.io*, they would be regular
dependencies; if not, you can specify them as `path` dependencies as follows:

```
[dependencies]
hello_macro = { path = "../hello_macro" }
hello_macro_derive = { path = "../hello_macro/hello_macro_derive" }
```

Put the code in Listing 19-30 into *src/main.rs*, and run `cargo run`: it
should print `Hello, Macro! My name is Pancakes!`. The implementation of the
`HelloMacro` trait from the procedural macro was included without the
`pancakes` crate needing to implement it; the `#[derive(HelloMacro)]` added the
trait implementation.
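
To see what the derive saved us, here is roughly what we would otherwise have written by hand. This is a sketch of the kind of code the macro generates, assuming the `HelloMacro` trait has the single associated function `hello_macro` described earlier; it is not the literal output of the macro, and the trait is redefined locally so the snippet stands alone.

```rust
// Local stand-in for the trait from the `hello_macro` crate, redefined here
// so the sketch is self-contained.
pub trait HelloMacro {
    fn hello_macro();
}

struct Pancakes;

// Approximately what `#[derive(HelloMacro)]` produces for `Pancakes`, with
// `#name` already replaced by the type's identifier.
impl HelloMacro for Pancakes {
    fn hello_macro() {
        println!("Hello, Macro! My name is {}!", stringify!(Pancakes));
    }
}
```

Calling `Pancakes::hello_macro()` runs this implementation, just as it would with the derived one.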

Next, let’s explore how the other kinds of procedural macros differ from custom
`derive` macros.

### Attribute-like Macros

Attribute-like macros are similar to custom `derive` macros, but instead of
generating code for the `derive` attribute, they allow you to create new
attributes. They’re also more flexible: `derive` only works for structs and
enums; attributes can be applied to other items as well, such as functions.
Here’s an example of using an attribute-like macro. Say you have an attribute
named `route` that annotates functions when using a web application framework:

```
#[route(GET, "/")]
fn index() {
    // --snip--
}
```

This `#[route]` attribute would be defined by the framework as a procedural
macro. The signature of the macro definition function would look like this:

```
#[proc_macro_attribute]
pub fn route(
    attr: TokenStream,
    item: TokenStream,
) -> TokenStream {
    // --snip--
}
```

Here, we have two parameters of type `TokenStream`. The first is for the
contents of the attribute: the `GET, "/"` part. The second is the body of the
item the attribute is attached to: in this case, `fn index() {}` and the rest
of the function’s body.

Other than that, attribute-like macros work the same way as custom `derive`
macros: you create a crate with the `proc-macro` crate type and implement a
function that generates the code you want!

### Function-like Macros

Function-like macros define macros that look like function calls. Similarly to
`macro_rules!` macros, they’re more flexible than functions; for example, they
can take an unknown number of arguments. However, `macro_rules!` macros can
only be defined using the match-like syntax we discussed in “Declarative Macros
with `macro_rules!` for General Metaprogramming” on page XX. Function-like macros
take a `TokenStream` parameter, and their definition manipulates that
`TokenStream` using Rust code as the other two types of procedural macros do.
An example of a function-like macro is an `sql!` macro that might be called
like so:

```
let sql = sql!(SELECT * FROM posts WHERE id=1);
```

This macro would parse the SQL statement inside it and check that it’s
syntactically correct, which is much more complex processing than a
`macro_rules!` macro can do. The `sql!` macro would be defined like this:

```
#[proc_macro]
pub fn sql(input: TokenStream) -> TokenStream {
    // --snip--
}
```

This definition is similar to the custom `derive` macro’s signature: we receive
the tokens that are inside the parentheses and return the code we want to
generate.
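
For comparison, the “unknown number of arguments” flexibility can be sketched with a declarative macro, which needs no separate crate. The `count_args!` macro and the `count_demo` function are hypothetical names of our own; a function-like procedural macro would instead receive these same argument tokens as a single `TokenStream` and process them with ordinary Rust code rather than patterns.

```rust
// Hypothetical declarative macro that counts its comma-separated arguments.
// The arguments are matched as expressions but never evaluated.
macro_rules! count_args {
    // No arguments left: the count is zero.
    () => { 0usize };
    // Peel off one argument and recurse on the remaining ones.
    ($head:expr $(, $tail:expr)*) => { 1usize + count_args!($($tail),*) };
}

// Hypothetical helper showing the same macro invoked with zero, one, and
// three arguments of different types, which a plain function could not accept.
fn count_demo() -> (usize, usize, usize) {
    (count_args!(), count_args!(1), count_args!("a", 2.0, 'c'))
}
```

Here `count_demo()` should return `(0, 1, 3)`. Pattern matching is enough for counting, but checking something like SQL syntax is where the procedural approach becomes worthwhile.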

## Summary

Whew! Now you have some Rust features in your toolbox that you likely won’t use
often, but you’ll know they’re available in very particular circumstances.
We’ve introduced several complex topics so that when you encounter them in
error message suggestions or in other people’s code, you’ll be able to
recognize these concepts and syntax. Use this chapter as a reference to guide
you to solutions.

Next, we’ll put everything we’ve discussed throughout the book into practice
and do one more project!