An I/O Project: Building a Command Line Program

This chapter is a recap of the many skills you've learned so far and an exploration of a few more standard library features. We'll build a command line tool that interacts with file and command line input/output to practice some of the Rust concepts you now have under your belt.

Rust's speed, safety, single binary output, and cross-platform support make it an ideal language for creating command line tools, so for our project, we'll make our own version of the classic command line search tool grep (globally search a regular expression and print). In the simplest use case, grep searches a specified file for a specified string. To do so, grep takes as its arguments a file path and a string. Then it reads the file, finds lines in that file that contain the string argument, and prints those lines.

Along the way, we'll show how to make our command line tool use the terminal features that many other command line tools use. We'll read the value of an environment variable to allow the user to configure the behavior of our tool. We'll also print error messages to the standard error console stream (stderr) instead of standard output (stdout) so that, for example, the user can redirect successful output to a file while still seeing error messages onscreen.

One Rust community member, Andrew Gallant, has already created a fully featured, very fast version of grep, called ripgrep. By comparison, our version will be fairly simple, but this chapter will give you some of the background knowledge you need to understand a real-world project such as ripgrep.

Our grep project will combine a number of concepts you've learned so far:

  • Organizing code (Chapter 7)
  • Using vectors and strings (Chapter 8)
  • Handling errors (Chapter 9)
  • Using traits and lifetimes where appropriate (Chapter 10)
  • Writing tests (Chapter 11)

We'll also briefly introduce closures, iterators, and trait objects, which Chapter 13 and Chapter 17 will cover in detail.

Accepting Command Line Arguments

Let's create a new project with, as always, cargo new. We'll call our project minigrep to distinguish it from the grep tool that you might already have on your system.

$ cargo new minigrep
     Created binary (application) `minigrep` project
$ cd minigrep

The first task is to make minigrep accept its two command line arguments: the file path and a string to search for. That is, we want to be able to run our program with cargo run, two hyphens to indicate the following arguments are for our program rather than for cargo, a string to search for, and a path to a file to search in, like so:

$ cargo run -- searchstring example-filename.txt

Right now, the program generated by cargo new cannot process arguments we give it. Some existing libraries on https://crates.io can help with writing a program that accepts command line arguments, but because you're just learning this concept, let's implement this capability ourselves.

Reading the Argument Values

To enable minigrep to read the values of command line arguments we pass to it, we'll need the std::env::args function provided in Rust's standard library. This function returns an iterator of the command line arguments passed to minigrep. We'll cover iterators fully in Chapter 13. For now, you only need to know two details about iterators: iterators produce a series of values, and we can call the collect method on an iterator to turn it into a collection, such as a vector, that contains all the elements the iterator produces.

The code in Listing 12-1 allows your minigrep program to read any command line arguments passed to it, and then collect the values into a vector.

Filename: src/main.rs

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    dbg!(args);
}

Listing 12-1: Collecting the command line arguments into a vector and printing them

First we bring the std::env module into scope with a use statement so we can use its args function. Notice that the std::env::args function is nested in two levels of modules. As we discussed in Chapter 7, in cases where the desired function is nested in more than one module, we've chosen to bring the parent module into scope rather than the function. By doing so, we can easily use other functions from std::env. It's also less ambiguous than adding use std::env::args and then calling the function with just args, because args might easily be mistaken for a function that's defined in the current module.

The args Function and Invalid Unicode

Note that std::env::args will panic if any argument contains invalid Unicode. If your program needs to accept arguments containing invalid Unicode, use std::env::args_os instead. That function returns an iterator that produces OsString values instead of String values. We've chosen to use std::env::args here for simplicity because OsString values differ per platform and are more complex to work with than String values.
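As a rough illustration of the alternative mentioned above, the following sketch reads arguments with std::env::args_os and converts them lossily to String. The helper name lossy_args is hypothetical and is not part of minigrep; it assumes that replacing invalid sequences with the Unicode replacement character is acceptable for your program.

```rust
use std::env;
use std::ffi::OsString;

// Hypothetical helper (not part of minigrep): converts OS-native arguments
// to `String`, replacing any invalid Unicode with U+FFFD instead of panicking.
fn lossy_args<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = OsString>,
{
    args.into_iter()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    // `env::args_os` yields `OsString` values, so invalid Unicode does not panic.
    let args = lossy_args(env::args_os());
    println!("{args:?}");
}
```

A lossy conversion changes the argument text, so it would be a poor fit for file paths that must be opened exactly as given; in that case, keeping the OsString values is the safer choice.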

On the first line of main in Listing 12-1, we call env::args, and we immediately use collect to turn the iterator into a vector containing all the values produced by the iterator. We can use the collect function to create many kinds of collections, so we explicitly annotate the type of args to specify that we want a vector of strings. Although you very rarely need to annotate types in Rust, collect is one function you do often need to annotate because Rust isn't able to infer the kind of collection you want.
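To see why collect needs a type, here is a small sketch in which the same iterator chain produces two different collections. The helper names to_vec and to_set are made up for this illustration.

```rust
use std::collections::HashSet;

// Hypothetical helpers for illustration: the same iterator chain produces
// different collections depending on the type we ask `collect` for.
fn to_vec(words: &[&str]) -> Vec<String> {
    // The function's return type tells `collect` to build a `Vec<String>`.
    words.iter().map(|&w| w.to_string()).collect()
}

fn to_set(words: &[&str]) -> HashSet<String> {
    // The turbofish form names the collection at the call site instead.
    words.iter().map(|&w| w.to_string()).collect::<HashSet<String>>()
}

fn main() {
    let words = ["needle", "haystack", "needle"];
    println!("{:?}", to_vec(&words)); // keeps duplicates, in order
    println!("{} unique", to_set(&words).len());
}
```

Without the return type or the turbofish, the compiler has no way to choose between these collections, which is why Listing 12-1 annotates args as Vec<String>.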

Finally, we print the vector using the debug macro. Let's try running the code first with no arguments and then with two arguments:

$ cargo run
--snip--
[src/main.rs:5] args = [
    "target/debug/minigrep",
]
$ cargo run -- needle haystack
--snip--
[src/main.rs:5] args = [
    "target/debug/minigrep",
    "needle",
    "haystack",
]

Notice that the first value in the vector is "target/debug/minigrep", which is the name of our binary. This matches the behavior of the arguments list in C, letting programs use the name by which they were invoked in their execution. It's often convenient to have access to the program name in case you want to print it in messages or change the behavior of the program based on what command line alias was used to invoke the program. But for the purposes of this chapter, we'll ignore it and save only the two arguments we need.
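One way to ignore the program name, shown here only as a sketch, is to skip the first item of the iterator before collecting. The helper user_args is hypothetical; the chapter itself keeps the full vector and indexes from 1 instead.

```rust
use std::env;

// Hypothetical helper: drops the program name (the first item) and keeps
// only the arguments the user typed.
fn user_args<I>(args: I) -> Vec<String>
where
    I: Iterator<Item = String>,
{
    args.skip(1).collect()
}

fn main() {
    let args = user_args(env::args());
    println!("{args:?}");
}
```

With this approach, the first user argument lands at index 0 rather than index 1, so any code that indexes into the vector would need to account for the shift.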

Saving the Argument Values in Variables

The program is currently able to access the values specified as command line arguments. Now we need to save the values of the two arguments in variables so we can use the values throughout the rest of the program. We do that in Listing 12-2.

Filename: src/main.rs

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();

    let query = &args[1];
    let file_path = &args[2];

    println!("Searching for {}", query);
    println!("In file {}", file_path);
}

Listing 12-2: Creating variables to hold the query argument and file path argument

As we saw when we printed the vector, the program's name takes up the first value in the vector at args[0], so we're starting arguments at index 1. The first argument minigrep takes is the string we're searching for, so we put a reference to the first argument in the variable query. The second argument will be the file path, so we put a reference to the second argument in the variable file_path.

We temporarily print the values of these variables to prove that the code is working as we intend. Let's run this program again with the arguments test and sample.txt:

$ cargo run -- test sample.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep test sample.txt`
Searching for test
In file sample.txt

Great, the program is working! The values of the arguments we need are being saved into the right variables. Later we'll add some error handling to deal with certain potential erroneous situations, such as when the user provides no arguments; for now, we'll ignore that situation and work on adding file-reading capabilities instead.
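To illustrate the situation we are ignoring for now: indexing with args[1] panics when the argument is missing, whereas the slice method get returns an Option. The sketch below uses a hypothetical helper, nth_arg, and is not the error handling this chapter ends up with.

```rust
// Hypothetical helper: looks up an argument by position without panicking.
// `slice::get` returns `None` when the index is out of bounds.
fn nth_arg(args: &[String], index: usize) -> Option<&str> {
    args.get(index).map(String::as_str)
}

fn main() {
    let args: Vec<String> = std::env::args().collect();

    match nth_arg(&args, 1) {
        Some(query) => println!("Searching for {query}"),
        None => println!("No query given"),
    }
}
```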

Reading a File

Now we'll add functionality to read the file specified in the file_path argument. First we need a sample file to test it with: we'll use a file with a small amount of text over multiple lines with some repeated words. Listing 12-3 has an Emily Dickinson poem that will work well! Create a file called poem.txt at the root level of your project, and enter the poem “I'm Nobody! Who are you?”

Filename: poem.txt

I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!

Listing 12-3: A poem by Emily Dickinson makes a good test case.

With the text in place, edit src/main.rs and add code to read the file, as shown in Listing 12-4.

Filename: src/main.rs

use std::env;
use std::fs; // [1]

fn main() {
    --snip--
    println!("In file {}", file_path);

    let contents = fs::read_to_string(file_path) // [2]
        .expect("Should have been able to read the file");

    println!("With text:\n{contents}"); // [3]
}

Listing 12-4: Reading the contents of the file specified by the second argument

First we bring in a relevant part of the standard library with a use statement: we need std::fs to handle files [1].

In main, the new statement fs::read_to_string takes the file_path, opens that file, and returns an std::io::Result<String> of the file's contents [2].

After that, we again add a temporary println! statement that prints the value of contents after the file is read, so we can check that the program is working so far [3].

Let's run this code with any string as the first command line argument (because we haven't implemented the searching part yet) and the poem.txt file as the second argument:

$ cargo run -- the poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!

Great! The code read and then printed the contents of the file. But the code has a few flaws. At the moment, the main function has multiple responsibilities: generally, functions are clearer and easier to maintain if each function is responsible for only one idea. The other problem is that we're not handling errors as well as we could. The program is still small, so these flaws aren't a big problem, but as the program grows, it will be harder to fix them cleanly. It's a good practice to begin refactoring early on when developing a program because it's much easier to refactor smaller amounts of code. We'll do that next.

Refactoring to Improve Modularity and Error Handling

To improve our program, we'll fix four problems that have to do with the program's structure and how it's handling potential errors. First, our main function now performs two tasks: it parses arguments and reads files. As our program grows, the number of separate tasks the main function handles will increase. As a function gains responsibilities, it becomes more difficult to reason about, harder to test, and harder to change without breaking one of its parts. It's best to separate functionality so each function is responsible for one task.

This issue also ties into the second problem: although query and file_path are configuration variables to our program, variables like contents are used to perform the program's logic. The longer main becomes, the more variables we'll need to bring into scope; the more variables we have in scope, the harder it will be to keep track of the purpose of each. It's best to group the configuration variables into one structure to make their purpose clear.

The third problem is that we've used expect to print an error message when reading the file fails, but the error message just prints Should have been able to read the file. Reading a file can fail in a number of ways: for example, the file could be missing, or we might not have permission to open it. Right now, regardless of the situation, we'd print the same error message for everything, which wouldn't give the user any information!

Fourth, we use expect to handle an error, and if the user runs our program without specifying enough arguments, they'll get an index out of bounds error from Rust that doesn't clearly explain the problem. It would be best if all the error-handling code were in one place so future maintainers had only one place to consult the code if the error-handling logic needed to change. Having all the error-handling code in one place will also ensure that we're printing messages that will be meaningful to our end users.

Let's address these four problems by refactoring our project.
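To make the third problem concrete, here is a minimal sketch of how the error returned by fs::read_to_string carries more detail than a single fixed message can. The helper describe_read and its message wording are hypothetical; the refactoring that follows takes a different route and simply passes the error back to main.

```rust
use std::fs;
use std::io::ErrorKind;

// Hypothetical helper: turns the error from `fs::read_to_string` into a
// message that depends on what actually went wrong.
fn describe_read(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(contents) => format!("read {} bytes", contents.len()),
        Err(err) => match err.kind() {
            ErrorKind::NotFound => format!("{path}: file not found"),
            ErrorKind::PermissionDenied => format!("{path}: permission denied"),
            // Any other failure: fall back to the error's own description.
            _ => format!("{path}: {err}"),
        },
    }
}

fn main() {
    println!("{}", describe_read("poem.txt"));
}
```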

Separation of Concerns for Binary Projects

The organizational problem of allocating responsibility for multiple tasks to the main function is common to many binary projects. As a result, the Rust community has developed guidelines for splitting the separate concerns of a binary program when main starts getting large. This process has the following steps:

  • Split your program into a main.rs file and a lib.rs file and move your program's logic to lib.rs.
  • As long as your command line parsing logic is small, it can remain in main.rs.
  • When the command line parsing logic starts getting complicated, extract it from main.rs and move it to lib.rs.

The responsibilities that remain in the main function after this process should be limited to the following:

  • Calling the command line parsing logic with the argument values
  • Setting up any other configuration
  • Calling a run function in lib.rs
  • Handling the error if run returns an error

This pattern is about separating concerns: main.rs handles running the program and lib.rs handles all the logic of the task at hand. Because you can't test the main function directly, this structure lets you test all of your program's logic by moving it into functions in lib.rs. The code that remains in main.rs will be small enough to verify its correctness by reading it. Let's rework our program by following this process.

Extracting the Argument Parser

We'll extract the functionality for parsing arguments into a function that main will call to prepare for moving the command line parsing logic to src/lib.rs. Listing 12-5 shows the new start of main that calls a new function parse_config, which we'll define in src/main.rs for the moment.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let (query, file_path) = parse_config(&args);

    --snip--
}

fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let file_path = &args[2];

    (query, file_path)
}

Listing 12-5: Extracting a parse_config function from main

We're still collecting the command line arguments into a vector, but instead of assigning the argument value at index 1 to the variable query and the argument value at index 2 to the variable file_path within the main function, we pass the whole vector to the parse_config function. The parse_config function then holds the logic that determines which argument goes in which variable and passes the values back to main. We still create the query and file_path variables in main, but main no longer has the responsibility of determining how the command line arguments and variables correspond.

This rework may seem like overkill for our small program, but we're refactoring in small, incremental steps. After making this change, run the program again to verify that the argument parsing still works. It's good to check your progress often, to help identify the cause of problems when they occur.

Grouping Configuration Values

We can take another small step to improve the parse_config function further. At the moment, we're returning a tuple, but then we immediately break that tuple into individual parts again. This is a sign that perhaps we don't have the right abstraction yet.

Another indicator that shows there's room for improvement is the config part of parse_config, which implies that the two values we return are related and are both part of one configuration value. We're not currently conveying this meaning in the structure of the data other than by grouping the two values into a tuple; we'll instead put the two values into one struct and give each of the struct fields a meaningful name. Doing so will make it easier for future maintainers of this code to understand how the different values relate to each other and what their purpose is.

Listing 12-6 shows the improvements to the parse_config function.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = parse_config(&args); // [1]

    println!("Searching for {}", config.query); // [2]
    println!("In file {}", config.file_path); // [3]

    let contents = fs::read_to_string(config.file_path) // [4]
        .expect("Should have been able to read the file");

    --snip--
}

struct Config { // [5]
    query: String,
    file_path: String,
}

fn parse_config(args: &[String]) -> Config { // [6]
    let query = args[1].clone(); // [7]
    let file_path = args[2].clone(); // [8]

    Config { query, file_path }
}

Listing 12-6: Refactoring parse_config to return an instance of a Config struct

We've added a struct named Config defined to have fields named query and file_path [5]. The signature of parse_config now indicates that it returns a Config value [6]. In the body of parse_config, where we used to return string slices that reference String values in args, we now define Config to contain owned String values. The args variable in main is the owner of the argument values and is only letting the parse_config function borrow them, which means we'd violate Rust's borrowing rules if Config tried to take ownership of the values in args.

There are a number of ways we could manage the String data; the easiest, though somewhat inefficient, route is to call the clone method on the values [7] [8]. This will make a full copy of the data for the Config instance to own, which takes more time and memory than storing a reference to the string data. However, cloning the data also makes our code very straightforward because we don't have to manage the lifetimes of the references; in this circumstance, giving up a little performance to gain simplicity is a worthwhile trade-off.

The Trade-Offs of Using clone

There's a tendency among many Rustaceans to avoid using clone to fix ownership problems because of its runtime cost. In Chapter 13, you'll learn how to use more efficient methods in this type of situation. But for now, it's okay to copy a few strings to continue making progress because you'll make these copies only once and your file path and query string are very small. It's better to have a working program that's a bit inefficient than to try to hyperoptimize code on your first pass. As you become more experienced with Rust, it'll be easier to start with the most efficient solution, but for now, it's perfectly acceptable to call clone.
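For comparison only, here is a sketch of what avoiding clone could look like with the lifetimes from Chapter 10: a config that borrows string slices from args. The names BorrowedConfig and parse_borrowed are hypothetical, and this is not the approach the chapter takes; it is shown to make the trade-off visible.

```rust
// Alternative sketch, not the chapter's approach: a config that borrows from
// `args` instead of cloning. `BorrowedConfig` is a hypothetical name.
struct BorrowedConfig<'a> {
    query: &'a str,
    file_path: &'a str,
}

// The lifetime ties the returned struct to the borrowed slice, so the
// config cannot outlive the vector that owns the strings.
fn parse_borrowed(args: &[String]) -> BorrowedConfig<'_> {
    BorrowedConfig {
        query: &args[1],
        file_path: &args[2],
    }
}

fn main() {
    let args = vec![
        String::from("minigrep"),
        String::from("test"),
        String::from("sample.txt"),
    ];
    let config = parse_borrowed(&args);
    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);
}
```

No strings are copied here, but every function that stores or returns the config now has to carry the lifetime parameter, which is the extra bookkeeping that cloning lets us skip.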

We've updated main so it places the instance of Config returned by parse_config into a variable named config [1], and we updated the code that previously used the separate query and file_path variables so it now uses the fields on the Config struct instead [2] [3] [4].

Now our code more clearly conveys that query and file_path are related and that their purpose is to configure how the program will work. Any code that uses these values knows to find them in the config instance in the fields named for their purpose.

Creating a Constructor for Config

So far, we've extracted the logic responsible for parsing the command line arguments from main and placed it in the parse_config function. Doing so helped us see that the query and file_path values were related, and that relationship should be conveyed in our code. We then added a Config struct to name the related purpose of query and file_path and to be able to return the values' names as struct field names from the parse_config function.

So now that the purpose of the parse_config function is to create a Config instance, we can change parse_config from a plain function to a function named new that is associated with the Config struct. Making this change will make the code more idiomatic. We can create instances of types in the standard library, such as String, by calling String::new. Similarly, by changing parse_config into a new function associated with Config, we'll be able to create instances of Config by calling Config::new. Listing 12-7 shows the changes we need to make.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::new(&args); // [1]

    --snip--
}

--snip--

impl Config { // [2]
    fn new(args: &[String]) -> Config { // [3]
        let query = args[1].clone();
        let file_path = args[2].clone();

        Config { query, file_path }
    }
}

Listing 12-7: Changing parse_config into Config::new

We've updated main where we were calling parse_config to instead call Config::new [1]. We've changed the name of parse_config to new [3] and moved it within an impl block [2], which associates the new function with Config. Try compiling this code again to make sure it works.

Fixing the Error Handling

Now we'll work on fixing our error handling. Recall that attempting to access the values in the args vector at index 1 or index 2 will cause the program to panic if the vector contains fewer than three items. Try running the program without any arguments; it will look like this:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep`
thread 'main' panicked at 'index out of bounds: the len is 1 but
the index is 1', src/main.rs:27:21
note: run with `RUST_BACKTRACE=1` environment variable to display
a backtrace

The line index out of bounds: the len is 1 but the index is 1 is an error message intended for programmers. It won't help our end users understand what they should do instead. Let's fix that now.

Improving the Error Message

In Listing 12-8, we add a check in the new function that will verify that the slice is long enough before accessing index 1 and index 2. If the slice isn't long enough, the program panics and displays a better error message.

Filename: src/main.rs

--snip--
fn new(args: &[String]) -> Config {
    if args.len() < 3 {
        panic!("not enough arguments");
    }
    --snip--

Listing 12-8: Adding a check for the number of arguments

This code is similar to the Guess::new function we wrote in Listing 9-13, where we called panic! when the value argument was out of the range of valid values. Instead of checking for a range of values here, we're checking that the length of args is at least 3 and the rest of the function can operate under the assumption that this condition has been met. If args has fewer than three items, this condition will be true, and we call the panic! macro to end the program immediately.

With these extra few lines of code in new, let's run the program without any arguments again to see what the error looks like now:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep`
thread 'main' panicked at 'not enough arguments',
src/main.rs:26:13
note: run with `RUST_BACKTRACE=1` environment variable to display
a backtrace

This output is better: we now have a reasonable error message. However, we also have extraneous information we don't want to give to our users. Perhaps the technique we used in Listing 9-13 isn't the best one to use here: a call to panic! is more appropriate for a programming problem than a usage problem, as discussed in Chapter 9. Instead, we'll use the other technique you learned about in Chapter 9: returning a Result that indicates either success or an error.

Returning a Result Instead of Calling panic!

We can instead return a Result value that will contain a Config instance in the successful case and will describe the problem in the error case. We're also going to change the function name from new to build because many programmers expect new functions to never fail. When Config::build is communicating to main, we can use the Result type to signal there was a problem. Then we can change main to convert an Err variant into a more practical error for our users without the surrounding text about thread 'main' and RUST_BACKTRACE that a call to panic! causes.

Listing 12-9 shows the changes we need to make to the return value of the function we're now calling Config::build and the body of the function needed to return a Result. Note that this won't compile until we update main as well, which we'll do in the next listing.

Filename: src/main.rs

impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        Ok(Config { query, file_path })
    }
}

Listing 12-9: Returning a Result from Config::build

Our build function returns a Result with a Config instance in the success case and an &'static str in the error case. Our error values will always be string literals that have the 'static lifetime.

We've made two changes in the body of the function: instead of calling panic! when the user doesn't pass enough arguments, we now return an Err value, and we've wrapped the Config return value in an Ok. These changes make the function conform to its new type signature.

Returning an Err value from Config::build allows the main function to handle the Result value returned from the build function and exit the process more cleanly in the error case.
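Before wiring this into main, it may help to see the two outcomes of Config::build side by side. The sketch below repeats the struct and function so that it compiles on its own, and it uses a plain match with hard-coded arguments as an assumption for illustration; the chapter's main uses a different method to handle the Result.

```rust
struct Config {
    query: String,
    file_path: String,
}

impl Config {
    // Same shape as the chapter's `Config::build`.
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        Ok(Config {
            query: args[1].clone(),
            file_path: args[2].clone(),
        })
    }
}

fn main() {
    // Hard-coded arguments stand in for `env::args()` in this sketch.
    let too_few = vec![String::from("minigrep")];

    match Config::build(&too_few) {
        Ok(config) => println!("{} in {}", config.query, config.file_path),
        Err(message) => println!("build failed: {message}"),
    }
}
```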

Calling Config::build and Handling Errors

To handle the error case and print a user-friendly message, we need to update main to handle the Result being returned by Config::build, as shown in Listing 12-10. We'll also take the responsibility of exiting the command line tool with a nonzero error code away from panic! and instead implement it by hand. A nonzero exit status is a convention to signal to the process that called our program that the program exited with an error state.

Filename: src/main.rs

use std::process; // [1]

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| { // [2] [3] [4]
        println!("Problem parsing arguments: {err}"); // [5]
        process::exit(1); // [6]
    });

    --snip--

Listing 12-10: Exiting with an error code if building a Config fails

In this listing, we've used a method we haven't covered in detail yet: unwrap_or_else, which is defined on Result<T, E> by the standard library [2]. Using unwrap_or_else allows us to define some custom, non-panic! error handling. If the Result is an Ok value, this method's behavior is similar to unwrap: it returns the inner value that Ok is wrapping. However, if the value is an Err value, this method calls the code in the closure, which is an anonymous function we define and pass as an argument to unwrap_or_else [3]. We'll cover closures in more detail in Chapter 13. For now, you just need to know that unwrap_or_else will pass the inner value of the Err, which in this case is the static string "not enough arguments" that we added in Listing 12-9, to our closure in the argument err that appears between the vertical pipes [4]. The code in the closure can then use the err value when it runs.

We've added a new use line to bring process from the standard library into scope [1]. The code in the closure that will be run in the error case is only two lines: we print the err value [5] and then call process::exit [6]. The process::exit function will stop the program immediately and return the number that was passed as the exit status code. This is similar to the panic!-based handling we used in Listing 12-8, but we no longer get all the extra output. Let's try it:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.48s
     Running `target/debug/minigrep`
Problem parsing arguments: not enough arguments

Great! This output is much friendlier for our users.
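To isolate what unwrap_or_else does, here is a small sketch that is separate from minigrep. The function parse_limit and the default value of 10 are invented for this illustration; the closure here supplies a fallback value rather than exiting.

```rust
// Hypothetical example: `parse_limit` and the default of 10 are made up
// for illustration and are not part of minigrep.
fn parse_limit(input: &str) -> Result<u32, &'static str> {
    input.parse::<u32>().map_err(|_| "not a valid number")
}

fn limit_or_default(input: &str) -> u32 {
    // On `Ok`, the inner value is returned; on `Err`, the closure runs
    // with the error and its return value is used instead.
    parse_limit(input).unwrap_or_else(|err| {
        println!("Problem parsing limit: {err}; using 10");
        10
    })
}

fn main() {
    println!("{}", limit_or_default("25"));
    println!("{}", limit_or_default("many"));
}
```

In Listing 12-10 the closure never has to produce a Config, because process::exit does not return to its caller.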

Extracting Logic from main

Now that we've finished refactoring the configuration parsing, let's turn to the program's logic. As we stated in “Separation of Concerns for Binary Projects,” we'll extract a function named run that will hold all the logic currently in the main function that isn't involved with setting up configuration or handling errors. When we're done, main will be concise and easy to verify by inspection, and we'll be able to write tests for all the other logic.

Listing 12-11 shows the extracted run function. For now, we're just making the small, incremental improvement of extracting the function. We're still defining the function in src/main.rs.

Filename: src/main.rs

fn main() {
    --snip--

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    run(config);
}

fn run(config: Config) {
    let contents = fs::read_to_string(config.file_path)
        .expect("Should have been able to read the file");

    println!("With text:\n{contents}");
}

--snip--

Listing 12-11: Extracting a run function containing the rest of the program logic

The run function now contains all the remaining logic from main, starting from reading the file. The run function takes the Config instance as an argument.

Returning Errors from the run Function

With the remaining program logic separated into the run function, we can improve the error handling, as we did with Config::build in Listing 12-9. Instead of allowing the program to panic by calling expect, the run function will return a Result<T, E> when something goes wrong. This will let us further consolidate the logic around handling errors into main in a user-friendly way. Listing 12-12 shows the changes we need to make to the signature and body of run.

Filename: src/main.rs

use std::error::Error; // [1]

--snip--

fn run(config: Config) -> Result<(), Box<dyn Error>> { // [2]
    let contents = fs::read_to_string(config.file_path)?; // [3]

    println!("With text:\n{contents}");

    Ok(()) // [4]
}

Listing 12-12: Changing the run function to return Result

We've made three significant changes here. First, we changed the return type of the run function to Result<(), Box<dyn Error>> [2]. This function previously returned the unit type, (), and we keep that as the value returned in the Ok case.

For the error type, we used the trait object Box<dyn Error> (and we've brought std::error::Error into scope with a use statement at the top [1]). We'll cover trait objects in Chapter 17. For now, just know that Box<dyn Error> means the function will return a type that implements the Error trait, but we don't have to specify what particular type the return value will be. This gives us flexibility to return error values that may be of different types in different error cases. The dyn keyword is short for dynamic.

Second, we've removed the call to expect in favor of the ? operator [3], as we talked about in Chapter 9. Rather than panic! on an error, ? will return the error value from the current function for the caller to handle.

Third, the run function now returns an Ok value in the success case [4]. We've declared the run function's success type as () in the signature, which means we need to wrap the unit type value in the Ok value. This Ok(()) syntax might look a bit strange at first, but using () like this is the idiomatic way to indicate that we're calling run for its side effects only; it doesn't return a value we need.
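The flexibility of Box<dyn Error> is easier to see when one function can fail in two different ways. The following sketch is independent of minigrep, and the helper parse_number is hypothetical: it uses ? on two operations whose error types differ, and both are converted into the boxed trait object.

```rust
use std::error::Error;
use std::str;

// Hypothetical helper: two different error types (`Utf8Error` and
// `ParseIntError`) both convert into `Box<dyn Error>` through `?`.
fn parse_number(bytes: &[u8]) -> Result<i32, Box<dyn Error>> {
    let text = str::from_utf8(bytes)?; // may fail with `Utf8Error`
    let number = text.trim().parse::<i32>()?; // may fail with `ParseIntError`
    Ok(number)
}

fn main() {
    let inputs: [&[u8]; 3] = [b"42", b"forty-two", &[0xff]];

    for input in inputs {
        match parse_number(input) {
            Ok(n) => println!("parsed {n}"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```

The caller only sees something that implements Error and can print it, which is all that main needs in order to report the problem to the user.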

When you run this code, it will compile but will display a warning:

warning: unused `Result` that must be used
  --> src/main.rs:19:5
   |
19 |     run(config);
   |     ^^^^^^^^^^^^
   |
   = note: `#[warn(unused_must_use)]` on by default
   = note: this `Result` may be an `Err` variant, which should be
handled

Rust tells us that our code ignored the Result value and the Result value might indicate that an error occurred. But we're not checking to see whether or not there was an error, and the compiler reminds us that we probably meant to have some error-handling code here! Let's rectify that problem now.

Handling Errors Returned from run in main

We'll check for errors and handle them using a technique similar to one we used with Config::build in Listing 12-10, but with a slight difference:

Filename: src/main.rs

fn main() {
    --snip--

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    if let Err(e) = run(config) {
        println!("Application error: {e}");
        process::exit(1);
    }
}

We use if let rather than unwrap_or_else to check whether run returns an Err value and to call process::exit(1) if it does. The run function doesn't return a value that we want to unwrap in the same way that Config::build returns the Config instance. Because run returns () in the success case, we only care about detecting an error, so we don't need unwrap_or_else to return the unwrapped value, which would only be ().

The bodies of the if let and the unwrap_or_else functions are the same in both cases: we print the error and exit.

Splitting Code into a Library Crate

Our minigrep project is looking good so far! Now we'll split the src/main.rs file and put some code into the src/lib.rs file. That way, we can test the code and have a src/main.rs file with fewer responsibilities.

Let's move all the code that isn't in the main function from src/main.rs to src/lib.rs:

  • The run function definition
  • The relevant use statements
  • The definition of Config
  • The Config::build function definition

The contents of src/lib.rs should have the signatures shown in Listing 12-13 (we've omitted the bodies of the functions for brevity). Note that this won't compile until we modify src/main.rs in Listing 12-14.

Filename: src/lib.rs

use std::error::Error;
use std::fs;

pub struct Config {
    pub query: String,
    pub file_path: String,
}

impl Config {
    pub fn build(
        args: &[String],
    ) -> Result<Config, &'static str> {
        --snip--
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    --snip--
}

Listing 12-13: Moving Config and run into src/lib.rs

We've made liberal use of the pub keyword: on Config, on its fields and its build method, and on the run function. We now have a library crate that has a public API we can test!

Now we need to bring the code we moved to src/lib.rs into the scope of the binary crate in src/main.rs, as shown in Listing 12-14.

Filename: src/main.rs

use std::env;
use std::process;

use minigrep::Config;

fn main() {
    --snip--
    if let Err(e) = minigrep::run(config) {
        --snip--
    }
}

Listing 12-14: Using the minigrep library crate in src/main.rs

We add a use minigrep::Config line to bring the Config type from the library crate into the binary crate's scope, and we prefix the run function with our crate name. Now all the functionality should be connected and should work. Run the program with cargo run and make sure everything works correctly.

Whew! That was a lot of work, but we've set ourselves up for success in the future. Now it's much easier to handle errors, and we've made the code more modular. Almost all of our work will be done in src/lib.rs from here on out.

Let's take advantage of this newfound modularity by doing something that would have been difficult with the old code but is easy with the new code: we'll write some tests!

Developing the Library's Functionality with Test-Driven Development

Now that we've extracted the logic into src/lib.rs and left the argument collecting and error handling in src/main.rs, it's much easier to write tests for the core functionality of our code. We can call functions directly with various arguments and check return values without having to call our binary from the command line.

In this section, we'll add the searching logic to the minigrep program using the test-driven development (TDD) process with the following steps:

  1. Write a test that fails and run it to make sure it fails for the reason you expect.
  2. Write or modify just enough code to make the new test pass.
  3. Refactor the code you just added or changed and make sure the tests continue to pass.
  4. Repeat from step 1!

Though it's just one of many ways to write software, TDD can help drive code design. Writing the test before you write the code that makes the test pass helps to maintain high test coverage throughout the process.

We'll test-drive the implementation of the functionality that will actually do the searching for the query string in the file contents and produce a list of lines that match the query. We'll add this functionality in a function called search.

Writing a Failing Test

Because we don't need them anymore, let's remove the println! statements from src/lib.rs and src/main.rs that we used to check the program's behavior. Then, in src/lib.rs, we'll add a tests module with a test function, as we did in Chapter 11. The test function specifies the behavior we want the search function to have: it will take a query and the text to search, and it will return only the lines from the text that contain the query. Listing 12-15 shows this test, which won't compile yet.

Filename: src/lib.rs

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(
            vec!["safe, fast, productive."],
            search(query, contents)
        );
    }
}

Listing 12-15: Creating a failing test for the search function we wish we had

This test searches for the string "duct". The text we're searching is three lines, only one of which contains "duct" (note that the backslash after the opening double quote tells Rust not to put a newline character at the beginning of the contents of this string literal). We assert that the value returned from the search function contains only the line we expect.

We aren't yet able to run this test and watch it fail because the test doesn't even compile: the search function doesn't exist yet! In accordance with TDD principles, we'll add just enough code to get the test to compile and run by adding a definition of the search function that always returns an empty vector, as shown in Listing 12-16. Then the test should compile and fail because an empty vector doesn't match a vector containing the line "safe, fast, productive.".

Filename: src/lib.rs

pub fn search<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    vec![]
}

Listing 12-16: Defining just enough of the search function so our test will compile

Notice that we need to define an explicit lifetime 'a in the signature of search and use that lifetime with the contents argument and the return value. Recall from Chapter 10 that the lifetime parameters specify which argument lifetime is connected to the lifetime of the return value. In this case, we indicate that the returned vector should contain string slices that reference slices of the argument contents (rather than the argument query).

In other words, we tell Rust that the data returned by the search function will live as long as the data passed into the search function in the contents argument. This is important! The data referenced by a slice needs to be valid for the reference to be valid; if the compiler assumes we're making string slices of query rather than contents, it will do its safety checking incorrectly.

If we forget the lifetime annotations and try to compile this function, we'll get this error:

error[E0106]: missing lifetime specifier
  --> src/lib.rs:31:10
   |
29 |     query: &str,
   |            ----
30 |     contents: &str,
   |               ----
31 | ) -> Vec<&str> {
   |          ^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but the
signature does not say whether it is borrowed from `query` or `contents`
help: consider introducing a named lifetime parameter
   |
28 ~ pub fn search<'a>(
29 ~     query: &'a str,
30 ~     contents: &'a str,
31 ~ ) -> Vec<&'a str> {
   |

Rust can't possibly know which of the two arguments we need, so we need to tell it explicitly. Because contents is the argument that contains all of our text and we want to return the parts of that text that match, we know contents is the argument that should be connected to the return value using the lifetime syntax.

Other programming languages don't require you to connect arguments to return values in the signature, but this practice will get easier over time. You might want to compare this example with the examples in “Validating References with Lifetimes” in Chapter 10.
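The following sketch shows what tying the return value to contents, and not to query, allows a caller to do. It assumes a working body for search (the same loop the chapter builds in later listings) so that it can run on its own; the inner block exists only to give query a shorter scope.

```rust
// Same signature as in the chapter; the body is filled in so the sketch runs.
fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }
    results
}

fn main() {
    let contents = String::from("Rust:\nsafe, fast, productive.\nPick three.");

    let results = {
        // `query` lives only inside this inner block.
        let query = String::from("duct");
        search(&query, &contents)
    }; // `query` is dropped here; `results` borrows only from `contents`.

    println!("{results:?}");
}
```

This compiles because the signature promises that the returned slices borrow from contents alone. If the return type used the lifetime of query instead, the compiler would reject the use of results after query goes out of scope.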

Now let's run the test:

$ cargo test
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished test [unoptimized + debuginfo] target(s) in 0.97s
     Running unittests src/lib.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 1 test
test tests::one_result ... FAILED

failures:

---- tests::one_result stdout ----
thread 'tests::one_result' panicked at 'assertion failed: `(left == right)`
  left: `["safe, fast, productive."]`,
 right: `[]`', src/lib.rs:47:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::one_result

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out;
finished in 0.00s

error: test failed, to rerun pass '--lib'

Great, the test fails, exactly as we expected. Let's get the test to pass!

Writing Code to Pass the Test

Currently, our test is failing because we always return an empty vector. To fix that and implement search, our program needs to follow these steps:

  1. Iterate through each line of the contents.
  2. Check whether the line contains our query string.
  3. If it does, add it to the list of values we're returning.
  4. If it doesn't, do nothing.
  5. Return the list of results that match.

Let's work through each step, starting with iterating through lines.

Iterating Through Lines with the lines Method

Rust has a helpful method to handle line-by-line iteration of strings, conveniently named lines, that works as shown in Listing 12-17. Note that this won't compile yet.

Filename: src/lib.rs

pub fn search<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    for line in contents.lines() {
        // do something with line
    }
}

Listing 12-17: Iterating through each line in contents

The lines method returns an iterator. We'll talk about iterators in depth in Chapter 13, but recall that you saw this way of using an iterator in Listing 3-5, where we used a for loop with an iterator to run some code on each item in a collection.
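As a small standalone illustration of what lines produces, the sketch below collects the items into a vector. The split_lines helper is a hypothetical name used only here. The input mixes \n and \r\n line endings, contains an empty line, and ends with a trailing newline.

```rust
// Hypothetical helper that collects the items produced by `lines`.
fn split_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

fn main() {
    let contents = "Rust:\r\nsafe, fast, productive.\n\nPick three.\n";
    for (number, line) in split_lines(contents).iter().enumerate() {
        println!("{number}: {line:?}");
    }
}
```

Each item has its line ending removed, the empty line appears as an empty string slice, and the trailing newline does not produce an extra empty item at the end.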

Searching Each Line for the Query

Next, we'll check whether the current line contains our query string. Fortunately, strings have a helpful method named contains that does this for us! Add a call to the contains method in the search function, as shown in Listing 12-18. Note that this still won't compile yet.

Filename: src/lib.rs

pub fn search<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    for line in contents.lines() {
        if line.contains(query) {
            // do something with line
        }
    }
}

Listing 12-18: Adding functionality to see whether the line contains the string in query

At the moment, we're building up functionality. To get the code to compile, we need to return a value from the body as we indicated we would in the function signature.
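Two properties of contains matter for the rest of this chapter: it matches substrings anywhere in the line, and it is case sensitive. A quick standalone check, using the line from our test:

```rust
fn main() {
    let line = "safe, fast, productive.";

    // "duct" occurs inside the word "productive", so this is a match.
    println!("{}", line.contains("duct"));
    // The comparison is case sensitive, so "Duct" is not found.
    println!("{}", line.contains("Duct"));
}
```

The first call prints true and the second prints false.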

Storing Matching Lines

To finish this function, we need a way to store the matching lines that we want to return. For that, we can make a mutable vector before the for loop and call the push method to store a line in the vector. After the for loop, we return the vector, as shown in Listing 12-19.

Filename: src/lib.rs

pub fn search<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

Listing 12-19: Storing the lines that match so we can return them

Now the search function should return only the lines that contain query, and our test should pass. Let's run the test:

$ cargo test
--snip--
running 1 test
test tests::one_result ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0
filtered out; finished in 0.00s

Our test passed, so we know it works!

At this point, we could consider opportunities for refactoring the implementation of the search function while keeping the tests passing to maintain the same functionality. The code in the search function isn't too bad, but it doesn't take advantage of some useful features of iterators. We'll return to this example in Chapter 13, where we'll explore iterators in detail, and look at how to improve it.
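As a preview only, here is one possible shape such a refactoring could take, using the filter and collect iterator methods from the standard library. The name search_with_iterators is hypothetical, chosen so both versions can sit side by side; treat this as a sketch rather than the version Chapter 13 arrives at.

```rust
// Loop-based version, as in Listing 12-19.
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }
    results
}

// Hypothetical iterator-based variant with the same signature and behavior.
pub fn search_with_iterators<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}

fn main() {
    let contents = "Rust:\nsafe, fast, productive.\nPick three.";
    // Both versions should agree on every input.
    assert_eq!(
        search("duct", contents),
        search_with_iterators("duct", contents)
    );
    println!("{:?}", search_with_iterators("duct", contents));
}
```

Because the existing test exercises only the function's inputs and outputs, the same test can guard a refactoring like this one.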

Using the search Function in the run Function

Now that the search function is working and tested, we need to call search from our run function. We need to pass the config.query value and the contents that run reads from the file to the search function. Then run will print each line returned from search:

Filename: src/lib.rs

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    for line in search(&config.query, &contents) {
        println!("{line}");
    }

    Ok(())
}

We're still using a for loop, this time to take each line returned from search and print it.

Now the entire program should work! Let's try it out, first with a word that should return exactly one line from the Emily Dickinson poem: frog.

$ cargo run -- frog poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.38s
     Running `target/debug/minigrep frog poem.txt`
How public, like a frog

Cool! Now let's try a word that will match multiple lines, like body:

$ cargo run -- body poem.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep body poem.txt`
I'm nobody! Who are you?
Are you nobody, too?
How dreary to be somebody!

And finally, let's make sure that we don't get any lines when we search for a word that isn't anywhere in the poem, such as monomorphization:

$ cargo run -- monomorphization poem.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep monomorphization poem.txt`

Excellent! We've built our own mini version of a classic tool and learned a lot about how to structure applications. We've also learned a bit about file input and output, lifetimes, testing, and command line parsing.

To round out this project, we'll briefly demonstrate how to work with environment variables and how to print to standard error, both of which are useful when you're writing command line programs.

Working with Environment Variables

We'll improve minigrep by adding an extra feature: an option for case-insensitive searching that the user can turn on via an environment variable. We could make this feature a command line option and require that users enter it each time they want it to apply, but by instead making it an environment variable, we allow our users to set the environment variable once and have all their searches be case insensitive in that terminal session.

Writing a Failing Test for the Case-Insensitive search Function

We first add a new search_case_insensitive function that will be called when the environment variable has a value. We'll continue to follow the TDD process, so the first step is again to write a failing test. We'll add a new test for the new search_case_insensitive function and rename our old test from one_result to case_sensitive to clarify the differences between the two tests, as shown in Listing 12-20.

Filename: src/lib.rs

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn case_sensitive() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Duct tape.";

        assert_eq!(
            vec!["safe, fast, productive."],
            search(query, contents)
        );
    }

    #[test]
    fn case_insensitive() {
        let query = "rUsT";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

        assert_eq!(
            vec!["Rust:", "Trust me."],
            search_case_insensitive(query, contents)
        );
    }
}

Listing 12-20: Adding a new failing test for the case-insensitive function we're about to add

Note that we've edited the old test's contents too. We've added a new line with the text "Duct tape." using a capital D that shouldn't match the query "duct" when we're searching in a case-sensitive manner. Changing the old test in this way helps ensure that we don't accidentally break the case-sensitive search functionality that we've already implemented. This test should pass now and should continue to pass as we work on the case-insensitive search.

The new test for the case-insensitive search uses "rUsT" as its query. In the search_case_insensitive function we're about to add, the query "rUsT" should match the line containing "Rust:" with a capital R and match the line "Trust me." even though both have different casing from the query. This is our failing test, and it will fail to compile because we haven't yet defined the search_case_insensitive function. Feel free to add a skeleton implementation that always returns an empty vector, similar to the way we did for the search function in Listing 12-16, to see the test compile and fail.
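Such a skeleton might look like the sketch below. The leading underscores on the parameter names are our own choice to silence unused-variable warnings; the main function is included only so the snippet runs on its own.

```rust
// Skeleton only: always returns an empty vector so the test can compile
// and then fail for the expected reason.
pub fn search_case_insensitive<'a>(
    _query: &str,
    _contents: &'a str,
) -> Vec<&'a str> {
    vec![]
}

fn main() {
    let found = search_case_insensitive("rUsT", "Rust:\nTrust me.");
    println!("{found:?}");
}
```

With this in place, the case_insensitive test compiles and fails because an empty vector is not equal to the two expected lines.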

Implementing the search_case_insensitive Function

The search_case_insensitive function, shown in Listing 12-21, will be almost the same as the search function. The only difference is that we'll lowercase the query and each line so that whatever the case of the input arguments, they'll be the same case when we check whether the line contains the query.

Filename: src/lib.rs

pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
  1 let query = query.to_lowercase();
    let mut results = Vec::new();

    for line in contents.lines() {
        if 2 line.to_lowercase().contains(3 &query) {
            results.push(line);
        }
    }

    results
}

Listing 12-21: Defining the search_case_insensitive function to lowercase the query and the line before comparing them

First we lowercase the query string and store it in a shadowed variable with the same name [1]. Calling to_lowercase on the query is necessary so that no matter whether the user's query is "rust", "RUST", "Rust", or "rUsT", we'll treat the query as if it were "rust" and be insensitive to the case. While to_lowercase will handle basic Unicode, it won't be 100% accurate. If we were writing a real application, we'd want to do a bit more work here, but this section is about environment variables, not Unicode, so we'll leave it at that here.

Note that query is now a String rather than a string slice because calling to_lowercase creates new data rather than referencing existing data. Say the query is "rUsT", as an example: that string slice doesn't contain a lowercase u or t for us to use, so we have to allocate a new String containing "rust". When we pass query as an argument to the contains method now, we need to add an ampersand [3] because the signature of contains is defined to take a string slice.

Next, we add a call to to_lowercase on each line to lowercase all characters [2]. Now that we've converted line and query to lowercase, we'll find matches no matter what the case of the query is.
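To make the caveat about to_lowercase and Unicode concrete, here is a small sketch. The matches_ignoring_case helper is hypothetical and simply applies the same lowercase-then-contains comparison to a single line. The German example relies on the fact that the lowercase letter ß has no single-character uppercase form and is conventionally written SS in capitals.

```rust
// Hypothetical helper: the same comparison as in Listing 12-21,
// applied to one line.
fn matches_ignoring_case(query: &str, line: &str) -> bool {
    line.to_lowercase().contains(&query.to_lowercase())
}

fn main() {
    // ASCII letters behave as expected.
    println!("{}", matches_ignoring_case("rUsT", "Trust me."));

    // "STRASSE" lowercases to "strasse", but the line still contains "ß",
    // so a reader who expects a match here will not get one.
    println!("{}", matches_ignoring_case("STRASSE", "Die Straße ist lang."));
}
```

The first call prints true and the second prints false. Handling cases like this would take more than lowercasing both sides, which is outside the scope of this chapter.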

Let's see if this implementation passes the tests:

running 2 tests
test tests::case_insensitive ... ok
test tests::case_sensitive ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0
filtered out; finished in 0.00s

Great! They passed. Now, let's call the new search_case_insensitive function from the run function. First we'll add a configuration option to the Config struct to switch between case-sensitive and case-insensitive search. Adding this field will cause compiler errors because we aren't initializing this field anywhere yet:

Filename: src/lib.rs

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

We added the ignore_case field that holds a Boolean. Next, we need the run function to check the ignore_case field's value and use that to decide whether to call the search function or the search_case_insensitive function, as shown in Listing 12-22. This still won't compile yet.

Filename: src/lib.rs

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

Listing 12-22: Calling either search or search_case_insensitive based on the value in config.ignore_case

Finally, we need to check for the environment variable. The functions for working with environment variables are in the env module in the standard library, so we bring that module into scope at the top of src/lib.rs. Then we'll use the var function from the env module to check to see if any value has been set for an environment variable named IGNORE_CASE, as shown in Listing 12-23.

Filename: src/lib.rs

use std::env;
--snip--

impl Config {
    pub fn build(
        args: &[String]
    ) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

Listing 12-23: Checking for any value in an environment variable named IGNORE_CASE

Here, we create a new variable, ignore_case. To set its value, we call the env::var function and pass it the name of the IGNORE_CASE environment variable. The env::var function returns a Result that will be the successful Ok variant that contains the value of the environment variable if the environment variable is set to any value. It will return the Err variant if the environment variable is not set (or if its value isn't valid Unicode).

We're using the is_ok method on the Result to check whether the environment variable is set, which means the program should do a case-insensitive search. If the IGNORE_CASE environment variable isn't set to anything, is_ok will return false and the program will perform a case-sensitive search. We don't care about the value of the environment variable, just whether it's set or unset, so we're checking is_ok rather than using unwrap, expect, or any of the other methods we've seen on Result.

We pass the value in the ignore_case variable to the Config instance so the run function can read that value and decide whether to call search_case_insensitive or search, as we implemented in Listing 12-22.
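One consequence of checking only is_ok is that any value turns the feature on, including values such as 0. The sketch below illustrates this without modifying the real environment: ignore_case_from is a hypothetical helper that takes the result of the lookup as a parameter so that different outcomes can be tried directly.

```rust
use std::env::{self, VarError};

// Hypothetical helper mirroring the `is_ok` check in Listing 12-23.
// It receives the lookup result instead of performing the lookup itself.
fn ignore_case_from(lookup: Result<String, VarError>) -> bool {
    lookup.is_ok()
}

fn main() {
    // The real lookup; the result depends on how the program was started.
    println!("from environment: {}", ignore_case_from(env::var("IGNORE_CASE")));

    // Simulated outcomes of the lookup.
    println!("set to 1: {}", ignore_case_from(Ok(String::from("1"))));
    println!("set to 0: {}", ignore_case_from(Ok(String::from("0"))));
    println!("not set:  {}", ignore_case_from(Err(VarError::NotPresent)));
}
```

The simulated cases print true, true, and false. If you wanted IGNORE_CASE=0 to mean "off," you would need to inspect the value inside the Ok variant instead of calling is_ok.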

Let's give it a try! First we'll run our program without the environment variable set and with the query to, which should match any line that contains the word to in all lowercase:

$ cargo run -- to poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep to poem.txt`
Are you nobody, too?
How dreary to be somebody!

Looks like that still works! Now let's run the program with IGNORE_CASE set to 1 but with the same query to:

$ IGNORE_CASE=1 cargo run -- to poem.txt

If you're using PowerShell, you will need to set the environment variable and run the program as separate commands:

PS> $Env:IGNORE_CASE=1; cargo run -- to poem.txt

This will make IGNORE_CASE persist for the remainder of your shell session. It can be unset with the Remove-Item cmdlet:

PS> Remove-Item Env:IGNORE_CASE

We should get lines that contain to that might have uppercase letters:

Are you nobody, too?
How dreary to be somebody!
To tell your name the livelong day
To an admiring bog!

Excellent, we also got lines containing To! Our minigrep program can now do case-insensitive searching controlled by an environment variable. Now you know how to manage options set using either command line arguments or environment variables.

Some programs allow arguments and environment variables for the same configuration. In those cases, the programs decide that one or the other takes precedence. For another exercise on your own, try controlling case sensitivity through either a command line argument or an environment variable. Decide whether the command line argument or the environment variable should take precedence if the program is run with one set to case sensitive and one set to ignore case.
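If you want a starting point for that exercise, here is one possible sketch in which the command line wins over the environment. Everything in it is an assumption made for illustration: the flag names --ignore-case and --case-sensitive, and the helper functions flag_from_args and resolve_ignore_case, are hypothetical and not part of minigrep as built in this chapter.

```rust
// Hypothetical flags: returns Some(true) for `--ignore-case`,
// Some(false) for `--case-sensitive`, and None if neither is present.
// If both appear, the one given last wins.
fn flag_from_args(args: &[String]) -> Option<bool> {
    args.iter().rev().find_map(|arg| match arg.as_str() {
        "--ignore-case" => Some(true),
        "--case-sensitive" => Some(false),
        _ => None,
    })
}

// The command line flag takes precedence; the environment variable is
// consulted only when no flag was given.
fn resolve_ignore_case(flag: Option<bool>, env_is_set: bool) -> bool {
    flag.unwrap_or(env_is_set)
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let env_is_set = std::env::var("IGNORE_CASE").is_ok();

    let ignore_case = resolve_ignore_case(flag_from_args(&args), env_is_set);
    println!("ignore_case = {ignore_case}");
}
```

Keeping the precedence rule in a small function that takes plain values makes it easy to test each combination without touching the real arguments or environment. Giving the environment variable priority instead would be an equally valid design, as long as the choice is documented.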

The std::env module contains many more useful features for dealing with environment variables: check out its documentation to see what is available.

Writing Error Messages to Standard Error Instead of Standard Output

At the moment, we're writing all of our output to the terminal using the println! macro. In most terminals, there are two kinds of output: standard output (stdout) for general information and standard error (stderr) for error messages. This distinction enables users to choose to direct the successful output of a program to a file but still print error messages to the screen.

The println! macro is only capable of printing to standard output, so we have to use something else to print to standard error.

Checking Where Errors Are Written

First let's observe how the content printed by minigrep is currently being written to standard output, including any error messages we want to write to standard error instead. We'll do that by redirecting the standard output stream to a file while intentionally causing an error. We won't redirect the standard error stream, so any content sent to standard error will continue to display on the screen.

Command line programs are expected to send error messages to the standard error stream so we can still see error messages on the screen even if we redirect the standard output stream to a file. Our program is not currently well behaved: we're about to see that it saves the error message output to a file instead!

To demonstrate this behavior, we'll run the program with > and the file path, output.txt, that we want to redirect the standard output stream to. We won't pass any arguments, which should cause an error:

$ cargo run > output.txt

The > syntax tells the shell to write the contents of standard output to output.txt instead of the screen. We didn't see the error message we were expecting printed to the screen, so that means it must have ended up in the file. This is what output.txt contains:

Problem parsing arguments: not enough arguments

Yup, our error message is being printed to standard output. It's much more useful for error messages like this to be printed to standard error so only data from a successful run ends up in the file. We'll change that.

Printing Errors to Standard Error

We'll use the code in Listing 12-24 to change how error messages are printed. Because of the refactoring we did earlier in this chapter, all the code that prints error messages is in one function, main. The standard library provides the eprintln! macro that prints to the standard error stream, so let's change the two places we were calling println! to print errors to use eprintln! instead.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = minigrep::run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

Listing 12-24: Writing error messages to standard error instead of standard output using eprintln!

Let's now run the program again in the same way, without any arguments and redirecting standard output with >:

$ cargo run > output.txt
Problem parsing arguments: not enough arguments

Now we see the error onscreen and output.txt contains nothing, which is the behavior we expect of command line programs.

Let's run the program again with arguments that don't cause an error but still redirect standard output to a file, like so:

$ cargo run -- to poem.txt > output.txt

We won't see any output to the terminal, and output.txt will contain our results:

Filename: output.txt

Are you nobody, too?
How dreary to be somebody!

This demonstrates that we're now using standard output for successful output and standard error for error output as appropriate.
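The separation between the two streams can also be expressed with the std::io::Write trait, which both io::stdout() and io::stderr() implement. The sketch below is an illustration, not a change to minigrep: the report function and its "no matching lines" diagnostic are hypothetical. Taking the two destinations as parameters lets the same code write to the real streams or to in-memory buffers.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes matching lines to `out` and a diagnostic
// message to `err` when there is nothing to report.
fn report<O: Write, E: Write>(
    matches: &[&str],
    out: &mut O,
    err: &mut E,
) -> io::Result<()> {
    if matches.is_empty() {
        writeln!(err, "no matching lines")?;
    }
    for line in matches {
        writeln!(out, "{line}")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Results go to standard output, diagnostics to standard error.
    report(
        &["Are you nobody, too?", "How dreary to be somebody!"],
        &mut io::stdout(),
        &mut io::stderr(),
    )
}
```

Running this with > output.txt would place the two lines in the file, while anything written to err would still appear on the screen. For a program as small as minigrep, println! and eprintln! remain the simpler choice.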

Summary

This chapter recapped some of the major concepts you've learned so far and covered how to perform common I/O operations in Rust. By using command line arguments, files, environment variables, and the eprintln! macro for printing errors, you're now prepared to write command line applications. Combined with the concepts in previous chapters, your code will be well organized, store data effectively in the appropriate data structures, handle errors nicely, and be well tested.

Next, we'll explore some Rust features that were influenced by functional languages: closures and iterators.