# Benchmarking

This repository includes benchmarks for multiple use cases. They are described below, along with information on how to run them.

## Throughput and memory usage benchmarks

These benchmarks measure the throughput and memory footprint you get from rustls. They have been used in the past to compare performance against OpenSSL (see the results of December 2023 and July 2019). You can also use them to evaluate rustls' performance on different hardware (e.g. a bare-metal server with support for AVX-512 instructions vs. a cloud VM with a consumer-grade CPU).

The measured aspects are:

1. Bulk data transfer throughput, in MiB/s;
2. Handshake throughput (full handshakes, session ID resumption, and ticket resumption), in handshakes per second;
3. Memory usage per connection.

If you are interested in comparing against OpenSSL, check out the twin OpenSSL benchmarks, which produce similar measurements.

### Building

The benchmarks are implemented in the form of "example code" in `rustls/examples/internal/bench.rs`. Use `cargo build --release -p rustls --example bench` to obtain the corresponding binary (you can toggle conditionally compiled code with the `--no-default-features` and `--features` flags).
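
As a small sketch, the default build and the location of the resulting binary look like this (the feature names to pass, if any, are not spelled out here; check `rustls/Cargo.toml` for the ones that currently exist):

```sh
# Build the bench example with the default feature set
cargo build --release -p rustls --example bench

# Cargo places the resulting binary at target/release/examples/bench;
# feature toggles are passed in the usual way, e.g.:
#   cargo build --release -p rustls --example bench --no-default-features --features <...>
# (check rustls/Cargo.toml for the feature names that currently exist)
```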

Note: while `cargo build --release --example bench` also works, it can produce surprises when combined with `--no-default-features` because of how Cargo's feature unification works (other crates in the workspace may re-enable some features automatically).

### Running

There is a Makefile at `admin/bench-measure.mk` providing useful commands to facilitate benchmarking:

- `make measure`: runs bulk transfer and handshake throughput benchmarks using a predefined list of cipher suites.
- `make memory`: measures memory usage for different numbers of connections.

You can inspect the Makefile to get an idea of the command-line arguments accepted by `bench`. With the right arguments, you can run benchmarks for other cipher suites (through `cargo run --release` or by directly launching the compiled binary).
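
For illustration, the invocations below follow the pattern used in `admin/bench-measure.mk`; the subcommand and cipher-suite arguments are examples only, so check the Makefile for the exact argument syntax `bench` currently accepts:

```sh
# Run the predefined suites through the Makefile (from the repository root)
make -f admin/bench-measure.mk measure
make -f admin/bench-measure.mk memory

# Direct invocation of the compiled binary; the arguments are illustrative,
# see admin/bench-measure.mk for the forms that bench actually accepts
./target/release/examples/bench bulk TLS13_AES_256_GCM_SHA384
cargo run --release -p rustls --example bench -- handshake TLS13_AES_256_GCM_SHA384
```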

### Reducing noise

We usually extend the duration of the benchmarks in an attempt to neutralize the effect of cold CPU and page caches, giving us more accurate results. This is done through the `BENCH_MULTIPLIER` environment variable, which tells the benchmark runner to multiply the amount of work done. For instance, `BENCH_MULTIPLIER=8` will ensure we do 8 times the work.
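
Since the multiplier is read from the environment by the benchmark runner, it can be set on either invocation style; a minimal sketch:

```sh
# Do 8x the default amount of work per benchmark to amortize cold-cache effects
BENCH_MULTIPLIER=8 make -f admin/bench-measure.mk measure

# The variable works the same way when launching the binary directly
BENCH_MULTIPLIER=8 ./target/release/examples/bench bulk TLS13_AES_256_GCM_SHA384
```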

Additional ways to reduce noise are listed below (a sketch of the relevant commands follows the list):

- Disabling ASLR (through `setarch -R`).
- Disabling CPU dynamic frequency scaling (usually at the BIOS/UEFI level).
- Disabling CPU hyper-threading (usually at the BIOS/UEFI level).
- Setting the Linux CPU governor to `performance` for all cores.
- Running the benchmarks multiple times (e.g. 30) and taking the median for each scenario (the December 2023 results include Python code doing this).
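
The ASLR and governor items can be scripted; the commands below are a sketch for a typical Linux setup and assume the `setarch` (util-linux) and `cpupower` utilities are installed:

```sh
# Disable ASLR only for the benchmark process and its children
setarch "$(uname -m)" -R make -f admin/bench-measure.mk measure

# Set the CPU frequency governor to performance on all cores
# (requires root privileges)
sudo cpupower frequency-set -g performance
```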

## CI benchmarks

These benchmarks are meant to provide automated and accurate feedback on a PR's performance impact compared to the main branch. Automating them ensures they are regularly used; keeping them accurate ensures they are actionable (too much noise would train reviewers to ignore the information).

The benchmarks themselves are located under `ci-bench`, together with a detailed readme (including instructions on how to run them locally). The automated runner lives in its own repository and is deployed to a bare-metal machine to ensure low-noise results.

## Nightly benchmarks

There are some `#[bench]` benchmarks spread throughout the codebase. We do not use them systematically, but they help us understand the performance of smaller pieces of code (one or two functions), which would be difficult to see when the unit of benchmarking is an entire handshake.

These benchmarks require a nightly compiler. If you are using rustup, you can run them with `RUSTFLAGS=--cfg=bench cargo +nightly bench`.
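
For instance, you can run the whole set and then narrow down via the standard libtest name filter (the filter string below is just an illustrative placeholder, not a specific benchmark name):

```sh
# Run every #[bench] micro-benchmark (needs a nightly toolchain)
RUSTFLAGS=--cfg=bench cargo +nightly bench

# Run only the benchmarks whose names contain the given substring
RUSTFLAGS=--cfg=bench cargo +nightly bench -- handshake
```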