Add BENCHMARKING.md

This file is meant as an entry point for users and contributors who are
interested in benchmarking rustls. It is linked from the readme so
people can find it easily.

Closes #1478 and #1685
This commit is contained in:
Adolfo Ochagavía 2023-12-19 11:36:39 +01:00 committed by Joe Birr-Pixton
parent 23167ecad6
commit 93228ebdbf
2 changed files with 85 additions and 1 deletions

BENCHMARKING.md Normal file

@ -0,0 +1,83 @@
# Benchmarking
This repository includes benchmarks for multiple use cases. They are described below, along with
information on how to run them.
## Throughput and memory usage benchmarks
These benchmarks measure the throughput and memory footprint you get from rustls. They have been
used in the past to compare performance against OpenSSL (see the results of [December
2023](https://github.com/aochagavia/rustls-bench-results) and [July
2019](https://jbp.io/2019/07/01/rustls-vs-openssl-performance.html)). You can also use them to
evaluate rustls' performance on different hardware (e.g. a bare-metal server with support for
AVX-512 instructions vs. a cloud VM with a consumer-grade CPU).
The measured aspects are:
1. Bulk data transfer throughput in MiB/s;
2. Handshake throughput (full handshakes, session ID resumption, and ticket resumption) in handshakes per second;
3. Memory usage per connection.
If you are interested in comparing against OpenSSL, check out the [twin OpenSSL
benchmarks](https://github.com/ctz/openssl-bench), which produce similar measurements.
#### Building
The benchmarks are implemented in the form of "example code" in `rustls/examples/internal/bench.rs`.
Use `cargo build --release -p rustls --example bench` to obtain the corresponding binary (you can
toggle conditionally compiled code with the `--no-default-features` and `--features` flags).
Note: while `cargo build --release --example bench` also works, it can produce surprising results
when combined with `--no-default-features`, because Cargo's feature unification may re-enable
features required by other crates in the workspace.
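As a sketch, building the bench binary with and without the default feature set might look like the
following (the `ring` and `tls12` feature names follow the crate's layout at the time of writing;
adjust them for your rustls version):

```shell
# Build the bench binary with the default features enabled
cargo build --release -p rustls --example bench

# Build it again with a hand-picked feature set; note the `-p rustls`,
# which keeps feature unification from re-enabling the defaults
cargo build --release -p rustls --example bench \
    --no-default-features --features ring,tls12
```

Either way, the resulting binary lands in `target/release/examples/bench`.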
#### Running
There is a makefile in [admin/bench-measure.mk](admin/bench-measure.mk) providing useful commands to
facilitate benchmarking:
- `make measure`: runs bulk transfer and handshake throughput benchmarks using a predefined list of
cipher suites.
- `make memory`: measures memory usage for different numbers of connections.
You can inspect the makefile to get an idea of the command line arguments accepted by `bench`. With
the right arguments, you can run benchmarks for other cipher suites (through `cargo run --release`
or by directly launching the compiled binary).
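For example, invoking the makefile targets from the repository root could look like this (the `-f`
flag points `make` at the non-default makefile path):

```shell
# Bulk transfer and handshake throughput for the predefined cipher suites
make -f admin/bench-measure.mk measure

# Memory usage for increasing numbers of connections
make -f admin/bench-measure.mk memory
```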
#### Reducing noise
We usually extend the duration of the benchmarks in an attempt to neutralize the effect of cold CPU
and page caches, giving us more accurate results. This is done through the `BENCH_MULTIPLIER`
environment variable, which tells the benchmark runner to multiply the amount of work done. For
instance, `BENCH_MULTIPLIER=8` will ensure we do 8 times the work.
Additional ways to reduce noise are:
- Disabling ASLR (through `setarch -R`).
- Disabling CPU dynamic frequency scaling (usually on the BIOS/UEFI level).
- Disabling CPU hyper-threading (usually on the BIOS/UEFI level).
- Setting the Linux CPU governor to performance for all cores.
- Running the benchmarks multiple times (e.g. 30) and taking the median for each scenario (the
[December 2023 results](https://github.com/aochagavia/rustls-bench-results) include Python code
doing this).
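Putting a few of these together, a lower-noise run might look like the following sketch (the
`cpupower` invocation is an assumption about your distribution's tooling and requires root):

```shell
# Pin the CPU frequency governor to performance on all cores (requires root)
sudo cpupower frequency-set -g performance

# Run with 8x the default workload and ASLR disabled for the benchmark processes;
# environment variables propagate through setarch and make to the bench binary
BENCH_MULTIPLIER=8 setarch -R make measure
```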
## CI benchmarks
These benchmarks are meant to provide _automated_ and _accurate_ feedback on a PR's performance
impact compared to the main branch. By automating them we ensure they are regularly used; by keeping
them accurate we ensure they are actionable (i.e. too much noise would train reviewers to ignore the
information).
The benchmarks themselves are located under [ci-bench](ci-bench), together with a detailed readme
(including instructions on how to run them locally). The automated runner lives in its own
[repository](https://github.com/rustls/rustls-bench-app) and is deployed to a bare-metal machine to
ensure low-noise results.
## Nightly benchmarks
There are some `#[bench]` benchmarks spread throughout the codebase. We do not use them
systematically, but they help understand the performance of smaller pieces of code (one or two
functions), which would be difficult to see when the unit-of-benchmark is an entire handshake.
These benchmarks require a nightly compiler. If you are using `rustup`, you can run them with
`RUSTFLAGS=--cfg=bench cargo +nightly bench`.


@ -12,7 +12,8 @@ Rustls is used in production at many organizations and projects. We aim to maint
reasonable API surface stability but the API may evolve as we make changes to accommodate
new features or performance improvements.
We have a [roadmap](ROADMAP.md) for our future plans.
We have a [roadmap](ROADMAP.md) for our future plans. We also have [benchmarks](BENCHMARKING.md) to
prevent performance regressions and to let you evaluate rustls on your target hardware.
If you'd like to help out, please see [CONTRIBUTING.md](CONTRIBUTING.md).