
# Benchmarking

This repository includes benchmarks for multiple use cases. They are described below, along with
information on how to run them.

## Throughput and memory usage benchmarks

These benchmarks measure the throughput and memory footprint you get from rustls. They have been
used in the past to compare performance against OpenSSL (see the results of [December
2023](https://github.com/aochagavia/rustls-bench-results) and [July
2019](https://jbp.io/2019/07/01/rustls-vs-openssl-performance.html)). You can also use them to
evaluate rustls' performance on different hardware (e.g. a bare-metal server with support for
AVX-512 instructions vs. a cloud VM with a consumer-grade CPU).

The measured aspects are:

1. Bulk data transfer throughput in MiB/s;
2. Handshake throughput (full, session ID, tickets) in handshakes per second;
3. Memory usage per connection.
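
As a rough illustration of the first two units, the sketch below derives MiB/s and handshakes per second from raw counts and elapsed time. The function names and the sample numbers are hypothetical and are not taken from the benchmark code; only the unit definitions (1 MiB = 1024 × 1024 bytes) are fixed.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def bulk_throughput_mib_s(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Bulk transfer throughput in MiB/s (hypothetical helper)."""
    return bytes_transferred / MIB / elapsed_seconds


def handshakes_per_second(handshakes: int, elapsed_seconds: float) -> float:
    """Handshake throughput in handshakes per second (hypothetical helper)."""
    return handshakes / elapsed_seconds


# Made-up measurements, for illustration only.
print(bulk_throughput_mib_s(8 * 1024 * MIB, 2.0))  # 8 GiB in 2 s -> 4096.0
print(handshakes_per_second(10_000, 4.0))          # 10,000 in 4 s -> 2500.0
```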

If you are interested in comparing against OpenSSL, check out the [twin OpenSSL
benchmarks](https://github.com/ctz/openssl-bench), which produce similar measurements.

#### Building

The benchmarks are implemented in the form of "example code" in `rustls/examples/internal/bench.rs`.
Use `cargo build --release -p rustls --example bench` to obtain the corresponding binary (you can
toggle conditionally compiled code with the `--no-default-features` and `--features` flags).

Note: `cargo build --release --example bench` also works, but it can give surprising results when
combined with `--no-default-features`, because of how Cargo's feature unification works (some
features get enabled automatically by other subcrates).

#### Running

There is a makefile in [admin/bench-measure.mk](admin/bench-measure.mk) providing useful commands to
facilitate benchmarking:

- `make measure`: runs bulk transfer and handshake throughput benchmarks using a predefined list of
  cipher suites.
- `make memory`: measures memory usage for different numbers of connections.

You can inspect the makefile to get an idea of the command line arguments accepted by `bench`. With
the right arguments, you can run benchmarks for other cipher suites (through `cargo run --release`
or by directly launching the compiled binary).

#### Reducing noise

We usually extend the duration of the benchmarks in an attempt to neutralize the effect of cold CPU
and page caches, giving us more accurate results. This is done through the `BENCH_MULTIPLIER`
environment variable, which tells the benchmark runner to multiply the amount of work done. For
instance, `BENCH_MULTIPLIER=8` will ensure we do 8 times the work.
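
The idea can be sketched as follows. This is not the code used by `bench`; it is a minimal illustration that assumes the multiplier is a positive number defaulting to 1 and that it scales a base number of rounds. The names `work_multiplier` and `scaled_rounds` are hypothetical.

```python
import os


def work_multiplier(env=None) -> float:
    """Read BENCH_MULTIPLIER, defaulting to 1 when unset (assumed behavior)."""
    env = os.environ if env is None else env
    value = float(env.get("BENCH_MULTIPLIER", "1"))
    if value <= 0:
        raise ValueError("BENCH_MULTIPLIER must be positive")
    return value


def scaled_rounds(base_rounds: int, env=None) -> int:
    """Scale a base amount of work by the multiplier, never dropping below 1."""
    return max(1, int(base_rounds * work_multiplier(env)))


# With BENCH_MULTIPLIER=8, a hypothetical 1000-round benchmark runs 8000 rounds.
print(scaled_rounds(1000, {"BENCH_MULTIPLIER": "8"}))  # 8000
print(scaled_rounds(1000, {}))                         # 1000
```

Longer runs spend proportionally less time in the warm-up phase, which is why the measured averages become more stable as the multiplier grows.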

Additional ways to reduce noise are:

- Disabling ASLR (through `setarch -R`).
- Disabling CPU dynamic frequency scaling (usually on the BIOS/UEFI level).
- Disabling CPU hyper-threading (usually on the BIOS/UEFI level).
- Setting the Linux CPU governor to `performance` for all cores.
- Running the benchmarks multiple times (e.g. 30) and taking the median for each scenario (the
  [December 2023 results](https://github.com/aochagavia/rustls-bench-results) include Python code
  doing this).
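
A minimal sketch of the last point, assuming each run has already been parsed into a mapping from scenario name to measured value. The scenario names and numbers are made up, and this is not the script from the linked results repository.

```python
from collections import defaultdict
from statistics import median


def median_per_scenario(runs):
    """Collect the samples of every scenario across runs and take the median."""
    samples = defaultdict(list)
    for run in runs:
        for scenario, value in run.items():
            samples[scenario].append(value)
    return {scenario: median(values) for scenario, values in samples.items()}


# Three hypothetical runs; the third bulk measurement is a noisy outlier.
runs = [
    {"bulk_send": 4100.0, "handshake_full": 2510.0},
    {"bulk_send": 4050.0, "handshake_full": 2490.0},
    {"bulk_send": 3200.0, "handshake_full": 2500.0},
]
print(median_per_scenario(runs))
# {'bulk_send': 4050.0, 'handshake_full': 2500.0}
```

The median is preferred over the mean here because a single slow run (such as the 3200.0 sample) does not shift the reported value.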

## CI benchmarks

These benchmarks are meant to provide _automated_ and _accurate_ feedback on a PR's performance
impact compared to the main branch. By automating them we ensure they are used regularly; by keeping
them accurate we ensure they are actionable (too much noise would train reviewers to ignore the
information).

The benchmarks themselves are located under [ci-bench](ci-bench), together with a detailed readme
(including instructions on how to run them locally). The automated runner lives in its own
[repository](https://github.com/rustls/rustls-bench-app) and is deployed to a bare-metal machine to
ensure low-noise results.

## Nightly benchmarks

There are some `#[bench]` benchmarks spread throughout the codebase. We do not use them
systematically, but they help in understanding the performance of smaller pieces of code (one or two
functions), which would be difficult to see when the unit of benchmarking is an entire handshake.

These benchmarks require a nightly compiler. If you are using `rustup`, you can run them with
`RUSTFLAGS=--cfg=bench cargo +nightly bench`.
