From 93228ebdbfbe924cdbf7b4d4825eb385b1c0ac1f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adolfo=20Ochagav=C3=ADa?= <github@adolfo.ochagavia.nl>
Date: Tue, 19 Dec 2023 11:36:39 +0100
Subject: [PATCH] Add BENCHMARKING.md

This file is meant as an entry point for users and contributors who are
interested in benchmarking rustls. It is linked from the readme so
people can find it easily.

Closes #1478 and #1685
---
 BENCHMARKING.md | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
 README.md       |  3 +-
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 BENCHMARKING.md

diff --git a/BENCHMARKING.md b/BENCHMARKING.md
new file mode 100644
index 00000000..93aa8049
--- /dev/null
+++ b/BENCHMARKING.md
@@ -0,0 +1,83 @@
+# Benchmarking
+
+This repository includes benchmarks for multiple use cases. They are described below, along with
+information on how to run them.
+
+## Throughput and memory usage benchmarks
+
+These benchmarks measure the throughput and memory footprint you get from rustls. They have been
+used in the past to compare performance against OpenSSL (see the results of [December
+2023](https://github.com/aochagavia/rustls-bench-results) and [July
+2019](https://jbp.io/2019/07/01/rustls-vs-openssl-performance.html)). You can also use them to
+evaluate rustls' performance on different hardware (e.g. a bare-metal server with support for
+AVX-512 instructions vs. a cloud VM with a consumer-grade CPU).
+
+The measured aspects are:
+
+1. Bulk data transfer throughput in MiB/s;
+2. Handshake throughput (full, session id, tickets) in handshakes per second;
+3. Memory usage per connection.
+
+If you are interested in comparing against OpenSSL, check out the [twin OpenSSL
+benchmarks](https://github.com/ctz/openssl-bench), which produce similar measurements.
+
+### Building
+
+The benchmarks are implemented as a Cargo example in `rustls/examples/internal/bench.rs`. Use
+`cargo build --release -p rustls --example bench` to obtain the corresponding binary (you can
+toggle conditionally compiled code with the `--no-default-features` and `--features` flags).
+
+Note: while `cargo build --release --example bench` also works, it can give surprising results when
+combined with `--no-default-features`, because of how Cargo's feature unification works (other
+crates in the workspace may enable some of those features anyway).
+
+### Running
+
+There is a makefile in [admin/bench-measure.mk](admin/bench-measure.mk) providing useful commands to
+facilitate benchmarking:
+
+- `make -f admin/bench-measure.mk measure`: runs bulk transfer and handshake throughput benchmarks
+  using a predefined list of cipher suites.
+- `make -f admin/bench-measure.mk memory`: measures memory usage for different numbers of connections.
+
+You can inspect the makefile to see which command-line arguments `bench` accepts. With the right
+arguments, you can benchmark other cipher suites, either through
+`cargo run --release -p rustls --example bench` or by launching the compiled binary directly.
+
+### Reducing noise
+
+We usually extend the duration of the benchmarks to reduce the effect of cold CPU and page caches,
+which gives more accurate results. This is done through the `BENCH_MULTIPLIER` environment
+variable, which tells the benchmark runner to multiply the amount of work done. For instance,
+`BENCH_MULTIPLIER=8` makes the benchmarks do 8 times the usual amount of work.
+
+Additional ways to reduce noise are:
+
+- Disabling ASLR (through `setarch -R`).
+- Disabling CPU dynamic frequency scaling (usually at the BIOS/UEFI level).
+- Disabling CPU hyper-threading (usually at the BIOS/UEFI level).
+- Setting the Linux CPU governor to `performance` for all cores.
+- Running the benchmarks multiple times (e.g. 30) and taking the median for each scenario (the
+  [December 2023 results](https://github.com/aochagavia/rustls-bench-results) include Python code
+  that does this).
+
+## CI benchmarks
+
+These benchmarks are meant to provide _automated_ and _accurate_ feedback on a PR's performance
+impact compared to the main branch. By automating them, we ensure they are used regularly; by
+keeping them accurate, we ensure they are actionable (too much noise would train reviewers to
+ignore the information).
+
+The benchmarks themselves are located under [ci-bench](ci-bench), together with a detailed readme
+(including instructions on how to run them locally). The automated runner lives in its own
+[repository](https://github.com/rustls/rustls-bench-app) and is deployed to a bare-metal machine to
+ensure low-noise results.
+
+## Nightly benchmarks
+
+There are some `#[bench]` benchmarks spread throughout the codebase. We do not use them
+systematically, but they help us understand the performance of smaller pieces of code (one or two
+functions), which would be hard to see when the unit of benchmarking is an entire handshake.
+
+These benchmarks require a nightly compiler. If you are using `rustup`, you can run them with
+`RUSTFLAGS=--cfg=bench cargo +nightly bench`.
\ No newline at end of file
diff --git a/README.md b/README.md
index 9bcc81e7..6ce0a3ec 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,8 @@ Rustls is used in production at many organizations and projects. We aim to maint
 reasonable API surface stability but the API may evolve as we make changes to accomodate
 new features or performance improvements.
 
-We have a [roadmap](ROADMAP.md) for our future plans.
+We have a [roadmap](ROADMAP.md) for our future plans. We also have [benchmarks](BENCHMARKING.md) to
+prevent performance regressions and to let you evaluate rustls on your target hardware.
 
 If you'd like to help out, please see [CONTRIBUTING.md](CONTRIBUTING.md).