Resolves #111. Creates a `StaticExecutor` type under a feature flag and allows
constructing it from an `Executor` via `Executor::leak`. Unlike the executor
it came from, it's a wrapper around a `State` and omits all changes to
`active`.
Note: unlike the API proposed in #111, this PR also includes an unsafe
`StaticExecutor::spawn_scoped` for spawning non-`'static` tasks, where the
caller is responsible for ensuring that the task doesn't outlive the borrowed
state. Bevy would need this API to migrate to the new type: we currently use
lifetime transmutation on `Executor` to enable `Thread::scope`-like APIs for
working with borrowed state, and since `StaticExecutor` has no external
lifetime parameter, that approach is infeasible without such an API.
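The core idea behind a leak-based `'static` handle can be sketched with the standard library's `Box::leak`. This is an illustrative stand-in, not async-executor's actual implementation: the `State`, `StaticHandle`, and `leak` names here are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the executor's inner state.
struct State {
    counter: AtomicUsize,
}

// A handle with no lifetime parameter, analogous to `StaticExecutor`.
struct StaticHandle {
    state: &'static State,
}

fn leak(state: Box<State>) -> StaticHandle {
    // Leaking the allocation yields a `&'static State`, so the handle
    // needs no external lifetime parameter of its own.
    StaticHandle {
        state: Box::leak(state),
    }
}

impl StaticHandle {
    fn bump(&self) -> usize {
        self.state.counter.fetch_add(1, Ordering::Relaxed) + 1
    }
}

fn main() {
    let handle = leak(Box::new(State {
        counter: AtomicUsize::new(0),
    }));
    assert_eq!(handle.bump(), 1);
    println!("counter after bump: {}", handle.bump()); // prints 2
}
```

Because the reference is `'static`, spawning a `'static` task needs no lifetime bookkeeping at all; the unsafe `spawn_scoped` escape hatch exists precisely because non-`'static` borrows can no longer be tracked by the type.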
The performance gains while using the type are substantial:
```
single_thread/executor::spawn_one
time: [1.6157 µs 1.6238 µs 1.6362 µs]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
single_thread/executor::spawn_batch
time: [28.169 µs 29.650 µs 32.196 µs]
Found 19 outliers among 100 measurements (19.00%)
10 (10.00%) low severe
3 (3.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
single_thread/executor::spawn_many_local
time: [6.1952 ms 6.2230 ms 6.2578 ms]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
single_thread/executor::spawn_recursively
time: [50.202 ms 50.479 ms 50.774 ms]
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
single_thread/executor::yield_now
time: [5.8795 ms 5.8883 ms 5.8977 ms]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
multi_thread/executor::spawn_one
time: [1.2565 µs 1.2979 µs 1.3470 µs]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
multi_thread/executor::spawn_batch
time: [38.009 µs 43.693 µs 52.882 µs]
Found 22 outliers among 100 measurements (22.00%)
21 (21.00%) high mild
1 (1.00%) high severe
Benchmarking multi_thread/executor::spawn_many_local: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 386.6s, or reduce sample count to 10.
multi_thread/executor::spawn_many_local
time: [27.492 ms 27.652 ms 27.814 ms]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.6s, or reduce sample count to 30.
multi_thread/executor::spawn_recursively
time: [165.82 ms 166.04 ms 166.26 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
multi_thread/executor::yield_now
time: [22.469 ms 22.649 ms 22.798 ms]
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) low severe
3 (3.00%) low mild
single_thread/leaked_executor::spawn_one
time: [1.4717 µs 1.4778 µs 1.4832 µs]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
single_thread/leaked_executor::spawn_many_local
time: [4.2622 ms 4.3065 ms 4.3489 ms]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild
single_thread/leaked_executor::spawn_recursively
time: [26.566 ms 26.899 ms 27.228 ms]
single_thread/leaked_executor::yield_now
time: [5.7200 ms 5.7270 ms 5.7342 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
multi_thread/leaked_executor::spawn_one
time: [1.3755 µs 1.4321 µs 1.4892 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
multi_thread/leaked_executor::spawn_many_local
time: [4.1838 ms 4.2394 ms 4.2989 ms]
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
multi_thread/leaked_executor::spawn_recursively
time: [43.074 ms 43.159 ms 43.241 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
multi_thread/leaked_executor::yield_now
time: [23.210 ms 23.257 ms 23.302 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
```
This commit adds benchmarks that more realistically reflect workloads that
might occur in the real world. These benchmarks are as follows:
- "channels", which sets up `TASKS` tasks, where each task uses a channel
  to wake up the next one.
- "server", which tries to simulate a web server-type scenario.
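The shape of the "channels" benchmark can be sketched with `std::sync::mpsc` and threads in place of executor tasks. The `run_chain` function and the `TASKS` value here are illustrative; the benchmark's actual constant and task setup may differ.

```rust
use std::sync::mpsc;
use std::thread;

const TASKS: usize = 4; // illustrative; the benchmark's TASKS may differ

// Build a chain of channels: each "task" (here a thread) waits on its
// receiver, increments the message, and wakes the next task in line.
fn run_chain(tasks: usize) -> usize {
    let (first_tx, mut rx) = mpsc::channel::<usize>();
    let mut handles = Vec::new();
    for _ in 0..tasks {
        let (tx, next_rx) = mpsc::channel::<usize>();
        // Swap in the next link's receiver; this thread owns the previous one.
        let prev_rx = std::mem::replace(&mut rx, next_rx);
        handles.push(thread::spawn(move || {
            let n = prev_rx.recv().unwrap();
            tx.send(n + 1).unwrap();
        }));
    }
    // Kick off the chain and wait for the message to travel through every link.
    first_tx.send(0).unwrap();
    let result = rx.recv().unwrap();
    for h in handles {
        h.join().unwrap();
    }
    result
}

fn main() {
    println!("hops: {}", run_chain(TASKS)); // each task added one hop
}
```

Each wake-up forces a scheduling round-trip through the executor, which is why this pattern stresses wake/poll overhead rather than raw spawn cost.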
Signed-off-by: John Nunley <dev@notgull.net>
For some workloads many tasks are spawned at a time. This requires
locking and unlocking the executor's inner lock every time you spawn a
task. If you spawn many tasks this can be expensive.
This commit exposes a new `spawn_batch` method on both types. This
method allows the user to spawn an entire set of tasks at a time.
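The lock-amortization idea can be sketched with a `Mutex`-protected queue. The `Inner` type below is a hypothetical stand-in for the executor's internal state, not the crate's actual code:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-in for the executor's lock-protected inner state.
struct Inner {
    queue: Mutex<VecDeque<u32>>,
}

impl Inner {
    // One lock round-trip per task: expensive when spawning many tasks.
    fn spawn(&self, task: u32) {
        self.queue.lock().unwrap().push_back(task);
    }

    // One lock round-trip for the whole batch, amortizing the cost.
    fn spawn_batch(&self, tasks: impl IntoIterator<Item = u32>) {
        let mut q = self.queue.lock().unwrap();
        q.extend(tasks);
    }
}

fn main() {
    let inner = Inner {
        queue: Mutex::new(VecDeque::new()),
    };
    inner.spawn(0); // acquires the lock once for one task
    inner.spawn_batch(1..=3); // acquires the lock once for three tasks
    println!("queued: {}", inner.queue.lock().unwrap().len()); // 4 tasks queued
}
```

Taking an `IntoIterator` lets callers pass any collection of futures without the API committing to a concrete container type.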
Closes #91
Signed-off-by: John Nunley <dev@notgull.net>
Currently, runner-local queues rely on a `RwLock<Vec<Arc<ConcurrentQueue>>>`
to store the queues instead of using actual thread-local storage.
This adds `thread_local` as a dependency, but it should allow the executor to
work-steal without needing to hold a lock, and should also allow tasks to
schedule onto the local queue directly, where possible, instead of always
relying on the global injector queue.
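The fast-path/slow-path split can be sketched with the standard library's `thread_local!` macro. This is a simplified illustration of the scheduling idea, not the crate's actual queue types; `INJECTOR`, `LOCAL`, `schedule`, and `next_task` are all hypothetical names:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Mutex;

// Global injector queue, shared by all threads: every access takes a lock.
static INJECTOR: Mutex<Vec<u32>> = Mutex::new(Vec::new());

thread_local! {
    // Per-runner local queue: same-thread scheduling needs no lock at all.
    static LOCAL: RefCell<VecDeque<u32>> = RefCell::new(VecDeque::new());
}

fn schedule(task: u32, on_runner_thread: bool) {
    if on_runner_thread {
        // Fast path: push onto this thread's queue without any locking.
        LOCAL.with(|q| q.borrow_mut().push_back(task));
    } else {
        // Slow path: fall back to the shared injector queue.
        INJECTOR.lock().unwrap().push(task);
    }
}

fn next_task() -> Option<u32> {
    // Prefer local work, then pull from the injector.
    LOCAL.with(|q| q.borrow_mut().pop_front()).or_else(|| {
        let mut inj = INJECTOR.lock().unwrap();
        if inj.is_empty() { None } else { Some(inj.remove(0)) }
    })
}

fn main() {
    schedule(1, true); // lands on the local queue
    schedule(2, false); // lands on the injector
    assert_eq!(next_task(), Some(1)); // local queue drained first
    assert_eq!(next_task(), Some(2)); // then the injector
    println!("drained both queues");
}
```

A real work-stealing setup additionally lets idle runners take tasks from other runners' queues; the point here is only that the common same-thread case avoids the shared lock entirely.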
Fixes #62
Co-authored-by: John Nunley <jtnunley01@gmail.com>