//! <p align="center">
//! <img src="https://raw.githubusercontent.com/spacejam/sled/main/art/tree_face.png" width="20%" height="auto" />
//! </p>
//!
//! # Experiences with Other Systems
//!
//! sled is motivated by the experiences gained while working with other
//! stateful systems, outlined below.
//!
//! Most of the points below were learned from being burned rather than
//! delighted.
//!
//! #### MySQL
//!
//! * make it easy to tail the replication stream in flexible topologies
//! * support merging shards a la MariaDB
//! * support mechanisms for live, lock-free schema updates a la
//! pt-online-schema-change
//! * include GTID in all replication information
//! * actively reduce tree fragmentation
//! * give operators and distributed database creators first-class support for
//! replication, sharding, backup, tuning, and diagnosis
//! * O_DIRECT + real Linux AIO is worth the effort
//!
//! #### Redis
//!
//! * provide high-level collections that let engineers get to their business
//! logic as quickly as possible, instead of forcing them to first define a
//! schema in a relational system (usually spending an hour or more googling
//! how to even do it); see the sketch after this list
//! * don't let a single slow request block all other requests to a shard
//! * let operators peer into the sequence of operations that hit the database
//! to track down bad usage
//! * don't force replicas to retrieve the entire state of the leader when they
//! begin replication
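//!
//! As a minimal sketch of that first point, sled's `Tree` behaves like a
//! persistent `BTreeMap<[u8], [u8]>`, so simple state needs no schema
//! ceremony (the `my_db` path and key names here are just illustrative):
//!
//! ```rust,no_run
//! # fn main() -> sled::Result<()> {
//! // open (or create) a database at the given path
//! let db = sled::open("my_db")?;
//!
//! // insert and read back a key, no schema required
//! db.insert(b"user/1/name", b"tyler")?;
//! assert_eq!(db.get(b"user/1/name")?.as_deref(), Some(&b"tyler"[..]));
//! # Ok(())
//! # }
//! ```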
//!
//! #### HBase
//!
//! * don't split "the source of truth" across too many decoupled systems, or
//! you will always have downtime
//! * give users first-class APIs to peer into their system state without
//! forcing them to write scrapers
//! * serve HTTP pages for high-level overviews, and possibly access to logs
//! * coprocessors are awesome but people should have easy ways of doing
//! secondary indexing
//!
//! #### RocksDB
//!
//! * give users tons of flexibility with different usage patterns
//! * don't force users to use distributed machine learning to discover
//! configurations that work for their use cases
//! * merge operators are extremely powerful (see the sketch after this list)
//! * merge operators should be usable from serial transactions across multiple
//! keys
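//!
//! A minimal sketch of a merge operator using sled's own API (the
//! concatenation semantics are illustrative, not prescriptive):
//!
//! ```rust,no_run
//! # fn main() -> sled::Result<()> {
//! // a merge operator receives the key, the current value (if any), and the
//! // bytes being merged in, and returns the new value for the key
//! fn concatenate(
//!     _key: &[u8],
//!     old_value: Option<&[u8]>,
//!     merged_bytes: &[u8],
//! ) -> Option<Vec<u8>> {
//!     let mut out = old_value.map(|v| v.to_vec()).unwrap_or_default();
//!     out.extend_from_slice(merged_bytes);
//!     Some(out)
//! }
//!
//! let db = sled::open("my_db")?;
//! db.set_merge_operator(concatenate);
//!
//! db.merge(b"k1", b"a")?;
//! db.merge(b"k1", b"b")?;
//! assert_eq!(db.get(b"k1")?.as_deref(), Some(&b"ab"[..]));
//! # Ok(())
//! # }
//! ```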
//!
//! #### etcd
//!
//! * raft makes operating replicated systems SO MUCH EASIER than popular
//! relational systems / redis etc...
//! * modify raft to use leader leases instead of the paxos register, avoiding
//! livelocks in the presence of simple partitions
//! * give users flexible interfaces
//! * reactive semantics are awesome, but access must be done through smart
//! clients, because users will assume watches are reliable (see the sketch
//! after this list)
//! * if we have smart clients anyway, quorum reads can be cheap by
//! lower-bounding future reads to the last raft log index observed
//! * expose the metrics and operational levers required to build a self-driving
//! stateful system on top of k8s/mesos/cloud providers/etc...
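//!
//! For the reactive point above, a minimal sketch of sled's own subscription
//! API; as with etcd watches, events are best-effort hints rather than a
//! durable queue, so smart clients should re-read state after reconnecting
//! (the event shape shown assumes sled 0.34):
//!
//! ```rust,no_run
//! # fn main() -> sled::Result<()> {
//! let db = sled::open("my_db")?;
//!
//! // subscribe to future events on all keys with the given prefix
//! let subscriber = db.watch_prefix(b"users/");
//!
//! db.insert(b"users/1", b"tyler")?;
//!
//! // Subscriber is a blocking iterator over events
//! for event in subscriber.take(1) {
//!     match event {
//!         sled::Event::Insert { key, value } => {
//!             println!("inserted {:?} => {:?}", key, value);
//!         }
//!         sled::Event::Remove { key } => {
//!             println!("removed {:?}", key);
//!         }
//!     }
//! }
//! # Ok(())
//! # }
//! ```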
//!
//! #### Tendermint
//!
//! * build things in a testable way from the beginning
//! * don't seek gratuitous concurrency
//! * allow replication streams to be used in flexible ways
//! * instant finality (or interface finality: the operation should be complete
//! by the time the request successfully returns to the client) is mandatory
//! for nice high-level interfaces that don't push optimism (and rollbacks)
//! into interfacing systems
//!
//! #### LMDB
//!
//! * approach a wait-free tree traversal for reads
//! * use modern tree structures that can support concurrent writers
//! * multi-process is nice for browsers etc...
//! * people value read performance and are often forgiving of terrible write
//! performance for most workloads
//!
//! #### Zookeeper
//!
//! * reactive semantics are awesome, but access must be done through smart
//! clients, because users will assume watches are reliable (the same lesson
//! as etcd above)
//! * the more important the system, the more you should keep old snapshots
//! around for emergency recovery
//! * never assume a hostname that was resolvable in the past will be resolvable
//! in the future
//! * if a critical thread dies, bring down the entire system (see the sketch
//! after this list)
//! * make replication configuration as simple as possible. people will mess up
//! the order and cause split brains if this is not automated.
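//!
//! A minimal sketch of that crash-early point in Rust, escalating any panic
//! to a whole-process abort (a policy sketch, not sled's actual mechanism):
//!
//! ```rust,no_run
//! // if any thread panics, abort the entire process rather than continuing
//! // to serve traffic from a possibly-corrupt state
//! std::panic::set_hook(Box::new(|panic_info| {
//!     eprintln!("critical failure: {}", panic_info);
//!     std::process::abort();
//! }));
//! ```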