202 lines
8.7 KiB
Plaintext
202 lines
8.7 KiB
Plaintext
= RFC-0001: Otto System Design
|
|
:toc: preamble
|
|
:toclevels: 3
|
|
ifdef::env-github[]
|
|
:tip-caption: :bulb:
|
|
:note-caption: :information_source:
|
|
:important-caption: :heavy_exclamation_mark:
|
|
:caution-caption: :fire:
|
|
:warning-caption: :warning:
|
|
endif::[]
|
|
|
|
.Metadata
|
|
[cols="1h,1"]
|
|
|===
|
|
| RFC
|
|
| 0001
|
|
|
|
| Title
|
|
| Otto System Design
|
|
|
|
| Sponsor
|
|
| link:https://github.com/rtyler/[R Tyler Croy]
|
|
|
|
| Status
|
|
| Draft :speech_balloon:
|
|
|
|
| Type
|
|
| Standards
|
|
|
|
| Created
|
|
| 2019-02-23
|
|
|
|
|===
|
|
|
|
== Abstract
|
|
|
|
Otto is a robust distributed system for scalable continuous integration and
|
|
delivery. To accomplish this Otto is multi-process oriented and distributed by
|
|
default; all system interactions must occur over process boundaries even if
|
|
executing in the same logical host. This document outlines the high level
|
|
architecture but omits specific inter-process communication formats, protocols,
|
|
and dependencies.
|
|
|
|
|
|
== Specification
|
|
|
|
The processes which compose Otto are segmented based on their role or task, and
|
|
functionally independent from one another. They must not share data on disk or
|
|
memory with one another excepting explicit data storage services.
|
|
|
|
The system is composed such that each process holds a relatively well-defined
|
|
area of responsibility. Each of the processes should be easily replaceable, or
|
|
horizontally scalable, depending on the environment. The diagram below is
|
|
incomplete, but one such example would be the "Orchestrator" service, whose
|
|
responsibilities might involve pushing workloads into Agents depending on how
|
|
the CI/CD process has been defined. For a Kubernetes-based deployment of Otto,
|
|
this should be swapped out with a Kubernetes-friendly Orchestrator which
|
|
implements the orchestration API. In an AWS or Azure-based deployment, this
|
|
might instead be provided by a "serverless" function.
|
|
|
|
[source]
|
|
----
|
|
+------------+ +-----------+ +------------+
|
|
| Frontend +-----> Event Bus +-------->Orchestrator|
|
|
+--------^---+ +-----------+ +----------+-+
|
|
| |
|
|
| +-v------+
|
|
| +---------------+------+ Agents |
|
|
| | | +---------+
|
|
| | | +--------|
|
|
| | | +-------+
|
|
| | |
|
|
| +-----------v---+ +-------v------+
|
|
+--->|Relational Data| |Object Storage|
|
|
|Storage | +--------------+
|
|
+---------------+
|
|
|
|
----
|
|
|
|
Each of the above boxes represents a process as part of Otto which would have
|
|
its own API. Rather than logical components which might represent off-the-shelf
|
|
solution.
|
|
|
|
To support this, all interprocess communication must happen via HTTP and in the
|
|
JSON format.
|
|
|
|
All processes are stateless and rely on the relational and object storage for
|
|
any longer term persistence. No other system is the source of record.
|
|
|
|
[[motivation]]
|
|
== Motivation
|
|
|
|
The motivations for Otto's systems design fall into three major areas:
|
|
|
|
1. Safe extensibility, allowing different components to be added or removed
|
|
without affecting the other parts of the system.
|
|
1. Scalability, allowing different components to scale horizontally as
|
|
workloads demand it.
|
|
1. Cost, allowing different components to only execute as needed.
|
|
|
|
The section below contains more reasoning on how these motivations relate to
|
|
other systems and their strengths and weaknesses.
|
|
|
|
== Reasoning
|
|
|
|
When considering potential designs for Otto, the most significant influence was
|
|
the design and implementation of
|
|
link:https://jenkins.io/[Jenkins].
|
|
As the predominant open source CI/CD server, Jenkins has succeeded in gaining
|
|
adoption by being simple to use, but has struggled to adapt to larger-scale
|
|
deployments, or cloud-native environments. Design inspiration also comes from
|
|
link:https://kubernetes.io/[Kubernetes],
|
|
link:https://kafka.apache.org/[Kafka], and
|
|
commonly understood microservice architecture patterns.
|
|
|
|
Considering the motivations described <<motivations,above>>,
|
|
the reasoning expanded in the following sections.
|
|
|
|
=== Safe extensibility
|
|
|
|
Extensibility is to be deemed "_safe_" by the avoidance of side effects on
|
|
other pre-existing components of the system. A compromise, crash, or error in a
|
|
single component **must not** impact other functional areas of the system. In
|
|
Otto, this is accomplished by using process boundaries, even in cases of
|
|
single-logical-host deployments. This process-external extensibility model is
|
|
most successful and prominent in the design and implementation of Kubernetes.
|
|
|
|
The alternative is to incorporate process-internal extensibility, a model many
|
|
plugin-oriented systems such as MySQL, Redis, and to a larger extent Jenkins.
|
|
The key benefit of this process-internal extensibility model is that it appears
|
|
simpler at a superficial layer. Since the introduction of the `dlopen(3)` system
|
|
call, users and administrators have been able to augment functionality of
|
|
programs with dynamically loaded code. In **all cases**, this compromises safety
|
|
of the parent process. The program is effectively trusting the loaded code as if
|
|
it were its own, and thereby granting it access to all of its internal data
|
|
structures and ABIs. In the case of Jenkins in particular, this has resulted in
|
|
core internal APIs becoming de facto public APIs, as plugin authors discover
|
|
classes and methods which suit there purposes.
|
|
|
|
An external-only extensibility system requires that _all_ actors create explicit
|
|
public APIs, while allowing them the freedom of differing implementations
|
|
|
|
The process-internal approach typically confers implementation restrictions on
|
|
the author of the plugin or module. With native applications like MySQL or
|
|
Redis, it is possible to write a module in a language other than C or C++ so
|
|
long as the ABI is respected and the module compiles to native code. In Jenkins,
|
|
although implemented on the JVM, so long as the plugin can operate within the
|
|
confines of the Java environment it _should_ work. In practice, process-internal
|
|
extensibility systems tend to result in a plugin/module monoculture as it offers
|
|
the path of least resistance.
|
|
|
|
iOtto's process-external extensibility model allows significant flexibility as
|
|
practically every programming language and environment allows for implementing
|
|
HTTP services. Even within Otto, it may prove useful to operate services
|
|
implemented in different languages depending on the runtime environment.
|
|
Process-external extensibility allows for this flexibility for all actors in the
|
|
system, including the system itself.
|
|
|
|
=== Scalability
|
|
|
|
Scalability is a key concern for the design of Otto. As present time it is not
|
|
expected to be deployed in a truly multi-tenant environment supporting vastly
|
|
different users as a SaaS. There are important benefits to approaching the
|
|
system design as if it were to need to evolve into a SaaS model at some point
|
|
however.
|
|
|
|
By structuring Otto as separate stateless processes, each process can be
|
|
auto-scaled as workloads demand it. Additionally structuring the "edges" between the different processes to require no single central point of transit, different areas of the system will be able to grow as needed without requiring the entire system to vertically scale. In contrast, consider the implementation of
|
|
link:https://jenkins.io/doc/book/pipeline[Jenkins Pipeline] for example. Under
|
|
the hood Jenkins Pipeline is executed by a Groovy runtime running on the Jenkins
|
|
master, with portions delegated to the agents. It is effectively a very
|
|
traditional hub-and-spoke systems architecture, with the hub (master) devoting
|
|
non-zero compute resources to interpreting the Jenkins Pipeline script. Like any
|
|
hub-and-spoke architecture, this has inherent scale limitations as the hub
|
|
becomes over-utilized.
|
|
|
|
Note: The "Event Bus" in the Otto systems design runs the risk of becoming a central
|
|
hub, crucial to all other operations within the system and thereby causing some
|
|
scaling challenges.
|
|
|
|
=== Cost
|
|
|
|
Cost to operate is another key concern for the design of Otto. The separation of
|
|
distinct processes is intended to allow Otto to naturally graft onto existing
|
|
"serverless" offerings by various cloud-providers. Rather than a design which
|
|
requires a virtual machine to be running a web server to receive webhooks,
|
|
Otto's receiver can run via AWS Lambda and only "come alive" to receive the
|
|
payload, before executing whatever work it must do in response to the webhook.
|
|
As a _reactive_ system, Otto is far cheaper to operate than a fixed-footprint
|
|
infrastructure.
|
|
|
|
The isolation of system state further helps Otto bind into reactive, or
|
|
"serverless" infrastructure components. As nothing except storage is effectively
|
|
persistent, the other components can be operated only as need.
|
|
|
|
|
|
== Security
|
|
|
|
Securing this system is a sufficiently complex topic to merit it's own RFC.
|
|
|
|
== References
|