Write about Code Valet

This commit is contained in:
R Tyler Croy 2019-02-03 15:24:49 -08:00
parent 66052ad822
commit 34ce477c7e
No known key found for this signature in database
GPG Key ID: E5C92681BEF6CEA2
2 changed files with 183 additions and 0 deletions

View File

@ -0,0 +1,177 @@
---
layout: post
title: "Even a failed experiment teaches you something"
tags:
- jenkins
- codevalet
---
The [most recent Jenkins security
advisory](https://jenkins.io/security/advisory/2019-01-28/) contains a fix for
an issue in the [GitHub Authentication plugin](https://plugins.jenkins.io/github-auth). One
which I reported many moons ago, during an experiment I named "Code Valet."
Seeing the issue finally resolved brought fond memories back into my mind and I
realized that I have never really reflected and shared what it was and more
importantly: why it failed. At it's core, Code Valet was intended to solve two
fundamental problems: firstly, I wanted a [Jenkins
Pipeline](https://jenkins.io/doc/book/pipeline/) as a Service, since I find
Jenkins [Declarative] Pipeline to be a very useful tool. Secondly, the Jenkins
project needed to shift its footing towards a continuous feedback and
continuous delivery model. Code Valet aimed to solve both of these problems.
CloudBees had tried to run "Jenkins as a Service" with its now (finally!)
defunct "DEV@Cloud" service, or Kohsuke's "Buildhive" project. DEV@Cloud
provided users with a _full_ Jenkins installation, which I believe was part of
its undoing. The essential balancing act of developer tools is that we must
provide our customers with enough power and flexibility, but not so much that
they shoot themselves in the foot. Providing a full Jenkins installation, to
many people, is not only giving them the gun, but loading it, pointing it at
their foot, and then daring them not to shoot. Jenkins is **powerful** but
unfortunately it can become unwieldy and a support nightmare. "Jenkins
Pipeline as a Service" meant something different to me, Code Valet would allow
users to run Pipelines but not actually _administer_ the instance. In fact, I
made explicit design choices to prevent end-users from even seeing too much
behind the curtains of Code Valet. I consider myself an expert at configuring
and operating Jenkins, and everything in Code Valet was tuned appropriately for
the users already, nothing on an administrative level was necessary. In fact,
the user experience was also intentionally locked into [Blue
Ocean](https://jenkins.io/projects/blueocean/), and it was actually impossible
to do anything but create Pipelines and run them. For my own open source
projects this was _perfect_, I could easily add a `Jenkinsfile` to my GitHub
repositories, and things just worked!
To address the second challenge, Code Valet's Jenkins image was **bleeding
edge**. Not in the way [Jenkins
Evergreen](https://jenkins.io/projects/evergreen) is bleeding edge, but more
Gentoo Linux-style. Jenkins core was built daily from the `HEAD` of the master
branch. The batteries-included plugins were also **all** built from their
respective master branches. The Git plugin, Azure VM Agents plugin, Pipeline
plugins, _everything in the system was the absolute latest_. As you may have
guessed, this resulted in dozens of very interesting build and runtime failures
which I reported upstream. For the first time, I had an environment which was
providing "real world" SaaS-style feedback, using [Sentry](https://sentry.io)
on _real_ Jenkins workloads.
Code Valet operated as an interesting experiment out of the CloudBees CTO
office for 4-5 months before I ultimately shut it down. There was nothing more
to learn, and it was clear Code Valet did not have any immediate future within
the CloudBees product roadmap.
### Lessons
For my taste, that bleeding edge turned out to be a little too
rough. I filed two dozen tickets for various plugins, some of which were
addressed very quickly, and a couple more which remain open. Rapid feedback is
only useful with rapid(ish) iteration. Like many large and federated open
source communities, the Jenkins project suffers a bit from uneven levels of
investment across the plugin ecosystem. For example, while Microsoft's
developers were very responsive to issues discovered in their Azure plugins,
another plugin integral to Code Valet was maintained by one person who might
address issues once in a while if work wasn't too hectic. By no means to I
fault maintainers for not being responsive for volunteer projects, but building
a complex system from components with disparate levels of maturity can be
painful. Fortunately, many of pains from release management with Code Valet
went on to inform and improve the design of what came after: [Jenkins
Evergreen](https://github.com/jenkins-infra/evergreen).
Before I built Code Valet, I could have likely filled half a book with thoughts
and practices on security practices for Jenkins. Originally designed with
internal development teams in mind, running Jenkins on a hostile network like
the public internet is an uphill battle, updating default configuration
settings, disabling functionality to reduce the attack surface area, and
ensuring security best practices are being enforced and adhered to by the
users. In the case of Code Valet, everything got _far more challenging._.
Combine the existing lock down procedures for a Jenkins on the public internet,
and then add the requirement to lock down Jenkins far beyond what it natively
supports, in order to provide users with that limited "Jenkins Pipeline as a
Service" experience. I ended up writing a significant amount of Groovy code to
configure and adjust settings throughout Jenkins, as [Configuration as
Code](https://jenkins.io/projects/jcasc/) didn't exist at the time. At times
this still wasn't enough, and I resorted to clever container or nginx hacks to
disable, hide, or otherwise obfuscate certain aspects of Jenkins' behavior.
[All of these hacks](https://github.com/codevalet/master) I still consider to
be rather clever, but trying to reduce the surface area of Jenkins is an
endless struggle. The system yearns to be extended in new and different ways,
so it's a constant game of whack-a-mole trying to close things up. Locking down
Jenkins Pipeline alone is arguably an impossible task, something not even my
[silly Pipeline Shared
Library](/2017/08/03/overriding-builtin-steps-pipeline.html) hacks could
manage.
The cost model for Code Valet was never something I expected to be
net-positive. In discussions I would refer to it as a "loss leader", something
to drive adoption of Jenkins Pipeline with a calculated user acquisition cost.
Still, Code Valet was deployed onto Kubernetes (AKS) and very intentionally
restricted to reduce per-user cost and overhead wherever possible. Tightly
packing Jenkins master containers into Kubernetes, and then dynamically
provisioning Azure VM and container agents for workloads remain design patterns
I stand by for cheaply running Jenkins-as-a-Service, but Jenkins remains
undoubtedly **huge**. I could not get decent performance in a low enough memory
and CPU footprint for Code Valet _not_ to be a loss leader. And while Kohsuke's
Buildhive project was, if I recall, running one big multi-tenant Jenkins
instance, Code Valet by design used per-user instances. The exact numbers I
cannot remember, but I spent my time thinking of more and more novel, yet
non-invasive ways to reduce the monthly cost per user. Looking across the
market today, it's very clear how punishing the race to the bottom for price on
CI-as-a-Service products has been. Most vendors spend a non-trivial amount of
time finding clever ways of engaging in AWS Spot Instance arbitrage or other
means of shaving pennies of compute-hours where possible. All this before
GitHub Actions recently showed up on the scene and dropped the bottom out of
the market entirely. In my opinion, developer tools has always been a
challenging market to work in. Developers inherently undervalue their tools,
recognizing the importances of a $3000 laptop or a $1000 chair to daily
productivity, we still balk at $15/month services or licenses which would
otherwise improve our lives.
Early in 2018, CloudBees acquired [Code Ship](https://codeship.com), bringing
onboard a great product and engineering team, but also a pretty good
CI-as-a-Service offering. While Code Ship doesn't speak Jenkins Pipeline, it
was already quite mature with significant market traction, helping close the
book on the Code Valet experiment.
---
The CI/CD landscape in 2019 is littered with various forms of declarative and
turing-complete YAML. I still find myself wanting a Jenkins Pipeline as a
Service because I still fervently believe that Pipeline is a good tool. The
modelling capacity is better than anything else, and the extensibility options
are superb, whether with Shared Libraries, or a variant my pal [James
Dumay](https://github.com/i386) prototyped which never made it out
of the lab. That said, running a large Jenkins infrastructure no longer
appeals to me. There's a very good reason why CloudBees builds and sells their
[CloudBees Core](https://www.cloudbees.com/products/cloudbees-core) product:
scaling Jenkins CI/CD as a service is **hard**. For something intended to be
generally available like Code Valet, I would now argue that it is
**impossible** to do with off-the-shelf Jenkins.
The direction Pipeline is heading, driven by one of the chief architects of
Jenkins Pipeline [Andrew Bayer](https://github.com/abayer) is looking more and
more _parseable_ which may leave the door open to alternative runtime engines
for Jenkins Pipeline in the future. To me that is key for a descendant of Code
Valet. With an execution engine designed for the needs of a
Pipeline-as-a-Service offering, Jenkins Pipeline would more easily be supported
from a security, maintainability, and cost standpoint.
Ultimately I think the biggest lesson I learned from Code Valet was to think
**bigger**. I started out using Jenkins as the runtime for Code Valet because
Blue Ocean already existed, Jenkins Pipeline already existed, all these things
were built already, albeit for a very different purpose. I intentionally
steered away from building a new engine because I figured it would be too
much work. Some months after I first created Code Valet, I ended up writing a
parser for Jenkins Pipeline's and a basic execution engine. While neither were
complete, I was surprised with how little effort it took to built a basic
Jenkins Pipeline from scratch, compared to the effort of reducing Jenkins to
something smaller.
Rather than trying to use a swiss-army knife as a sword, it may be a better
idea to melt it down and simply forge anew.

6
tag/codevalet.md Normal file
View File

@ -0,0 +1,6 @@
---
layout: tag_page
title: "Tag: codevalet"
tag: codevalet
robots: noindex
---