Add some recent blog posts

R. Tyler Croy 2017-03-14 07:37:01 -07:00

---
layout: post
title: Jenkins will not be part of GSoC 2017
tags:
- jenkins
- opinion
---
Unfortunately the Jenkins project will not be participating in the [Google
Summer of Code](https://summerofcode.withgoogle.com/) (GSoC) 2017. While I am
disappointed, I am not all that surprised. Last year, our inaugural year in
GSoC, was tough in that we had to learn many things the hard way, did a
poor job of selecting student proposals, and failed to recruit a satisfactory
number of mentors.
This year, I campaigned for a change in our proposal process. Instead of
allowing anybody from the project to throw an idea into the top-hat of
suggested project ideas, we would only list suggested project ideas that had a
modicum of mentor commitment. Basically, if you weren't willing to help mentor
a student on your suggested project, we weren't going to list it.
As a result we only had 3 suggested project ideas. Disappointing.
Considering our experience last year, where
[Oleg](https://github.com/oleg-nenashev) bore a significant personal burden
stepping in as mentor when others failed to meet the commitment necessary, I
would rather not have been accepted than force Org Admins to do that again.
In my opinion the Jenkins project, or **any** open source project, should not
attempt to participate in Google's Summer of Code without being able to provide
the necessary commitment to mentor students during the program. It's exciting
to imagine new contributors, and some donated funds, coming to the project, but the
end-goal is to mentor and cultivate the next generation of open source
contributors. The students must come first.
While it's disappointing that the Jenkins project won't have an influx of
students this year, I have been discussing plans with Oleg to find ways in
which we can mentor or encourage more activity from the hundreds of "casual
contributors" to the Jenkins project.
In between this year and next, I think we (the Jenkins project) can focus on
building up a mentorship base by:
* Expanding the number of plugin developers who contribute to more than just
one plugin. Encouraging existing contributors to leave their little
one-plugin-fiefdom of concern.
* Curating more low-hanging fruit in JIRA, and exposing those issues to
potential new contributors.
* Doing whatever it takes to make it easier for somebody to contribute to the
project. This includes, but is not limited to, Daniel's [Developer
Documentation](https://github.com/jenkins-infra/jenkins.io/pull/711) efforts,
discouraging pedantic bike-shedding in new contributor pull requests, and
encouraging more hands-on mentorship/pairing opportunities.
There are certainly a bunch of other things we can try between now and GSoC
2018. Either way, I'm disappointed, but I'm also optimistic that a GSoC
rejection will encourage us to fix things. I would rather not be in GSoC than
ask a select few who are already stretched thin to commit time they don't
have.
Failure and rejection are learning opportunities, and to quote one of my
favorite sayings: "let's make better mistakes tomorrow."

---
layout: post
title: "On running containers in production"
tags:
- docker
- openinfra
- jenkins
---
As part of [SCaLE 15x](https://www.socallinuxexpo.org), I took part in the
first [Open Source Infra Day](http://scale.opensourceinfra.org) where a number
of other sysadmins and I shared stories and patterns which have helped us
maintain open source infrastructure. As part of the "unconference" tracks, I
suggested and then led the session "Running containers in production." As my
luck would have it, in a group of roughly 10 people representing various
projects, Jenkins was the only one running production services in
containers. I thought I should share what it's like, and why you should stop
standing on the sidelines and give containers in production a try.
Containers have suffered from an unreasonable amount of hype, which has caused
many people, myself included, to be exceptionally skeptical of their maturity
and utility in a modern infrastructure. Here's what containers, _by
themselves_, do **not** solve:
* Containers do not make your applications more secure.
* Containers do not make your applications more scalable.
* Containers do not make your applications more portable.
However, here is what containers can and **do** solve for:
* Containers make your applications "look" the same (mostly) to the underlying
infrastructure.
* Containers make the application runtime dependencies the developers'
responsibility.
* Containers require the application developers to consider application
state and persistence.
* Containers, once built, provide a (mostly) consistent behavior between dev,
staging, and production environments.
At this point, I won't _not_ use containers anymore. Their benefits far
outweigh their flaws, even in production environments. But there are flaws.
### Pros
The advantages of using containers are fairly well-documented across various
presentations, blog posts, and gushy tweets. For my usage, which is partly
personal and partly for the Jenkins project, the benefits are as follows.
#### Faster and easier application delivery
By using containers I enjoy much faster and easier application deployment,
without needing to change any existing infrastructure. In essence,
infrastructure which already exists and is capable of running a Docker
container is capable of running JavaScript, Python, or Java applications
without any modifications.
For a mostly mixed infrastructure like that of the Jenkins project, this can be
hugely beneficial. For some applications, newer, bleeding-edge Java Runtime
Environments have been required, whilst others might need whatever basic JRE is
available. With the container including that form of system dependency within
it, the virtual machines running the containers don't need to know, or care,
about the mixed application runtime requirements and dependencies.
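A minimal sketch of what that looks like from the host's perspective; the image names and ports are hypothetical stand-ins for applications with different JRE requirements:
```bash
# The host only needs a working Docker daemon; each image bundles its own JRE.
# (Image names and ports are illustrative, not the Jenkins project's real apps.)
docker run -d --name legacy-app -p 8081:8080 example/legacy-app:latest  # built on an older JRE
docker run -d --name modern-app -p 8082:8080 example/modern-app:latest  # built on a bleeding-edge JRE
```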
#### Defined "handoff" between developers and operators
Despite hype to the contrary, Operations is still a necessary specialization in
most organizations, the same as design, quality engineering, or product
management. That doesn't necessitate Ops "ownership" of applications, or their
implementations. The more freedom, and responsibility, which can be granted to
application developers the faster they will be able to build, test, and deploy
their applications.
In the case of an open source project like Jenkins, this is especially true.
Very few contributors have the experience, and trust, necessary to act as
infrastructure administrators. Those contributors do not have the time to
tend to, or "own", each application.
Containers provide a very logical, and realistic, "meeting ground" between
application developers and infrastructure admins. The most recent application
deployed for the Jenkins project,
[plugins.jenkins.io](https://plugins.jenkins.io), was developed as a Java/Jetty
backend and JavaScript frontend application. The "requirement" defined, by me,
was that it be delivered as a Docker container which, once created, could be
readily deployed to production.
This defined hand-off point gave the application developer a tangible,
achievable goal to meet in order for his application to be deployed.
As much as I would like to teach more people how to use Puppet, or Chef, most
developers have enough to cram into their brains without adding configuration
management to the mix.
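To make that hand-off concrete, here is roughly what it amounts to on each side; the registry, image name, and tag are hypothetical:
```bash
# Developer side: build the image and hand it off by pushing to a registry.
docker build -t registry.example.org/plugin-site:1.0.0 .
docker push registry.example.org/plugin-site:1.0.0

# Operations side: pull the agreed-upon image and run it, without needing to
# know anything about the application's runtime stack beyond the exposed port.
docker pull registry.example.org/plugin-site:1.0.0
docker run -d --restart=always -p 8080:8080 registry.example.org/plugin-site:1.0.0
```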
#### Local development benefits
By endeavouring to build the application into a container, the application
developer can, for the most part, run the container _locally_ just as it would
run in the production environment. Not only does this make local development
easier, but it also empowers developers to take responsibility for and modify
the runtime environment to a greater degree than if the application ran
differently locally versus in production.
**NOTE**: This advantage is lost if you construct hellish webs of
inter-container dependencies, or containers which require excessive numbers of
environment variables. A poorly designed or implemented application is still
going to be difficult to run, regardless of whether it's packaged in a
container or not.
#### Immutable delivery mechanism for Continuous Delivery
Continuous delivery of containers is most certainly a topic for a blog post
unto itself, but suffice it to say, it's amazingly easy with Jenkins these
days. The key benefit containers provide to a continuous delivery process is
**immutability**. Building a container provides an immutable object which can
progress through a pipeline towards production.
While this is also feasible with a `.war` file, or other application archive,
the inclusion of the runtime environment within the package ensures that the
runtime characteristics across the "testing", "staging", and "production"
environments are all the same.
In the `Jenkinsfile` for the [application referenced
above](https://github.com/jenkins-infra/plugin-site-api), one stage of the
defined [Jenkins Pipeline](https://jenkins.io/doc/book/pipeline) builds the
container, then the next stage _runs_ the container and performs some
rudimentary acceptance tests against that container. On at least one occasion
this has prevented the deployment of a genuinely broken application.
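The real `Jenkinsfile` lives in the repository linked above, but boiled down to shell, those two stages look something like this (the image name, port, and endpoint are placeholders):
```bash
# Stage 1: build an immutable image from the current workspace.
IMAGE="example/plugin-site:${BUILD_NUMBER:-dev}"
docker build -t "$IMAGE" .

# Stage 2: run that exact image and poke it with a rudimentary acceptance test.
docker run -d --name plugin-site-test -p 8080:8080 "$IMAGE"
sleep 10   # crude wait for the application to finish booting
if curl --fail --silent http://localhost:8080/ > /dev/null; then
  echo "acceptance check passed"
else
  echo "acceptance check failed" >&2
  status=1
fi
docker rm -f plugin-site-test
exit "${status:-0}"
```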
#### A growing ecosystem
The momentum behind the container ecosystem continues to grow, which means a
beleaguered infrastructure administrator, like yours truly, can take advantage
of a myriad of tools and technologies to make life easier. Technologies like
Kubernetes, especially on Azure or Google Container Engine (GKE), abstract a
*lot* away, in a good way, to where you can think of the service as opposed to
the specific implementation details of the application.
For the Jenkins project's next major iteration of infrastructure on Azure,
we're standing up a Kubernetes environment to deploy all applications into,
thereby dramatically reducing the runtime footprint, and cost, of the dozen
little apps floating around. As part of that migration, we're deploying Fluentd
for log aggregation, and Datadog's Kubernetes support for thorough monitoring
of applications as well. All of this tooling comes together to allow us to
describe services, rather than processes, which are necessary for the business
of the project.
----
I believe, and now have proof to back it up, that by supporting containers in
production we (the Jenkins project) are able to support more varied
applications, with faster and more automated release cycles, than without
containers. If speed and reliability are **not important** to you, then it's
probably safe to continue not running containers in production.
----
### Cons
There is no such thing as a free lunch, and as much as I would like to say
everything in containers is sunshine and rainbows, there are definitely some
growing pains, and issues with containers in production.
#### Docker networking is a hellscape
I will generalize here, because there is no "one container networking"
mechanism: host networks, overlay networks, Weave networks, etc. There are a lot
of ways to network containers together, whether in a cluster like Kubernetes or
on a single machine. _All_ of them rely on what I will broadly categorize as
"awful kernel networking tricks."
In these pre-Kubernetes days for the Jenkins project, we are still orchestrating
the deployment of containers with Puppet on individual virtual machines. This
means we're taking advantage of overlay networks and trusting that the Docker
daemon will configure the appropriate `iptables` rules to expose our containers
to the external network interfaces. To put a finer point on it, this means the
Docker daemon is creating **N**etwork **A**ddress **T**ranslation rules (aka
Masquerading in iptables parlance) on the machine to get traffic in and out.
NAT is bad, and it should feel bad.
What we have observed over time is that the Docker daemon cannot be trusted to
cleanly flush `iptables` rules as the container lifecycle changes. What
intermittently happens is: a service goes offline after a new deployment, because
the kernel is still NATing traffic from the external port 8080 to port 8080 on
an _old_ virtual IP address (`172.2.0.2`) instead of the _new_ virtual IP
address the new container was deployed with (`172.2.0.4`). When we look at the
`iptables -t nat -L` output, we see the `DOCKER` and `POSTROUTING` chains with
rules for **both** IPs, but the older one remains with a higher precedence.
Unfortunately this is "fucking impossible" (my words) to reproduce, so I
haven't been able to file a suitable bug report upstream.
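When we suspect we have hit it again, the triage amounts to comparing the address the container actually has against the address the kernel is still forwarding to; the container name is a placeholder, and the inspect template assumes the default bridge network:
```bash
# Which virtual IP does the freshly deployed container actually have?
docker inspect --format '{{ .NetworkSettings.IPAddress }}' my-app

# Which virtual IP is the kernel still NATing port 8080 to?
sudo iptables -t nat -nL DOCKER
sudo iptables -t nat -nL POSTROUTING
```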
This issue exposed a novel gap in our monitoring practices as well. Previously,
some applications had process checks, basically "is apache2 running? great,
everything is fine." When this issue manifests itself, the application
container is running properly and the process table looks correct, but the
issue hides in the `iptables` `nat` table while users complain that the
application is unreachable.
The obvious take-away here is a known monitoring best practice: **monitor how
the user sees the application**, not how your infrastructure sees it.
Instead of checking that the process serving the web application is alive,
running a `curl(1)` against the domain name, or external IP, actually validates
that the application is online and reachable, meeting the service contract for
your users.
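A minimal sketch of such a check, which could be wired into whatever monitoring system is already on hand:
```bash
# Check the application the way a user would: public hostname, real HTTP
# request, hard failure if anything on that path is broken.
if ! curl --fail --silent --max-time 10 https://plugins.jenkins.io/ > /dev/null; then
  echo "plugins.jenkins.io is unreachable" >&2
  exit 2   # "critical" in Nagios-style exit code conventions
fi
```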
#### Sometimes `dockerd` gives up
Running the Docker daemon, `dockerd`, for long periods of time is apparently
not a common practice, but we do it. As a result we have observed, on daemons
with heavy churn or high workloads, that `dockerd` periodically will
wedge/freeze/deadlock, taking every container running on that daemon with it.
Running `strace(1)` shows the daemon waiting on a lock (`futex(2)`) which will
never resolve.
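If you want to watch it happen for yourself, attaching to the daemon is enough; a wedged `dockerd` just sits there parked on that `futex` call:
```bash
# Attach to the running daemon (and its threads) to see what it is, or is not, doing.
sudo strace -f -p "$(pidof dockerd)"
```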
Unfortunately this also is "fucking impossible" (my words) to reproduce, so I
haven't been able to file a suitable bug report upstream.
This issue exposed another monitoring gap: checking that `dockerd` is running
is insufficient. Our Docker monitoring must execute `docker` commands to see if
the daemon is responsive and doing what it should. While we haven't gone so far
as to automatically restart `dockerd` when this happens, it would be relatively
straight-forward to implement.
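A sketch of the sort of check that actually exercises the daemon rather than just eyeballing the process table:
```bash
# "Is dockerd running?" is not enough; ask the daemon to do real work, and
# bound the call with a timeout so a wedged daemon registers as a failure.
if ! timeout 15 docker info > /dev/null 2>&1; then
  # Automatically restarting (e.g. `systemctl restart docker`) would be the
  # obvious next step, but as noted above we have not gone that far yet.
  echo "dockerd is unresponsive" >&2
  exit 2
fi
```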
#### Disk space is finite
Containers typically will have some ephemeral and mapped storage, backed by a
storage backend configured in the daemon. I have tried both `overlayfs` and
`aufs`, and seen similar behaviors of ever-increasing disk usage, regardless of
what the application is doing. The behavior I have observed, but not dug too
deeply into, is that disk space allocated for the `aufs` backend, for example,
will only ever grow. It will not shrink, regardless of whether the application
is actually using the space or not. This seems to "reset" when the container is
stopped and removed, however.
I believe we notice this acutely because we have long-running containers which
live on long-running virtual machines.
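The housekeeping we end up doing on those hosts looks roughly like the following; newer Docker releases fold much of it into `docker system prune`:
```bash
# Where has the space gone?
df -h /var/lib/docker
sudo du -sh /var/lib/docker/aufs /var/lib/docker/overlay* 2> /dev/null

# Reclaim what we can: exited containers and dangling image layers.
# (These complain harmlessly if there is nothing to remove.)
docker rm  $(docker ps --quiet --filter status=exited)
docker rmi $(docker images --quiet --filter dangling=true)
```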
#### Orchestration and secrets
I have seen a lot of skepticism from my peers about tools like Kubernetes and
the hype surrounding them. Some of this skepticism is fair, but let me make
this much clear: **if you do not use a tool like Kubernetes, you will end up
building a shittier version of it.**
Containers being lightweight allows for over-provisioning virtual machines with
multiple containers per instance, and of course it would be silly to run just
one instance of an application so you need replicas, and of course you will
want some form of persistent storage for your applications, and all of a sudden
you're staring at the domain Kubernetes aims to address.
Additionally, managing secrets can be challenging with containers. It would be
bone-headed to bake production SSL certificates, or API keys, into a container
image. Instead you will need a mechanism for injecting environment-specific
(testing, staging, production) credentials into containers. Without a tool like
Kubernetes, you will most certainly hack something up yourself to get secrets
into your containers, and it will most certainly be ugly.
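Short of an orchestrator, the least-bad pattern is to keep credentials on the host, or in some secret store, and inject them when the container starts; the paths, file names, and image below are all made up for illustration:
```bash
# Keep environment-specific credentials out of the image entirely and hand
# them to the container at runtime. /etc/myapp/production.env holds lines
# like API_KEY=... which become environment variables inside the container.
docker run -d \
  --env-file /etc/myapp/production.env \
  -v /etc/ssl/private/myapp:/run/certs:ro \
  example/myapp:1.0.0
```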
----
You should no longer be afraid of using containers in production. There are
certainly caveats and challenges to address, but I assert those downsides are
far outweighed by the benefits to the organization. Astute readers might be
wondering at this point when I will talk about containers and security, but
frankly, I don't see anything novel about security with containers. Firstly, do
not assume containers provide you any additional layer of security. Secondly,
the patch lifecycle, while differently managed compared to traditional package
management, is still more or less the same.
I recommend investigating Kubernetes on Azure or Google Container
Engine (and not wasting your time with Amazon's Elastic Container Service), then
starting with small stateless applications and slowly working your way up to
beefier stateful monoliths.
But do remember, there are no silver bullets. There will be flaws and
challenges, but adopting containers in production will encourage more
automation and continuous delivery of applications, lead to broader
delegation of responsibilities, and allow infrastructure people to focus on
infrastructure rather than applications.