Add some recent blog posts

2017-03-14 07:37:01 -07:00 · 2017-03-14 07:37:01 -07:00 · f01c26d450
parent 1c303a17fa
commit f01c26d450
2 changed files with 353 additions and 0 deletions
--- a/_posts/2017-03-01-no-gsoc-this-year.md
+++ b/_posts/2017-03-01-no-gsoc-this-year.md
@ -0,0 +1,64 @@
+---
+layout: post
+title: Jenkins will not be part of GSoC 2017
+tags:
+- jenkins
+- opinion
+---
+
+Unfortunately the Jenkins project will not be participating in the [Google
+Summer of Code](https://summerofcode.withgoogle.com/) (GSoC) 2017. While I am
+disappointed, I am not all that surprised. Last year, our inaugural year in
+GSoC, was tough insofar as we had to learn many things the hard way, did a
+poor job of selecting student proposals, and failed to recruit a satisfactory
+number of mentors.
+
+This year, I campaigned for a change in our proposal process. Instead of
+allowing anybody from the project to throw an idea into the top-hat of
+suggested project ideas, we would only list suggested project ideas that had a
+modicum of mentor commitment. Basically, if you weren't willing to help mentor
+a student on your suggested project, we weren't going to list it.
+
+As a result we only had 3 suggested project ideas. Disappointing.
+
+
+Considering our experience last year, where
+[Oleg](https://github.com/oleg-nenashev) bore a significant personal burden
+stepping in as mentor when others failed to meet the commitment necessary, I
+would rather not have been accepted than to force Org Admins to do that again.
+
+
+In my opinion the Jenkins project, or **any** open source project, should not
+attempt to participate in Google's Summer of Code without being able to provide
+the necessary commitment to mentor students during the program. It's exciting
+to imagine new contributors, and some funds donated, to the project, but the
+end-goal is to mentor and cultivate the next generation of open source
+contributors. The students must come first.
+
+While it's disappointing that the Jenkins project won't have an influx of
+students this year, I have been discussing plans with Oleg to find ways in
+which we can mentor or encourage more activity from the hundreds of "casual
+contributors" to the Jenkins project.
+
+In between this year and next, I think we (the Jenkins project) can focus on
+building up a mentorship base by:
+
+* Expanding the number of plugin developers who contribute to more than just
+  one plugin, and encouraging existing contributors to leave their little
+  one-plugin-fiefdom of concern.
+* Curating more low-hanging fruit in JIRA, and exposing it to potential new
+  contributors.
+* Doing whatever it takes to make it easier for somebody to contribute to the
+  project. This includes, but is not limited to, Daniel's [Developer
+  Documentation](https://github.com/jenkins-infra/jenkins.io/pull/711) efforts,
+  discouraging pedantic bike-shedding in new contributor pull requests, and
+  encouraging more hands-on mentorship/pairing opportunities.
+
+There are certainly a bunch of other things we can try between now and GSoC
+2018.  Either way, I'm disappointed, but I'm also optimistic that a GSoC
+rejection will encourage us to fix things. I would rather not be in GSoC than
+to ask a select few who are already stretched thin to commit time they don't
+have.
+
+Failure and rejection are learning opportunities, and to quote one of my
+favorite sayings: "let's make better mistakes tomorrow."
--- a/_posts/2017-03-14-on-containers-in-production.md
+++ b/_posts/2017-03-14-on-containers-in-production.md
@ -0,0 +1,289 @@
+---
+layout: post
+title: "On running containers in production"
+tags:
+- docker
+- openinfra
+- jenkins
+---
+
+As part of [SCaLE 15x](https://www.socallinuxexpo.org), I took part in the
+first [Open Source Infra Day](http://scale.opensourceinfra.org) where a number
+of other sysadmins and I shared stories and patterns which have helped us
+maintain open source infrastructure. As part of the "unconference" tracks, I
+suggested and then led the session "Running containers in production." As my
+luck would have it, in a group of roughly 10 people representing various
+groups, Jenkins was the only project running production services in
+containers. I thought I should share what it's like, and why you should stop
+standing on the sidelines and give containers in production a try.
+
+
+Containers have suffered from an unreasonable amount of hype, which has caused
+many people, myself included, to be exceptionally skeptical of their maturity
+and utility in a modern infrastructure. Here's what containers, _by
+themselves_, do **not** solve:
+
+* Containers do not make your applications more secure.
+* Containers do not make your applications more scalable.
+* Containers do not make your applications more portable.
+
+However, here is what containers can and **do** solve for:
+
+* Containers make your applications "look" the same (mostly) to the underlying
+  infrastructure.
+* Containers make the application runtime dependencies the developers'
+  responsibility.
+* Containers require the application developers to consider application
+  state and persistence.
+* Containers, once built, provide a (mostly) consistent behavior between dev,
+  staging, and production environments.
+
+
+At this point, I won't _not_ use containers anymore. Their benefits far
+outweigh their flaws, even in production environments. But there are flaws.
+
+
+### Pros
+
+The advantages of using containers are fairly well-documented across various
+presentations, blog posts, and gushy tweets. For my usage, which is partly
+personal and partly for the Jenkins project, the benefits are as follows.
+
+#### Faster and easier application delivery
+
+By using containers I enjoy much faster and easier application deployment,
+without needing to change any existing infrastructure. In essence,
+infrastructure which already exists and is capable of running a Docker
+container is capable of running JavaScript, Python, or Java applications
+without any modifications.
+
+For a mostly mixed infrastructure like that of the Jenkins project, this can be
+hugely beneficial. For some applications newer, bleeding edge Java Runtime
+Environments have been required, whilst others might need whatever basic JRE is
+available. With the container including that form of system dependency within
+it, the virtual machines running the containers don't need to know, or care,
+about the mixed application runtime requirements and dependencies.
+
+#### Defined "handoff" between developers and operators
+
+Despite hype to the contrary, Operations is still a necessary specialization in
+most organizations, the same as design, quality engineering, or product
+management. That doesn't necessitate Ops "ownership" of applications, or their
+implementations. The more freedom, and responsibility, which can be granted to
+application developers the faster they will be able to build, test, and deploy
+their applications.
+
+In the case of an open source project like Jenkins, this is especially true.
+Very few contributors have the experience, and trust, necessary to act as
+infrastructure administrators. Those contributors do not have the time to
+tend to, or "own", each application.
+
+Containers provide a very logical, and realistic, "meeting ground" between
+application developers and infrastructure admins. The most recent application
+deployed for the Jenkins project,
+[plugins.jenkins.io](https://plugins.jenkins.io) was developed as a Java/Jetty
+backend and JavaScript frontend application. The "requirement" defined, by me,
+was that it be delivered as a Docker container which, once created, could be
+readily deployed to production.
+
+This defined hand-off point gave the application developer a tangible
+achievable goal, which is within his reach, to meet in order for his
+application to be deployed.
+
+As much as I would like to teach more people how to use Puppet, or Chef, most
+developers have enough to cram into their brains without adding configuration
+management to the mix.
+
+#### Local development benefits
+
+By endeavouring to build the application into a container, the application
+developer can, for the most part, run the container _locally_ just as it would
+run in the production environment. Not only does this make local development
+easier, but it also empowers developers to take responsibility for and modify
+the runtime environment to a greater degree than if the application ran
+differently locally versus in production.
+
+**NOTE**: By constructing hellish webs of inter-container dependencies, or
+containers which require excessive amounts of environment variables, etc., this
+advantage will be lost. A poorly designed or implemented application is still
+going to be difficult to run, regardless of whether it's packaged in a
+container or not.
+
+#### Immutable delivery mechanism for Continuous Delivery
+
+Continuous delivery of containers is most certainly a topic for a blog post
+unto itself, but suffice it to say, it's amazingly easy with Jenkins these
+days. The key benefit containers provide to a continuous delivery process is
+**immutability**. Building a container provides an immutable object which can
+progress through a pipeline towards production.
+
+While this is also feasible with a `.war` file, or other application archive,
+the inclusion of the runtime environment within the package ensures that the
+runtime characteristics of the "testing", "staging", and "production"
+environments are all the same.
+
+In the `Jenkinsfile` for the [application referenced
+above](https://github.com/jenkins-infra/plugin-site-api), one stage of the
+defined [Jenkins Pipeline](https://jenkins.io/doc/book/pipeline) builds the
+container, then the next stage _runs_ the container and performs some
+rudimentary acceptance tests against that container. On at least one occasion
+this has prevented the deployment of a genuinely broken application.
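
The build-then-run idea can be sketched outside of a `Jenkinsfile` too. What follows is a minimal Python illustration, not the project's actual pipeline: the helper names (`sh`, `wait_for_http`, `build_and_smoke_test`), the port, and the probed URL are all assumptions made for the example. It only relies on the standard `docker build`, `docker run -d -p`, and `docker rm -f` commands.

```python
import subprocess
import time
import urllib.request


def sh(*args):
    """Run a command, raise if it fails, and return its trimmed stdout."""
    done = subprocess.run(args, check=True, stdout=subprocess.PIPE, text=True)
    return done.stdout.strip()


def wait_for_http(url, attempts=30, delay=1.0,
                  fetch=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers successfully, or give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            with fetch(url, timeout=5):
                return True
        except OSError:
            # Connection refused, timeout, or an HTTP error status: retry.
            sleep(delay)
    return False


def build_and_smoke_test(image, port=8080, run=sh, probe=wait_for_http):
    """Build `image`, start it, and check that it serves HTTP at all."""
    run("docker", "build", "-t", image, ".")
    # `docker run -d` prints the ID of the container it started.
    container = run("docker", "run", "-d", "-p", f"{port}:{port}", image)
    try:
        return probe(f"http://localhost:{port}/")
    finally:
        # Always clean up the test container, whether or not the probe passed.
        run("docker", "rm", "-f", container)
```

A pipeline stage would fail the build when `build_and_smoke_test(...)` returns `False`, so a container that cannot even answer a request never progresses towards production.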
+
+
+#### A growing ecosystem
+
+The momentum behind the container ecosystem continues to grow, which means a
+beleaguered infrastructure administrator, like yours truly, can take advantage
+of a myriad of tools and technologies to make life easier. Technologies like
+Kubernetes, especially on Azure or Google Container Engine (GKE), abstract a
+*lot* away, in a good way, to where you can think of the service as opposed to
+the specific implementation details of the application.
+
+For the Jenkins project's next major iteration of infrastructure on Azure,
+we're deploying a Kubernetes environment to deploy all applications into,
+thereby dramatically reducing the runtime footprint, and cost, of the dozen
+little apps floating around. As part of that migration, we're deploying Fluentd
+for log aggregation, and Datadog's Kubernetes support for thorough monitoring
+of applications as well. All of this tooling comes together to allow us to
+describe services, rather than processes, which are necessary for the business
+of the project.
+
+----
+
+I believe, and now have proof to back it up, that by supporting containers in
+production we (the Jenkins project) are able to support more varied
+applications, with faster and more automated release cycles, than without
+containers. If speed and reliability are **not important** to you, then it's
+probably safe to continue not running containers in production.
+
+----
+
+### Cons
+
+There is no such thing as a free lunch, and as much as I would like to say
+everything in containers is sunshine and rainbows, there are definitely some
+growing pains, and issues with containers in production.
+
+#### Docker networking is a hellscape
+
+I will generalize here, because there is no "one container networking"
+mechanism. Host networks, overlay networks, weave networks, etc. There are a lot
+of ways to network containers together, whether in a cluster like Kubernetes or
+on a single machine. _All_ of them rely on what I will broadly categorize as
+"awful kernel networking tricks."
+
+In the pre-Kubernetes days for the Jenkins project we are still orchestrating
+the deployment of containers with Puppet on individual virtual machines. This
+means we're taking advantage of overlay networks and trusting that the Docker
+daemon will configure the appropriate `iptables` rules to expose our containers
+to the external network interfaces. To put a finer point on it, this means the
+Docker daemon is creating **N**etwork **A**ddress **T**ranslation rules (aka
+Masquerading in iptables parlance) on the machine to get traffic in and out.
+
+NAT is bad, and it should feel bad.
+
+What we have observed over time is that the Docker daemon cannot be trusted to
+cleanly flush `iptables` rules as the container lifecycle changes. What
+intermittently happens is: a service goes offline after a new deployment, because
+the kernel is still NATing traffic from the external port 8080 to port 8080 on
+an _old_ virtual IP address (`172.2.0.2`) instead of the _new_ virtual IP
+address the new container was deployed with (`172.2.0.4`). When we look at the
+`iptables -t nat -L` output, we see the `DOCKER` and `POSTROUTING` chains with
+rules for **both** IPs, but the older one remains with a higher precedence.
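
One rough way to spot this state is to group the DNAT rules in the `DOCKER` chain by published port and flag any port that forwards to more than one container address. The Python sketch below assumes the rule-spec format printed by `iptables -t nat -S DOCKER`; the function name is hypothetical and the sample rules are made up to mirror the description above, not captured from a real host.

```python
import re
from collections import defaultdict

# Matches DNAT rules as printed by `iptables -t nat -S DOCKER`, for example:
#   -A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.2.0.2:8080
DNAT_RULE = re.compile(r"--dport (\d+) .*--to-destination ([\d.]+):(\d+)")


def find_conflicting_dnat(rules_text):
    """Return {published_port: [container_ip, ...]} for ports with several targets."""
    targets = defaultdict(list)
    for line in rules_text.splitlines():
        match = DNAT_RULE.search(line)
        if match:
            port, ip, _container_port = match.groups()
            if ip not in targets[int(port)]:
                targets[int(port)].append(ip)
    return {port: ips for port, ips in targets.items() if len(ips) > 1}


# Hypothetical sample: port 8080 still points at the old and the new address.
SAMPLE = """\
-N DOCKER
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.2.0.2:8080
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.2.0.4:8080
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 9090 -j DNAT --to-destination 172.2.0.3:9090
"""

if __name__ == "__main__":
    print(find_conflicting_dnat(SAMPLE))
```

On a real host the input would come from running `iptables -t nat -S DOCKER` as root. A non-empty result does not prove traffic is going to the wrong place, but it is a cheap signal that a stale rule may have been left behind.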
+
+
+Unfortunately this is "fucking impossible" (my words) to reproduce, so I
+haven't been able to file a suitable bug report upstream.
+
+This issue exposed a novel gap in our monitoring practices as well. Previously,
+some applications had process checks, basically "is apache2 running? great,
+everything is fine." When this issue manifests itself, the application
+container is running properly, the process table will be correct, the issue
+will hide in the `iptables` `nat` table and users will complain that the
+application is unreachable.
+
+The obvious take-away here is a known monitoring best practice: **monitor how
+the user sees the application**, not how your infrastructure sees it.
+
+Instead of checking that the process serving the web application is running,
+running a `curl(1)` against the domain name, or external IP, actually validates
+that the application is online and reachable, meeting the service contract for
+your users.
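
As a minimal sketch of such a check in Python, assuming nothing beyond the standard library: the function name `is_reachable`, the default timeout, and the injectable `opener` parameter are choices made for this example, not part of any monitoring tool the project uses.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or an HTTP error status.
        return False
```

Calling `is_reachable("https://plugins.jenkins.io")` from a machine outside the cluster exercises DNS, the NAT rules, and the application in one go, which is exactly the path a stale `iptables` rule breaks.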
+
+
+#### Sometimes `dockerd` gives up
+
+Running the Docker daemon, `dockerd`, for long periods of time is apparently
+not a common practice, but we do it. As a result we have observed, on daemons
+with heavy churn or high workloads, that `dockerd` periodically will
+wedge/freeze/deadlock, taking every container running on that daemon with it.
+Running `strace(1)` shows the daemon waiting on a lock (`futex(2)`) which will
+never resolve.
+
+Unfortunately this also is "fucking impossible" (my words) to reproduce, so I
+haven't been able to file a suitable bug report upstream.
+
+This issue exposed another monitoring gap: checking that `dockerd` is running
+is insufficient. Our docker monitoring must execute `docker` commands to see if
+the daemon is responsive and doing what it should. While we haven't gone so far
+as to automatically restart `dockerd` when this happens, it would be relatively
+straightforward to implement.
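
A responsiveness check along those lines might look like the following Python sketch. It assumes that `docker info` (which has to talk to the daemon) is a reasonable probe and that ten seconds is long enough for a healthy daemon to answer; both are assumptions for the example, as is the function name.

```python
import subprocess


def daemon_responsive(command=("docker", "info"), timeout=10.0):
    """Return True if `command` exits successfully within `timeout` seconds."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The client never got an answer: treat the daemon as wedged.
        return False
    except OSError:
        # The binary is missing or cannot be executed.
        return False
    return result.returncode == 0
```

The timeout is the important part. A deadlocked daemon usually does not return an error; the client simply hangs, so a check without a deadline would hang right along with it.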
+
+
+#### Disk space is finite
+
+Containers typically will have some ephemeral and mapped storage, backed by a
+storage backend configured in the daemon. I have tried both `overlayfs` and
+`aufs` and seen similar behaviors of ever increasing disk usage, regardless of
+what the application is doing. The behavior I have observed, but not dug too
+deeply into, is that disk space allocated for the `aufs` backend, for example,
+will only ever grow. It will not shrink, regardless of whether the application
+is actually using the space or not. This seems to "reset" when the container is
+stopped and removed however.
+
+I believe we notice this acutely because we have long-running containers which
+live on long-running virtual machines.
+
+
+#### Orchestration and secrets
+
+I have seen a lot of skepticism from my peers about tools like Kubernetes and
+the hype surrounding them. Some of this skepticism is fair, but let me make
+this much clear: **if you do not use a tool like Kubernetes, you will end up
+building a shittier version of it.**
+
+Containers being lightweight allows for over-provisioning virtual machines with
+multiple containers per instance, and of course it would be silly to run just
+one instance of an application so you need replicas, and of course you will
+want some form of persistent storage for your applications, and all of a sudden
+you're staring at the domain Kubernetes aims to address.
+
+Additionally, managing secrets can be challenging with containers. It would be
+bone-headed to bake a container with production SSL certificates, or API keys,
+within it. Instead you will need a mechanism for injecting environment-specific
+(testing, staging, production) credentials into containers. Without a tool like
+Kubernetes, you will most certainly hack something up yourself to get secrets
+into your containers, and it will most certainly be ugly.
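
From the application's side, "injecting" secrets usually means reading them at startup from wherever the platform put them, rather than from the image. A minimal Python sketch, assuming secrets arrive either as files in a mounted directory or as environment variables; the function name, the `/run/secrets` default, and the upper-casing convention are assumptions for illustration only.

```python
import os
from pathlib import Path


def load_secret(name, secrets_dir="/run/secrets", environ=os.environ):
    """Look up a secret at runtime: a mounted file first, then an environment variable."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    value = environ.get(name.upper())
    if value is None:
        # Fail loudly at startup instead of running with a missing credential.
        raise KeyError(f"secret {name!r} was not provided to this container")
    return value
```

With something like this, the same image can run in testing, staging, and production, and only the mounted files or environment differ between them.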
+
+
+----
+
+You should no longer be afraid of using containers in production. There are
+certainly caveats and challenges to address, but I assert those downsides are
+far outweighed by the benefits to the organization. Astute readers might be
+wondering at this point when I will talk about containers and security, but
+frankly, I don't see anything novel about security with containers. Firstly, do
+not assume containers provide you any additional layer of security.  Secondly,
+the patch lifecycle, while differently managed compared to traditional package
+management, is still more or less the same.
+
+I recommend investigating Kubernetes on Azure or Google Container
+Engine. Don't waste your time with Amazon's Elastic Container Service.
+Start by deploying small stateless applications and slowly work your way up to
+beefier stateful monoliths.
+
+But do remember, there are no silver bullets. There will be flaws and
+challenges, but adopting containers in production will encourage more
+automation and continuous delivery of applications, lead to broader
+delegation of responsibilities, and allow infrastructure people to focus on
+infrastructure rather than applications.