Add a bunch of blog posts from the past couple months

I'm apparently a very lazy committer
R. Tyler Croy 2017-12-02 08:40:21 -08:00
parent b85175a379
commit 477df3475b
No known key found for this signature in database
GPG Key ID: 1426C7DC3F51E16F
7 changed files with 1074 additions and 0 deletions

@@ -0,0 +1,65 @@
---
layout: post
title: "They will blame you"
tags:
- sysadmin
- devops
- opinion
---
Over the past decade two things have become increasingly clear: practically
every modern industry is part of "the software industry," in one way or
another, and "the software industry" is rife with shortcuts and technical debt.
Working in an Operations or Systems Administration capacity provides a
front-row seat to many of these dysfunctional behaviors. But it's not just
sysadmins: many developers are also called upon to engage in, or allow,
half-baked product launches, poor-quality code deployments, and subpar patch
lifecycle management.

Make no mistake, if something goes wrong, **they will blame you.**

Just yesterday, I was working on my truck in the driveway and a neighbor struck
up a conversation about diesel engines. The conversation naturally led to a
discussion about Volkswagen's massive diesel emissions scandal. I mentioned to
my neighbor how infuriated I was that [Volkswagen executives blamed developers](http://www.latimes.com/business/autos/la-fi-hy-vw-hearing-20151009-story.html)
for the scandal. Prior to that news story, I naively assumed that executives
took ultimate responsibility for the successes, and failures, of their
organizations.
As the sun set, I wrapped up my work and came back inside to see [this story from Engadget](https://www.engadget.com/2017/10/03/former-equifax-ceo-blames-breach-on-one-it-employee/)
wherein the former Equifax CEO blamed IT staff for the failure. The Equifax breach
was made possible by an out-of-date Apache Struts dependency.

Setting aside for a moment that personally-identifiable information should _never_
be a single vulnerability away from exposure. Setting aside for a moment that
the majority of the Equifax business relies on **trust**, and should therefore
have been subject to vigorous and regular third-party security audits.
Setting aside for a moment that information security relies on defense in
depth, which is an organization-wide practice. Setting all of that aside: the
former CEO blamed underlings, rather than leadership, for the systemic failures
of Equifax to secure highly sensitive personal information.

Make no mistake, if something goes wrong, **they will blame you.**

---

Before I dropped out of college, while I was still pretending to study
Computer Engineering, I took an Engineering Ethics course. We discussed Space
Shuttle disasters, bridge failures, and other calamities, at length. One
recurring theme from many of the incidents was management ignoring or covering
up expert advice, or concerns, raised by engineering staff. The conclusion drawn, for
the auditorium of young engineering students, was that it was our
responsibility as "Professional Engineers" to ensure the safety and quality of
our work, and to keep solid documentation for any safety concerns we raised;
otherwise, we could be held liable.

I am starting to believe that, before the decade is over, we will start to see
developers and systems administrators held civilly liable for failures in
systems we create and for which we are responsible.
It is up to you to advocate for good patch lifecycle management practices. It
is up to you to build systems which prevent poor-quality code deployments. It
is up to you to advocate for well-designed products which defend user privacy
and personally-identifiable information. Because make no mistake, if something
goes catastrophically wrong, they will blame you.

@@ -0,0 +1,72 @@
---
layout: post
title: "Watching fire come down the mountain"
tags:
- california
- santarosa
- opinion
---
The insanely strong gusts of wind would not stop clattering the tin roof panels
over the back patio. Begrudgingly, I awoke, dressed, and tried to secure the
roof panels before the neighbors got too ornery. Stepping up the ladder, I
noticed an orange glow north of the house. It was just after midnight and I had
not heard any sirens, so I jumped into the car on the assumption that one of
those houses by the park was burning and had not yet been reported.

Wearing a flannel, jeans, and my flip-flops, I speed off into the night. Not
entirely sure what aid I could render, as a mostly-useless person wearing
inappropriate fire-fighting footwear.

Passing the park, seeing nothing, I figure it's the neighborhood behind, and
continue driving. The next neighborhood doesn't show any fire, but I smell
smoke, so I continue on towards Fountaingrove Parkway which crosses one of the
highest ridges in Santa Rosa.

Atop Fountaingrove Parkway, I see that the hills to the north, an area I later
learn is "Shiloh Ridge," are glowing.

I do not see flames, but the hills are glowing. I turn my hat backwards so the
gusts of wind don't blow it from my head. Not more than two minutes pass and
flames crest the ridge.

"Oh shit" I exclaim to nobody in particular.
Walking back to the car, I stand on the bumper for a better view and see the
flames already pushing more than halfway down Shiloh Ridge. In a matter of
minutes, the ridge glowing against the smokey night sky had erupted in flames.
"Oh fuck this!" and I scurry into the car and speed off.
---
Driving back to house, I call my wife, who is rather surprised to learn I'm not
sleeping beside her. She puts a kettle on, and starts preparing the go-bag. I
arrive home around 1:00, half the sky is clear with a full moon, the other half
smoke filled with an orange backlight.
While preparing some stuff to go, we start listening to the scanner, and begin
to watch Twitter.

Within 30 minutes, the evacuation notices are rolling out.

Within 60 minutes, the fire jumps over US Highway 101.

---

We voluntarily evacuated to Sebastopol at 3:00.

---

Between Santa Rosa and Sebastopol, the air foggy with smoke and ash, we were
able to see fires raging on the hills to the southeast of Santa Rosa. When we
arrived in Sebastopol at 3:45, everybody there had already been awoken by the
smell of smoke.

By 10:00, significant chunks of northern Santa Rosa have burnt to the ground.
The neighborhood from that glowing ridge, which I saw around midnight: gone.
The valley below, where I watched the flames flicker down the hill: gone. The
ridge I stood atop for all of five minutes is now also on fire.

It is still uncertain how the fire will develop throughout the day, how long
the fire will burn, and how scarred the beautiful Sonoma and Napa Valleys will
be when it's all over.

@@ -0,0 +1,132 @@
---
layout: post
title: "This is your reality now"
tags:
- santarosa
- sonoma
- fire
- sonomafireinfo
---
The traffic on the Bay Bridge connecting San Francisco to Oakland is one of the
most congested routes in all of Northern California. Somehow it gets
even worse on Saturday and Sunday. One weekend, a few years ago, I was driving my wife
and some of the women from her soccer team back to Berkeley from a game in
San Francisco's Golden Gate Park. On the east side of the bridge, before
inching onto I-580N, I was pretty pissed off, and half-joking, half-frustrated,
I shook the steering wheel back and forth with a "GAHHHHHHHHHHHH." The woman sitting
behind me, who was certainly the "funny one" of the group, put her hand on my
arm and gently said "Tyler, this is your reality now."
Certainly a well-delivered line, perfect timing, received with laughter all around, but
the phrase has stuck in my memory longer than the woman's name.

I wrote my [last post](/2017/10/09/fire-coming-down-the-mountain.html) as a way
to process and capture the trauma of watching fire rip into northern Santa
Rosa, a town I have adopted and which has been the subject of a number of
picturesque photos I have posted over the past three years, always titled with
my unofficial city motto: "Santa Rosa: It's nice."

The day after I wrote that post, I ended up at the [Chimera Arts and
Makerspace](http://chimeraarts.org) in Sebastopol, the little hippie town west
of Santa Rosa, where I joined a fledgling effort called [Sonoma Fire
Info](http://sonomafireinfo.org). I took the remainder of the week off from
work, and our little volunteer organization rapidly became a clearinghouse for
verified information across the county in its time of need, soaking up the
efforts of over 60 volunteers who made thousands of phone calls, scoured social
media, and captured truth amid the chaos. In a two-week period, the website was
viewed by over 100k people.

I think we did a great job of informing Sonoma County. The rest of the country,
and world, remains frustratingly less informed about an event from which my adorable
little city is going to take _years_ to recover.
The fire that I watched whip down the hillside is known as the "Tubbs
Fire". The fire that I could see from miles away on Llano Rd during our
voluntary evacuation to Sebastopol at 3:45 that morning is known as the "Nuns
Fire." While I saw both of these with my own eyes, there were **four other
fires**, of various sizes, engorged by 50-70mph winds, raging in Northern
California:
* The "Sulphur Fire" burned in Lake County to our northeast.
* The "Pocket Fire" destroyed parts of northern Sonoma county.
* The "Redwood Valley Fire" incinerated Mendocino County further to the north.
* The "Atlas Fire" tore through Napa County to our east.

At one time there were **six active fires** in the part of Northern California north of
San Francisco and west of Sacramento. To put this into historical context,
**four** of those six fires rank in the 20 most destructive (structures destroyed)
wildfires ever recorded in California history:
![The 20 most destructive fires](/images/post-images/your-reality-now/destructive-fires.jpg)
(posted by [@CALFIRE](https://twitter.com/CAL_FIRE/status/921441414981885952/photo/1) on October 20th)

The most destructive (Tubbs) and sixth most destructive (Nuns) wildfires in
the Bear Republic's history scarred Sonoma county on a scale that is difficult
to understand and difficult to process.

The impact on Santa Rosa, in particular, from this [unfathomably big fire](https://twitter.com/agentdero/status/921609069810532353)
cannot be overstated. Considered the fifth most populous city in the "Bay
Area," with just over 170k residents, it lost **5%** of its housing in less than
twelve hours. The gale-force winds which woke me up at 12:30am on October 9th
pushed the fire through neighborhoods, across 4-6 lanes of Highway 101, and
through hundreds more homes before it could be stopped, all in a matter of
about 8 hours.

---

We returned to our house the Thursday night after the fires started, exhausted.
After a full day working at Chimera on Sonoma Fire Info, and some dinner that
Friday, I holed up in my office and continued scouring the internet for news
and updates, when I was startled by the sound of water falling on the tin patio roof.
My first thought: "did a water-tanker helicopter just fly over?" Followed
quickly by "no fucking way, did it start raining!?" Bolting out the front door,
I was disappointed to learn it had not started raining, but then was bemused to
find my neighbor watering my house.

I can understand the compulsion to water down the house "just in case" in areas
near wildfires, but this wasn't a "just in case": my neighbor had caught an
ember burning on my roof earlier in the week. He had since taken to watering both our
houses a couple times a day.

I also learned from my night-owl of a neighbor that he had been sitting on my
corner-lot house's porch, and brandished his pistol a few times at some cars
which took an especially slow roll through our neighborhood, not about to let
any thieves take advantage of the situation.
The CALFIRE maps show that we are almost exactly one mile south of the last
structures completely destroyed by the Tubbs Fire.
This was close, terrifyingly close.

---

The next Monday, a week after the fires broke out, I return to work, to
questions of "are things okay?"

I lie.

Everybody in Sonoma county who didn't lose a house knows somebody who did.
Thousands of people will have to wait until early 2018 for the EPA to remove
thousands of tons of toxic ash and debris, requiring a clean-up operation of
unprecedented size, before they can begin to rebuild. Large portions of
Sugarloaf Ridge State Park are burned, the majority of Annadel State Park is
destroyed. Most of the little Sonoma Valley towns I drive through on my way to
Napa have suffered severe damage.

This region, this adopted home of mine, is scarred in ways that many Americans,
including some who live here, cannot fully appreciate.

Much as I would like to wallow in that frustration and despair, there is no
direction to go but forward. There is nothing that will undo what has been
done; nothing will make this "okay."

There is no option for Sonoma county, and Santa Rosa, but to enjoy the warmth
of the autumn sun, pick up the pieces, and rebuild.

"This is your reality now."

@@ -0,0 +1,80 @@
---
layout: post
title: "Call for Proposals: Testing and Automation @ FOSDEM 2018"
tags:
- fosdem
- testingautomation
- jenkins
---
2018 will be the sixth year for the Testing/Automation dev room at
[FOSDEM](https://fosdem.org/2018/). This room is about creating better
software through a focus on testing and automation at all layers of
the stack, from creating libraries and end-user applications all the
way down to packaging, distribution, and deployment. Testing and
automation is not isolated to a single toolchain, language, or
platform; there is much to learn and share regardless of background!

# What
Since this is the sixth year we're hosting the Testing and Automation
dev room, here are some ideas of what we would like to see, and what
worked in prior years (they're just ideas, though!). Check out the
[2013](https://archive.fosdem.org/2013/schedule/track/testing_and_automation/),
[2014](https://archive.fosdem.org/2014/schedule/track/testing_and_automation/),
[2015](https://archive.fosdem.org/2015/schedule/track/testing_and_automation/),
[2016](https://archive.fosdem.org/2016/schedule/track/testing_and_automation/),
[2017](https://archive.fosdem.org/2017/schedule/track/testing_and_automation/)
schedules for inspiration.
### Testing in the real, open source world
* War stories/strategies for testing large scale or complex projects
* Tools that extend the ability to test low-level code
* Projects that are introducing new/interesting ways of testing "systems"
### Cool Tools (good candidates for lightning talks)
* Explain/demo how your open source tool made developing quality software better
* Combining projects/plugins/tools to build amazing things: "Not enough
  people in the open source community know how to use $X, but here's a
  tutorial on how to use $X to make your project better."
# Where
FOSDEM is hosted at [Université libre de Bruxelles in Brussels,
Belgium](https://fosdem.org/2018/practical/transportation/). The
Testing and Automation dev room is likely slated for Building H, room
2213, which seats ~100.
# When
* CFP Submission Deadline: **23:59 UTC, 26 November 2017**
* Schedule Announced: **15 December 2017**
* Presentations: **3 February 2018**
# How
Please submit one (or more) 30-40 minute talk proposal(s) OR one (or
more) 10 minute lightning talk proposal(s) by **23:59 UTC on November
26th 2017**. We will notify all those submitting proposals about their
acceptance by December 15th 2017.
Submit your talk proposal (you can submit multiple proposals if you'd
like) with [Pentabarf](https://penta.fosdem.org/submission/FOSDEM18/),
the FOSDEM paper submission system. Be sure to select `Testing and
Automation`, otherwise we won't see it!

You can create an account, or use an existing account if you already have one.
Please note: FOSDEM is a
[FLOSS](https://en.wikipedia.org/wiki/Free_and_open-source_software)
community event, by and for the community, so please ensure your topic is
appropriate (i.e. this isn't the right forum for commercial product
presentations).

# Who
* [R. Tyler Croy](https://github.com/rtyler) - Jenkins hacker
* [Mark Waite](https://github.com/markewaite) - Jenkins/Git hacker

@@ -0,0 +1,206 @@
---
layout: post
title: "Running tasks with Docker and Azure Functions"
tags:
- azure
- docker
---
Months ago Microsoft announced [Azure Container
Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI), which
allow for rapidly provisioning containers "in the cloud." When they were first
announced, I played around with them for a bit, before realizing that the
pricing for running a container "full-time" was almost 3x what it would cost to
deploy that container on a comparable Standard A0 virtual machine. Since then,
however, Azure has added support for a "Never" restart policy, which opens the
door for using Azure Container Instances for [arbitrary task
execution](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-restart-policy).
The ability to quickly run arbitrary containerized tasks is a really exciting
feature. Any Ruby, Python, or JavaScript script that I can package into a Docker
container, I can kick out to Azure Container Instances in seconds, and pay by
the second of runtime. **Very** exciting, but it's not practical for me to
always have the Azure CLI at the ready to execute something akin to:
```
az container create \
    --resource-group myResourceGroup \
    --name mycontainer \
    --image rtyler/my-silly-container:latest \
    --restart-policy Never
```
Fortunately, Microsoft publishes a number of client libraries for Azure,
including a Node.js one. This is where introducing [Azure
Functions](https://docs.microsoft.com/en-us/azure/azure-functions/) can help
make Azure Container Instances really _shine_. Similar to AWS Lambda, or
Google Cloud Functions, Azure Functions provide a light-weight computing
environment for running teeny-tiny little bits of code, typically JavaScript,
"in the cloud."
This past weekend I had an arguably good reason for combining the two in a
novel fashion: launching a (containerized) script every ten minutes.
The expensive and old fashioned way to handle this would be to just deploy a
small VM, add a crontab entry, and spend the money to keep that machine online
for what equates to approximately 6 hours of work throughout the month.
* Standard A0 virtual machine monthly cost: $14.64
* Azure Container Instance, for 6 hours a month, cost: $0.56
In this blog post I won't go too deeply into the creation of an Azure Function,
but I will focus on the code which actually provisions an Azure Container
Instance from Node.js.
### Prerequisites
In order to provision resources in Azure, we must first create the Azure
credentials objects necessary. For better or worse, Azure builds on top of
Azure Active Directory which offers an absurd amount of role-based access
controls and options. The downside of that flexibility is that it's supremely
awkward to get simple API tokens set up for what seem like otherwise mundane
tasks.
To provision resources, we will need an "Application", "Service Principal", and
"Secret". The instructions below will use the Azure CLI:
* `openssl rand -base64 24` will generate a good "client secret" to use.
* `az ad app create --display-name MyAppName --homepage http://example.com/my-app --identifier-uris http://example.com/my-app --password $CLIENT_SECRET` creates the Azure Active Directory Application, mind the "App ID" (aka client ID).
* `az ad sp create --id $CLIENT_ID` will create a Service Principal.
* And finally, I'll assign a role to that Service Principal: `az role assignment create --assignee http://example.com/my-app --role Contributor --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/my-apps-resource-group`.
In these steps, I've isolated the Service Principal to a specific Resource
Group (`my-apps-resource-group`) to keep it away from other resources, but also
to make it easier to monitor costs.
A number of these variables will be set in the Azure Function "Application
Settings" to enable my JavaScript function to authenticate against the Azure
APIs.
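
For reference, here is a sketch of how those Application Settings might be
pushed with the Azure CLI; the function app name (`my-function-app`) is just a
placeholder, while the setting names match what `index.js` below reads from
`process.env`:

```
# Hypothetical function app name; the setting names match the env vars read in index.js
az functionapp config appsettings set \
    --name my-function-app \
    --resource-group my-apps-resource-group \
    --settings AZURE_CLIENT_ID=$CLIENT_ID \
               AZURE_CLIENT_SECRET=$CLIENT_SECRET \
               AZURE_TENANT_ID=$TENANT_ID \
               AZURE_SUBSCRIPTION_ID=$SUBSCRIPTION_ID
```
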
### Accessing Azure from Azure
Writing the JavaScript to actually launch a container instance was a little
tricky, as I couldn't find a single example in the [azure-arm-containerinstance
package](https://github.com/Azure/azure-sdk-for-node/tree/master/lib/services/containerinstanceManagement).
In the "Codes" section below is the entire Azure Function, but the only major
caveat is that in my example I've "hacked" the `apiVersion` which is used when
accessing the Azure REST APIs, as the current package hits an API which doesn't
support the "Never" restart policy for the container.
With the Azure SDK for Node authenticating properly, it's feasible to do all
kinds of interesting operations in Azure: creating, updating, or deleting
resources based on specific triggers from Azure Functions.
### Future Possibilities
The code below is among the most simplistic use-cases imaginable for
combining Azure Functions and Azure Container Instances. Thinking more broadly,
one could conceivably trigger short-lived containers "on-demand" in response to
messages coming from Event Hub, or even inbound HTTP requests from another user
or system. Imagine, for example, if you wanted to provide a quick demo of some
application to new users on your website. One Azure Function provisioning
containers for specific users, and another periodically reaping any containers
which have been running past their timeout, would be both cheap and easily
deployed.
I still wouldn't use Azure Container Instances for any "full-time" workload;
their pricing model is fundamentally flawed for those kinds of tasks. If you
have workloads which run for only seconds, minutes, or hours at a time,
they make a *lot* more sense, and with Azure Functions, they are cheaply and easily
orchestrated.

### Codes
**index.js**
```
module.exports = function (context) {
    const ACI = require('azure-arm-containerinstance');
    const AZ = require('ms-rest-azure');

    context.log('Starting a container');

    AZ.loginWithServicePrincipalSecret(
        process.env.AZURE_CLIENT_ID,
        process.env.AZURE_CLIENT_SECRET,
        process.env.AZURE_TENANT_ID,
        (err, credentials) => {
            if (err) {
                throw err;
            }
            let client = new ACI(credentials, process.env.AZURE_SUBSCRIPTION_ID);
            let container = new client.models.Container();

            context.log('Launching a container for client', client);

            container.name = 'my-container-name';
            container.environmentVariables = [
                {
                    name: 'SOME_ENV_VAR',
                    value: process.env.SOME_ENV_VAR
                }
            ];
            container.image = 'my-fancy-image-name:latest';
            container.ports = [{port: 80}];
            container.resources = {
                requests: {
                    cpu: 1,
                    memoryInGB: 1
                }
            };

            /* HACK THE PLANET */
            /* https://github.com/Azure/azure-sdk-for-node/issues/2334 */
            client.apiVersion = '2017-10-01-preview';

            context.log('Provisioning a container', container);

            client.containerGroups.createOrUpdate(
                'spyglass-containers', /* resource group */
                'some-proc', /* container group name */
                {
                    containers: [container],
                    osType: 'Linux',
                    location: 'westus',
                    restartPolicy: 'never'
                }
            ).then((r) => {
                context.log('Launched:', r);
                context.done();
            });
        });
};
```
**package.json**
```
{
  "name": "foobar-processing",
  "version": "0.0.1",
  "description": "Timer-triggered function for running an Azure Container Instance",
  "main": "index.js",
  "author": "R Tyler Croy",
  "dependencies": {
    "azure-arm-containerinstance": "^1.0.0-preview"
  }
}
```
**function.json**
```
{
  "disabled": false,
  "bindings": [
    {
      "direction": "in",
      "schedule": "0 */10 * * * *",
      "name": "tenMinuteTimer",
      "type": "timerTrigger"
    }
  ]
}
```

@@ -0,0 +1,519 @@
---
layout: post
title: "Jenkins on Kubernetes with Azure storage"
tags:
- aks
- azure
- jenkins
- kubernetes
---
_This research was funded by [CloudBees](https://cloudbees.com/) as part of my
work in the CTO's Office with the vague guideline of "ask interesting
questions and then answer them." It does not represent any specific product
direction by CloudBees and was performed with
[Jenkins](https://jenkins.io), rather than CloudBees products, and Kubernetes
1.8.1 on Azure._
At [this point](/tag/azure.html) it is certainly no secret that I am fond of the
work the Microsoft Azure team have been doing over the past couple years. While
I was excited to announce [we had
partnered](https://jenkins.io/blog/2016/05/18/announcing-azure-partnership/) to
run Jenkins project infrastructure on Azure, things didn't start to get _really_
interesting until they announced [Azure Container
Service](https://azure.microsoft.com/en-us/services/container-service/). A
mostly-turn-key Kubernetes service alone was pretty interesting, but then
"[AKS](https://azure.microsoft.com/en-us/blog/introducing-azure-container-service-aks-managed-kubernetes-and-azure-container-registry-geo-replication/)"
was announced, bringing a much-needed _managed_ Kubernetes resource into the
Azure ecosystem. Long story short, thanks to Azure, I'm quite the fan of
Kubernetes now too.
Kubernetes is brilliant at a lot of things. It's easy to use, has some really
great abstractions for common orchestration patterns, and is superb for running
stateless applications. State**ful** applications also run fairly well on
Kubernetes, but the challenge usually has _much_ more to do with the
application, rather than Kubernetes. Jenkins is one of those challenging
applications.
Since Jenkins is my jam, this post covers the ins-and-outs of deploying a
Jenkins master on Kubernetes, specifically through the lens of Azure Container
Service (AKS). This will cover the basic gist of running a Jenkins environment
on Kubernetes, evaluating the different storage options for "Persistent
Volumes" available in Azure, outlining their limitations for stateful
applications such as Jenkins, and will conclude with some recommendations.
* [Jenkins and the File System](#filesystem)
* [Kubernetes Storage](#k8s-storage)
* [Azure Disk](#azure-disk)
* [Azure File](#azure-file)
* [Conclusions](#conclusions)
<a name="filesystem"></a>
## Jenkins and the File System
To understand how Jenkins relates to storage in Kubernetes, it's useful to
first review how Jenkins utilizes its backing file system. Unlike many
contemporary web applications, Jenkins does not make use of a relational
database or any other off-host storage layer, but rather writes a number of
files to the file system of the host running the master process.
These files are not data files, or configuration files, in the traditional
sense. The Jenkins master maintains an internal tree-like object model, wherein
generally each node (object) in that tree is serialized in an XML format to the
file system. This does not mean that every single object in memory is written
to an XML file, but a non-trivial number of "live" objects representing
Credentials, Agents, Projects, and other configurations, may be periodically
written to disk at any given time.
A concrete example would be: when an administrator navigates to
`http://JENKINS_URL/manage` and changes a setting such as "Quiet Period" and
clicks "Save", the `config.xml` file (typically) in `/var/lib/jenkins` will be
rewritten.
These files aren't typically read in any periodic fashion; they're usually
only read when objects are loaded into memory during the initialization of Jenkins.
Additionally, XML files will span a number of levels in the directory
hierarchy. Each Job or Pipeline will have a directory in
`/var/lib/jenkins/jobs/<jobname>` which will have subfolders containing files
corresponding to each Run.
In short, Jenkins generates a large number of little files across a broad, and
sometimes deep, directory hierarchy. Combined with the read/write access
patterns Jenkins has, I would consider it a "worst-case scenario" for just
about any commonly used network-based storage solution.
Perhaps some future post will more thoroughly profile the file system
performance of Jenkins, but suffice it to say: it's complicated.
<a name="k8s-storage"></a>
## Kubernetes Storage
With a bit of background on Jenkins, here's a cursory overview of storage in
Kubernetes. Kubernetes itself provides a consistent, cross-platform interface
primarily via three "objects," if you will: [Persistent
Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/),
Persistent Volume Claims, and [Storage
Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/). Without
diving too deep into the details, workloads such as Jenkins will typically make
a "Persistent Volume Claim", as in "hey give me something I can mount as a
persistent file system." Kubernetes then takes this and confers with the
configured Storage Classes to determine how to meet that need.
In Azure these claims are handled by one of two provisioners:
* [Azure Disk](#azure-disk): an abstraction on top of Azure's "data disks"
which are attached to a Node within the cluster. These show up on the actual
Node as if a real disk/storage device has been plugged into the machine.
* [Azure File](#azure-file): an abstraction on top of Azure Files Storage, which
is basically CIFS/SMB-as-a-Service. CIFS mounts are attached to the Node
within the cluster, but rapidly attachable/detachable like any other CIFS/SMB
mount.
Both of these can be used simultaneously to provide persistence for stateful
applications in Kubernetes running on Azure, but their performance and
capabilities are not going to be interchangeable.
<a name="azure-disk"></a>
### Azure Disk
In AKS, two Storage Classes are pre-configured by default, yet neither one is
configured to [actually **be** the default Storage
Class](https://github.com/Azure/AKS/issues/48):
* `default`: utilizes the "Standard" storage (as in, hard drive, spinning
magnetic disks) model in Azure.
* `managed-premium`: utilizes the "Premium" storage (as in, solid state
drives).
The only real distinctions between the two which I have observed are going to be
speed and cost.
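
Since neither class is marked as the cluster default, a Persistent Volume Claim
has to name the class it wants explicitly. A minimal sketch of such a claim
against the pre-configured `managed-premium` class follows; the claim name and
requested size are arbitrary:

```yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jenkins-home
spec:
  # Explicitly request the SSD-backed class, since no default class is configured
  storageClassName: managed-premium
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
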
#### Limitations
Regardless of whether "Standard" or "Premium" storage is used for Azure
Disk-backed Persistent Volumes in Kubernetes (AKS or ACS) the limitations are
the same.
In my testing, the most frustrating limitation is the [fixed number of data disks which can be attached to a Virtual Machine in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general).
As of this writing, the default Virtual Machine size used when provisioning AKS
is `Standard_D1_v2`: one vCPU, 3.5GB of memory, and a data disk limit of
**four**. The default node count for AKS is currently 3, which
means that a default AKS cluster cannot currently support more than 12
Persistent Volumes backed by Azure Disk at once.

An easy way to change that is to provision larger Virtual Machine sizes with
AKS, but this **cannot be changed** once the cluster has been provisioned. For
my research clusters I have stuck with a minimum size of `Standard_D4_v2` which
provides up to 32 data disks per Virtual Machine, e.g.:
`az aks create -g my-resource-group -n aks-test-cluster -s Standard_D4_v2`
The Azure Disk Storage Class in Kubernetes also only supports the
`ReadWriteOnce` [access mode](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes).
In effect, a Persistent Volume can only be mounted read/write by a single Node
within the Kubernetes cluster. Once you understand how Azure Disk volumes are
provisioned and attached to Virtual Machines in Azure, this makes total sense.
The impact is that the only allowable `replica` setting for any
given workload which might use this Persistent Volume is **1**.

This imposes one further limitation on scheduling and high availability for
workloads running on the cluster. Detaching and attaching disks to these
Virtual Machines is a **slow** operation. In my experimenting this varied from
approximately 1 to 5 minutes.

For a "high availability" stateful workload, this means that a Pod dying, or
being killed by a rolling update, may incur a non-trivial outage __if__
Kubernetes schedules that Pod onto a different Node in the cluster. While there
is support for [specifying node affinity](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/)
in Kubernetes, I have not figured out a means of encouraging Kubernetes to keep
a workload scheduled on whichever Node has mounted the Persistent Volume.
Though it would be possible to explicitly pin a Persistent Volume to a specific
Node, and then pin a Pod to that Node, a lot of workload flexibility would be
lost.
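
For illustration, here is a sketch of that explicit pinning approach, using a
`nodeSelector` keyed on the built-in `kubernetes.io/hostname` label. The Pod
and the node name are hypothetical, and every future scheduling decision for
this workload becomes a manual one:

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: pinned-jenkins
spec:
  # kubernetes.io/hostname is a built-in node label; the value is a made-up AKS node name
  nodeSelector:
    kubernetes.io/hostname: aks-nodepool1-12345678-0
  containers:
    - name: jenkins
      image: jenkins/jenkins:lts
```
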
#### Benefits
It may be tempting to think at this point, "Azure Disk is not good, so
everything should just use Azure File!" But there are benefits to Azure Disk
which should be considered. Azure Disk is, for lack of a better description, a
big dumb block store. In that simplicity lie its strengths.

While Persistent Volumes backed by Azure Disk can be slow to detach or reattach
to a Node, once they're present, they're fast. Operations like disk scans,
small reads and writes, all _feel_ like trivially fast operations from the
Jenkins standpoint. In my testing the difference between a Jenkins master
running on local instance storage (the Virtual Machine's "main" disk) and
running a Jenkins master on a partition from a Data Disk, is imperceptible.
Another benefit which I didn't realize until I evaluated [Azure
File](#azure-file) backed Persistent Volumes is that, as a big dumb block
store, Azure Disks are essentially whatever file system format you want them to
be. In AKS they default to `ext4` which is perfectly happy and native to me,
meaning my Linux-based containers will make the correct assumptions about the
underlying file system's capabilities.
<a name="azure-file"></a>
### Azure File
AKS does not set up an Azure File Storage Class by default, but the Kubernetes
versions which are available (1.7.7, 1.8.1) have the support for Azure File
backed Persistent Volumes. In order to add the storage class, pass something
like the following to Kubernetes via `kubectl create -f azurefile.yaml`:
```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
  annotations:
  labels:
    kubernetes.io/cluster-service: 'true'
provisioner: kubernetes.io/azure-file
parameters:
  storageAccount: 'mygeneralpurposestorageaccount'
reclaimPolicy: 'Retain'
# mountOptions are passed into mount.cifs as `-o` options
mountOptions:
```
According to [the Azure File documentation](https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-file)
it's not necessary to specify the `storageAccount` key, but I had some
difficulty coaxing AKS to provision an Azure Storage Account on its own, so I
manually provisioned one within the "hidden" AKS Resource Group
(`MC_<group>_<aks-name>_<location>`) and entered the name into
`azurefile.yaml`.

Full disclosure: I **hate** Storage Accounts in Azure. Where nearly everything
else in Azure is rather enjoyable to use, neatly tucked into Resource Groups,
and subject to reasonable naming restrictions, Storage Accounts are crummy and live
in an Azure _global namespace_, so if somebody else chooses the same name as what
you want, tough luck. The reason this is somewhat relevant to the current
discussion is that Storage Accounts _feel old_ when you use them. Everything
about them _feels_ as if it's from a by-gone era in Azure's development (ASM).
The feature used by the Azure File Storage Class is what I would describe as
"Samba/CIFS-as-a-Service." Kubernetes is basically utilizing the
Microsoft-technology-equivalent of NFS.
But it's not NFS, it's CIFS. And that is **important** to Linuxy container
folks.
#### Limitations
The biggest limitations with Azure File backed Persistent Volumes in Kubernetes
are really limitations of
[CIFS](https://technet.microsoft.com/en-us/library/cc939973.aspx), and frankly,
they are _infuriating_. An application like Jenkins will make what were, at one
point, reasonable assumptions about the operating system and underlying
file system. "If it looks like a Linux operating system, I am going to assume
the file system supports symbolic links" comes to mind. Jenkins will attempt to
create symbolic links when a Pipeline Run or Build completes, to update a
`lastSuccessfulBuild` or `lastFailedBuild` symbolic link, which are useful for
hyperlinks in the Jenkins web interface.
Jenkins should no doubt be more granular and thoughtful about file system
capabilities, but I'm willing to bet that a number of other applications which
you might consider deploying on Kubernetes are also making assumptions along
the lines of "it's a Linux, so it's probably a Linuxey file system" which Azure
File backed Persistent Volumes invalidate.
Volumes which are attached to the Node are attached [with very strict
permissions](https://github.com/kubernetes/kubernetes/issues/2630#issuecomment-344091454).
On a Linux file system level, an Azure File backed volume attached at `/mnt/az`
would be attached with `0700` permissions, allowing _only_ root access. There
are two options for working around this, as far as I am aware:

1. Adding a `uid=1000` to the `mountOptions` specified for the Storage Class in
   the `azurefile.yaml` referenced above (see the sketch after this list).
   Unfortunately this would require that every container attempting to utilize
   Azure File backed volumes use the same uid.
1. Specifying a
   [securityContext](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
   for the container with: `runAsUser: 0`. This makes me feel exceptionally
   uncomfortable, and I would absolutely not recommend running any untrusted
   workloads on a Kubernetes cluster with this setting.
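
Here is a sketch of the first option, extending the `azurefile` Storage Class
shown earlier with `mountOptions`; the uid/gid and mode values are only
examples and would need to match whatever user your containers actually run as:

```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
parameters:
  storageAccount: 'mygeneralpurposestorageaccount'
# Passed to mount.cifs as `-o` options; the uid/gid/mode values are examples only
mountOptions:
  - uid=1000
  - gid=1000
  - dir_mode=0700
  - file_mode=0600
```
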
The final, and for me the most important, limitation for Azure File backed
storage is the performance. Presently there is [no Premium model offered for
Azure Files Storage](https://feedback.azure.com/forums/217298-storage/suggestions/8374074-offer-premium-storage-option-for-azure-file-servic),
which I would presume means that Azure File volumes are backed by spinning hard
drives, rather than solid state.
The performance bottleneck for Jenkins is _not_ theoretical however. With a
totally fresh Persistent Volume Claim for a Jenkins application, the
initialization of the application took upwards of **5-15 minutes**, namely:
* 2-3 _seconds_ to create the Persistent Volume and bind it to a Node in the
Kubernetes cluster.
* 3-4 minutes to "extract [Jenkins] from war file". When `jenkins.war` runs the
first time, it unpacks the `.war` file into `JENKINS_HOME` (usually
`/var/lib/jenkins`) and populates `/var/lib/jenkins/war` with a number of small
static files. Basically, unzipping a 100MB archive which contains hundreds of
files.
* 5-10 minutes from "Starting Initialization" to "Jenkins is ready." In my
observation this tends to be highly variable depending on the size of Jenkins
environment, how many plugins are loaded, and what kind of configuration XML
files must be loaded at initialization time.

The performance challenges I have observed with Azure File backed storage are
similar to the challenges the CloudBees Engineering team observed with
[Amazon EFS](https://aws.amazon.com/efs/) when
it was first announced. The disk read/write patterns exhibited by Jenkins
caused trouble on EFS as well, but that has seen marked improvement over the
last 6 months, whereas Azure Files Storage doesn't appear to have had
significant performance improvements in a number of years.
#### Benefits
Despite performance challenges, Azure File backed Persistent Volumes are not
without their benefits. The most notable benefit, which is what originally
attracted me to the Azure File Storage Class, is the support for the
`ReadWriteMany` access mode.

For some workloads, of which Jenkins is not one, this would enable a
`replicas` setting greater than 1 and concurrent Persistent Volume access
between the running containers. Even for single-container workloads, this is a
valuable setting, as it allows for effectively zero-downtime rolling updates and
re-deployments when a new Pod is scheduled on a different underlying Node.
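
As a sketch, here is a claim against the `azurefile` Storage Class defined
above, requesting the `ReadWriteMany` access mode; the claim name and size are
arbitrary:

```yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: shared-workspace
spec:
  storageClassName: azurefile
  # Multiple Pods, potentially on different Nodes, may mount this claim read/write
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```
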
Additionally, Azure File volumes can be simultaneously mounted by other machines in the
resource group, or even across the internet, which can be very useful for
debugging or forensics when something goes wrong (things usually go wrong).
Compare that to an Azure Disk volume, which would require a [container to be successfully
running](https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/) in the Kubernetes environment before you could dig into the disk.
<a name="conclusions"></a>
## Conclusions
Running a highly available Jenkins environment is a non-trivial exercise, one
which requires a substantial understanding both of the nuances of how Jenkins
interacts with the file system and of how users expect to interact with the
system. While I was optimistic at the outset of this work that Kubernetes, or
more specifically AKS, might significantly change the equation, it has not.

To the best of my understanding, this work applies evenly to Azure Container
Service (ACS) and Azure Container Service (AKS) (naming is hard), since both
are using the same fundamental Kubernetes support for Azure via the Azure Disk
and Azure File Storage Classes. Unfortunately I don't have time to do a serious
performance analysis of Data Disks using Standard storage, Data Disks using
Premium Storage, and Azure File mounts. I would love to see work in that area
published by the Microsoft team though!
At this point in time, for those seeking to provision Jenkins on ACS or AKS, I
strongly recommend using the Azure Disk Storage Class with Premium storage.
That will not help with "high availability" of Jenkins, but at least once
Jenkins is running, it will be running swiftly. I also recommend using [Jenkins
Pipeline](https://jenkins.io/doc/book/pipeline) for all Jenkins-based
workloads, not just because I fundamentally think it's a better tool than
classic Freestyle Jobs, but because it has built-in **durability**. Using Jenkins in
tandem with the [Azure VM Agents](https://plugins.jenkins.io/azure-vm-agents)
plugin, workloads are kicked out to dynamically provisioned Virtual Machines,
and when the master goes down (recovery from which can take 5-ish minutes in
the worst case scenario), the outstanding Pipeline-based workloads will not be
interrupted during that window.

I still find myself excited about the potential of AKS, which is currently in
"public preview." My recommendation to Microsoft would be to spend a
significant amount of time investing in both storage and cluster performance to
strongly differentiate AKS from Kubernetes provided on other clouds.
Personally, I would love to have: faster stateful applications, auto-scaled
Nodes based on compute (or even Data Disk limits!), and cross-location
[Federation](https://kubernetes.io/docs/concepts/cluster-administration/federation/)
for AKS.
Maybe in 2018!

---

### Configuration
Below is the configuration for the Service, Namespace, Ingress, and Stateful
Set I used:
```yaml
---
apiVersion: v1
kind: "List"
items:
  - apiVersion: v1
    kind: Namespace
    metadata:
      name: "jenkins-codevalet"
  - apiVersion: v1
    kind: Service
    metadata:
      name: 'jenkins-codevalet'
      namespace: 'jenkins-codevalet'
    spec:
      ports:
        - name: 'http'
          port: 80
          targetPort: 8080
          protocol: TCP
        - name: 'jnlp'
          port: 50000
          targetPort: 50000
          protocol: TCP
      selector:
        app: 'jenkins-codevalet'
  - apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: 'http-ingress'
      namespace: 'jenkins-codevalet'
      annotations:
        kubernetes.io/tls-acme: "true"
        kubernetes.io/ingress.class: "nginx"
    spec:
      tls:
        - hosts:
            - codevalet.io
          secretName: ingress-tls
      rules:
        - host: codevalet.io
          http:
            paths:
              - path: '/u/codevalet'
                backend:
                  serviceName: 'jenkins-codevalet'
                  servicePort: 80
  - apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      name: "jenkins-codevalet"
      namespace: "jenkins-codevalet"
      labels:
        name: "jenkins-codevalet"
    spec:
      serviceName: 'jenkins-codevalet'
      replicas: 1
      selector:
        matchLabels:
          app: 'jenkins-codevalet'
      volumeClaimTemplates:
        - metadata:
            name: "jenkins-codevalet"
            namespace: "jenkins-codevalet"
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
      template:
        metadata:
          labels:
            app: "jenkins-codevalet"
          annotations:
        spec:
          securityContext:
            fsGroup: 1000
            # https://github.com/kubernetes/kubernetes/issues/2630#issuecomment-344091454
            runAsUser: 0
          containers:
            - name: "jenkins-codevalet"
              image: "rtyler/codevalet-master:latest"
              imagePullPolicy: Always
              ports:
                - containerPort: 8080
                  name: http
                - containerPort: 50000
                  name: jnlp
              resources:
                requests:
                  memory: 384M
                limits:
                  memory: 1G
              volumeMounts:
                - name: "jenkins-codevalet"
                  mountPath: "/var/jenkins_home"
              env:
                - name: CPU_REQUEST
                  valueFrom:
                    resourceFieldRef:
                      resource: requests.cpu
                - name: CPU_LIMIT
                  valueFrom:
                    resourceFieldRef:
                      resource: limits.cpu
                - name: MEM_REQUEST
                  valueFrom:
                    resourceFieldRef:
                      resource: requests.memory
                      divisor: "1Mi"
                - name: MEM_LIMIT
                  valueFrom:
                    resourceFieldRef:
                      resource: limits.memory
                      divisor: "1Mi"
                - name: JAVA_OPTS
                  value: "-Dhudson.DNSMultiCast.disabled=true -Djenkins.CLI.disabled=true -Djenkins.install.runSetupWizard=false -Xmx$(MEM_REQUEST)m -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
```

Binary file not shown (image, 187 KiB)