
IEP-4: Kubernetes for hosting project applications

Table 1. Metadata

IEP:      4
Title:    Kubernetes for hosting project applications
Author:   R. Tyler Croy, Olivier Vernin
Status:   💬 In-process
Type:     Architecture
Created:  2016-12-15

Abstract

The Jenkins project hosts a number of applications, using different technology stacks, with Docker containers as the packaging and runtime environment. While the merits of using Docker containers to run these applications are not the subject of this IEP document, the hosting and deployment of these containers is. In essence, some tooling for safely hosting, deploying, and monitoring containers within the Jenkins infrastructure is necessary in order to support the applications the project requires.

Specification

To support the aforementioned containerized applications, Kubernetes (also referred to as "k8s") will be deployed using the Azure Container Service (ACS). This specification will detail the initial Kubernetes cluster architecture but is not prescriptive on how the cluster should grow/shrink as requirements change for the applications hosted.

This specification only outlines the architecture for the "Public Production" [1] Kubernetes cluster. It is expected that the project will run multiple Kubernetes clusters following this architecture, depending on the access control requirements for each discrete cluster.

At a high level, Kubernetes is a master/agent architecture which would live in a single region. For the purposes of the Jenkins infrastructure, the production Kubernetes clusters would be located in the "East US" Azure region. [2]

The virtual machines are all D-series instances in order to provide an ideal balance of cost and performance. For more, see Costs below.

Kubernetes master

A cluster would run a single D3v2 instance as the master node.

Kubernetes agents

A cluster would run agents using a Scale Set of D2v2 instances, with a minimum of three agents running at all times.

Note

Following IEP-3, the Reference Implementation uses Terraform to describe the infrastructure necessary to support this specification.

Monitoring

As the Jenkins project already uses Datadog for monitoring our production infrastructure, the Kubernetes cluster must integrate appropriately.

This will be accomplished using the Datadog/Kubernetes integration maintained by Datadog themselves.

The integration will provide metrics for both:

  • Kubernetes cluster health/status

  • Health/status of containers running atop the cluster

The metrics required are not specified here under the assumption that all metrics appropriate for ensuring stable service levels of project infrastructure will be collected.
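
As a rough illustration, the agent side of this integration would run as a DaemonSet, so that one Datadog agent pod lands on every node. The manifest below is a minimal sketch, assuming the API key is stored in a secret named 'datadog-api-key' (that name, and the exact mounts, are our assumptions rather than Datadog's published manifest):

apiVersion: extensions/v1beta1   # DaemonSet API group as of Kubernetes 1.5
kind: DaemonSet
metadata:
  name: dd-agent
spec:
  template:
    metadata:
      labels:
        app: dd-agent
    spec:
      containers:
        - name: dd-agent
          image: datadog/docker-dd-agent:latest
          env:
            - name: API_KEY               # read the key from a secret, never hardcode it
              valueFrom:
                secretKeyRef:
                  name: datadog-api-key   # assumed secret name
                  key: api-key
          volumeMounts:
            - name: dockersocket          # lets the agent collect per-container metrics
              mountPath: /var/run/docker.sock
      volumes:
        - name: dockersocket
          hostPath:
            path: /var/run/docker.sock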

Logging

Centralized logging for applications hosted within Kubernetes will be provided by a combination of containers in the cluster running Fluentd and the Microsoft Operations Management Suite (OMS).

The Fluentd container will route logs based on rules defined by log type.

As a first iteration, we identify two log types, archive and stream. They are explained below.

Types

Stream

'Stream' means that logs are sent directly to Log Analytics, where they are available for a short time period (7 days). After that they are permanently deleted.

Reasons why we consider logs as 'stream':

  • Costs: we don't want to pay to store useless logs

  • Debugging: we may need to analyze application behaviour

In order to retrieve log information, we'll need access to the Log Analytics dashboard.

Note

This is the default behaviour for all log types.

Archive

'Archive' means that we want to access logs for a long time period. We store them in Azure Blob Storage (or a shared disk). Those logs will be kept in Azure storage containers for an undetermined period. In order to retrieve them, we'll have to request compressed archives from an admin.

Reasons why we may consider logs as 'archive':

  • Need for long-term access

  • Desire to back up important information

Logs work as follows:

  • Docker containers write logs to JSON files located in /var/log/containers/ on each Kubernetes agent

  • Each Kubernetes agent runs one Fluentd container (as a DaemonSet) that reads logs from /var/log/containers and applies some 'rules'; a minimal sketch of this DaemonSet follows the diagram below

Data flow for k8s logs
+--------------------------------------------------------------+
| K8s Agent:                                                   |
|                                            +------------+    |
|                                            |Container_A |    |
|                                            |            |    |
| Agent Filesystem:                          +---------+--+    |
| +--------------------+      <send_logs_to            |       |
| |/var/log/containers |<------------------------------+       |
| +----------+---------+                               |       |
|            |                               +---------+--+    |
|            |                               |Container_B |    |
| Fetch_logs |                               |            |    |
|            v                               +------------+    |         +--------------------+
|      +----------+    apply_rule_1_stream_logs_to --------------------->| Azure LogAnalytics |
|      |Fluentd   +-------------------------------/            |         +--------------------+
|      |Container +-------------------------------\            |         +--------------------+
|      +----------+   apply_rule_0_archive_logs_to --------------------->| Azure Blob Storage |
|                                                              |         +--------------------+
+--------------------------------------------------------------+
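
A minimal sketch of the Fluentd DaemonSet from the diagram above, using a placeholder image name (the real image lives in the fluentd-k8s-azure repository referenced below):

apiVersion: extensions/v1beta1   # DaemonSet API group as of Kubernetes 1.5
kind: DaemonSet
metadata:
  name: fluentd-logging
spec:
  template:
    metadata:
      labels:
        app: fluentd-logging
    spec:
      containers:
        - name: fluentd
          image: example/fluentd-k8s-azure:latest    # placeholder; see repository below
          volumeMounts:
            - name: varlog                           # where the /var/log/containers symlinks live
              mountPath: /var/log
            - name: dockercontainers                 # the JSON log files those symlinks point to
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainers
          hostPath:
            path: /var/lib/docker/containers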

In order to know which workflow needs to be applied, we use Kubernetes labels.

By convention we use the label 'logtype'.

If logtype == 'archive', we apply the 'archive' workflow. Otherwise we apply the 'stream' workflow.
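
For instance, an application that needs long-term log retention would carry the label in its pod definition. Everything here except the label convention itself is a hypothetical example:

apiVersion: v1
kind: Pod
metadata:
  name: example-app          # hypothetical application
  labels:
    app: example-app
    logtype: archive         # Fluentd applies the 'archive' workflow; omit for 'stream'
spec:
  containers:
    - name: example-app
      image: example/app:1.0 # hypothetical image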

pros:

  • We don't have to modify the default logging configuration.

  • We don't have to rebuild the Docker image when we change the log type.

  • We don't have to restart the Docker container when we modify the log type.

  • Easy to handle from the Fluentd configuration.

cons:

  • We can't have different log types within a single application.

A Docker image that implements this workflow can be found in Olivier Vernin's fluentd-k8s-azure repository.

Deployment/Orchestration

Having made the decision to use a Kubernetes infrastructure, we still need processes and tools to automate and test Kubernetes deployments.

Kubernetes uses:

  • YAML files to define infrastructure state

  • The kubectl CLI to interact with the Kubernetes cluster (CRUD operations)

Because using kubectl to deploy YAML files is idempotent, we can easily script it, as the sketch below illustrates.
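
A sketch of what such a scripted deployment consumes: a plain manifest applied with 'kubectl apply -f'. Re-applying an unchanged file is a no-op, so a script can run it on every commit without special-casing "already deployed" (the names and image here are hypothetical):

# deployment.yaml -- deployed with: kubectl apply -f deployment.yaml
apiVersion: extensions/v1beta1   # Deployment API group as of Kubernetes 1.5
kind: Deployment
metadata:
  name: example-app              # hypothetical application
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: example-app
        logtype: stream          # default logging workflow (see Logging above)
    spec:
      containers:
        - name: example-app
          image: example/app:1.0 # hypothetical image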

We investigated two approaches to automating k8s deployment:

  1. Using Jenkins + scripting to apply configurations

  2. Using Puppet to apply configurations

Jenkins

Each time we need to apply modifications, we just have to create/update the YAML configuration files and let Jenkins deploy them.
This is fairly easy since kubectl is idempotent, so we just have to follow this process.

Jenkins
+-------------+                           +----------------+
|             |                           |     Github     |
| Contributor +-------------------------> |  jenkins-infra |
|             |       Commit              |                |
+-------------+        code               +--------+-------+
                                                   |
                                                   |   Trigger
+-----------------+                                | Test&Deploy
| K8s cluster     |                                v
| +-----+ +-----+ |                       +--------+-------+
| |Node | |Node | |                       |     Jenkins    |
| |  1  | | 2   | <-----------------------|   Jenkinsfile  |
| +-----+ +-----+ | Apply configurations  +----------------+
+-----------------+

However, some questions remain:

  • How do we share/publish secret information, credentials, etc.?
    A solution would be to encrypt secrets with a password or GPG keys before pushing them to the git repository. Jenkins would then have to decrypt them before deploying them to the Kubernetes cluster.

  • How do we handle resource ordering?
    We can use naming conventions to ensure that resource 00-secret.yaml is deployed before 01-daemonset.yaml.

  • In some cases, complex logic needs to be applied to achieve correct deployments, which can be done through scripting (bash, python, etc.).
    For example: updating a k8s Secret does not update the secret values already consumed by running containers, which means that each time we update secrets, we also have to take care of the pods using them.

The problems explained above are common concerns that configuration management tools try to solve.

Puppet

We may also use Puppet to template and apply Kubernetes configuration files. Main advantages:

  • We already have a Puppet environment configured and working correctly

  • We already have a good testing process with Puppet: linting, rspec, etc.

  • We already have a deployment workflow: feature branch → staging → production

  • We can use Hiera to store secrets

  • We can use Puppet to define complex scenarios

Puppet
+-------------+                      +----------------+
|             |                      |     Github     |
| Contributor +--------------------> |  jenkins-infra |
|             |       Commit         |                |
+-------------+        code          +--------+-------+
                                              |
                                              | Trigger
+-----------------+                           |  Test
| K8s cluster     |                           |
| +-----+ +-----+ |                  +--------v-------+
| |Node | |Node | |                  |     Jenkins    |
| |  1  | | 2   | |                  |   Jenkinsfile  |
| |     | |     | |                  +--------+-------+
| +-----+ +-----+ |                           |
+--------+--------+                           |   Merge
         ^                                    | Production
         |                                    |
         |                           +--------v-------+
         |                           |     Puppet     |
         +---------------------------+     Master     |
            Apply configurations     +----------------+

Conclusion

We agreed that we are going to use Puppet to deploy Kubernetes configurations. If needed, we are still able to switch to another solution.

Motivation

The motivation for centralizing container hosting is fairly self-evident. Consistency of management, deployment, logging, monitoring, and runtime environment will be a major time-saver for volunteers participating in the Jenkins project.

Additionally, consolidation on a well understood and supported tool (Kubernetes) allows the infrastructure team to spend less time operating the underlying hosting platform.

Rationale

As mentioned in the Abstract, the Jenkins project runs containerized applications, the merits of which are outside the scope of this document. Thus this document outlines an approach for managing numerous containers in Azure.

There is a fundamental assumption being made in using Azure Container Service, that is: it's cheaper/easier/faster to use a "turn-key" solution for building and running a container orchestrator (e.g. Kubernetes) than it would be to build out such a cluster ourselves using virtual machines and Puppet (for example).

With this assumption, the options provided by ACS are: Kubernetes, Docker Swarm, or DC/OS.

The selection for Kubernetes largely rests on two criteria:

  1. Kubernetes is supported in some form by two of the three major cloud vendors (Microsoft, Google), which indicates project maturity and long-term support, and also gives the Jenkins project the flexibility to migrate to alternative cloud vendors if the need were to arise.

  2. Developer preference: we prefer Kubernetes and the tooling it provides over the alternatives.

Docker Swarm

Docker Swarm is the leading option behind Kubernetes, but the open source "swarm mode" functionality is not supported by Azure Container Service, nor is Docker Swarm supported by any vendor other than Microsoft at this point.

The focus from Docker, Inc. seems to be more on products such as Docker Datacenter long-term, which makes choosing Docker Swarm on ACS seem risky.

DC/OS

Similar to Docker Swarm on ACS, there is no mainstream support for DC/OS on other cloud providers, which suggests either immaturity in the project or lack of long-term commitment by platform vendors to support it.

Additionally, at this point in time, the authors of this document do not know anybody committed to running production workloads on DC/OS (we're certain they exist, however).

Costs

ACS is a free service that clusters Virtual Machines (VMs) into a container service. You only pay for the VMs and associated storage and networking resources consumed.

Assuming a single minimally scaled cluster with a single master and three agents, the annual cost of the Kubernetes cluster itself would be 3 × $1,278.96 + $2,566.68 = $6,403.56. Obviously, as the number of agents increases, the cost will increase per agent instance.

Table 2. Costs

Instance   Annual Cost (East US)
D2v2       $1,278.96
D3v2       $2,566.68

Reference Implementation

The current reference implementation is authored by Olivier Vernin in pull request #5 to the azure repository.
