
IEP-007: Kubernetes cluster upgrade

Table 1. Metadata

IEP:      007
Title:    Kubernetes cluster upgrade
Author:   Olblak
Status:   💬 In-process
Type:     [Process]
Created:  2016-07-04

Abstract

As the Kubernetes project often releases new versions, it is worthwhile to benefit from the new features and bugfixes introduced by those versions. Therefore we need an easy way to upgrade the Kubernetes cluster on a regular basis.

Specification

Currently there are two main strategies to "upgrade" a cluster.

  1. Upgrading an existing cluster.

  2. Migrating to a second cluster.

As Azure does not (yet) provide any tools to upgrade existing clusters, we would have to upgrade them manually.
It appears to be easier and safer to deploy a new cluster and re-deploy all resources on the new one.

As long as the new cluster stays in the same region as the previous one, we can reuse the same blob storages and attach them to the new cluster.
The only important element that we lose when we migrate to a new cluster is the cluster's public IP.
This means that we need to update 'nginx.azure.jenkins.io' with the new public IP.

Migration Process

Important
The old cluster must be kept until the new one is ready to serve requests.

Step 1: Backup

Ensure the secrets containing Letsencrypt certificates are exported. (Require #PR819)
A cron job should periodically export the Letsencrypt certificates into ~/backup/$(CLUSTER)/secret.$(APP)-tls.yaml

Export a secret
.bin/kubectl get secret $(APPLICATION)-tls --export=true --kubeconfig .kube/$(CLUSTER).conf  -o yaml > ~/backup/$(CLUSTER)/secret.$(APPLICATION)-tls.yaml
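For reference, here is a minimal sketch of such a periodic export, reusing the command above; the cron schedule, script path, cluster name, and application list are all hypothetical:

# /etc/cron.d entry (sketch): export the TLS secrets every night at 02:00
0 2 * * * root /usr/local/bin/backup-tls-secrets.sh

# /usr/local/bin/backup-tls-secrets.sh (sketch)
#!/bin/sh
CLUSTER=pea                            # assumed cluster name
mkdir -p ~/backup/$CLUSTER
for APP in jenkins plugins repo; do    # hypothetical application list
    .bin/kubectl get secret ${APP}-tls --export=true \
        --kubeconfig .kube/${CLUSTER}.conf -o yaml \
        > ~/backup/$CLUSTER/secret.${APP}-tls.yaml
done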

Step 2: Deploy the new cluster

Add a second k8s resource in github.com/jenkins-infra/azure, named 'pea' (Require #PR11)

Step 3: Configure the new cluster

  • Update the following hiera variables with the new k8s cluster information (Require PR on jenkins-infra/jenkins-infra)

profile::kubernetes::params::clusters:
    - server: https://clusterexample1.eastus.cloudapp.azure.com
      username: clusterexample1-admin
      clustername: clusterexample1
      certificate_authority_data: ...
      client_certificate_data: ...
      client_key_data: ...
  • Run puppet agent

  • Get the new public IP (Manual operation)

kubectl get service nginx --namespace nginx-ingress

  • Restore the backed-up secrets containing Letsencrypt certificates on the new cluster (Manual operation)

for f in ~/backup/$(OLD_CLUSTER)/secret.*-tls.yaml; do
    .bin/kubectl apply -f "$f" --kubeconfig .kube/$(CLUSTER).conf
done

  • Validate the HTTPS endpoints (Manual operation; see the sketch after this list)

curl -I https://jenkins.io --resolve "jenkins.io:443:<new_ingress_ip>"
curl -I https://plugins.jenkins.io --resolve "plugins.jenkins.io:443:<new_ingress_ip>"
curl -I https://repo.azure.jenkins.io --resolve "repo.azure.jenkins.io:443:<new_ingress_ip>"
...
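For reference, a sketch that fetches the new ingress IP once and validates every endpoint against it; the jsonpath expression is standard kubectl, and the host list mirrors the commands above:

# Resolve the new ingress IP, then check each HTTPS endpoint against it.
NEW_IP=$(kubectl get service nginx --namespace nginx-ingress \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

for HOST in jenkins.io plugins.jenkins.io repo.azure.jenkins.io; do
    curl -I "https://$HOST" --resolve "$HOST:443:$NEW_IP"
done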

Step 4: Update DNS Record

Update nginx.azure.jenkins.io with the new public IP (Require PR on jenkins-infra/jenkins-infra)

Note
During the DNS update, requests will be sent either to the new cluster or to the old one. Users shouldn't notice any difference.
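A quick way to watch the propagation, assuming dig is available on the host:

# Repeat until the record resolves to the new ingress IP.
dig +short nginx.azure.jenkins.io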

Step 5: Remove the old cluster

Remove k8s.tf from jenkins-infra/azure (Require PR on jenkins-infra/azure)
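If this step were automated anyway, deleting the file and applying the change would follow the usual Terraform flow; a sketch with standard commands only:

# After removing k8s.tf, review the plan carefully before applying.
terraform plan    # should only schedule the old cluster's resources for destruction
terraform apply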

Note
It may be safer not to automate this step, and instead delete the right storage account manually through the Azure portal.

Conclusion

With this scenario we shouldn't have any downtime, as HTTP/HTTPS requests will get (depending on the service) almost the same response whether they reach the old or the new cluster.

Motivation

As testing environments are short-lived, we create them to validate deployments and then trash them, so they often use the latest version available from Azure. This means that we may not detect issues when those versions are not aligned with production. We would also like to benefit from bugfixes and new features. Finally, it's easier to follow the Kubernetes documentation if we use a version close to the upstream one.

Rationale

There isn't any tool to easily upgrade a Kubernetes cluster on Azure, so I tried to do it manually.

I only spent one day on this upgrade attempt, but it wasn't trivial. I applied the following steps manually:

  • Update kubelet and kubectl

  • Update Kubernetes options according to the version (some options were deprecated and others were new)

  • Reproduce the production cluster in order to validate the migration

  • Restart each node (master and client) after the upgrade

Even after those operations I faced weird issues, so I decided to stop there and concluded that cluster migration was an easier and safer process.

There are several open issues regarding the upgrade procedure, so I suppose it may become a viable alternative in the future.

Costs

Running a second cluster during the migration incurs an extra cost, but once everything is switched to the new cluster, the previous one can be decommissioned.

Reference implementation

As of right now there is no reference implementation of a Kubernetes cluster upgrade.
