mirror of https://github.com/jenkins-infra/iep
Merge pull request #11 from olblak/iep-007
Add IEP-007: Kubernetes cluster upgrade
This commit is contained in:
commit
de212a6599
|
@ -0,0 +1,158 @@
|
|||
ifdef::env-github[]
|
||||
:tip-caption: :bulb:
|
||||
:note-caption: :information_source:
|
||||
:important-caption: :heavy_exclamation_mark:
|
||||
:caution-caption: :fire:
|
||||
:warning-caption: :warning:
|
||||
endif::[]
|
||||
|
||||
= IEP-007: Kubernetes cluster upgrade
|
||||
|
||||
:toc:
|
||||
|
||||
.Metadata
|
||||
[cols="2"]
|
||||
|===
|
||||
| IEP
|
||||
| 007
|
||||
|
||||
| Title
|
||||
| Kubernetes cluster upgrade
|
||||
|
||||
| Author
|
||||
| link:https://github.com/olblak[Olblak]
|
||||
|
||||
| Status
|
||||
| :speech_balloon: In-process
|
||||
|
||||
| Type
|
||||
| [Process]
|
||||
|
||||
| Created
|
||||
| 2016-07-04
|
||||
|===
|
||||
|
||||
|
||||
|
||||
== Abstract
|
||||
|
||||
As Kubernetes project often releases new versions, it's interesting to enjoy new features or bugfixes introduced by those new versions.
|
||||
Therefor we need an easy way to upgrade Kubernetes cluster on a regular basis.
|
||||
|
||||
|
||||
== Specification
|
||||
|
||||
Currently there are two main strategies to "upgrade" a cluster. +
|
||||
|
||||
. Upgrading an existing cluster.
|
||||
. Migrating on a second cluster.
|
||||
|
||||
As Azure do not (yet) provide any tools to upgrade existing clusters, we have to upgrade them manually. +
|
||||
It appears to be easier and safer to deploy a new cluster and re-deploy all resources on the new one.
|
||||
|
||||
As long as the new cluster stays in the same region than the previous one, we can use same blob storages and attach them to the new cluster. +
|
||||
The only important element that we loose when we migrate to a new cluster, is the cluster public IP. +
|
||||
This means that we need to update 'nginx.azure.jenkins.io' with a new public IP.
|
||||
|
||||
=== Migration Process
|
||||
|
||||
IMPORTANT: The old cluster must be kept until the new one is ready to serve requests.
|
||||
|
||||
==== Step 1: Backup
|
||||
|
||||
Ensure secrets containing Letsencrypt certificates are exported. (Require https://github.com/jenkins-infra/jenkins-infra/pull/819[#PR819]) +
|
||||
A cron job should periodically export letsencrypt certificate into `~/backup/$(CLUSTER)/secret.$(APP)-tls.yaml`
|
||||
|
||||
.Export a secret
|
||||
----
|
||||
.bin/kubectl get secret $(APPLICATION)-tls --export=true --kubeconfig .kube/$(CLUSTER).conf -o yaml > ~/backup/$(CLUSTER)/secret.$(APPLICATION)-tls.yaml
|
||||
----
|
||||
|
||||
==== Step 2: Deploy the new cluster
|
||||
|
||||
Add a second k8s resource in github.com/jenkins-infra/azure, named 'pea' (Require https://github.com/jenkins-infra/iep/pull/11[#PR11])
|
||||
|
||||
==== Step 3: Configure the new cluster
|
||||
|
||||
* Update following hieraconfig variables with new the k8s cluster information(Require PR on jenkins-infra/jenkins-infra)
|
||||
|
||||
----
|
||||
profile::kubernetes::params::clusters:
|
||||
- server: https://clusterexample1.eastus.cloudapp.azure.com
|
||||
username: clusterexample1-admin
|
||||
clustername: clusterexample1
|
||||
certificate_authority_data: ...
|
||||
client_certificate_data: ...
|
||||
client_key_data: ...
|
||||
----
|
||||
|
||||
* Run puppet agent
|
||||
* Get new public IP (Manual operation)
|
||||
----
|
||||
kubectl get service nginx --namespace nginx-ingress
|
||||
----
|
||||
* Restore backed up secrets containing Letsencrypt certificates on the new cluster (Manual operation)
|
||||
----
|
||||
.bin/kubectl apply -f ~/backup/$(OLD_CLUSTER)/secret.*-tls.yaml --kubeconfig .kube/$(CLUSTER).conf
|
||||
----
|
||||
* Validate HTTPS endpoint (Manual operation)
|
||||
----
|
||||
curl --header 'Host: plugins.jenkins.io' 'https://<new_public_ip>
|
||||
curl --header 'Host: repo-proxy.jenkins.io' 'https://<new_public_ip>
|
||||
curl --header 'Host: accounts.jenkins.io' 'https://<new_public_ip>
|
||||
----
|
||||
|
||||
==== Step 4: Update DNS Record
|
||||
|
||||
Update nginx.azure.jenkins.io with the new public IP (Require PR on jenkins-infra/jenkins-infra)
|
||||
|
||||
[NOTE]
|
||||
During DNS update, requests will be send either to the new cluster, either to the old cluster.
|
||||
Users shouldn't detect any differences.
|
||||
|
||||
==== Step 5: Remove the old cluster
|
||||
Remove k8s.tf from jenkins-infra/azure (Require PR on jenkins-infra/azure)
|
||||
|
||||
[NOTE]
|
||||
It may be safer to not automate this step, and just delete the good storage account through Azure portal
|
||||
|
||||
|
||||
==== Conclusion
|
||||
With this scenario, we shouldn't have any downtime as HTTP/HTTPS requests will almost have (depending on the service) the same response whatever we reach the old or the new cluster.
|
||||
|
||||
|
||||
== Motivation
|
||||
|
||||
As testing environments have short lives, we create them to validate deployments then we trash them so they often use the latest version available from Azure.
|
||||
This means that we may not detect issues when those versions are not aligned with production
|
||||
We also would like to enjoy improvements from bugfixes and new features.
|
||||
It's easier to follow Kubernetes documentation if we use a version close to the upstream version.
|
||||
|
||||
|
||||
== Rationale
|
||||
|
||||
There isn't any tool to easily upgrade a Kubernetes cluster on Azure so I tried to do it manually.
|
||||
|
||||
I only spent one day to try this upgrade but it wasn't trivial.
|
||||
I applied following steps manually:
|
||||
|
||||
* Update kubelet && kubectl
|
||||
* Update Kubernetes options according the version (some options were deprecated and others were new)
|
||||
* Reproduce the production cluster in order to validate the migration.
|
||||
* Restart each node (master&client) after the upgrade
|
||||
|
||||
Even after those operations I faced weird issues, so I decided to stop there and concluded that cluster migration was easier and a safer process.
|
||||
|
||||
They are several open issues regarding update procedure so I suppose that it may be a possible alternative in the future.
|
||||
|
||||
* https://github.com/Azure/ACS/issues/5[Azure/acs]
|
||||
* https://github.com/Azure/acs-engine/issues/464[Azure/ace-engine]
|
||||
|
||||
== Costs
|
||||
|
||||
It will costs a second cluster during the migration but when everything is switched to the new cluster, the previous one can be decommissioned.
|
||||
|
||||
|
||||
== Reference implementation
|
||||
|
||||
As of right now there is not reference implementation of Kubernetes Cluster upgrade
|
Loading…
Reference in New Issue