With so many settings and setup options, using Kubernetes can be very challenging. Kubernetes may guarantee that your application is up and running, but it does not guarantee that it is functioning well. Additionally, you must consider the fact that Kubernetes cluster upgrades, application updates, and some unexpected occurrences like node deletion, node drain, kernel panic, and hypervisor failure will constantly disrupt your applications.
The following are some settings that help you get as close to zero downtime on Kubernetes as possible.
Health Checks:
It happens often: a pod is still running, but the application inside it can no longer handle traffic. Health checks are a crucial part of your configuration. Consider setting up a readiness probe and a liveness probe; each has a distinct purpose. [link]
Readiness: K8s performs checks to determine whether the application is ready to receive traffic. If no readiness probe is set, Kubernetes presumes the application is ready to accept traffic as soon as the container starts.
Therefore, if the container takes a while to start, all requests sent to it in the meantime will fail. The readiness probe is used throughout the whole life of the pod, not just at startup.
Liveness: K8s runs checks to determine whether the container has to be restarted. If the liveness probe fails, the container is restarted. Use the liveness probe only as a recovery technique for when the application is no longer responding; do not use it to handle fatal errors in your application.
Sample of Health checks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  [...]
  template:
    spec:
      containers:
      - name: nginx
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
Graceful Shutdown:
Because Pods represent processes running on cluster nodes, it's crucial to support graceful termination of those processes when they are no longer required. If your application handles long transactions, having them aborted mid-flight can cause problems when you deploy a new release. When a pod is terminating, Kubernetes sends SIGTERM to its containers; make sure your application handles SIGTERM cleanly, or add a preStop handler to give the process time to finish, as sketched below. [link]
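Below is a minimal sketch of such a configuration. The 30-second grace period, the 10-second sleep, and the availability of /bin/sh in the image are assumptions; tune them to your application.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  [...]
  template:
    spec:
      # how long Kubernetes waits after SIGTERM before sending SIGKILL (assumed: 30s)
      terminationGracePeriodSeconds: 30
      containers:
      - name: nginx
        lifecycle:
          preStop:
            exec:
              # pause before shutdown so in-flight requests can drain
              # (assumes the image provides /bin/sh and sleep)
              command: ["/bin/sh", "-c", "sleep 10"]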
Replicas:
Configure more than one replica on your Deployment; three is a commonly recommended minimum. Why?
To be resilient to disruption. A K8s cluster is continuously changing, and so is the infrastructure of the provider hosting it. For instance, GKE is updated each week [link]. You must therefore assume that your pods can be deleted at any time. [link]
Sample for replicas:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
CPU & Memory Requests:
Your pod is assigned computational resources such as CPU and memory. If you don't configure CPU or memory requests, Kubernetes assumes you require none. As a result, your pods might be scheduled onto a node that is already overloaded by another application, which can leave your application starved of CPU or killed when it runs out of memory. [link]
Set CPU and memory requests as low as feasible while still covering real usage; the values shown below are merely a suggestion. You can find suitable values by running the command kubectl top pods or by using your preferred monitoring tool, such as Prometheus, Datadog, or Stackdriver for GKE.
Sample to set CPU and Memory:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  template:
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 100m       # can be 1 or 1000m
            memory: 300Mi   # can be 2000Mi or 2Gi
          limits:
            cpu: 1          # can be 1 or 1000m
            memory: 1Gi     # can be 2000Mi or 2Gi
PodAntiAffinity:
Even if you have many replicas configured, they shouldn't all run on the same node (the server hosting the pods). If all replicas are hosted on one server and that node goes down, your application goes down with it [link]. In most cases, you should configure podAntiAffinity. Useful links: [OpenshiftRedHat]
Sample of PodAntiAffinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  [...]
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: "kubernetes.io/hostname"
      containers:
      [...]
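Note that a hard requiredDuringSchedulingIgnoredDuringExecution rule can leave pods unschedulable when the cluster has fewer spare nodes than replicas. A softer alternative is preferredDuringSchedulingIgnoredDuringExecution; here is a sketch using the same labels as the sample above:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  [...]
  template:
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" spreads replicas across nodes when possible,
          # but still schedules them if no spare node is available
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: "kubernetes.io/hostname"
      containers:
      [...]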
Rolling Update policy:
By gradually replacing old Pod instances with new ones, rolling updates allow deployments to happen with no downtime. Combined with health checks, a new version of the application can be deployed without any interruption.
Although the default settings are often fine, you sometimes need to alter them. For instance, your application might not be able to run two different versions simultaneously. Useful links: [Github] [weaveworks]
Sample of Rolling Strategy:
With maxUnavailable set to 0, the number of pods set in replicas is always available.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # can be an integer or a percentage
      maxUnavailable: 0   # can be an integer or a percentage
Horizontal Pod Autoscaling (HPA):
Your fixed replica count may occasionally be unable to handle a high volume of requests, even when every pod is operating as it should. HPA, the Kubernetes horizontal autoscaling mechanism, is one of Kubernetes' outstanding advantages [link]. To handle traffic properly, and to keep costs in check, it is advisable to set up a basic HPA based on real CPU utilization.
Sample of Horizontal Pod Autoscaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
The parameters used in this configuration are as follows:
spec.scaleTargetRef: A named reference to the resource being scaled.
spec.minReplicas: The lower limit for the number of replicas to which the autoscaler can scale down.
spec.maxReplicas: The upper limit.
spec.metrics.type: The metric used to calculate the desired replica count. This example uses the Resource type, which tells the HPA to scale the deployment based on average CPU (or memory) utilization; averageUtilization is set to a threshold of 50.
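For a quick start, an equivalent CPU-based autoscaler can also be created imperatively (assuming the Deployment is named nginx-deployment, as in the sample above):
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=3 --max=10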