PodDisruptionBudget: Why Do You Need It in Kubernetes?
Introduction
Before starting, I am assuming the reader has a basic understanding of Kubernetes. :) Let’s dig into this interesting topic. As readers, we have surely noticed the word disruption in the name PodDisruptionBudget.
What is a Disruption?
A disruption is an event in which a pod has to be killed and re-spawned. Disruptions are unavoidable, and they need to be handled carefully; otherwise we will face an outage. Imagine a service for which none of the backing pods are available. That is exactly the situation we want to avoid, right?
PodDisruptionBudget (pdb) is a useful layer of defense that Kubernetes provides to deal with this kind of issue.
What is pdb?
pdb stands for PodDisruptionBudget. This resource first appeared as an alpha API in Kubernetes 1.4 and became generally available as policy/v1 in Kubernetes 1.21. It defines a budget for voluntary disruptions. In essence, a human operator makes the cluster aware of a minimum threshold of available pods that the cluster must guarantee to keep a baseline of availability or performance. The word budget is used as in error budget: any voluntary disruption that stays within this budget should be acceptable.
Example pdb YAML:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: nginx
```
If we take a closer look at this sample, we will notice that:
- It selects pods based on their labels (here, app: nginx).
- It requires at least one of the selected pods to be available.
Specifying a PodDisruptionBudget
A pdb has three key fields:
1) .spec.selector (required): a label selector that specifies the set of pods this pdb applies to. Typically, those pods are managed by one of these built-in controllers:
- Deployment
- ReplicationController
- ReplicaSet
- StatefulSet
In that case, take note of the controller’s .spec.selector; the same selector goes into the pdb’s .spec.selector.
Since version 1.15, PDBs also support custom controllers that have the scale subresource enabled.
2) .spec.minAvailable or .spec.maxUnavailable (required; one or the other, but not both): we can define minAvailable or maxUnavailable either as an absolute number (e.g., at least two pods available / at most two pods unavailable) or as a percentage (e.g., at least 10% of pods available / at most 10% of pods unavailable).
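Putting the two parts together, here is a minimal sketch assuming a hypothetical Deployment named nginx (the names nginx and nginx-pdb and the plain nginx image are assumptions for illustration). The Deployment’s .spec.selector is reused as the pdb’s .spec.selector, and this variant expresses the budget with a percentage maxUnavailable instead of the sample’s minAvailable: 1. With 2 replicas, 50% works out to at most one unavailable pod.

```yaml
# Hypothetical Deployment whose pods carry the label app: nginx.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                # assumed name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx             # the controller's .spec.selector ...
  template:
    metadata:
      labels:
        app: nginx           # ... must match the pod template labels ...
    spec:
      containers:
        - name: nginx
          image: nginx       # assumed image
---
# A pdb covering the same pods, budgeted as a percentage ceiling.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb            # assumed name
spec:
  maxUnavailable: "50%"      # percentages are written as strings
  selector:
    matchLabels:
      app: nginx             # ... and is copied into the pdb's .spec.selector
```

Only one of minAvailable and maxUnavailable appears in the pdb; adding both would make the object invalid.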
If you want to see a pdb in action, please check out my sample repository here. If you haven’t used a pdb before, this demo will help you understand the concepts that follow.
As noted above, a single PodDisruptionBudget can define only one of maxUnavailable and minAvailable. In the examples below, “desired replicas” is the scale of the controller managing the pods selected by the PodDisruptionBudget.
Example 1: With a minAvailable of 5, evictions are allowed as long as they leave behind 5 or more healthy pods among those selected by the PodDisruptionBudget’s selector.
Example 2: With a minAvailable of 30%, evictions are allowed as long as at least 30% of the number of desired replicas are healthy.
Example 3: With a maxUnavailable of 5, evictions are allowed as long as there are at most 5 unhealthy replicas among the total number of desired replicas.
Example 4: With a maxUnavailable of 30%, evictions are allowed as long as no more than 30% of the desired replicas are unhealthy.
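All four examples boil down to one subtraction: Kubernetes works out how many pods must stay healthy (the pdb’s status.desiredHealthy) and exposes the remaining headroom as status.disruptionsAllowed, which is never negative. Below is a worked sketch for Examples 2 and 4, assuming a hypothetical controller with 10 desired replicas, all currently healthy.

```latex
\begin{aligned}
\text{disruptionsAllowed} &= \max(0,\ \text{currentHealthy} - \text{desiredHealthy}) \\
\text{Ex. 2 } (\text{minAvailable} = 30\%):\quad \text{desiredHealthy} &= 0.30 \times 10 = 3,\quad \text{disruptionsAllowed} = 10 - 3 = 7 \\
\text{Ex. 4 } (\text{maxUnavailable} = 30\%):\quad \text{desiredHealthy} &= 10 - 0.30 \times 10 = 7,\quad \text{disruptionsAllowed} = 10 - 7 = 3
\end{aligned}
```

Each approved eviction lowers currentHealthy by one, so evictions stop as soon as disruptionsAllowed reaches 0.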
In some situations, a disruption budget does not work as we might expect. For example, a node hosting one of the selected pods may fail while the collection is already at the minimum size specified in the budget, bringing the number of available pods below that size. The budget can only protect against voluntary evictions, not against every cause of unavailability.
- If you set maxUnavailable to 0% or 0, or you set minAvailable to 100% or to the number of replicas, you are requiring zero voluntary evictions. When you set zero voluntary evictions for a workload object such as a ReplicaSet, you cannot successfully drain a Node running one of its Pods. If you try to drain a Node where an unevictable Pod is running, the drain never completes. This is permitted by the semantics of PodDisruptionBudget.
- When you specify the value as a percentage, it may not map to an exact number of Pods. For example, if you have 7 Pods and you set minAvailable to "50%", it’s not immediately obvious whether that means 3 Pods or 4 Pods must be available. Kubernetes rounds up to the nearest integer, so in this case, 4 Pods must be available.
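The rounding in the 7-Pod example works out as follows, assuming all 7 Pods are currently healthy:

```latex
\begin{aligned}
\text{desiredHealthy} &= \lceil 0.50 \times 7 \rceil = \lceil 3.5 \rceil = 4 \\
\text{disruptionsAllowed} &= 7 - 4 = 3
\end{aligned}
```

So at most 3 of the 7 Pods can be voluntarily evicted at the same time.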
Let’s take a closer look at voluntary disruptions.
Voluntary Disruption
Voluntary disruptions can take the form of:
- A node group replacement, caused by an incompatible change or a cluster upgrade.
- Scaling nodes down.
Oftentimes, the responsibility of managing an application workload is separated from the responsibility of managing the cluster, and usually picked up by separate teams such as a platform team and an application team.
There can be a conflict of interest between them:
- An application team wants its apps running all the time, with 100% availability and endpoints that are as responsive as possible.
- A platform team needs to make changes to the cluster. Those changes will take down nodes, with the pods running on them as well.
A pdb is, in all fairness, a compromise between an application team and a platform team. The application team acknowledges the need for scheduled, voluntary disruptions and provides a guideline that helps the platform team carry out the rollout.
Of course, there are involuntary disruptions as well, such as a power outage, a node kernel crash, or an administrator deleting a VM (instance). Understandably, a pdb won’t protect your workload from them.
Lastly, what if you don’t have a pdb in your cluster?
Your workload might go offline while a cluster maintenance event is in progress. Yes, even if you have replicas set to a value greater than 1.
Let’s say you have a Deployment of nginx with replicas = 2. If both pods are scheduled onto a single node, what happens when that node is recycled? Downtime, until the pods are rescheduled on another node. When a pdb covers those pods, evictions requested through the Eviction API (which kubectl drain uses) are refused whenever they would exceed the budget, so a minimum number of replicas stays available while the node is drained.
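To make this concrete, here is a worked trace assuming the sample pdb (minAvailable: 1) covers this two-replica Deployment and the node is drained with kubectl drain. The quantities are the pdb’s status fields currentHealthy, desiredHealthy and disruptionsAllowed; the sequence is an illustration of the budget arithmetic, not captured output.

```latex
\begin{aligned}
&\text{before the drain:} && \text{currentHealthy} = 2,\ \text{desiredHealthy} = 1 \;\Rightarrow\; \text{disruptionsAllowed} = 1 \\
&\text{pod 1 evicted:} && \text{currentHealthy} = 1 \;\Rightarrow\; \text{disruptionsAllowed} = 0 \ (\text{evicting pod 2 is refused}) \\
&\text{replacement Ready elsewhere:} && \text{currentHealthy} = 2 \;\Rightarrow\; \text{disruptionsAllowed} = 1 \ (\text{pod 2 can now be evicted})
\end{aligned}
```

A refused eviction is answered with HTTP 429 (Too Many Requests), and kubectl drain keeps retrying until the budget allows it, so at least one nginx pod is serving throughout.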
Without a pdb, you are essentially leaving availability to chance. It will probably be fine, but nothing guarantees it.
Conclusion
PodDisruptionBudget is quite important if your team has a Service Level Agreement (SLA). Granted, it is not strictly mandatory: if the cluster you manage has enough spare CPU/memory capacity, a rollout will more often than not finish uneventfully without impacting the workload. Nevertheless, it is still the recommended way to stay in control during a voluntary disruption.