ML Infra | Taints & Tolerations
In Kubernetes, a taint is set on a node to repel pods, and a toleration is set on a pod to indicate that it can tolerate a matching taint, meaning it may be scheduled onto (or keep running on) the tainted node. Let’s break down the fields of a toleration:
- key: Specifies the key of the taint on the node.
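- operator: Specifies how the toleration is compared with the taint. Tolerations support Equal (the default; key and value must both match) and Exists (only the key must match, and no value is given). In and NotIn are node affinity operators and are not valid in a toleration.
- value: Specifies the value associated with the key. It is compared with the taint’s value on the node when the operator is Equal.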
- effect: Specifies which taint effect the toleration matches. Possible values are NoSchedule, PreferNoSchedule, and NoExecute. The effect determines what happens to a pod that does not tolerate the taint.
tolerations:
  - effect: "NoExecute"
    key: "x-cluster"
    operator: "Equal"
    value: "generic-value"
This tolerations section specifies that the pod tolerates a node taint with key x-cluster and value generic-value; the Equal operator means the value must match exactly. The effect is NoExecute, so a pod with this toleration can be scheduled onto nodes carrying the taint and will not be evicted from them, whereas pods without it are kept off those nodes and evicted if already running there.
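To contrast the two operators, here is a minimal sketch of the same toleration written with Exists instead of Equal. It reuses the x-cluster key from the example above; the two entries are alternatives shown side by side for illustration, not something taken from the original configuration.

```yaml
tolerations:
  # Matches any taint with key x-cluster and effect NoExecute,
  # whatever its value. No "value" field is given with Exists.
  - key: "x-cluster"
    operator: "Exists"
    effect: "NoExecute"
  # Omitting "effect" as well matches taints with key x-cluster
  # for every effect (NoSchedule, PreferNoSchedule, NoExecute).
  - key: "x-cluster"
    operator: "Exists"
```

Exists is the looser choice: it keeps working if the taint's value changes, at the cost of tolerating values you may not have intended.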
An analogy: bugs are like pods and a person is like a node. To keep the bugs (pods) away from the person, a spray (taint) is used. Bugs that tolerate the smell of the spray (pods with a toleration) can stay on the person (node), while bugs that cannot tolerate the smell stay away.
A pod that does not tolerate a node's taint cannot be scheduled on that node. To allow a pod onto the node, add a matching toleration to the pod. Pods without the toleration are placed on other available nodes that do not carry the taint.
## Example to taint a node:
kubectl taint nodes node_name key=value:effect
Example: kubectl taint nodes node_name x-cluster=generic-value:NoExecute
## How to check node labels
kubectl get nodes --show-labels
## How to check nodes present in Kubernetes cluster
kubectl get nodes ## (to check all nodes)
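kubectl get pods -n namespace -o wide ## (nodes are cluster-scoped, so -n has no effect on kubectl get nodes; this lists the pods in a namespace along with the node each one runs on)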
## Check the node specifications
kubectl describe nodes
NOTE: Taints and tolerations do not tell a pod to go to a particular node; they tell a node which pods it may accept. Taints are set on nodes and tolerations are set on pods.
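Because a toleration only permits placement, steering a pod onto a specific tainted node usually means combining it with a node selector or node affinity. The sketch below illustrates that combination under stated assumptions: the pod name, container, image, and the node label memory-class=high are all hypothetical, and the node would have to be labeled separately for the selector to match.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod               # hypothetical pod name
spec:
  # Attracts the pod: only nodes carrying this (assumed) label qualify.
  nodeSelector:
    memory-class: "high"         # hypothetical label, not from the original
  # Permits the pod: lets it land on nodes carrying this taint.
  tolerations:
    - key: "x-cluster"
      operator: "Equal"
      value: "generic-value"
      effect: "NoExecute"
  containers:
    - name: app                  # hypothetical container
      image: busybox:1.36        # placeholder image
      command: ["sleep", "3600"]
```

The taint keeps other pods off the node, while the selector keeps this pod off other nodes; neither achieves both on its own.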
Pods with useTolerations: False
Assume you have the following nodes:
- node1 and node2 are untainted but low on memory.
- node3 is a high-memory node with the taint memory=high:NoSchedule.
Suppose you deploy a pod without tolerations (useTolerations: False) that needs more memory than node1 or node2 can offer. In this scenario:
- Kubernetes will try to schedule the pod on node1 and node2 but will fail due to insufficient memory.
- The pod will not be scheduled on node3 because it does not have the necessary toleration for the memory=high:NoSchedule taint.
- The pod will remain in a Pending state.
To avoid this situation, you should use tolerations to allow your pod to be scheduled on the tainted high-memory nodes. Here’s how you add the necessary toleration.
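A minimal sketch of such a pod, assuming the memory=high:NoSchedule taint on node3 from the scenario above. The pod name, container, image, and the 8Gi memory request are illustrative placeholders rather than values from the original.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: high-memory-job          # hypothetical pod name
spec:
  containers:
    - name: worker               # hypothetical container
      image: busybox:1.36        # placeholder image
      command: ["sleep", "3600"]
      resources:
        requests:
          memory: "8Gi"          # assumed request that node1/node2 cannot satisfy
  tolerations:
    # Matches the taint memory=high:NoSchedule set on node3.
    - key: "memory"
      operator: "Equal"
      value: "high"
      effect: "NoSchedule"
```

The key, value, and effect must all line up with the taint; a toleration for memory=high:NoExecute, for instance, would not match a NoSchedule taint.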
By adding this toleration, you allow the pod to be scheduled on node3, which has enough memory, instead of remaining in a Pending state.
Differences between the effects:
In Kubernetes, taints and tolerations are mechanisms used to control which pods can be scheduled on specific nodes. Taints are applied to nodes, and tolerations are applied to pods. There are three main effects of taints: NoSchedule, PreferNoSchedule, and NoExecute. Here’s a detailed explanation of each:
1. NoSchedule
- Effect: Pods that do not tolerate this taint will not be scheduled on the node.
- Behavior: If a node is tainted with NoSchedule, the scheduler will not place any new pods on this node unless they have a toleration for the specific taint. Pods already running on the node are not affected.
- Use Case: Useful when you want to reserve certain nodes for specific workloads and prevent other pods from being scheduled there.
Example:
kubectl taint nodes <node-name> key=value:NoSchedule
Pod toleration:
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
2. PreferNoSchedule
- Effect: The scheduler will try to avoid placing pods that do not tolerate this taint on the node, but it’s not a strict requirement.
- Behavior: If a node is tainted with PreferNoSchedule, the scheduler will prefer not to schedule new pods on this node unless there are no other options.
- Use Case: Useful for soft constraints where you want to steer pods away from certain nodes but allow scheduling if necessary.
Example:
kubectl taint nodes <node-name> key=value:PreferNoSchedule
Pod toleration:
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "PreferNoSchedule"
3. NoExecute
- Effect: Pods that do not tolerate this taint will be evicted from the node immediately. Additionally, new pods that do not tolerate this taint will not be scheduled on the node.
- Behavior: If a node is tainted with NoExecute, any existing pods on that node that do not have a toleration for the taint will be evicted, and new pods without the toleration will not be scheduled on the node.
- Use Case: Useful when you want to immediately evacuate and prevent scheduling on a node for maintenance, decommissioning, or isolation purposes.
Example:
kubectl taint nodes <node-name> key=value:NoExecute
Pod toleration:
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoExecute"
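A NoExecute toleration can also carry the optional tolerationSeconds field, which bounds how long the pod stays on the node after the taint is added. The sketch below reuses the placeholder key and value from above; the 300-second grace period is an assumed figure for illustration.

```yaml
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoExecute"
    # The pod keeps running for up to 300 seconds after the taint
    # appears, then is evicted. If the taint is removed before the
    # timer expires, the pod is not evicted.
    tolerationSeconds: 300       # assumed grace period
```

Without tolerationSeconds, a matching NoExecute toleration lets the pod stay on the node indefinitely.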
Summary
- NoSchedule: Prevents new pods from being scheduled on the node unless they tolerate the taint.
- PreferNoSchedule: Attempts to avoid scheduling new pods on the node, but does not enforce it strictly.
- NoExecute: Evicts existing pods that do not tolerate the taint and prevents new ones from being scheduled.
Each effect serves a different purpose in managing node utilization and ensuring that workloads are placed appropriately within the cluster.