ML Infra | Taints & Tolerations

Swatimeena
4 min read · Jun 21, 2024

Step-by-step guide to deploying AI/ML applications on Kubernetes

In Kubernetes, a toleration declares that a pod may be scheduled onto nodes carrying a matching taint, for example a node reserved for certain workloads or one marked for maintenance. Let’s break down the fields:

  • key: Specifies the key of the taint on the node.
  • operator: Specifies how the toleration is compared with the taint. Tolerations support Equal (the default; key and value must both match) and Exists (only the key must match). In and NotIn are node affinity operators and are not valid here.
  • value: Specifies the value compared with the taint’s value on nodes. It is used with Equal and omitted with Exists.
  • effect: Specifies which taint effect the toleration matches. Possible values are NoSchedule, PreferNoSchedule, and NoExecute. The effect determines what happens to a pod that does not tolerate the taint.

tolerations:
- effect: "NoExecute"
  key: "x-cluster"
  operator: "Equal"
  value: "generic-value"

This tolerations section lets the pod run on nodes that have a taint with key x-cluster and value generic-value. The operator Equal means the key and the value must both match the taint exactly. Because the effect is NoExecute, pods carrying this toleration can be scheduled on such nodes and are not evicted from them, while pods without it are not scheduled there and are evicted if they are already running.
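For comparison, here is a sketch of the same toleration written with the Exists operator. It matches the x-cluster taint whatever its value is, so the value field is left out:

```yaml
tolerations:
- key: "x-cluster"
  operator: "Exists"    # matches a taint with key x-cluster and any value
  effect: "NoExecute"   # only taints with this effect are tolerated
```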

An analogy: think of pods as bugs and a node as a person. To keep the bugs (pods) away, the person applies a repellent spray (a taint). Bugs that can tolerate the smell (pods with a matching toleration) can still land on the person (node), while bugs that cannot tolerate it stay away.

Pods that do not tolerate a node’s taint cannot be scheduled on that node. To allow specific pods onto the node, add a matching toleration to them. Pods without the toleration are scheduled onto whichever untainted nodes have capacity at that time.
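To show where a toleration sits in practice, here is a minimal Pod manifest sketch. The pod name, container name, and image are placeholders chosen for illustration and are not taken from any real setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-inference        # hypothetical pod name
spec:
  containers:
  - name: app               # hypothetical container name
    image: nginx            # placeholder image
  tolerations:              # tolerations live under the pod spec, next to containers
  - key: "x-cluster"
    operator: "Equal"
    value: "generic-value"
    effect: "NoExecute"
```

In a Deployment, the same tolerations block goes under spec.template.spec.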

## Example to taint a node:

kubectl taint nodes <node-name> key=value:effect
e.g. kubectl taint nodes <node-name> x-cluster=generic-value:NoExecute
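Once a taint is applied, it is stored on the Node object under spec.taints. As a rough sketch, the relevant part of the output of kubectl get node <node-name> -o yaml would look like this for the x-cluster example (all other fields omitted):

```yaml
spec:
  taints:
  - key: x-cluster
    value: generic-value
    effect: NoExecute
```

To remove the taint again, run the same kubectl taint command with a trailing minus sign: kubectl taint nodes <node-name> x-cluster=generic-value:NoExecute-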

## How to check node labels

kubectl get nodes --show-labels

## How to check the nodes present in a Kubernetes cluster

kubectl get nodes   # list all nodes in the cluster
kubectl get pods -n <namespace> -o wide   # nodes are cluster-scoped, not namespaced; this lists the pods of a namespace with the node each one runs on

## Check the node specifications

kubectl describe nodes

NOTE: Taints and tolerations do not direct a pod to a particular node; they only let a node refuse pods that do not tolerate its taints. Taints are set on nodes and tolerations are set on pods.
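Because a toleration only permits placement, steering a pod onto a specific tainted node usually also requires a nodeSelector or node affinity. A sketch, assuming the target node carries a hypothetical label pool=generic in addition to its taint:

```yaml
spec:
  nodeSelector:
    pool: generic           # hypothetical label; it must exist on the target node
  tolerations:
  - key: "x-cluster"
    operator: "Equal"
    value: "generic-value"
    effect: "NoExecute"
```

The nodeSelector attracts the pod to the labeled node, and the toleration lets it past that node's taint.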

Kubernetes: Taints and Tolerations

Pods with useTolerations: False

Assume you have the following nodes:

  • node1 and node2 are untainted but low on memory.
  • node3 is a high-memory node with a taint memory=high:NoSchedule.

Suppose you deploy a pod without tolerations (useTolerations: False). In this scenario:

  • Kubernetes will try to schedule the pod on node1 or node2 but will fail because neither has enough free memory for the pod’s request.
  • The pod will not be scheduled on node3 because it does not have the necessary toleration for the memory=high:NoSchedule taint.
  • The pod will remain in a Pending state.

To avoid this situation, you should use tolerations to allow your pod to be scheduled on the tainted high-memory nodes. Here’s how you add the necessary toleration.
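A toleration matching node3's taint memory=high:NoSchedule, placed under the pod's spec, could look like this (reconstructed from the taint in the example):

```yaml
tolerations:
- key: "memory"
  operator: "Equal"
  value: "high"
  effect: "NoSchedule"
```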

By adding this toleration, you ensure that the pod can be scheduled on node3, which has enough memory, instead of remaining in a Pending state.

Differences between the effects:

In Kubernetes, taints and tolerations are mechanisms used to control which pods can be scheduled on specific nodes. Taints are applied to nodes, and tolerations are applied to pods. There are three main effects of taints: NoSchedule, PreferNoSchedule, and NoExecute. Here’s a detailed explanation of each:

1. NoSchedule

  • Effect: Pods that do not tolerate this taint will not be scheduled on the node.
  • Behavior: If a node is tainted with NoSchedule, the scheduler will not place any new pods on this node unless they have a toleration for the specific taint. Pods already running on the node are not affected.
  • Use Case: Useful when you want to reserve certain nodes for specific workloads and prevent other pods from being scheduled there.

Example:

kubectl taint nodes <node-name> key=value:NoSchedule

Pod toleration:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

2. PreferNoSchedule

  • Effect: The scheduler will try to avoid placing pods that do not tolerate this taint on the node, but it’s not a strict requirement.
  • Behavior: If a node is tainted with PreferNoSchedule, the scheduler will prefer not to schedule new pods on this node unless there are no other options.
  • Use Case: Useful for soft constraints where you want to steer pods away from certain nodes but allow scheduling if necessary.

Example:

kubectl taint nodes <node-name> key=value:PreferNoSchedule

Pod toleration:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "PreferNoSchedule"

3. NoExecute

  • Effect: Pods that do not tolerate this taint will be evicted from the node immediately. Additionally, new pods that do not tolerate this taint will not be scheduled on the node.
  • Behavior: If a node is tainted with NoExecute, any existing pods on that node that do not have a toleration for the taint will be evicted, and new pods without the toleration will not be scheduled on the node.
  • Use Case: Useful when you want to immediately evacuate and prevent scheduling on a node for maintenance, decommissioning, or isolation purposes.

Example:

kubectl taint nodes <node-name> key=value:NoExecute

Pod toleration:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoExecute"
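A NoExecute toleration can also carry the optional tolerationSeconds field, which limits how long the pod stays on the node after the taint is added. A sketch, with 3600 as an arbitrary illustrative value:

```yaml
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoExecute"
  tolerationSeconds: 3600   # illustrative value: the pod is evicted 3600 seconds after the taint appears
```

Without tolerationSeconds, a pod that tolerates the taint stays on the node indefinitely.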

Summary

  • NoSchedule: Prevents new pods from being scheduled on the node unless they tolerate the taint.
  • PreferNoSchedule: Attempts to avoid scheduling new pods on the node, but does not enforce it strictly.
  • NoExecute: Evicts existing pods that do not tolerate the taint and prevents new ones from being scheduled.

Each effect serves a different purpose in managing node utilization and ensuring that workloads are placed appropriately within the cluster.


Written by Swatimeena

Senior Product Engineer@Sprinklr | IIT Bombay | IIT (ISM) Dhanbad
