Pod Affinity and Anti-Affinity
Pods can be constrained to run on specific nodes or alongside (or away from) other pods. This includes cases where you want at most one instance of an application pod per node, or want certain pods to be co-located on the same node. Additionally, affinity rules can be expressed as either preferred or required constraints.
For this lesson, we'll focus on inter-pod affinity and anti-affinity by scheduling the checkout-redis pods to run only one instance per node and by scheduling the checkout pods to only run one instance of it on nodes where a checkout-redis pod exists. This will ensure that our caching pods (checkout-redis) run locally with a checkout pod instance for best performance.
The first thing we want to do is see that the checkout and checkout-redis pods are running:
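One way to list them, assuming the application runs in the checkout namespace:

```shell
kubectl get pods -n checkout
```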
```text
NAME                              READY   STATUS    RESTARTS   AGE
checkout-698856df4d-vzkzw         1/1     Running   0          125m
checkout-redis-6cfd7d8787-kxs8r   1/1     Running   0          127m
```
We can see both applications have one pod running in the cluster. Now, let's find out where they are running:
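A kubectl invocation that produces the name-to-node mapping shown below might look like this (custom-columns output is one option; the exact command may differ):

```shell
kubectl get pods -n checkout --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```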
```text
checkout-698856df4d-vzkzw         ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-6cfd7d8787-kxs8r   ip-10-42-10-225.us-west-2.compute.internal
```
Based on the results above, the checkout-698856df4d-vzkzw pod is running on the ip-10-42-11-142.us-west-2.compute.internal node and the checkout-redis-6cfd7d8787-kxs8r pod is running on the ip-10-42-10-225.us-west-2.compute.internal node.
In your environment the pods may initially be running on the same node.
Let's set up a podAffinity and podAntiAffinity policy in the checkout deployment to ensure that one checkout pod runs per node, and that it will only run on nodes where a checkout-redis pod is already running. We'll use requiredDuringSchedulingIgnoredDuringExecution rules to make this a hard requirement rather than a preferred behavior.
The following kustomization adds an affinity section to the checkout deployment specifying both podAffinity and podAntiAffinity policies:
- Kustomize Patch
- Deployment/checkout
- Diff
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: checkout
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - service
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - checkout
              topologyKey: kubernetes.io/hostname
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/type: app
  name: checkout
  namespace: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: service
      app.kubernetes.io/instance: checkout
      app.kubernetes.io/name: checkout
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/component: service
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: checkout
        app.kubernetes.io/name: checkout
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - service
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - checkout
              topologyKey: kubernetes.io/hostname
      containers:
        - envFrom:
            - configMapRef:
                name: checkout
          image: public.ecr.aws/aws-containers/retail-store-sample-checkout:1.2.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 3
          name: checkout
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 250m
              memory: 512Mi
          securityContext:
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /tmp
              name: tmp-volume
      securityContext:
        fsGroup: 1000
      serviceAccountName: checkout
      volumes:
        - emptyDir:
            medium: Memory
          name: tmp-volume
```
```diff
         app.kubernetes.io/created-by: eks-workshop
         app.kubernetes.io/instance: checkout
         app.kubernetes.io/name: checkout
     spec:
+      affinity:
+        podAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - redis
+              topologyKey: kubernetes.io/hostname
+        podAntiAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - service
+                  - key: app.kubernetes.io/instance
+                    operator: In
+                    values:
+                      - checkout
+              topologyKey: kubernetes.io/hostname
       containers:
         - envFrom:
             - configMapRef:
                 name: checkout
```
In the above manifest, the podAffinity section ensures:

- Checkout pods will only be scheduled on nodes where Redis pods are running.
- This is enforced by matching pods with the label `app.kubernetes.io/component: redis`.
- The `topologyKey: kubernetes.io/hostname` setting ensures this rule applies at the node level.

The podAntiAffinity section ensures:

- Only one checkout pod runs per node.
- This is achieved by preventing pods with the labels `app.kubernetes.io/component: service` and `app.kubernetes.io/instance: checkout` from running on the same node.
To make the change, run the following command to modify the checkout deployment in your cluster:
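The exact kustomization path depends on where the workshop content is checked out; a sketch of the apply command, using a placeholder path:

```shell
# <path-to-checkout-kustomization> is a placeholder for the directory
# containing the kustomization with the checkout affinity patch
kubectl apply -k <path-to-checkout-kustomization>
```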
```text
namespace/checkout unchanged
serviceaccount/checkout unchanged
configmap/checkout unchanged
service/checkout unchanged
service/checkout-redis unchanged
deployment.apps/checkout configured
deployment.apps/checkout-redis unchanged
```
The podAffinity rule requires that a checkout-redis pod is already running on the node, since we can assume the checkout pod needs checkout-redis to run correctly. The podAntiAffinity rule requires that no checkout pod is already running on the node, matching on the app.kubernetes.io/component=service label. Now, let's scale up the deployment to check that the configuration is working:
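A scale command for two replicas, assuming the standard kubectl syntax:

```shell
kubectl scale -n checkout deployment/checkout --replicas 2
```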
Now validate where each pod is running:
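As before, one way to show the pod-to-node placement (custom-columns output is an assumption; the original command isn't preserved):

```shell
kubectl get pods -n checkout --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```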
```text
checkout-6c7c9cdf4f-p5p6q         ip-10-42-10-120.us-west-2.compute.internal
checkout-6c7c9cdf4f-wwkm4
checkout-redis-6cfd7d8787-gw59j   ip-10-42-10-120.us-west-2.compute.internal
```
In this example, the first checkout pod runs on the same node as the existing checkout-redis pod, fulfilling the podAffinity rule we set. The second pod is still Pending, because the podAntiAffinity rule we defined does not allow two checkout pods to start on the same node. Since the second node has no checkout-redis pod running, the pod will stay Pending.
Next, we'll scale the checkout-redis to two instances for our two nodes, but first let's modify the checkout-redis deployment policy to spread out our checkout-redis instances across each node. To do this, we'll simply need to create a podAntiAffinity rule.
- Kustomize Patch
- Deployment/checkout-redis
- Diff
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-redis
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/team: database
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/team: database
  name: checkout-redis
  namespace: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: redis
      app.kubernetes.io/instance: checkout
      app.kubernetes.io/name: checkout
  template:
    metadata:
      labels:
        app.kubernetes.io/component: redis
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: checkout
        app.kubernetes.io/name: checkout
        app.kubernetes.io/team: database
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
      containers:
        - image: public.ecr.aws/docker/library/redis:6.0-alpine
          imagePullPolicy: IfNotPresent
          name: redis
          ports:
            - containerPort: 6379
              name: redis
              protocol: TCP
```
```diff
         app.kubernetes.io/instance: checkout
         app.kubernetes.io/name: checkout
         app.kubernetes.io/team: database
     spec:
+      affinity:
+        podAntiAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - redis
+              topologyKey: kubernetes.io/hostname
       containers:
         - image: public.ecr.aws/docker/library/redis:6.0-alpine
           imagePullPolicy: IfNotPresent
           name: redis
```
In the above manifest, the podAntiAffinity section ensures:

- Redis pods are distributed across different nodes.
- This is enforced by preventing multiple pods with the label `app.kubernetes.io/component: redis` from running on the same node.
- The `topologyKey: kubernetes.io/hostname` setting ensures this rule applies at the node level.
Apply it with the following command:
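As before, the exact kustomization path is environment-specific; a sketch with a placeholder path:

```shell
# <path-to-checkout-redis-kustomization> is a placeholder for the directory
# containing the kustomization with the checkout-redis affinity patch
kubectl apply -k <path-to-checkout-redis-kustomization>
```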
```text
namespace/checkout unchanged
serviceaccount/checkout unchanged
configmap/checkout unchanged
service/checkout unchanged
service/checkout-redis unchanged
deployment.apps/checkout unchanged
deployment.apps/checkout-redis configured
```
The podAntiAffinity section requires that no checkout-redis pods are already running on the node by matching the app.kubernetes.io/component=redis label.
Check the running pods to verify that there are now two of each running:
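Listing the pods again, assuming the checkout namespace:

```shell
kubectl get pods -n checkout
```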
```text
NAME                             READY   STATUS    RESTARTS   AGE
checkout-5b68c8cddf-6ddwn        1/1     Running   0          4m14s
checkout-5b68c8cddf-rd7xf        1/1     Running   0          4m12s
checkout-redis-7979df659-cjfbf   1/1     Running   0          19s
checkout-redis-7979df659-pc6m9   1/1     Running   0          22s
```
We can also verify where the pods are running to ensure the podAffinity and podAntiAffinity policies are being followed:
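The same placement listing as before (custom-columns output is an assumption about the original command):

```shell
kubectl get pods -n checkout --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```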
```text
checkout-5b68c8cddf-bn8bp        ip-10-42-11-142.us-west-2.compute.internal
checkout-5b68c8cddf-clnps        ip-10-42-12-31.us-west-2.compute.internal
checkout-redis-7979df659-57xcb   ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-7979df659-r7kkm   ip-10-42-12-31.us-west-2.compute.internal
```
All looks good on the pod scheduling, but we can further verify by scaling the checkout pod again to see where a third pod will deploy:
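A scale command for a third replica, assuming the standard kubectl syntax:

```shell
kubectl scale -n checkout deployment/checkout --replicas 3
```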
If we check the running pods we can see that the third checkout pod has been placed in a Pending state since two of the nodes already have a pod deployed and the third node does not have a checkout-redis pod running.
```text
NAME                             READY   STATUS    RESTARTS   AGE
checkout-5b68c8cddf-bn8bp        1/1     Running   0          4m59s
checkout-5b68c8cddf-clnps        1/1     Running   0          6m9s
checkout-5b68c8cddf-lb69n        0/1     Pending   0          6s
checkout-redis-7979df659-57xcb   1/1     Running   0          35s
checkout-redis-7979df659-r7kkm   1/1     Running   0          2m10s
```
Let's finish this section by removing the Pending pod:
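Scaling back down to two replicas removes the Pending pod, since the scheduler deletes the unschedulable replica first:

```shell
kubectl scale -n checkout deployment/checkout --replicas 2
```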