Pod Affinity and Anti-Affinity
Pods can be constrained to run on specific nodes or alongside (or away from) other pods. This includes cases where you want at most one instance of an application pod per node, or want certain pods to be co-located on the same node. Additionally, affinity rules can be expressed as either preferred or required constraints.
For this lesson, we'll focus on inter-pod affinity and anti-affinity by scheduling the checkout-redis pods to run only one instance per node and by scheduling the checkout pods to only run one instance of it on nodes where a checkout-redis pod exists. This will ensure that our caching pods (checkout-redis) run locally with a checkout pod instance for best performance.
The first thing we want to do is see that the checkout and checkout-redis pods are running:
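One way to list them, assuming the application runs in the checkout namespace:

```shell
kubectl get pods -n checkout
```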
```text
NAME                              READY   STATUS    RESTARTS   AGE
checkout-698856df4d-vzkzw         1/1     Running   0          125m
checkout-redis-6cfd7d8787-kxs8r   1/1     Running   0          127m
```
We can see both applications have one pod running in the cluster. Now, let's find out where they are running:
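A kubectl invocation that produces the name-to-node mapping shown below might look like this (custom-columns output is one option; the exact command may differ):

```shell
kubectl get pods -n checkout --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```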
```text
checkout-698856df4d-vzkzw         ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-6cfd7d8787-kxs8r   ip-10-42-10-225.us-west-2.compute.internal
```
Based on the results above, the checkout-698856df4d-vzkzw pod is running on the ip-10-42-11-142.us-west-2.compute.internal node and the checkout-redis-6cfd7d8787-kxs8r pod is running on the ip-10-42-10-225.us-west-2.compute.internal node.
In your environment the pods may initially be running on the same node.
Let's set up a podAffinity and podAntiAffinity policy in the checkout deployment to ensure that one checkout pod runs per node, and that it will only run on nodes where a checkout-redis pod is already running. We'll use requiredDuringSchedulingIgnoredDuringExecution rules to make this a hard requirement rather than a preferred behavior.
The following kustomization adds an affinity section to the checkout deployment specifying both podAffinity and podAntiAffinity policies:
- Kustomize Patch
- Deployment/checkout
- Diff
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: checkout
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - service
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - checkout
              topologyKey: kubernetes.io/hostname
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/type: app
  name: checkout
  namespace: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: service
      app.kubernetes.io/instance: checkout
      app.kubernetes.io/name: checkout
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/component: service
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: checkout
        app.kubernetes.io/name: checkout
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - service
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - checkout
              topologyKey: kubernetes.io/hostname
      containers:
        - envFrom:
            - configMapRef:
                name: checkout
          image: public.ecr.aws/aws-containers/retail-store-sample-checkout:1.2.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 3
          name: checkout
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 250m
              memory: 512Mi
          securityContext:
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /tmp
              name: tmp-volume
      securityContext:
        fsGroup: 1000
      serviceAccountName: checkout
      volumes:
        - emptyDir:
            medium: Memory
          name: tmp-volume
```
```diff
         app.kubernetes.io/created-by: eks-workshop
         app.kubernetes.io/instance: checkout
         app.kubernetes.io/name: checkout
     spec:
+      affinity:
+        podAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - redis
+              topologyKey: kubernetes.io/hostname
+        podAntiAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - service
+                  - key: app.kubernetes.io/instance
+                    operator: In
+                    values:
+                      - checkout
+              topologyKey: kubernetes.io/hostname
       containers:
         - envFrom:
             - configMapRef:
                 name: checkout
```
In the above manifest, the podAffinity section ensures:

- Checkout pods will only be scheduled on nodes where Redis pods are running.
- This is enforced by matching pods with the label `app.kubernetes.io/component: redis`.
- The `topologyKey: kubernetes.io/hostname` setting ensures this rule applies at the node level.

The podAntiAffinity section ensures:

- Only one checkout pod runs per node.
- This is achieved by preventing pods with the labels `app.kubernetes.io/component: service` and `app.kubernetes.io/instance: checkout` from running on the same node.
To make the change, run the following command to modify the checkout deployment in your cluster:
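The exact kustomization path depends on where the workshop content is checked out; a sketch of the apply command, using a placeholder path:

```shell
# <path-to-checkout-kustomization> is a placeholder for the directory
# containing the kustomization with the checkout affinity patch
kubectl apply -k <path-to-checkout-kustomization>
```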
```text
namespace/checkout unchanged
serviceaccount/checkout unchanged
configmap/checkout unchanged
service/checkout unchanged
service/checkout-redis unchanged
deployment.apps/checkout configured
deployment.apps/checkout-redis unchanged
```
The podAffinity rule requires that a checkout-redis pod is already running on the node, since we can assume the checkout pod needs checkout-redis to run correctly. The podAntiAffinity rule requires that no checkout pod is already running on the node, matching on the app.kubernetes.io/component=service label. Now, let's scale up the deployment to check that the configuration is working:
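A scale command for two replicas, assuming the standard kubectl syntax:

```shell
kubectl scale -n checkout deployment/checkout --replicas 2
```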
Now validate where each pod is running:
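As before, one way to show the pod-to-node placement (custom-columns output is an assumption; the original command isn't preserved):

```shell
kubectl get pods -n checkout --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```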
```text
checkout-6c7c9cdf4f-p5p6q         ip-10-42-10-120.us-west-2.compute.internal
checkout-6c7c9cdf4f-wwkm4
checkout-redis-6cfd7d8787-gw59j   ip-10-42-10-120.us-west-2.compute.internal
```
In this example, the first checkout pod runs on the same node as the existing checkout-redis pod, fulfilling the podAffinity rule we set. The second pod is still Pending, because the podAntiAffinity rule we defined does not allow two checkout pods to start on the same node. Since the second node has no checkout-redis pod running, the pod will stay Pending.
Next, we'll scale the checkout-redis to two instances for our two nodes, but first let's modify the checkout-redis deployment policy to spread out our checkout-redis instances across each node. To do this, we'll simply need to create a podAntiAffinity rule.
- Kustomize Patch
- Deployment/checkout-redis
- Diff
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-redis
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/team: database
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/team: database
  name: checkout-redis
  namespace: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: redis
      app.kubernetes.io/instance: checkout
      app.kubernetes.io/name: checkout
  template:
    metadata:
      labels:
        app.kubernetes.io/component: redis
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: checkout
        app.kubernetes.io/name: checkout
        app.kubernetes.io/team: database
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
      containers:
        - image: public.ecr.aws/docker/library/redis:6.0-alpine
          imagePullPolicy: IfNotPresent
          name: redis
          ports:
            - containerPort: 6379
              name: redis
              protocol: TCP
```
```diff
         app.kubernetes.io/instance: checkout
         app.kubernetes.io/name: checkout
         app.kubernetes.io/team: database
     spec:
+      affinity:
+        podAntiAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                  - key: app.kubernetes.io/component
+                    operator: In
+                    values:
+                      - redis
+              topologyKey: kubernetes.io/hostname
       containers:
         - image: public.ecr.aws/docker/library/redis:6.0-alpine
           imagePullPolicy: IfNotPresent
           name: redis
```
In the above manifest, the podAntiAffinity section ensures:

- Redis pods are distributed across different nodes.
- This is enforced by preventing multiple pods with the label `app.kubernetes.io/component: redis` from running on the same node.
- The `topologyKey: kubernetes.io/hostname` setting ensures this rule applies at the node level.
Apply it with the following command:
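As before, the exact kustomization path is environment-specific; a sketch with a placeholder path:

```shell
# <path-to-checkout-redis-kustomization> is a placeholder for the directory
# containing the kustomization with the checkout-redis affinity patch
kubectl apply -k <path-to-checkout-redis-kustomization>
```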
```text
namespace/checkout unchanged
serviceaccount/checkout unchanged
configmap/checkout unchanged
service/checkout unchanged
service/checkout-redis unchanged
deployment.apps/checkout unchanged
deployment.apps/checkout-redis configured
```
The podAntiAffinity section requires that no checkout-redis pods are already running on the node by matching the app.kubernetes.io/component=redis label.
Check the running pods to verify that there are now two of each running:
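Listing the pods again, assuming the checkout namespace:

```shell
kubectl get pods -n checkout
```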
```text
NAME                             READY   STATUS    RESTARTS   AGE
checkout-5b68c8cddf-6ddwn        1/1     Running   0          4m14s
checkout-5b68c8cddf-rd7xf        1/1     Running   0          4m12s
checkout-redis-7979df659-cjfbf   1/1     Running   0          19s
checkout-redis-7979df659-pc6m9   1/1     Running   0          22s
```
We can also verify where the pods are running to ensure the podAffinity and podAntiAffinity policies are being followed:
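The same placement listing as before (custom-columns output is an assumption about the original command):

```shell
kubectl get pods -n checkout --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```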
```text
checkout-5b68c8cddf-bn8bp        ip-10-42-11-142.us-west-2.compute.internal
checkout-5b68c8cddf-clnps        ip-10-42-12-31.us-west-2.compute.internal
checkout-redis-7979df659-57xcb   ip-10-42-11-142.us-west-2.compute.internal
checkout-redis-7979df659-r7kkm   ip-10-42-12-31.us-west-2.compute.internal
```
All looks good on the pod scheduling, but we can further verify by scaling the checkout pod again to see where a third pod will deploy:
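A scale command for a third replica, assuming the standard kubectl syntax:

```shell
kubectl scale -n checkout deployment/checkout --replicas 3
```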
If we check the running pods we can see that the third checkout pod has been placed in a Pending state since two of the nodes already have a pod deployed and the third node does not have a checkout-redis pod running.
```text
NAME                             READY   STATUS    RESTARTS   AGE
checkout-5b68c8cddf-bn8bp        1/1     Running   0          4m59s
checkout-5b68c8cddf-clnps        1/1     Running   0          6m9s
checkout-5b68c8cddf-lb69n        0/1     Pending   0          6s
checkout-redis-7979df659-57xcb   1/1     Running   0          35s
checkout-redis-7979df659-r7kkm   1/1     Running   0          2m10s
```
Let's finish this section by removing the Pending pod:
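Scaling back down to two replicas removes the Pending pod, since the scheduler deletes the unschedulable replica first:

```shell
kubectl scale -n checkout deployment/checkout --replicas 2
```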