EKS Horizontal Pod Autoscaling (HPA)
What is HPA?
Horizontal Pod Autoscaling automatically adjusts the number of pods in a Deployment (or other scalable workload such as a ReplicaSet or StatefulSet) based on observed CPU/memory utilization or custom metrics.
1. Install Metrics Server
HPA needs metrics from the Metrics Server to make scaling decisions.
# Check if Metrics Server is installed
kubectl -n kube-system get deployment/metrics-server
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
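# Note: v0.3.6 is the version pinned in this guide; on newer clusters the
# latest release manifest can be applied instead (standard Metrics Server install URL):
# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml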
# Verify installation
kubectl get deployment metrics-server -n kube-system
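Once the Metrics Server deployment is available, you can confirm that resource metrics are actually being collected with the standard kubectl top commands (it can take a minute or two after installation for data to appear):
# Confirm resource metrics are being reported
kubectl top nodes
kubectl top pods -n kube-system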
2. Deploy Sample Application
Create manifest file hpa-demo.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
  labels:
    app: hpa-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-nginx
  template:
    metadata:
      labels:
        app: hpa-nginx
    spec:
      containers:
      - name: hpa-nginx
        image: dash04/oldapp:v1
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "500Mi"
            cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo-service-nginx
  labels:
    app: hpa-nginx
spec:
  type: NodePort
  selector:
    app: hpa-nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 31231
Apply the configuration:
# Deploy Nginx application with Service
kubectl apply -f hpa-demo.yaml
# Verify deployment
kubectl get pod,svc,deploy
# Access application (if in public subnet)
# Get node IP: kubectl get nodes -o wide
# Access: http://<Worker-Node-Public-IP>:31231
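On EKS, the worker node security group must also allow inbound traffic on the NodePort before the application is reachable from outside the cluster. A minimal sketch using the AWS CLI (the security group ID is a placeholder and 0.0.0.0/0 is only for testing; restrict the source range in practice):
# Open the NodePort on the worker node security group (sg-xxxxxxxx is hypothetical)
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 31231 \
  --cidr 0.0.0.0/0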
3. Create HPA
Create an autoscaler targeting 50% average CPU utilization, scaling between 1 and 10 pods. The percentage is measured against each container's CPU request (100m in the deployment above), which is why resource requests must be set:
# Create HPA
kubectl autoscale deployment hpa-demo-deployment --cpu-percent=50 --min=1 --max=10
# Check HPA
kubectl describe hpa/hpa-demo-deployment
kubectl get hpa
How it works (the replica-count formula is sketched after this list):
- When CPU usage > 50% → increases pods (up to 10)
- When CPU usage < 50% → decreases pods (down to 1)
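The target replica count follows the standard formula from the Kubernetes documentation:

desiredReplicas = ceil( currentReplicas * ( currentMetricValue / desiredMetricValue ) )

For example, if 2 pods are averaging 100% CPU utilization against the 50% target, the HPA scales to ceil(2 * 100/50) = 4 pods; if utilization then drops to 25%, it scales back to ceil(4 * 25/50) = 2 pods after the scale-down stabilization window.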
4. Generate Load to Trigger Scaling
# Run load test using Apache Bench
kubectl run apache-bench \
  --image=httpd \
  --restart=Never \
  --rm -i --tty \
  -- ab -n 1000000 -c 1000 http://hpa-demo-service-nginx.default.svc.cluster.local/
# Monitor HPA
kubectl get hpa hpa-demo-deployment -w
# Check pods scaling up
kubectl get pods
# Describe HPA for detailed metrics
kubectl describe hpa/hpa-demo-deployment
Command Explanation:
- kubectl run apache-bench: Creates a temporary pod
- --image=httpd: Uses the Apache HTTPD image (contains the ab tool)
- --rm: Removes the pod after completion
- -i --tty: Interactive terminal
- ab: Apache Bench tool
- -n 1000000: 1 million requests
- -c 1000: 1,000 concurrent connections
- http://...: Target service URL (the Service's in-cluster DNS name)
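If the Apache Bench run finishes before you have had time to watch the HPA react, a longer-running load generator in the style of the official Kubernetes HPA walkthrough can be used instead (the busybox image and loop below are an alternative, not part of the original demo):
# Generate continuous load until you press Ctrl+C
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo-service-nginx.default.svc.cluster.local; done"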
5. Scale Down (Cooldown)
After the load stops, HPA waits 5 minutes (the default downscale stabilization window) before scaling back down, to prevent rapid fluctuations. This window can be tuned per HPA via behavior.scaleDown.stabilizationWindowSeconds, as shown in the behavior configuration below.
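To observe the scale-down, stop the load generator and watch the HPA and the Deployment (in separate terminals); the replica count should return to the minimum once the stabilization window has passed:
# Watch replicas drop back to the minimum
kubectl get hpa hpa-demo-deployment -w
kubectl get deployment hpa-demo-deployment -w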
6. Clean Up
# Delete HPA
kubectl delete hpa hpa-demo-deployment
# Delete application deployment
kubectl delete deploy hpa-demo-deployment
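The Service created by hpa-demo.yaml is not removed by the commands above; it can be deleted separately, or everything defined in the manifest can be deleted at once:
# Delete the NodePort service
kubectl delete svc hpa-demo-service-nginx
# Or delete all resources defined in the manifest
kubectl delete -f hpa-demo.yaml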
HPA Behavior Configuration (v1.18+)
From Kubernetes v1.18 you can customize scaling behavior in the HPA manifest (shown here with the autoscaling/v2beta2 API; on clusters running v1.23+ the same fields are available under the stable autoscaling/v2 API):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 100                     # Remove all excess pods at once
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0    # Scale up immediately
      policies:
      - type: Percent
        value: 100                     # Double pods
        periodSeconds: 15
      - type: Pods
        value: 4                       # Or add 4 pods at minimum
        periodSeconds: 15
      selectPolicy: Max                # Use most aggressive scaling option
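The manifest is applied like any other resource (hpa-behavior.yaml is a hypothetical filename); the describe output then shows the configured behavior:
# Apply the behavior-tuned HPA and inspect it
kubectl apply -f hpa-behavior.yaml
kubectl describe hpa hpa-demo-deployment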
Key Points:
- Metrics Server must be installed for CPU/memory-based scaling
- Default cooldown: 5 minutes for scale-down
- Scaling policies customizable in v1.18+
- HPA uses averaged metrics across all pods
- Supports custom metrics and multiple metrics (a minimal multi-metric sketch follows below)
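As an illustration of the last point, here is a minimal sketch of an HPA that scales on both CPU and memory utilization (autoscaling/v2 syntax; the name and the 50%/70% targets are example values, not part of the original demo). When multiple metrics are defined, the HPA computes a desired replica count for each and uses the largest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-multi-metric-example      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50         # example target
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70         # example target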