Blog

How we dynamically scale pods up and down in Kubernetes using the Horizontal Pod Autoscaler

When we first deployed our backend onto Linode Kubernetes Engine (LKE) with three nodes and a fixed number of pods, it became harder to keep up as the number of requests grew, so we looked for a way to automatically scale the pods up and down based on metrics such as CPU utilization. We found our solution in the Horizontal Pod Autoscaler (HPA). In the next few paragraphs, we'll discuss how to write the configuration and deploy it to the Kubernetes cluster.

When working with k8s, you may want to dynamically scale pods up and down based on metrics such as CPU or memory usage. You can achieve this in Kubernetes (k8s) with the Horizontal Pod Autoscaler, which automatically scales the number of Pods in a replication controller, Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization or other metrics.

In this blog post, we are going to look at how to create a deployment and a Horizontal Pod Autoscaler that scales it up and down. Let's get started by creating a simple deployment for the nginx app:

nginx.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.2-alpine
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"

When the above configuration is applied, Kubernetes creates a Deployment with 2 Pods; each Pod runs an nginx container and requests 250 millicores of CPU. Note that a CPU request is required for the HPA to scale on CPU utilization, because utilization is measured as a percentage of the requested amount.
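As an aside, `250m` is a Kubernetes CPU quantity: the `m` suffix means millicores, so `250m` is a quarter of one core. A quick sketch of that conversion in Python (a simplified illustration covering only the common forms, not the full Kubernetes quantity grammar):

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity string to millicores.

    Handles only the two common forms: plain core counts ("1", "0.5")
    and millicore values ("250m"). The real Kubernetes quantity syntax
    supports more suffixes than this sketch covers.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])          # "250m" -> 250 millicores
    return int(float(quantity) * 1000)     # "0.5"  -> 500 millicores

print(cpu_to_millicores("250m"))  # quarter of a core -> 250
print(cpu_to_millicores("2"))     # two full cores    -> 2000
```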

Let’s apply the configuration using the following command:

kubectl apply -f nginx.yaml

We can create an HPA either with the kubectl autoscale command or by applying a configuration file with kubectl apply.

To create the HPA using kubectl autoscale, run the following command:

kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=50

Syntax:

kubectl autoscale (-f FILENAME | TYPE NAME | TYPE/NAME) [--min=MINPODS] --max=MAXPODS [--cpu-percent=CPU]

To create the HPA from a configuration file, create an nginx-hpa.yaml file with the following contents:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Both the kubectl autoscale command above and the nginx-hpa.yaml file create an HPA that scales the nginx Deployment between a minimum of 2 and a maximum of 10 replicas, scaling out when average CPU utilization rises above the 50% target and back in when it falls below it. Note that the HorizontalPodAutoscaler is available as autoscaling/v1 (CPU utilization only) and autoscaling/v2 (which adds memory and custom metrics; older clusters expose it as the autoscaling/v2beta2 beta API). Also note that the HPA needs a source of resource metrics, typically the metrics-server add-on, running in the cluster.
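Under the hood, the autoscaler's core rule is to scale proportionally to the ratio of current to target utilization: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the min/max bounds. A simplified sketch of that calculation (ignoring the tolerance band and stabilization window the real controller also applies):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Core HPA scaling rule: scale proportionally to the metric ratio,
    then clamp the result to the configured replica bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# With our settings (min=2, max=10, target 50% CPU):
print(desired_replicas(2, 90, 50, 2, 10))   # 90% avg CPU -> ceil(3.6) = 4 pods
print(desired_replicas(4, 10, 50, 2, 10))   # idle -> clamped back to min, 2 pods
```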

To list the HPA objects in the default namespace, run the following command:

kubectl get hpa -n default

To delete the HPA object run the following command:

kubectl delete hpa nginx -n default

Conclusion
This way we can scale pods up and down in the Kubernetes cluster based on CPU metrics and make full use of all the nodes in the cluster. In the next blog post, I'll discuss how to scale the nodes themselves up and down on Linode Kubernetes Engine using the Linode API in Python.






