How to horizontally autoscale pods in Kubernetes


In this article I demonstrate how to set up an autoscaler to scale up the pods when the CPU usage exceeds a certain threshold and back down again.

One of the many wonderful features of the Kubernetes/OpenShift distribution as implemented at Safe Swiss Cloud is the Horizontal Pod Autoscaler (HPA). As the name suggests, the HPA automatically spins pods up or down for you when a given CPU or memory load threshold is crossed.

Goal

To demonstrate by example how the HPA in OpenShift/Kubernetes can scale the number of application pods from 1 to 3 replicas as load increases and back down to 1 again as the load decreases.

Implementation

Here is an example of how to set up an autoscaler that adds pods when the CPU usage exceeds a certain threshold. I will demonstrate this using the web console (graphical user interface) of OpenShift 4.5. The same can be achieved using the CLI; this method is described in the official documentation at https://docs.openshift.com/container-platform/4.5/nodes/pods/nodes-pods-autoscaling.html.

  1. Deploy a suitable Pod for our Test 

    First, we need to deploy a pod that we can use to test our autoscaler. In this example I chose the OpenShift example application https://github.com/sclorg/django-ex, since it exposes a web server through an external route, which is easy to load up with URL requests.
    Figure 1: The single Django pod before autoscaling

  2. Add Metrics to the Deployment YAML

    Go to Administrator -> Workloads -> Deployments, edit the YAML of your deployment and search for the resources parameter. By default, this parameter is empty, i.e. resources: {}. Remove the braces and add a CPU request, cpu: 200m, as shown in Figure 2. The 200m (200 millicores, i.e. 0.2 of a CPU core) is just an arbitrary starting value. If you don't do this, the autoscaler will ignore your pods: it measures CPU utilization as a percentage of the requested CPU, so without a request it cannot compute a utilization value for the running pods.
    Figure 2: Adding the cpu: 200m reservation within the Deployment object

  3. Create the Horizontal Pod Autoscaler

    Go to Administrator -> Workloads -> Horizontal Pod Autoscalers, select Create Horizontal Pod Autoscaler and edit the resulting YAML. In the spec block, set scaleTargetRef.name to the name of your deployment, i.e. django-ex-git. For the purposes of our test, reduce targetAverageUtilization from 50 to 2, so that the autoscaler adds pods, up to the maximum of 3 replicas, as soon as the average pod CPU usage exceeds just 2% of the requested CPU rather than 50%.
    Figure 3: Editing the HorizontalPodAutoscaler object

    Once you have saved your HPA, you should end up with an active autoscaler object. 
    Figure 4: The newly created HorizontalPodAutoscaler object

    When you check the Conditions of the autoscaler object, you should see the two conditions shown in Figure 5, both with a status of True. If you forgot to define the resources as shown in Figure 2, you will see a message under Reason saying that the metric values could not be read, instead of ValidMetricFound.
    Figure 5: Part of the HorizontalPodAutoscaler object display

  4. Load up the Pod to make it Autoscale

    Now it’s time to put some load on our Django pod so HPA has a chance to autoscale. To do this, you can run a simple request loop from your laptop or desktop PC.

    $ for((i=0;i<500;i++)) do curl --connect-timeout 3 'http://django-ex-git-<your project>.apps.<your domain>'; done;

    Now the magic happens. After a short wait, the deployment scales from 1 to 3 pods, and then back down to 1 some time after the load has been removed, i.e. after the 500 requests from the script have completed.
    Figure 6: The Django application has now been scaled to 3 pods by the autoscaler

    The built-in monitoring metrics for the pods (Figure 7) show the initial pod as the green area and the two new pods, which were spun up and later stopped again, as the two shades of blue. To reduce “thrashing” (the replica count flapping up and down as the load fluctuates), Kubernetes by default waits 5 minutes before scaling down; this cool-down is called the downscale stabilization window (see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/).

    Figure 7: In-built pod metrics showing the spin up / down of the extra two pods
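The 5-minute cool-down described in step 4 can be illustrated with a minimal shell sketch. It is not the controller's actual code: the helper name stabilized_replicas is hypothetical, and the sketch assumes the replica recommendations computed during the window are simply passed in as a list. When scaling down, the HPA acts on the highest recommendation it computed during the stabilization window, so pods are only removed once every recent recommendation is lower.

```shell
# Sketch of downscale stabilization: when scaling down, act on the highest
# replica recommendation computed during the window (5 minutes by default).
# stabilized_replicas is a hypothetical helper, not part of oc or kubectl.

# stabilized_replicas REC [REC ...]
# Prints the highest of the recommendations recorded in the window.
stabilized_replicas() {
  local highest=$1 rec
  shift
  for rec in "$@"; do
    if [ "$rec" -gt "$highest" ]; then
      highest=$rec
    fi
  done
  echo "$highest"
}

# Load has just stopped: the newest recommendations are 1, but a 3 is still
# inside the window, so the deployment stays at 3 pods.
stabilized_replicas 3 3 1 1 1
# Once only recommendations of 1 remain in the window, it scales down to 1.
stabilized_replicas 1 1 1 1 1
```

A short dip in load therefore does not remove pods straight away, while scaling up is not delayed by this window by default.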

Summary

After you create a horizontal pod autoscaler, OpenShift begins to query the CPU and/or memory resource metrics on the pods. When these metrics are available, the horizontal pod autoscaler computes the ratio of the current metric utilization to the desired metric utilization and scales up or down accordingly. Querying and scaling occur at a regular interval, but it can take one to two minutes before metrics become available.
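To make the ratio concrete, here is a minimal shell sketch of the calculation for a CPU target, following the formula given in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), clamped between minReplicas and maxReplicas. The function names and the example CPU figures are hypothetical, values are in millicores, and the controller's tolerance (10% by default) and missing-metric handling are left out. The first function also shows why the CPU request in step 2 matters: utilization is usage as a percentage of the request, so it cannot be computed for a pod without one.

```shell
# Sketch of the HPA calculation for a CPU utilization target.
# Assumptions: CPU values are in millicores; function names are hypothetical;
# tolerance, missing metrics and pod readiness are deliberately ignored.

# cpu_utilization_percent USAGE_M REQUEST_M
# Usage of a single pod as an integer percentage of its CPU request
# (rounded down). Without a request the utilization is undefined.
cpu_utilization_percent() {
  local usage_m=$1 request_m=${2:-0}
  if [ "$request_m" -le 0 ]; then
    echo "no CPU request set: utilization is undefined" >&2
    return 1
  fi
  echo $(( usage_m * 100 / request_m ))
}

# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# ceil(current * currentUtil / targetUtil), clamped to [MIN, MAX].
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling of (current * util / target).
  local desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Hypothetical example: one pod with a 200m request using 10m of CPU,
# HPA target 2%, minReplicas 1, maxReplicas 3.
util=$(cpu_utilization_percent 10 200)   # 5 (%)
desired_replicas 1 "$util" 2 1 3         # ceil(1*5/2) = 3
```

In this sketch, the 2% target from step 3 means that even a pod using 10m of its 200m request asks for the maximum of 3 replicas, whereas with the default 50% target the same load would keep a single pod.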

Kubernetes at Safe Swiss Cloud

Learn more about the Kubernetes/OpenShift distribution as implemented at Safe Swiss Cloud.

