This post is the third part of the series about Kubernetes for beginners. In the first part, I introduced K8s and its basic components. In the second part, I discussed containers, Pods, and Deployments. This post will discuss Services (SVC) and the Horizontal Pod Autoscaler (HPA).


In this short post, I will discuss how to expose your Pods behind a stable endpoint and how to scale your applications based on load.

Exposing Deployments: Services

When you create a Deployment, you’ll get Pods running your app. So, start a local K8s cluster and deploy an app:

kubectl create deployment nginx --image=nginx

The above command will create a Deployment named nginx with a single Pod running the nginx image. You can check the status of the Deployment and its Pods:

$ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           4m8s

The above line tells you that the nginx Deployment has one Pod running and up-to-date. Now, let’s check the Pods:

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-676b6c5bbc-qzxsh   1/1     Running   0          6m22s

Now, add one more Pod:

$ kubectl run busybox --image busybox --restart=Never --rm -it -- sh

I’m adding a Pod named busybox with the busybox image. I’ll use this Pod to test the connection to the nginx Pod. The above command creates the Pod, runs a shell in it, and removes the Pod when you exit the shell. Before testing the connection, open a new terminal and run the following:

$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          2m16s   10.42.0.63   lima-k3s   <none>           <none>
nginx-676b6c5bbc-qzxsh   1/1     Running   0          43m     10.42.0.60   lima-k3s   <none>           <none>

With the -o wide output option, you can see that each Pod has its own IP address (your IPs will differ; use the nginx Pod’s IP from your own output). Go back to the first terminal (the busybox shell) and run:

$ wget -qO- http://10.42.0.60

You should get a response from the nginx Pod: the default Nginx welcome page. Now, from the second terminal, let’s kill the nginx Pod:

$ kubectl delete pod nginx-676b6c5bbc-qzxsh
pod "nginx-676b6c5bbc-qzxsh" deleted

If you run the wget command again, you’ll get an error: the Pod is gone. Recheck the Pods:

$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          8m49s   10.42.0.63   lima-k3s   <none>           <none>
nginx-676b6c5bbc-zbwl5   1/1     Running   0          2m4s    10.42.0.64   lima-k3s   <none>           <none>

Notice several things:

  • The old Pod (nginx-676b6c5bbc-qzxsh) is gone; a new Pod (nginx-676b6c5bbc-zbwl5) has replaced it.
  • The new Pod has a different IP address.
  • The Deployment ensures that the desired number of Pods is running. If a Pod dies, it creates a replacement through its ReplicaSet (more on that below).
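
The hash in the Pod names (676b6c5bbc) comes from the ReplicaSet that the Deployment manages under the hood:

$ kubectl get replicasets

The NAME column should show a ReplicaSet named nginx-676b6c5bbc (your hash will differ), matching the prefix of the Pod names.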

So, if I want to call nginx again, I need to know its new IP. I need something that abstracts away the Pod IP addresses. For this purpose, I’ll use a K8s Service. A Service is a K8s resource that exposes one or more Pods behind a stable address; you can think of it as a load balancer for your Pods. Let’s create a Service for the nginx Deployment:

$ kubectl expose deployment nginx --port 80
service/nginx exposed

Explore the Services:

$ kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP   5d2h
nginx        ClusterIP   10.43.38.173   <none>        80/TCP    30s

The above output shows two Services: kubernetes and nginx. The kubernetes Service is the default Service of the cluster itself. The nginx Service is the one I just created: it has a ClusterIP address and listens on port 80. ClusterIP is the default Service type, and it means the Service is accessible only from within the cluster. There are other Service types, such as NodePort, LoadBalancer, and ExternalName, but for now, I’ll stick with ClusterIP.
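
Under the hood, kubectl expose generated a Service object for me. A minimal sketch of what it roughly looks like (the selector matches the app=nginx label that kubectl create deployment put on the Pods):

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80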

Now, try to access the nginx Pod through the Service (again from the busybox shell):

$ wget -qO- http://10.43.38.173

Note: The IP address is the ClusterIP of the nginx Service. You cannot just copy-paste the IP address from the output above; use the ClusterIP you got from your own kubectl get svc output.

You should get the default Nginx page: the Service abstracts away the Pod IP addresses. Now, kill the nginx Pod again and try to access the Service:

$ kubectl delete pod nginx-676b6c5bbc-zbwl5
pod "nginx-676b6c5bbc-zbwl5" deleted

If you run the wget command again, you should still get the default Nginx page. When I killed the Pod, K8s immediately created a new one, and the Service kept routing traffic to the available Pods. The Service is a stable endpoint for your application.
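
K8s keeps track of the Pods behind a Service through an Endpoints object. You can check which Pod IPs the Service currently routes to:

$ kubectl get endpoints nginx

The ENDPOINTS column lists the IP:port pairs of the Pods backing the Service; after a Pod is replaced, its successor’s IP shows up here.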

Notice that if you delete and recreate the Service, it will get a new ClusterIP, and again, you must use the new IP to access it. The good news is that, inside the cluster, you can always reach the Service by its name, thanks to the cluster’s internal DNS. For example, you can run the following command from the busybox Pod:

$ wget -qO- http://nginx

This command works even if you delete and recreate the Service: the Service name is a stable endpoint for your application.
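
The short name works because busybox runs in the same namespace (default). The fully qualified name includes the namespace and the cluster domain (cluster.local by default), so this works from any namespace:

$ wget -qO- http://nginx.default.svc.cluster.local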

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) is a K8s resource that automatically scales the number of Pods in a Deployment based on observed metrics, such as CPU usage. The HPA is a powerful tool for managing the load on your application. Let’s see how it works.

First, I need the metrics-server running on my K8s cluster, because the HPA reads CPU usage from the metrics API. Let’s check:

$ kubectl get deployments -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
default       nginx                    1/1     1            1           38h
kube-system   coredns                  1/1     1            1           6d15h
kube-system   local-path-provisioner   1/1     1            1           6d15h
kube-system   metrics-server           1/1     1            1           6d15h
kube-system   traefik                  1/1     1            1           6d15h

The metrics-server is running (k3s ships it by default), which means I can use the HPA. You can sanity-check it with kubectl top, which queries the same metrics API the HPA uses:
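
$ kubectl top pods

If the command prints CPU and memory usage for your Pods, the metrics pipeline works. Now, let’s create an HPA for the nginx Deployment: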

$ kubectl autoscale deployment nginx --max 5 --min 2 --cpu-percent 20
horizontalpodautoscaler.autoscaling/nginx autoscaled

The above command creates an HPA for the nginx Deployment. The HPA will keep between 2 and 5 Pods, scaling on average CPU utilization: if it goes above 20%, the HPA adds Pods; if it drops below, the HPA removes them.
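
For reference, this is roughly the equivalent manifest in the autoscaling/v2 API:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20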

Let’s see the HPA:

$ kubectl get hpa
NAME    REFERENCE          TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: <unknown>/20%   2         5         2          56s

The above output shows the HPA for the nginx Deployment. The HPA targets CPU usage and will scale the number of Pods between 2 and 5. The current number of Pods is 2.

The current CPU usage is <unknown>. The reason is that the HPA computes the utilization percentage against the Pods’ CPU requests, and my Pods don’t have any. Before I show how the HPA works, I need to put resource constraints on the Pods. I’ll set the CPU limit of the nginx container to 10m (when only a limit is set, K8s defaults the request to the same value):

$ kubectl set resources deployment nginx --limits=cpu=10m
deployment.apps/nginx resources updated
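
You can verify the change with kubectl get deployment nginx -o yaml; the container spec should now contain something like:

resources:
  limits:
    cpu: 10m
  requests:
    cpu: 10m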

In K8s terms, 10m means 10 millicores, i.e., one hundredth of a CPU core. With such a tiny request, 20% utilization corresponds to just 2 millicores of average usage, so even light load will trip the threshold. Now, let’s generate some load on the nginx Deployment (do this from the busybox Pod):

$ while true; do wget -q -O- http://nginx; done

Let’s observe the HPA:

$ kubectl get hpa --watch
NAME    REFERENCE          TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: 15%/20%   2         5         2          13m
nginx   Deployment/nginx   cpu: 35%/20%   2         5         2          13m
nginx   Deployment/nginx   cpu: 95%/20%   2         5         4          14m
nginx   Deployment/nginx   cpu: 90%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 77%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          14m

Notice that the HPA is scaling the number of Pods based on the CPU usage: when the utilization rises above the 20% target, the HPA adds more Pods.

Stop the load generation by pressing Ctrl+C in the busybox Pod. The HPA will scale down the number of Pods:

$ kubectl get hpa --watch
NAME    REFERENCE          TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: 90%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 77%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 78%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 70%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 50%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 6%/20%    2         5         5          16m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         5          16m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         5          20m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         2          21m

The HPA scaled the Pods back down, but as you can see, it took some time. The HPA has a stabilization window (a cooldown) to prevent flapping: before scaling down, it waits to see whether the lower usage is stable. There are separate windows for scaling down and scaling up; by default, scale-down waits 5 minutes, while scale-up is immediate.
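
If the defaults don’t fit your workload, the autoscaling/v2 API lets you tune them through the behavior field of the HPA spec. A minimal sketch that shortens the scale-down window to 60 seconds:

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60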

Wrapping Up

In this post, I showed you the basics of K8s Services and the Horizontal Pod Autoscaler. I did not get into details; I just scratched the surface. I used kubectl a lot to interact with the K8s cluster. Knowing how to use kubectl is essential for working with K8s: it helps you deploy, manage, and scale your applications, and you can use it to debug and troubleshoot them, too. That is something every developer needs to know. In the next post, I’ll go deeper into kubectl and show you how to use it to get more information about your application in K8s. Stay tuned.