This post is the third part of my Kubernetes for beginners series. In the first part, I introduced K8s and its basic components. In the second part, I discussed containers, Pods, and Deployments. This post will discuss Services (SVC) and the Horizontal Pod Autoscaler (HPA).
In this short post, I will discuss how to expose your applications to other workloads and how to scale them based on the load.
Exposing Deployments: Services
When you create a Deployment, you’ll get Pods running your app. So, start a local K8s cluster and deploy an app:
kubectl create deployment nginx --image=nginx
The above command will create a Deployment named nginx with a single Pod running the
nginx image. You can check the status of the Deployment and its Pods:
$ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           4m8s
The above line tells you that the nginx Deployment has one Pod running and that it is up to date.
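By the way, the imperative kubectl create deployment command has a declarative equivalent. Here is a minimal sketch of the manifest it generates (the real object has more defaults filled in), which you could save to a file and apply with kubectl apply -f:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx

The app: nginx label will matter later: it is how other resources, like Services, find these Pods.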
Now, let’s check Pods:
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-676b6c5bbc-qzxsh   1/1     Running   0          6m22s
Now, add one more Pod:
$ kubectl run busybox --image busybox --restart=Never --rm -it -- sh
I’m adding a Pod named busybox with the busybox image. I want to use this Pod
to test the connection to the nginx Pod. The above command creates a Pod, runs
a shell, and removes the Pod when you exit the shell. Before I test the connection,
open a new terminal and run the following:
$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          2m16s   10.42.0.63   lima-k3s   <none>           <none>
nginx-676b6c5bbc-qzxsh   1/1     Running   0          43m     10.42.0.60   lima-k3s   <none>           <none>
With the -o wide output option, you can see the IP addresses of the Pods. Notice that
each Pod has its own IP address. Go back to the first terminal (the busybox shell) and run:
$ wget -qO- http://10.42.0.60
You should get a response from the nginx Pod. It should be the default Nginx page.
Now let’s kill the nginx Pod (use the Pod name from your own kubectl get pods output):
$ kubectl delete pod nginx-676b6c5bbc-qzxsh
pod "nginx-676b6c5bbc-qzxsh" deleted
If you run the wget command again, you’ll get an error. The Pod is gone. Recheck Pods:
$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          8m49s   10.42.0.63   lima-k3s   <none>           <none>
nginx-676b6c5bbc-zbwl5   1/1     Running   0          2m4s    10.42.0.64   lima-k3s   <none>           <none>
Notice several things:
- The old Pod (nginx-676b6c5bbc-qzxsh) is gone. A new Pod (nginx-676b6c5bbc-zbwl5) replaced it.
- The new Pod has a different IP address.
- The Deployment ensures that the desired number of Pods is running. If a Pod dies, it creates a new one.
So, if I want to call nginx again, I need to know its new IP. I need something
to abstract the Pod IP addresses. For this purpose, I’ll use K8s Services. A K8s
Service is a K8s component (resource) for exposing one or more Pods. You can think
of it as a load balancer for your Pods. Let’s create a Service for the nginx Deployment:
$ kubectl expose deployment nginx --port 80
service/nginx exposed
Explore the Services:
$ kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP   5d2h
nginx        ClusterIP   10.43.38.173   <none>        80/TCP    30s
The above output shows two Services: kubernetes and nginx. The kubernetes
Service is created by default in every cluster and points to the K8s API server.
The nginx Service is the one I created. It is a ClusterIP Service: it has a
cluster-internal IP address, listens on port 80, and is accessible only from within
the cluster. There are other types of Services, like NodePort, LoadBalancer, and ExternalName.
But for now, I’ll stick with ClusterIP.
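For reference, the kubectl expose command above generates roughly the following Service. This is a minimal sketch; the selector matches the app: nginx label that kubectl create deployment put on the Pods:

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80

The selector is the key part: any running Pod with the app: nginx label becomes a backend of this Service.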
Now, try to access the nginx Pod using the Service:
$ wget -qO- http://10.43.38.173
Note: The IP address is the ClusterIP address of the nginx Service. You cannot just copy-paste the IP address from the output above. You need to use the IP address of the nginx Service you got from your local machine’s kubectl get svc command.
You should get the default Nginx page. The Service abstracts the Pod IP addresses.
Now, kill the nginx Pod again and try to access the Service:
$ kubectl delete pod nginx-676b6c5bbc-zbwl5
pod "nginx-676b6c5bbc-zbwl5" deleted
If you run the wget command again, you should still get the default Nginx page. When
I killed the Pod, K8s immediately created a new one, and the Service kept routing
the traffic to the available Pods. The Service is a stable endpoint for your application.
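You can see how this works under the hood: the Service continuously tracks the IPs of the Pods that match its selector in an Endpoints object. Your IPs will differ, but the output looks something like this:

$ kubectl get endpoints nginx
NAME    ENDPOINTS       AGE
nginx   10.42.0.64:80   10m

When a Pod dies and a replacement appears, the Endpoints list is updated, and the Service’s ClusterIP keeps working.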
Notice that if you delete and recreate the Service, it will get a new ClusterIP, and
again, you must use the new IP address to access it. The good news is that inside
the cluster, you can always use the Service’s name instead. For example, you can run
the following command from the busybox Pod:
$ wget -qO- http://nginx
This command will work even if you delete and recreate the Service. You can rely on the Service name as a stable endpoint for your application.
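The short name works here because the busybox Pod runs in the same namespace as the Service (default). Cluster DNS also assigns every Service a fully qualified name of the form <service>.<namespace>.svc.cluster.local (assuming the default cluster.local domain), which works from any namespace:

$ wget -qO- http://nginx.default.svc.cluster.local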
Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) is a K8s component that automatically scales the number of Pods in a Deployment based on observed metrics, such as CPU usage. The HPA is a powerful tool for managing the load on your application. Let’s see how it works.
First, I need the metrics-server running on my K8s cluster, since the HPA gets its metrics from it. Let’s check:
$ kubectl get deployments -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
default       nginx                    1/1     1            1           38h
kube-system   coredns                  1/1     1            1           6d15h
kube-system   local-path-provisioner   1/1     1            1           6d15h
kube-system   metrics-server           1/1     1            1           6d15h
kube-system   traefik                  1/1     1            1           6d15h
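Note that k3s ships metrics-server out of the box. On other clusters, you may need to install it yourself; the upstream manifest is the usual way (this applies the latest metrics-server release):

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml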
The metrics-server is running, so I can use the HPA. Now, let’s create
an HPA for the nginx Deployment:
$ kubectl autoscale deployment nginx --max 5 --min 2 --cpu-percent 20
horizontalpodautoscaler.autoscaling/nginx autoscaled
The above command creates an HPA for the nginx Deployment. Based on CPU usage,
the HPA will keep the number of Pods between 2 and 5: if the average CPU usage goes
above 20%, the HPA adds Pods; if it drops below 20%, the HPA removes them.
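As before, the imperative command has a declarative counterpart. A minimal sketch of the same HPA, using the autoscaling/v2 API:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20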
Let’s see the HPA:
$ kubectl get hpa
NAME    REFERENCE          TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: <unknown>/20%   2         5         2          56s
The above output shows the HPA for the nginx Deployment. The HPA targets CPU usage
and will scale the number of Pods between 2 and 5. The current number of Pods is 2.
The current CPU usage is <unknown> because the nginx Pods don’t define any CPU resource requests yet, so the HPA has nothing to measure the percentage against.
Before I show how the HPA works, I should put some resource constraints on the Pods. I’ll limit
the CPU usage of the nginx Pods by setting the CPU limit to 10m for the nginx container:
$ kubectl set resources deployment nginx --limits=cpu=10m
deployment.apps/nginx resources updated
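The command above patches the Pod template of the Deployment. The resulting container spec looks roughly like this. Note that because I set only a limit, K8s defaults the request to the same value, and the HPA’s utilization percentage is measured against that request:

containers:
- name: nginx
  image: nginx
  resources:
    limits:
      cpu: 10m
    requests:
      cpu: 10m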
In K8s terms, 10m means 10 millicores, i.e., one hundredth of a CPU core. Now, let’s
generate some load on the nginx Deployment (do this from the busybox Pod):
$ while true; do wget -q -O- http://nginx; done
Let’s observe the HPA:
$ kubectl get hpa --watch
NAME    REFERENCE          TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: 15%/20%   2         5         2          13m
nginx   Deployment/nginx   cpu: 35%/20%   2         5         2          13m
nginx   Deployment/nginx   cpu: 95%/20%   2         5         4          14m
nginx   Deployment/nginx   cpu: 90%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 77%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          14m
Notice that the HPA is scaling the number of Pods based on the CPU usage: when the CPU usage goes above 20%, the HPA adds more Pods.
Stop the load generation with Ctrl+C in the busybox Pod. The HPA will scale down
the number of Pods:
$ kubectl get hpa --watch
NAME    REFERENCE          TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: 90%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 77%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 78%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 70%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 50%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 6%/20%    2         5         5          16m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         5          16m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         5          20m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         2          21m
The HPA scaled down the number of Pods once the CPU usage dropped below 20%. As you can see, it took some time. The HPA has a cooldown period, also called a stabilization window: it waits to see that the usage is stable before acting, which prevents flapping (rapid scaling up and down). There are separate windows for scaling down and scaling up. The default cooldown for scaling down is 5 minutes; the default for scaling up is 0 minutes (immediate).
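The stabilization window is tunable via the behavior section of the autoscaling/v2 API. As a sketch, adding this to the HPA manifest above would make the nginx HPA scale down after 1 minute of stable low usage instead of 5:

spec:
  # ...scaleTargetRef, minReplicas, maxReplicas, and metrics as before...
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60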
Wrapping Up
In this post, I showed you some basic K8s Services and Horizontal Pod Autoscaler
concepts. I did not get into details. I just scratched the surface. I used a lot of
kubectl to interact with the K8s cluster. Knowing how to use kubectl is essential
for working with K8s. It helps you to deploy, manage, and scale your application.
You can use it to debug and troubleshoot your application, too. That is something
that every developer needs to know. In the next post, I’ll go deeper into kubectl
and show you how you can use it to get more information about your application in K8s.
Stay tuned.
