This post is the third part of my series on Kubernetes for beginners. In the first part, I introduced K8s and its basic components. In the second part, I discussed containers, Pods, and Deployments. This post covers Services (SVC) and the Horizontal Pod Autoscaler (HPA).
In this short post, I will discuss how to expose your applications with Services and how to scale them based on load.
## Exposing Deployments: Services
When you create a Deployment, you get Pods running your app. So, start a local K8s cluster and deploy an app:

```
kubectl create deployment nginx --image=nginx
```
The above command creates a Deployment named `nginx` with a single Pod running the `nginx` image. You can check the status of the Deployment and Pods:

```
$ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           4m8s
```
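If you prefer the declarative style, the imperative command above corresponds roughly to a manifest like the following. This is a minimal sketch; the `app: nginx` label is what `kubectl create deployment` generates by default:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
```

You would apply it with `kubectl apply -f deployment.yaml` and get the same result.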
The above line tells you that the Deployment `nginx` has one Pod running and is up-to-date. Now, let's check the Pods:

```
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-676b6c5bbc-qzxsh   1/1     Running   0          6m22s
```
Now, add one more Pod:

```
$ kubectl run busybox --image busybox --restart=Never --rm -it -- sh
```
I'm adding a Pod named `busybox` with the `busybox` image. I want to use this Pod to test the connection to the `nginx` Pod. The above command creates the Pod, runs a shell in it, and removes the Pod when you exit the shell. Before testing the connection, open a new terminal and run the following:
```
$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          2m16s   10.42.0.63   lima-k3s   <none>           <none>
nginx-676b6c5bbc-qzxsh   1/1     Running   0          43m     10.42.0.60   lima-k3s   <none>           <none>
```
With the `-o wide` output option, you can see the IP addresses of the Pods: each Pod has its own IP address. Go back to the first terminal (the `busybox` shell) and run:

```
$ wget -qO- http://10.42.0.60
```
You should get a response from the `nginx` Pod: the default Nginx welcome page.
Now, let's kill the `nginx` Pod:
```
$ kubectl delete pod nginx-676b6c5bbc-qzxsh
pod "nginx-676b6c5bbc-qzxsh" deleted
```
If you run the `wget` command again, you'll get an error because the Pod is gone. Recheck the Pods:
```
$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          8m49s   10.42.0.63   lima-k3s   <none>           <none>
nginx-676b6c5bbc-zbwl5   1/1     Running   0          2m4s    10.42.0.64   lima-k3s   <none>           <none>
```
Notice several things:
- The old Pod (`nginx-676b6c5bbc-qzxsh`) is gone; a new Pod (`nginx-676b6c5bbc-zbwl5`) has replaced it.
- The new Pod has a different IP address.
- The Deployment ensures that the desired number of Pods is running. If a Pod dies, it creates a new one.
So, if I want to call `nginx` again, I need to know its new IP. I need something to abstract away the Pod IP addresses. For this purpose, I'll use a K8s Service. A Service is a K8s component (resource) for exposing one or more Pods; you can think of it as a load balancer for your Pods. Let's create a Service for the `nginx` Deployment:

```
$ kubectl expose deployment nginx --port 80
service/nginx exposed
```
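As with the Deployment, `kubectl expose` has a rough declarative equivalent. This is a sketch; the selector assumes the `app: nginx` label that `kubectl create deployment` added:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
```

The `selector` is what ties the Service to the Pods: any Pod with the `app: nginx` label becomes an endpoint of the Service.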
Explore the Services:

```
$ kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP   5d2h
nginx        ClusterIP   10.43.38.173   <none>        80/TCP    30s
```
The above output shows two Services: `kubernetes` and `nginx`. The `kubernetes` Service is a default Service of the K8s cluster. The `nginx` Service is the one I created; it has a ClusterIP address and listens on port 80. It is a ClusterIP Service, which means it is accessible only from within the cluster. There are other types of Services, like NodePort, LoadBalancer, and ExternalName, but for now, I'll stick with ClusterIP.
Now, try to access the `nginx` Pod through the Service:

```
$ wget -qO- http://10.43.38.173
```
Note: The IP address above is the ClusterIP address of the `nginx` Service. You cannot just copy-paste it from this post; use the IP address of the `nginx` Service that `kubectl get svc` returns on your machine.
You should get the default Nginx page. The Service abstracts the Pod IP addresses.
Now, kill the `nginx` Pod and try to access the Service again:

```
$ kubectl delete pod nginx-676b6c5bbc-zbwl5
pod "nginx-676b6c5bbc-zbwl5" deleted
```
If you run the `wget` command again, you should still get the default Nginx page. When I killed the Pod, K8s immediately created a new one, and the Service kept routing traffic to the available Pods. The Service is a stable endpoint for your application.
Notice that if you delete and recreate the Service, it will get a new ClusterIP, and again you must use the new IP address to access it. The good news is that inside the cluster, you can always use the Service's name to access it. For example, you can run the following from the `busybox` Pod:

```
$ wget -qO- http://nginx
```

This command will work even if you delete and recreate the Service. You can rely on this: the Service name is a stable endpoint for your application.
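The name resolution is handled by the cluster DNS. Assuming the Service lives in the `default` namespace and the cluster uses the default `cluster.local` domain, all of these forms resolve to the same Service:

```
http://nginx
http://nginx.default
http://nginx.default.svc.cluster.local
```

The short name only works from Pods in the same namespace; from another namespace, use one of the longer forms.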
## Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) is a K8s component that automatically scales the number of Pods in a Deployment based on observed metrics. The HPA is a powerful tool for managing the load on your application. Let's see how it works.
First, I need the metrics server running on my K8s cluster. Let's check:

```
$ kubectl get deployments -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
default       nginx                    1/1     1            1           38h
kube-system   coredns                  1/1     1            1           6d15h
kube-system   local-path-provisioner   1/1     1            1           6d15h
kube-system   metrics-server           1/1     1            1           6d15h
kube-system   traefik                  1/1     1            1           6d15h
```
The `metrics-server` is running, which means I can use the HPA. Now, let's create an HPA for the `nginx` Deployment:

```
$ kubectl autoscale deployment nginx --max 5 --min 2 --cpu-percent 20
horizontalpodautoscaler.autoscaling/nginx autoscaled
```
The above command creates an HPA for the `nginx` Deployment. Based on CPU usage, the HPA will scale the number of Pods between 2 and 5: if the average CPU usage is above 20%, the HPA adds Pods; if it is below 20%, the HPA removes Pods.
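As before, `kubectl autoscale` corresponds roughly to a declarative manifest. This sketch uses the `autoscaling/v2` API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
```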
Let's see the HPA:

```
$ kubectl get hpa
NAME    REFERENCE          TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: <unknown>/20%   2         5         2          56s
```
The above output shows the HPA for the `nginx` Deployment. The HPA targets CPU usage and will scale the number of Pods between 2 and 5; the current number of Pods is 2. The current CPU usage is unknown because I have not set any CPU constraints on the Pods yet.
Before I show how the HPA works, I should put some constraints on the Pods. I'll set the CPU limit of the `nginx` container to 10m:

```
$ kubectl set resources deployment nginx --limits=cpu=10m
deployment.apps/nginx resources updated
```
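In the Deployment manifest, this corresponds to a `resources` section on the container, a sketch of which looks like this. Note that when only a limit is set, K8s defaults the request to the same value, and the HPA's utilization percentage is measured against the request:

```yaml
spec:
  template:
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            cpu: 10m
```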
In K8s terms, `10m` means 10 millicores, i.e., 1% of a single CPU core. Now, let's generate some load on the `nginx` Deployment (do this from the `busybox` Pod):

```
$ while true; do wget -q -O- http://nginx; done
```
Let's observe the HPA:

```
$ kubectl get hpa --watch
NAME    REFERENCE          TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: 15%/20%   2         5         2          13m
nginx   Deployment/nginx   cpu: 35%/20%   2         5         2          13m
nginx   Deployment/nginx   cpu: 95%/20%   2         5         4          14m
nginx   Deployment/nginx   cpu: 90%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 77%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          14m
```
Notice that the HPA scales the number of Pods based on the CPU usage: when the usage goes above 20%, the HPA adds more Pods.
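The scaling decision itself follows a simple documented formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. A small Python sketch of that calculation, ignoring the HPA's tolerance band and rate limits:

```python
from math import ceil

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Core HPA formula: ceil(current * current/target), clamped to bounds."""
    desired = ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# At 95% measured vs. a 20% target with 2 replicas, the raw result is
# ceil(2 * 95 / 20) = 10, which gets clamped to the max of 5.
print(desired_replicas(2, 95, 20, 2, 5))  # 5
```

This matches what the watch output shows: once the usage far exceeds the target, the HPA jumps straight to the maximum of 5 replicas.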
Stop the load generation by pressing Ctrl+C in the `busybox` Pod. The HPA will scale down the number of Pods:
```
$ kubectl get hpa --watch
NAME    REFERENCE          TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   cpu: 90%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 77%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          14m
nginx   Deployment/nginx   cpu: 78%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 74%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 70%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 50%/20%   2         5         5          15m
nginx   Deployment/nginx   cpu: 6%/20%    2         5         5          16m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         5          16m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         5          20m
nginx   Deployment/nginx   cpu: 0%/20%    2         5         2          21m
```
The HPA scaled the Pods back down once the CPU usage dropped below 20%. As you can see, it took some time: the HPA has a cooldown period, also called a stabilization window, and waits to confirm that the usage is stable before removing Pods. This prevents flapping. There are separate windows for scaling down and scaling up; by default, scale-down waits 5 minutes, while scale-up is immediate.
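The stabilization windows are tunable through the `behavior` field of the `autoscaling/v2` API. A sketch showing the default values (in seconds):

```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # default: wait 5 minutes before scaling down
    scaleUp:
      stabilizationWindowSeconds: 0    # default: scale up immediately
```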
## Wrapping Up
In this post, I showed you some basic K8s Services and Horizontal Pod Autoscaler concepts. I did not get into details; I just scratched the surface. I used `kubectl` a lot to interact with the K8s cluster. Knowing how to use `kubectl` is essential for working with K8s: it helps you deploy, manage, and scale your applications, and you can use it to debug and troubleshoot them, too. That is something every developer needs to know. In the next post, I'll go deeper into `kubectl` and show you how to use it to get more information about your applications in K8s. Stay tuned.