Version: v0.11 🚧

Using KusionStack Operating to operate Pods gracefully

Applications typically expose their services through traffic routing. On Kubernetes, this usually means a set of Pods and a corresponding Kubernetes Service resource that exposes them.

However, during operations such as updating Pod revisions, there is a risk that client request traffic may be lost. This can lead to a poor user experience for developers.

This tutorial demonstrates how to operate Pods gracefully with KusionStack Operating on Aliyun ACK, using SLB as the Service backend provider.

The same scenario is also covered in this video, which uses both KusionStack Kusion and Operating. The sample used in the video can be found in the KusionStack Catalog.

Preparing

First, ensure that you have an Aliyun ACK Kubernetes cluster set up in order to provision an Aliyun SLB.

Next, install KusionStack Operating on this Kubernetes cluster by following the installation doc.

Get started

Create a new namespace

To begin, create a new namespace for this tutorial:

$ kubectl create ns operating-tutorial

Provision Pods and Services

You can create a set of Pods that run a demo application service by creating a CollaSet resource with the following command:

echo '
apiVersion: apps.kusionstack.io/v1alpha1
kind: CollaSet
metadata:
  name: server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - image: wu8685/echo:1.3
        name: server
        command:
        - /server
        resources:
          limits:
            cpu: "0.1"
            ephemeral-storage: 100Mi
            memory: 100Mi
          requests:
            cpu: "0.1"
            ephemeral-storage: 100Mi
            memory: 100Mi
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 3
' | kubectl -n operating-tutorial apply -f -

There should be 3 Pods created.

$ kubectl -n operating-tutorial get pod
NAME READY STATUS RESTARTS AGE
server-c5lsr 1/1 Running 0 2m23s
server-p6wrx 1/1 Running 0 2m23s
server-zn62c 1/1 Running 0 2m23s
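
Before moving on, it can help to confirm that every server Pod reports ready. The sketch below counts ready server Pods from the table that kubectl get pod prints. The helper name count_ready_servers is hypothetical and not part of KusionStack Operating, and it assumes the default column layout shown above.

```shell
# Hypothetical helper: count server Pods whose READY column is fully ready
# (for example 1/1). Reads `kubectl get pod` table output from stdin.
count_ready_servers() {
  awk 'NR > 1 && $1 ~ /^server-/ {
         split($2, ready, "/")
         if (ready[1] == ready[2]) n++
       }
       END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl -n operating-tutorial get pod | count_ready_servers
```

With the listing above as input, the count should equal the replica count of the CollaSet.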

Then create a Kubernetes Service by running the following command, which provisions an Aliyun SLB to expose the service.

echo '
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: slb.s1.small
    service.beta.kubernetes.io/backend-type: eni
  labels:
    kusionstack.io/control: "true" # this label is required
  name: server
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: server
  type: LoadBalancer
' | kubectl -n operating-tutorial apply -f -

A Service with an external IP should be provisioned.

$ kubectl -n operating-tutorial get svc server
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
server LoadBalancer 192.168.225.55 47.101.49.182 80:30146/TCP 51s

The label kusionstack.io/control: "true" on the Service is essential. It makes the Service recognizable to the ResourceConsist framework, which then participates in PodOpsLifecycle: it directs the Aliyun SLB to switch traffic off before each Pod is updated and back on after the update finishes, protecting the service.
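
The external IP is needed in the next section, where the client manifest contains an <EXTERNAL_IP> placeholder. As a minimal sketch, the address can be read with kubectl's jsonpath output and substituted with sed. The function names and the file name client.yaml are hypothetical, and the jsonpath assumes the load balancer reports an IP address (rather than a hostname) in the Service status.

```shell
# Read the external IP of the "server" Service from the cluster.
# Requires kubectl access; defined here but not called automatically.
fetch_external_ip() {
  kubectl -n operating-tutorial get svc server \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Hypothetical helper: fill the <EXTERNAL_IP> placeholder in a manifest
# read from stdin with the address given as the first argument.
fill_external_ip() {
  sed "s|<EXTERNAL_IP>|$1|g"
}

# Usage (client.yaml is an assumed file holding the client manifest):
#   fill_external_ip "$(fetch_external_ip)" < client.yaml \
#     | kubectl -n operating-tutorial apply -f -
```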

Provision a client

Next, provision a client to access the service created above. Replace <EXTERNAL_IP> in the following CollaSet YAML with the external IP of the Kubernetes Service, then apply it.

echo '
apiVersion: apps.kusionstack.io/v1alpha1
kind: CollaSet
metadata:
  name: client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
      - image: wu8685/echo:1.3
        name: nginx
        command:
        - /client
        args:
        - -url
        - http://<EXTERNAL_IP>/echo # EXTERNAL_IP should be replaced
        - -m
        - POST
        - d
        - operating-tutorial
        - -qps
        - "10"
        - -worker
        - "10"
        - -timeout
        - "10000"
        resources:
          limits:
            cpu: "0.1"
            ephemeral-storage: 1Gi
            memory: 100Mi
          requests:
            cpu: "0.1"
            ephemeral-storage: 1Gi
            memory: 100Mi
' | kubectl -n operating-tutorial apply -f -

A client Pod should be created.

$ kubectl -n operating-tutorial get pod
NAME READY STATUS RESTARTS AGE
client-nc426 1/1 Running 0 30s
server-c5lsr 1/1 Running 0 19m
server-p6wrx 1/1 Running 0 19m
server-zn62c 1/1 Running 0 19m

This client will continuously access the service using the configuration provided in the command. You can monitor the response codes from its logs:

$ kubectl -n operating-tutorial logs -f client-nc426
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0

All requests succeed.
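
Rather than scanning the log by eye, the failure counters can be totaled. The sketch below sums the "failed: N" values from the summary lines in the format shown above. The helper name count_failed is hypothetical, and it assumes the client keeps printing summary lines in exactly this form.

```shell
# Hypothetical helper: sum the "failed: N" counters from client log lines
# of the form "worker-0 another loop, request: 50, failed: 0".
# Lines that do not contain "another loop" are ignored.
count_failed() {
  awk -F'failed: ' '/another loop/ { sum += $2 } END { print sum + 0 }'
}

# Usage (Pod name taken from the listing above; yours will differ):
#   kubectl -n operating-tutorial logs client-nc426 | count_failed
```

A result of 0 means no summary line reported a failed request.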

Update Pod revision

To trigger a Pod revision update, run the following command, which changes the container image and command in the PodTemplate of the CollaSet:

echo '
apiVersion: apps.kusionstack.io/v1alpha1
kind: CollaSet
metadata:
  name: server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - image: wu8685/echo:1.2
        name: server
        command:
        - /app/echo
        resources:
          limits:
            cpu: "0.1"
            ephemeral-storage: 100Mi
            memory: 100Mi
          requests:
            cpu: "0.1"
            ephemeral-storage: 100Mi
            memory: 100Mi
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 3
' | kubectl -n operating-tutorial apply -f -

This updates all Pods simultaneously, leaving the application with no Pod available to serve requests. The resulting errors appear in the client logs:

worker-1 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": read tcp 10.244.1.11:54040->47.101.49.182:80: read: connection reset by peer
worker-0 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": read tcp 10.244.1.11:34438->47.101.49.182:80: read: connection reset by peer
worker-1 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
worker-0 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
worker-1 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
worker-1 another loop, request: 20, failed: 3
worker-0 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
worker-0 another loop, request: 20, failed: 3
worker-1 fails to request POST http://47.101.49.182/echo : Post "http://47.101.49.182/echo": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Provision PodTransitionRule

To avoid this problem, provision a PodTransitionRule with a maxUnavailable 50% rule by running the following command:

echo '
apiVersion: apps.kusionstack.io/v1alpha1
kind: PodTransitionRule
metadata:
  name: server
spec:
  rules:
  - availablePolicy:
      maxUnavailableValue: 50%
    name: maxUnavailable
  selector:
    matchLabels:
      app: server
' | kubectl -n operating-tutorial apply -f -

After updating the CollaSet of the server again to trigger an update, you will see the Pods updated in batches rather than all at once, so that at least one Pod is always available to serve.

$ kubectl -n operating-tutorial get pod
NAME READY STATUS RESTARTS AGE
client-rrfbj 1/1 Running 0 25s
server-457sn 0/1 Running 0 5s
server-bd5sz 0/1 Running 0 5s
server-l842s 1/1 Running 0 2m4s
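
In this listing, two of the three server Pods are not ready at the same moment. One plausible reading, stated here as an assumption rather than as documented PodTransitionRule behavior, is that 50% of 3 replicas is rounded up to 2. The arithmetic can be sketched with a hypothetical helper:

```shell
# Hypothetical helper: number of Pods allowed to be unavailable for a given
# replica count and percentage, ASSUMING the result is rounded up. The
# rounding direction is inferred from the listing above, not taken from
# the Operating documentation.
max_unavailable() {
  replicas=$1
  percent=$2
  echo $(( (replicas * percent + 99) / 100 ))
}

max_unavailable 3 50   # prints 2
```

If the controller rounded down instead, only one Pod would be updated at a time, so the observed Pod listing is the authoritative guide.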

You can see from the client logs that no access requests fail during this update.

worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-1 another loop, request: 50, failed: 0
worker-0 another loop, request: 50, failed: 0

Clean tutorial namespace

At the end of this tutorial, you can clean up the resources by deleting the namespace:

$ kubectl delete ns operating-tutorial

Comparison with the Native Approach

Kubernetes provides preStop and postStart hooks for each container, through which users can also interact with services outside Kubernetes, such as Aliyun SLB. However, KusionStack Operating offers several advantages:

  • Pod level vs Container level

Operating offers Pod-level hooks, which have more complete information than the hooks of a single container, especially when a Pod has several containers.

  • Pluggable

Through KusionStack Operating, you can decouple operations executed before or after Pods actually change. For example, traffic control can be added or removed without modifying the Pod's preStop configuration.

  • Rollback option

In case of issues, rollback becomes a viable option when using the Operating approach to update Pods. Since Operating does not modify the Pods or their containers during the update, if the traffic service experiences problems, there is an opportunity to cancel the update.