Posts

Airgapped Kubernetes Cluster with containerd

After evaluating several local Kubernetes solutions I encountered repeated manual steps (downloading bootstrap images, pulling images from registry mirrors and retagging them, and loading images into clusters) that were time-consuming and error-prone. I decided to set up a properly air-gapped Kubernetes cluster using kubeadm and containerd, leveraging containerd’s registry mirror support. This post documents the steps I followed. Prerequisites This guide assumes a Debian/Ubuntu host. Installing the latest Docker Engine will also provide containerd as a dependency.

Posts

Setup kind cluster using cilium cni in wsl2

Prologue One or two years ago, I tried several times to install a kind cluster using the Cilium CNI, but I never got it working. Today I really wanted to set one up after reading Kind cluster with Cilium and no kube-proxy, and considering that major Kubernetes distributions now use the Cilium CNI. After about 3 hours, I finally got it running. Things didn’t go smoothly. Here are the steps I took to set it up.

Posts

mount configmap as volume with ini files

Today I spent about 1 hour on the task Deploy Lamp Stack on Kubernetes Cluster - Task. The task deepened my understanding of ini file handling and volume mounting with subPath. I think 2 tricky parts make the task interesting. First, the similarity between --from-literal=variables_order=EGPCS and variables_order = "EGPCS" in php.ini tempts one to create the configmap from a literal at first. However, that does not work: the actual volumeMount will mount each key as a file
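A minimal sketch of the "each key becomes a file" behaviour, modelled in plain Python rather than run against a cluster. The ConfigMap name php-config and the mount path are hypothetical, chosen only for illustration.

```python
import json


def configmap_manifest(name, files):
    """Build a ConfigMap manifest; each key in `files` becomes a data key."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


def mounted_files(manifest, mount_path):
    """Model a plain volume mount of the ConfigMap: one file per data key."""
    base = mount_path.rstrip("/")
    return {f"{base}/{key}": value for key, value in manifest["data"].items()}


# Hypothetical names: "php-config" and the mount path are illustrative only.
MOUNT_PATH = "/usr/local/etc/php"

# --from-literal=variables_order=EGPCS yields a key named variables_order.
literal_style = configmap_manifest("php-config", {"variables_order": "EGPCS"})

# A key named php.ini whose value is the ini text yields a php.ini file.
file_style = configmap_manifest(
    "php-config", {"php.ini": 'variables_order = "EGPCS"\n'}
)

if __name__ == "__main__":
    print(mounted_files(literal_style, MOUNT_PATH))
    print(mounted_files(file_style, MOUNT_PATH))
    print(json.dumps(file_style, indent=2))
```

With the literal style, the mount contains a file called variables_order holding only EGPCS, which is not an ini file. With the file style, the mount contains php.ini with the expected line.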

Posts

My journey to CKA and CKS

Today I got certified as a Certified Kubernetes Security Specialist (CKS). It marks a milestone in my mastery of Kubernetes after a long journey. It is a good time to reflect on the lessons I learnt. My experience with Kubernetes dates from 2018. I have a habit of comparing other options before making my decisions. Before 2018, I used vagrant and other configuration management tools (puppet, chef, ansible) to set up my lab environments.

Posts

CKS/CKA tips

The fastest way to get nodes Many times, I ran "kubectl get nodes" to find the nodes I needed to connect to in order to make changes. However, the question description section of each task already lists the master nodes and worker nodes. I think reading them from there is the fastest way to get nodes, and I wasted some time getting that info from commands. Get events kubectl now has an events command. It has intuitive options and needs fewer keystrokes.

Posts

My tilt asciinema cast

Today I read the Agones site and noticed that it plays an asciinema cast on its home page. I wondered if I could put my own asciinema cast into my blog. Here is mine. The cast is about how to use tilt for local microservice development based on Kubernetes.

Posts

Resolve the error that The Pod xxx is invalid: spec.containers[0].volumeMounts[1].name: Not found

I encountered this issue yesterday during an exercise in the killer CKA simulator, and then I made the same mistake again. I think it is worth writing down so that I avoid repeating it. In my day-to-day work, I usually add volumes before containers in the spec section of pods. In exams, I sometimes need to add extra volumes to an existing pod manifest extracted from a running pod, and I still add volumes in the old way.
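The error in the title means that a volumeMounts entry refers to a name that is not in the pod's spec.volumes. Below is a minimal sketch of that check in Python, operating on a pod spec already loaded as a dict. The message format only imitates the API server's wording, and the volume names are hypothetical.

```python
def missing_volume_names(pod_spec):
    """Return one message per volumeMount whose name is absent from spec.volumes."""
    declared = {volume["name"] for volume in pod_spec.get("volumes", [])}
    problems = []
    for c_idx, container in enumerate(pod_spec.get("containers", [])):
        for m_idx, mount in enumerate(container.get("volumeMounts", [])):
            if mount["name"] not in declared:
                problems.append(
                    f"spec.containers[{c_idx}].volumeMounts[{m_idx}].name: "
                    f'Not found: "{mount["name"]}"'
                )
    return problems


# Hypothetical pod spec: "data" is mounted but never declared under volumes.
SAMPLE_SPEC = {
    "containers": [
        {
            "name": "app",
            "volumeMounts": [
                {"name": "kube-api-access", "mountPath": "/var/run/secrets"},
                {"name": "data", "mountPath": "/data"},
            ],
        }
    ],
    "volumes": [{"name": "kube-api-access"}],
}

if __name__ == "__main__":
    for problem in missing_volume_names(SAMPLE_SPEC):
        print(problem)
```

Running a check like this on the manifest before kubectl apply shows which mount has no matching volume in the spec that was actually parsed.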

Posts

Kubernetes beyond 5k nodes etcd-sharding

Background I was once asked how to run large-scale Kubernetes clusters given etcd’s size limitation, for example how to handle the 8GB limit. I was not satisfied with my answers, such as increasing memory, compaction and defragmentation, as I knew that several companies ran large-scale clusters and that 8GB might be a hard limit. However, I didn’t have any clue how they made that happen. Since then, I have kept tabs on etcd by reading etcd community meeting (Public).

Posts

Exploring Dapr-Introduction, Experiences, and Live Demo

I was very excited to be invited to give a talk about Dapr. Below is my talk. My demo in the session was not successful, as things in the Kubernetes and Dapr field are changing very fast and my lab environment didn’t work as expected. I finally made it work today, so I share it below. My talk on 18th July starts from time 2235 in the video below. Prerequisites kubectl, kind, docker, dapr, git, vscode in WSL

Posts

Leader election in Kubernetes control plane

Leader election in Kubernetes control plane - #HeptioProTip gives a way to find the leaders of the Kubernetes control plane components kube-scheduler and kube-controller-manager. However, it was based on an old version of Kubernetes, and the mechanism changed in later versions. Today I was asked how to find the leaders. Here is the new way to find them. $ kubectl -n kube-system get lease NAME HOLDER AGE kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a_b187371d-e48c-4216-8228-707a0ecf6100 2m57s kube-apiserver-dz2dqprdpsgnm756t5rnov7yka kube-apiserver-dz2dqprdpsgnm756t5rnov7yka_0b531f66-0c31-453c-9277-a6c1aa81da94 86s kube-apiserver-fyloo45sdenffw2ugwaz3likua kube-apiserver-fyloo45sdenffw2ugwaz3likua_3e322f9a-9724-4e3a-9fc6-a512e9424164 2m11s kube-controller-manager kind-control-plane_bec39b96-87c4-4bce-8775-2eeb4eb4c1e8 2m53s kube-scheduler kind-control-plane_db6f36d8-ceaa-40eb-b821-75f8ae829f22 2m53s $ kubectl -n kube-system get pods -l component=kube-controller-manager,tier=control-plane \ -o custom-columns=NAME:.
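A minimal sketch of reading the leaders out of that lease table, assuming the HOLDER value has the form hostname, underscore, unique id, as it does in the output above. The sample text reuses two rows from that output. The function names are my own and not part of any Kubernetes tooling.

```python
SAMPLE = """\
NAME                      HOLDER                                                    AGE
kube-controller-manager   kind-control-plane_bec39b96-87c4-4bce-8775-2eeb4eb4c1e8   2m53s
kube-scheduler            kind-control-plane_db6f36d8-ceaa-40eb-b821-75f8ae829f22   2m53s
"""


def leader_node(holder_identity):
    """Strip the trailing "_<id>" suffix; return the input unchanged if absent."""
    node, _, _ = holder_identity.rpartition("_")
    return node or holder_identity


def leaders(table, components=("kube-controller-manager", "kube-scheduler")):
    """Map each component lease in the table to the node that holds it."""
    result = {}
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2 or fields[0] not in components:
            continue
        result[fields[0]] = leader_node(fields[1])
    return result


if __name__ == "__main__":
    for component, node in leaders(SAMPLE).items():
        print(f"{component}: {node}")
```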

Notes

kubernetes On-premise

Setting up Kubernetes correctly is a big topic. I will keep updating this note whenever I find something new or something I missed.

Posts

What's pod sandbox

One explanation describes the pod sandbox as the abstraction that replaces the "pause" container that is used to keep namespaces open in every Kubernetes pod today. I had doubts about that. What’s the point of introducing a new concept? Yesterday I went down the rabbit hole to understand it. This morning I finally got the hang of it. The Kubernetes blog says it is an environment. It may be a VM or a group of containers.

Notes

Kubelet architecture

In the past I learnt a lot about Kubernetes; however, there is not much information about the kubelet, or what exists is not deep enough. I couldn’t answer the questions about the kubelet correctly in an interview. Here I put all the information I collected from the internet for my reference. kubelet component architecture how does kubelet work Container Lifecycle Management Through the CRI kubelet, CRI and CNI sequence diagram kubelet, CRI and CNI interaction diagram The process of creating a pod Handler The work of podWorkers syncPod kubelet Runtime Creating a sandbox for a pod References: https://www.

Notes

kubernetes metrics options

Components | Description | Metrics | Scaler
Metrics Server | Metrics Server collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API for use by Horizontal Pod Autoscaler and Vertical Pod Autoscaler. Metrics API can also be accessed by kubectl top, making it easier to debug autoscaling pipelines. Metrics Server is not meant for non-autoscaling purposes | CPU/Memory

Posts

Playbook: etcd debugging

etcd debugging flowchart. I copied the flowchart from “Stories from the Playbook” for easy reference and put it here to make it searchable on my site. flowchart TD oversized{MVCC DB oversized}-->|Yes|logIntoContainer(log into container) logIntoContainer --> checkSize(check size of db) checkSize --> compatOrDefrag(compact or defrag) compatOrDefrag --> resizeDisk(resize machine disk) resizeDisk --> triggerRepair(Trigger repair) triggerRepair --> END oversized -->|No|crashLoop{crash looping} crashLoop -->|Yes| moreTime2Init(allow etcd more time to init) moreTime2Init --> upgradeVersion(upgrade version) upgradeVersion --> resizeDisk crashLoop -->|No| leaderElectionIssue{Leader Election Issue?

Posts

Lessons learnt after two years usage of cert-manager

Yesterday I spent one or two hours resolving a pending order issue. I had encountered the issue before when I configured cert-manager with ACME. However, after a short detour into Flutter (iOS, Android) development, I couldn’t quickly locate the root cause. This made me think that I had better note down the lessons I learnt here. DNS01 vs HTTP01 HTTP01 is quite easy to set up for one domain name.

Posts

Do we need a microservice framework

Today I was asked which framework I used in my design of microservice systems. I answered that the usual microservice concerns are handled natively by Kubernetes when my systems are deployed into Kubernetes clusters, so I do not use those frameworks in my design of microservices. The question sparked several questions in my mind. Here they are. Why do we still need microservice frameworks when Kubernetes is the de facto platform now?

Posts

Kubernetes metric options

In a typical Kubernetes metrics setup, you will find cAdvisor, Metrics Server, Kubernetes API Server, Node Exporter, and Kube-State-Metrics as sources of metrics. In the last 2 days, I noticed prometheus-adapter. Today KEDA popped into my mind. I checked its website and found that it can be a source of metrics as well. cAdvisor, Metrics Server, Kubernetes API Server, Node Exporter, and Kube-State-Metrics can be considered in-cluster sources of metrics, whereas KEDA is about sources of metrics outside the cluster.

Posts

Shift-lefts in kubernetes with datree

Shift-left is a thing nowadays. Recently I read an article about applying it beyond testing and security. Today I gave it a try using datree and found that there is still a lot to improve in one of my clusters. It is not just a small number. There may be many new concepts, configurations and best practices behind the numbers. kubectl datree test > test.log rg -n ❌ test.

Posts

Setup k8s monitoring

The Kubernetes dashboard didn’t give enough information about nodes and the cluster during a recent load test, so I looked for other options. Prometheus and Grafana are the de facto standards. It’s a no-brainer choice. The most important thing is how to make them work together. Setup Prometheus and Grafana kubectl create namespace monitoring helm repo add prometheus-community https://prometheus-community.github.io/helm-charts helm repo update helm install prometheus prometheus-community/prometheus -n monitoring helm repo add grafana https://grafana.github.io/helm-charts helm install grafana grafana/grafana -n monitoring kubectl get secret --namespace monitoring grafana -o jsonpath="{.

Posts

Use cert-manager to secure kubernetes dashboard behind nginx ingress

Today I had a case to expose several Kubernetes dashboards with cert-manager. Initially I thought it should be quite easy to set up, but the reality was quite different. My initial YAML is as follows. apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: ingress-global-dash namespace: kubernetes-dashboard labels: name: ingress-global-dash use-http01-solver: "true" annotations: cert-manager.io/cluster-issuer: "test-issuer" spec: ingressClassName: nginx rules: - host: "dashboard.example.com" http: paths: - pathType: Prefix path: "/" backend: service: name: kubernetes-dashboard port: number: 80 #later changed to 443 according to port of kubernetes-dashboard svc tls: # < placing a host in the TLS config will determine what ends up in the cert's subjectAltNames - hosts: - dashboard.

Posts

Istio troubleshooting in a new scenario: exposing one service on multiple domains and multiple ingress gateways

Since I first configured Istio with proxy protocol support in AWS, Istio and Envoy proxy have changed a lot. In the past several days, I was exposed to a different scenario in which one service is exposed on multiple domains. Things were not straightforward, and I struggled to make it work yesterday. Today another layer of complexity was added to my cluster: an extra ingressgateway for some services. For multiple ingress gateways, I followed this article; however, my setup is more complex than that.

Posts

istio virtual service with tls - Connection reset by peer

I got errors similar to the following when setting up my Istio clusters. Mark bundle as not supporting multiuse 301 istio 301 or 404 error:02FFF036:system library:func(4095):Connection reset by peer * Trying 20.190.14.28:443... * TCP_NODELAY set * Connected to kiali.example.com (20.190.14.28) port 443 (#0) * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /home/ng/anaconda3/ssl/cacert.pem CApath: none * TLSv1.3 (OUT), TLS handshake, Client hello (1): * OpenSSL SSL_connect: Connection reset by peer in connection to kiali.

Posts

Reflection on kubernetes usage

Today I explained the current infrastructure fleet configurations to our DevOps team, which prompted me to reflect on the lessons I learnt in the past and to think about what I should do next. The technologies and tools I used in my projects are as follows. There are still a lot of things to learn, considering the items in my list and the huge ecosystem of Kubernetes.

Posts

Resize volumes when PVCs and PVs are okay and the size of file systems in pods doesn't change

Here is an issue with aws-ebs-csi-driver: The size of file system doesn’t change when pvc is expanded. I hit the same issue when I tried to do the Curl elk in pods to delete indices this afternoon. I also got the message "resize2fs 1.44.5 (15-Dec-2018) open: No such file or directory while opening /dev/nvme1n1" when I tried to resize the file system /dev/nvme1n1 in my pod. The issue is about the CSI driver, yet the driver is not in the result of running the command "kubectl get csidriver" on my cluster.

Posts

Curl elk in pods to delete indices

Today my staging Kibana didn’t show logs. I decided to work out a solution the hard way this time. I don’t want to be in the same situation again without a solution. When things go wrong, you can’t log in to Kibana to do management or maintenance work. The remaining option is to manage the data from the command line. In the past I figured out how to use the curl CLI in a pod to get some information about ELK.

Posts

same device mounted on different mount points

In my previous article, I gave the following information about my pod. I still had some time before bed, and I couldn’t help looking for the reason behind it. /usr/share/nginx/html # df -h Filesystem Size Used Available Use% Mounted on overlay 80.0G 34.5G 45.5G 43% / tmpfs 64.0M 0 64.0M 0% /dev tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup /dev/nvme0n1p1 80.0G 34.5G 45.5G 43% /dev/termination-log /dev/nvme0n1p1 80.0G 34.5G 45.5G 43% /etc/resolv.

Posts

Resize Pod volumes in eks

I have resized Kubernetes volumes in the past; however, I encountered an interesting issue when I did the resizing in a different way. According to the doc, I should only change the requested size in the pvc. Today I changed the size of the pv first, then the pvc. Here is the interesting thing: everything about the pv and pvc was fine, but the size of the file system in the pod did not change.

Posts

Get back my missed keypair of EKS

Today I needed to scale one of my Kubernetes clusters. The key pairs are not on my new laptop, as I now use a Mac Air M1. I didn’t see any increase in nodes several minutes after I ran the eksctl scale command. I logged into the AWS console and found several "Failed" messages in the activity history of the autoscaling group. All the failed messages showed "Launching a new EC2 instance. Status Reason: The aaaa-nodegroup-ng-1-67:8e:b8:8e:33:83:93:68 key pair does not exist.

Posts

Istio Envoy passthrough goes wrong when port 80 is used for SMTP protocol instead of standard ports

I wrote this on September 7, 2021 and published it on LinkedIn. However, I found that it can be hard to search for there, so I put another copy here. TLDR: if your external SMTP is using port 80 instead of standard ports in an istio mesh, create a Service Entry for the external SMTP. Over these two days, a strange timeout issue happened in one of our Kubernetes clusters when trying to send emails via SMTP, even though the same configuration works perfectly on our development machines.

Posts

Kubernetes and immutable infrastructure: docker image digest and image labels

Summary: use digest as the way to refer to docker images in kubernetes resources, put commit id in image labels. Idempotent, immutable infrastructure has a slew of benefits. I am a firm believer in it, and I did my best to keep several projects that way. In the past few weeks, I helped a friend to sort out the system structure, performance issues and development experience of one of his projects.
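A minimal sketch of checking the "refer to images by digest" rule, assuming a pod spec already loaded as a dict. The image names and the digest value are made up for illustration, and the helper names are my own.

```python
import re

# A digest reference ends in "@sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image):
    """True when the image reference is pinned by a sha256 digest."""
    return bool(DIGEST_RE.search(image))


def unpinned_images(pod_spec):
    """List images in containers and initContainers that rely on a mutable tag."""
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [c["image"] for c in containers if not is_pinned(c["image"])]


# Hypothetical spec: the digest below is a placeholder, not a real image digest.
FAKE_DIGEST = "sha256:" + "ab" * 32
SAMPLE_SPEC = {
    "containers": [
        {"name": "web", "image": f"registry.example.com/web@{FAKE_DIGEST}"},
        {"name": "sidecar", "image": "registry.example.com/sidecar:latest"},
    ]
}

if __name__ == "__main__":
    print(unpinned_images(SAMPLE_SPEC))
```

A tag such as latest can later point to different content, while a digest always names the same image bytes. That is why the check treats only digest references as pinned.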

Posts

A strange issue of kubernetes: New pod kept in CrashloopBackOff in newly joined node.

Today I encountered a strange issue with one Kubernetes cluster. New pods were not being created on a newly joined node. There were no errors in the logs of the pods or the node, nor in the Kubernetes events. Everything seemed normal. I created a simple nginx pod with nodeName set to the problematic node. The pod could be scheduled and created on the node. However, the pod stayed in the CrashLoopBackOff state. There was definitely something wrong with the server.

Posts

Setup vscode development for dapr in WSL

Dapr development is quite difficult to set up correctly. You can get the idea from (this issue), which has remained open since it was created. I followed the instructions in dapr to install the Dapr VS Code extension. My environment is WSL based. I tried several times to debug the Dapr applications and failed. I noticed in the following screenshot that there was a warning icon before kubernetes.

Posts

Install calico cni in alicloud ack and advanced dns troubleshooting in kubernetes

2 weeks ago I started to learn alicloud to prepare the migration from aws to alicloud. I started the migration this week. As our applications are deployed on Kubernetes, I focused on ACK first. The setup journey was bumpy, but most of the issues could be resolved by googling and trial and error. The most difficult one was a DNS resolution issue. I tried the steps in "Debugging DNS Resolution" [1]. All steps were good except the nslookup step.

Posts

Exposing TCP and UDP services in nginx ingress

Add command line flags to the ingress controller: --tcp-services-configmap --udp-services-configmap Create a configmap apiVersion: v1 kind: ConfigMap metadata: name: tcp-services namespace: ingress-nginx data: 5432: "default/postgres:5432" Patch the ingress-nginx-controller to allow port 5432 spec: template: spec: containers: - name: controller ports: - containerPort: 5432 hostPort: 5432 Add an inbound rule to the security group of the node groups Reference: https://minikube.
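A minimal sketch of how an entry in the tcp-services configmap reads, assuming the value format documented by ingress-nginx: namespace/service name, then the service port, then optional PROXY flags, separated by colons. The function name and the returned field names are my own.

```python
def parse_tcp_service(port, target):
    """Split a tcp-services entry into its parts.

    Assumed value format: <namespace>/<service>:<service port>[:PROXY[:PROXY]]
    """
    namespaced_name, service_port, *flags = target.split(":")
    namespace, _, service = namespaced_name.partition("/")
    if not namespace or not service:
        raise ValueError(f"expected <namespace>/<service>, got {namespaced_name!r}")
    return {
        "listen_port": int(port),      # port opened on the ingress controller
        "namespace": namespace,
        "service": service,
        "service_port": service_port,  # kept as text; it may be a port name
        "proxy_flags": flags,
    }


if __name__ == "__main__":
    # The entry from the configmap above: 5432: "default/postgres:5432"
    print(parse_tcp_service("5432", "default/postgres:5432"))
```

The key is the port the controller listens on, and the value names the backend service. The same listen port is the one that the controller patch and the security group rule need to allow.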

Posts

log to stdin of a pod in nested shells

While practicing the lifecycle handlers of pods, I found it quite difficult to get the logs written by preStop handlers. Initially I thought terminationMessagePath might be the answer, but no such luck. The preStop handler runs before the pod is deleted. After the pod is deleted, there is no way to get the logs unless they are kept somewhere else, such as a central log server. Such a setup is way too complex and time-consuming for a simple practice.

Posts

vscode debug python containers running kubernetes via attachment

python-guestbook at https://github.com/GoogleCloudPlatform/cloud-code-samples is used to practice debugging Python containers in Kubernetes. Google cloud python code vscode plugin 1.7.0 failed to run Python on Kubernetes, or there is no obvious way to do that, as depicted in the picture below. Steps to make it work: run skaffold debug Import ptvsd in the python file to debug, add breakout() before the line to debug

Posts

Jetbrain debug containers in kubernetes

I couldn’t debug the dockerdev app by following the steps in https://blog.jetbrains.com/go/2020/05/11/using-kubernetes-from-goland/, even after I cloned its source code https://github.com/jackliusr/dockerdev/tree/kubernetes-debug. I read several articles before I tried the above one. Debugging finally succeeded after several rounds of trial and error. I added skaffold.yaml to the project and installed Cloud Code. skaffold.yaml can be found at https://github.com/jackliusr/dockerdev/tree/kubernetes-debug. Other JetBrains configuration can be seen in the pictures below. Cloud Code is a very good tool.

Posts

Setup kubeflow pipeline on local KIND cluster

# https://www.kubeflow.org/docs/pipelines/installation/localcluster-deployment/#deploying-kubeflow-pipelines
# env/platform-agnostic-pns hasn't been publicly released, so we install from master temporarily
export PIPELINE_VERSION=1.0.1
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"
# expose gui
kubectl port-forward -n kubeflow --address 0.0.0.0 svc/ml-pipeline-ui 8080:80

Posts

Create a multi-node KIND kubernetes cluster

KIND supports configuration items, and you can find them at https://kind.sigs.k8s.io/docs/user/configuration. The following is the outline of the doc. Cluster-Wide Options: Networking (IP Family; API Server: port and listen address; Pod Subnet; Service Subnet; Disable Default CNI: can be used to try other CNI for CKA and CKAD networks; kube-proxy mode: iptables, ipvs); Nodes: role control-plane, worker. Per-Node Options: Extra Mounts; Extra Port Mappings; Kubeadm Config Patches: kubeadm InitConfiguration and JoinConfiguration. My asciicast is as follows:

Posts

Pod CrashLoopBackOff Reason

Today one of my deployments kept getting CrashLoopBackOff. Searching the internet didn’t give me the root cause of the errors, even after I went through all the suggested steps. The steps found online are all about describe, logs, livenessProbe etc. I noticed the reason and exit code in the last state when I described one of the pods. They should be the focal point for finding the root cause of CrashLoopBackOff. Containers: nginx: Container ID: containerd://9570e7e67d83692fdbe0e0871919a81222137fcaee2eaecb1eff6b772ec805b1 Image: nginx:1.
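A minimal sketch of reading the exit code from the last state, assuming the common Unix convention that an exit code above 128 means the process was ended by signal number (code - 128). The signal table covers only a few common cases, and the wording of the results is my own, not kubectl output.

```python
# Small, non-exhaustive table of signals that often show up in container exits.
COMMON_SIGNALS = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit(exit_code):
    """Give a rough first reading of a container's exit code."""
    if exit_code == 0:
        return "exited cleanly"
    if exit_code > 128:
        signal_number = exit_code - 128
        name = COMMON_SIGNALS.get(signal_number, f"signal {signal_number}")
        return f"killed by {name}"
    return f"application error (exit code {exit_code})"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, describe_exit(code))
```

An exit code of 137 therefore points at a SIGKILL, which is worth cross-checking against the reason field of the last state, while a small code such as 1 points back at the application's own logs.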