
Module 15 — End-to-End Verification

You have built a Kubernetes cluster from scratch — certificates, etcd, control plane, worker nodes, networking, DNS, load balancing, and a deployed application. This final module runs comprehensive verification tests to confirm everything works together.

Each section tests a different cluster capability. Run all tests from your Mac using the kubectl context configured in Module 13.


1. Cluster Component Health

1.1 Node status

kubectl get nodes -o wide

Expected:

NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP      ...
worker1   Ready    <none>   1h    v1.31.0   192.168.56.23    ...
worker2   Ready    <none>   1h    v1.31.0   192.168.56.24    ...

Both nodes are Ready with the correct IPs and Kubernetes version.

1.2 Component status

kubectl get componentstatuses

Expected:

NAME                 STATUS    MESSAGE   ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   ok
etcd-1               Healthy   ok
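Note that kubectl get componentstatuses has been deprecated since Kubernetes v1.19 and may print a deprecation warning. A hedged alternative is the API server's aggregated health endpoints; the sketch below guards the call so it degrades gracefully when the cluster is unreachable:

```shell
# componentstatuses is deprecated since v1.19; /readyz (and /livez) are the
# supported health endpoints. The fallback keeps this safe to run anywhere.
READYZ=$(kubectl get --raw='/readyz?verbose' 2>/dev/null || echo "unreachable")
echo "$READYZ"
```

On a healthy cluster the verbose output lists each check and should end with "readyz check passed".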

1.3 System pods

kubectl get pods -n kube-system

Expected: CoreDNS pods are Running (2 replicas).

1.4 etcd health

From a control plane node:

ssh cp1 "sudo ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://192.168.56.21:2379,https://192.168.56.22:2379 \
--cacert=/etc/etcd/ca.pem \
--cert=/etc/etcd/etcd.pem \
--key=/etc/etcd/etcd-key.pem"

Expected: Both endpoints show is healthy: successfully committed proposal.
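For a richer view, etcdctl endpoint status prints a per-member table including DB size, leader flag, and raft term. A sketch, with a connect timeout so it fails fast if cp1 is unreachable:

```shell
# Per-member status table: member ID, DB size, leader flag, raft term/index.
STATUS=$(ssh -o ConnectTimeout=5 cp1 "sudo ETCDCTL_API=3 etcdctl endpoint status \
  --endpoints=https://192.168.56.21:2379,https://192.168.56.22:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/etcd.pem \
  --key=/etc/etcd/etcd-key.pem \
  --write-out=table" 2>/dev/null || echo "cp1 unreachable")
echo "$STATUS"
```

Exactly one member should show IS LEADER = true.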

Checkpoint: All cluster components are healthy — nodes Ready, control plane Healthy, etcd healthy, CoreDNS running.


2. Application Verification

2.1 Pod status

kubectl get pods -n customerapp -o wide

Expected: All pods (postgres, backend x2, nginx) are Running and distributed across workers.

2.2 Health endpoint

curl -s http://192.168.56.23:30080/health

Expected: Health response from the backend (e.g., {"status":"ok"}).

2.3 Test through both workers

The NodePort Service is accessible on every worker node:

curl -s http://192.168.56.23:30080/health
curl -s http://192.168.56.24:30080/health

Both should return the same response.

2.4 Login test

curl -s -X POST http://192.168.56.23:30080/login \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"admin123"}'

Expected: A response with a session token or success message.

2.5 CRUD test

Create a customer:

curl -s -X POST http://192.168.56.23:30080/customers \
-H "Content-Type: application/json" \
-d '{"name":"Verification Test","email":"verify@test.com"}'

List customers:

curl -s http://192.168.56.23:30080/customers

The created customer should appear in the list.
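The two steps above can be combined into a small scripted check. This is a sketch: it assumes the /customers listing contains the email address verbatim, and it is guarded so it reports failure instead of aborting when the app is unreachable.

```shell
# Create a customer, then check the listing for its email address.
BASE=http://192.168.56.23:30080
curl -s --max-time 5 -X POST "$BASE/customers" \
  -H "Content-Type: application/json" \
  -d '{"name":"Verification Test","email":"verify@test.com"}' >/dev/null || true
LIST=$(curl -s --max-time 5 "$BASE/customers" 2>/dev/null || echo "")
case "$LIST" in
  *verify@test.com*) echo "customer found" ;;
  *)                 echo "customer not found (or app unreachable)" ;;
esac
```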

Checkpoint: The application is fully functional — health checks pass, login works, CRUD operations succeed through both worker node IPs.


3. DNS Verification

3.1 Cluster service resolution

kubectl run dns-verify --image=busybox:1.36 --restart=Never --rm -it \
-- nslookup kubernetes.default

Expected: Resolves to 10.32.0.1 (the API server's ClusterIP).

3.2 Application service resolution

kubectl run dns-verify --image=busybox:1.36 --restart=Never --rm -it \
-n customerapp \
-- nslookup postgres.customerapp.svc.cluster.local

Expected: Resolves to the ClusterIP of the postgres Service.

3.3 Cross-namespace resolution

kubectl run dns-verify --image=busybox:1.36 --restart=Never --rm -it \
-- nslookup kube-dns.kube-system.svc.cluster.local

Expected: Resolves to 10.32.0.10 (the CoreDNS Service IP).

3.4 External DNS resolution

kubectl run dns-verify --image=busybox:1.36 --restart=Never --rm -it \
-- nslookup google.com

Expected: Resolves to a public IP address.
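All four lookups depend on the pod's /etc/resolv.conf, which the kubelet writes from its clusterDNS setting. If any lookup fails, inspecting that file is the first debugging step (a sketch, guarded for unreachable clusters):

```shell
# Show the DNS configuration a pod actually receives. Expect a
# "nameserver 10.32.0.10" line plus search domains such as
# default.svc.cluster.local, svc.cluster.local, and cluster.local.
RESOLV=$(kubectl run dns-verify --image=busybox:1.36 --restart=Never --rm -i --quiet \
  -- cat /etc/resolv.conf 2>/dev/null || echo "cluster unreachable")
echo "$RESOLV"
```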

Checkpoint: DNS works for cluster services, cross-namespace lookups, and external domains.


4. Scaling

4.1 Scale the backend to 3 replicas

kubectl scale deployment backend -n customerapp --replicas=3

4.2 Watch pods being created

kubectl get pods -n customerapp -l app=backend -o wide -w

Press Ctrl+C after all 3 pods are Running.

Expected: The scheduler distributes pods across worker1 and worker2. You should see pods on both nodes.

4.3 Verify all replicas serve traffic

for i in 1 2 3 4 5; do
curl -s http://192.168.56.23:30080/health
echo
done

All requests should succeed. The Service load-balances across all 3 backend pods.
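kube-proxy balances the Service across whatever the Endpoints object lists. A hedged way to confirm all three replicas are wired in (this assumes the Service is named backend, matching the Deployment):

```shell
# List the pod IPs behind the backend Service; expect three while scaled to 3.
EP=$(kubectl get endpoints backend -n customerapp \
  -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null || echo "")
echo "backend endpoints: ${EP:-none (or cluster unreachable)}"
```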

4.4 Scale back to 2 replicas

kubectl scale deployment backend -n customerapp --replicas=2

Verify one pod is terminated:

kubectl get pods -n customerapp -l app=backend

Expected: 2 pods in Running status.

Checkpoint: Scaling up creates new pods across nodes. Scaling down terminates excess pods gracefully.


5. Rolling Update

5.1 Build and push a v2 image

On the app-server (192.168.56.12) where the registry and source code live:

ssh app-server

Make a small change to the application (e.g., update the health endpoint response or version string), rebuild, and push:

cd ~/customerapp
# Make a small visible change (add a version to the health endpoint, for example)

docker build -t 192.168.56.12:5000/customerapp:v2 .
docker push 192.168.56.12:5000/customerapp:v2

Tip: If you do not want to modify the app code, you can simply retag and push the same image as v2. The rollout will still replace all pods:

docker tag 192.168.56.12:5000/customerapp:v1 192.168.56.12:5000/customerapp:v2
docker push 192.168.56.12:5000/customerapp:v2

5.2 Trigger the rolling update

From your Mac:

kubectl set image deployment/backend -n customerapp \
backend=192.168.56.12:5000/customerapp:v2

5.3 Watch the rollout

kubectl rollout status deployment/backend -n customerapp

Expected:

Waiting for deployment "backend" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "backend" rollout to finish: 1 old replicas are pending termination...
deployment "backend" successfully rolled out

During the rollout, Kubernetes creates new pods with v2 and terminates old v1 pods one at a time — ensuring zero downtime.
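One hedged way to see the zero-downtime behavior for yourself: run a short probe loop in a second terminal while the rollout proceeds. Any response other than HTTP 200 is flagged as FAIL.

```shell
# Probe the health endpoint repeatedly; a service gap would show as FAIL.
classify() { [ "$1" = "200" ] && echo "OK" || echo "FAIL (HTTP $1)"; }
for i in $(seq 1 10); do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 1 \
    http://192.168.56.23:30080/health 2>/dev/null || true)
  classify "${code:-000}"
  sleep 0.5
done
```

During a healthy rolling update every line should read OK.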

5.4 Verify the new version

kubectl get pods -n customerapp -l app=backend -o jsonpath='{.items[*].spec.containers[0].image}'
echo

Expected: All pods run 192.168.56.12:5000/customerapp:v2.

5.5 Test the app still works

curl -s http://192.168.56.23:30080/health

5.6 View rollout history

kubectl rollout history deployment/backend -n customerapp

5.7 Rollback (optional)

If the new version has issues, roll back to the previous version:

kubectl rollout undo deployment/backend -n customerapp

Verify pods are back to v1:

kubectl get pods -n customerapp -l app=backend -o jsonpath='{.items[*].spec.containers[0].image}'
echo

Checkpoint: Rolling update replaces pods without downtime. Rollback restores the previous version.


6. Node Failure Simulation

6.1 Check current pod distribution

kubectl get pods -n customerapp -o wide

Note which pods are on worker1.

6.2 Drain worker1

Draining a node evicts all pods and marks the node as unschedulable:

kubectl drain worker1 --ignore-daemonsets --delete-emptydir-data --force
  • --ignore-daemonsets — do not evict DaemonSet pods (there are none in this setup, but it is good practice)
  • --delete-emptydir-data — allow eviction of pods with emptyDir volumes
  • --force — evict pods not managed by a controller (standalone pods)

6.3 Watch pods reschedule

kubectl get pods -n customerapp -o wide

Expected: Pods that were on worker1 are now rescheduled to worker2. The exception is PostgreSQL — it is pinned to worker1 via nodeName and will be in Pending state.

Note: The PostgreSQL pod uses nodeName: worker1, so it cannot be rescheduled to worker2. In production you would use StatefulSets with distributed storage to handle this. For this training, the brief PostgreSQL downtime demonstrates why stateful workloads need special consideration.

6.4 Verify the app still works (partially)

curl -s http://192.168.56.24:30080/health

The backend and nginx should still work (they are on worker2). Database-dependent operations may fail until PostgreSQL is back.

6.5 Uncordon worker1

kubectl uncordon worker1

This marks worker1 as schedulable again. Existing pods do NOT automatically move back — only new pods will consider worker1 for scheduling.
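If you want the backend replicas spread across both workers again, a common approach (a sketch, not required for this module) is to restart the Deployment so the scheduler places fresh pods with worker1 back in the pool:

```shell
# Recreate the backend pods; the scheduler now considers worker1 again.
OUT=$(kubectl rollout restart deployment/backend -n customerapp 2>/dev/null \
  || echo "cluster unreachable")
echo "$OUT"
kubectl rollout status deployment/backend -n customerapp 2>/dev/null || true
```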

6.6 Verify PostgreSQL recovers

The PostgreSQL pod should start on worker1 once it is uncordoned:

kubectl get pods -n customerapp -l app=postgres -o wide -w

Wait until it shows Running.

6.7 Verify full app functionality

curl -s http://192.168.56.23:30080/health
curl -s http://192.168.56.23:30080/customers

Everything should be working again.

Checkpoint: Draining a node evicts pods to the remaining node. Uncordoning restores the node for scheduling.


7. Secret Encryption at Rest

In Module 08 you created an encryption config for encrypting Secrets in etcd. Verify it works.

7.1 Create a test secret

kubectl create secret generic test-encryption \
-n customerapp \
--from-literal=secret-data="this-should-be-encrypted"

7.2 Read the secret from etcd directly

From a control plane node (cp1):

ssh cp1 "sudo ETCDCTL_API=3 etcdctl get /registry/secrets/customerapp/test-encryption \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ca.pem \
--cert=/etc/etcd/etcd.pem \
--key=/etc/etcd/etcd-key.pem \
--hex"

7.3 Verify encryption

The output should contain hex data. Look for the k8s:enc:aescbc:v1:key1 prefix in the raw value. This confirms the Secret is encrypted using AES-CBC with the key you generated in Module 08.

If the data were unencrypted, you would see the raw this-should-be-encrypted string in plain text. The encrypted data looks like random bytes.
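Scanning hex output by eye is error-prone. A hedged one-liner that fetches the raw stored value and checks it for the encryption prefix (guarded with a connect timeout for when cp1 is unreachable):

```shell
# The k8s:enc:aescbc:v1:key1 prefix is plain ASCII at the start of the
# stored value, so a simple pattern match detects it without decoding hex.
VALUE=$(ssh -o ConnectTimeout=5 cp1 "sudo ETCDCTL_API=3 etcdctl get \
  /registry/secrets/customerapp/test-encryption \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/etcd.pem \
  --key=/etc/etcd/etcd-key.pem \
  --print-value-only" 2>/dev/null || true)
case "$VALUE" in
  *k8s:enc:aescbc:v1*) echo "secret is encrypted at rest" ;;
  *)                   echo "prefix not found (or cp1 unreachable)" ;;
esac
```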

7.4 Verify Kubernetes can still read it

kubectl get secret test-encryption -n customerapp -o jsonpath='{.data.secret-data}' | base64 -d
echo

Expected: this-should-be-encrypted

Kubernetes transparently decrypts the data when reading through the API server.

7.5 Clean up

kubectl delete secret test-encryption -n customerapp

Checkpoint: Secrets are encrypted at rest in etcd. The API server transparently encrypts and decrypts data.


8. Cluster Health Report

Create a script that runs all verification checks and prints a summary. This is useful for quick cluster validation at any time.

8.1 Create the script

On your Mac:

cat > ~/k8s-cluster/cluster-health.sh <<'SCRIPT'
#!/bin/bash
set -euo pipefail

echo "============================================"
echo " Kubernetes Cluster Health Report"
echo " $(date)"
echo "============================================"
echo

# Nodes
echo "--- Nodes ---"
kubectl get nodes -o wide
echo

# Component status
echo "--- Component Status ---"
kubectl get componentstatuses 2>/dev/null || echo "(componentstatuses API deprecated)"
echo

# System pods
echo "--- System Pods ---"
kubectl get pods -n kube-system -o wide
echo

# Application pods
echo "--- Application Pods (customerapp) ---"
kubectl get pods -n customerapp -o wide
echo

# Services
echo "--- Services (customerapp) ---"
kubectl get svc -n customerapp
echo

# App health check
echo "--- Application Health ---"
HEALTH=$(curl -s --max-time 5 http://192.168.56.23:30080/health 2>/dev/null || echo "UNREACHABLE")
echo " Worker1 (192.168.56.23:30080): ${HEALTH}"
HEALTH=$(curl -s --max-time 5 http://192.168.56.24:30080/health 2>/dev/null || echo "UNREACHABLE")
echo " Worker2 (192.168.56.24:30080): ${HEALTH}"
echo

# DNS check
echo "--- DNS Check ---"
kubectl run dns-check --image=busybox:1.36 --restart=Never --rm -it --quiet \
-- nslookup kubernetes.default 2>/dev/null || echo " DNS check failed"
echo

# Cluster info
echo "--- Cluster Info ---"
kubectl cluster-info
echo

echo "============================================"
echo " Health report complete."
echo "============================================"
SCRIPT

chmod +x ~/k8s-cluster/cluster-health.sh

8.2 Run the report

~/k8s-cluster/cluster-health.sh

Review the output. All sections should show healthy components, running pods, and successful health checks.

Checkpoint: The health report script runs and shows all-green status.


9. What You Have Now — Full Cluster Summary

Congratulations. You have built a complete Kubernetes cluster from scratch. Here is everything you created across Modules 06–15:

Module                             What you built
06 — Cluster VMs                   5 VMs with static IPs and SSH access
07 — Certificate Authority & TLS   CA + 10 certificate pairs for all components
08 — Kubeconfig Files              6 kubeconfigs + encryption config distributed to nodes
09 — etcd Cluster                  2-node etcd cluster with peer/client TLS
10 — Control Plane                 kube-apiserver, controller-manager, scheduler on cp1/cp2
11 — Worker Nodes                  containerd, kubelet, kube-proxy on worker1/worker2
12 — CNI Networking                Bridge plugin + static routes for cross-node pod networking
13 — CoreDNS & HAProxy             Cluster DNS + API server load balancing
14 — Deploy App                    PostgreSQL + Go backend + Nginx on Kubernetes
15 — Verification                  Scaling, rolling updates, failover, encryption at rest

Cluster capabilities verified

Capability                                              Status
Multi-node control plane with leader election           Verified
Worker node registration and scheduling                 Verified
Cross-node pod networking                               Verified
Service discovery via CoreDNS                           Verified
API server high availability via HAProxy                Verified
Application deployment with Deployments and Services    Verified
Horizontal scaling                                      Verified
Rolling updates with zero downtime                      Verified
Node failure resilience (drain/uncordon)                Verified
Secret encryption at rest                               Verified
Persistent storage with PersistentVolumes               Verified
Private registry with authentication                    Verified

10. What's Next

You have completed the Kubernetes The Hard Way track. Here are areas to explore next:

Cluster management:

  • Helm — package manager for Kubernetes. Deploy complex applications with a single helm install command.
  • Ingress controllers — replace NodePort with proper HTTP routing (Nginx Ingress, Traefik, or Envoy-based controllers).
  • cert-manager — automate TLS certificate management with Let's Encrypt.

Observability:

  • Prometheus + Grafana — metrics collection and dashboarding for cluster and application monitoring.
  • Loki — log aggregation. Centralize logs from all pods into a single queryable store.
  • OpenTelemetry — distributed tracing for understanding request flows across services.

Networking:

  • Calico or Cilium — replace the basic bridge CNI with a production-grade CNI that supports network policies, BGP, and eBPF.
  • Network Policies — restrict pod-to-pod traffic based on labels (zero-trust networking).

Security:

  • Pod Security Standards — enforce security baselines (no privileged containers, no host networking, read-only root filesystem).
  • OPA/Gatekeeper — policy engine for validating Kubernetes resources before they are created.
  • Falco — runtime threat detection for containers.

Storage:

  • Longhorn or Rook-Ceph — distributed storage that works across nodes (replaces single-node hostPath).
  • CSI drivers — integrate with cloud storage providers.

GitOps:

  • Flux or ArgoCD — continuous deployment from Git. Push a change to your repo, and the cluster automatically updates to match.

Each of these tools builds on the fundamentals you now understand. Because you built the cluster by hand, you know exactly what each tool is abstracting away.