Kubernetes Troubleshooting Skill
A complete toolkit for diagnosing Kubernetes applications. Find pods by labels, then execute commands inside containers for deep diagnostics.
When to Use
- User reports service errors, exceptions, failures, timeouts
- Need to check application logs or process status
- Diagnose network, memory, or disk issues
- Keywords: error, exception, failed, timeout, crash, not working, logs, troubleshoot, diagnose, pod, container
Workflow
- Search pods - Find target pods by label selector
- Execute diagnostics - Run commands inside containers
Scripts
1. Search Pods
Find pods by label selector:
uv run python .claude/skills/k8s-troubleshoot/scripts/search_pods.py -l "app=nginx" -n default
| Parameter | Required | Description |
|---|
-l, --label-selector | Yes | Label selector, e.g., app=nginx or project-id=123,pipeline-id=456 |
-n, --namespace | No | Namespace (default: default). Use all for all namespaces |
Output: JSON with success, podCount, pods (name, namespace, phase, containers)
2. Execute Command in Pod
Run diagnostic commands inside a container:
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p "pod-name" -n default -cmd "tail -n 100 /root/logs/app.log"
| Parameter | Required | Description |
|---|
-p, --pod | Yes | Pod name |
-n, --namespace | No | Namespace (default: default) |
-c, --container | No | Container name (for multi-container pods) |
-cmd, --command | Yes | Command to execute |
Output: JSON with success, pod, namespace, command, output
Common Diagnostic Patterns
View application logs
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "tail -n 100 /root/logs/app.log"
Check Nacos config (dubbo3 issues)
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "cat /root/logs/nacos/config.log | grep nacos"
Check processes
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "ps aux | head -20"
Check network
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "netstat -tlnp"
Check disk and memory
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "df -h && free -m"
Troubleshooting Tips
| Issue | Diagnostic Command |
|---|
| dubbo3 no provider | Check /root/logs/nacos/config.log for nacos address |
| Service not responding | Check process status with ps aux and logs |
| Connection issues | Check network with netstat -tlnp |
| OOM errors | Check memory with free -m |