Quick Health Checks (Beginner Friendly)
Think of “quick health checks” as a five‑minute sniff test: are the essentials up and answering?
We keep it simple and practical: no jargon, just checks you can run.
What we’re checking
- Identity is working: services can prove who they are (SPIRE)
- Policies answer: the rules engine is reachable (OPA)
- The Gateway is up: the front door responds and can reach core services
Local checks (from your laptop)
- Is the policy engine (OPA) alive?
  - Endpoint: http://localhost:8181/health → Expect HTTP 200 and a tiny JSON body
- Is the Gateway up?
  - If running locally, hit http://localhost:7070/api/status → Expect HTTP 200 and JSON
- Can we list basics (with dev headers)?
  - Security doors: GET /api/security/doors
  - Energy zones: GET /api/energy/zones
  - Add headers so the Gateway treats you like a developer:
    - x-citadel-user-id: smoketest
    - x-citadel-username: smoketest
    - x-spiffe-id: spiffe://citadel.mesh/user/smoketest
If you see “permission denied” (HTTP 403), double‑check the headers first. That’s the zero‑trust guardrails doing their job.
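The local checks above can be sketched as a short Python script that uses only the standard library. This is a minimal sketch, not the project's own tooling: the names `probe` and `run_local_checks` are hypothetical, and returning `None` for an unreachable service is a convention chosen for this example. The URLs and headers are the ones listed above.

```python
import urllib.error
import urllib.request

# Dev headers from the list above; the Gateway uses them to treat the caller as a developer.
DEV_HEADERS = {
    "x-citadel-user-id": "smoketest",
    "x-citadel-username": "smoketest",
    "x-spiffe-id": "spiffe://citadel.mesh/user/smoketest",
}


def probe(url, headers=None, timeout=5.0):
    """GET the URL and return its HTTP status, or None if nothing answered."""
    request = urllib.request.Request(url, headers=headers or {}, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status, e.g. 403
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def run_local_checks(opa_base="http://localhost:8181",
                     gateway_base="http://localhost:7070",
                     fetch=probe):
    """Run the four local checks and return a name -> status mapping."""
    targets = [
        ("opa", f"{opa_base}/health", None),
        ("gateway", f"{gateway_base}/api/status", None),
        ("doors", f"{gateway_base}/api/security/doors", DEV_HEADERS),
        ("zones", f"{gateway_base}/api/energy/zones", DEV_HEADERS),
    ]
    return {name: fetch(url, headers) for name, url, headers in targets}
```

Calling `run_local_checks()` with both services running locally should give 200 for every entry. The `fetch` parameter exists only so the checks can be exercised without a live service.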
In‑cluster checks (the Kubernetes way)
Because we run with strict “only talk if allowed” rules, random pods can’t hit services by default. Use one of these two patterns:
- Port‑forward from your laptop
  - Port‑forward the Gateway Service to a local port (e.g. 18090)
  - Then browse to / and /api/status on http://127.0.0.1:18090
- Run a labeled one‑off Job (the smoketest)
  - We ship a small curl‑based Job (disabled by default) that checks:
    - Gateway /api/status returns 200
    - OPA /health returns 200
  - It carries a label (citadel.smoketest=true) so network policies can allow just enough to run and nothing more.
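The port‑forward pattern above can be scripted. The sketch below builds a `kubectl port-forward` command and starts it in the background. The Service name, namespace, and Service port are not given in this guide, so they are required parameters here. The helper names are hypothetical, and only the local port 18090 comes from the text.

```python
import subprocess


def port_forward_command(service, namespace, remote_port, local_port=18090):
    """Build the kubectl argv that forwards a Service port to 127.0.0.1."""
    return [
        "kubectl", "port-forward",
        f"svc/{service}",
        f"{local_port}:{remote_port}",
        "--namespace", namespace,
        "--address", "127.0.0.1",
    ]


def start_port_forward(service, namespace, remote_port, local_port=18090):
    """Start kubectl in the background; call terminate() on the result when done."""
    command = port_forward_command(service, namespace, remote_port, local_port)
    return subprocess.Popen(command)
```

Once the forward is running, / and /api/status are reachable on http://127.0.0.1:18090 as described above. Stopping the returned process closes the tunnel, so nothing stays exposed after the check.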
Why we do it this way
- Safety first: we don’t bypass policy to test the system
- Clear signals: success/failure you can see instantly
- Reproducible: same checks run locally and in‑cluster
When something fails
- OPA /health fails → Policy engine unreachable; check the OPA pod
- Gateway /api/status fails → Gateway down or can’t bind; check the logs and port
- Security/Energy endpoints fail with 403 → Headers missing or roles not allowed
Once these pass, you’re safe to move on to deeper validation and dashboards.
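The failure list above can be turned into a small triage helper. This is a sketch under assumptions: it expects a mapping from check name to HTTP status (or `None` when nothing answered), and the check names `opa`, `gateway`, `doors`, and `zones` are hypothetical labels chosen for this example.

```python
def diagnose(results):
    """Map check results (name -> HTTP status or None) to the hints listed above."""
    hints = []
    if results.get("opa") != 200:
        hints.append("OPA /health failed: policy engine unreachable; check the OPA pod")
    if results.get("gateway") != 200:
        hints.append("Gateway /api/status failed: Gateway down or can't bind; check the logs and port")
    for name in ("doors", "zones"):
        if results.get(name) == 403:
            hints.append(f"{name} returned 403: headers missing or roles not allowed")
    return hints
```

An empty list means none of the known failure patterns matched, which is the signal to move on to deeper validation.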
Optional: Automation helper (Playwright)
If you prefer one command instead of multiple curls, we include a tiny automation that does the same checks:
- Location: mcp-servers/playwright-smoketest
- What it checks: Dashboard URL (if running), OPA /health, Gateway /api/status, plus a couple of protected endpoints using dev headers
Environment variables (override as needed):
- DASHBOARD_URL (default: https://localhost:5000)
- OPA_URL (default: http://localhost:8181/health)
- GATEWAY_URL (default: http://localhost:7070/api/status)
- GATEWAY_BASE (base URL for protected endpoints)
- ORCHESTRATOR_URL (default: http://localhost:8080/health)
It prints a single JSON summary: which checks passed (HTTP 200) and which didn’t.
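To illustrate how the environment overrides and the JSON summary fit together, here is a minimal Python sketch. It is not the Playwright helper itself. The function names and the `passed`/`failed` summary shape are assumptions made for this example, while the variable names and defaults are the ones listed above. No default is documented for GATEWAY_BASE, so it stays unset unless provided.

```python
import json
import os

# Defaults as documented above.
DEFAULTS = {
    "DASHBOARD_URL": "https://localhost:5000",
    "OPA_URL": "http://localhost:8181/health",
    "GATEWAY_URL": "http://localhost:7070/api/status",
    "ORCHESTRATOR_URL": "http://localhost:8080/health",
}


def load_config(env=None):
    """Resolve each setting from the environment, falling back to its default."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    config["GATEWAY_BASE"] = env.get("GATEWAY_BASE")  # no documented default
    return config


def summarize(statuses):
    """Return one JSON string: checks that returned HTTP 200 and checks that did not."""
    passed = sorted(name for name, status in statuses.items() if status == 200)
    failed = sorted(name for name, status in statuses.items() if status != 200)
    return json.dumps({"passed": passed, "failed": failed}, sort_keys=True)
```

For example, setting `OPA_URL` in the shell before running changes only that entry, and every other setting keeps its default.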