
Quick Health Checks (Beginner Friendly)

Think of “quick health checks” as a five‑minute sniff test: are the essentials up and answering?

We keep it simple and practical: no jargon, just checks you can run.

What we’re checking

  • Identity is working: services can prove who they are (SPIRE)
  • Policies answer: the rules engine is reachable (OPA)
  • The Gateway is up: the front door responds and can reach core services

Local checks (from your laptop)

  1. Is the policy engine (OPA) alive? GET /health on OPA should return 200.
  2. Is the Gateway up? GET /api/status on the Gateway should return 200.
  3. Can we list basics (with dev headers)?
    • Security doors: GET /api/security/doors
    • Energy zones: GET /api/energy/zones
    • Add headers so the Gateway treats you like a developer:
      • x-citadel-user-id: smoketest
      • x-citadel-username: smoketest
      • x-spiffe-id: spiffe://citadel.mesh/user/smoketest

If you see “permission denied” (HTTP 403), double‑check the headers first. A denial means the zero‑trust guardrails are doing their job.
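The three local checks can be sketched in a few lines of Python. This is a minimal sketch, not the project's own tooling: the function names `http_status` and `local_checks` are made up for illustration, and the base URLs you pass in are assumptions about where OPA and the Gateway listen on your machine. The endpoints and dev headers are the ones listed above.

```python
import urllib.error
import urllib.request

# Dev headers from the list above; they make the Gateway treat the caller as a developer.
DEV_HEADERS = {
    "x-citadel-user-id": "smoketest",
    "x-citadel-username": "smoketest",
    "x-spiffe-id": "spiffe://citadel.mesh/user/smoketest",
}


def http_status(url, headers=None, timeout=5.0):
    """Return the HTTP status code of a GET, or None if the host is unreachable."""
    request = urllib.request.Request(url, headers=headers or {}, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example 403) still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar.
        return None


def local_checks(opa_base, gateway_base, fetch=http_status):
    """Run the local checks and return a dict of check name to status code."""
    return {
        "opa /health": fetch(opa_base + "/health"),
        "gateway /api/status": fetch(gateway_base + "/api/status"),
        "security doors": fetch(gateway_base + "/api/security/doors", DEV_HEADERS),
        "energy zones": fetch(gateway_base + "/api/energy/zones", DEV_HEADERS),
    }
```

For example, `local_checks("http://localhost:8181", "http://localhost:7070")` would probe OPA and the Gateway on those ports, if that is where they run for you. Every value should be 200; a 403 on the last two points at the headers, and `None` means nothing answered at that address.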

In‑cluster checks (the Kubernetes way)

Because we run with strict “only talk if allowed” rules, random pods can’t hit services by default. Use one of these two patterns:

  1. Port‑forward from your laptop
    • Port‑forward the Gateway Service to a local port (e.g. 18090)
    • Then browse to / and /api/status on http://127.0.0.1:18090
  2. Run a labeled one‑off Job (the smoketest)
    • We ship a small curl‑based Job (disabled by default) that checks:
      • Gateway /api/status returns 200
      • OPA /health returns 200
    • It carries a label (citadel.smoketest=true) so network policies can allow just enough to run and nothing more.
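The port‑forward pattern can be scripted roughly as follows. This is a sketch under assumptions: the Service name, namespace, and remote port are not given on this page, so they are plain parameters here, and the helper names are hypothetical. Only the local port 18090 and the two paths come from the steps above.

```python
import subprocess


def port_forward_command(service, namespace, remote_port, local_port=18090):
    """Build the kubectl argv that forwards a Service port to 127.0.0.1."""
    return [
        "kubectl",
        "port-forward",
        "--namespace",
        namespace,
        f"svc/{service}",
        f"{local_port}:{remote_port}",
    ]


def forwarded_urls(local_port=18090):
    """URLs to open once the port-forward is running."""
    base = f"http://127.0.0.1:{local_port}"
    return [base + "/", base + "/api/status"]


def start_port_forward(service, namespace, remote_port, local_port=18090):
    """Start kubectl in the background; the caller should terminate() it when done."""
    return subprocess.Popen(
        port_forward_command(service, namespace, remote_port, local_port)
    )
```

A call such as `start_port_forward("gateway", "citadel", 7070)` uses placeholder values; substitute the real Service name, namespace, and port from your cluster, then request the URLs from `forwarded_urls()`.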

Why we do it this way

  • Safety first: we don’t bypass policy to test the system
  • Clear signals: success/failure you can see instantly
  • Reproducible: same checks run locally and in‑cluster

When something fails

  • OPA /health fails → Policy engine unreachable; check the OPA pod
  • Gateway /api/status fails → Gateway down or can’t bind; check the logs and port
  • Security/Energy endpoints fail with 403 → Headers missing or roles not allowed
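The triage list above maps cleanly onto a small lookup function. The sketch below is illustrative only: the check labels ("opa", "gateway", "security", "energy") are names chosen here, not identifiers from the project.

```python
def diagnose(check, status):
    """Return a troubleshooting hint for a failed check, or None if it passed.

    `check` is one of the illustrative labels "opa", "gateway", "security",
    "energy"; `status` is the HTTP status code, or None if nothing answered.
    """
    if status == 200:
        return None
    if check == "opa":
        return "Policy engine unreachable; check the OPA pod"
    if check == "gateway":
        return "Gateway down or can't bind; check the logs and port"
    if check in ("security", "energy") and status == 403:
        return "Headers missing or roles not allowed"
    # Anything else is outside the cases listed above.
    return f"Unexpected result for {check}: {status}"
```

Keeping the hints in one place means a script and a human reading the logs arrive at the same next step.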

Once these pass, you’re safe to move on to deeper validation and dashboards.

Optional: Automation helper (Playwright)

If you prefer one command instead of multiple curls, we include a tiny automation that does the same checks:

  • Location: mcp-servers/playwright-smoketest
  • What it checks: Dashboard URL (if running), OPA /health, Gateway /api/status, plus a couple of protected endpoints using dev headers

Environment variables (override as needed):

  • DASHBOARD_URL (default: https://localhost:5000)
  • OPA_URL (default: http://localhost:8181/health)
  • GATEWAY_URL (default: http://localhost:7070/api/status)
  • GATEWAY_BASE (base URL for protected endpoints)
  • ORCHESTRATOR_URL (default: http://localhost:8080/health)

It prints a single JSON summary: which checks passed (HTTP 200) and which didn’t.
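To make the override behaviour and the summary concrete, here is a minimal Python sketch of the same idea. It is not the code in mcp-servers/playwright-smoketest: the function names and the exact JSON shape are assumptions for illustration. The defaults are the ones listed above; GATEWAY_BASE is left out because no default is documented for it.

```python
import json
import os

# Documented defaults; each can be overridden through the environment.
DEFAULTS = {
    "DASHBOARD_URL": "https://localhost:5000",
    "OPA_URL": "http://localhost:8181/health",
    "GATEWAY_URL": "http://localhost:7070/api/status",
    "ORCHESTRATOR_URL": "http://localhost:8080/health",
}


def resolve_targets(env=None):
    """Return each variable's effective URL: the override if set, else the default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


def summarize(statuses):
    """Turn {check name: HTTP status or None} into one JSON summary string.

    A check counts as passed only when its status is exactly 200.
    """
    checks = {
        name: {"status": status, "passed": status == 200}
        for name, status in statuses.items()
    }
    overall = all(entry["passed"] for entry in checks.values())
    return json.dumps({"passed": overall, "checks": checks}, indent=2, sort_keys=True)
```

Passing the environment in as a parameter keeps the sketch easy to test: `resolve_targets({"OPA_URL": "http://opa.example:8181/health"})` overrides one target and leaves the rest at their defaults.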