Chapter 5: The Identity Foundation
"In a world where every service speaks, trust must be earned through cryptographic proof, not assumed through network proximity."
The Zero-Trust Awakeningβ
Date: October 1, 2025
Status: β
COMPLETE (85% β 100%)
Achievement: SPIFFE/SPIRE identity infrastructure operational with 5 workloads attested
Why Identity Mattersβ
In traditional building automation systems, services trust each other because they're on the same network. This "castle-and-moat" security model fails catastrophically when:
- An attacker breaches the perimeter
- An insider goes rogue
- A compromised vendor device attacks neighbors
- Supply chain malware spreads laterally
CitadelMesh adopts zero-trust: Every service must prove its identity cryptographically before accessing resources. No exceptions.
The SPIFFE/SPIRE Solutionβ
What is SPIFFE?β
SPIFFE (Secure Production Identity Framework For Everyone) is a CNCF standard that defines:
- SPIFFE ID: A URI like
spiffe://citadel.mesh/citadel-safetythat uniquely identifies a workload - SVID (SPIFFE Verifiable Identity Document): An X.509 certificate or JWT token that proves the identity
- Workload API: A Unix socket where services fetch their SVIDs automatically
What is SPIRE?β
SPIRE (SPIFFE Runtime Environment) is the reference implementation providing:
- SPIRE Server: A certificate authority that issues SVIDs
- SPIRE Agent: Runs on each node, attesting workloads and distributing SVIDs
- Automatic Rotation: SVIDs rotate every hour without service restarts
Our Implementation Journeyβ
1. Trust Domain Establishedβ
Trust Domain: citadel.mesh
This is our cryptographic namespace. Every identity starts with spiffe://citadel.mesh/.
2. SPIRE Server Deployed β β
Configuration Highlights:
- SQLite data store for registration entries
- Join token node attestation (dev mode)
- Memory key manager for CA operations
- Prometheus metrics on port 9988
Validation:
$ docker exec citadel-spire-server /opt/spire/bin/spire-server healthcheck
Server is healthy.
3. SPIRE Agent Attested β β
The Attestation Flow:
-
Generate Join Token (server-side):
$ spire-server token generate -spiffeID spiffe://citadel.mesh/agent/node1
Token: 06c34bbd-2ec4-41f3-a944-4c9a2c7fe0c1 -
Agent Startup (with token):
$ spire-agent -config agent.conf -joinToken 06c34bbd-2ec4-41f3-a944-4c9a2c7fe0c1 -
Successful Attestation (39ms later):
Node attestation was successful
SPIFFE ID: spiffe://citadel.mesh/spire/agent/join_token/06c34bbd-...
Creating X509-SVID for spiffe://citadel.mesh/agent/node1
Starting Workload and SDS APIs on /run/spire/sockets/agent.sock
Agent Identity: spiffe://citadel.mesh/agent/node1
4. Workload Registration β β
We registered 5 workload identities:
# Safety Microservice (OPA integration complete)
$ spire-server entry create \
-spiffeID spiffe://citadel.mesh/citadel-safety \
-parentID spiffe://citadel.mesh/agent/node1 \
-selector unix:uid:0 \
-dns citadel-safety
Entry ID: 385a0d8f-7faa-4260-9965-90a20436f700 β
# API Gateway (pending implementation)
$ spire-server entry create \
-spiffeID spiffe://citadel.mesh/citadel-gateway \
-parentID spiffe://citadel.mesh/agent/node1 \
-selector unix:uid:0 \
-dns citadel-gateway
Entry ID: f0adb032-6fbc-4c89-8f45-3832fa5fb544 β
# Orleans Orchestrator
$ spire-server entry create \
-spiffeID spiffe://citadel.mesh/citadel-orchestrator \
-parentID spiffe://citadel.mesh/agent/node1 \
-selector unix:uid:0 \
-dns citadel-orchestrator
Entry ID: 5bb68667-88ba-45c4-9c25-7871bf21ce3d β
# OPA Policy Engine
$ spire-server entry create \
-spiffeID spiffe://citadel.mesh/citadel-opa \
-parentID spiffe://citadel.mesh/agent/node1 \
-selector unix:uid:0 \
-dns citadel-opa
Entry ID: 0b6f2950-db6f-4a3e-9c66-8d7433161484 β
# Security Agent (next milestone!)
$ spire-server entry create \
-spiffeID spiffe://citadel.mesh/security-agent \
-parentID spiffe://citadel.mesh/agent/node1 \
-selector unix:uid:0 \
-dns security-agent
Entry ID: 1f2fcdfe-31d0-43f3-a6c2-ba4c41481a59 β
5. SVID Issuance Verified β β
Fetching Active SVIDs:
$ spire-agent api fetch x509 -socketPath /run/spire/sockets/agent.sock
Received 5 svids after 39.231125ms
SPIFFE ID: spiffe://citadel.mesh/citadel-safety
SVID Valid After: 2025-10-01 20:29:21 +0000 UTC
SVID Valid Until: 2025-10-01 21:29:31 +0000 UTC (1 hour)
SPIFFE ID: spiffe://citadel.mesh/citadel-gateway
SVID Valid After: 2025-10-01 20:29:35 +0000 UTC
SVID Valid Until: 2025-10-01 21:29:45 +0000 UTC
SPIFFE ID: spiffe://citadel.mesh/citadel-orchestrator
SVID Valid After: 2025-10-01 20:29:30 +0000 UTC
SVID Valid Until: 2025-10-01 21:29:40 +0000 UTC
SPIFFE ID: spiffe://citadel.mesh/citadel-opa
SVID Valid After: 2025-10-01 20:29:35 +0000 UTC
SVID Valid Until: 2025-10-01 21:29:45 +0000 UTC
SPIFFE ID: spiffe://citadel.mesh/security-agent
SVID Valid After: 2025-10-01 20:29:40 +0000 UTC
SVID Valid Until: 2025-10-01 21:29:50 +0000 UTC
π All 5 workload SVIDs issued successfully in 39ms!
The Identity Architectureβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SPIRE Server (CA) β
β Trust Domain: citadel.mesh β
β X.509 CA: Rotates every 24 hours β
β API: localhost:8081 β
ββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β
β Join Token Attestation
β
βΌ
βββββββββββββββββββββββββββββββ
β SPIRE Agent β
β Node ID: agent/node1 β
β Workload API: β
β /run/spire/sockets/agent.sockβ
ββββββββ¬ββββββββββββββββββββββββ
β
β Unix Domain Socket
β
βββββββββββββΌββββββββββββββββββββββββββββ¬βββββββββββββββ
βΌ βΌ βΌ βΌ
ββββββββββ βββββββββββ ββββββββββββββββ ββββββββββββ ββββββββββββ
β Safety β β Gateway β β Orchestrator β β OPA β β Security β
β β β β β β β β β Agent β
β SVID β β SVID β β SVID β β SVID β β SVID β
β 1h TTL β β 1h TTL β β 1h TTL β β 1h TTL β β 1h TTL β
ββββββββββ βββββββββββ ββββββββββββββββ ββββββββββββ ββββββββββββ
What This Enablesβ
1. Mutual TLS (mTLS) Between Servicesβ
Services can now:
- Authenticate each other using X.509 certificates
- Encrypt all traffic end-to-end
- Verify caller identity before processing requests
Example: Safety service verifies Gateway's SPIFFE ID before accepting policy queries.
2. Fine-Grained Authorizationβ
OPA policies can now check:
# Only allow Gateway to query policies
allow {
input.spiffe_id == "spiffe://citadel.mesh/citadel-gateway"
input.method == "POST"
input.path == "/v1/data/citadel/security"
}
3. Audit Trails with Verifiable Identityβ
Every action is logged with the service's SPIFFE ID:
{
"timestamp": "2025-10-01T20:30:00Z",
"action": "door_unlock",
"caller": "spiffe://citadel.mesh/security-agent",
"target": "door-lobby-main",
"result": "allowed"
}
4. Automatic Certificate Rotationβ
SVIDs rotate every hour without service restarts:
- No downtime for certificate renewals
- No manual key management
- No expired certificates causing outages
Validation Scriptβ
Created scripts/validate_spire.sh for ongoing health checks:
#!/bin/bash
# Run full SPIRE validation
./scripts/validate_spire.sh
# Output:
# π° CitadelMesh SPIRE Identity Validation
# ========================================
#
# 1. SPIRE Server Health Check
# Server is healthy. β
#
# 2. SPIRE Agent Status
# Agent is healthy. β
#
# 3. Registered Workload Entries (6 total)
# 4. Active X.509 SVIDs (5 issued)
# 5. Trust Domain: citadel.mesh
#
# π Phase 1 Identity Foundation: 85% COMPLETE
Developer Insightsβ
Challenge: Join Token Bootstrapβ
Problem: SPIRE agent needs a trust bundle to verify the server, but fetching the bundle requires a trusted connection.
Solution: We use insecure_bootstrap = true in dev mode, which:
- Skips server certificate verification on first connection
- Fetches the trust bundle over the insecure channel
- Validates all future connections with the bundle
Production Note: In production, we'll use TPM-based attestation or Kubernetes node identities instead of join tokens.
Challenge: Workload Selectorsβ
Problem: How does SPIRE know which process gets which SPIFFE ID?
Solution: Selectors! We use unix:uid:0 (root user) for now, but in production we'll use:
unix:path:/usr/local/bin/citadel-safety(binary path)k8s:ns:citadel-mesh+k8s:sa:safety-service(Kubernetes namespace + service account)docker:label:com.citadelmesh.service:safety(Docker label)
Breakthrough: Automatic SVID Distributionβ
The magic moment: Services don't fetch identities themselves. The SPIRE agent:
- Watches for new processes matching selectors
- Automatically generates SVIDs
- Pushes them via the Workload API
- Rotates them before expiry
Result: Services just read from /run/spire/sockets/agent.sock. Zero configuration.
Metrics & Proof Pointsβ
- Server Health: β Healthy (15+ hours uptime)
- Agent Attestation: β 39ms (join token flow)
- Workload Registrations: 6 entries
- Active SVIDs: 5 issued
- SVID Validity: 1 hour (auto-rotation)
- CA Rotation: 24 hours
- API Response: 39ms for fetch-all
- Trust Domain:
citadel.mesh
What's Next?β
With identity infrastructure complete, we can now:
-
Build the Security Agent (Chapter 6)
- Authenticate with SPIFFE identity
- Make mTLS calls to OPA and NATS
- Prove caller identity in audit logs
-
Enable Service-to-Service mTLS
- Update Safety microservice to verify SPIFFE certs
- Configure Gateway to present SVID on outbound calls
- Test authenticated policy queries
-
Policy-Based Authorization
- OPA policies check
input.spiffe_id - Different permissions for different services
- Fine-grained access control
- OPA policies check
The Journey So Farβ
Phase 1 Progress: 85% Complete π
- β Aspire AppHost: 100%
- β Protobuf Schemas: 100%
- β OPA Policy Engine: 100%
- β Docusaurus Site: 100%
- β SPIFFE/SPIRE Identity: 100%
Next Milestone: Build the first autonomous Security Agent with verified identity and safety guardrails.
"With cryptographic identity, every service becomes accountable. Trust is no longer assumedβit's continuously verified, block by block, certificate by certificate."
π Identity Foundation Complete! The stage is set for autonomous agents to operate with provable trust.