# Edge Architecture and Deployment
CitadelMesh is edge-first: each building operates autonomously with local compute, storage, and policy enforcement. Unlike traditional BMS deployments that assume a trusted LAN, CitadelMesh treats every edge node as an independent zero-trust island with continuously attested identities.
This document covers the deployment model, offline autonomy, and fail-safe design that distinguish CitadelMesh from "cloud-first" or "controller-room" automation stacks.
## Edge-First Philosophy
Buildings Must Work Offline: Cloud connectivity enhances capabilities but is not required for core operations.
Design Principles:
- Local Autonomy: Critical control loops run entirely at the edge and keep working during WAN outages.
- Cloud Sync: Cloud provides orchestration, analytics, long-term storage, and multi-site coordination only when connectivity is available.
- Graceful Degradation: Loss of cloud connectivity reduces convenience features, not safety or compliance.
- Security Isolation: Each workload authenticates via SPIFFE/SPIRE regardless of network location; βon the VLANβ never implies trust.
## Edge Hardware Profile

### Industrial PC Specification
Recommended Spec (per building/large zone):
- CPU: 4-8 cores (Intel i5/i7 or AMD Ryzen)
- RAM: 16-32 GB
- Storage: 512 GB - 1 TB NVMe SSD (RAID 1 for redundancy)
- Network: Dual Gigabit Ethernet (redundant links)
- Optional: GPU/NPU for video analytics
- Security: TPM 2.0 for secure boot and key storage
- Power: UPS-backed with 4-hour runtime
Example Products:
- Dell Edge Gateway 5200
- HP t740 Thin Client
- Advantech ARK-1123 Series
- Custom build with Ubuntu LTS
## Trust Fabric
Each edge cluster joins the citadel.mesh trust domain. SPIRE agents handle node attestation, and workloads must obtain short-lived SVIDs before they can publish to NATS, hit the gateway, or call adapters. This replaces shared secrets and IP allow lists with verifiable identity. See Identity Foundation for selector design and federation strategy.
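The authorization side of this model can be sketched in a few lines. This is a simplified illustration, not the SPIRE client API: the `spiffe://citadel.mesh/...` ID format and trust domain come from the text above, while the workload paths (`/agent/security`, etc.) are assumed examples — the real selector design lives in Identity Foundation.

```python
from urllib.parse import urlparse

TRUST_DOMAIN = "citadel.mesh"  # trust domain from the architecture above

def is_authorized(spiffe_id: str, allowed_paths: set[str]) -> bool:
    """Check that a peer's SPIFFE ID is in our trust domain and allow-listed.

    In a real deployment the verified SPIFFE ID comes from the peer's X.509
    SVID after the mTLS handshake; this sketch only shows the authorization
    decision that follows.
    """
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        return False
    if parsed.netloc != TRUST_DOMAIN:
        return False  # federated trust domains would need explicit policy
    return parsed.path in allowed_paths

# Hypothetical workload paths for illustration
allowed = {"/agent/security", "/agent/energy", "/adapter/bacnet"}
print(is_authorized("spiffe://citadel.mesh/agent/security", allowed))  # True
print(is_authorized("spiffe://evil.example/agent/security", allowed))  # False
```

The deny-by-default shape matters more than the details: anything that is not a well-formed SPIFFE ID in the expected trust domain is rejected before any path matching happens.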
## Edge Software Stack

### Operating System

Ubuntu Server LTS (22.04 or newer)
- Minimal attack surface
- Long-term support (5 years)
- Secure boot with TPM
- Automated security patching
- CIS hardened baseline
### Container Orchestration
K3s (lightweight Kubernetes)
Why K3s over full Kubernetes:
- Lightweight: 40 MB binary, low memory footprint
- Edge-Optimized: Runs on resource-constrained hardware
- Single Binary: Easy deployment and updates
- Compatible: Same APIs as full Kubernetes (AKS/EKS compatible)
- Embedded Components: Built-in ingress, storage, service LB
```bash
# Install K3s
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644

# Verify installation
kubectl get nodes
```
### Messaging Infrastructure
**MQTT (Mosquitto):** Low-latency pub/sub for device telemetry

```conf
# mosquitto.conf
# Plain listener for local, non-TLS clients (restrict or disable in production)
listener 1883
protocol mqtt

# TLS listener; clients must present a certificate (mTLS)
listener 8883
protocol mqtt
cafile /etc/mosquitto/ca.crt
certfile /etc/mosquitto/server.crt
keyfile /etc/mosquitto/server.key
require_certificate true
```
**NATS JetStream:** Reliable event streaming with persistence

```yaml
# NATS deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 1
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats            # must match spec.selector
    spec:
      containers:
        - name: nats
          image: nats:2.10-alpine
          args:
            - "-c"
            - "/etc/nats/nats-server.conf"
            - "-js"          # enable JetStream
          volumeMounts:
            - name: config
              mountPath: /etc/nats
            - name: data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: nats-config   # ConfigMap holding nats-server.conf
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```
### Storage
**TimescaleDB:** Time-series telemetry and metrics

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: timescaledb
spec:
  serviceName: timescaledb
  replicas: 1
  selector:
    matchLabels:
      app: timescaledb       # selector/labels required by apps/v1
  template:
    metadata:
      labels:
        app: timescaledb
    spec:
      containers:
        - name: timescaledb
          image: timescale/timescaledb:latest-pg15  # pin a version in production
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi   # sized for 90-day retention
```
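The 200Gi request is tied to the 90-day retention target. A back-of-the-envelope sizing helper makes the relationship explicit; the per-sample size, point rate, and compression ratio below are illustrative assumptions, not measured CitadelMesh figures:

```python
def retention_storage_gib(points_per_second: float,
                          bytes_per_point: int = 150,
                          retention_days: int = 90,
                          compression_ratio: float = 5.0) -> float:
    """Rough TimescaleDB storage estimate for a retention window.

    bytes_per_point covers timestamp, value, tags, and index overhead;
    compression_ratio reflects TimescaleDB native compression. Both are
    assumed ballpark values -- measure on real telemetry before sizing.
    """
    raw_bytes = points_per_second * 86_400 * retention_days * bytes_per_point
    return raw_bytes / compression_ratio / 2**30

# e.g. ~500 sensor updates/sec across a building
print(f"{retention_storage_gib(500):.0f} GiB")  # ~109 GiB, inside 200Gi
```

At materially higher point rates the PVC request (and the retention policy) should be revisited rather than assumed.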
**Object Cache:** Local file storage for video and images

```bash
# Use hostPath or a local PV for the video cache
mkdir -p /var/lib/citadel/cache
```
### Policy Engine

**OPA (Open Policy Agent):** Policy enforcement sidecar

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa
spec:
  selector:
    matchLabels:
      app: opa               # selector/labels required by apps/v1
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
        - name: opa
          image: openpolicyagent/opa:latest  # pin a version in production
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "/policies"
          volumeMounts:
            - name: policies
              mountPath: /policies
      volumes:
        - name: policies
          configMap:
            name: opa-policies
```
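Agents consult OPA over its REST Data API (`POST /v1/data/<package>/<rule>` with an `{"input": ...}` envelope). The input schema and workload ID below are illustrative assumptions, not the shipped CitadelMesh policy bundle; the sketch shows only how a request is framed and a deny-by-default decision applied:

```python
import json

def build_opa_input(agent_spiffe_id: str, action: str, target: str,
                    site_id: str) -> dict:
    """Wrap an actuation request in the envelope OPA's Data API expects."""
    return {
        "input": {
            "subject": {"spiffe_id": agent_spiffe_id},
            "action": action,
            "target": target,
            "site": site_id,
        }
    }

def decision_allowed(opa_response: dict) -> bool:
    """Deny by default: only an explicit {"result": true} grants the action."""
    return opa_response.get("result") is True

# Hypothetical actuation request from the automation agent
payload = build_opa_input("spiffe://citadel.mesh/agent/automation",
                          "hvac.setpoint.write", "zone-3", "building-a")
print(json.dumps(payload, indent=2))
print(decision_allowed({"result": True}))   # True
print(decision_allowed({}))                 # False: missing result denies
```

Treating a missing or malformed response as a denial keeps the sidecar failure mode safe: if OPA is unreachable, no actuation proceeds.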
## Networking and Security

### Network Architecture
```text
┌─────────────────────────────────────────────┐
│                Building Edge                │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Management VLAN (10.0.1.0/24)         │  │
│  │  - K3s control plane                  │  │
│  │  - SPIRE server                       │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Agent VLAN (10.0.10.0/24)             │  │
│  │  - Security/Energy/Automation         │  │
│  │  - NATS/MQTT                          │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ BMS VLAN (10.0.20.0/24)               │  │
│  │  - EcoStruxure EBO                    │  │
│  │  - BACnet devices                     │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Security VLAN (10.0.30.0/24)          │  │
│  │  - Security Expert                    │  │
│  │  - Avigilon                           │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Fire/Life Safety VLAN (isolated)      │  │
│  │  - Bosch fire panel (read-only)       │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
                      │
                      │ WireGuard Tunnel (encrypted)
                      ▼
             ┌──────────────────┐
             │  Cloud Control   │
             │      Plane       │
             └──────────────────┘
```
### Firewall Rules
```bash
# Default deny
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow K3s API (from management VLAN only)
iptables -A INPUT -p tcp --dport 6443 -s 10.0.1.0/24 -j ACCEPT

# Allow NATS (agent VLAN only)
iptables -A INPUT -p tcp --dport 4222 -s 10.0.10.0/24 -j ACCEPT

# Allow WireGuard to cloud
iptables -A OUTPUT -p udp --dport 51820 -d <cloud-ip> -j ACCEPT

# Log and deny all other outbound (explicit egress allowlist)
iptables -A OUTPUT -j LOG --log-prefix "EGRESS-DENY: "
iptables -A OUTPUT -j DROP
```
### mTLS with SPIFFE
All components authenticate via SPIFFE SVIDs:
```yaml
# SPIRE Agent DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent     # must match spec.selector
    spec:
      hostPID: true
      containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.8.0
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate
```
## Offline Autonomy

### Offline Operation Modes
**Normal Mode (cloud connected):**
- Full feature set
- Cloud analytics and ML models
- Cross-site coordination
- Cloud-based approvals

**Degraded Mode (cloud disconnected):**
- Local control loops continue
- OPA policies enforced locally
- NATS queues events for sync
- Local approvals only (if configured)

**Emergency Mode (severe local failure):**
- Life safety preserved
- Emergency unlock/lock override
- Minimal logging (local only)
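The three modes can be expressed as a small selection function. This is an illustrative sketch of the decision order (life safety is checked before connectivity), not the shipped supervisor logic; the input signal names are assumptions:

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"        # cloud connected, full feature set
    DEGRADED = "degraded"    # cloud disconnected, local control continues
    EMERGENCY = "emergency"  # severe local failure, life safety only

def select_mode(cloud_connected: bool,
                local_control_healthy: bool,
                life_safety_ok: bool) -> Mode:
    # A severe local failure trumps connectivity: preserve life safety first,
    # regardless of whether the cloud link is up.
    if not life_safety_ok or not local_control_healthy:
        return Mode.EMERGENCY
    return Mode.NORMAL if cloud_connected else Mode.DEGRADED

print(select_mode(cloud_connected=False,
                  local_control_healthy=True,
                  life_safety_ok=True))  # Mode.DEGRADED during a WAN outage
```

Note that losing the cloud never produces Emergency Mode on its own; that transition is reserved for local failures.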
### Queue-Based Sync
NATS JetStream queues events during disconnection:
```go
// NATS JetStream stream configuration (nats.go "jetstream" package)
stream := jetstream.StreamConfig{
	Name:      "citadel-events",
	Subjects:  []string{"telemetry.>", "control.>", "incidents.>"},
	Storage:   jetstream.FileStorage,
	Retention: jetstream.WorkQueuePolicy,
	MaxAge:    72 * time.Hour,          // 72-hour retention
	MaxBytes:  10 * 1024 * 1024 * 1024, // 10 GB
}
```
When cloud reconnects, queued events sync automatically.
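The store-and-forward semantics can be sketched independently of NATS. The class and method names below are illustrative, not a CitadelMesh API; JetStream's file-backed stream plays this role in production, with the same bounded-retention and in-order-drain behaviour:

```python
from collections import deque

class OfflineBuffer:
    """Queue events while the WAN is down; drain them FIFO on reconnect."""

    def __init__(self, max_events: int = 100_000):
        # Bounded, like MaxAge/MaxBytes on the stream: oldest events are
        # dropped first once the buffer is full.
        self._queue = deque(maxlen=max_events)

    def publish(self, event: dict, cloud_up: bool, send) -> None:
        """Send immediately when connected, otherwise buffer locally."""
        if cloud_up:
            send(event)
        else:
            self._queue.append(event)

    def drain(self, send) -> int:
        """On reconnect, forward queued events in arrival order."""
        sent = 0
        while self._queue:
            send(self._queue.popleft())
            sent += 1
        return sent

delivered = []
buf = OfflineBuffer()
buf.publish({"seq": 1}, cloud_up=False, send=delivered.append)
buf.publish({"seq": 2}, cloud_up=False, send=delivered.append)
print(buf.drain(delivered.append), delivered)  # 2 [{'seq': 1}, {'seq': 2}]
```

The key property is ordering: events generated during the outage arrive at the cloud in the sequence they occurred, so downstream analytics see a consistent timeline.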
### Local Dashboards

A kiosk UI is served on the local network at `http://citadel-edge.local:8080`. It provides:
- Current building state
- Recent incidents
- Manual override controls (with local approval)
- System health status
## Deployment Process

### GitOps with ArgoCD
```yaml
# ArgoCD Application for an edge site
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: citadel-edge-building-a
  namespace: argocd
spec:
  project: citadel
  source:
    repoURL: https://github.com/org/citadel-config
    targetRevision: main
    path: sites/building-a
  destination:
    server: https://kubernetes.default.svc
    namespace: citadel
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
### Signed Container Images

All images are signed with Sigstore cosign:

```bash
# Sign the image
cosign sign --key cosign.key citadel/security-agent:v1.0.0

# Verify (the same check runs in the admission controller)
cosign verify --key cosign.pub citadel/security-agent:v1.0.0
```
### Configuration Management

Helm charts plus Kustomize provide site-specific configuration:

```yaml
# sites/building-a/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:              # "bases" is deprecated in current Kustomize
  - ../../base
patches:                # strategic-merge patches, formerly patchesStrategicMerge
  - path: agents-config.yaml
  - path: adapters-config.yaml
configMapGenerator:
  - name: site-config
    literals:
      - SITE_ID=building-a
      - TIMEZONE=America/New_York
      - NATS_URL=nats://nats.citadel.svc:4222
```
## Monitoring and Diagnostics

### Health Checks
```python
# Agent health endpoint (FastAPI)
@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "agent_id": config.agent_id,
        "spiffe_id": svid.spiffe_id,
        "nats_connected": event_bus.is_connected(),
        "opa_available": await opa_client.ping(),
        "uptime_seconds": time.time() - start_time,
    }
```
### OpenTelemetry Collection
```yaml
# OTel Collector DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector    # selector/labels required by apps/v1
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest  # pin a version in production
          args: ["--config=/etc/otel/config.yaml"]    # point at the mounted config
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
```
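A minimal collector configuration for that ConfigMap might look like the following. The exporter endpoint is a placeholder, and the single batch-processor pipeline is a starting-point sketch rather than a tuned production config:

```yaml
# otel-collector-config (illustrative)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  otlphttp:
    endpoint: https://cloud.example/otlp   # placeholder cloud endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

During a WAN outage the exporter retries and queues; long-gap buffering still belongs to the NATS path described under Offline Autonomy.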
## Related Documentation
- Overview - Edge in overall architecture
- Cloud Integration - Edge-to-cloud sync
- Observability - Edge monitoring
- Identity Foundation - SPIFFE at edge