
Edge Architecture and Deployment

CitadelMesh is edge-first: each building operates autonomously with local compute, storage, and policy enforcement. Unlike traditional BMS deployments that assume a trusted LAN, CitadelMesh treats every edge node as an independent zero-trust island with continuously attested identities.

This document covers the deployment model, offline autonomy, and fail-safe design that distinguish CitadelMesh from "cloud-first" or "controller-room" automation stacks.

Edge-First Philosophy

Buildings Must Work Offline: Cloud connectivity enhances capabilities but is not required for core operations.

Design Principles:

  1. Local Autonomy: Critical control loops run entirely at the edge and keep working during WAN outages.
  2. Cloud Sync: Cloud provides orchestration, analytics, long-term storage, and multi-site coordination only when connectivity is available.
  3. Graceful Degradation: Loss of cloud connectivity reduces convenience features, not safety or compliance.
  4. Security Isolation: Each workload authenticates via SPIFFE/SPIRE regardless of network location; being "on the VLAN" never implies trust.

Edge Hardware Profile

Industrial PC Specification

Recommended Spec (per building/large zone):

  • CPU: 4-8 cores (Intel i5/i7 or AMD Ryzen)
  • RAM: 16-32 GB
  • Storage: 512 GB - 1 TB NVMe SSD (RAID 1 for redundancy)
  • Network: Dual Gigabit Ethernet (redundant links)
  • Optional: GPU/NPU for video analytics
  • Security: TPM 2.0 for secure boot and key storage
  • Power: UPS-backed with 4-hour runtime

Example Products:

  • Dell Edge Gateway 5200
  • HP t740 Thin Client
  • Advantech ARK-1123 Series
  • Custom build with Ubuntu LTS

Trust Fabric

Each edge cluster joins the citadel.mesh trust domain. SPIRE agents handle node attestation, and workloads must obtain short-lived SVIDs before they can publish to NATS, hit the gateway, or call adapters. This replaces shared secrets and IP allow lists with verifiable identity. See Identity Foundation for selector design and federation strategy.
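
To make "verifiable identity" concrete, here is a minimal sketch of the trust-domain check a service might apply to a peer's SPIFFE ID before accepting a connection. The workload path `/agent/security` used in the usage note is hypothetical, and the sketch only inspects the ID string; it assumes the SVID's certificate chain has already been validated elsewhere (for example through the SPIRE Workload API).

```python
from urllib.parse import urlsplit

TRUST_DOMAIN = "citadel.mesh"  # trust domain named in this document


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, workload_path).

    Raises ValueError if the string is not a well-formed SPIFFE ID.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    # SPIFFE IDs carry no query string or fragment.
    if not parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"malformed SPIFFE ID: {spiffe_id!r}")
    return parts.netloc, parts.path


def is_trusted(spiffe_id: str, trust_domain: str = TRUST_DOMAIN) -> bool:
    """Return True only for well-formed IDs inside the expected trust domain."""
    try:
        domain, _path = parse_spiffe_id(spiffe_id)
    except ValueError:
        return False
    return domain == trust_domain
```

For example, `is_trusted("spiffe://citadel.mesh/agent/security")` is true, while an ID from any other trust domain, or a plain `https://` URL, is rejected regardless of which VLAN the caller sits on.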

Edge Software Stack

Operating System

Ubuntu LTS Core (22.04 LTS or newer)

  • Minimal attack surface
  • Long-term support (5 years)
  • Secure boot with TPM
  • Automated security patching
  • CIS hardened baseline

Container Orchestration

K3s (lightweight Kubernetes)

Why K3s over full Kubernetes:

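  • Lightweight: Single binary under 100 MB with a low memory footprint
  • Edge-Optimized: Runs on resource-constrained hardware
  • Single Binary: Easy deployment and updates
  • Compatible: Certified Kubernetes distribution, so manifests written for AKS or EKS generally apply unchanged
  • Embedded Components: Built-in ingress (Traefik), storage (local-path provisioner), and service load balancer (ServiceLB)

# Install K3s
# Note: mode 644 makes /etc/rancher/k3s/k3s.yaml world-readable; use a stricter mode on shared hosts
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644

# Verify installation
kubectl get nodes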

Messaging Infrastructure

MQTT (Mosquitto): Low-latency pub/sub for device telemetry

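# mosquitto.conf
# Plaintext listener; restrict or remove it outside trusted test setups
listener 1883
protocol mqtt

# TLS listener with mutual authentication (mTLS).
# Comments must sit on their own line: mosquitto.conf has no inline comments.
listener 8883
protocol mqtt
cafile /etc/mosquitto/ca.crt
certfile /etc/mosquitto/server.crt
keyfile /etc/mosquitto/server.key
require_certificate true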

NATS JetStream: Reliable event streaming with persistence

# NATS deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 1
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats  # must match spec.selector
    spec:
      containers:
        - name: nats
          image: nats:2.10-alpine
          args:
            - "-c"
            - "/etc/nats/nats-server.conf"
            - "-js"  # Enable JetStream
          volumeMounts:
            - name: config
              mountPath: /etc/nats
            - name: data
              mountPath: /data  # store_dir in nats-server.conf should point here
      volumes:
        - name: config
          configMap:
            name: nats-config  # assumed ConfigMap holding nats-server.conf
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi

Storage

TimescaleDB: Time-series telemetry and metrics

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: timescaledb
spec:
  serviceName: timescaledb
  replicas: 1
  selector:  # required by apps/v1
    matchLabels:
      app: timescaledb
  template:
    metadata:
      labels:
        app: timescaledb
    spec:
      containers:
        - name: timescaledb
          image: timescale/timescaledb:latest-pg15  # pin an exact version for production
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi  # 90-day retention
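
The 200Gi volume and 90-day retention together imply an average ingest budget. The following back-of-the-envelope sketch works it out; the 20% headroom for indexes and write-ahead logs is an assumption, not a figure from this document, and compression would raise the effective budget.

```python
GIB = 1024 ** 3


def daily_budget_bytes(volume_gib: float, retention_days: int,
                       headroom: float = 0.2) -> float:
    """Average bytes per day that fit the volume over the retention window.

    headroom is the assumed fraction of the volume reserved for indexes,
    WAL, and other overhead.
    """
    usable = volume_gib * GIB * (1.0 - headroom)
    return usable / retention_days


budget = daily_budget_bytes(200, 90)
print(f"{budget / GIB:.2f} GiB/day, {budget / 86_400 / 1024:.1f} KiB/s sustained")
# prints: 1.78 GiB/day, 21.6 KiB/s sustained
```

If measured telemetry volume exceeds this rate, either the volume size or the retention window needs to change.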

Object Cache: Local file storage for video, images

# Use hostPath or local PV for video cache
mkdir -p /var/lib/citadel/cache

Policy Engine

OPA (Open Policy Agent): Policy enforcement sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa
spec:
  selector:  # required by apps/v1
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
        - name: opa
          image: openpolicyagent/opa:latest  # pin an exact version for production
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "/policies"
          volumeMounts:
            - name: policies
              mountPath: /policies
      volumes:
        - name: policies
          configMap:
            name: opa-policies
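
A workload consults OPA through its REST Data API, posting an `input` document to `/v1/data/<policy path>`. The sketch below shows one way an agent might do that using only the standard library. The policy path `citadel/door/allow`, the input fields, and the `localhost` address are all hypothetical; only the port matches the manifest above. The helper fails closed, which is one reasonable choice for an enforcement point but not something this document prescribes.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # port from the manifest; host is an assumption


def build_opa_request(policy_path: str, input_doc: dict,
                      base_url: str = OPA_URL) -> urllib.request.Request:
    """Build a POST to OPA's Data API: /v1/data/<policy_path>."""
    url = f"{base_url}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def is_allowed(policy_path: str, input_doc: dict, timeout: float = 2.0) -> bool:
    """Fail closed: any error or a missing/false result counts as a denial."""
    try:
        req = build_opa_request(policy_path, input_doc)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("result") is True
    except (OSError, ValueError):
        # Covers connection failures, HTTP errors, timeouts, and bad JSON.
        return False
```

A call such as `is_allowed("citadel/door/allow", {"action": "unlock", "door": "lobby-1"})` returns `True` only when the policy explicitly evaluates to `true`.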

Networking and Security

Network Architecture

┌─────────────────────────────────────────────┐
│                Building Edge                │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ Management VLAN (10.0.1.0/24)       │   │
│   │  - K3s control plane                │   │
│   │  - SPIRE server                     │   │
│   └─────────────────────────────────────┘   │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ Agent VLAN (10.0.10.0/24)           │   │
│   │  - Security/Energy/Automation       │   │
│   │  - NATS/MQTT                        │   │
│   └─────────────────────────────────────┘   │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ BMS VLAN (10.0.20.0/24)             │   │
│   │  - EcoStruxure EBO                  │   │
│   │  - BACnet devices                   │   │
│   └─────────────────────────────────────┘   │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ Security VLAN (10.0.30.0/24)        │   │
│   │  - Security Expert                  │   │
│   │  - Avigilon                         │   │
│   └─────────────────────────────────────┘   │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ Fire/Life Safety VLAN (Isolated)    │   │
│   │  - Bosch fire panel (read-only)     │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
                       │
                       │ WireGuard Tunnel (encrypted)
                       ▼
               ┌────────────────┐
               │ Cloud Control  │
               │     Plane      │
               └────────────────┘

Firewall Rules

# Default deny
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections (conntrack replaces the deprecated "state" match)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow K3s API (from management only)
iptables -A INPUT -p tcp --dport 6443 -s 10.0.1.0/24 -j ACCEPT

# Allow NATS (internal only)
iptables -A INPUT -p tcp --dport 4222 -s 10.0.10.0/24 -j ACCEPT

# Allow WireGuard to cloud
iptables -A OUTPUT -p udp --dport 51820 -d <cloud-ip> -j ACCEPT

# Deny all other outbound (explicit egress allowlist)
iptables -A OUTPUT -j LOG --log-prefix "EGRESS-DENY: "
iptables -A OUTPUT -j DROP
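
The inbound rules above amount to a small allowlist keyed by protocol and destination port. As a way to reason about (or unit-test) that intent outside of iptables, a minimal sketch might model it as follows. The table and function names are illustrative, and the model covers only new inbound connections, not loopback or established traffic.

```python
from ipaddress import ip_address, ip_network

# (protocol, destination port) -> source network allowed to reach it,
# mirroring the two INPUT ACCEPT rules above.
INPUT_ALLOWLIST = {
    ("tcp", 6443): ip_network("10.0.1.0/24"),   # K3s API from the management VLAN
    ("tcp", 4222): ip_network("10.0.10.0/24"),  # NATS from the agent VLAN
}


def input_allowed(src: str, proto: str, dport: int) -> bool:
    """Return True if a new inbound connection matches an ACCEPT rule.

    Anything not listed falls through to the default DROP policy.
    """
    allowed_net = INPUT_ALLOWLIST.get((proto, dport))
    return allowed_net is not None and ip_address(src) in allowed_net
```

For instance, `input_allowed("10.0.10.15", "tcp", 6443)` is false: an agent-VLAN host cannot reach the K3s API even though it can reach NATS on port 4222.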

mTLS with SPIFFE

All components authenticate via SPIFFE SVIDs:

# SPIRE Agent DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent  # must match spec.selector
    spec:
      hostPID: true
      containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.8.0
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate

Offline Autonomy

Offline Operation Modes

Normal Mode (Cloud Connected):

  • Full feature set
  • Cloud analytics and ML models
  • Cross-site coordination
  • Cloud-based approvals

Degraded Mode (Cloud Disconnected):

  • Local control loops continue
  • OPA policies enforced locally
  • NATS queues events for sync
  • Local approvals only (if configured)

Emergency Mode (Severe failure):

  • Life safety preserved
  • Emergency unlock/lock override
  • Minimal logging (local only)
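
The three modes can be read as a small state selection plus a capability table. The sketch below is one hypothetical way to encode that; the capability names are paraphrased from the lists above, and treating "severe failure" as a single `edge_healthy` flag is a simplifying assumption.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"        # cloud connected
    DEGRADED = "degraded"    # cloud disconnected, edge healthy
    EMERGENCY = "emergency"  # severe local failure


_LOCAL = {"local_control", "local_policy", "event_queueing", "local_approvals"}
_CLOUD = {"cloud_analytics", "cross_site_coordination", "cloud_approvals"}

# Capabilities available in each mode (illustrative names).
CAPABILITIES = {
    Mode.NORMAL: _LOCAL | _CLOUD,  # full feature set
    Mode.DEGRADED: _LOCAL,
    Mode.EMERGENCY: {"life_safety", "emergency_override", "local_logging"},
}


def select_mode(cloud_reachable: bool, edge_healthy: bool) -> Mode:
    """Pick the operating mode; a local failure outranks cloud status."""
    if not edge_healthy:
        return Mode.EMERGENCY
    return Mode.NORMAL if cloud_reachable else Mode.DEGRADED


def is_enabled(mode: Mode, capability: str) -> bool:
    """Check whether a capability is available in the given mode."""
    return capability in CAPABILITIES[mode]
```

With this encoding, losing the WAN link moves the site from `Mode.NORMAL` to `Mode.DEGRADED`, which removes only the cloud-backed capabilities and leaves local control and policy enforcement in place.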

Queue-Based Sync

NATS JetStream queues events during disconnection:

// NATS stream config
stream := jetstream.StreamConfig{
    Name:      "citadel-events",
    Subjects:  []string{"telemetry.>", "control.>", "incidents.>"},
    Storage:   jetstream.FileStorage,
    Retention: jetstream.WorkQueuePolicy, // messages are removed once a consumer acknowledges them
    MaxAge:    72 * time.Hour,            // 72-hour retention
    MaxBytes:  10 * 1024 * 1024 * 1024,   // 10 GiB
}

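When the cloud connection is restored, queued events sync automatically. With the limits shown above, events older than 72 hours, or the oldest events once the stream exceeds its size cap, are discarded before they can sync, so an outage longer than that loses the oldest data.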
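
To illustrate how age and size limits interact during an outage, here is a small in-memory stand-in for a stream that discards its oldest entries first. It is a teaching sketch only, not how JetStream is implemented, and the class and method names are hypothetical.

```python
import time
from collections import deque
from typing import Optional


class BoundedEventBuffer:
    """In-memory stand-in for a stream limited by age and total size."""

    def __init__(self, max_age_s: float, max_bytes: int):
        self.max_age_s = max_age_s
        self.max_bytes = max_bytes
        self._events = deque()  # (timestamp, payload) pairs, oldest first
        self._size = 0

    def publish(self, payload: bytes, now: Optional[float] = None) -> None:
        """Append an event, then evict whatever no longer fits the limits."""
        now = time.time() if now is None else now
        self._events.append((now, payload))
        self._size += len(payload)
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop oldest events while they are too old or the buffer is over its cap.
        while self._events and (
            now - self._events[0][0] > self.max_age_s
            or self._size > self.max_bytes
        ):
            _, dropped = self._events.popleft()
            self._size -= len(dropped)

    def drain(self, now: Optional[float] = None) -> list:
        """Return surviving events in order and clear the buffer."""
        now = time.time() if now is None else now
        self._evict(now)
        out = [payload for _, payload in self._events]
        self._events.clear()
        self._size = 0
        return out
```

Calling `drain()` when connectivity returns yields only the events that survived both limits, which is why retention settings should be sized against the longest outage a site is expected to ride out.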

Local Dashboards

Kiosk UI available on local network:

http://citadel-edge.local:8080

Provides:

  • Current building state
  • Recent incidents
  • Manual override controls (with local approval)
  • System health status

Deployment Process

GitOps with ArgoCD

# ArgoCD Application for edge site
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: citadel-edge-building-a
  namespace: argocd
spec:
  project: citadel
  source:
    repoURL: https://github.com/org/citadel-config
    targetRevision: main
    path: sites/building-a
  destination:
    server: https://kubernetes.default.svc
    namespace: citadel
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Signed Container Images

All images signed with Sigstore:

# Sign image
cosign sign --key cosign.key citadel/security-agent:v1.0.0

# Verify the signature (an admission controller should enforce the same check at deploy time)
cosign verify --key cosign.pub citadel/security-agent:v1.0.0

Configuration Management

Helm charts + Kustomize for site-specific config:

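# sites/building-a/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# "resources" and "patches" replace the deprecated "bases" and "patchesStrategicMerge" fields
resources:
  - ../../base

patches:
  - path: agents-config.yaml
  - path: adapters-config.yaml

configMapGenerator:
  - name: site-config
    literals:
      - SITE_ID=building-a
      - TIMEZONE=America/New_York
      - NATS_URL=nats://nats.citadel.svc:4222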
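
On the consuming side, an agent needs to read these site settings at startup. The sketch below assumes the generated `site-config` ConfigMap is exposed to the pod as environment variables (for example via `envFrom`), which this document does not state; the `SiteConfig` class is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlsplit

REQUIRED_KEYS = ("SITE_ID", "TIMEZONE", "NATS_URL")


@dataclass(frozen=True)
class SiteConfig:
    site_id: str
    timezone: str
    nats_url: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "SiteConfig":
        """Load site settings, failing fast on missing or malformed values."""
        missing = [key for key in REQUIRED_KEYS if not env.get(key)]
        if missing:
            raise ValueError(f"missing site settings: {', '.join(missing)}")
        # Accept plain and TLS NATS URLs only.
        if urlsplit(env["NATS_URL"]).scheme not in ("nats", "tls"):
            raise ValueError(f"unexpected NATS URL: {env['NATS_URL']!r}")
        return cls(env["SITE_ID"], env["TIMEZONE"], env["NATS_URL"])
```

Failing at startup on a missing `SITE_ID` is usually preferable to an agent that runs with defaults belonging to another building.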

Monitoring and Diagnostics

Health Checks

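# Agent health endpoint (assumes FastAPI, as the decorator style suggests)
import time

from fastapi import FastAPI

app = FastAPI()
start_time = time.time()

# config, svid, event_bus, and opa_client are provided by the agent runtime.

@app.get("/health")
async def health():
    nats_connected = event_bus.is_connected()
    opa_available = await opa_client.ping()
    return {
        # Report "degraded" instead of "healthy" when a dependency is down
        "status": "healthy" if nats_connected and opa_available else "degraded",
        "agent_id": config.agent_id,
        "spiffe_id": svid.spiffe_id,
        "nats_connected": nats_connected,
        "opa_available": opa_available,
        "uptime_seconds": time.time() - start_time,
    }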

OpenTelemetry Collection

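# OTel Collector deployment
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:  # required by apps/v1
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest  # pin an exact version for production
          args:
            # Assumes the ConfigMap stores the config under the key config.yaml
            - "--config=/etc/otel/config.yaml"
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config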

See Also