Edge Architecture and Deployment
CitadelMesh is edge-first: each building operates autonomously with local compute, storage, and policy enforcement. This document describes the edge deployment model, offline autonomy, and fail-safe design.
Edge-First Philosophy
Buildings Must Work Offline: Cloud connectivity enhances capabilities but is not required for core operations.
Design Principles:
- Local Autonomy: Critical control loops run entirely at the edge
- Cloud Sync: The cloud provides orchestration, analytics, and long-term storage
- Graceful Degradation: Loss of cloud connectivity reduces features, not safety
- Security Isolation: The edge enforces zero-trust even within the building network
Edge Hardware Profile
Industrial PC Specification
Recommended Spec (per building/large zone):
- CPU: 4-8 cores (Intel i5/i7 or AMD Ryzen)
- RAM: 16-32 GB
- Storage: 512 GB - 1 TB NVMe SSD (RAID 1 for redundancy)
- Network: Dual Gigabit Ethernet (redundant links)
- Optional: GPU/NPU for video analytics
- Security: TPM 2.0 for secure boot and key storage
- Power: UPS-backed with 4-hour runtime
Example Products:
- Dell Edge Gateway 5200
- HP t740 Thin Client
- Advantech ARK-1123 Series
- Custom build with Ubuntu LTS
Edge Software Stack
graph TB
subgraph "Edge Node"
subgraph "Orchestration Layer"
K3s[K3s Kubernetes]
end
subgraph "Agent Layer"
Security[Security Agent Pod]
Energy[Energy Agent Pod]
Automation[Automation Agent Pod]
end
subgraph "Adapter Layer"
SSE[Security Expert Adapter]
Avigilon[Avigilon Adapter]
EBO[EBO Adapter]
HA[Home Assistant Adapter]
end
subgraph "Messaging Layer"
MQTT[Mosquitto MQTT]
NATS[NATS JetStream]
end
subgraph "Storage Layer"
TSDB[TimescaleDB]
Cache[File Cache]
end
subgraph "Security Layer"
SPIRE[SPIRE Agent]
OPA[OPA Engine]
end
subgraph "Observability Layer"
OTel[OTel Collector]
end
end
K3s --> Security
K3s --> Energy
K3s --> Automation
K3s --> SSE
K3s --> Avigilon
K3s --> EBO
K3s --> HA
Security --> NATS
Energy --> NATS
Automation --> NATS
SSE --> NATS
Avigilon --> NATS
EBO --> NATS
HA --> NATS
NATS --> TSDB
Avigilon --> Cache
Security --> SPIRE
Energy --> SPIRE
Security --> OPA
Energy --> OPA
Security --> OTel
Energy --> OTel
Operating System
Ubuntu LTS Core (22.04 LTS or newer)
- Minimal attack surface
- Long-term support (5 years)
- Secure boot with TPM
- Automated security patching
- CIS hardened baseline
Container Orchestration
K3s (lightweight Kubernetes)
Why K3s over full Kubernetes:
- Lightweight: Small single binary (under 100 MB), low memory footprint
- Edge-Optimized: Runs on resource-constrained hardware
- Single Binary: Easy deployment and updates
- Compatible: Same APIs as full Kubernetes, so manifests written for AKS/EKS also apply at the edge
- Embedded Components: Built-in ingress, storage, service LB
# Install K3s
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
# Verify installation
kubectl get nodes
Messaging Infrastructure
MQTT (Mosquitto): Low-latency pub/sub for device telemetry
# mosquitto.conf
# Plaintext listener
listener 1883
protocol mqtt
# TLS listener
listener 8883
protocol mqtt
cafile /etc/mosquitto/ca.crt
certfile /etc/mosquitto/server.crt
keyfile /etc/mosquitto/server.key
# mTLS: clients must present a certificate signed by cafile
require_certificate true
NATS JetStream: Reliable event streaming with persistence
# NATS deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 1
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats  # must match spec.selector
    spec:
      containers:
        - name: nats
          image: nats:2.10-alpine
          args:
            - "-c"
            - "/etc/nats/nats-server.conf"
            - "-js"  # Enable JetStream
          volumeMounts:
            - name: config
              mountPath: /etc/nats
            - name: data
              mountPath: /data  # point JetStream store_dir here in nats-server.conf
      volumes:
        - name: config
          configMap:
            name: nats-config  # assumed name of the ConfigMap holding nats-server.conf
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
Storage
TimescaleDB: Time-series telemetry and metrics
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: timescaledb
spec:
  serviceName: timescaledb
  replicas: 1
  selector:
    matchLabels:
      app: timescaledb
  template:
    metadata:
      labels:
        app: timescaledb  # must match spec.selector
    spec:
      containers:
        - name: timescaledb
          image: timescale/timescaledb:latest-pg15
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi  # 90-day retention
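The 200Gi request paired with 90-day retention implies a write budget of roughly 2.2 GiB per day. The sketch below is a back-of-the-envelope check of whether a given telemetry load fits. It assumes uncompressed rows, and the point count, sampling interval, and row size in the example call are hypothetical figures, not values from this deployment.

```python
GIB = 1024 ** 3
SECONDS_PER_DAY = 86_400


def daily_budget_bytes(volume_gib: float, retention_days: int) -> float:
    """Average bytes per day that fit in the volume over the retention window."""
    return volume_gib * GIB / retention_days


def required_gib(points: int, interval_s: float, row_bytes: int, retention_days: int) -> float:
    """Uncompressed storage (GiB) for `points` sensors sampled every `interval_s` seconds."""
    rows_per_day = points * (SECONDS_PER_DAY / interval_s)
    return rows_per_day * row_bytes * retention_days / GIB


if __name__ == "__main__":
    # Budget implied by the manifest: 200 GiB over 90 days
    print(f"budget: {daily_budget_bytes(200, 90) / GIB:.2f} GiB/day")
    # Hypothetical load: 5,000 points every 15 s at 60 bytes per row
    print(f"needed: {required_gib(5_000, 15, 60, 90):.1f} GiB")
```

Native compression in TimescaleDB usually lowers the real footprint, so treat the result as an upper bound when sizing the volume.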
Object Cache: Local file storage for video, images
# Use hostPath or local PV for video cache
mkdir -p /var/lib/citadel/cache
Policy Engine
OPA (Open Policy Agent): Policy enforcement sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa
spec:
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa  # must match spec.selector
    spec:
      containers:
        - name: opa
          image: openpolicyagent/opa:latest
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "/policies"
          volumeMounts:
            - name: policies
              mountPath: /policies
      volumes:
        - name: policies
          configMap:
            name: opa-policies
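An agent would typically consult this OPA instance through its REST Data API (`POST /v1/data/<path>` with an `input` document). The following is a minimal sketch of such a client using only the standard library. The host name, the policy path `citadel/authz/allow`, and the shape of the input document are assumptions for illustration; only port 8181 comes from the manifest above. The client fails closed: any error or undefined result is treated as a denial.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"    # port from --addr above; host is an assumption
POLICY_PATH = "citadel/authz/allow"  # hypothetical policy path


def build_query(input_doc: dict, base_url: str = OPA_URL,
                path: str = POLICY_PATH) -> urllib.request.Request:
    """Build the Data API request without sending it."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/data/{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> bool:
    """Allow only on an explicit {"result": true}; anything else denies."""
    try:
        doc = json.loads(response_body)
    except ValueError:
        return False
    return isinstance(doc, dict) and doc.get("result") is True


def is_allowed(input_doc: dict, timeout_s: float = 1.0) -> bool:
    """Query OPA and fail closed on network errors or timeouts."""
    try:
        with urllib.request.urlopen(build_query(input_doc), timeout=timeout_s) as resp:
            return parse_decision(resp.read())
    except OSError:
        return False
```

A call such as `is_allowed({"action": "unlock", "door": "lobby"})` returns `False` whenever OPA is unreachable, which keeps a policy outage from silently permitting control actions.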
Networking and Security
Network Architecture
┌─────────────────────────────────────────────┐
│ Building Edge │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Management VLAN (10.0.1.0/24) │ │
│ │ - K3s control plane │ │
│ │ - SPIRE server │ │
│ └─────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Agent VLAN (10.0.10.0/24) │ │
│ │ - Security/Energy/Automation │ │
│ │ - NATS/MQTT │ │
│ └─────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ BMS VLAN (10.0.20.0/24) │ │
│ │ - EcoStruxure EBO │ │
│ │ - BACnet devices │ │
│ └─────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Security VLAN (10.0.30.0/24) │ │
│ │ - Security Expert │ │
│ │ - Avigilon │ │
│ └─────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Fire/Life Safety VLAN (Isolated) │ │
│ │ - Bosch fire panel (read-only) │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
│
│ WireGuard Tunnel (encrypted)
▼
┌────────────────┐
│ Cloud Control │
│ Plane │
└────────────────┘
Firewall Rules
# Default deny
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
# Allow established connections (conntrack supersedes the deprecated state match)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow K3s API (from management only)
iptables -A INPUT -p tcp --dport 6443 -s 10.0.1.0/24 -j ACCEPT
# Allow NATS (internal only)
iptables -A INPUT -p tcp --dport 4222 -s 10.0.10.0/24 -j ACCEPT
# Allow WireGuard to cloud
iptables -A OUTPUT -p udp --dport 51820 -d <cloud-ip> -j ACCEPT
# Deny all other outbound (explicit egress allowlist)
iptables -A OUTPUT -j LOG --log-prefix "EGRESS-DENY: "
iptables -A OUTPUT -j DROP
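Before applying rules like these to a host, it can help to encode the intended ingress policy as data and test it. The sketch below models only the two port-specific INPUT rules above (new TCP connections); loopback and established-connection handling are deliberately left out, and the function name is illustrative.

```python
from ipaddress import ip_address, ip_network

# Ingress allowlist mirroring the INPUT rules: destination port -> permitted source subnet
INGRESS_ALLOW = {
    6443: ip_network("10.0.1.0/24"),   # K3s API, management VLAN only
    4222: ip_network("10.0.10.0/24"),  # NATS, agent VLAN only
}


def ingress_allowed(src: str, dport: int) -> bool:
    """Return True if a new TCP connection from `src` to `dport` should be accepted."""
    allowed_net = INGRESS_ALLOW.get(dport)
    # Default deny: ports without an entry are always rejected
    return allowed_net is not None and ip_address(src) in allowed_net


if __name__ == "__main__":
    for src, port in [("10.0.1.5", 6443), ("10.0.10.5", 6443), ("10.0.10.7", 4222)]:
        print(src, port, ingress_allowed(src, port))
```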
mTLS with SPIFFE
All components authenticate via SPIFFE SVIDs:
# SPIRE Agent DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent  # must match spec.selector
    spec:
      hostPID: true
      containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.8.0
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate
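Once a peer's SVID has been verified by the TLS layer, the receiving component still has to decide whether that SPIFFE ID is one it accepts. The sketch below shows a simplified parser and an allowlist check. The trust domain `citadel.example` and the path layout are hypothetical, and the validation covers only the basic shape of a SPIFFE ID (`spiffe://<trust-domain>/<path>`), not every rule in the specification.

```python
import re
from typing import NamedTuple


class SpiffeID(NamedTuple):
    trust_domain: str
    path: str


_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def parse_spiffe_id(value: str) -> SpiffeID:
    """Split a SPIFFE ID into trust domain and path, rejecting malformed input."""
    prefix = "spiffe://"
    if not value.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    trust_domain, sep, path = value[len(prefix):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError("invalid trust domain")
    if sep and not path:
        raise ValueError("trailing slash is not allowed")
    for segment in (path.split("/") if path else []):
        # Reject empty segments and dot segments that could enable path tricks
        if not _SEGMENT.fullmatch(segment) or segment in (".", ".."):
            raise ValueError("invalid path segment")
    return SpiffeID(trust_domain, "/" + path if path else "")


def is_authorized(peer_id: str, trust_domain: str, allowed_paths: set) -> bool:
    """Accept only well-formed IDs in our trust domain with an allowlisted path."""
    try:
        sid = parse_spiffe_id(peer_id)
    except ValueError:
        return False
    return sid.trust_domain == trust_domain and sid.path in allowed_paths
```

For example, `is_authorized("spiffe://citadel.example/agent/security", "citadel.example", {"/agent/security"})` accepts the peer, while an ID from any other trust domain is rejected.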
Offline Autonomy
Offline Operation Modes
Normal Mode (Cloud Connected):
- Full feature set
- Cloud analytics and ML models
- Cross-site coordination
- Cloud-based approvals
Degraded Mode (Cloud Disconnected):
- Local control loops continue
- OPA policies enforced locally
- NATS queues events for sync
- Local approvals only (if configured)
Emergency Mode (Severe failure):
- Life safety preserved
- Emergency unlock/lock override
- Minimal logging (local only)
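The three modes can be read as a small state selection driven by two health signals. The sketch below is one way to express that, assuming hypothetical inputs `cloud_reachable` and `local_core_healthy` and illustrative feature names; how those signals are actually measured is not specified here.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    EMERGENCY = "emergency"


# Feature sets per mode, paraphrasing the lists above (names are illustrative)
FEATURES = {
    Mode.NORMAL: {"local_control", "local_policy", "cloud_analytics",
                  "cross_site", "cloud_approvals"},
    Mode.DEGRADED: {"local_control", "local_policy", "event_queueing",
                    "local_approvals"},
    Mode.EMERGENCY: {"life_safety", "emergency_override", "local_logging"},
}


def select_mode(cloud_reachable: bool, local_core_healthy: bool) -> Mode:
    """Pick the operating mode; a local failure outranks cloud connectivity."""
    if not local_core_healthy:
        return Mode.EMERGENCY
    return Mode.NORMAL if cloud_reachable else Mode.DEGRADED


def feature_enabled(feature: str, mode: Mode) -> bool:
    """Check whether a feature is available in the given mode."""
    return feature in FEATURES[mode]
```

Checking local health first means a severe edge failure always lands in Emergency Mode, even when the cloud link is still up.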
Queue-Based Sync
NATS JetStream queues events during disconnection:
// NATS stream config
stream := jetstream.StreamConfig{
    Name:      "citadel-events",
    Subjects:  []string{"telemetry.>", "control.>", "incidents.>"},
    Storage:   jetstream.FileStorage,
    Retention: jetstream.WorkQueuePolicy,
    MaxAge:    72 * time.Hour,           // 72-hour retention
    MaxBytes:  10 * 1024 * 1024 * 1024, // 10 GiB
}
When cloud reconnects, queued events sync automatically.
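To make the effect of the age and size limits concrete, here is a small in-memory model of a bounded queue that is drained on reconnect. It is a sketch of the behavior only, not of JetStream itself: it assumes the oldest events are discarded first when a limit is exceeded, and `send` stands in for whatever forwards an event to the cloud.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BoundedQueue:
    max_age_s: float = 72 * 3600     # mirrors MaxAge above
    max_bytes: int = 10 * 1024 ** 3  # mirrors MaxBytes above
    _items: deque = field(default_factory=deque)
    _size: int = 0

    def publish(self, payload: bytes, now: float) -> None:
        """Store an event with its timestamp, then enforce the limits."""
        self._items.append((now, payload))
        self._size += len(payload)
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop oldest events while they are too old or the queue is too large
        while self._items and (
            now - self._items[0][0] > self.max_age_s or self._size > self.max_bytes
        ):
            _, old = self._items.popleft()
            self._size -= len(old)

    def drain(self, send, now: float) -> int:
        """Forward events oldest-first; stop at the first failed send."""
        self._evict(now)
        sent = 0
        while self._items:
            _, payload = self._items[0]
            if not send(payload):
                break  # keep the event queued for the next attempt
            self._items.popleft()
            self._size -= len(payload)
            sent += 1
        return sent
```

An event is removed only after `send` reports success, so a connection that drops mid-sync leaves the remaining events queued. An outage longer than the age limit, or a backlog beyond the size limit, loses the oldest events.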
Local Dashboards
Kiosk UI available on local network:
http://citadel-edge.local:8080
Provides:
- Current building state
- Recent incidents
- Manual override controls (with local approval)
- System health status
Deployment Process
GitOps with ArgoCD
# ArgoCD Application for edge site
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: citadel-edge-building-a
  namespace: argocd
spec:
  project: citadel
  source:
    repoURL: https://github.com/org/citadel-config
    targetRevision: main
    path: sites/building-a
  destination:
    server: https://kubernetes.default.svc
    namespace: citadel
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Signed Container Images
All images signed with Sigstore:
# Sign image
cosign sign --key cosign.key citadel/security-agent:v1.0.0
# Verify signature (an admission controller can enforce the same check at deploy time)
cosign verify --key cosign.pub citadel/security-agent:v1.0.0
Configuration Management
Helm charts + Kustomize for site-specific config:
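# sites/building-a/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# "resources" and "patches" replace the deprecated "bases" and "patchesStrategicMerge"
resources:
  - ../../base
patches:
  - path: agents-config.yaml
  - path: adapters-config.yaml
configMapGenerator:
  - name: site-config
    literals:
      - SITE_ID=building-a
      - TIMEZONE=America/New_York
      - NATS_URL=nats://nats.citadel.svc:4222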
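On the agent side, these literals can be read and validated once at startup. The sketch below assumes the `site-config` ConfigMap is exposed to the pod as environment variables (for example via `envFrom`), which is not stated above; `SiteConfig` and `load_site_config` are illustrative names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SiteConfig:
    site_id: str
    timezone: str
    nats_url: str


def load_site_config(env: Mapping[str, str] = os.environ) -> SiteConfig:
    """Read site settings from the environment and fail fast on bad values."""
    required = ("SITE_ID", "TIMEZONE", "NATS_URL")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing site config: {', '.join(missing)}")
    if not env["NATS_URL"].startswith(("nats://", "tls://")):
        raise RuntimeError("NATS_URL must start with nats:// or tls://")
    return SiteConfig(env["SITE_ID"], env["TIMEZONE"], env["NATS_URL"])


if __name__ == "__main__":
    cfg = load_site_config({
        "SITE_ID": "building-a",
        "TIMEZONE": "America/New_York",
        "NATS_URL": "nats://nats.citadel.svc:4222",
    })
    print(cfg)
```

Failing at startup on a missing or malformed value surfaces a bad site overlay as a crash-looping pod rather than as an agent that runs with the wrong site identity.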
Monitoring and Diagnostics
Health Checks
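# Agent health endpoint
# (app, config, svid, event_bus, opa_client, and start_time are defined elsewhere in the agent)
import time

@app.get("/health")
async def health():
    # Probe dependencies first so "status" reflects their actual state
    nats_connected = event_bus.is_connected()
    opa_available = await opa_client.ping()
    return {
        "status": "healthy" if (nats_connected and opa_available) else "degraded",
        "agent_id": config.agent_id,
        "spiffe_id": svid.spiffe_id,
        "nats_connected": nats_connected,
        "opa_available": opa_available,
        "uptime_seconds": time.time() - start_time,
    }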
OpenTelemetry Collection
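# OTel Collector deployment
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector  # must match spec.selector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          args:
            # Point the collector at the mounted config; the key name config.yaml is an assumption
            - "--config=/etc/otel/config.yaml"
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config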
Related Documentation
- Overview - Edge in overall architecture
- Cloud Integration - Edge-to-cloud sync
- Observability - Edge monitoring
- Identity Foundation - SPIFFE at edge