Edge Architecture and Deployment

CitadelMesh is edge-first: each building operates autonomously with local compute, storage, and policy enforcement. This document describes the edge deployment model, offline autonomy, and fail-safe design.

Edge-First Philosophy

Buildings Must Work Offline: Cloud connectivity enhances capabilities but is not required for core operations.

Design Principles:

  1. Local Autonomy: Critical control loops run entirely at the edge
  2. Cloud Sync: The cloud provides orchestration, analytics, and long-term storage
  3. Graceful Degradation: Loss of cloud connectivity reduces features, not safety
  4. Security Isolation: Edge enforces zero-trust even within building network

Edge Hardware Profile

Industrial PC Specification

Recommended Spec (per building/large zone):

  • CPU: 4-8 cores (Intel i5/i7 or AMD Ryzen)
  • RAM: 16-32 GB
  • Storage: 512 GB - 1 TB NVMe SSD (RAID 1 for redundancy)
  • Network: Dual Gigabit Ethernet (redundant links)
  • Optional: GPU/NPU for video analytics
  • Security: TPM 2.0 for secure boot and key storage
  • Power: UPS-backed with 4-hour runtime

Example Products:

  • Dell Edge Gateway 5200
  • HP t740 Thin Client
  • Advantech ARK-1123 Series
  • Custom build with Ubuntu LTS

Edge Software Stack

graph TB
    subgraph "Edge Node"
        subgraph "Orchestration Layer"
            K3s[K3s Kubernetes]
        end

        subgraph "Agent Layer"
            Security[Security Agent Pod]
            Energy[Energy Agent Pod]
            Automation[Automation Agent Pod]
        end

        subgraph "Adapter Layer"
            SSE[Security Expert Adapter]
            Avigilon[Avigilon Adapter]
            EBO[EBO Adapter]
            HA[Home Assistant Adapter]
        end

        subgraph "Messaging Layer"
            MQTT[Mosquitto MQTT]
            NATS[NATS JetStream]
        end

        subgraph "Storage Layer"
            TSDB[TimescaleDB]
            Cache[File Cache]
        end

        subgraph "Security Layer"
            SPIRE[SPIRE Agent]
            OPA[OPA Engine]
        end

        subgraph "Observability Layer"
            OTel[OTel Collector]
        end
    end

    K3s --> Security
    K3s --> Energy
    K3s --> Automation
    K3s --> SSE
    K3s --> Avigilon
    K3s --> EBO
    K3s --> HA

    Security --> NATS
    Energy --> NATS
    Automation --> NATS
    SSE --> NATS
    Avigilon --> NATS
    EBO --> NATS
    HA --> NATS

    NATS --> TSDB
    Avigilon --> Cache

    Security --> SPIRE
    Energy --> SPIRE
    Security --> OPA
    Energy --> OPA

    Security --> OTel
    Energy --> OTel

Operating System

Ubuntu LTS Core (22.04 LTS or newer)

  • Minimal attack surface
  • Long-term support (5 years)
  • Secure boot with TPM
  • Automated security patching
  • CIS hardened baseline

Container Orchestration

K3s (lightweight Kubernetes)

Why K3s over full Kubernetes:

  • Lightweight: Small footprint on CPU, memory, and disk
  • Edge-Optimized: Runs on resource-constrained hardware
  • Single Binary: Easy deployment and updates
  • Compatible: Same APIs as full Kubernetes (AKS/EKS compatible)
  • Embedded Components: Built-in ingress, storage, service LB
# Install K3s
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644

# Verify installation
kubectl get nodes

Messaging Infrastructure

MQTT (Mosquitto): Low-latency pub/sub for device telemetry

# mosquitto.conf
listener 1883
protocol mqtt

listener 8883
protocol mqtt
cafile /etc/mosquitto/ca.crt
certfile /etc/mosquitto/server.crt
keyfile /etc/mosquitto/server.key
# Require client certificates (mTLS); Mosquitto does not allow trailing comments
require_certificate true
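
As a sketch of how an adapter might publish over the mTLS listener, the following uses paho-mqtt; the hostname, certificate paths, and topic layout are illustrative assumptions, not part of the CitadelMesh contract.

import json
import time

import paho.mqtt.client as mqtt

# paho-mqtt 1.x constructor; on 2.x pass mqtt.CallbackAPIVersion.VERSION2 first.
client = mqtt.Client(client_id="ebo-adapter")
# Client certificate issued by the same CA configured in mosquitto.conf.
client.tls_set(
    ca_certs="/etc/citadel/ca.crt",
    certfile="/etc/citadel/client.crt",
    keyfile="/etc/citadel/client.key",
)
client.connect("citadel-edge.local", 8883)
client.loop_start()

# Hypothetical telemetry topic and payload shape.
payload = json.dumps({"point": "AHU-1/supply_temp", "value": 18.4, "ts": time.time()})
info = client.publish("telemetry/building-a/hvac/ahu-1", payload, qos=1)
info.wait_for_publish()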

NATS JetStream: Reliable event streaming with persistence

# NATS deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 1
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.10-alpine
          args:
            - "-c"
            - "/etc/nats/nats-server.conf"
            - "-js"   # Enable JetStream
          volumeMounts:
            - name: config
              mountPath: /etc/nats
            - name: data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: nats-config   # assumed name of the ConfigMap holding nats-server.conf
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
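
For reference, a minimal sketch of an agent publishing into the local JetStream with nats-py; the subject name is an assumption, and the stream mirrors the "citadel-events" configuration shown under Queue-Based Sync below.

import asyncio
import json

import nats

async def publish_reading():
    nc = await nats.connect("nats://nats.citadel.svc:4222")
    js = nc.jetstream()
    # Idempotently ensure the local stream exists (see Queue-Based Sync below).
    await js.add_stream(name="citadel-events",
                        subjects=["telemetry.>", "control.>", "incidents.>"])
    # Hypothetical subject and payload for an energy meter reading.
    ack = await js.publish("telemetry.building-a.energy.main-meter",
                           json.dumps({"kw": 42.7}).encode())
    print(f"stored in {ack.stream} at seq {ack.seq}")
    await nc.close()

asyncio.run(publish_reading())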

Storage

TimescaleDB: Time-series telemetry and metrics

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: timescaledb
spec:
  serviceName: timescaledb
  replicas: 1
  selector:
    matchLabels:
      app: timescaledb
  template:
    metadata:
      labels:
        app: timescaledb
    spec:
      containers:
        - name: timescaledb
          image: timescale/timescaledb:latest-pg15
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi # 90-day retention
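
A minimal sketch of the matching schema setup from the agent side using psycopg2; the telemetry table, columns, and connection details are illustrative assumptions rather than a fixed CitadelMesh schema.

import os

import psycopg2

conn = psycopg2.connect(
    host="timescaledb.citadel.svc",
    dbname="citadel",
    user="citadel",
    password=os.environ["POSTGRES_PASSWORD"],
)
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS telemetry (
            time    TIMESTAMPTZ NOT NULL,
            site_id TEXT        NOT NULL,
            point   TEXT        NOT NULL,
            value   DOUBLE PRECISION
        );
    """)
    # Turn the table into a hypertable and align retention with the 200Gi claim above.
    cur.execute("SELECT create_hypertable('telemetry', 'time', if_not_exists => TRUE);")
    cur.execute("SELECT add_retention_policy('telemetry', INTERVAL '90 days', if_not_exists => TRUE);")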

Object Cache: Local file storage for video, images

# Use hostPath or local PV for video cache
mkdir -p /var/lib/citadel/cache

Policy Engine

OPA (Open Policy Agent): Policy enforcement sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa
spec:
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
        - name: opa
          image: openpolicyagent/opa:latest
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "/policies"
          volumeMounts:
            - name: policies
              mountPath: /policies
      volumes:
        - name: policies
          configMap:
            name: opa-policies
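
A sketch of how an agent might query the sidecar from Python via OPA's Data API; the policy package path (citadel/authz) and input fields are assumptions that depend on the policies shipped in the opa-policies ConfigMap.

import requests

def is_allowed(action: dict) -> bool:
    # OPA Data API: POST /v1/data/<package path>; the response is {"result": <value>}.
    resp = requests.post(
        "http://localhost:8181/v1/data/citadel/authz/allow",
        json={"input": action},
        timeout=2,
    )
    resp.raise_for_status()
    return resp.json().get("result", False)

if is_allowed({"agent": "energy", "command": "adjust_setpoint", "zone": "L3-east"}):
    ...  # proceed with the control action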

Networking and Security

Network Architecture

┌─────────────────────────────────────────────┐
│                Building Edge                │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Management VLAN (10.0.1.0/24)         │  │
│  │ - K3s control plane                   │  │
│  │ - SPIRE server                        │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Agent VLAN (10.0.10.0/24)             │  │
│  │ - Security/Energy/Automation          │  │
│  │ - NATS/MQTT                           │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ BMS VLAN (10.0.20.0/24)               │  │
│  │ - EcoStruxure EBO                     │  │
│  │ - BACnet devices                      │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Security VLAN (10.0.30.0/24)          │  │
│  │ - Security Expert                     │  │
│  │ - Avigilon                            │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │ Fire/Life Safety VLAN (Isolated)      │  │
│  │ - Bosch fire panel (read-only)        │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
                      │
                      │ WireGuard Tunnel (encrypted)
                      ▼
              ┌────────────────┐
              │ Cloud Control  │
              │     Plane      │
              └────────────────┘

Firewall Rules

# Default deny
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow K3s API (from management only)
iptables -A INPUT -p tcp --dport 6443 -s 10.0.1.0/24 -j ACCEPT

# Allow NATS (internal only)
iptables -A INPUT -p tcp --dport 4222 -s 10.0.10.0/24 -j ACCEPT

# Allow WireGuard to cloud
iptables -A OUTPUT -p udp --dport 51820 -d <cloud-ip> -j ACCEPT

# Deny all other outbound (explicit egress allowlist)
iptables -A OUTPUT -j LOG --log-prefix "EGRESS-DENY: "
iptables -A OUTPUT -j DROP

mTLS with SPIFFE

All components authenticate via SPIFFE SVIDs:

# SPIRE Agent DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent
    spec:
      hostPID: true
      containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.8.0
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate

Offline Autonomy

Offline Operation Modes

Normal Mode (Cloud Connected):

  • Full feature set
  • Cloud analytics and ML models
  • Cross-site coordination
  • Cloud-based approvals

Degraded Mode (Cloud Disconnected):

  • Local control loops continue
  • OPA policies enforced locally
  • NATS queues events for sync
  • Local approvals only (if configured)

Emergency Mode (Severe Failure):

  • Life safety preserved
  • Emergency unlock/lock override
  • Minimal logging (local only)
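
A minimal sketch of how an edge supervisor could choose between these modes; the probe inputs are placeholders for real checks (WireGuard tunnel health, fire panel heartbeat, and so on).

import enum

class Mode(enum.Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    EMERGENCY = "emergency"

def select_mode(cloud_reachable: bool, life_safety_ok: bool) -> Mode:
    if not life_safety_ok:
        return Mode.EMERGENCY   # life safety preserved, minimal local logging
    if not cloud_reachable:
        return Mode.DEGRADED    # local control loops, local OPA, events queued
    return Mode.NORMAL          # full feature set with cloud analytics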

Queue-Based Sync

NATS JetStream queues events during disconnection:

// NATS stream config
stream := jetstream.StreamConfig{
	Name:      "citadel-events",
	Subjects:  []string{"telemetry.>", "control.>", "incidents.>"},
	Storage:   jetstream.FileStorage,
	Retention: jetstream.WorkQueuePolicy,
	MaxAge:    72 * time.Hour,          // 72-hour retention
	MaxBytes:  10 * 1024 * 1024 * 1024, // 10 GB
}

When cloud reconnects, queued events sync automatically.
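
A sketch of that sync path with nats-py, assuming a durable pull consumer named cloud-sync and a hypothetical forward_to_cloud() uplink helper.

import asyncio

import nats
from nats.errors import TimeoutError as NatsTimeoutError

async def drain_backlog(forward_to_cloud):
    nc = await nats.connect("nats://nats.citadel.svc:4222")
    js = nc.jetstream()
    sub = await js.pull_subscribe("telemetry.>", durable="cloud-sync",
                                  stream="citadel-events")
    while True:
        try:
            msgs = await sub.fetch(100, timeout=5)
        except NatsTimeoutError:
            break                                  # backlog drained
        for msg in msgs:
            await forward_to_cloud(msg.data)       # hypothetical uplink call
            await msg.ack()                        # acked messages leave the work queue
    await nc.close()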

Local Dashboards

Kiosk UI available on local network:

http://citadel-edge.local:8080

Provides:

  • Current building state
  • Recent incidents
  • Manual override controls (with local approval)
  • System health status

Deployment Process

GitOps with ArgoCD

# ArgoCD Application for edge site
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: citadel-edge-building-a
  namespace: argocd
spec:
  project: citadel
  source:
    repoURL: https://github.com/org/citadel-config
    targetRevision: main
    path: sites/building-a
  destination:
    server: https://kubernetes.default.svc
    namespace: citadel
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Signed Container Images

All images signed with Sigstore:

# Sign image
cosign sign --key cosign.key citadel/security-agent:v1.0.0

# Verify in admission controller
cosign verify --key cosign.pub citadel/security-agent:v1.0.0

Configuration Management

Helm charts + Kustomize for site-specific config:

# sites/building-a/kustomization.yaml
bases:
  - ../../base

patchesStrategicMerge:
  - agents-config.yaml
  - adapters-config.yaml

configMapGenerator:
  - name: site-config
    literals:
      - SITE_ID=building-a
      - TIMEZONE=America/New_York
      - NATS_URL=nats://nats.citadel.svc:4222

Monitoring and Diagnostics

Health Checks

# Agent health endpoint
@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "agent_id": config.agent_id,
        "spiffe_id": svid.spiffe_id,
        "nats_connected": event_bus.is_connected(),
        "opa_available": await opa_client.ping(),
        "uptime_seconds": time.time() - start_time,
    }
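
For example, a node-level watchdog or the kiosk UI could poll this endpoint; the service hostname and port below are assumptions about how the agent is exposed inside the cluster.

import requests

def agent_healthy(url: str = "http://security-agent.citadel.svc:8000/health") -> bool:
    try:
        body = requests.get(url, timeout=2).json()
    except requests.RequestException:
        return False
    return body.get("status") == "healthy" and body.get("nats_connected", False)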

OpenTelemetry Collection

# OTel Collector deployment
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          args:
            - "--config=/etc/otel/config.yaml"   # assumes the ConfigMap key is config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
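
On the agent side, a minimal sketch of exporting traces to this collector with the OpenTelemetry Python SDK; the service name, span names, and the assumption that the collector exposes OTLP/gRPC on port 4317 behind an otel-collector service are illustrative.

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Register a tracer provider that ships spans to the local collector.
provider = TracerProvider(resource=Resource.create({"service.name": "energy-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector.citadel.svc:4317",
                                        insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("citadel.energy")
with tracer.start_as_current_span("evaluate_setpoint") as span:
    span.set_attribute("site.id", "building-a")
    # ... agent work ...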

See Also