
Architecture Overview

CitadelMesh is a polyglot, zero-trust multi-agent platform that enables autonomous building operations while maintaining strict safety guarantees. This document provides a high-level view of the system design, component relationships, and key architectural decisions.

System Goals

CitadelMesh is designed around four core principles:

  1. Autonomy with Safety: Closed-loop control with hard constraints, OPA-enforced policies, and human approval gates for high-impact actions
  2. Interoperability: Heterogeneous building protocols normalized to canonical data models through vendor-agnostic abstractions
  3. Resilience: Edge-first deployment model with offline autonomy and graceful degradation
  4. Auditability: Deterministic agent execution with full replay capability and policy explanation traces

High-Level Architecture

```mermaid
graph TB
    subgraph "Building Edge"
        Agents[Agent Runtime<br/>LangGraph Agents]
        Adapters[Protocol Adapters<br/>BACnet, OPC UA, REST]
        Safety[Safety Layer<br/>OPA Policies]
        EventBus[Event Bridge<br/>MQTT + NATS]
        Store[Local Store<br/>TimescaleDB]

        Agents --> Safety
        Safety --> EventBus
        Adapters --> EventBus
        EventBus --> Store
    end

    subgraph "Vendor Systems"
        SSE[Schneider Security Expert]
        Avigilon[Avigilon CCTV]
        EBO[EcoStruxure EBO]
        HA[Home Assistant]
        PME[Schneider PME]
    end

    subgraph "Cloud Control Plane"
        Orchestrator[Agent Orchestrator]
        Services[.NET Microservices<br/>Aspire + Orleans]
        Twin[Digital Twin Service]
        DataLake[Data Lake + Analytics]
        Observability[OpenTelemetry<br/>Collector]
    end

    Adapters --> SSE
    Adapters --> Avigilon
    Adapters --> EBO
    Adapters --> HA
    Adapters --> PME

    EventBus --> Orchestrator
    EventBus --> Observability
    Agents --> Services
    Services --> Twin
    EventBus --> DataLake
```

Logical Components

Edge Node (Per Building/Zone)

Each building or zone runs a self-contained edge deployment:

  • Agent Runtime: LangGraph-based agents packaged as containers, with optional WASM plugins
  • .NET Sidecars: Optional Dapr sidecars for pub/sub and bindings; Semantic Kernel-based skills for .NET integration
  • Protocol Adapters: Unified adapters for BACnet/SC, OPC UA, KNX, Modbus, MQTT, and Matter
  • Vendor Adapters: System-specific integrations for Schneider Security Expert, Avigilon, EcoStruxure (EBO), Home Assistant, Schneider PME, Bosch Fire/Intrusion
  • Event Bridge: MQTT broker + NATS JetStream for reliable event delivery
  • Safety Layer: OPA/Rego policy engine with rate limits and guardrails
  • Local Stores: TimescaleDB (or SQLite + LiteFS) for telemetry; object cache for video/files
  • UI Kiosk: Optional onsite control interface
  • Grid/DER I/O: OpenADR client, IEC 61850 gateway, IEEE 2030.5/SEP2 client, OCPP for EVSE

Cloud Control Plane

Centralized orchestration and analytics:

  • Orchestrator: Deploys agent graphs, manages versions, handles feature flags
  • .NET Services: Aspire-composed microservices for scheduling, alarms, and long-lived sessions; Orleans actors for stateful workloads
  • Knowledge Services: Vector store, RAG, digital twin graph database
  • Observability: OpenTelemetry collector aggregating traces, metrics, and logs
  • A2A/MCP Gateways: Cross-agent and tool interoperability layer
  • Data Lake + TSDB: Long-term storage for analytics and ML training

Agent Topology

CitadelMesh uses a multi-agent architecture where specialized agents collaborate:

```mermaid
graph LR
    Security[Security Agent] --> Ops[Ops Agent]
    Energy[Energy Agent] --> Ops
    Automation[Automation Agent] --> Ops
    Twin[Twin Agent] --> Security
    Twin --> Energy
    Twin --> Automation
    DER[DER/Grid Agent] --> Energy
```
Compliance[Compliance Agent] --> Ops

Agent Responsibilities

  • Security Agent: Fuses camera analytics, access control, and intrusion sensors; executes security playbooks; escalates incidents
  • Energy Agent: Optimizes HVAC and lighting based on tariffs, weather, and occupancy forecasts; implements safe RL controllers
  • Automation Agent: Orchestrates scenes, schedules, and user intents; integrates with building management systems
  • Ops Agent: Incident triage with human-in-the-loop; report generation; SLA tracking
  • Twin Agent: Maintains digital twin state; reconciles vendor systems; emits derived KPIs
  • DER/Grid Agent: Orchestrates batteries, solar, generators, and EVSE; handles demand response events
  • Compliance Agent: Monitors controls against policies; produces compliance attestations

All agents communicate over the event bus using CloudEvents, authenticated with signed JWTs and carried over mTLS with SPIFFE identities. Cross-runtime RPC uses gRPC with Protobuf contracts.

Data Flow

```mermaid
sequenceDiagram
    participant Vendor as Vendor System
    participant Adapter as Protocol Adapter
    participant Bus as Event Bus
    participant OPA as OPA Policy Engine
    participant Agent as Agent
    participant Twin as Digital Twin

    Vendor->>Adapter: Telemetry/Event
    Adapter->>Bus: CloudEvent(Protobuf)
    Bus->>Twin: State Update
    Bus->>Agent: Event Notification
    Agent->>Agent: Process in LangGraph
    Agent->>OPA: Validate Action
    OPA-->>Agent: Allow/Deny + Reason
    Agent->>Bus: Command(signed)
    Bus->>Adapter: Execute Command
    Adapter->>Vendor: Vendor-Specific Call
    Vendor-->>Adapter: Result
    Adapter->>Bus: CommandResult Event
```
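The "Validate Action" step in the flow above could look roughly like the following sketch, which queries OPA's Data API (`POST /v1/data/<path>` with an `input` document). The OPA address, the policy path, and the `allow`/`reason` result shape are assumptions; the actual CitadelMesh policies may be structured differently. The sketch fails closed: if OPA is unreachable or the policy is undefined, the action is denied.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"                # assumed local OPA address
POLICY_PATH = "citadelmesh/actuation/decision"   # hypothetical policy path


def build_request(command: dict) -> urllib.request.Request:
    """Wrap the command as the `input` document of an OPA Data API query."""
    body = json.dumps({"input": command}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{POLICY_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> tuple[bool, str]:
    """Read allow/reason from OPA's response; a missing `result` means undefined."""
    result = json.loads(response_body).get("result")
    if result is None:
        return False, "policy undefined"
    return bool(result.get("allow", False)), str(result.get("reason", ""))


def validate_action(command: dict, timeout: float = 2.0) -> tuple[bool, str]:
    """Ask OPA for a decision; deny if the policy engine cannot be reached."""
    try:
        with urllib.request.urlopen(build_request(command), timeout=timeout) as resp:
            return parse_decision(resp.read())
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        return False, f"policy engine unreachable: {exc}"
```

An agent would call `validate_action(command)` before publishing the signed command and record the returned reason in its audit trail.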

Data Contract Standards

All data flowing through CitadelMesh follows strict contracts:

  • Canonical Telemetry: { entity_id, metric, value, unit, timestamp, quality, attributes }
  • Control Command: { id, target_id, action, params, ttl_seconds, safety_token, issued_by }
  • Incident: { id, severity, signals[], hypothesis, actions[], owner }

Contracts are serialized with Protobuf. Schemas are versioned, and CI checks gate every schema change.
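As an illustration, the three contracts can be mirrored as Python dataclasses. The field names are taken from the list above; the field types, the default values, and the `is_expired` helper are assumptions, and the real definitions would live in the Protobuf schemas.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta, timezone
from typing import Any


@dataclass(frozen=True)
class Telemetry:
    """Canonical telemetry record (types are assumed)."""
    entity_id: str
    metric: str
    value: float
    unit: str
    timestamp: datetime
    quality: str = "good"                       # assumed quality vocabulary
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ControlCommand:
    """Control command with a time-to-live and a safety token."""
    id: str
    target_id: str
    action: str
    params: dict[str, Any]
    ttl_seconds: int
    safety_token: str
    issued_by: str

    def is_expired(self, issued_at: datetime, now: datetime) -> bool:
        """Hypothetical helper: a command is stale once its TTL has elapsed."""
        return now > issued_at + timedelta(seconds=self.ttl_seconds)


@dataclass(frozen=True)
class Incident:
    """Incident record linking signals to a hypothesis and proposed actions."""
    id: str
    severity: str
    signals: list[str]
    hypothesis: str
    actions: list[str]
    owner: str


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    cmd = ControlCommand(
        id="cmd-1", target_id="ahu-1", action="set_temperature",
        params={"celsius": 21.5}, ttl_seconds=30,
        safety_token="", issued_by="energy-agent",
    )
    print(asdict(cmd))
    print(cmd.is_expired(issued, issued + timedelta(seconds=45)))
```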

Key Architectural Decisions

Protocol-First Design

We prioritize durable, vendor-neutral protocols over framework lock-in:

  1. CloudEvents 1.0 for universal event envelopes
  2. Protobuf for compact, versioned payloads
  3. gRPC for low-latency cross-runtime RPC
  4. MCP (Model Context Protocol) for tool server interoperability

This approach ensures we can swap agent frameworks, languages, or vendors without rewriting integration layers.

Edge-First Deployment

Buildings must operate autonomously when cloud connectivity is degraded:

  • Local event brokers and databases continue operating
  • Queued commands reconcile on reconnection
  • Critical safety policies enforced at the edge
  • Minimal viable operations via local dashboards
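One way to picture "queued commands reconcile on reconnection" is the sketch below: commands are buffered while the uplink is down and flushed in order once it returns. Dropping commands whose TTL has elapsed is an assumption based on the `ttl_seconds` field of the control command contract, and the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueuedCommand:
    id: str
    action: str
    ttl_seconds: float
    queued_at: float  # timestamp in seconds, supplied by the caller


class OfflineCommandQueue:
    """Buffers commands while disconnected and replays them in FIFO order."""

    def __init__(self) -> None:
        self._pending: deque[QueuedCommand] = deque()

    def enqueue(self, command_id: str, action: str, ttl_seconds: float, now: float) -> None:
        self._pending.append(QueuedCommand(command_id, action, ttl_seconds, now))

    def reconcile(self, send: Callable[[QueuedCommand], None], now: float) -> tuple[list[str], list[str]]:
        """Send commands that are still fresh; drop those whose TTL has elapsed."""
        sent: list[str] = []
        dropped: list[str] = []
        while self._pending:
            cmd = self._pending.popleft()
            if now - cmd.queued_at > cmd.ttl_seconds:
                dropped.append(cmd.id)   # stale: acting on it now could be unsafe
            else:
                send(cmd)
                sent.append(cmd.id)
        return sent, dropped


if __name__ == "__main__":
    queue = OfflineCommandQueue()
    queue.enqueue("a", "set_temperature", ttl_seconds=60, now=0.0)
    queue.enqueue("b", "update_schedule", ttl_seconds=600, now=0.0)
    print(queue.reconcile(lambda cmd: None, now=120.0))
```

Passing the clock in as `now` keeps the queue deterministic, which also makes it easier to test and replay.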

Zero-Trust Security Model

Every component assumes the network is hostile:

  • SPIFFE/SPIRE for workload identity with mTLS everywhere
  • Least privilege tokens scoped to specific capabilities
  • Policy enforcement at every boundary (OPA)
  • Audit trails for all tool calls and actuation events
  • Secret management via Vault or cloud KMS

Polyglot Runtime

Different workloads demand different languages:

  • Python/LangGraph for fast agent iteration and rich AI ecosystem
  • .NET/Aspire/Orleans for high-throughput stateful services
  • TypeScript for frontend and Node.js tooling
  • Dapr sidecars for language-agnostic building blocks

gRPC with shared Protobuf contracts connects services across these runtimes.

Safety-First Control

Autonomous operation requires multiple safety layers:

  1. Hard Constraints: OPA policies block invalid actions (e.g., temperature bounds, egress locks)
  2. Shadow Mode: Evaluate new policies without actuation
  3. Multi-Step Approvals: Human gates for high-impact changes
  4. Circuit Breakers: Automatic rollback on anomaly detection
  5. Safety Tokens: Signed approvals required for critical actions
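The sketch below shows how three of these layers (hard constraints, shadow mode, and safety tokens) might compose in a single check. It is a plain-Python stand-in for logic that the platform enforces through OPA/Rego; the temperature bounds, the `disarm_zone` action, and the HMAC-based token format are all assumptions. Multi-step approvals and circuit breakers are not modeled.

```python
import hashlib
import hmac
from dataclasses import dataclass

TEMP_BOUNDS_C = (16.0, 28.0)          # assumed hard limits
CRITICAL_ACTIONS = {"disarm_zone"}    # hypothetical action that needs a token


@dataclass
class Decision:
    allow: bool
    actuate: bool
    reason: str


def sign_approval(key: bytes, command_id: str) -> str:
    """Issue a safety token bound to one command id (assumed token format)."""
    return hmac.new(key, command_id.encode("utf-8"), hashlib.sha256).hexdigest()


def evaluate(command: dict, key: bytes, shadow_mode: bool = False) -> Decision:
    # Layer 1: hard constraints reject the command outright.
    if command["action"] == "set_temperature":
        low, high = TEMP_BOUNDS_C
        if not low <= command["params"]["celsius"] <= high:
            return Decision(False, False, "temperature outside hard bounds")

    # Layer 5: critical actions need a valid signed approval.
    if command["action"] in CRITICAL_ACTIONS:
        expected = sign_approval(key, command["id"]).encode("utf-8")
        supplied = str(command.get("safety_token", "")).encode("utf-8")
        if not hmac.compare_digest(expected, supplied):
            return Decision(False, False, "missing or invalid safety token")

    # Layer 2: in shadow mode the decision is recorded but nothing is actuated.
    if shadow_mode:
        return Decision(True, False, "allowed (shadow mode, not actuated)")
    return Decision(True, True, "allowed")


if __name__ == "__main__":
    key = b"demo-key"  # placeholder; a real key would come from secret management
    print(evaluate({"id": "1", "action": "set_temperature", "params": {"celsius": 35.0}}, key))
    print(evaluate({"id": "2", "action": "set_temperature", "params": {"celsius": 21.0}}, key, shadow_mode=True))
```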

Performance Characteristics

  • Event Latency: P99 < 100 ms for telemetry processing
  • Command Latency: P99 < 500 ms for control commands
  • Throughput: 10,000+ events/sec per edge node
  • Storage: 90-day retention at the edge; no fixed retention limit in the cloud
  • Offline Autonomy: At least 72 hours of full operation
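For capacity planning, the figures above imply an upper bound on edge storage if the throughput figure were sustained for the whole retention window. The short calculation below works that out; the 200-byte average event size is purely an assumption, and real workloads would normally run well below peak and benefit from compression.

```python
EVENTS_PER_SEC = 10_000        # throughput figure from the list above
RETENTION_DAYS = 90            # edge retention from the list above
OFFLINE_HOURS = 72             # offline autonomy from the list above
ASSUMED_EVENT_BYTES = 200      # assumption: average serialized event size


def events_in_window(events_per_sec: int, seconds: int) -> int:
    """Number of events produced at a constant rate over a time window."""
    return events_per_sec * seconds


retention_events = events_in_window(EVENTS_PER_SEC, RETENTION_DAYS * 86_400)
offline_events = events_in_window(EVENTS_PER_SEC, OFFLINE_HOURS * 3_600)

if __name__ == "__main__":
    print(f"90-day window: {retention_events:,} events, "
          f"{retention_events * ASSUMED_EVENT_BYTES / 1e12:.2f} TB uncompressed")
    print(f"72-hour window: {offline_events:,} events, "
          f"{offline_events * ASSUMED_EVENT_BYTES / 1e9:.1f} GB uncompressed")
```

At a sustained 10,000 events/sec this gives 77,760,000,000 events over 90 days and 2,592,000,000 events over 72 hours, so the byte size per event dominates the sizing decision.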

Scalability

CitadelMesh is designed to scale from single buildings to enterprise portfolios:

  • Horizontal Scaling: Add edge nodes per building/zone
  • Agent Scaling: Deploy multiple instances of the same agent type
  • Event Bus Scaling: NATS JetStream clustering for high throughput
  • Database Scaling: PostgreSQL read replicas and TimescaleDB compression
  • Cloud Scaling: Kubernetes HPA for .NET microservices
  • Multi-Tenancy: Isolated namespaces and policy boundaries per tenant

Deployment Model

  • Edge: K3s cluster on industrial PC; containers per agent; MQTT + NATS
  • Cloud: Managed Kafka/Redpanda, Postgres/TimescaleDB, S3-compatible storage, Kubernetes
  • CI/CD: GitOps (ArgoCD/Flux), signed container images (Sigstore), SBOMs

Observability Strategy

  • Traces: OpenTelemetry from all agents and adapters
  • Metrics: Prometheus-compatible metrics for SLIs/SLOs
  • Logs: Structured JSON logs shipped to centralized aggregation
  • Dashboards: Aspire dashboards for .NET services; Grafana for infrastructure
  • Replay: Deterministic LangGraph execution replay for incident analysis

See Also