Production Deployment Guide

Deploy CitadelMesh to production with K3s, Helm, and enterprise-grade reliability.

Deployment Architecture

K3s Cluster (Edge)
├── SPIRE (Zero-trust identity)
├── NATS JetStream (Event bus)
├── PostgreSQL (Persistence)
├── Redis (Caching)
├── OPA (Policy engine)
├── .NET Microservices
├── Python Agent Runtime
└── MCP Adapters

Prerequisites

  • K3s cluster (edge or cloud)
  • Helm 3.0+ installed
  • kubectl configured
  • Docker registry access
  • DNS configured for ingress
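A quick preflight script can confirm the tooling above is in place before provisioning. A minimal sketch; the tool names and the Helm 3.0+ floor come from the prerequisites list, while the `version_ge` helper is illustrative:

```shell
#!/bin/sh
# Preflight check for the prerequisites listed above.

# version_ge A B: succeeds if dot-separated version A >= version B
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

check_tool() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1"; return 1; }
}

check_tool kubectl
check_tool helm
check_tool docker

# Enforce the Helm 3.0+ requirement
helm_ver=$(helm version --template '{{.Version}}' 2>/dev/null | sed 's/^v//')
if [ -n "$helm_ver" ] && version_ge "$helm_ver" "3.0.0"; then
  echo "helm $helm_ver OK"
else
  echo "helm 3.0+ required (found: ${helm_ver:-none})"
fi
```

The `sort -V` trick compares versions without external dependencies, so the script runs on a bare edge node.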

K3s Cluster Setup

1. Install K3s (Edge Deployment)

Master node:

# Install K3s with embedded etcd
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--write-kubeconfig-mode 644 \
--disable traefik \
--disable servicelb

# Get token for worker nodes
sudo cat /var/lib/rancher/k3s/server/node-token

Worker nodes:

# Join cluster
curl -sfL https://get.k3s.io | K3S_URL=https://MASTER_IP:6443 \
K3S_TOKEN=NODE_TOKEN sh -

Verify:

kubectl get nodes
# Should show all nodes in Ready state

2. Install Core Services

Cert-Manager (for TLS):

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

NGINX Ingress:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace

Longhorn (Distributed Storage):

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.0/deploy/longhorn.yaml

Deploy CitadelMesh with Helm

1. Create Helm Chart

File: helm/citadelmesh/Chart.yaml

apiVersion: v2
name: citadelmesh
description: Autonomous Multi-Agent Building Intelligence Platform
version: 1.0.0
appVersion: "1.0.0"

dependencies:
  - name: postgresql
    version: 12.x
    repository: https://charts.bitnami.com/bitnami
  - name: redis
    version: 17.x
    repository: https://charts.bitnami.com/bitnami
  - name: nats
    version: 1.x
    repository: https://nats-io.github.io/k8s/helm/charts/

File: helm/citadelmesh/values.yaml

# Global settings
global:
  domain: citadel.example.com
  environment: production

# SPIRE Identity
spire:
  enabled: true
  server:
    image: ghcr.io/spiffe/spire-server:1.9.6
    replicas: 3
  agent:
    image: ghcr.io/spiffe/spire-agent:1.9.6

# PostgreSQL
postgresql:
  enabled: true
  auth:
    username: citadel
    database: citadel-db
    existingSecret: citadel-postgres-secret
  primary:
    persistence:
      size: 20Gi
      storageClass: longhorn
  replication:
    enabled: true
    replicas: 2

# Redis
redis:
  enabled: true
  architecture: replication
  auth:
    enabled: true
    existingSecret: citadel-redis-secret
  master:
    persistence:
      size: 5Gi
      storageClass: longhorn
  replica:
    replicaCount: 2

# NATS JetStream
nats:
  enabled: true
  nats:
    jetstream:
      enabled: true
      memoryStore:
        enabled: true
        size: 2Gi
      fileStore:
        enabled: true
        size: 10Gi
        storageClass: longhorn

# OPA Policy Engine
opa:
  enabled: true
  image: openpolicyagent/opa:latest-static  # pin a specific version for production
  replicas: 3
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 500m

# Safety Service (.NET)
safety:
  enabled: true
  image: citadel/safety-service:1.0.0
  replicas: 3
  resources:
    requests:
      memory: 512Mi
      cpu: 250m
    limits:
      memory: 1Gi
      cpu: 1000m
  env:
    - name: OPA_URL
      value: http://opa:8181
    - name: ASPNETCORE_ENVIRONMENT
      value: Production

# Orchestrator Service (.NET)
orchestrator:
  enabled: true
  image: citadel/orchestrator:1.0.0
  replicas: 3
  resources:
    requests:
      memory: 512Mi
      cpu: 250m

# Agent Runtime (Python)
agentRuntime:
  enabled: true
  image: citadel/agent-runtime:1.0.0
  agents:
    security:
      enabled: true
      replicas: 2
    hvac:
      enabled: true
      replicas: 2
    energy:
      enabled: true
      replicas: 1

# MCP Adapters
mcpAdapters:
  ecostruxure:
    enabled: true
    image: citadel/ecostruxure-ebo:1.0.0
    replicas: 2
  avigilon:
    enabled: true
    image: citadel/avigilon:1.0.0
    replicas: 2

# Observability
observability:
  jaeger:
    enabled: true
  prometheus:
    enabled: true
  grafana:
    enabled: true

2. Create Kubernetes Resources

File: helm/citadelmesh/templates/deployment-opa.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "citadelmesh.fullname" . }}-opa
  labels:
    app: opa
spec:
  replicas: {{ .Values.opa.replicas }}
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
        - name: opa
          image: {{ .Values.opa.image }}
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "/policies"
          ports:
            - containerPort: 8181
              name: http
          volumeMounts:
            - name: policies
              mountPath: /policies
              readOnly: true
          resources:
            {{- toYaml .Values.opa.resources | nindent 12 }}
          livenessProbe:
            httpGet:
              path: /health
              port: 8181
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8181
            initialDelaySeconds: 5
      volumes:
        - name: policies
          configMap:
            name: {{ include "citadelmesh.fullname" . }}-policies

File: helm/citadelmesh/templates/configmap-policies.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "citadelmesh.fullname" . }}-policies
data:
  security.rego: |
    {{- .Files.Get "policies/security.rego" | nindent 4 }}
  energy.rego: |
    {{- .Files.Get "policies/energy.rego" | nindent 4 }}
  hvac-setpoint.rego: |
    {{- .Files.Get "policies/hvac/setpoint_control.rego" | nindent 4 }}

3. Install Chart

# Fetch chart dependencies (postgresql, redis, nats)
helm dependency update ./helm/citadelmesh

# Create namespace
kubectl create namespace citadelmesh

# Create secrets (key names match what the Bitnami charts expect)
kubectl create secret generic citadel-postgres-secret \
--from-literal=postgres-password=$(openssl rand -base64 32) \
--from-literal=password=$(openssl rand -base64 32) \
--from-literal=replication-password=$(openssl rand -base64 32) \
-n citadelmesh

kubectl create secret generic citadel-redis-secret \
--from-literal=redis-password=$(openssl rand -base64 32) \
-n citadelmesh

# Install chart
helm install citadelmesh ./helm/citadelmesh \
--namespace citadelmesh \
--values helm/citadelmesh/values-production.yaml \
--wait

Verify:

kubectl get pods -n citadelmesh
# All pods should be Running

kubectl get svc -n citadelmesh
# Services should be available

Secret Management

Using Sealed Secrets

# Install Sealed Secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml

# Create sealed secret
kubectl create secret generic api-keys \
--from-literal=ecostruxure-key=YOUR_KEY \
--dry-run=client -o yaml | \
kubeseal -o yaml > sealed-secret.yaml

# Apply sealed secret
kubectl apply -f sealed-secret.yaml -n citadelmesh

Using External Secrets Operator

# Add repo and install ESO
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets \
external-secrets/external-secrets \
-n external-secrets-system \
--create-namespace

# Create SecretStore (AWS Secrets Manager example)
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
  namespace: citadelmesh
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: citadel-sa

# Create ExternalSecret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: citadel-api-keys
  namespace: citadelmesh
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
  target:
    name: api-keys
  data:
    - secretKey: ecostruxure-key
      remoteRef:
        key: citadel/ecostruxure
        property: api_key

Monitoring and Alerting

Prometheus & Grafana

# Add repo and install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=SECURE_PASSWORD

# Add CitadelMesh dashboards
kubectl apply -f monitoring/dashboards/ -n monitoring

Custom Metrics:

# ServiceMonitor for OPA
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: opa
  namespace: citadelmesh
spec:
  selector:
    matchLabels:
      app: opa
  endpoints:
    - port: http
      path: /metrics

Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: citadelmesh-alerts
  namespace: citadelmesh
spec:
  groups:
    - name: citadelmesh
      rules:
        - alert: OPAPolicyDenialRate
          expr: |
            rate(opa_policy_denied_total[5m]) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High policy denial rate"
            description: "OPA is denying >10 requests/min"

        - alert: AgentDown
          expr: |
            up{job="citadel-agent"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Agent is down"
            description: "Agent {{ $labels.agent_id }} is not responding"

Backup and Disaster Recovery

PostgreSQL Backup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: citadelmesh
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:16
              command:
                - /bin/sh
                - -c
                - |
                  pg_dump -h $POSTGRES_HOST \
                    -U $POSTGRES_USER \
                    -d $POSTGRES_DB \
                    | gzip > /backup/citadel-$(date +%Y%m%d).sql.gz
                  # Upload to S3/Azure Blob/GCS
              envFrom:
                - secretRef:
                    name: postgres-credentials
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc
          restartPolicy: OnFailure

Policy Backup

# Backup OPA policies
kubectl get configmap citadelmesh-policies \
-n citadelmesh \
-o yaml > policies-backup.yaml

# Backup to git
git add policies-backup.yaml
git commit -m "Backup policies $(date +%Y%m%d)"
git push origin main

High Availability Configuration

SPIRE HA

# SPIRE Server with 3 replicas backed by a shared PostgreSQL datastore
spire-server:
  replicaCount: 3
  dataStore:
    sql:
      databaseType: postgres
      connectionString: "..."
  upstreamAuthority:
    disk:
      certFilePath: /certs/ca.crt
      keyFilePath: /certs/ca.key

NATS HA

nats:
  cluster:
    enabled: true
    replicas: 3
  jetstream:
    fileStore:
      enabled: true
      size: 20Gi
      storageClass: longhorn-3replicas

OPA HA with Pod Anti-Affinity

opa:
  replicaCount: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: opa
          topologyKey: kubernetes.io/hostname

Rolling Updates

# Update agent image
helm upgrade citadelmesh ./helm/citadelmesh \
--namespace citadelmesh \
--set agentRuntime.image=citadel/agent-runtime:1.1.0 \
--reuse-values

# Monitor rollout
kubectl rollout status deployment/citadel-agent-runtime -n citadelmesh

# Rollback if needed
helm rollback citadelmesh -n citadelmesh

Security Hardening

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: opa-policy
  namespace: citadelmesh
spec:
  podSelector:
    matchLabels:
      app: opa
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: safety-service
        - podSelector:
            matchLabels:
              app: mcp-adapter
      ports:
        - protocol: TCP
          port: 8181

Pod Security Standards

apiVersion: v1
kind: Namespace
metadata:
  name: citadelmesh
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Troubleshooting

Check Pod Logs

# OPA logs
kubectl logs -n citadelmesh -l app=opa --tail=100

# Agent logs
kubectl logs -n citadelmesh -l app=citadel-agent --tail=100

# Follow logs
kubectl logs -n citadelmesh -l app=opa -f

Debug Networking

# Test OPA from pod
kubectl run -it --rm debug \
--image=curlimages/curl \
--restart=Never \
-- curl http://opa.citadelmesh.svc:8181/health

# Check service endpoints
kubectl get endpoints -n citadelmesh
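When `kubectl get endpoints` returns a long list, a small filter helps spot services with no ready backends. A sketch, assuming the default `kubectl get endpoints` table layout (NAME, ENDPOINTS, AGE); the function only parses text, so it can also be fed captured output:

```shell
# Print services whose ENDPOINTS column is <none> (no ready pods behind them).
# Reads the output of `kubectl get endpoints -n <ns>` from stdin.
find_empty_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get endpoints -n citadelmesh | find_empty_endpoints
```

A service listed here usually means its pods are failing readiness probes or its selector does not match any pods.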

Performance Tuning

# Increase OPA resources
kubectl patch deployment opa -n citadelmesh \
--patch '{"spec":{"template":{"spec":{"containers":[{"name":"opa","resources":{"limits":{"memory":"1Gi","cpu":"1000m"}}}]}}}}'

# Scale agents
kubectl scale deployment citadel-agent-security \
--replicas=5 \
-n citadelmesh

Maintenance

Upgrade Checklist

  • Backup all data (PostgreSQL, policies, configs)
  • Test upgrade in staging environment
  • Review breaking changes
  • Update Helm values
  • Run helm upgrade with --dry-run first
  • Execute upgrade during maintenance window
  • Verify all pods running
  • Run smoke tests
  • Monitor metrics and logs
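The checklist above can be captured as a small runbook script. This sketch defaults to plan mode, printing each command instead of executing it so the sequence can be reviewed before the maintenance window; paths and release names mirror the examples in this guide and may differ in your setup:

```shell
#!/bin/sh
# Upgrade runbook sketch. Prints the plan by default; pass --apply to execute.
NS=citadelmesh
CHART=./helm/citadelmesh
VALUES=helm/citadelmesh/values-production.yaml

PLAN=1
[ "$1" = "--apply" ] && PLAN=0

# run CMD...: echo in plan mode, execute in apply mode
run() {
  if [ "$PLAN" = "1" ]; then echo "PLAN: $*"; else "$@"; fi
}

run kubectl get configmap citadelmesh-policies -n "$NS" -o yaml  # backup policies
run helm upgrade citadelmesh "$CHART" -n "$NS" --values "$VALUES" --dry-run
run helm upgrade citadelmesh "$CHART" -n "$NS" --values "$VALUES" --wait
run kubectl get pods -n "$NS"                                    # verify all Running
```

Running it without arguments prints the four steps in order, which doubles as the change-window plan for review.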

Health Checks

# Check all deployments
kubectl get deployments -n citadelmesh

# Check pod health
kubectl get pods -n citadelmesh \
-o custom-columns=NAME:.metadata.name,STATUS:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount

# Run health check script
./scripts/health-check.sh

Next Steps


Production-ready CitadelMesh! 🚀 Deploy with confidence.