Technology Stack
CitadelMesh is intentionally polyglot, choosing the best language and framework for each component. This document explains our technology choices, decision rationale, and when to use each stack.
Guiding principle: Every technology decision reinforces the pillars outlined in the Architecture Overview: multi-agent autonomy, edge-first zero trust, and verifiable automation. Use this page to map components back to those differentiators.
Stack Overview
Python Stack (Agents)
When to Use Python
Best for:
- Agent development (LangGraph, LlamaIndex, AutoGen)
- Rapid iteration and experimentation
- AI/ML integration (Transformers, PyTorch)
- Protocol adapters with vendor SDKs
Not ideal for:
- High-throughput stateful services
- Low-latency requirements (< 5ms)
- Large-scale parallel processing
Core Libraries
LangGraph
Purpose: Deterministic agent state machines
from typing import List, TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: List[str]
    next_action: str

# process_node and decide_node are node functions defined elsewhere;
# each receives the current AgentState and returns a state update.
workflow = StateGraph(AgentState)
workflow.add_node("process", process_node)
workflow.add_node("decide", decide_node)
workflow.set_entry_point("process")
workflow.add_edge("process", "decide")
workflow.add_edge("decide", END)
graph = workflow.compile()
Why LangGraph:
- Deterministic execution (replayable)
- Built-in checkpointing
- Human-in-the-loop support
- Excellent OpenTelemetry integration
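To make "deterministic" and "replayable" concrete, here is a minimal plain-Python sketch of the idea. It does not use LangGraph's API: `run_graph`, `NODES`, and the two node bodies are illustrative assumptions. Each node is a pure function of the state, and a checkpoint is recorded after every node, so a run can be resumed from any saved state and will take the same remaining steps.

```python
from typing import Callable, Dict, List, Tuple, TypedDict


class AgentState(TypedDict):
    messages: List[str]
    next_action: str


Node = Callable[[AgentState], AgentState]


def process_node(state: AgentState) -> AgentState:
    # Pure function: build a new state instead of mutating the input.
    return {"messages": state["messages"] + ["processed"], "next_action": "decide"}


def decide_node(state: AgentState) -> AgentState:
    return {"messages": state["messages"] + ["decided"], "next_action": "end"}


# Hypothetical node registry standing in for a compiled graph.
NODES: Dict[str, Node] = {"process": process_node, "decide": decide_node}


def run_graph(state: AgentState, start: str = "process") -> List[Tuple[str, AgentState]]:
    """Run nodes until next_action is 'end', checkpointing after each node."""
    checkpoints: List[Tuple[str, AgentState]] = []
    current = start
    while current != "end":
        state = NODES[current](state)
        checkpoints.append((current, state))
        current = state["next_action"]
    return checkpoints


initial: AgentState = {"messages": [], "next_action": "process"}
first_run = run_graph(initial)

# Resume from the checkpoint taken after "process".
_, saved_state = first_run[0]
resumed = run_graph(saved_state, start=saved_state["next_action"])
```

Because the nodes have no hidden inputs, `resumed` contains exactly the steps that followed the saved checkpoint in `first_run`. That property is what makes audit replay and crash recovery straightforward.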
Pydantic
Purpose: Data validation and serialization
from pydantic import BaseModel, Field

class Command(BaseModel):
    id: str = Field(..., min_length=26, max_length=26)  # ULID
    target_id: str
    action: str
    params: dict[str, str] = {}
    priority: int = Field(default=2, ge=0, le=4)

# Validation runs on construction; invalid fields raise pydantic.ValidationError.
# ulid() is assumed to be a helper that returns a 26-character ULID string.
command = Command(id=ulid(), target_id="door.lobby", action="unlock")
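The `Command` example calls `ulid()` without defining it. A ULID library would normally supply it. As a rough illustration of what such a helper produces, here is a standard-library sketch following the ULID layout (a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters). The function name and the optional `timestamp_ms` parameter are assumptions made for this example, and the sketch omits the monotonicity guarantees some libraries add.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet: digits plus letters, excluding I, L, O, and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID string."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp out of range for a ULID")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness  # 128-bit integer
    chars = []
    for _ in range(26):  # 26 characters x 5 bits covers the 128 bits
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

The first ten characters encode the timestamp, so IDs generated in different milliseconds sort lexicographically by creation time. The length is always 26, which is what the `min_length` and `max_length` constraints on `Command.id` check.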
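NATS.py
Purpose: Event bus client
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://nats.citadel.svc:4222")
    js = nc.jetstream()
    # Publish event (event_data is a bytes payload prepared by the caller)
    await js.publish("telemetry.canonical.building_a", event_data)
    # Subscribe to stream with a durable pull consumer
    sub = await js.pull_subscribe("telemetry.>", "agent-consumer")
    msgs = await sub.fetch(batch=10)
    for msg in msgs:
        await msg.ack()  # acknowledge so JetStream does not redeliver
    await nc.close()

asyncio.run(main())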
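The subscription above uses the `telemetry.>` wildcard. As a hedged illustration of how NATS subject wildcards behave (`*` matches exactly one token, `>` matches one or more trailing tokens), here is a small standalone matcher. `subject_matches` is a hypothetical helper written for this page, not part of nats-py. The server performs this matching itself, so the sketch is mainly useful for reasoning about subject design or for unit tests.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject pattern matches a concrete subject."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for i, token in enumerate(pattern_tokens):
        if token == ">":
            # ">" is only valid as the last token and needs at least one
            # remaining subject token to match.
            return i == len(pattern_tokens) - 1 and len(subject_tokens) > i
        if i >= len(subject_tokens):
            return False
        if token != "*" and token != subject_tokens[i]:
            return False
    # Without a trailing ">", token counts must be equal.
    return len(pattern_tokens) == len(subject_tokens)
```

With this rule, `telemetry.>` receives `telemetry.canonical.building_a` but not the bare subject `telemetry`, and `telemetry.*.building_a` receives any single-token telemetry kind for that building.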
Python Environment
# pyproject.toml
[project]
name = "citadel-agents"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
    "langgraph>=0.2",
    "langchain>=0.3",
    "pydantic>=2.0",
    "nats-py>=2.7",
    "grpcio>=1.60",
    "protobuf>=5.0",
    "opentelemetry-api>=1.22",
    "opentelemetry-sdk>=1.22",
    "structlog>=24.0",
]

[tool.uv]
dev-dependencies = [
    "pytest>=8.0",
    "pytest-asyncio>=0.23",
    "pytest-cov>=5.0",  # provides the --cov flag used in the Testing section
    "ruff>=0.3",
]
.NET Stack (Services)
When to Use .NET
Best for:
- High-throughput stateful services
- Long-lived actors (Orleans)
- Cloud-native microservices (Aspire)
- Low-latency requirements
Not ideal for:
- Quick prototyping
- AI/ML-heavy workloads
- Vendor SDKs only in Python/JS
Core Frameworks
.NET Aspire
Purpose: Cloud-native service composition and orchestration
// Program.cs
var builder = DistributedApplication.CreateBuilder(args);

// Add infrastructure
var cache = builder.AddRedis("cache");
var postgres = builder.AddPostgres("postgres");
var nats = builder.AddNats("nats");

// Add services. The type arguments are the project metadata classes
// that Aspire generates for projects referenced by the AppHost.
builder.AddProject<SchedulerService>("scheduler")
    .WithReference(cache)
    .WithReference(postgres)
    .WithReference(nats);

builder.AddProject<AlarmService>("alarms")
    .WithReference(postgres)
    .WithReference(nats);

builder.Build().Run();
Why Aspire:
- Built-in service discovery
- Automatic health checks
- Integrated telemetry dashboard
- Local development experience
Orleans
Purpose: Stateful virtual actors for long-lived workflows
public interface IIncidentActor : IGrainWithStringKey
{
    Task ReportIncident(IncidentReport report);
    Task UpdateStatus(IncidentStatus status);
    Task<IncidentState> GetState();
}

// Grain<TState> supplies the State property and WriteStateAsync/ReadStateAsync.
// IRemindable is required for any grain that registers reminders.
public class IncidentActor : Grain<IncidentState>, IIncidentActor, IRemindable
{
    public async Task ReportIncident(IncidentReport report)
    {
        State.Severity = report.Severity;
        State.ReportedAt = DateTime.UtcNow;
        // Persist state
        await WriteStateAsync();
        // Publish event (PublishIncidentEvent is a helper defined elsewhere)
        await PublishIncidentEvent(report);
        // Set reminder for follow-up
        await this.RegisterOrUpdateReminder(
            "followup",
            TimeSpan.FromMinutes(15),
            TimeSpan.FromMinutes(15)
        );
    }

    public Task<IncidentState> GetState() => Task.FromResult(State);

    // Persisted state is loaded automatically before OnActivateAsync runs,
    // so no explicit ReadStateAsync call is needed on activation.
    // UpdateStatus and ReceiveReminder are omitted for brevity.
}
Why Orleans:
- Automatic state persistence
- Location transparency
- Built-in reminders/timers
- Elastic scalability
Dapr
Purpose: Language-agnostic building blocks (pub/sub, state, bindings)
// Pub/Sub with Dapr (ASP.NET Core controller action)
[Topic("pubsub", "telemetry.canonical")]
[HttpPost("telemetry")] // route the Dapr sidecar delivers the topic to
public async Task HandleTelemetry(CloudEvent<Point> cloudEvent)
{
    var point = cloudEvent.Data;
    logger.LogInformation(
        "Telemetry received: {EntityId} = {Value}",
        point.EntityId,
        point.Value
    );
    await ProcessTelemetry(point);
}

// State store
var stateStore = "statestore";
await daprClient.SaveStateAsync(stateStore, "zone-state", zoneState);
var retrieved = await daprClient.GetStateAsync<ZoneState>(stateStore, "zone-state");
Why Dapr:
- Polyglot (Python, .NET, Node all use same APIs)
- Portable across clouds (abstracts Kafka, Azure Service Bus, etc.)
- Built-in retries, circuit breakers
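The polyglot claim rests on the fact that every Dapr building block is exposed by the sidecar over plain HTTP (and gRPC), so any language can reach the same state store as the .NET snippet above. The sketch below builds the equivalent state API requests from Python using only the standard library. The helper names are hypothetical, and the sidecar port 3500 is Dapr's usual default but is an assumption about the deployment.

```python
import json
import urllib.request
from typing import Any


def build_save_state_request(
    store: str, key: str, value: Any, dapr_port: int = 3500
) -> urllib.request.Request:
    """Build a POST to the sidecar state API: /v1.0/state/<store>."""
    url = f"http://localhost:{dapr_port}/v1.0/state/{store}"
    # The state API accepts a JSON array of key/value items.
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_get_state_request(
    store: str, key: str, dapr_port: int = 3500
) -> urllib.request.Request:
    """Build a GET for a single key: /v1.0/state/<store>/<key>."""
    url = f"http://localhost:{dapr_port}/v1.0/state/{store}/{key}"
    return urllib.request.Request(url, method="GET")


save_req = build_save_state_request("statestore", "zone-state", {"setpoint": 72})
get_req = build_get_state_request("statestore", "zone-state")
```

With a sidecar running, passing either request to `urllib.request.urlopen` performs the call. In practice the Dapr Python SDK wraps these endpoints, but the raw requests show that no language-specific broker or database client is involved.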
Semantic Kernel
Purpose: .NET agent framework for AI integration
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId, apiKey)
    .Build();

// Add plugins (called "skills" in early Semantic Kernel releases)
kernel.ImportPluginFromObject(new HvacPlugin());

// Run agent
var result = await kernel.InvokePromptAsync(
    "Optimize HVAC for building A based on current weather and occupancy"
);
.NET Project Structure
// CitadelMesh.sln
solution
├── src/
│   ├── CitadelMesh.Aspire/
│   │   └── AppHost/          # Aspire composition
│   ├── CitadelMesh.Services/
│   │   ├── Scheduler/        # Scheduler service
│   │   ├── Alarms/           # Alarm service
│   │   └── Sessions/         # Session manager
│   ├── CitadelMesh.Orleans/
│   │   └── Grains/           # Orleans actors
│   └── CitadelMesh.Shared/
│       ├── Contracts/        # Protobuf generated
│       └── Common/           # Shared utilities
└── tests/
    └── CitadelMesh.Tests/
TypeScript Stack (Frontend & Tooling)
When to Use TypeScript
Best for:
- Frontend web applications
- Real-time dashboards
- Developer tooling
- MCP servers
Core Frameworks
React + Next.js
// Building dashboard component
'use client';

// useStreamingTelemetry is a project hook that streams live values for a zone
import { useStreamingTelemetry } from '@/hooks/useTelemetry';

export function ZoneStatusCard({ zoneId }: { zoneId: string }) {
  const telemetry = useStreamingTelemetry(zoneId);
  return (
    <div className="p-4 border rounded-lg">
      <h3 className="text-lg font-semibold">{zoneId}</h3>
      <div className="mt-2">
        <span>Temperature: {telemetry.temp}°F</span>
        <span>Setpoint: {telemetry.setpoint}°F</span>
        <span>Occupancy: {telemetry.occupied ? 'Yes' : 'No'}</span>
      </div>
    </div>
  );
}
gRPC-Web
// gRPC client for Twin Service
// The promise-based client is used because the callback-style client
// does not return a Promise that can be awaited.
import { TwinServicePromiseClient } from '@/proto/twin_grpc_web_pb';
import { GetEntityRequest } from '@/proto/twin_pb';

const client = new TwinServicePromiseClient('https://twin.citadel.io');

async function getZoneState(zoneId: string) {
  const request = new GetEntityRequest();
  request.setEntityId(zoneId);
  const response = await client.getEntity(request, {});
  return response.toObject();
}
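MCP Server (TypeScript)
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

// bacnetClient is a project-specific BACnet client constructed elsewhere.
const server = new Server(
  { name: 'bacnet', version: '1.0.0' },
  { capabilities: { tools: {} } }
);

// Advertise the available tools
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'bacnet_read_point',
      description: 'Read BACnet point value',
      inputSchema: {
        type: 'object',
        properties: {
          point_id: { type: 'string' }
        },
        required: ['point_id']
      }
    }
  ]
}));

// Dispatch tool calls; unknown tools are rejected instead of returning undefined
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'bacnet_read_point') {
    const pointId = String(request.params.arguments?.point_id);
    const value = await bacnetClient.readPoint(pointId);
    return { content: [{ type: 'text', text: JSON.stringify(value) }] };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});

await server.connect(new StdioServerTransport());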
Infrastructure Choices
Kubernetes Distribution: K3s
Why K3s:
- Lightweight (< 512 MB memory)
- Single binary deployment
- Fully compatible with K8s APIs
- Built-in components (Traefik, local storage)
- Perfect for edge deployments
Alternative: Full Kubernetes (AKS, EKS, GKE) for large-scale cloud
Message Broker: NATS JetStream
Why NATS:
- Low latency (microseconds)
- Small footprint (< 20 MB)
- JetStream persistence
- Built-in clustering
- MQTT compatibility
Alternative: Kafka/Redpanda for cloud high-throughput scenarios
Time-Series Database: TimescaleDB
Why TimescaleDB:
- PostgreSQL-compatible (familiar SQL)
- Automatic partitioning
- Continuous aggregates
- Compression (10x space savings)
- Mature ecosystem
Alternative: InfluxDB, QuestDB for pure time-series workloads
Identity: SPIFFE/SPIRE
Why SPIFFE:
- Zero-trust native
- Automatic key rotation
- No shared secrets
- Industry standard
- Multi-platform support
Alternative: none that we consider viable for workload identity at this scale
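Workload identity in SPIFFE is expressed as a SPIFFE ID, a URI of the form `spiffe://<trust-domain>/<path>`. As a hedged sketch of what services check when they authorize a peer, the parser below validates the basic shape of an ID using only the standard library. The trust domain `citadel.example` and the path layout are hypothetical, the character rules are a simplified reading of the SPIFFE ID specification, and production code should rely on a SPIFFE library and on SVID verification, not on string parsing alone.

```python
import re
from typing import NamedTuple

_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
_PATH_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")
_PREFIX = "spiffe://"


class SpiffeId(NamedTuple):
    trust_domain: str
    path: str


def parse_spiffe_id(value: str) -> SpiffeId:
    """Parse and validate the basic shape of a SPIFFE ID."""
    if not value.startswith(_PREFIX):
        raise ValueError("SPIFFE ID must use the spiffe:// scheme")
    trust_domain, sep, path = value[len(_PREFIX):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError("invalid trust domain")
    if sep and not path:
        raise ValueError("path must not end with a slash")
    for segment in path.split("/") if path else []:
        # Empty segments and relative references are rejected.
        if segment in (".", "..") or not _PATH_SEGMENT.fullmatch(segment):
            raise ValueError("invalid path segment")
    return SpiffeId(trust_domain, "/" + path if path else "")


# Hypothetical ID for an energy agent workload.
agent_id = parse_spiffe_id("spiffe://citadel.example/agent/energy")
```

An authorization policy can then compare `agent_id.trust_domain` and `agent_id.path` against an allow-list. No shared secret is involved at any point.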
Cross-Language Integration
Protobuf for All
# Generate for all languages
buf generate
# Outputs:
#   src/proto_gen/python/citadel/v1/*.py
#   src/proto_gen/csharp/Citadel.V1/*.cs
#   src/proto_gen/typescript/citadel/v1/*.ts
gRPC Everywhere
# Python client calling .NET service
# grpc.aio provides the asyncio channel; the synchronous channel cannot be awaited
import grpc

channel = grpc.aio.secure_channel("twin-service:8443", credentials)
client = TwinServiceStub(channel)
entity = await client.GetEntity(request)

// .NET calling Python agent
var channel = GrpcChannel.ForAddress("http://energy-agent:5000");
var client = new AgentService.AgentServiceClient(channel);
var result = await client.OptimizeAsync(request);
Development Tools
Build Tools
- Python: uv (fast package manager)
- .NET: dotnet CLI
- TypeScript: pnpm (fast, efficient)
- Protobuf: buf (linting, breaking change detection)
Testing
# pytest for Python
pytest tests/ -v --cov=src
# xUnit for .NET
dotnet test --logger "console;verbosity=detailed"
# Vitest for TypeScript
pnpm test
Related Documentation
- Protocol Strategy - Cross-language protocols
- Agent Topology - Python agent implementation
- Cloud Integration - .NET services