Technology Stack
CitadelMesh is intentionally polyglot, choosing the best language and framework for each component. This document explains our technology choices, decision rationale, and when to use each stack.
Stack Overview
graph TB
    subgraph "Agent Layer (Python)"
        LangGraph[LangGraph]
        LangChain[LangChain]
        Pydantic[Pydantic]
    end
    subgraph "Services Layer (.NET)"
        Aspire[.NET Aspire]
        Orleans[Orleans]
        Dapr[Dapr]
        SK[Semantic Kernel]
    end
    subgraph "Frontend (TypeScript)"
        React[React]
        Next[Next.js]
        TailwindCSS[TailwindCSS]
    end
    subgraph "Infrastructure"
        K3s[K3s]
        NATS[NATS]
        Postgres[PostgreSQL]
        Timescale[TimescaleDB]
    end
    subgraph "Protocols"
        Protobuf[Protobuf]
        gRPC[gRPC]
        CloudEvents[CloudEvents]
    end
    LangGraph --> Protobuf
    Orleans --> Protobuf
    React --> gRPC
    LangGraph --> NATS
    Orleans --> NATS
Python Stack (Agents)
When to Use Python
Best for:
- Agent development (LangGraph, LlamaIndex, AutoGen)
- Rapid iteration and experimentation
- AI/ML integration (Transformers, PyTorch)
- Protocol adapters with vendor SDKs
Not ideal for:
- High-throughput stateful services
- Low-latency requirements (< 5ms)
- Large-scale parallel processing
Core Libraries
LangGraph
Purpose: Deterministic agent state machines
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: list[str]
    next_action: str

# process_node and decide_node are node functions defined elsewhere;
# each takes the current AgentState and returns a state update.
workflow = StateGraph(AgentState)
workflow.add_node("process", process_node)
workflow.add_node("decide", decide_node)
workflow.set_entry_point("process")
workflow.add_edge("process", "decide")
workflow.add_edge("decide", END)

graph = workflow.compile()
Why LangGraph:
- Deterministic execution (replayable)
- Built-in checkpointing
- Human-in-the-loop support
- Excellent OpenTelemetry integration
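To make "deterministic and replayable" concrete, here is a library-free sketch of the idea behind checkpointed state machines. It is not LangGraph's API: the `run` loop, the `NODES` table, and the checkpoint list are hypothetical stand-ins, and the node functions are assumed to be pure (no clock, randomness, or I/O).

```python
import copy
from typing import Callable, TypedDict


class AgentState(TypedDict):
    messages: list[str]
    next_action: str


Node = Callable[[AgentState], AgentState]


def process(state: AgentState) -> AgentState:
    # Pure function of its input, so the same input always gives the same output.
    return {"messages": state["messages"] + ["processed"], "next_action": "decide"}


def decide(state: AgentState) -> AgentState:
    return {"messages": state["messages"] + ["decided"], "next_action": "end"}


# Hypothetical node table: next_action names the node to run next.
NODES: dict[str, Node] = {"process": process, "decide": decide}


def run(state: AgentState, checkpoints: list[AgentState]) -> AgentState:
    """Run nodes until next_action is 'end', snapshotting state after each step."""
    while state["next_action"] != "end":
        state = NODES[state["next_action"]](state)
        checkpoints.append(copy.deepcopy(state))
    return state


first_run: list[AgentState] = []
final = run({"messages": [], "next_action": "process"}, first_run)

# Resume from the checkpoint taken after "process" and replay the remainder.
replayed: list[AgentState] = []
resumed = run(copy.deepcopy(first_run[0]), replayed)
```

Because every node is a pure function of the state, resuming from any checkpoint reproduces the same final state as the original run, which is the property that makes replay and human-in-the-loop pauses practical.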
Pydantic
Purpose: Data validation and serialization
from pydantic import BaseModel, Field

class Command(BaseModel):
    id: str = Field(..., min_length=26, max_length=26)  # ULID
    target_id: str
    action: str
    params: dict[str, str] = Field(default_factory=dict)
    priority: int = Field(default=2, ge=0, le=4)

# Validation runs on construction; invalid input raises ValidationError.
# ulid() is assumed to be a helper that returns a 26-character ULID string.
command = Command(id=ulid(), target_id="door.lobby", action="unlock")
NATS.py
Purpose: Event bus client
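import nats

async def main() -> None:
    nc = await nats.connect("nats://nats.citadel.svc:4222")
    js = nc.jetstream()

    # Publish event (event_data is the serialized payload, as bytes)
    await js.publish("telemetry.canonical.building_a", event_data)

    # Pull-subscribe with a durable consumer, then fetch a batch
    sub = await js.pull_subscribe("telemetry.>", durable="agent-consumer")
    msgs = await sub.fetch(batch=10)
    for msg in msgs:
        await msg.ack()  # unacknowledged messages are redelivered

    await nc.close()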
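The subscription above uses the subject `telemetry.>`. In NATS subjects, `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The helper below is a small stdlib sketch of those matching rules, useful for reasoning about which subjects a consumer will see; `subject_matches` is a hypothetical name, not part of the nats-py client.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject pattern matches a concrete subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")

    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the final token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch

    # Without a trailing '>', token counts must be equal.
    return len(p_tokens) == len(s_tokens)
```

For example, `telemetry.>` matches `telemetry.canonical.building_a`, while `telemetry.*` does not, because `*` covers only a single token.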
Python Environment
# pyproject.toml
[project]
name = "citadel-agents"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
"langgraph>=0.2",
"langchain>=0.3",
"pydantic>=2.0",
"nats-py>=2.7",
"grpcio>=1.60",
"protobuf>=5.0",
"opentelemetry-api>=1.22",
"opentelemetry-sdk>=1.22",
"structlog>=24.0"
]
[tool.uv]
dev-dependencies = [
"pytest>=8.0",
"pytest-asyncio>=0.23",
"ruff>=0.3"
]
.NET Stack (Services)
When to Use .NET
Best for:
- High-throughput stateful services
- Long-lived actors (Orleans)
- Cloud-native microservices (Aspire)
- Low-latency requirements
Not ideal for:
- Quick prototyping
- AI/ML-heavy workloads
- Integrations whose vendor SDKs exist only in Python/JS
Core Frameworks
.NET Aspire
Purpose: Cloud-native service composition and orchestration
// Program.cs
var builder = DistributedApplication.CreateBuilder(args);

// Add infrastructure
var cache = builder.AddRedis("cache");
var postgres = builder.AddPostgres("postgres");
var nats = builder.AddNats("nats");

// Add services
builder.AddProject<SchedulerService>("scheduler")
    .WithReference(cache)
    .WithReference(postgres)
    .WithReference(nats);

builder.AddProject<AlarmService>("alarms")
    .WithReference(postgres)
    .WithReference(nats);

builder.Build().Run();
Why Aspire:
- Built-in service discovery
- Automatic health checks
- Integrated telemetry dashboard
- Local development experience
Orleans
Purpose: Stateful virtual actors for long-lived workflows
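public interface IIncidentActor : IGrainWithStringKey
{
    Task ReportIncident(IncidentReport report);
    Task UpdateStatus(IncidentStatus status);
    Task<IncidentState> GetState();
}

// Grain<TState> provides the State property plus ReadStateAsync/WriteStateAsync.
// IRemindable is required for any grain that registers reminders.
// UpdateStatus is omitted here for brevity.
public class IncidentActor : Grain<IncidentState>, IIncidentActor, IRemindable
{
    public async Task ReportIncident(IncidentReport report)
    {
        State.Severity = report.Severity;
        State.ReportedAt = DateTime.UtcNow;

        // Persist state
        await WriteStateAsync();

        // Publish event (PublishIncidentEvent is an application helper, not shown)
        await PublishIncidentEvent(report);

        // Set reminder for follow-up
        await this.RegisterOrUpdateReminder(
            "followup",
            TimeSpan.FromMinutes(15),
            TimeSpan.FromMinutes(15)
        );
    }

    public Task<IncidentState> GetState() => Task.FromResult(State);

    public Task ReceiveReminder(string reminderName, TickStatus status)
    {
        // Follow-up logic for the "followup" reminder goes here
        return Task.CompletedTask;
    }

    public override Task OnActivateAsync(CancellationToken cancellationToken)
    {
        // Grain<TState> has already restored State from storage at this point
        return base.OnActivateAsync(cancellationToken);
    }
}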
Why Orleans:
- Automatic state persistence
- Location transparency
- Built-in reminders/timers
- Elastic scalability
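The virtual-actor model behind these points can be illustrated without Orleans. The sketch below is a conceptual, single-process Python model, not Orleans' API: `ActorRuntime`, its in-memory `_storage` dict (standing in for a real state store), and the `incident-42` key are all hypothetical. It shows the two ideas that matter most: callers address an actor by key and never construct it, and state outlives any particular in-memory activation.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class IncidentState:
    severity: str = "unknown"
    updates: list[str] = field(default_factory=list)


class IncidentActor:
    def __init__(self, key: str, state: IncidentState) -> None:
        self.key = key
        self.state = state

    async def report(self, severity: str) -> None:
        self.state.severity = severity
        self.state.updates.append(f"reported:{severity}")


class ActorRuntime:
    """Hypothetical runtime that activates actors on first use."""

    def __init__(self) -> None:
        self._storage: dict[str, IncidentState] = {}  # stands in for a state store
        self._active: dict[str, IncidentActor] = {}

    def get(self, key: str) -> IncidentActor:
        # Addressing a key activates the actor on demand with its stored state.
        if key not in self._active:
            state = self._storage.setdefault(key, IncidentState())
            self._active[key] = IncidentActor(key, state)
        return self._active[key]

    def deactivate(self, key: str) -> None:
        # Drop the in-memory instance; the stored state remains.
        self._active.pop(key, None)


async def main() -> IncidentState:
    runtime = ActorRuntime()
    await runtime.get("incident-42").report("high")
    runtime.deactivate("incident-42")
    # Re-addressing the same key reactivates the actor with its earlier state.
    return runtime.get("incident-42").state


restored = asyncio.run(main())
```

In a real cluster the runtime also decides which node hosts each activation, which is what "location transparency" refers to; that part is out of scope for this sketch.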
Dapr
Purpose: Language-agnostic building blocks (pub/sub, state, bindings)
// Pub/Sub with Dapr
[Topic("pubsub", "telemetry.canonical")]
public async Task HandleTelemetry(CloudEvent<Point> cloudEvent)
{
    var point = cloudEvent.Data;
    logger.LogInformation(
        "Telemetry received: {EntityId} = {Value}",
        point.EntityId,
        point.Value
    );
    await ProcessTelemetry(point);
}

// State store
var stateStore = "statestore";
await daprClient.SaveStateAsync(stateStore, "zone-state", zoneState);
var retrieved = await daprClient.GetStateAsync<ZoneState>(stateStore, "zone-state");
Why Dapr:
- Polyglot (Python, .NET, Node all use same APIs)
- Portable across clouds (abstracts Kafka, Azure Service Bus, etc.)
- Built-in retries, circuit breakers
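The handler above receives a `CloudEvent<Point>`, so it helps to see what that envelope contains. CloudEvents 1.0 requires four context attributes: `specversion`, `id`, `source`, and `type`. The following stdlib sketch builds and checks such an envelope; the event type, source path, and payload field names are assumptions for illustration, not CitadelMesh's actual schema.

```python
import json
import uuid
from datetime import datetime, timezone

# Context attributes that CloudEvents 1.0 marks as required.
REQUIRED = ("specversion", "id", "source", "type")


def make_cloud_event(event_type: str, source: str, data: dict) -> dict:
    """Wrap a payload in a CloudEvents 1.0 structured-mode envelope."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),  # optional attribute
        "datacontenttype": "application/json",           # optional attribute
        "data": data,
    }


def missing_attributes(event: dict) -> list[str]:
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED if not event.get(name)]


# Hypothetical telemetry point; names are placeholders.
event = make_cloud_event(
    "citadel.telemetry.point",
    "/buildings/a/zones/lobby",
    {"entity_id": "zone.lobby.temp", "value": 72.5},
)
payload = json.dumps(event)
```

Because the envelope is plain JSON, a Python agent and a .NET service can exchange the same message without sharing any framework code.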
Semantic Kernel
Purpose: .NET agent framework for AI integration
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId, apiKey)
    .Build();

// Add plugins
kernel.ImportPluginFromObject(new HvacPlugin());

// Run agent
var result = await kernel.InvokePromptAsync(
    "Optimize HVAC for building A based on current weather and occupancy"
);
.NET Project Structure
// CitadelMesh.sln
solution
├── src/
│ ├── CitadelMesh.Aspire/
│ │ └── AppHost/ # Aspire composition
│ ├── CitadelMesh.Services/
│ │ ├── Scheduler/ # Scheduler service
│ │ ├── Alarms/ # Alarm service
│ │ └── Sessions/ # Session manager
│ ├── CitadelMesh.Orleans/
│ │ └── Grains/ # Orleans actors
│ └── CitadelMesh.Shared/
│ ├── Contracts/ # Protobuf generated
│ └── Common/ # Shared utilities
└── tests/
└── CitadelMesh.Tests/
TypeScript Stack (Frontend & Tooling)
When to Use TypeScript
Best for:
- Frontend web applications
- Real-time dashboards
- Developer tooling
- MCP servers
Core Frameworks
React + Next.js
// Building dashboard component
'use client';

import { useStreamingTelemetry } from '@/hooks/useTelemetry';

export function ZoneStatusCard({ zoneId }: { zoneId: string }) {
  // useStreamingTelemetry is an application hook that streams live zone values
  const telemetry = useStreamingTelemetry(zoneId);

  return (
    <div className="p-4 border rounded-lg">
      <h3 className="text-lg font-semibold">{zoneId}</h3>
      <div className="mt-2">
        <span>Temperature: {telemetry.temp}°F</span>
        <span>Setpoint: {telemetry.setpoint}°F</span>
        <span>Occupancy: {telemetry.occupied ? 'Yes' : 'No'}</span>
      </div>
    </div>
  );
}
gRPC-Web
// gRPC client for Twin Service
import { TwinServicePromiseClient } from '@/proto/twin_grpc_web_pb';
import { GetEntityRequest } from '@/proto/twin_pb';

// The promise-based client is needed for await; the plain client is callback-based
const client = new TwinServicePromiseClient('https://twin.citadel.io');

async function getZoneState(zoneId: string) {
  const request = new GetEntityRequest();
  request.setEntityId(zoneId);
  const response = await client.getEntity(request, {});
  return response.toObject();
}
MCP Server (TypeScript)
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// bacnetClient is the application's BACnet client (not shown)
const server = new McpServer({ name: 'bacnet', version: '1.0.0' });

// Register the tool with its description, input schema, and handler
server.tool(
  'bacnet_read_point',
  'Read BACnet point value',
  { point_id: z.string() },
  async ({ point_id }) => {
    const value = await bacnetClient.readPoint(point_id);
    return { content: [{ type: 'text', text: JSON.stringify(value) }] };
  }
);

await server.connect(new StdioServerTransport());
Infrastructure Choices
Kubernetes Distribution: K3s
Why K3s:
- Lightweight (< 512 MB memory)
- Single binary deployment
- Fully compatible with K8s APIs
- Built-in components (Traefik, local storage)
- Perfect for edge deployments
Alternative: Full Kubernetes (AKS, EKS, GKE) for large-scale cloud
Message Broker: NATS JetStream
Why NATS:
- Low latency (typically sub-millisecond on a local network)
- Small footprint (server binary under 20 MB)
- JetStream persistence
- Built-in clustering
- MQTT compatibility
Alternative: Kafka/Redpanda for cloud high-throughput scenarios
Time-Series Database: TimescaleDB
Why TimescaleDB:
- PostgreSQL-compatible (familiar SQL)
- Automatic partitioning
- Continuous aggregates
- Compression (up to roughly 10x space savings, depending on the data)
- Mature ecosystem
Alternative: InfluxDB, QuestDB for pure time-series workloads
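Continuous aggregates precompute per-bucket rollups (for example, 15-minute averages) so dashboards do not rescan raw telemetry. The sketch below shows the underlying bucketing arithmetic in plain Python; it is an illustration of the concept, not TimescaleDB's implementation, and `bucket_start`, `bucket_averages`, and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Round a timestamp down to the start of its fixed-width bucket."""
    return ts - ((ts - EPOCH) % width)


def bucket_averages(
    readings: list[tuple[datetime, float]], width: timedelta
) -> dict[datetime, float]:
    """Average the readings that fall into each bucket."""
    sums: dict[datetime, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for ts, value in readings:
        acc = sums[bucket_start(ts, width)]
        acc[0] += value  # running total
        acc[1] += 1      # reading count
    return {start: total / count for start, (total, count) in sorted(sums.items())}


# Made-up zone temperature samples.
readings = [
    (datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc), 70.0),
    (datetime(2024, 1, 1, 12, 14, tzinfo=timezone.utc), 72.0),
    (datetime(2024, 1, 1, 12, 16, tzinfo=timezone.utc), 74.0),
]
averages = bucket_averages(readings, timedelta(minutes=15))
```

The first two readings land in the 12:00 bucket and the third in the 12:15 bucket, so the rollup holds two rows instead of three; at real telemetry rates that reduction is what makes long-range dashboard queries cheap.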
Identity: SPIFFE/SPIRE
Why SPIFFE:
- Zero-trust native
- Automatic key rotation
- No shared secrets
- Industry standard
- Multi-platform support
Alternative: none that we consider viable for workload identity at this scale
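A SPIFFE ID is a URI of the form `spiffe://<trust-domain>/<workload-path>`, and it is what services compare when authorizing a peer instead of a shared secret. The following stdlib sketch parses and validates that shape; the trust domain `citadel.example` and the workload path are placeholders, and `parse_spiffe_id` is a hypothetical helper rather than part of any SPIFFE library.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SpiffeId(NamedTuple):
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Split a SPIFFE ID into trust domain and path, rejecting malformed input."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be 'spiffe'")
    if not parts.netloc:
        raise ValueError("trust domain is required")
    if "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("trust domain must not contain userinfo or a port")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    return SpiffeId(parts.netloc, parts.path)


# Placeholder identity for an energy agent workload.
agent_id = parse_spiffe_id("spiffe://citadel.example/agents/energy")
```

An authorization check can then be as simple as comparing `agent_id.trust_domain` and `agent_id.path` against an allow-list.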
Cross-Language Integration
Protobuf for All
# Generate for all languages
buf generate

# Outputs:
#   src/proto_gen/python/citadel/v1/*.py
#   src/proto_gen/csharp/Citadel.V1/*.cs
#   src/proto_gen/typescript/citadel/v1/*.ts
gRPC Everywhere
# Python client calling .NET service
# grpc.aio provides the asyncio API, so the call can be awaited
channel = grpc.aio.secure_channel("twin-service:8443", credentials)
client = TwinServiceStub(channel)
entity = await client.GetEntity(request)
// .NET calling Python agent
var channel = GrpcChannel.ForAddress("http://energy-agent:5000");
var client = new AgentService.AgentServiceClient(channel);
var result = await client.OptimizeAsync(request);
Development Tools
Build Tools
- Python: uv (fast package manager)
- .NET: dotnet CLI
- TypeScript: pnpm (fast, efficient)
- Protobuf: buf (linting, breaking change detection)
Testing
# pytest for Python
pytest tests/ -v --cov=src
# xUnit for .NET
dotnet test --logger "console;verbosity=detailed"
# Vitest for TypeScript
pnpm test
Related Documentation
- Protocol Strategy - Cross-language protocols
- Agent Topology - Python agent implementation
- Cloud Integration - .NET services