
Technology Stack

CitadelMesh is intentionally polyglot, choosing the best language and framework for each component. This document explains our technology choices, decision rationale, and when to use each stack.

Stack Overview

```mermaid
graph TB
    subgraph "Agent Layer (Python)"
        LangGraph[LangGraph]
        LangChain[LangChain]
        Pydantic[Pydantic]
    end

    subgraph "Services Layer (.NET)"
        Aspire[.NET Aspire]
        Orleans[Orleans]
        Dapr[Dapr]
        SK[Semantic Kernel]
    end

    subgraph "Frontend (TypeScript)"
        React[React]
        Next[Next.js]
        TailwindCSS[TailwindCSS]
    end

    subgraph "Infrastructure"
        K3s[K3s]
        NATS[NATS]
        Postgres[PostgreSQL]
        Timescale[TimescaleDB]
    end

    subgraph "Protocols"
        Protobuf[Protobuf]
        gRPC[gRPC]
        CloudEvents[CloudEvents]
    end

    LangGraph --> Protobuf
    Orleans --> Protobuf
    React --> gRPC

    LangGraph --> NATS
    Orleans --> NATS
```
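The diagram lists CloudEvents next to Protobuf and gRPC. The sketch below shows what an event envelope carries. It builds a CloudEvents 1.0 envelope in structured JSON mode using only the standard library. The event type, source URI, and payload fields are hypothetical placeholders, not CitadelMesh's actual schema.

```python
import json
import uuid
from datetime import datetime, timezone


def make_cloudevent(event_type: str, source: str, data: dict) -> dict:
    """Build a CloudEvents 1.0 envelope (structured JSON mode)."""
    return {
        # Required context attributes
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional attributes
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


def validate_cloudevent(event: dict) -> None:
    """Raise ValueError if a required context attribute is missing or empty."""
    for attr in ("specversion", "id", "source", "type"):
        if not event.get(attr):
            raise ValueError(f"missing required CloudEvents attribute: {attr}")
    if event["specversion"] != "1.0":
        raise ValueError("unsupported specversion")


event = make_cloudevent(
    "io.citadel.telemetry.point",      # hypothetical event type
    "/buildings/building_a/bacnet",    # hypothetical source URI
    {"entity_id": "zone.a1", "value": 72.5},
)
validate_cloudevent(event)
payload = json.dumps(event).encode()  # bytes, ready to publish on a bus
```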

Python Stack (Agents)

When to Use Python

Best for:

  • Agent development (LangGraph, LlamaIndex, AutoGen)
  • Rapid iteration and experimentation
  • AI/ML integration (Transformers, PyTorch)
  • Protocol adapters with vendor SDKs

Not ideal for:

  • High-throughput stateful services
  • Low-latency requirements (< 5ms)
  • Large-scale parallel processing

Core Libraries

LangGraph

Purpose: Deterministic agent state machines

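```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    messages: list[str]
    next_action: str


def process_node(state: AgentState) -> dict:
    # Placeholder logic: record that the input was processed
    return {"messages": state["messages"] + ["processed"]}


def decide_node(state: AgentState) -> dict:
    # Placeholder logic: choose the next action
    return {"next_action": "done"}


workflow = StateGraph(AgentState)
workflow.add_node("process", process_node)
workflow.add_node("decide", decide_node)
workflow.set_entry_point("process")
workflow.add_edge("process", "decide")
workflow.add_edge("decide", END)

graph = workflow.compile()
```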

Why LangGraph:

  • Deterministic execution (replayable)
  • Built-in checkpointing
  • Human-in-the-loop support
  • Excellent OpenTelemetry integration
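The sketch below makes "replayable" and "checkpointing" concrete without any framework. It runs nodes along fixed edges, snapshots the state after each step, and resumes from any snapshot. It illustrates the concept only. It is not LangGraph's API, and LangGraph's checkpointer interfaces differ in detail. Replay reproduces results only when the node functions themselves are deterministic, and LLM calls generally are not.

```python
import copy
from typing import Callable

State = dict
Node = Callable[[State], State]


def run(nodes: dict[str, Node], edges: dict[str, str], entry: str,
        state: State, checkpoints: list) -> State:
    """Execute nodes along fixed edges, saving a (next_node, state) checkpoint after each step."""
    current = entry
    while current != "END":
        state = nodes[current](copy.deepcopy(state))
        nxt = edges[current]
        checkpoints.append((nxt, copy.deepcopy(state)))
        current = nxt
    return state


def resume(nodes: dict[str, Node], edges: dict[str, str],
           checkpoint: tuple, checkpoints: list) -> State:
    """Restart execution from a saved (next_node, state) checkpoint."""
    next_node, saved = checkpoint
    if next_node == "END":
        return copy.deepcopy(saved)
    return run(nodes, edges, next_node, copy.deepcopy(saved), checkpoints)


# Hypothetical nodes mirroring process_node / decide_node in the LangGraph example
def process(state: State) -> State:
    state["messages"].append("processed")
    return state


def decide(state: State) -> State:
    state["next_action"] = "done"
    return state


nodes = {"process": process, "decide": decide}
edges = {"process": "decide", "decide": "END"}

history: list = []
final = run(nodes, edges, "process", {"messages": [], "next_action": ""}, history)

# Resuming from the first checkpoint reproduces the same final state
replayed = resume(nodes, edges, history[0], [])
```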

Pydantic

Purpose: Data validation and serialization

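```python
from pydantic import BaseModel, Field
from ulid import ULID  # python-ulid package


class Command(BaseModel):
    id: str = Field(..., min_length=26, max_length=26)  # ULID (26 Crockford base32 chars)
    target_id: str
    action: str
    params: dict[str, str] = Field(default_factory=dict)
    priority: int = Field(default=2, ge=0, le=4)


# Automatic validation (raises ValidationError on invalid input)
command = Command(id=str(ULID()), target_id="door.lobby", action="unlock")
```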

NATS.py

Purpose: Event bus client

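```python
import asyncio
import json

import nats


async def main() -> None:
    nc = await nats.connect("nats://nats.citadel.svc:4222")
    js = nc.jetstream()

    # Publish event (payload must be bytes; a stream covering "telemetry.>" must exist)
    event_data = json.dumps({"entity_id": "zone.a1", "value": 72.5}).encode()  # example payload
    await js.publish("telemetry.canonical.building_a", event_data)

    # Subscribe to stream via a durable pull consumer, then fetch and ack a batch
    sub = await js.pull_subscribe("telemetry.>", "agent-consumer")
    msgs = await sub.fetch(batch=10)
    for msg in msgs:
        await msg.ack()

    await nc.close()


asyncio.run(main())
```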
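The pull consumer subscribes to `telemetry.>`. NATS subjects are dot-separated tokens. In a subscription, `*` matches exactly one token and `>` matches one or more trailing tokens. The function below is a simplified sketch of those matching rules, useful for reasoning about which events an agent will receive. It is not part of nats-py, and it ignores edge cases such as empty tokens.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS subject matches a subscription pattern.

    '*' matches exactly one token; '>' matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one subject token left
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```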

Python Environment

```toml
# pyproject.toml
[project]
name = "citadel-agents"
version = "1.0.0"
requires-python = ">=3.11"

dependencies = [
    "langgraph>=0.2",
    "langchain>=0.3",
    "pydantic>=2.0",
    "nats-py>=2.7",
    "grpcio>=1.60",
    "protobuf>=5.0",
    "opentelemetry-api>=1.22",
    "opentelemetry-sdk>=1.22",
    "structlog>=24.0"
]

[tool.uv]
dev-dependencies = [
    "pytest>=8.0",
    "pytest-asyncio>=0.23",
    "ruff>=0.3"
]
```

.NET Stack (Services)

When to Use .NET

Best for:

  • High-throughput stateful services
  • Long-lived actors (Orleans)
  • Cloud-native microservices (Aspire)
  • Low-latency requirements

Not ideal for:

  • Quick prototyping
  • AI/ML-heavy workloads
  • Integrations whose vendor SDKs exist only for Python or JavaScript

Core Frameworks

.NET Aspire

Purpose: Cloud-native service composition and orchestration

```csharp
// AppHost/Program.cs
var builder = DistributedApplication.CreateBuilder(args);

// Add infrastructure (each Add* method comes from its Aspire.Hosting.* integration package)
var cache = builder.AddRedis("cache");
var postgres = builder.AddPostgres("postgres");
var nats = builder.AddNats("nats");

// Add services (Projects.* types are generated from the AppHost's project references)
builder.AddProject<Projects.SchedulerService>("scheduler")
    .WithReference(cache)
    .WithReference(postgres)
    .WithReference(nats);

builder.AddProject<Projects.AlarmService>("alarms")
    .WithReference(postgres)
    .WithReference(nats);

builder.Build().Run();
```

Why Aspire:

  • Built-in service discovery
  • Automatic health checks
  • Integrated telemetry dashboard
  • One-command local development (services and their dependencies start together)

Orleans

Purpose: Stateful virtual actors for long-lived workflows

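```csharp
using Orleans;
using Orleans.Runtime;

public interface IIncidentActor : IGrainWithStringKey
{
    Task ReportIncident(IncidentReport report);
    Task UpdateStatus(IncidentStatus status);
    Task<IncidentState> GetState();
}

public class IncidentActor : Grain, IIncidentActor, IRemindable
{
    // Orleans restores this state automatically when the grain activates.
    // "incidentStore" must match a grain storage provider configured on the silo.
    private readonly IPersistentState<IncidentState> state;

    public IncidentActor(
        [PersistentState("incident", "incidentStore")] IPersistentState<IncidentState> state)
    {
        this.state = state;
    }

    public async Task ReportIncident(IncidentReport report)
    {
        state.State.Severity = report.Severity;
        state.State.ReportedAt = DateTime.UtcNow;

        // Persist state
        await state.WriteStateAsync();

        // Publish event
        await PublishIncidentEvent(report);

        // Set a durable reminder for follow-up (due in 15 min, then every 15 min)
        await this.RegisterOrUpdateReminder(
            "followup",
            TimeSpan.FromMinutes(15),
            TimeSpan.FromMinutes(15));
    }

    // Called by Orleans each time a registered reminder fires
    public Task ReceiveReminder(string reminderName, TickStatus status) => Task.CompletedTask;

    // UpdateStatus, GetState, and PublishIncidentEvent omitted for brevity
}
```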

Why Orleans:

  • Managed state persistence (state loads on activation; writes are explicit)
  • Location transparency
  • Built-in reminders/timers
  • Elastic scalability
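The virtual-actor model behind Orleans can be sketched in a few lines of Python. Callers ask a runtime for an actor by key, and the runtime activates it on demand and restores its state from storage. Idle actors can be dropped without losing state. This is a single-process illustration under simplifying assumptions: no distribution, no concurrency, and an in-memory stand-in for storage. Orleans adds placement across silos, turn-based single-threaded execution per grain, and pluggable storage providers.

```python
from dataclasses import dataclass


@dataclass
class IncidentState:
    severity: str = "unknown"
    reports: int = 0


class InMemoryStorage:
    """Stand-in for a durable grain storage provider."""

    def __init__(self) -> None:
        self._data: dict[str, IncidentState] = {}

    def read(self, key: str) -> IncidentState:
        stored = self._data.get(key)
        return IncidentState(**vars(stored)) if stored is not None else IncidentState()

    def write(self, key: str, state: IncidentState) -> None:
        self._data[key] = IncidentState(**vars(state))  # store a copy


class IncidentActor:
    def __init__(self, key: str, storage: InMemoryStorage) -> None:
        self.key = key
        self.storage = storage
        self.state = storage.read(key)  # restore state on activation

    def report(self, severity: str) -> None:
        self.state.severity = severity
        self.state.reports += 1
        self.storage.write(self.key, self.state)  # explicit persist


class Runtime:
    """Activates actors lazily by key; callers never construct them directly."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self.storage = storage
        self.active: dict[str, IncidentActor] = {}

    def get(self, key: str) -> IncidentActor:
        if key not in self.active:
            self.active[key] = IncidentActor(key, self.storage)
        return self.active[key]

    def deactivate_all(self) -> None:
        # Simulates idle deactivation; state survives in storage
        self.active.clear()


runtime = Runtime(InMemoryStorage())
runtime.get("incident-42").report("high")
runtime.deactivate_all()
restored = runtime.get("incident-42").state
```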

Dapr

Purpose: Language-agnostic building blocks (pub/sub, state, bindings)

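```csharp
// Pub/Sub with Dapr: an action on an ASP.NET Core controller. Program.cs calls
// app.UseCloudEvents() and app.MapSubscribeHandler(), so the CloudEvent
// envelope is unwrapped and the action receives the Point payload directly.
[Topic("pubsub", "telemetry.canonical")]
[HttpPost("telemetry")]
public async Task<IActionResult> HandleTelemetry(Point point)
{
    logger.LogInformation(
        "Telemetry received: {EntityId} = {Value}",
        point.EntityId,
        point.Value);

    await ProcessTelemetry(point);
    return Ok();
}

// State store (daprClient is an injected Dapr.Client.DaprClient)
var stateStore = "statestore";
await daprClient.SaveStateAsync(stateStore, "zone-state", zoneState);
var retrieved = await daprClient.GetStateAsync<ZoneState>(stateStore, "zone-state");
```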

Why Dapr:

  • Polyglot (Python, .NET, Node all use same APIs)
  • Portable across clouds (abstracts Kafka, Azure Service Bus, etc.)
  • Built-in retries, circuit breakers
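Dapr configures retries and circuit breakers declaratively through resiliency policies, so application code does not implement them itself. The pattern works like this: the circuit stays closed until N consecutive failures open it, then fails fast during a cool-down, then allows a half-open trial call. The minimal standalone Python sketch below shows that cycle. The class and parameter names are illustrative and are not Dapr's resiliency configuration schema. The injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Opens after max_failures consecutive failures; after reset_after_s
    seconds it lets a trial call through (half-open)."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: allow a trial call; one more failure re-opens the circuit
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # success closes the circuit
        return result
```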

Semantic Kernel

Purpose: .NET agent framework for AI integration

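```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId, apiKey)
    .Build();

// Add plugins (formerly called "skills")
kernel.ImportPluginFromObject(new HvacPlugin());

// Let the model call HvacPlugin functions automatically
var settings = new OpenAIPromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

// Run agent
var result = await kernel.InvokePromptAsync(
    "Optimize HVAC for building A based on current weather and occupancy",
    new KernelArguments(settings));
```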

.NET Project Structure

```text
CitadelMesh.sln (solution root)
├── src/
│   ├── CitadelMesh.Aspire/
│   │   └── AppHost/          # Aspire composition
│   ├── CitadelMesh.Services/
│   │   ├── Scheduler/        # Scheduler service
│   │   ├── Alarms/           # Alarm service
│   │   └── Sessions/         # Session manager
│   ├── CitadelMesh.Orleans/
│   │   └── Grains/           # Orleans actors
│   └── CitadelMesh.Shared/
│       ├── Contracts/        # Protobuf generated
│       └── Common/           # Shared utilities
└── tests/
    └── CitadelMesh.Tests/
```

TypeScript Stack (Frontend & Tooling)

When to Use TypeScript

Best for:

  • Frontend web applications
  • Real-time dashboards
  • Developer tooling
  • MCP servers

Core Frameworks

React + Next.js

```tsx
// Building dashboard component
'use client';

import { useStreamingTelemetry } from '@/hooks/useTelemetry';

export function ZoneStatusCard({ zoneId }: { zoneId: string }) {
  // Project-local hook that streams live telemetry for one zone
  const telemetry = useStreamingTelemetry(zoneId);

  return (
    <div className="p-4 border rounded-lg">
      <h3 className="text-lg font-semibold">{zoneId}</h3>
      <div className="mt-2 flex flex-col">
        <span>Temperature: {telemetry.temp}°F</span>
        <span>Setpoint: {telemetry.setpoint}°F</span>
        <span>Occupancy: {telemetry.occupied ? 'Yes' : 'No'}</span>
      </div>
    </div>
  );
}
```

gRPC-Web

```typescript
// gRPC client for Twin Service. Uses the promise-based client generated by
// protoc-gen-grpc-web (the callback-style TwinServiceClient does not return a promise).
import { TwinServicePromiseClient } from '@/proto/twin_grpc_web_pb';
import { GetEntityRequest } from '@/proto/twin_pb';

const client = new TwinServicePromiseClient('https://twin.citadel.io');

async function getZoneState(zoneId: string) {
  const request = new GetEntityRequest();
  request.setEntityId(zoneId);

  const response = await client.getEntity(request, {});
  return response.toObject();
}
```
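Under the generated client, gRPC-Web sends each Protobuf message in a length-prefixed frame. The frame is one flag byte (bit 0 set means compressed), then a 4-byte big-endian length, then the message bytes. The response ends with a trailers frame whose flag byte has the high bit (0x80) set. The Python sketch below encodes and decodes that framing to make the wire format concrete. It is meant for illustration and debugging only, since the grpc-web library handles framing for you.

```python
import struct

TRAILER_FLAG = 0x80  # gRPC-Web marks the trailers frame with the high bit


def encode_frame(message: bytes, flags: int = 0) -> bytes:
    """Prefix a serialized message with 1 flag byte and a 4-byte big-endian length."""
    return struct.pack(">BI", flags, len(message)) + message


def decode_frames(body: bytes) -> list[tuple[int, bytes]]:
    """Split a response body into (flags, payload) frames."""
    frames = []
    offset = 0
    while offset < len(body):
        flags, length = struct.unpack_from(">BI", body, offset)
        offset += 5  # 1 flag byte + 4 length bytes
        frames.append((flags, body[offset:offset + length]))
        offset += length
    return frames
```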

MCP Server (TypeScript)

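```typescript
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

// Project-specific BACnet client (declared here only for typing)
declare const bacnetClient: { readPoint(pointId: string): Promise<unknown> };

const server = new Server(
  { name: 'bacnet-mcp', version: '1.0.0' },
  { capabilities: { tools: {} } }
);

// Advertise available tools
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'bacnet_read_point',
      description: 'Read BACnet point value',
      inputSchema: {
        type: 'object',
        properties: { point_id: { type: 'string' } },
        required: ['point_id'],
      },
    },
  ],
}));

// Execute a tool call
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'bacnet_read_point') {
    const pointId = String(request.params.arguments?.point_id);
    const value = await bacnetClient.readPoint(pointId);
    return { content: [{ type: 'text', text: JSON.stringify(value) }] };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});

await server.connect(new StdioServerTransport());
```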
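On the wire, an MCP client talks to a server like this one with JSON-RPC 2.0 messages, using the `tools/list` and `tools/call` methods. The standard-library sketch below shows that request/response shape. A hypothetical in-memory `read_point` stands in for a real BACnet client. The sketch skips the initialization handshake, the transport, and the full JSON Schema validation that the SDK and a real server handle.

```python
import json
from typing import Any


def read_point(point_id: str) -> float:
    """Hypothetical stand-in for a real BACnet read."""
    fake_points = {"ahu1.supply_temp": 55.2}
    return fake_points[point_id]


TOOLS: dict[str, dict[str, Any]] = {
    "bacnet_read_point": {
        "description": "Read BACnet point value",
        "inputSchema": {
            "type": "object",
            "properties": {"point_id": {"type": "string"}},
            "required": ["point_id"],
        },
        "handler": lambda args: read_point(args["point_id"]),
    }
}


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    rid, method, params = req.get("id"), req.get("method"), req.get("params", {})

    def reply(**body: Any) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": rid, **body})

    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return reply(result={"tools": tools})
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        args = params.get("arguments", {})
        if tool is None:
            return reply(error={"code": -32602, "message": "Unknown tool"})
        missing = [k for k in tool["inputSchema"]["required"] if k not in args]
        if missing:
            return reply(error={"code": -32602, "message": f"Missing arguments: {missing}"})
        value = tool["handler"](args)
        return reply(result={"content": [{"type": "text", "text": json.dumps(value)}]})
    return reply(error={"code": -32601, "message": "Method not found"})
```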

Infrastructure Choices

Kubernetes Distribution: K3s

Why K3s:

  • Lightweight (< 512 MB memory)
  • Single binary deployment
  • Fully compatible with K8s APIs
  • Bundled components (Traefik ingress, local-path storage provisioner)
  • Well suited to resource-constrained edge deployments

Alternative: Full Kubernetes (AKS, EKS, GKE) for large-scale cloud deployments

Message Broker: NATS JetStream

Why NATS:

  • Low latency (microseconds)
  • Small footprint (< 20 MB)
  • JetStream persistence
  • Built-in clustering
  • MQTT compatibility

Alternative: Kafka/Redpanda for cloud high-throughput scenarios

Time-Series Database: TimescaleDB

Why TimescaleDB:

  • Runs as a PostgreSQL extension (familiar SQL and tooling)
  • Automatic time-based partitioning (hypertables)
  • Continuous aggregates
  • Native compression (up to 10x space savings)
  • Mature ecosystem

Alternative: InfluxDB, QuestDB for pure time-series workloads
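Continuous aggregates typically group rows with TimescaleDB's `time_bucket()` function and maintain the rollup incrementally inside the database. The Python sketch below reproduces only the bucketing arithmetic: floor each timestamp to a fixed-width bucket, then average per bucket. It shows what a 15-minute rollup of zone temperatures would contain. The bucket origin, sample data, and width are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Bucket origin: the Unix epoch (illustrative assumption). For widths that evenly
# divide a day, any midnight origin produces the same buckets.
ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(width: timedelta, ts: datetime) -> datetime:
    """Floor ts to the start of its fixed-width bucket."""
    return ORIGIN + ((ts - ORIGIN) // width) * width


def bucket_averages(points: list[tuple[datetime, float]],
                    width: timedelta) -> dict[datetime, float]:
    """Average values per bucket, like AVG(value) ... GROUP BY time_bucket(width, ts)."""
    sums: dict[datetime, float] = defaultdict(float)
    counts: dict[datetime, int] = defaultdict(int)
    for ts, value in points:
        bucket = time_bucket(width, ts)
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```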

Identity: SPIFFE/SPIRE

Why SPIFFE:

  • Zero-trust native
  • Automatic key rotation
  • No shared secrets
  • Industry standard
  • Multi-platform support

No viable alternative for workload identity at this scale
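Every workload identity SPIRE issues is a SPIFFE ID of the form `spiffe://<trust-domain>/<path>`. Services typically authorize peers by checking the SPIFFE ID carried in the presented X.509 SVID. The sketch below validates and splits an ID following the character rules in the SPIFFE ID specification. The trust domain `citadel.local` and the path layout in the usage are hypothetical. Production code should use an official SPIFFE library rather than this hand-rolled parser.

```python
import re

_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")   # lowercase letters, digits, . - _
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")     # letters, digits, . - _


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if invalid."""
    prefix = "spiffe://"
    if not spiffe_id.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    trust_domain, sep, path = spiffe_id[len(prefix):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    if not sep:
        return trust_domain, ""  # ID with no path component
    for seg in path.split("/"):
        # Empty segments (including a trailing slash), '.', and '..' are not allowed
        if seg in ("", ".", "..") or not _SEGMENT.fullmatch(seg):
            raise ValueError(f"invalid path segment: {seg!r}")
    return trust_domain, "/" + path
```

For example, `parse_spiffe_id("spiffe://citadel.local/ns/agents/sa/energy-agent")` returns `("citadel.local", "/ns/agents/sa/energy-agent")`. An authorization check can then compare the trust domain and path against an allow-list.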

Cross-Language Integration

Protobuf for All

```bash
# Generate for all languages
buf generate

# Outputs:
#   src/proto_gen/python/citadel/v1/*.py
#   src/proto_gen/csharp/Citadel.V1/*.cs
#   src/proto_gen/typescript/citadel/v1/*.ts
```
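The generated Python, C#, and TypeScript types interoperate because they all read and write the same Protobuf wire format. The sketch below hand-encodes a varint field to illustrate that format. Each field is a key of `(field_number << 3) | wire_type` followed by a base-128 varint value. Encoding field 1 with the value 150 yields the bytes `08 96 01`, the example used in the Protobuf encoding guide. Generated code does all of this for you. The snippet only shows what actually crosses the wire.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a Protobuf base-128 varint (low 7-bit groups first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, new_offset) for the varint starting at offset."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Field key = (field_number << 3) | wire_type, where wire type 0 is VARINT."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)
```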

gRPC Everywhere

```python
# Python client calling .NET service (grpc.aio channel, so the stub call is awaitable)
channel = grpc.aio.secure_channel("twin-service:8443", credentials)
client = TwinServiceStub(channel)
entity = await client.GetEntity(request)
```

```csharp
// .NET calling Python agent
var channel = GrpcChannel.ForAddress("http://energy-agent:5000");
var client = new AgentService.AgentServiceClient(channel);
var result = await client.OptimizeAsync(request);
```

Development Tools

Build Tools

  • Python: uv (fast package manager)
  • .NET: dotnet CLI
  • TypeScript: pnpm (fast, efficient)
  • Protobuf: buf (linting, breaking change detection)

Testing

```bash
# pytest for Python (--cov requires the pytest-cov plugin)
pytest tests/ -v --cov=src

# xUnit for .NET
dotnet test --logger "console;verbosity=detailed"

# Vitest for TypeScript
pnpm test
```

See Also