Chapter 3: The First Breath of Aspire

"The first time you see all your microservices breathing in harmony... it's like watching a city come alive."


The Orchestration Revelation

We had protocols. We had schemas. We had a vision. But scattered code doesn't make a system - it makes a mess.

How do you coordinate dozens of microservices, agents, databases, and message queues without losing your sanity? How do you see what's happening when things go wrong? How do you develop locally without deploying to production?

Enter .NET Aspire - our orchestration and observability savior.

What is .NET Aspire?

Think of Aspire as Kubernetes for local development - but simpler, smarter, and optimized for the inner development loop.

.NET Aspire provides:

  • 🎯 Service Orchestration: Define your entire system in code
  • 📊 Built-in Observability: Logs, traces, metrics out of the box
  • 🔗 Service Discovery: Services find each other automatically
  • 🐳 Container Management: Docker containers as first-class citizens
  • 🚀 Developer Experience: One command to start everything

The Philosophy

Old Way:

# Start services manually (nightmare mode)
docker-compose up -d postgres
docker-compose up -d redis
docker-compose up -d nats
python agents/security_agent.py &
dotnet run --project services/safety &
npm start --prefix adapters/schneider &
# ... repeat for 15 more services
# ... kill them all when done (did you get all of them?)

Aspire Way:

# Start entire system with observability
cd src/CitadelMesh.AppHost
dotnet run
# → Opens dashboard at https://localhost:5000
# → All services, logs, traces, metrics in one place
# → Ctrl+C stops everything cleanly

One command. One dashboard. Complete visibility.

The Aspire Architecture

The AppHost: Mission Control

The CitadelMesh.AppHost project is the brain of our local development environment:

// src/CitadelMesh.AppHost/Program.cs
var builder = DistributedApplication.CreateBuilder(args);

// Infrastructure services
var postgres = builder.AddPostgres("postgres")
    .WithDataVolume()
    .AddDatabase("citadel-db");

var redis = builder.AddRedis("redis")
    .WithDataVolume();

var nats = builder.AddContainer("nats", "nats")
    .WithBindMount("./config/nats", "/config")
    .WithArgs("--config", "/config/nats-server.conf")
    .WithEndpoint(port: 4222, targetPort: 4222, name: "client")
    .WithEndpoint(port: 8222, targetPort: 8222, name: "monitoring");

// OPA policy engine (--watch reloads policies when files under /policies change)
var opa = builder.AddContainer("opa", "openpolicyagent/opa")
    .WithBindMount("./policies", "/policies")
    .WithArgs("run", "--server", "--watch", "--addr", "0.0.0.0:8181", "/policies")
    .WithHttpEndpoint(port: 8181, targetPort: 8181, name: "api");

// SPIRE server (identity)
var spire = builder.AddContainer("spire-server", "ghcr.io/spiffe/spire-server")
    .WithBindMount("./config/spire", "/opt/spire/conf")
    .WithEndpoint(port: 8081, targetPort: 8081, name: "api");

// CitadelMesh microservices.
// Plain containers expose no connection string or service discovery info,
// so they are referenced through a named endpoint instead of the resource itself.
var safety = builder.AddProject<Projects.CitadelMesh_Safety>("safety")
    .WithReference(opa.GetEndpoint("api"))
    .WithReference(postgres);

var orchestrator = builder.AddProject<Projects.CitadelMesh_Orchestrator>("orchestrator")
    .WithReference(nats.GetEndpoint("client"))
    .WithReference(redis)
    .WithReference(safety);

// Gateway (UI backend)
var gateway = builder.AddNpmApp("gateway", "../gateway")
    .WithReference(safety)
    .WithReference(orchestrator)
    .WithHttpEndpoint(port: 3001, env: "PORT");

// Python agents
var securityAgent = builder.AddExecutable(
        "security-agent",
        "python",
        "../agents",
        "-m", "security.security_agent")
    .WithReference(nats.GetEndpoint("client"))
    .WithReference(safety);

builder.Build().Run();

What this gives us:

  • ✅ Service Dependencies - Services start in the correct order
  • ✅ Environment Variables - Auto-configured connection strings
  • ✅ Health Checks - Know when services are ready
  • ✅ Resource Management - Containers cleaned up automatically
  • ✅ Observability - Logs and traces flow to dashboard
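For a non-.NET process such as the Python agent, the auto-configured values arrive as plain environment variables. The sketch below assumes Aspire's usual naming (services__<name>__<endpoint>__<index> for referenced endpoints, ConnectionStrings__<name> for connection strings); the helper names and sample values are hypothetical and not taken from the CitadelMesh code.

```python
import os
from typing import Mapping, Optional


def resolve_endpoint(service: str, endpoint: str = "http",
                     env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first URL injected for a referenced service endpoint, if any."""
    env = os.environ if env is None else env
    # Assumed naming convention: services__<service>__<endpoint>__<index>
    return env.get(f"services__{service}__{endpoint}__0")


def resolve_connection_string(name: str,
                              env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the connection string injected for a referenced resource, if any."""
    env = os.environ if env is None else env
    # Assumed naming convention: ConnectionStrings__<name>
    return env.get(f"ConnectionStrings__{name}")


if __name__ == "__main__":
    # Illustrative values only; the real ones are set by the AppHost at launch.
    sample_env = {
        "services__safety__http__0": "http://localhost:5100",
        "ConnectionStrings__citadel-db": "Host=localhost;Port=5432;Database=citadel-db",
    }
    print(resolve_endpoint("safety", env=sample_env))
    print(resolve_connection_string("citadel-db", env=sample_env))
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.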

The Dashboard Experience

The Moment of Truth

cd src/CitadelMesh.AppHost
dotnet run

Output:

Building...
info: Aspire.Hosting.DistributedApplication[0]
Aspire app host listening on: https://localhost:5000
info: Aspire.Hosting.DistributedApplication[0]
Login to the dashboard at https://localhost:5000

Open your browser to https://localhost:5000 and witness the magic:

Dashboard Features

📊 Resources Tab

See all services at a glance:

┌─────────────────────┬──────────┬─────────────────┬───────────┐
│ Resource            │ State    │ Type            │ Endpoints │
├─────────────────────┼──────────┼─────────────────┼───────────┤
│ postgres            │ Running  │ Container       │ 5432      │
│ redis               │ Running  │ Container       │ 6379      │
│ nats                │ Running  │ Container       │ 4222,8222 │
│ opa                 │ Running  │ Container       │ 8181      │
│ spire-server        │ Running  │ Container       │ 8081      │
│ safety              │ Running  │ .NET Project    │ 5100      │
│ orchestrator        │ Running  │ .NET Project    │ 5200      │
│ gateway             │ Running  │ Node.js App     │ 3001      │
│ security-agent      │ Running  │ Python Script   │ -         │
└─────────────────────┴──────────┴─────────────────┴───────────┘

📝 Logs Tab

Unified log stream from all services:

[13:45:23] [safety] Policy evaluation: citadel/security/allow_door_unlock | ALLOW
[13:45:23] [orchestrator] Event received: citadel.security.incident
[13:45:24] [security-agent] Incident analyzed: severity=MEDIUM, action=ALERT
[13:45:24] [gateway] GET /api/incidents → 200 (45ms)

Filter by service, level, or search text.

📈 Traces Tab

See request flow across services:

GET /api/policy/evaluate
├─ gateway → safety (12ms)
│ └─ safety → opa (18ms)
│ └─ OPA evaluation (15ms)
└─ Total: 45ms

Click any trace to see detailed spans, timing, and metadata.

📊 Metrics Tab

Real-time performance metrics:

  • Request rate (req/s)
  • Error rate (%)
  • Response time (p50, p95, p99)
  • Resource usage (CPU, memory)
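To make the p50/p95/p99 figures concrete, here is a minimal sketch of a nearest-rank percentile over a list of response times. The dashboard's own aggregation may differ (histogram buckets usually give approximations); the function name and sample data are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    durations_ms = [12, 15, 18, 22, 25, 30, 45, 48, 120, 300]
    for p in (50, 95, 99):
        print(f"p{p}: {percentile(durations_ms, p)} ms")
```

A high p99 next to a low p50 usually points to a few slow outliers rather than a uniformly slow service, which is why the tab shows all three.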

🔗 Dependencies Tab

Visual graph of service dependencies:

┌─────────────┐
│   gateway   │
└──────┬──────┘
       │
┌──────┴──────┐
│   safety    │
└──────┬──────┘
       │
┌──────┴──────┐
│     opa     │
└─────────────┘
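The dependency graph also implies a valid start order. As a rough illustration (not how Aspire itself schedules resources), the references declared in the AppHost can be fed to Python's standard topological sorter; the dictionary below is transcribed by hand from the WithReference calls in this chapter.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key depends on the resources in its set (from the AppHost's WithReference calls).
DEPENDS_ON: Dict[str, Set[str]] = {
    "safety": {"opa", "postgres"},
    "orchestrator": {"nats", "redis", "safety"},
    "gateway": {"safety", "orchestrator"},
    "security-agent": {"nats", "safety"},
}


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid start order: every resource appears after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(start_order(DEPENDS_ON)))
```

Several orders can be valid at once; the sorter only guarantees that each resource comes after everything it references, and it raises an error if the graph contains a cycle.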

The Development Workflow

Day 1: Starting from Scratch

A new developer gets a laptop and clones the repo:

git clone https://github.com/KWIKalamazoo/CitadelMesh.git
cd CitadelMesh

Install prerequisites:

# .NET 8 SDK
dotnet --version # 8.0+

# Docker Desktop
docker --version

# Python 3.11+
python --version

# Node.js 20+
node --version

Start the world:

cd src/CitadelMesh.AppHost
dotnet run

Aspire does the rest:

  1. Pulls container images (postgres, redis, nats, opa, spire)
  2. Starts containers with correct configuration
  3. Builds .NET projects
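  4. Starts the gateway through npm (run npm install in the gateway folder once beforehand; AddNpmApp does not install packages on its own)
  5. Launches the Python agent with the python on your PATH (create and activate its virtual environment first)
  6. Configures service discovery
  7. Serves the dashboard at https://localhost:5000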

Time to first working system: ~5 minutes.

Day-to-Day Development

Scenario: Adding a new OPA policy

  1. Edit policy file:

    code policies/energy.rego

  2. OPA reloads the policy from the mounted volume (this relies on OPA running with its --watch flag)

  3. Test in dashboard:

    • View logs: See OPA reload message
    • Test policy: Call gateway endpoint
    • View trace: See policy evaluation span

  4. No restart needed. Just edit and test.
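While iterating on a policy it can also help to query OPA directly instead of going through the gateway. The sketch below builds a request against OPA's Data API (POST /v1/data/<policy path>) on the port 8181 mapped in the AppHost; the helper names are hypothetical, and the input fields are borrowed from the structured log example in this chapter.

```python
import json
import urllib.request


def build_opa_request(policy_path: str, input_doc: dict,
                      base_url: str = "http://localhost:8181") -> urllib.request.Request:
    """Build a POST to OPA's Data API with the input document wrapped under "input"."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/data/{policy_path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(policy_path: str, input_doc: dict) -> dict:
    """Send the request and return OPA's JSON response (needs OPA running locally)."""
    request = build_opa_request(policy_path, input_doc)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    decision = evaluate(
        "citadel/security/allow_door_unlock",
        {"role": "security_officer", "time": 14, "door_zone": "lobby"},
    )
    print(decision)
```

OPA wraps the decision in a "result" field and omits that field when the rule is undefined for the given input, so callers should treat a missing "result" as a deny rather than an error.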

Scenario: Debugging the security agent

  1. Check logs in dashboard:

    • Filter: resource:security-agent
    • Search: incident
    • See structured log output

  2. View traces:

    • Find incident processing trace
    • See timing for each step
    • Identify slow operations

  3. Attach debugger:

    # Stop agent in Aspire, run manually with debugger
    cd src/agents
    python -m debugpy --listen 5678 -m security.security_agent

  4. Restart agent in Aspire when done

The Inner Loop

Old way:

Edit code → Stop all services → Rebuild → Restart services → Test
(5-10 minutes per iteration)

Aspire way:

Edit code → Auto-reload or hot-reload → Test
(< 5 seconds per iteration)

Iterations drop from minutes to seconds, so you can try far more ideas in a day.

Observability: The Superpower

Distributed Tracing with OpenTelemetry

Every request creates a trace that flows through multiple services:

Example: Policy Evaluation Request

Trace ID: 8f7d2a3b-1c4e-9f6a-2d8b-5e3a7c9f1b4d
Span Tree:
├─ gateway.http.request (48ms)
│ ├─ gateway.call_safety_service (35ms)
│ │ ├─ safety.evaluate_policy (30ms)
│ │ │ ├─ safety.call_opa (25ms)
│ │ │ │ └─ opa.evaluation (15ms)
│ │ │ └─ safety.audit_log (3ms)
│ │ └─ safety.response_serialization (2ms)
│ └─ gateway.response (5ms)

Click any span to see:

  • ⏱️ Start time, duration
  • 🏷️ Tags (service, operation, status)
  • 📊 Attributes (user, resource, outcome)
  • 🔗 Links to logs and metrics
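When reading a span tree, the total duration of a span matters less than its self time: the part not spent waiting on child spans. A minimal sketch over the example trace above, assuming child spans run one after another rather than concurrently (the Span class is hypothetical, not an OpenTelemetry type):

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        """Time spent in this span itself, excluding its direct children."""
        return self.duration_ms - sum(child.duration_ms for child in self.children)

    def walk(self) -> Iterator["Span"]:
        """Yield this span and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# The span tree from the example trace, transcribed by hand.
TRACE = Span("gateway.http.request", 48, [
    Span("gateway.call_safety_service", 35, [
        Span("safety.evaluate_policy", 30, [
            Span("safety.call_opa", 25, [Span("opa.evaluation", 15)]),
            Span("safety.audit_log", 3),
        ]),
        Span("safety.response_serialization", 2),
    ]),
    Span("gateway.response", 5),
])


def slowest_by_self_time(root: Span) -> Span:
    """Return the span that contributes the most time of its own."""
    return max(root.walk(), key=Span.self_time_ms)


if __name__ == "__main__":
    for span in TRACE.walk():
        print(f"{span.name}: self {span.self_time_ms()} ms of {span.duration_ms} ms")
    print("Slowest:", slowest_by_self_time(TRACE).name)
```

In this trace the OPA evaluation itself dominates, followed by the overhead of the call to OPA, which suggests looking at the policy before the surrounding services.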

Structured Logging

All services emit structured logs (JSON format):

{
"timestamp": "2025-10-01T13:45:23.123Z",
"level": "INFO",
"service": "safety",
"trace_id": "8f7d2a3b-1c4e-9f6a-2d8b-5e3a7c9f1b4d",
"span_id": "5e3a7c9f1b4d",
"message": "Policy evaluation completed",
"policy_path": "citadel/security/allow_door_unlock",
"decision": "ALLOW",
"duration_ms": 15,
"input": {
"role": "security_officer",
"time": 14,
"door_zone": "lobby"
}
}

Queryable, filterable, and correlatable with traces.
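For the Python agents, a log line of this shape can be produced with the standard logging module alone. This is a minimal sketch, not the project's actual logging setup: the JsonFormatter class and the extra={"fields": ...} convention are hypothetical, and the timestamp carries a +00:00 offset instead of the Z suffix shown above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, merging fields passed via `extra`."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass structured fields as extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter("safety"))
    log = logging.getLogger("safety")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info(
        "Policy evaluation completed",
        extra={"fields": {
            "policy_path": "citadel/security/allow_door_unlock",
            "decision": "ALLOW",
            "duration_ms": 15,
        }},
    )
```

Carrying trace_id and span_id in the same fields dictionary is what lets a log line be joined back to its trace.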

Metrics and Dashboards

Services emit metrics through OpenTelemetry, and the Aspire dashboard collects them automatically:

Safety Service Metrics:

  • safety.policy.evaluations.total (counter)
  • safety.policy.evaluation.duration (histogram)
  • safety.policy.denials.total (counter)

Orchestrator Metrics:

  • orchestrator.events.received.total (counter)
  • orchestrator.events.processing.duration (histogram)
  • orchestrator.agents.active (gauge)

View in Aspire dashboard or export to Prometheus/Grafana.

The Aspire Advantage

Versus Docker Compose

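┌───────────────────────┬────────────────┬────────────────┐
│ Feature               │ Docker Compose │ Aspire         │
├───────────────────────┼────────────────┼────────────────┤
│ Service orchestration │                │                │
│ Container management  │                │                │
│ Observability         │ ❌ Manual      │ ✅ Built-in    │
│ Service dependencies  │ ⚠️ Basic       │ ✅ Rich        │
│ Hot reload            │                │                │
│ Traces                │                │                │
│ Metrics               │                │                │
│ .NET integration      │                │ ✅ Excellent   │
│ Python/Node support   │                │                │
│ Production deployment │                │ ⚠️ Dev-focused │
└───────────────────────┴────────────────┴────────────────┘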

Verdict: Use Aspire for development, Kubernetes for production.

Versus Kubernetes (for dev)

Kubernetes:

  • ✅ Production-grade orchestration
  • ❌ Complex setup (minikube, kind, etc.)
  • ❌ Slow iteration (build → push → deploy)
  • ❌ Heavy resource usage
  • ❌ Difficult debugging

Aspire:

  • ✅ Instant startup
  • ✅ Hot reload
  • ✅ Built-in debugging
  • ✅ Lightweight
  • ⚠️ Dev environment only

Verdict: Aspire for dev, Kubernetes for prod. Best of both worlds.

The Foundation Services

With Aspire orchestrating, we deployed the core CitadelMesh services:

🛡️ CitadelMesh.Safety

A .NET 8 microservice that wraps the OPA policy engine:

  • Exposes REST API for policy evaluation
  • Manages policy loading and updates
  • Provides audit logging
  • Handles policy bundles

Endpoints:

  • POST /api/safety/evaluate - Evaluate policy decision
  • GET /api/safety/policies - List available policies
  • GET /api/safety/health - Health check
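A script or test that depends on the Safety service can poll the health endpoint before sending real requests. The sketch below is one way to do that with the standard library; the function names are hypothetical, and the URL in the usage note assumes plain HTTP on port 5100 as listed in the Resources tab.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 30,
                       delay_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe until it succeeds or attempts run out, sleeping between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Typical use would be wait_until_healthy(lambda: http_probe("http://localhost:5100/api/safety/health")). Injecting the probe and the sleep function keeps the retry loop testable without a running service.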

🎭 CitadelMesh.Orchestrator

.NET 8 service for event coordination:

  • Subscribes to NATS event bus
  • Routes events to appropriate agents
  • Manages agent lifecycle
  • Coordinates multi-agent workflows

Uses:

  • Dapr for pub/sub and state management
  • Orleans for actor-based agents (future)
  • MassTransit for saga orchestration (future)
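Routing events to the appropriate agents comes down to matching event subjects against patterns. The sketch below implements NATS-style wildcard matching ('*' for exactly one token, '>' for one or more trailing tokens) purely as an illustration; in a running system the NATS server does this matching, and the routing table here is hypothetical apart from the citadel.security.incident subject seen in the logs.

```python
from typing import Dict, List


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is exactly one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' must be the last pattern token and needs at least one subject token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical routing table; "audit-agent" does not appear in the chapter.
ROUTES: Dict[str, str] = {
    "citadel.security.*": "security-agent",
    "citadel.>": "audit-agent",
}


def route(subject: str, routes: Dict[str, str] = ROUTES) -> List[str]:
    """Return every agent whose pattern matches the subject, in table order."""
    return [agent for pattern, agent in routes.items() if subject_matches(pattern, subject)]


if __name__ == "__main__":
    print(route("citadel.security.incident"))
    print(route("citadel.energy.alert"))
```

Keeping subjects hierarchical (domain, then area, then event) is what makes wildcard subscriptions like these useful.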

📊 OpenTelemetry

Observability infrastructure:

  • Collects traces from all services
  • Aggregates metrics
  • Exports to Aspire dashboard
  • Can export to Jaeger, Zipkin, Prometheus

Auto-instrumentation for:

  • HTTP requests/responses
  • gRPC calls
  • Database queries
  • Message queue operations
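What ties spans from different services into one trace is context propagation: each outgoing call carries the trace ID in a header. OpenTelemetry's default is the W3C traceparent header (version, 32-hex trace ID, 16-hex parent span ID, flags). The sketch below parses and forwards such a header by hand to show the mechanics; real services would leave this to the instrumentation libraries, and the function names are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace ID, 16 hex span ID, 2 hex flags.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str   # 32 lowercase hex characters
    span_id: str    # 16 lowercase hex characters
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a W3C traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceContext) -> str:
    """Header for an outgoing call: same trace ID, fresh span ID."""
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    context = parse_traceparent(incoming)
    if context is not None:
        print("Forwarding:", child_traceparent(context))
```

Because the trace ID stays constant while each hop gets its own span ID, the dashboard can rebuild the full span tree from spans reported by separate services.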

🗃️ Structured Logging

Serilog configured for all services:

  • JSON output format
  • Enriched with trace context
  • Minimum level: Information (configurable)
  • Sinks: Console, File, OpenTelemetry (which feeds the Aspire dashboard)

Milestone Achieved ✅

Aspire Orchestration Complete

  • ✅ AppHost configured with all services
  • ✅ Dashboard accessible at https://localhost:5000
  • ✅ All services starting successfully
  • ✅ Logs, traces, metrics flowing
  • ✅ Service discovery working
  • ✅ Hot reload enabled
  • ✅ Developer workflow optimized

The System Breathes

Type dotnet run, and watch the entire CitadelMesh ecosystem come to life:

  • 🐘 PostgreSQL ready for data
  • 🔴 Redis caching at speed
  • 📨 NATS events flowing
  • 🛡️ OPA policies enforcing
  • 🔐 SPIRE identities issuing
  • 🎯 Microservices coordinating
  • 🤖 Agents listening

Foundation complete. Time to awaken the guardian.


🏰 NEXT: Chapter 4: The Policy Guardian Awakens →


Updated: October 2025 | Status: Complete ✅