Complete four-pass evolution to production-ready architecture:

**Pass 1 → Foundation:**
- Tool registry, basic harness, 19 tools
- VPS provisioning, Syncthing mesh
- Health daemon, systemd services

**Pass 2 → Three-House Canon:**
- Timmy (Sovereign), Ezra (Archivist), Bezalel (Artificer)
- Provenance tracking, artifact-flow discipline
- House-aware policy enforcement

**Pass 3 → Self-Improvement:**
- Pattern database with SQLite backend
- Adaptive policies (auto-adjust thresholds)
- Predictive execution (success prediction)
- Hermes bridge for shortest-loop telemetry
- Learning velocity tracking

**Pass 4 → Production Integration:**
- Unified API: `from uni_wizard import Harness, House, Mode`
- Three modes: SIMPLE / INTELLIGENT / SOVEREIGN
- Circuit breaker pattern for fault tolerance
- Async/concurrent execution support
- Production hardening (timeouts, retries)

**Allegro Lane Definition:**
- Narrowed to: Gitea integration, Hermes bridge, redundancy/failover
- Provides: Cloud connectivity, telemetry streaming, issue routing
- Does NOT: Make sovereign decisions, authenticate as Timmy

**Files:**
- v3/: Intelligence engine, adaptive harness, Hermes bridge
- v4/: Unified API, production harness, final architecture

Total: ~25KB architecture documentation + production code
Uni-Wizard v3 — Design Critique & Review
Review of Existing Work
1. Timmy's model_tracker.py (v1)
What's good:
- Tracks local vs cloud usage
- Cost estimation
- SQLite persistence
- Ingests from Hermes session DB
The gap:
- Data goes nowhere. It logs but doesn't learn.
- No feedback loop into decision-making
- Sovereignty score is a vanity metric unless it changes behavior
- No pattern recognition on "which models succeed at which tasks"
Verdict: Good telemetry, zero intelligence. Missing: telemetry → analysis → adaptation.
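A minimal sketch of what closing that gap could look like: the same telemetry, queried back into the decision path so logging becomes routing. The `runs` table and `best_model` helper here are illustrative assumptions, not model_tracker.py's actual schema or API.

```python
import sqlite3

# Hypothetical mini-version of the telemetry -> analysis -> adaptation loop.
# Table and column names are assumptions, not model_tracker.py's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (model TEXT, task_type TEXT, success INTEGER)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [("local-7b", "summarize", 1), ("local-7b", "summarize", 1),
     ("cloud-xl", "summarize", 1), ("local-7b", "codegen", 0),
     ("cloud-xl", "codegen", 1)],
)

def best_model(task_type: str) -> str:
    """Return the model with the highest logged success rate for this task type."""
    row = conn.execute(
        "SELECT model, AVG(success) AS rate FROM runs "
        "WHERE task_type = ? GROUP BY model ORDER BY rate DESC, model LIMIT 1",
        (task_type,),
    ).fetchone()
    return row[0]

print(best_model("codegen"))  # → cloud-xl (1/1 logged successes vs local-7b's 0/1)
```

The point is not the query itself but that the sovereignty score stops being a vanity metric the moment a routing decision reads it.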
2. Ezra's v2 Harness (Archivist)
What's good:
- `must_read_before_write` policy enforcement
- Evidence level tracking
- Source citation
The gap:
- Policies are static. Ezra doesn't learn which evidence sources are most reliable.
- No tracking of "I read source X, made decision Y, was I right?"
- No adaptive confidence calibration
Verdict: Good discipline, no learning. Missing: outcome feedback → policy refinement.
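One way the missing "I read source X, made decision Y, was I right?" loop could be sketched, assuming hypothetical `record_outcome`/`source_confidence` helpers (not the v2 harness API): per-source reliability accumulates from outcomes and feeds back into confidence.

```python
from collections import defaultdict

# Illustrative outcome-feedback store; names are assumptions, not Ezra's v2 API.
_stats = defaultdict(lambda: {"cited": 0, "correct": 0})

def record_outcome(source: str, was_correct: bool) -> None:
    """Log whether a decision backed by this source turned out to be right."""
    _stats[source]["cited"] += 1
    _stats[source]["correct"] += int(was_correct)

def source_confidence(source: str, prior: float = 0.5) -> float:
    """Laplace-smoothed reliability: starts near the prior, converges to the track record."""
    s = _stats[source]
    return (s["correct"] + prior) / (s["cited"] + 1)

record_outcome("man-page", True)
record_outcome("man-page", True)
record_outcome("forum-post", False)
print(round(source_confidence("man-page"), 2))    # 0.83
print(round(source_confidence("forum-post"), 2))  # 0.25
```

The smoothing prior keeps a single outcome from swinging calibration too hard, which is the essence of adaptive confidence calibration.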
3. Bezalel's v2 Harness (Artificer)
What's good:
- `requires_proof` enforcement
- `test_before_ship` gate
- Proof verification
The gap:
- No failure pattern analysis. If tests fail 80% of the time on certain tools, Bezalel doesn't adapt.
- No "pre-flight check" based on historical failure modes
- No learning from which proof types catch most bugs
Verdict: Good rigor, no adaptation. Missing: failure pattern → prevention.
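A pre-flight check driven by historical failure modes could be as small as this sketch. The `history` dict and `preflight` function are illustrative assumptions, not part of the v2 harness:

```python
# Hypothetical pre-flight gate for Bezalel: tools with a bad track record
# must pass extra proof before they run. All names here are illustrative.
history = {
    "deploy_vps": {"runs": 10, "failures": 8},
    "write_file": {"runs": 50, "failures": 1},
}

def preflight(tool: str, max_failure_rate: float = 0.5) -> str:
    h = history.get(tool)
    if h is None:
        return "allow"  # no data yet: run it, but log the outcome for next time
    rate = h["failures"] / h["runs"]
    # Above the threshold, demand proof up front instead of failing at ship time.
    return "require_proof" if rate > max_failure_rate else "allow"

print(preflight("deploy_vps"))  # require_proof (80% historical failure rate)
print(preflight("write_file"))  # allow (2% historical failure rate)
```

This is failure pattern → prevention in its simplest form: the gate tightens exactly where the history says it should.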
4. Hermes Harness Integration
What's good:
- Rich session data available
- Tool call tracking
- Model performance per task
The gap:
- Shortest loop not utilized. Hermes data exists but doesn't flow into Timmy's decision context.
- No real-time "last 10 similar tasks succeeded with model X"
- No context window optimization based on historical patterns
Verdict: Rich data, unused. Missing: hermes_telemetry → timmy_context → smarter_routing.
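The missing "last 10 similar tasks succeeded with model X" signal could be a single windowed query against the session DB. The `sessions` table layout below is an assumption about the Hermes schema, not its actual shape:

```python
import sqlite3

# Sketch of hermes_telemetry -> timmy_context: success rates over the most
# recent N similar tasks. Table and column names are schema assumptions.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sessions ("
    "id INTEGER PRIMARY KEY, task_type TEXT, model TEXT, success INTEGER)"
)
rows = [("codegen", "local-7b", 0)] * 3 + [("codegen", "cloud-xl", 1)] * 7
db.executemany(
    "INSERT INTO sessions (task_type, model, success) VALUES (?, ?, ?)", rows
)

def recent_context(task_type: str, n: int = 10):
    """Per-model success rate and count over the last n sessions of this task type."""
    return db.execute(
        "SELECT model, AVG(success), COUNT(*) FROM "
        "  (SELECT model, success FROM sessions "
        "   WHERE task_type = ? ORDER BY id DESC LIMIT ?) "
        "GROUP BY model",
        (task_type, n),
    ).fetchall()

for model, rate, count in recent_context("codegen"):
    print(f"{model}: {rate:.0%} over {count} recent runs")
```

Injecting that three-row summary into Timmy's prompt context is the "shortest loop" the critique is asking for.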
The Core Problem
Current Flow (Open Loop):

```
┌─────────┐    ┌──────────┐    ┌─────────┐
│ Execute │───→│ Log Data │───→│ Report  │───→ 🗑️
└─────────┘    └──────────┘    └─────────┘
```
Needed Flow (Closed Loop):

```
┌─────────┐    ┌──────────┐    ┌───────────┐
│ Execute │───→│ Log Data │───→│  Analyze  │
└─────────┘    └──────────┘    └─────┬─────┘
     ▲                               │
     └───────────────────────────────┘
        Adapt Policy / Route / Model
```
The Focus: Local sovereign Timmy must get smarter, faster, and self-improving by closing this loop.
v3 Solution: The Intelligence Layer
1. Feedback Loop Architecture
Every execution feeds into:
- Pattern DB: Tool X with params Y → success rate Z%
- Model Performance: Task type T → best model M
- House Calibration: House H on task T → confidence adjustment
- Predictive Cache: Pre-fetch based on execution patterns
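One possible SQLite layout for these four stores, plus the upsert that turns each execution into a data point. All table and column names are illustrative, not the shipped schema:

```python
import sqlite3

# Illustrative schema for the four intelligence-layer stores (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pattern_db (tool TEXT, params_hash TEXT,
                         runs INTEGER, successes INTEGER,
                         PRIMARY KEY (tool, params_hash));
CREATE TABLE model_perf (task_type TEXT, model TEXT,
                         runs INTEGER, successes INTEGER,
                         PRIMARY KEY (task_type, model));
CREATE TABLE house_calibration (house TEXT, task_type TEXT,
                                confidence_delta REAL,
                                PRIMARY KEY (house, task_type));
CREATE TABLE predictive_cache (pattern TEXT PRIMARY KEY, prefetch TEXT);
""")

def record(tool: str, params_hash: str, success: bool) -> None:
    """Every execution teaches: upsert one outcome into the pattern DB."""
    conn.execute(
        "INSERT INTO pattern_db VALUES (?, ?, 1, ?) "
        "ON CONFLICT(tool, params_hash) DO UPDATE SET "
        "runs = runs + 1, successes = successes + excluded.successes",
        (tool, params_hash, int(success)),
    )

record("git_push", "abc123", True)
record("git_push", "abc123", False)
rate = conn.execute(
    "SELECT successes * 100.0 / runs FROM pattern_db"
).fetchone()[0]
print(f"git_push/abc123 success rate: {rate:.0f}%")  # 50%
```

Keeping all four stores in one local SQLite file matches the "local learning only" principle: nothing here needs the cloud.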
2. Adaptive Policies
Policies become functions of historical performance:
```python
# Instead of a static threshold:
evidence_threshold = 0.8

# ...a dynamic one, derived from the track record:
evidence_threshold = base_threshold * (1 + success_rate_adjustment)
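Expanded into a runnable sketch (all names and constants are illustrative): the threshold tightens when recent outcomes fall short of a target rate and relaxes when they beat it, clamped so adaptation can never loosen the policy into uselessness or lock it shut.

```python
# Illustrative adaptive policy threshold; every name and constant is an assumption.
def adaptive_threshold(base: float, recent_success_rate: float,
                       target: float = 0.9, gain: float = 0.5,
                       lo: float = 0.5, hi: float = 0.95) -> float:
    # Below-target performance raises the bar; above-target performance relaxes it.
    adjustment = gain * (target - recent_success_rate)
    # Clamp so one bad (or lucky) streak can never disable the policy outright.
    return max(lo, min(hi, base * (1 + adjustment)))

print(round(adaptive_threshold(0.8, 0.95), 2))  # 0.78 — slightly relaxed
print(round(adaptive_threshold(0.8, 0.60), 2))  # 0.92 — tightened
print(round(adaptive_threshold(0.8, 0.00), 2))  # 0.95 — clamped at the ceiling
```

The clamp is what makes the adaptation "transparent and sovereignty-preserving": the policy can explain its current value as base × adjustment, bounded by hard limits Timmy never crosses.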
3. Hermes Telemetry Integration
Real-time ingestion from Hermes session DB:
- Last N similar tasks
- Success rates by model
- Latency patterns
- Token efficiency
4. Self-Improvement Metrics
- Prediction accuracy: Did predicted success match actual?
- Policy effectiveness: Did policy change improve outcomes?
- Learning velocity: How fast is Timmy getting better?
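The first and third metrics reduce to small computations; a sketch under assumed inputs (the windowing into weekly accuracy buckets is an illustrative choice, not a fixed design):

```python
# Illustrative self-improvement metrics; function names and the weekly
# windowing scheme are assumptions, not a specified interface.
def prediction_accuracy(pairs):
    """Fraction of runs where the predicted outcome matched the actual one."""
    return sum(predicted == actual for predicted, actual in pairs) / len(pairs)

def learning_velocity(weekly_accuracy):
    """Average per-week change in prediction accuracy; positive means improving."""
    deltas = [b - a for a, b in zip(weekly_accuracy, weekly_accuracy[1:])]
    return sum(deltas) / len(deltas)

outcomes = [(True, True), (True, False), (False, False), (True, True)]
print(prediction_accuracy(outcomes))                      # 0.75
print(round(learning_velocity([0.60, 0.68, 0.74]), 2))    # 0.07
```

"Timmy gets measurably better every day" then has a concrete test: learning velocity stays positive.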
Design Principles for v3
- Every execution teaches — No telemetry without analysis
- Local learning only — Pattern recognition runs locally, no cloud
- Shortest feedback loop — Hermes data → Timmy context in <100ms
- Transparent adaptation — Timmy explains why he changed his policy
- Sovereignty-preserving — Learning improves local decision-making, doesn't outsource it
The goal: Timmy gets measurably better every day he runs.