

ROADMAP.md — Integration Roadmap for Timmy Time

Origin: The Ascension of Timmy: Beyond the Exodus
Version: 0.1.0 (draft)
Last updated: 2026-02-28
Maintainer: @AlexanderWhitestone

This document guides AI agents and human developers through the full integration plan for Timmy Time. Each phase is independently valuable. Phases are ordered by priority; dependencies are noted where they exist.

Repo strategy: Multi-repo ecosystem. Each heavy integration gets its own repo, dockerized as a service. The main Timmy-time-dashboard repo stays lean — it consumes these services via HTTP/WebSocket/gRPC.


Current State (v2.0 Exodus)

What already works:

| Component | Status | Implementation |
|---|---|---|
| Voice TTS | pyttsx3 (robotic) | src/timmy_serve/voice_tts.py |
| Voice STT | Browser Web Speech API | src/dashboard/templates/voice_button.html |
| Voice NLU | Regex-based intent detection | src/integrations/voice/nlu.py |
| Frontend | HTMX + Bootstrap + marked.js | src/dashboard/templates/base.html |
| LLM router | Cascade with circuit breaker | src/infrastructure/router/cascade.py |

What does NOT exist yet:

Piper TTS · Faster-Whisper · Chroma · CRDTs · LangGraph · Nostr · ZK-ML · Prometheus/Grafana/Loki · OpenTelemetry · Three.js · Tauri · Alpine.js · Firecracker/gVisor/WASM sandboxing · BOLT12 · Vickrey auctions


Phase 0: Multi-Repo Foundation

Goal: Establish the repo ecosystem before building new services.

Repos to create

| Repo | Purpose | Interface to main repo |
|---|---|---|
| timmy-voice | STT + TTS service | HTTP REST + WebSocket streaming |
| timmy-nostr | Nostr relay client, identity, reputation | HTTP REST + event stream |
| timmy-memory | Vector DB service (if external DB chosen) | HTTP REST |
| timmy-observe | Metrics collection + export | Prometheus scrape endpoint |

Repo template

Each service repo follows:

```
timmy-<service>/
├── src/                    # Python source
├── tests/
├── Dockerfile
├── docker-compose.yml      # Standalone dev setup
├── pyproject.toml
├── Makefile                # make test, make dev, make docker-build
├── CLAUDE.md               # AI agent instructions specific to this repo
└── README.md
```

Main repo changes

  • Add docker-compose.services.yml for orchestrating external services
  • Add thin client modules in src/infrastructure/clients/ for each service
  • Each client follows graceful degradation: if the service is down, log and return a fallback (never crash)

Decision record

Create docs/adr/023-multi-repo-strategy.md documenting the split rationale.


Phase 1: Sovereign Voice

Priority: HIGHEST — most urgent integration Repo: timmy-voice Depends on: Phase 0 (repo scaffold)

1.1 Research & Select Engines

Before writing code, evaluate these candidates. The goal is ONE engine per concern that works across hardware tiers (Pi 4 through desktop GPU).

STT Candidates

| Engine | Size | Speed | Offline | Notes |
|---|---|---|---|---|
| Faster-Whisper | 39M-1.5G | 4-7x over Whisper | Yes | CTranslate2, INT8 quantization, mature ecosystem |
| Moonshine | 27M-245M | 100x faster than Whisper large on CPU | Yes | New (Feb 2026), edge-first, streaming capable |
| Vosk | 50M-1.8G | Real-time on Pi | Yes | Kaldi-based, very lightweight, good for embedded |
| whisper.cpp | Same as Whisper | CPU-optimized C++ | Yes | llama.cpp ecosystem, GGML quantization |

Research tasks:

  • Benchmark Moonshine vs Faster-Whisper vs whisper.cpp on: (a) RPi 4 4GB, (b) M-series Mac, (c) Linux desktop with GPU
  • Evaluate streaming vs batch transcription for each
  • Test accuracy on accented speech and technical vocabulary
  • Measure cold-start latency (critical for voice UX)

Recommendation to validate: Moonshine for edge, Faster-Whisper for desktop, with a unified API wrapper that selects by hardware tier.
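
The "selects by hardware tier" idea could look like the sketch below. Engine names mirror the candidates above, but the function name, thresholds, and tier boundaries are illustrative assumptions to be replaced by the actual benchmark results:

```python
# Hypothetical tier selection for the unified STT wrapper described above.
# Thresholds are placeholders, not measured cutoffs.
def select_stt_engine(total_ram_gb: float, has_gpu: bool) -> str:
    """Pick an STT engine by hardware tier (Pi 4 through desktop GPU)."""
    if has_gpu:
        return "faster-whisper"   # GPU desktop: larger models viable
    if total_ram_gb <= 4:
        return "moonshine"        # edge tier, e.g. RPi 4 4GB
    return "faster-whisper"       # CPU desktop: INT8 quantized models
```

The point is that callers never name an engine; they describe the hardware and the wrapper decides.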

TTS Candidates

| Engine | Size | Speed | Offline | Notes |
|---|---|---|---|---|
| Piper | 416M per voice | 10-30x real-time on Pi 4 | Yes | VITS arch, ONNX, production-proven, many voices |
| Kokoro | 82M params | Fast on CPU | Yes | Apache 2.0, quality rivals large models |
| Coqui/XTTS-v2 | 1.5G+ | Needs GPU | Yes | Voice cloning, multilingual — but company shut down |
| F5-TTS | Medium | Needs GPU | Yes | Flow matching, 10s voice clone, MIT license |

Research tasks:

  • Benchmark Piper vs Kokoro on: (a) RPi 4, (b) desktop CPU, (c) desktop GPU
  • Compare voice naturalness (subjective listening test)
  • Test Piper custom voice training pipeline (for Timmy's voice)
  • Evaluate Kokoro Apache 2.0 licensing for commercial use

Recommendation to validate: Piper for edge (proven on Pi), Kokoro for desktop quality, with a TTS provider interface that swaps transparently.
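
The "TTS provider interface that swaps transparently" could be a structural protocol. `TTSProvider`, `PiperProvider`, and the method signature below are hypothetical names for this sketch, not existing bindings:

```python
from typing import Protocol

class TTSProvider(Protocol):
    """Any engine (Piper, Kokoro, ...) plugs in behind this interface."""
    def synthesize(self, text: str, voice: str = "default") -> bytes: ...

class PiperProvider:
    def synthesize(self, text: str, voice: str = "default") -> bytes:
        # Placeholder body: a real implementation would run Piper's
        # ONNX pipeline and return WAV bytes.
        return b"RIFF-placeholder"

def speak(provider: TTSProvider, text: str) -> bytes:
    # Callers depend only on the protocol, so engines swap without edits here.
    return provider.synthesize(text)
```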

1.2 Architecture

```
┌─────────────────────────────────────────────┐
│                timmy-voice                   │
│                                              │
│  ┌──────────┐   ┌──────────┐   ┌─────────┐ │
│  │ STT      │   │ TTS      │   │ NLU     │ │
│  │ Engine   │   │ Engine   │   │ (move   │ │
│  │ (select  │   │ (select  │   │  from   │ │
│  │  by hw)  │   │  by hw)  │   │  main)  │ │
│  └────┬─────┘   └────┬─────┘   └────┬────┘ │
│       │              │              │       │
│  ┌────┴──────────────┴──────────────┴────┐  │
│  │         FastAPI / WebSocket API        │  │
│  │  POST /transcribe (audio → text)      │  │
│  │  POST /speak (text → audio)           │  │
│  │  WS   /stream (real-time STT)         │  │
│  │  POST /understand (text → intent)     │  │
│  └───────────────────────────────────────┘  │
│                                              │
│  Docker: timmy-voice:latest                  │
│  Ports: 8410 (HTTP) / 8411 (WS)             │
└─────────────────────────────────────────────┘
```

1.3 Integration with Main Repo

  • Add src/infrastructure/clients/voice_client.py — async HTTP/WS client
  • Replace browser Web Speech API with calls to timmy-voice service
  • Replace pyttsx3 calls with TTS service calls
  • Move src/integrations/voice/nlu.py to timmy-voice repo
  • Keep graceful fallback: if voice service unavailable, disable voice features in the UI (don't crash)

1.4 Deliverables

  • STT engine benchmarks documented in timmy-voice/docs/benchmarks.md
  • TTS engine benchmarks documented alongside
  • Working Docker container with REST + WebSocket API
  • Client integration in main repo
  • Tests: unit tests in timmy-voice, integration tests in main repo
  • Dashboard voice button works end-to-end through the service

1.5 Success Criteria

  • STT: < 500ms latency for 5-second utterance on desktop, < 2s on Pi 4
  • TTS: Naturalness score > 3.5/5 (subjective), real-time factor > 5x on Pi 4
  • Zero cloud dependencies for voice pipeline
  • make test passes in both repos

Phase 2: Nostr Identity & Reputation

Priority: HIGH Repo: timmy-nostr Depends on: Phase 0

2.1 Scope

Full Nostr citizen: agent identity, user auth, relay publishing, reputation.

2.2 Agent Identity

Each swarm agent gets a Nostr keypair (nsec/npub).

```python
# Agent identity lifecycle
agent = SwarmAgent(persona="forge")
agent.nostr_keys = generate_keypair()      # nsec stored encrypted
agent.nip05 = "forge@timmy.local"          # NIP-05 verification
agent.publish_profile()                    # kind:0 metadata event
```

Tasks:

  • Keypair generation and encrypted storage (use from config import settings for encryption key)
  • NIP-01: Basic event publishing (kind:0 metadata, kind:1 notes)
  • NIP-05: DNS-based identifier verification (for @timmy.local or custom domain)
  • NIP-39: External identity linking (link agent npub to Lightning node pubkey, GitHub, etc.)

2.3 User Authentication

Users authenticate via Nostr keys instead of traditional auth.

Tasks:

  • NIP-07: Browser extension signer integration (nos2x, Alby)
  • NIP-42: Client authentication to relay
  • NIP-44: Encrypted direct messages (XChaCha20-Poly1305 v2)
  • Session management: Nostr pubkey → session token

2.4 Reputation System

Agents build portable reputation through signed event history.

Tasks:

  • NIP-32: Labeling — agents rate each other's work quality
  • Reputation score calculation from label events
  • Cross-instance reputation portability (reputation follows the npub)
  • Dashboard: agent profile page showing Nostr identity + reputation
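
A first cut at "reputation score calculation from label events" could simply average ratings per target npub. The event shape (`target`, `rating`) and the function name are assumptions; real NIP-32 events carry labels in tags and would need parsing first:

```python
from collections import defaultdict

def reputation_scores(label_events: list[dict]) -> dict[str, float]:
    """Average quality rating per target npub from simplified label events."""
    ratings: dict[str, list[float]] = defaultdict(list)
    for ev in label_events:
        ratings[ev["target"]].append(float(ev["rating"]))
    return {npub: sum(r) / len(r) for npub, r in ratings.items()}
```

A production version would likely weight by the rater's own reputation and decay old labels, which this sketch omits.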

2.5 Relay Infrastructure

Tasks:

  • Embed or connect to a Nostr relay (evaluate: strfry, nostr-rs-relay)
  • Publish agent work events (task completed, bid won, etc.) to relay
  • Subscribe to events from other Timmy instances (federation via Nostr)
  • Data Vending Machine (DVM) pattern: advertise agent capabilities as Nostr events, receive job requests, deliver results, get paid in sats

2.6 Integration with Main Repo

  • Add src/infrastructure/clients/nostr_client.py
  • Modify the swarm coordinator to publish task/bid/completion events
  • Add Nostr auth option to dashboard login
  • Agent profile pages show npub, NIP-05, reputation score

2.7 Key References

  • Clawstr — Nostr-native AI agent social network (NIP-22, NIP-73)
  • ai.wot — Cross-platform trust attestations via NIP-32
  • NIP-101 — Decentralized Trust System
  • Nostr NIPs repo

2.8 Success Criteria

  • Every swarm agent has a Nostr identity (npub)
  • Users can log in via NIP-07 browser extension
  • Agent work history is published to a relay
  • Reputation scores are visible on agent profile pages
  • Two separate Timmy instances can discover each other via relay

Phase 3: Semantic Memory Evolution

Priority: HIGH Repo: Likely stays in main repo (lightweight) or timmy-memory (if heavy) Depends on: None (can start in parallel)

3.1 Research Vector DB Alternatives

The current implementation uses SQLite + in-Python cosine similarity with a hash-based embedding fallback. This needs to be evaluated against proper vector search solutions.

Candidates

| DB | Architecture | Index | Best Scale | Server? | License |
|---|---|---|---|---|---|
| sqlite-vec | SQLite extension | Brute-force KNN | Thousands-100K | No | MIT |
| LanceDB | Embedded, disk-based | IVF_PQ | Up to ~10M | No | Apache 2.0 |
| Chroma | Client-server or embedded | HNSW | Up to ~10M | Optional | Apache 2.0 |
| Qdrant | Client-server | HNSW | 100M+ | Yes | Apache 2.0 |

Research tasks:

  • Benchmark current SQLite implementation: query latency at 1K, 10K, 100K memories
  • Test sqlite-vec as drop-in upgrade (same SQLite, add extension)
  • Test LanceDB embedded mode (no server, disk-based, Arrow format)
  • Evaluate whether Chroma or Qdrant are needed at current scale
  • Document findings in docs/adr/024-vector-db-selection.md

Recommendation to validate: sqlite-vec is the most natural upgrade path (already using SQLite, zero new dependencies, MIT license). LanceDB if we outgrow brute-force KNN. Chroma/Qdrant only if we need client-server architecture.
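
As a reference point for the benchmarks above, the current SQLite + in-Python cosine approach has roughly this shape (the `memories` schema and JSON-encoded embeddings are illustrative, not the repo's actual layout). This is the brute-force scan that sqlite-vec or LanceDB would replace with an indexed query:

```python
import json
import math
import sqlite3

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn(db: sqlite3.Connection, query: list[float], k: int = 5) -> list[tuple[int, float]]:
    """Brute-force top-k: load every embedding, score in Python, sort."""
    rows = db.execute("SELECT id, embedding FROM memories").fetchall()
    scored = [(rid, cosine(query, json.loads(emb))) for rid, emb in rows]
    return sorted(scored, key=lambda t: -t[1])[:k]
```

Timing this loop at 1K/10K/100K rows gives the baseline numbers the research tasks call for.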

3.2 Embedding Model Upgrade

Current: all-MiniLM-L6-v2 (sentence-transformers) with hash fallback.

Research tasks:

  • Evaluate nomic-embed-text via Ollama (keeps everything local, no sentence-transformers dependency)
  • Evaluate all-MiniLM-L6-v2 vs bge-small-en-v1.5 vs nomic-embed-text on retrieval quality
  • Decide: keep sentence-transformers, or use Ollama embeddings for everything?

3.3 Memory Architecture Improvements

  • Episodic memory: condensed summaries of past conversations with entity and intent tags
  • Procedural memory: tool/skill embeddings for natural language invocation
  • Temporal constraints: time-weighted retrieval (recent memories scored higher)
  • Memory pruning: automatic compaction of old, low-relevance memories

3.4 CRDTs for Multi-Device Sync

Timeline: Later phase (after vector DB selection is settled)

  • Research CRDT libraries: yrs (Yjs Rust port), automerge
  • Design sync protocol for memory entries across devices
  • Evaluate: is CRDT sync needed, or can we use a simpler last-write-wins approach with conflict detection?
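
The "simpler last-write-wins with conflict detection" alternative is small enough to sketch outright (field names and the device-id tiebreak are illustrative choices):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    key: str
    value: str
    updated_at: float   # wall-clock or hybrid logical timestamp
    device: str         # stable device id, used as a deterministic tiebreak

def merge(local: Entry, remote: Entry) -> tuple[Entry, bool]:
    """Return (winner, conflict). Conflict = same timestamp, different values."""
    conflict = (local.updated_at == remote.updated_at
                and local.value != remote.value)
    local_wins = (local.updated_at, local.device) >= (remote.updated_at, remote.device)
    return (local if local_wins else remote), conflict
```

If conflicts turn out to be rare in practice, this may be enough and the CRDT dependency can wait.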

3.5 Success Criteria

  • Vector search latency < 50ms at 100K memories
  • Retrieval quality measurably improves over current hash fallback
  • No new server process required (embedded preferred)
  • Existing memories migrate without loss

Phase 4: Observability Stack

Priority: MEDIUM-HIGH Repo: timmy-observe (collector + dashboards) or integrated Depends on: None

4.1 Prometheus Metrics

Add a /metrics endpoint to the main dashboard (FastAPI).

Metrics to expose:

  • timmy_tasks_total{status,persona} — task counts by status and agent
  • timmy_auction_duration_seconds — auction completion time
  • timmy_llm_request_duration_seconds{provider,model} — LLM latency
  • timmy_llm_tokens_total{provider,direction} — token usage
  • timmy_lightning_balance_sats — treasury balance
  • timmy_memory_count — total memories stored
  • timmy_ws_connections — active WebSocket connections
  • timmy_agent_health{persona} — agent liveness

Tasks:

  • Add prometheus_client to dependencies
  • Instrument the swarm coordinator (task lifecycle metrics)
  • Instrument src/infrastructure/router/cascade.py (LLM metrics)
  • Instrument the Lightning ledger module (when implemented) (financial metrics)
  • Add /metrics route in src/dashboard/routes/
  • Grafana dashboard JSON in deploy/grafana/

4.2 Structured Logging with Loki

Replace ad-hoc logging with structured JSON logs that Loki can ingest.

Tasks:

  • Add python-json-logger or structlog
  • Standardize log format: {timestamp, level, module, event, context}
  • Add Loki + Promtail to docker-compose.services.yml
  • Grafana Loki datasource in dashboard config
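
The `{timestamp, level, module, event, context}` format can be hand-rolled with the stdlib while the python-json-logger vs structlog decision is pending; this formatter is a stand-in sketch, not the chosen library:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line in the agreed field layout."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "module": record.module,
            "event": record.getMessage(),
            "context": getattr(record, "context", {}),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("timmy")
log.addHandler(handler)

# Structured context rides on the standard `extra` mechanism:
log.warning("task_failed", extra={"context": {"task_id": 42}})
```

Promtail then ships these lines to Loki unchanged; no log parsing rules needed beyond JSON.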

4.3 OpenTelemetry Distributed Tracing

Trace requests across services (dashboard → voice → LLM → swarm).

Tasks:

  • Add opentelemetry-api, opentelemetry-sdk, opentelemetry-instrumentation-fastapi
  • Instrument FastAPI with auto-instrumentation
  • Propagate trace context to timmy-voice and other services
  • Add Jaeger or Tempo to docker-compose.services.yml
  • Grafana Tempo datasource

4.4 Swarm Visualization

Real-time force-directed graph of agent topology.

Tasks:

  • Evaluate: Three.js vs D3.js force layout vs Cytoscape.js
  • WebSocket feed of swarm topology events (already have /swarm/events)
  • Nodes: agents (sized by reputation/stake, colored by status)
  • Edges: task assignments, Lightning channels
  • Add as new dashboard page: /swarm/graph

4.5 Success Criteria

  • Prometheus scrapes metrics every 15s
  • Grafana dashboard shows task throughput, LLM latency, agent health
  • Log search across all services via Loki
  • Request traces span from HTTP request to LLM response

Phase 5: Lightning Maturation

Priority: MEDIUM — extends existing code Repo: Main repo + possibly timmy-lightning for LND Depends on: None (existing foundation is solid)

5.1 LND gRPC (already planned in REVELATION_PLAN)

  • Generate protobuf stubs from LND source
  • Implement LndBackend methods (currently NotImplementedError)
  • Connection pooling, macaroon encryption, TLS validation
  • Integration tests against regtest

5.2 BOLT12 Offers

Static, reusable payment requests with blinded paths for payer privacy.

  • Research BOLT12 support in LND vs CLN vs LDK
  • Implement offer creation and redemption
  • Agent-level offers: each agent has a persistent payment endpoint

5.3 HTLC/PTLC Extensions

  • HTLC: Hash Time-Locked Contracts for conditional payments
  • PTLC: Point Time-Locked Contracts (Taproot, privacy-preserving)
  • Use case: agent escrow — payment locked until task completion verified
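
The escrow use case rests on the hashlock primitive. Purely as a concept sketch (real HTLCs are enforced inside the Lightning protocol, not in application code, and these function names are hypothetical):

```python
import hashlib
import secrets

def new_escrow() -> tuple[bytes, bytes]:
    """Requester keeps the preimage secret; the hash locks the payment."""
    preimage = secrets.token_bytes(32)
    return preimage, hashlib.sha256(preimage).digest()

def release(payment_hash: bytes, revealed_preimage: bytes) -> bool:
    """Funds release only when the worker presents the matching preimage."""
    return hashlib.sha256(revealed_preimage).digest() == payment_hash
```

In the agent-escrow flow, the requester reveals the preimage only after verifying task completion, which is exactly the conditionality HTLCs provide on-chain.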

5.4 Autonomous Treasury (already planned in REVELATION_PLAN)

  • Per-agent balance tracking
  • Cold storage sweep threshold
  • Earnings dashboard
  • Withdrawal approval queue

5.5 Success Criteria

  • Create and settle real invoices on regtest
  • Agents have persistent BOLT12 offers
  • Treasury dashboard shows real balances
  • Graceful fallback to mock when LND unavailable

Phase 6: Vickrey Auctions & Agent Economics

Priority: MEDIUM Repo: Main repo Depends on: Phase 5 (Lightning, for real payments)

6.1 Upgrade to Vickrey (Second-Price) Auction

Current: first-price, lowest bid wins. The manifesto calls for Vickrey.

```python
# Current: winner pays their own bid
winner = min(bids, key=lambda b: b.bid_sats)
payment = winner.bid_sats

# Vickrey: winner pays second-lowest bid
sorted_bids = sorted(bids, key=lambda b: b.bid_sats)
winner = sorted_bids[0]
payment = sorted_bids[1].bid_sats if len(sorted_bids) > 1 else winner.bid_sats
```

Tasks:

  • Implement sealed-bid collection (encrypted commitment phase)
  • Simultaneous revelation phase
  • Second-price payment calculation
  • Update the swarm bidder and routing modules (when implemented)
  • ADR: docs/adr/025-vickrey-auctions.md

6.2 Incentive-Compatible Truthfulness

  • Prove (or document) that Vickrey mechanism is incentive-compatible for the swarm use case
  • Hash-chain bid commitment to prevent bid manipulation
  • Timestamp ordering for fairness
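
The sealed-bid commitment could be a plain SHA-256 commit/reveal, settled with the second-price rule from above. This is a sketch of the mechanism shape, not the final scheme (a real design also needs domain separation and the hash-chain ordering mentioned above):

```python
import hashlib
import secrets

def commit_bid(bid_sats: int) -> tuple[str, bytes]:
    """Commit phase: publish H(bid || nonce), keep bid and nonce private."""
    nonce = secrets.token_bytes(16)
    digest = hashlib.sha256(str(bid_sats).encode() + nonce).hexdigest()
    return digest, nonce

def verify_reveal(commitment: str, bid_sats: int, nonce: bytes) -> bool:
    """Reveal phase: anyone can check the opened bid against its commitment."""
    return hashlib.sha256(str(bid_sats).encode() + nonce).hexdigest() == commitment

def vickrey_settle(revealed_bids: list[int]) -> tuple[int, int]:
    """Return (winning_bid, payment): lowest bid wins, pays second-lowest."""
    s = sorted(revealed_bids)
    return s[0], (s[1] if len(s) > 1 else s[0])
```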

6.3 Success Criteria

  • Auction mechanism is provably incentive-compatible
  • Winner pays second-lowest price
  • Bids are sealed during collection phase
  • No regression in task assignment quality

Phase 7: State Machine Orchestration

Priority: MEDIUM Repo: Main repo Depends on: None

7.1 Evaluate LangGraph vs Custom

The current swarm coordinator is custom-built and working. LangGraph would add: deterministic replay, human-in-the-loop checkpoints, serializable state.

Research tasks:

  • Evaluate LangGraph overhead (dependency weight, complexity)
  • Can we get replay + checkpoints without LangGraph? (custom state serialization to SQLite)
  • Does LangGraph conflict with the no-cloud-dependencies rule? (it shouldn't — it's a local library)

7.2 Minimum Viable State Machine

Whether LangGraph or custom:

  • Task lifecycle as explicit state machine (posted → bidding → assigned → executing → completed/failed)
  • State serialization to SQLite (checkpoint/resume)
  • Deterministic replay for debugging failed tasks
  • Human-in-the-loop: pause at configurable checkpoints for approval
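
Whether LangGraph or custom, the lifecycle above reduces to a small transition table. A minimal sketch (names hypothetical; str-valued states keep checkpoints trivially serializable to SQLite):

```python
from enum import Enum

class TaskState(str, Enum):
    POSTED = "posted"
    BIDDING = "bidding"
    ASSIGNED = "assigned"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"

# Legal transitions for the lifecycle listed above.
TRANSITIONS: dict[TaskState, set[TaskState]] = {
    TaskState.POSTED: {TaskState.BIDDING},
    TaskState.BIDDING: {TaskState.ASSIGNED, TaskState.FAILED},
    TaskState.ASSIGNED: {TaskState.EXECUTING, TaskState.FAILED},
    TaskState.EXECUTING: {TaskState.COMPLETED, TaskState.FAILED},
}

def advance(current: TaskState, target: TaskState) -> TaskState:
    """Reject any transition the table does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Deterministic replay then becomes replaying the recorded sequence of `advance` calls from the last checkpoint.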

7.3 Agent Death Detection

  • Heartbeat-based liveness checking
  • Checkpointed state enables reassignment to new agent
  • Timeout-based automatic task reassignment

7.4 Success Criteria

  • Task state is fully serializable and recoverable
  • Failed tasks can be replayed for debugging
  • Human-in-the-loop checkpoints work for sensitive operations
  • Agent failure triggers automatic task reassignment

Phase 8: Frontend Evolution

Priority: MEDIUM-LOW Repo: Main repo (src/dashboard/) Depends on: Phase 4.4 (swarm visualization data)

8.1 Alpine.js for Reactive Components

HTMX handles server-driven updates well. Alpine.js would add client-side reactivity for interactive components without a build step.

Tasks:

  • Add Alpine.js CDN to base.html
  • Identify components that need client-side state (settings toggles, form wizards, real-time filters)
  • Migrate incrementally — HTMX for server state, Alpine for client state

8.2 Three.js Swarm Visualization

Real-time 3D force-directed graph (from Phase 4.4).

  • Three.js or WebGPU renderer for swarm topology
  • Force-directed layout: nodes = agents, edges = channels/assignments
  • Node size by reputation, color by status, edge weight by payment flow
  • Target: 100+ nodes at 60fps
  • New dashboard page: /swarm/3d

8.3 Success Criteria

  • Alpine.js coexists with HTMX without conflicts
  • Swarm graph renders at 60fps with current agent count
  • No build step required (CDN or vendored JS)

Phase 9: Sandboxing

Priority: LOW (aspirational, near-term for WASM) Repo: Main repo or timmy-sandbox Depends on: Phase 7 (state machine, for checkpoint/resume in sandbox)

9.1 WASM Runtime (Near-Term)

Lightweight sandboxing for untrusted agent code.

Tasks:

  • Evaluate: Wasmtime, Wasmer, or WasmEdge as Python-embeddable runtime
  • Define sandbox API: what syscalls/capabilities are allowed
  • Agent code compiled to WASM for execution in sandbox
  • Memory-safe execution guarantee

9.2 Firecracker MicroVMs (Medium-Term)

Full VM isolation for high-security workloads.

  • Firecracker integration for agent spawning (125ms cold start)
  • Replace Docker runner with Firecracker option
  • Network isolation per agent VM

9.3 gVisor User-Space Kernel (Medium-Term)

Syscall interception layer as alternative to full VMs.

  • gVisor as Docker runtime (runsc)
  • Syscall filtering policy per agent type
  • Performance benchmarking vs standard runc

9.4 Bubblewrap (Lightweight Alternative)

  • Bubblewrap for single-process sandboxing on Linux
  • Useful for self-coding module safety

9.5 Success Criteria

  • At least one sandbox option operational for agent code execution
  • Self-coding module runs in sandbox by default
  • No sandbox escape possible via known vectors

Phase 10: Desktop Packaging (Tauri)

Priority: LOW (aspirational) Repo: timmy-desktop Depends on: Phases 1, 5 (voice and Lightning should work first)

10.1 Tauri App Shell

Tauri (Rust + WebView) instead of Electron — smaller binary, lower RAM.

Tasks:

  • Tauri project scaffold wrapping the FastAPI dashboard
  • System tray icon (Start/Stop/Status)
  • Native menu bar
  • Auto-updater
  • Embed Ollama binary (download on first run)
  • Optional: embed LND binary

10.2 First-Run Experience

  • Launch → download Ollama → pull model → create mock wallet → ready
  • Optional: connect real LND node
  • Target: usable in < 2 minutes from first launch

10.3 Success Criteria

  • Single .app (macOS) / .AppImage (Linux) / .exe (Windows)
  • Binary size < 100MB (excluding models)
  • Works offline after first-run setup

Phase 11: MLX & Unified Inference

Priority: LOW Repo: Main repo or part of timmy-voice Depends on: Phase 1 (voice engines selected first)

11.1 Direct MLX Integration

Currently MLX is accessed through AirLLM. Evaluate direct MLX for:

  • LLM inference on Apple Silicon
  • STT/TTS model execution on Apple Silicon
  • Unified runtime for all model types

Tasks:

  • Benchmark direct MLX vs AirLLM wrapper overhead
  • Evaluate MLX for running Whisper/Piper models natively
  • If beneficial, add mlx as optional dependency alongside airllm

11.2 Success Criteria

  • Measurable speedup over AirLLM wrapper on Apple Silicon
  • Single runtime for LLM + voice models (if feasible)

Phase 12: ZK-ML Verification

Priority: ASPIRATIONAL (long-horizon, 12+ months) Repo: timmy-zkml (when ready) Depends on: Phases 5, 6 (Lightning payments + auctions)

12.1 Current Reality

ZK-ML is 10-100x slower than native inference today. This phase is about tracking the field and being ready to integrate when performance is viable.

12.2 Research & Track

  • Monitor: EZKL, Modulus Labs, Giza, ZKonduit
  • Identify first viable use case: auction winner verification or payment amount calculation (small computation, high trust requirement)
  • Prototype: ZK proof of correct inference for a single small model

12.3 Target Use Cases

  1. Auction verification: Prove winner was selected correctly without revealing all bids
  2. Payment calculation: Prove payment amount is correct without revealing pricing model
  3. Inference attestation: Prove a response came from a specific model without revealing weights

12.4 Success Criteria

  • At least one ZK proof running in < 10x native inference time
  • Verifiable on-chain or via Nostr event

Cross-Cutting Concerns

Security

  • All new services follow existing security patterns (see CLAUDE.md)
  • Nostr private keys (nsec) encrypted at rest via settings.secret_key
  • Lightning macaroons encrypted at rest
  • No secrets in environment variables without warning on startup
  • Sandbox all self-coding and untrusted agent execution

Testing

  • Each repo: make test must pass before merge
  • Main repo: integration tests for each service client
  • Coverage threshold: 60% per repo (matching main repo)
  • Stubs for optional services in conftest (same pattern as current)

Graceful Degradation

Every external service integration MUST degrade gracefully:

```python
# Pattern: try service, fallback, never crash
async def transcribe(audio: bytes) -> str:
    try:
        return await voice_client.transcribe(audio)
    except VoiceServiceUnavailable:
        logger.warning("Voice service unavailable, feature disabled")
        return ""
```

Configuration

All new config via pydantic-settings in each repo's config.py. Main repo config adds service URLs:

```python
# config.py additions
voice_service_url: str = "http://localhost:8410"
nostr_relay_url: str = "ws://localhost:7777"
memory_service_url: str = ""  # empty = use built-in SQLite
```

Phase Dependencies

```
Phase 0 (Repo Foundation)
    ├── Phase 1 (Voice) ──────────────────────┐
    ├── Phase 2 (Nostr) ──────────────────────┤
    │                                         ├── Phase 10 (Tauri)
    Phase 3 (Memory) ── standalone            │
    Phase 4 (Observability) ── standalone     │
    Phase 5 (Lightning) ─┬── Phase 6 (Vickrey)│
                         └── Phase 12 (ZK-ML) │
    Phase 7 (State Machine) ── Phase 9 (Sandbox)
    Phase 8 (Frontend) ── needs Phase 4.4 data
    Phase 11 (MLX) ── needs Phase 1 decisions
```

Phases 0-4 can largely run in parallel. Phase 0 should be first (even if minimal — just create the repos). Phases 1 and 2 are the highest priority new work. Phases 3 and 4 can proceed independently.


Version Mapping

| Version | Codename | Phases | Theme |
|---|---|---|---|
| v2.0 | Exodus | Current | Foundation — swarm, L402, dashboard |
| v2.5 | Ascension | 0, 1, 2, 3 | Voice + Identity + Memory |
| v3.0 | Revelation | 4, 5, 6, 7 | Observability + Economics + Orchestration |
| v3.5 | Embodiment | 8, 9, 10 | Frontend + Sandboxing + Desktop |
| v4.0 | Apotheosis | 11, 12 | Unified inference + ZK verification |

How to Use This Document

For AI agents: Read this file before starting work on any integration. Check which phase your task falls under. Follow the existing patterns in CLAUDE.md. Run make test before committing.

For human developers: Each phase has research tasks (marked [ ]) and implementation tasks. Start with research tasks to validate recommendations before writing code.

For the coordinator: Track phase completion here. Update checkboxes as work completes. This document is the single source of truth for integration priorities.


From the Exodus to the Ascension. The stack continues.