

ROADMAP.md — Integration Roadmap for Timmy Time

Origin: The Ascension of Timmy: Beyond the Exodus
Version: 0.1.0 (draft)
Last updated: 2026-02-28
Maintainer: @AlexanderWhitestone

This document guides AI agents and human developers through the full integration plan for Timmy Time. Each phase is independently valuable. Phases are ordered by priority; dependencies are noted where they exist.

Repo strategy: Multi-repo ecosystem. Each heavy integration gets its own repo, dockerized as a service. The main Timmy-time-dashboard repo stays lean — it consumes these services via HTTP/WebSocket/gRPC.


Current State (v2.0 Exodus)

What already works:

| Component | Status | Implementation |
|---|---|---|
| Voice TTS | pyttsx3 (robotic) | src/timmy_serve/voice_tts.py |
| Voice STT | Browser Web Speech API | src/dashboard/templates/voice_button.html |
| Voice NLU | Regex-based intent detection | src/integrations/voice/nlu.py |
| Frontend | HTMX + Bootstrap + marked.js | src/dashboard/templates/base.html |
| LLM router | Cascade with circuit breaker | src/infrastructure/router/cascade.py |

What does NOT exist yet:

Piper TTS · Faster-Whisper · Chroma · CRDTs · LangGraph · Nostr · ZK-ML · Prometheus/Grafana/Loki · OpenTelemetry · Three.js · Tauri · Alpine.js · Firecracker/gVisor/WASM sandboxing · BOLT12 · Vickrey auctions


Phase 0: Multi-Repo Foundation

Goal: Establish the repo ecosystem before building new services.

Repos to create

| Repo | Purpose | Interface to main repo |
|---|---|---|
| timmy-voice | STT + TTS service | HTTP REST + WebSocket streaming |
| timmy-nostr | Nostr relay client, identity, reputation | HTTP REST + event stream |
| timmy-memory | Vector DB service (if external DB chosen) | HTTP REST |
| timmy-observe | Metrics collection + export | Prometheus scrape endpoint |

Repo template

Each service repo follows:

```
timmy-<service>/
├── src/                    # Python source
├── tests/
├── Dockerfile
├── docker-compose.yml      # Standalone dev setup
├── pyproject.toml
├── Makefile                # make test, make dev, make docker-build
├── CLAUDE.md               # AI agent instructions specific to this repo
└── README.md
```

Main repo changes

  • Add docker-compose.services.yml for orchestrating external services
  • Add thin client modules in src/infrastructure/clients/ for each service
  • Each client follows graceful degradation: if the service is down, log and return a fallback (never crash)

Decision record

Create docs/adr/023-multi-repo-strategy.md documenting the split rationale.


Phase 1: Sovereign Voice

Priority: HIGHEST — most urgent integration Repo: timmy-voice Depends on: Phase 0 (repo scaffold)

1.1 Research & Select Engines

Before writing code, evaluate these candidates. The goal is ONE engine per concern that works across hardware tiers (Pi 4 through desktop GPU).

STT Candidates

| Engine | Size | Speed | Offline | Notes |
|---|---|---|---|---|
| Faster-Whisper | 39M-1.5G | 4-7x over Whisper | Yes | CTranslate2, INT8 quantization, mature ecosystem |
| Moonshine | 27M-245M | 100x faster than Whisper large on CPU | Yes | New (Feb 2026), edge-first, streaming capable |
| Vosk | 50M-1.8G | Real-time on Pi | Yes | Kaldi-based, very lightweight, good for embedded |
| whisper.cpp | Same as Whisper | CPU-optimized C++ | Yes | llama.cpp ecosystem, GGML quantization |

Research tasks:

  • Benchmark Moonshine vs Faster-Whisper vs whisper.cpp on: (a) RPi 4 4GB, (b) M-series Mac, (c) Linux desktop with GPU
  • Evaluate streaming vs batch transcription for each
  • Test accuracy on accented speech and technical vocabulary
  • Measure cold-start latency (critical for voice UX)

Recommendation to validate: Moonshine for edge, Faster-Whisper for desktop, with a unified API wrapper that selects by hardware tier.
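
The "selects by hardware tier" idea could look like the sketch below. Engine names mirror the candidates above, but the function name, thresholds, and tier boundaries are illustrative assumptions to be replaced by the actual benchmark results:

```python
# Hypothetical tier selection for the unified STT wrapper described above.
# Thresholds are placeholders, not measured cutoffs.
def select_stt_engine(total_ram_gb: float, has_gpu: bool) -> str:
    """Pick an STT engine by hardware tier (Pi 4 through desktop GPU)."""
    if has_gpu:
        return "faster-whisper"   # GPU desktop: larger models viable
    if total_ram_gb <= 4:
        return "moonshine"        # edge tier, e.g. RPi 4 4GB
    return "faster-whisper"       # CPU desktop: INT8 quantized models
```

The point is that callers never name an engine; they describe the hardware and the wrapper decides.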

TTS Candidates

| Engine | Size | Speed | Offline | Notes |
|---|---|---|---|---|
| Piper | 416M per voice | 10-30x real-time on Pi 4 | Yes | VITS arch, ONNX, production-proven, many voices |
| Kokoro | 82M params | Fast on CPU | Yes | Apache 2.0, quality rivals large models |
| Coqui/XTTS-v2 | 1.5G+ | Needs GPU | Yes | Voice cloning, multilingual — but company shut down |
| F5-TTS | Medium | Needs GPU | Yes | Flow matching, 10s voice clone, MIT license |

Research tasks:

  • Benchmark Piper vs Kokoro on: (a) RPi 4, (b) desktop CPU, (c) desktop GPU
  • Compare voice naturalness (subjective listening test)
  • Test Piper custom voice training pipeline (for Timmy's voice)
  • Evaluate Kokoro Apache 2.0 licensing for commercial use

Recommendation to validate: Piper for edge (proven on Pi), Kokoro for desktop quality, with a TTS provider interface that swaps transparently.
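
The "TTS provider interface that swaps transparently" could be a structural protocol. `TTSProvider`, `PiperProvider`, and the method signature below are hypothetical names for this sketch, not existing bindings:

```python
from typing import Protocol

class TTSProvider(Protocol):
    """Any engine (Piper, Kokoro, ...) plugs in behind this interface."""
    def synthesize(self, text: str, voice: str = "default") -> bytes: ...

class PiperProvider:
    def synthesize(self, text: str, voice: str = "default") -> bytes:
        # Placeholder body: a real implementation would run Piper's
        # ONNX pipeline and return WAV bytes.
        return b"RIFF-placeholder"

def speak(provider: TTSProvider, text: str) -> bytes:
    # Callers depend only on the protocol, so engines swap without edits here.
    return provider.synthesize(text)
```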

1.2 Architecture

```
┌─────────────────────────────────────────────┐
│                timmy-voice                   │
│                                              │
│  ┌──────────┐   ┌──────────┐   ┌─────────┐ │
│  │ STT      │   │ TTS      │   │ NLU     │ │
│  │ Engine   │   │ Engine   │   │ (move   │ │
│  │ (select  │   │ (select  │   │  from   │ │
│  │  by hw)  │   │  by hw)  │   │  main)  │ │
│  └────┬─────┘   └────┬─────┘   └────┬────┘ │
│       │              │              │       │
│  ┌────┴──────────────┴──────────────┴────┐  │
│  │         FastAPI / WebSocket API        │  │
│  │  POST /transcribe (audio → text)      │  │
│  │  POST /speak (text → audio)           │  │
│  │  WS   /stream (real-time STT)         │  │
│  │  POST /understand (text → intent)     │  │
│  └───────────────────────────────────────┘  │
│                                              │
│  Docker: timmy-voice:latest                  │
│  Ports: 8410 (HTTP) / 8411 (WS)             │
└─────────────────────────────────────────────┘
```

1.3 Integration with Main Repo

  • Add src/infrastructure/clients/voice_client.py — async HTTP/WS client
  • Replace browser Web Speech API with calls to timmy-voice service
  • Replace pyttsx3 calls with TTS service calls
  • Move src/integrations/voice/nlu.py to timmy-voice repo
  • Keep graceful fallback: if voice service unavailable, disable voice features in the UI (don't crash)

1.4 Deliverables

  • STT engine benchmarks documented in timmy-voice/docs/benchmarks.md
  • TTS engine benchmarks documented alongside
  • Working Docker container with REST + WebSocket API
  • Client integration in main repo
  • Tests: unit tests in timmy-voice, integration tests in main repo
  • Dashboard voice button works end-to-end through the service

1.5 Success Criteria

  • STT: < 500ms latency for 5-second utterance on desktop, < 2s on Pi 4
  • TTS: Naturalness score > 3.5/5 (subjective), real-time factor > 5x on Pi 4
  • Zero cloud dependencies for voice pipeline
  • make test passes in both repos

Phase 2: Nostr Identity & Reputation

Priority: HIGH Repo: timmy-nostr Depends on: Phase 0

2.1 Scope

Full Nostr citizen: agent identity, user auth, relay publishing, reputation.

2.2 Agent Identity

Each swarm agent gets a Nostr keypair (nsec/npub).

```python
# Agent identity lifecycle
agent = SwarmAgent(persona="forge")
agent.nostr_keys = generate_keypair()      # nsec stored encrypted
agent.nip05 = "forge@timmy.local"          # NIP-05 verification
agent.publish_profile()                    # kind:0 metadata event
```

Tasks:

  • Keypair generation and encrypted storage (use from config import settings for encryption key)
  • NIP-01: Basic event publishing (kind:0 metadata, kind:1 notes)
  • NIP-05: DNS-based identifier verification (for @timmy.local or custom domain)
  • NIP-39: External identity linking (link agent npub to Lightning node pubkey, GitHub, etc.)

2.3 User Authentication

Users authenticate via Nostr keys instead of traditional auth.

Tasks:

  • NIP-07: Browser extension signer integration (nos2x, Alby)
  • NIP-42: Client authentication to relay
  • NIP-44: Encrypted direct messages (XChaCha20-Poly1305 v2)
  • Session management: Nostr pubkey → session token

2.4 Reputation System

Agents build portable reputation through signed event history.

Tasks:

  • NIP-32: Labeling — agents rate each other's work quality
  • Reputation score calculation from label events
  • Cross-instance reputation portability (reputation follows the npub)
  • Dashboard: agent profile page showing Nostr identity + reputation
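
A first cut at "reputation score calculation from label events" could simply average ratings per target npub. The event shape (`target`, `rating`) and the function name are assumptions; real NIP-32 events carry labels in tags and would need parsing first:

```python
from collections import defaultdict

def reputation_scores(label_events: list[dict]) -> dict[str, float]:
    """Average quality rating per target npub from simplified label events."""
    ratings: dict[str, list[float]] = defaultdict(list)
    for ev in label_events:
        ratings[ev["target"]].append(float(ev["rating"]))
    return {npub: sum(r) / len(r) for npub, r in ratings.items()}
```

A production version would likely weight by the rater's own reputation and decay old labels, which this sketch omits.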

2.5 Relay Infrastructure

Tasks:

  • Embed or connect to a Nostr relay (evaluate: strfry, nostr-rs-relay)
  • Publish agent work events (task completed, bid won, etc.) to relay
  • Subscribe to events from other Timmy instances (federation via Nostr)
  • Data Vending Machine (DVM) pattern: advertise agent capabilities as Nostr events, receive job requests, deliver results, get paid in sats

2.6 Integration with Main Repo

  • Add src/infrastructure/clients/nostr_client.py
  • Modify the swarm coordinator to publish task/bid/completion events
  • Add Nostr auth option to dashboard login
  • Agent profile pages show npub, NIP-05, reputation score

2.7 Key References

  • Clawstr — Nostr-native AI agent social network (NIP-22, NIP-73)
  • ai.wot — Cross-platform trust attestations via NIP-32
  • NIP-101 — Decentralized Trust System
  • Nostr NIPs repo

2.8 Success Criteria

  • Every swarm agent has a Nostr identity (npub)
  • Users can log in via NIP-07 browser extension
  • Agent work history is published to a relay
  • Reputation scores are visible on agent profile pages
  • Two separate Timmy instances can discover each other via relay

Phase 3: Semantic Memory Evolution

Priority: HIGH Repo: Likely stays in main repo (lightweight) or timmy-memory (if heavy) Depends on: None (can start in parallel)

3.1 Research Vector DB Alternatives

The current implementation uses SQLite + in-Python cosine similarity with a hash-based embedding fallback. This needs to be evaluated against proper vector search solutions.

Candidates

| DB | Architecture | Index | Best Scale | Server? | License |
|---|---|---|---|---|---|
| sqlite-vec | SQLite extension | Brute-force KNN | Thousands-100K | No | MIT |
| LanceDB | Embedded, disk-based | IVF_PQ | Up to ~10M | No | Apache 2.0 |
| Chroma | Client-server or embedded | HNSW | Up to ~10M | Optional | Apache 2.0 |
| Qdrant | Client-server | HNSW | 100M+ | Yes | Apache 2.0 |

Research tasks:

  • Benchmark current SQLite implementation: query latency at 1K, 10K, 100K memories
  • Test sqlite-vec as drop-in upgrade (same SQLite, add extension)
  • Test LanceDB embedded mode (no server, disk-based, Arrow format)
  • Evaluate whether Chroma or Qdrant are needed at current scale
  • Document findings in docs/adr/024-vector-db-selection.md

Recommendation to validate: sqlite-vec is the most natural upgrade path (already using SQLite, zero new dependencies, MIT license). LanceDB if we outgrow brute-force KNN. Chroma/Qdrant only if we need client-server architecture.
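
As a reference point for the benchmarks above, the current SQLite + in-Python cosine approach has roughly this shape (the `memories` schema and JSON-encoded embeddings are illustrative, not the repo's actual layout). This is the brute-force scan that sqlite-vec or LanceDB would replace with an indexed query:

```python
import json
import math
import sqlite3

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn(db: sqlite3.Connection, query: list[float], k: int = 5) -> list[tuple[int, float]]:
    """Brute-force top-k: load every embedding, score in Python, sort."""
    rows = db.execute("SELECT id, embedding FROM memories").fetchall()
    scored = [(rid, cosine(query, json.loads(emb))) for rid, emb in rows]
    return sorted(scored, key=lambda t: -t[1])[:k]
```

Timing this loop at 1K/10K/100K rows gives the baseline numbers the research tasks call for.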

3.2 Embedding Model Upgrade

Current: all-MiniLM-L6-v2 (sentence-transformers) with hash fallback.

Research tasks:

  • Evaluate nomic-embed-text via Ollama (keeps everything local, no sentence-transformers dependency)
  • Evaluate all-MiniLM-L6-v2 vs bge-small-en-v1.5 vs nomic-embed-text on retrieval quality
  • Decide: keep sentence-transformers, or use Ollama embeddings for everything?

3.3 Memory Architecture Improvements

  • Episodic memory: condensed summaries of past conversations with entity and intent tags
  • Procedural memory: tool/skill embeddings for natural language invocation
  • Temporal constraints: time-weighted retrieval (recent memories scored higher)
  • Memory pruning: automatic compaction of old, low-relevance memories

3.4 CRDTs for Multi-Device Sync

Timeline: Later phase (after vector DB selection is settled)

  • Research CRDT libraries: yrs (Yjs Rust port), automerge
  • Design sync protocol for memory entries across devices
  • Evaluate: is CRDT sync needed, or can we use a simpler last-write-wins approach with conflict detection?
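
The "simpler last-write-wins with conflict detection" alternative is small enough to sketch outright (field names and the device-id tiebreak are illustrative choices):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    key: str
    value: str
    updated_at: float   # wall-clock or hybrid logical timestamp
    device: str         # stable device id, used as a deterministic tiebreak

def merge(local: Entry, remote: Entry) -> tuple[Entry, bool]:
    """Return (winner, conflict). Conflict = same timestamp, different values."""
    conflict = (local.updated_at == remote.updated_at
                and local.value != remote.value)
    local_wins = (local.updated_at, local.device) >= (remote.updated_at, remote.device)
    return (local if local_wins else remote), conflict
```

If conflicts turn out to be rare in practice, this may be enough and the CRDT dependency can wait.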

3.5 Success Criteria

  • Vector search latency < 50ms at 100K memories
  • Retrieval quality measurably improves over current hash fallback
  • No new server process required (embedded preferred)
  • Existing memories migrate without loss

Phase 4: Observability Stack

Priority: MEDIUM-HIGH Repo: timmy-observe (collector + dashboards) or integrated Depends on: None

4.1 Prometheus Metrics

Add a /metrics endpoint to the main dashboard (FastAPI).

Metrics to expose:

  • timmy_tasks_total{status,persona} — task counts by status and agent
  • timmy_auction_duration_seconds — auction completion time
  • timmy_llm_request_duration_seconds{provider,model} — LLM latency
  • timmy_llm_tokens_total{provider,direction} — token usage
  • timmy_lightning_balance_sats — treasury balance
  • timmy_memory_count — total memories stored
  • timmy_ws_connections — active WebSocket connections
  • timmy_agent_health{persona} — agent liveness

Tasks:

  • Add prometheus_client to dependencies
  • Instrument the swarm coordinator (task lifecycle metrics)
  • Instrument src/infrastructure/router/cascade.py (LLM metrics)
  • Instrument the Lightning ledger module (when implemented) (financial metrics)
  • Add /metrics route in src/dashboard/routes/
  • Grafana dashboard JSON in deploy/grafana/

4.2 Structured Logging with Loki

Replace ad-hoc logging with structured JSON logs that Loki can ingest.

Tasks:

  • Add python-json-logger or structlog
  • Standardize log format: {timestamp, level, module, event, context}
  • Add Loki + Promtail to docker-compose.services.yml
  • Grafana Loki datasource in dashboard config
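
The `{timestamp, level, module, event, context}` format can be hand-rolled with the stdlib while the python-json-logger vs structlog decision is pending; this formatter is a stand-in sketch, not the chosen library:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line in the agreed field layout."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "module": record.module,
            "event": record.getMessage(),
            "context": getattr(record, "context", {}),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("timmy")
log.addHandler(handler)

# Structured context rides on the standard `extra` mechanism:
log.warning("task_failed", extra={"context": {"task_id": 42}})
```

Promtail then ships these lines to Loki unchanged; no log parsing rules needed beyond JSON.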

4.3 OpenTelemetry Distributed Tracing

Trace requests across services (dashboard → voice → LLM → swarm).

Tasks:

  • Add opentelemetry-api, opentelemetry-sdk, opentelemetry-instrumentation-fastapi
  • Instrument FastAPI with auto-instrumentation
  • Propagate trace context to timmy-voice and other services
  • Add Jaeger or Tempo to docker-compose.services.yml
  • Grafana Tempo datasource

4.4 Swarm Visualization

Real-time force-directed graph of agent topology.

Tasks:

  • Evaluate: Three.js vs D3.js force layout vs Cytoscape.js
  • WebSocket feed of swarm topology events (already have /swarm/events)
  • Nodes: agents (sized by reputation/stake, colored by status)
  • Edges: task assignments, Lightning channels
  • Add as new dashboard page: /swarm/graph

4.5 Success Criteria

  • Prometheus scrapes metrics every 15s
  • Grafana dashboard shows task throughput, LLM latency, agent health
  • Log search across all services via Loki
  • Request traces span from HTTP request to LLM response

Phase 5: Lightning Maturation

Priority: MEDIUM — extends existing code Repo: Main repo + possibly timmy-lightning for LND Depends on: None (existing foundation is solid)

5.1 LND gRPC (already planned in REVELATION_PLAN)

  • Generate protobuf stubs from LND source
  • Implement LndBackend methods (currently NotImplementedError)
  • Connection pooling, macaroon encryption, TLS validation
  • Integration tests against regtest

5.2 BOLT12 Offers

Static, reusable payment requests with blinded paths for payer privacy.

  • Research BOLT12 support in LND vs CLN vs LDK
  • Implement offer creation and redemption
  • Agent-level offers: each agent has a persistent payment endpoint

5.3 HTLC/PTLC Extensions

  • HTLC: Hash Time-Locked Contracts for conditional payments
  • PTLC: Point Time-Locked Contracts (Taproot, privacy-preserving)
  • Use case: agent escrow — payment locked until task completion verified
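
The escrow use case rests on the hashlock primitive. Purely as a concept sketch (real HTLCs are enforced inside the Lightning protocol, not in application code, and these function names are hypothetical):

```python
import hashlib
import secrets

def new_escrow() -> tuple[bytes, bytes]:
    """Requester keeps the preimage secret; the hash locks the payment."""
    preimage = secrets.token_bytes(32)
    return preimage, hashlib.sha256(preimage).digest()

def release(payment_hash: bytes, revealed_preimage: bytes) -> bool:
    """Funds release only when the worker presents the matching preimage."""
    return hashlib.sha256(revealed_preimage).digest() == payment_hash
```

In the agent-escrow flow, the requester reveals the preimage only after verifying task completion, which is exactly the conditionality HTLCs provide on-chain.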

5.4 Autonomous Treasury (already planned in REVELATION_PLAN)

  • Per-agent balance tracking
  • Cold storage sweep threshold
  • Earnings dashboard
  • Withdrawal approval queue

5.5 Success Criteria

  • Create and settle real invoices on regtest
  • Agents have persistent BOLT12 offers
  • Treasury dashboard shows real balances
  • Graceful fallback to mock when LND unavailable

Phase 6: Vickrey Auctions & Agent Economics

Priority: MEDIUM Repo: Main repo Depends on: Phase 5 (Lightning, for real payments)

6.1 Upgrade to Vickrey (Second-Price) Auction

Current: first-price, lowest bid wins. The manifesto calls for Vickrey.

```python
# Current: winner pays their own bid
winner = min(bids, key=lambda b: b.bid_sats)
payment = winner.bid_sats

# Vickrey: winner pays second-lowest bid
sorted_bids = sorted(bids, key=lambda b: b.bid_sats)
winner = sorted_bids[0]
payment = sorted_bids[1].bid_sats if len(sorted_bids) > 1 else winner.bid_sats
```

Tasks:

  • Implement sealed-bid collection (encrypted commitment phase)
  • Simultaneous revelation phase
  • Second-price payment calculation
  • Update the swarm bidder and routing modules (when implemented)
  • ADR: docs/adr/025-vickrey-auctions.md

6.2 Incentive-Compatible Truthfulness

  • Prove (or document) that Vickrey mechanism is incentive-compatible for the swarm use case
  • Hash-chain bid commitment to prevent bid manipulation
  • Timestamp ordering for fairness
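
The sealed-bid commitment could be a plain SHA-256 commit/reveal, settled with the second-price rule from above. This is a sketch of the mechanism shape, not the final scheme (a real design also needs domain separation and the hash-chain ordering mentioned above):

```python
import hashlib
import secrets

def commit_bid(bid_sats: int) -> tuple[str, bytes]:
    """Commit phase: publish H(bid || nonce), keep bid and nonce private."""
    nonce = secrets.token_bytes(16)
    digest = hashlib.sha256(str(bid_sats).encode() + nonce).hexdigest()
    return digest, nonce

def verify_reveal(commitment: str, bid_sats: int, nonce: bytes) -> bool:
    """Reveal phase: anyone can check the opened bid against its commitment."""
    return hashlib.sha256(str(bid_sats).encode() + nonce).hexdigest() == commitment

def vickrey_settle(revealed_bids: list[int]) -> tuple[int, int]:
    """Return (winning_bid, payment): lowest bid wins, pays second-lowest."""
    s = sorted(revealed_bids)
    return s[0], (s[1] if len(s) > 1 else s[0])
```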

6.3 Success Criteria

  • Auction mechanism is provably incentive-compatible
  • Winner pays second-lowest price
  • Bids are sealed during collection phase
  • No regression in task assignment quality

Phase 7: State Machine Orchestration

Priority: MEDIUM Repo: Main repo Depends on: None

7.1 Evaluate LangGraph vs Custom

The current swarm coordinator is custom-built and working. LangGraph would add: deterministic replay, human-in-the-loop checkpoints, serializable state.

Research tasks:

  • Evaluate LangGraph overhead (dependency weight, complexity)
  • Can we get replay + checkpoints without LangGraph? (custom state serialization to SQLite)
  • Does LangGraph conflict with the no-cloud-dependencies rule? (it shouldn't — it's a local library)

7.2 Minimum Viable State Machine

Whether LangGraph or custom:

  • Task lifecycle as explicit state machine (posted → bidding → assigned → executing → completed/failed)
  • State serialization to SQLite (checkpoint/resume)
  • Deterministic replay for debugging failed tasks
  • Human-in-the-loop: pause at configurable checkpoints for approval
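
Whether LangGraph or custom, the lifecycle above reduces to a small transition table. A minimal sketch (names hypothetical; str-valued states keep checkpoints trivially serializable to SQLite):

```python
from enum import Enum

class TaskState(str, Enum):
    POSTED = "posted"
    BIDDING = "bidding"
    ASSIGNED = "assigned"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"

# Legal transitions for the lifecycle listed above.
TRANSITIONS: dict[TaskState, set[TaskState]] = {
    TaskState.POSTED: {TaskState.BIDDING},
    TaskState.BIDDING: {TaskState.ASSIGNED, TaskState.FAILED},
    TaskState.ASSIGNED: {TaskState.EXECUTING, TaskState.FAILED},
    TaskState.EXECUTING: {TaskState.COMPLETED, TaskState.FAILED},
}

def advance(current: TaskState, target: TaskState) -> TaskState:
    """Reject any transition the table does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Deterministic replay then becomes replaying the recorded sequence of `advance` calls from the last checkpoint.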

7.3 Agent Death Detection

  • Heartbeat-based liveness checking
  • Checkpointed state enables reassignment to new agent
  • Timeout-based automatic task reassignment

7.4 Success Criteria

  • Task state is fully serializable and recoverable
  • Failed tasks can be replayed for debugging
  • Human-in-the-loop checkpoints work for sensitive operations
  • Agent failure triggers automatic task reassignment

Phase 8: Frontend Evolution

Priority: MEDIUM-LOW Repo: Main repo (src/dashboard/) Depends on: Phase 4.4 (swarm visualization data)

8.1 Alpine.js for Reactive Components

HTMX handles server-driven updates well. Alpine.js would add client-side reactivity for interactive components without a build step.

Tasks:

  • Add Alpine.js CDN to base.html
  • Identify components that need client-side state (settings toggles, form wizards, real-time filters)
  • Migrate incrementally — HTMX for server state, Alpine for client state

8.2 Three.js Swarm Visualization

Real-time 3D force-directed graph (from Phase 4.4).

  • Three.js or WebGPU renderer for swarm topology
  • Force-directed layout: nodes = agents, edges = channels/assignments
  • Node size by reputation, color by status, edge weight by payment flow
  • Target: 100+ nodes at 60fps
  • New dashboard page: /swarm/3d

8.3 Success Criteria

  • Alpine.js coexists with HTMX without conflicts
  • Swarm graph renders at 60fps with current agent count
  • No build step required (CDN or vendored JS)

Phase 9: Sandboxing

Priority: LOW (aspirational, near-term for WASM) Repo: Main repo or timmy-sandbox Depends on: Phase 7 (state machine, for checkpoint/resume in sandbox)

9.1 WASM Runtime (Near-Term)

Lightweight sandboxing for untrusted agent code.

Tasks:

  • Evaluate: Wasmtime, Wasmer, or WasmEdge as Python-embeddable runtime
  • Define sandbox API: what syscalls/capabilities are allowed
  • Agent code compiled to WASM for execution in sandbox
  • Memory-safe execution guarantee

9.2 Firecracker MicroVMs (Medium-Term)

Full VM isolation for high-security workloads.

  • Firecracker integration for agent spawning (125ms cold start)
  • Replace Docker runner with Firecracker option
  • Network isolation per agent VM

9.3 gVisor User-Space Kernel (Medium-Term)

Syscall interception layer as alternative to full VMs.

  • gVisor as Docker runtime (runsc)
  • Syscall filtering policy per agent type
  • Performance benchmarking vs standard runc

9.4 Bubblewrap (Lightweight Alternative)

  • Bubblewrap for single-process sandboxing on Linux
  • Useful for self-coding module safety

9.5 Success Criteria

  • At least one sandbox option operational for agent code execution
  • Self-coding module runs in sandbox by default
  • No sandbox escape possible via known vectors

Phase 10: Desktop Packaging (Tauri)

Priority: LOW (aspirational) Repo: timmy-desktop Depends on: Phases 1, 5 (voice and Lightning should work first)

10.1 Tauri App Shell

Tauri (Rust + WebView) instead of Electron — smaller binary, lower RAM.

Tasks:

  • Tauri project scaffold wrapping the FastAPI dashboard
  • System tray icon (Start/Stop/Status)
  • Native menu bar
  • Auto-updater
  • Embed Ollama binary (download on first run)
  • Optional: embed LND binary

10.2 First-Run Experience

  • Launch → download Ollama → pull model → create mock wallet → ready
  • Optional: connect real LND node
  • Target: usable in < 2 minutes from first launch

10.3 Success Criteria

  • Single .app (macOS) / .AppImage (Linux) / .exe (Windows)
  • Binary size < 100MB (excluding models)
  • Works offline after first-run setup

Phase 11: MLX & Unified Inference

Priority: LOW Repo: Main repo or part of timmy-voice Depends on: Phase 1 (voice engines selected first)

11.1 Direct MLX Integration

Currently MLX is accessed through AirLLM. Evaluate direct MLX for:

  • LLM inference on Apple Silicon
  • STT/TTS model execution on Apple Silicon
  • Unified runtime for all model types

Tasks:

  • Benchmark direct MLX vs AirLLM wrapper overhead
  • Evaluate MLX for running Whisper/Piper models natively
  • If beneficial, add mlx as optional dependency alongside airllm

11.2 Success Criteria

  • Measurable speedup over AirLLM wrapper on Apple Silicon
  • Single runtime for LLM + voice models (if feasible)

Phase 12: ZK-ML Verification

Priority: ASPIRATIONAL (long-horizon, 12+ months) Repo: timmy-zkml (when ready) Depends on: Phases 5, 6 (Lightning payments + auctions)

12.1 Current Reality

ZK-ML is 10-100x slower than native inference today. This phase is about tracking the field and being ready to integrate when performance is viable.

12.2 Research & Track

  • Monitor: EZKL, Modulus Labs, Giza, ZKonduit
  • Identify first viable use case: auction winner verification or payment amount calculation (small computation, high trust requirement)
  • Prototype: ZK proof of correct inference for a single small model

12.3 Target Use Cases

  1. Auction verification: Prove winner was selected correctly without revealing all bids
  2. Payment calculation: Prove payment amount is correct without revealing pricing model
  3. Inference attestation: Prove a response came from a specific model without revealing weights

12.4 Success Criteria

  • At least one ZK proof running in < 10x native inference time
  • Verifiable on-chain or via Nostr event

Cross-Cutting Concerns

Security

  • All new services follow existing security patterns (see CLAUDE.md)
  • Nostr private keys (nsec) encrypted at rest via settings.secret_key
  • Lightning macaroons encrypted at rest
  • No secrets in environment variables without warning on startup
  • Sandbox all self-coding and untrusted agent execution

Testing

  • Each repo: make test must pass before merge
  • Main repo: integration tests for each service client
  • Coverage threshold: 60% per repo (matching main repo)
  • Stubs for optional services in conftest (same pattern as current)

Graceful Degradation

Every external service integration MUST degrade gracefully:

```python
# Pattern: try service, fallback, never crash
async def transcribe(audio: bytes) -> str:
    try:
        return await voice_client.transcribe(audio)
    except VoiceServiceUnavailable:
        logger.warning("Voice service unavailable, feature disabled")
        return ""
```

Configuration

All new config via pydantic-settings in each repo's config.py. Main repo config adds service URLs:

```python
# config.py additions
voice_service_url: str = "http://localhost:8410"
nostr_relay_url: str = "ws://localhost:7777"
memory_service_url: str = ""  # empty = use built-in SQLite
```

Phase Dependencies

```
Phase 0 (Repo Foundation)
    ├── Phase 1 (Voice) ──────────────────────┐
    ├── Phase 2 (Nostr) ──────────────────────┤
    │                                         ├── Phase 10 (Tauri)
    Phase 3 (Memory) ── standalone            │
    Phase 4 (Observability) ── standalone     │
    Phase 5 (Lightning) ─┬── Phase 6 (Vickrey)│
                         └── Phase 12 (ZK-ML) │
    Phase 7 (State Machine) ── Phase 9 (Sandbox)
    Phase 8 (Frontend) ── needs Phase 4.4 data
    Phase 11 (MLX) ── needs Phase 1 decisions
```

Phases 0-4 can largely run in parallel. Phase 0 should be first (even if minimal — just create the repos). Phases 1 and 2 are the highest priority new work. Phases 3 and 4 can proceed independently.


Version Mapping

| Version | Codename | Phases | Theme |
|---|---|---|---|
| v2.0 | Exodus | Current | Foundation — swarm, L402, dashboard |
| v2.5 | Ascension | 0, 1, 2, 3 | Voice + Identity + Memory |
| v3.0 | Revelation | 4, 5, 6, 7 | Observability + Economics + Orchestration |
| v3.5 | Embodiment | 8, 9, 10 | Frontend + Sandboxing + Desktop |
| v4.0 | Apotheosis | 11, 12 | Unified inference + ZK verification |

How to Use This Document

For AI agents: Read this file before starting work on any integration. Check which phase your task falls under. Follow the existing patterns in CLAUDE.md. Run make test before committing.

For human developers: Each phase has research tasks (marked [ ]) and implementation tasks. Start with research tasks to validate recommendations before writing code.

For the coordinator: Track phase completion here. Update checkboxes as work completes. This document is the single source of truth for integration priorities.


From the Exodus to the Ascension. The stack continues.