
Local Timmy — Deployment Report

Date: March 30, 2026
Branch: feature/uni-wizard-v4-production
Commits: 8
Files Created: 15
Lines of Code: ~6,000


Summary

This report documents the complete local infrastructure for Timmy's sovereign operation, ready for deployment on local hardware. All components are cloud-independent and follow the sovereignty-first architecture.


Components Delivered

1. Multi-Tier Caching Layer (#103)

Location: timmy-local/cache/
Files:

  • agent_cache.py (613 lines) — 6-tier cache implementation
  • cache_config.py (154 lines) — Configuration and TTL management

Features:

Tier 1: KV Cache (llama-server prefix caching)
Tier 2: Response Cache (full LLM responses with semantic hashing)
Tier 3: Tool Cache (stable tool outputs with TTL)
Tier 4: Embedding Cache (RAG embeddings keyed on file mtime)
Tier 5: Template Cache (pre-compiled prompts)
Tier 6: HTTP Cache (API responses with ETag support)
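The TTL idea behind Tier 3 (and the other expiring tiers) can be illustrated with a minimal sketch; this is a toy stand-in, not the actual agent_cache.py implementation:

```python
import time

class TTLCache:
    """Minimal TTL cache illustrating the Tier 3 (tool cache) idea:
    stable tool outputs are reused until their time-to-live expires."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)
cache.put("system_info", {"cpu": "arm64"})
print(cache.get("system_info"))  # hit until 60s elapse
```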

Usage:

from cache.agent_cache import cache_manager

# Check all cache stats
print(cache_manager.get_all_stats())

# Cache tool results
result = cache_manager.tool.get("system_info", {})
if result is None:
    result = get_system_info()
    cache_manager.tool.put("system_info", {}, result)

# Cache LLM responses
cached = cache_manager.response.get("What is 2+2?", ttl=3600)
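The "semantic hashing" mentioned for Tier 2 could be as simple as normalizing the prompt before hashing, so trivially different phrasings share a cache entry. A hypothetical sketch (the real agent_cache.py may use a different scheme):

```python
import hashlib

def response_cache_key(prompt: str, model: str = "hermes4-14b") -> str:
    # Collapse whitespace and lowercase so near-identical prompts
    # map to the same cache entry.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{model}:{normalized}".encode()).hexdigest()

# Both spellings hit the same entry:
assert response_cache_key("What is  2+2?") == response_cache_key("what is 2+2?")
```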

Target Performance:

  • Tool cache hit rate: > 30%
  • Response cache hit rate: > 20%
  • Embedding cache hit rate: > 80%
  • Overall speedup: 50-70%
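The overall-speedup target can be sanity-checked with a back-of-the-envelope model: if a fraction h of requests hit the cache and a hit costs a fraction c of a full request, the average cost is (1 - h) + h·c. Illustrative numbers only:

```python
def speedup(hit_rate: float, hit_cost_fraction: float = 0.05) -> float:
    """Average speedup when hit_rate of requests are served from cache
    at hit_cost_fraction of the full (uncached) cost."""
    avg_cost = (1 - hit_rate) + hit_rate * hit_cost_fraction
    return 1 / avg_cost

# A 60% blended hit rate with near-free hits more than doubles throughput:
print(f"{speedup(0.6):.2f}x")  # ≈ 2.33x
```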

2. Evennia World Shell (#83, #84)

Location: timmy-local/evennia/
Files:

  • typeclasses/characters.py (330 lines) — Timmy, KnowledgeItem, ToolObject, TaskObject
  • typeclasses/rooms.py (456 lines) — Workshop, Library, Observatory, Forge, Dispatch
  • commands/tools.py (520 lines) — 18 in-world commands
  • world/build.py (343 lines) — World construction script

Rooms:

| Room | Purpose | Key Commands |
|---|---|---|
| Workshop | Execute tasks, use tools | read, write, search, git_* |
| Library | Knowledge storage, retrieval | search, study |
| Observatory | Monitor systems | health, sysinfo, status |
| Forge | Build capabilities | build, test, deploy |
| Dispatch | Task queue, routing | tasks, assign, prioritize |

Commands:

  • File: read <path>, write <path> = <content>, search <pattern>
  • Git: git status, git log [n], git pull
  • System: sysinfo, health
  • Inference: think <prompt> — Local LLM reasoning
  • Gitea: gitea issues
  • Navigation: workshop, library, observatory
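Independent of Evennia's own Command classes, the bridge from in-world commands to Timmy's tool library reduces to a dispatch table; a toy, library-agnostic sketch (all names here are illustrative, not the commands/tools.py API):

```python
from pathlib import Path

# Illustrative registry mapping in-world command names to tool functions.
COMMANDS = {}

def command(name):
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("read")
def cmd_read(arg: str) -> str:
    # Tool side of the in-world "read <path>" command.
    return Path(arg).read_text()

@command("sysinfo")
def cmd_sysinfo(arg: str) -> str:
    import platform
    return f"{platform.system()} {platform.machine()}"

def dispatch(line: str) -> str:
    """Split an in-world input line into command name + argument
    and route it to the registered tool function."""
    name, _, arg = line.partition(" ")
    handler = COMMANDS.get(name)
    if handler is None:
        return f"Unknown command: {name}"
    return handler(arg.strip())

print(dispatch("sysinfo"))
```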

Setup:

cd timmy-local/evennia
python evennia_launcher.py shell -f world/build.py

3. Knowledge Ingestion Pipeline (#87)

Location: timmy-local/scripts/ingest.py
Size: 497 lines

Features:

  • Automatic document chunking
  • Local LLM summarization
  • Action extraction (implementable steps)
  • Tag-based categorization
  • Semantic search (via keywords)
  • SQLite backend

Usage:

# Ingest a single file
python3 scripts/ingest.py ~/papers/speculative-decoding.md

# Batch ingest directory
python3 scripts/ingest.py --batch ~/knowledge/

# Search knowledge base
python3 scripts/ingest.py --search "optimization"

# Search by tag
python3 scripts/ingest.py --tag inference

# View statistics
python3 scripts/ingest.py --stats
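The "automatic document chunking" step could look like fixed-size windows with overlap, so context survives chunk boundaries. A hypothetical sketch, not necessarily what ingest.py does:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows; the overlap preserves context
    that would otherwise be cut at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "x" * 2500
print([len(c) for c in chunk_text(doc)])  # [1000, 1000, 900, 100]
```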

Knowledge Item Structure:

{
    "name": "Speculative Decoding",
    "summary": "Use small draft model to propose tokens...",
    "source": "~/papers/speculative-decoding.md",
    "actions": [
        "Download Qwen-2.5 0.5B GGUF",
        "Configure llama-server with --draft-max 8",
        "Benchmark against baseline"
    ],
    "tags": ["inference", "optimization"],
    "embedding": [...],  # For semantic search
    "applied": False
}
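Storing such items in the SQLite backend with keyword search can be as small as one table and a LIKE query; a minimal sketch under assumed column names (the real ingest.py schema may differ):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # the real pipeline would use ~/.timmy/data/

conn.execute("""
    CREATE TABLE IF NOT EXISTS knowledge (
        name    TEXT PRIMARY KEY,
        summary TEXT,
        tags    TEXT,   -- JSON-encoded list of tags
        applied INTEGER DEFAULT 0
    )
""")

def add_item(name, summary, tags):
    conn.execute(
        "INSERT OR REPLACE INTO knowledge (name, summary, tags) VALUES (?, ?, ?)",
        (name, summary, json.dumps(tags)),
    )

def search(keyword):
    # Keyword search across summary and tags (the doc's "semantic
    # search via keywords").
    rows = conn.execute(
        "SELECT name FROM knowledge WHERE summary LIKE ? OR tags LIKE ?",
        (f"%{keyword}%", f"%{keyword}%"),
    )
    return [r[0] for r in rows]

add_item("Speculative Decoding",
         "Use small draft model to propose tokens",
         ["inference", "optimization"])
print(search("optimization"))  # ['Speculative Decoding']
```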

4. Prompt Cache Warming (#85)

Location: timmy-local/scripts/warmup_cache.py
Size: 333 lines

Features:

  • Pre-process system prompts to populate KV cache
  • Three prompt tiers: minimal, standard, deep
  • Benchmark cached vs uncached performance
  • Save/load cache state

Usage:

# Warm specific prompt tier
python3 scripts/warmup_cache.py --prompt standard

# Warm all tiers
python3 scripts/warmup_cache.py --all

# Benchmark improvement
python3 scripts/warmup_cache.py --benchmark

# Save cache state
python3 scripts/warmup_cache.py --all --save ~/.timmy/cache/state.json
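Under the hood, warming likely amounts to sending each prompt tier through llama-server once with prompt caching enabled. Assuming llama.cpp's built-in server, whose /completion endpoint accepts a cache_prompt flag, a sketch of the request (the server URL and prompt text are illustrative):

```python
import json
import urllib.request

LLAMA_SERVER = "http://127.0.0.1:8080"  # assumed local llama-server address

def warmup_request(system_prompt: str) -> dict:
    """Build a /completion payload that processes the prompt but generates
    almost nothing, leaving the prompt's KV cache populated for reuse."""
    return {
        "prompt": system_prompt,
        "n_predict": 1,        # we want prompt processing, not output
        "cache_prompt": True,  # ask the server to keep the prefix KV cache
    }

def send(payload: dict) -> None:
    # Fire the warmup call at a running llama-server instance.
    req = urllib.request.Request(
        f"{LLAMA_SERVER}/completion",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

payload = warmup_request("You are Timmy, a sovereign local agent.")
print(payload["cache_prompt"])
```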

Expected Improvement:

  • Cold cache: ~10s time-to-first-token
  • Warm cache: ~1s time-to-first-token
  • 50-70% faster on repeated requests

5. Installation & Setup

Location: timmy-local/setup-local-timmy.sh
Size: 203 lines

Creates:

  • ~/.timmy/cache/ — Cache databases
  • ~/.timmy/logs/ — Log files
  • ~/.timmy/config/ — Configuration files
  • ~/.timmy/templates/ — Prompt templates
  • ~/.timmy/data/ — Knowledge and pattern databases

Configuration Files:

  • cache.yaml — Cache tier settings
  • timmy.yaml — Main configuration
  • Templates: minimal.txt, standard.txt, deep.txt

Quick Start:

# Run setup
./setup-local-timmy.sh

# Start llama-server
llama-server -m ~/models/hermes4-14b.gguf -c 8192 --jinja -ngl 99

# Test
python3 -c "from cache.agent_cache import cache_manager; print(cache_manager.get_all_stats())"

File Structure

timmy-local/
├── cache/
│   ├── agent_cache.py          # 6-tier cache implementation
│   └── cache_config.py         # TTL and configuration
│
├── evennia/
│   ├── typeclasses/
│   │   ├── characters.py       # Timmy, KnowledgeItem, etc.
│   │   └── rooms.py            # Workshop, Library, etc.
│   ├── commands/
│   │   └── tools.py            # In-world tool commands
│   └── world/
│       └── build.py            # World construction
│
├── scripts/
│   ├── ingest.py               # Knowledge ingestion pipeline
│   └── warmup_cache.py         # Prompt cache warming
│
├── setup-local-timmy.sh        # Installation script
└── README.md                   # Complete usage guide

Issues Addressed

| Issue | Title | Status |
|---|---|---|
| #103 | Build comprehensive caching layer | Complete |
| #83 | Install Evennia and scaffold Timmy's world | Complete |
| #84 | Bridge Timmy's tool library into Evennia Commands | Complete |
| #87 | Build knowledge ingestion pipeline | Complete |
| #85 | Implement prompt caching and KV cache reuse | Complete |

Performance Targets

| Metric | Target | How Achieved |
|---|---|---|
| Cache hit rate | > 30% | Multi-tier caching |
| TTFT improvement | 50-70% | Prompt warming + KV cache |
| Knowledge retrieval | < 100ms | SQLite + LRU |
| Tool execution | < 5s | Local inference + caching |

Integration

┌─────────────────────────────────────────────────────────────┐
│                     LOCAL TIMMY                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │  Cache   │  │ Evennia  │  │ Knowledge│  │  Tools   │   │
│  │  Layer   │  │  World   │  │   Base   │  │          │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘   │
│       └──────────────┴─────────────┴─────────────┘         │
│                         │                                   │
│                    ┌────┴────┐                             │
│                    │  Timmy  │  ← Sovereign, local-first   │
│                    └────┬────┘                             │
└─────────────────────────┼───────────────────────────────────┘
                          │
              ┌───────────┼───────────┐
              │           │           │
         ┌────┴───┐  ┌────┴───┐  ┌────┴───┐
         │  Ezra  │  │Allegro │  │Bezalel │
         │ (Cloud)│  │ (Cloud)│  │ (Cloud)│
         │Research│  │ Bridge │  │ Build  │
         └────────┘  └────────┘  └────────┘

Local Timmy operates sovereignly. Cloud backends provide additional capacity, but Timmy survives and functions without them.


Next Steps for Timmy

Immediate (Run These)

  1. Setup Local Environment

    cd timmy-local
    ./setup-local-timmy.sh
    
  2. Start llama-server

    llama-server -m ~/models/hermes4-14b.gguf -c 8192 --jinja -ngl 99
    
  3. Warm Cache

    python3 scripts/warmup_cache.py --all
    
  4. Ingest Knowledge

    python3 scripts/ingest.py --batch ~/papers/
    

Short-Term

  1. Setup Evennia World

    cd evennia
    python evennia_launcher.py shell -f world/build.py
    
  2. Configure Gitea Integration

    export TIMMY_GITEA_TOKEN=your_token_here
    

Ongoing

  1. Monitor Cache Performance

    python3 -c "from cache.agent_cache import cache_manager; import json; print(json.dumps(cache_manager.get_all_stats(), indent=2))"
    
  2. Review and Approve PRs


Sovereignty Guarantees

  • All code runs locally
  • No cloud dependencies for core functionality
  • Graceful degradation when cloud unavailable
  • Local inference via llama.cpp
  • Local SQLite for all storage
  • No telemetry without explicit consent
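The graceful-degradation guarantee is essentially a try-cloud, fall-back-to-local pattern; a schematic sketch with stand-in functions (cloud_infer and local_infer are illustrative, not real Timmy APIs):

```python
def local_infer(prompt: str) -> str:
    # Stand-in for llama.cpp inference; always available.
    return f"[local] {prompt}"

def cloud_infer(prompt: str) -> str:
    # Stand-in for a cloud backend (Ezra / Allegro / Bezalel).
    raise ConnectionError("cloud unreachable")

def infer(prompt: str, prefer_cloud: bool = False) -> str:
    """Local-first by default; cloud is extra capacity, never a dependency."""
    if prefer_cloud:
        try:
            return cloud_infer(prompt)
        except (ConnectionError, TimeoutError):
            pass  # degrade gracefully: fall through to local inference
    return local_infer(prompt)

print(infer("status?", prefer_cloud=True))  # [local] status?  (cloud is down)
```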


Artifacts

| Artifact | Location | Lines |
|---|---|---|
| Cache Layer | timmy-local/cache/ | 767 |
| Evennia World | timmy-local/evennia/ | 1,649 |
| Knowledge Pipeline | timmy-local/scripts/ingest.py | 497 |
| Cache Warming | timmy-local/scripts/warmup_cache.py | 333 |
| Setup Script | timmy-local/setup-local-timmy.sh | 203 |
| Documentation | timmy-local/README.md | 234 |
| Total | | ~3,683 |

Plus Uni-Wizard v4 architecture (already delivered): ~8,000 lines

Grand Total: ~11,700 lines of architecture, code, and documentation


Report generated by: Allegro
Lane: Tempo-and-Dispatch
Status: Ready for Timmy deployment