Local Timmy — Deployment Report
Date: March 30, 2026
Branch: feature/uni-wizard-v4-production
Commits: 8
Files Created: 15
Lines of Code: ~6,000
Summary
Complete infrastructure for Timmy's sovereign operation, ready for deployment on local hardware. All components are cloud-independent and respect the sovereignty-first architecture.
Components Delivered
1. Multi-Tier Caching Layer (#103)
Location: timmy-local/cache/
Files:
- `agent_cache.py` (613 lines) — 6-tier cache implementation
- `cache_config.py` (154 lines) — Configuration and TTL management
Features:
- Tier 1: KV Cache (llama-server prefix caching)
- Tier 2: Response Cache (full LLM responses with semantic hashing)
- Tier 3: Tool Cache (stable tool outputs with TTL)
- Tier 4: Embedding Cache (RAG embeddings keyed on file mtime)
- Tier 5: Template Cache (pre-compiled prompts)
- Tier 6: HTTP Cache (API responses with ETag support)
Usage:
```python
from cache.agent_cache import cache_manager

# Check all cache stats
print(cache_manager.get_all_stats())

# Cache tool results
result = cache_manager.tool.get("system_info", {})
if result is None:
    result = get_system_info()
    cache_manager.tool.put("system_info", {}, result)

# Cache LLM responses
cached = cache_manager.response.get("What is 2+2?", ttl=3600)
```
Target Performance:
- Tool cache hit rate: > 30%
- Response cache hit rate: > 20%
- Embedding cache hit rate: > 80%
- Overall speedup: 50-70%
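As a self-contained illustration of the tier-3 pattern (a TTL cache keyed on tool name plus canonicalized arguments), here is a minimal sketch. It is independent of `agent_cache.py`'s actual internals; the class and method names below are illustrative only.

```python
import hashlib
import json
import time

class ToolCache:
    """Minimal TTL cache keyed on tool name + arguments (tier-3 sketch)."""

    def __init__(self, ttl=300):
        self.ttl = ttl
        self._store = {}

    def _key(self, tool, args):
        # Stable key: tool name plus canonical (sorted-key) JSON of the arguments
        blob = tool + json.dumps(args, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, tool, args):
        entry = self._store.get(self._key(tool, args))
        if entry is None:
            return None
        value, expires = entry
        if time.time() > expires:
            return None  # expired: treat as a miss
        return value

    def put(self, tool, args, value):
        self._store[self._key(tool, args)] = (value, time.time() + self.ttl)

# Same get-miss-then-put flow as the usage example above
cache = ToolCache(ttl=60)
if cache.get("system_info", {}) is None:
    cache.put("system_info", {}, {"cpu": "arm64"})
```

Canonicalizing arguments before hashing ensures `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hit the same entry.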
2. Evennia World Shell (#83, #84)
Location: timmy-local/evennia/
Files:
- `typeclasses/characters.py` (330 lines) — Timmy, KnowledgeItem, ToolObject, TaskObject
- `typeclasses/rooms.py` (456 lines) — Workshop, Library, Observatory, Forge, Dispatch
- `commands/tools.py` (520 lines) — 18 in-world commands
- `world/build.py` (343 lines) — World construction script
Rooms:
| Room | Purpose | Key Commands |
|---|---|---|
| Workshop | Execute tasks, use tools | read, write, search, git_* |
| Library | Knowledge storage, retrieval | search, study |
| Observatory | Monitor systems | health, sysinfo, status |
| Forge | Build capabilities | build, test, deploy |
| Dispatch | Task queue, routing | tasks, assign, prioritize |
Commands:
- File: `read <path>`, `write <path> = <content>`, `search <pattern>`
- Git: `git status`, `git log [n]`, `git pull`
- System: `sysinfo`, `health`
- Inference: `think <prompt>` — Local LLM reasoning
- Gitea: `gitea issues`
- Navigation: `workshop`, `library`, `observatory`
Setup:
```bash
cd timmy-local/evennia
python evennia_launcher.py shell -f world/build.py
```
3. Knowledge Ingestion Pipeline (#87)
Location: timmy-local/scripts/ingest.py
Size: 497 lines
Features:
- Automatic document chunking
- Local LLM summarization
- Action extraction (implementable steps)
- Tag-based categorization
- Semantic search (via keywords)
- SQLite backend
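The chunking step can be sketched as a fixed-size sliding window with overlap, so context is not lost at chunk boundaries. This is an illustration of the technique, not `ingest.py`'s actual algorithm; the function name and parameters are assumptions.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a document into overlapping fixed-size chunks.

    Each chunk starts `size - overlap` characters after the previous one,
    so consecutive chunks share `overlap` characters of context.
    """
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size].strip()
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break  # the final window already covered the end of the text
    return chunks
```

A 2,500-character document with `size=1000, overlap=100` yields three chunks: two full windows and a 700-character tail.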
Usage:
```bash
# Ingest a single file
python3 scripts/ingest.py ~/papers/speculative-decoding.md

# Batch ingest directory
python3 scripts/ingest.py --batch ~/knowledge/

# Search knowledge base
python3 scripts/ingest.py --search "optimization"

# Search by tag
python3 scripts/ingest.py --tag inference

# View statistics
python3 scripts/ingest.py --stats
```
Knowledge Item Structure:
```python
{
    "name": "Speculative Decoding",
    "summary": "Use small draft model to propose tokens...",
    "source": "~/papers/speculative-decoding.md",
    "actions": [
        "Download Qwen-2.5 0.5B GGUF",
        "Configure llama-server with --draft-max 8",
        "Benchmark against baseline"
    ],
    "tags": ["inference", "optimization"],
    "embedding": [...],  # For semantic search
    "applied": False
}
```
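A minimal sketch of how such items could be stored and tag-searched in the SQLite backend. The schema below is hypothetical; the real one is defined in `scripts/ingest.py`.

```python
import json
import sqlite3

# Hypothetical schema; the actual tables live in scripts/ingest.py.
conn = sqlite3.connect(":memory:")  # the pipeline would use a file under ~/.timmy/data/
conn.execute("""
    CREATE TABLE IF NOT EXISTS knowledge (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        summary TEXT,
        source  TEXT,
        actions TEXT,               -- JSON-encoded list of steps
        tags    TEXT,               -- JSON-encoded list of tags
        applied INTEGER DEFAULT 0
    )
""")
conn.execute(
    "INSERT INTO knowledge (name, summary, source, actions, tags) VALUES (?, ?, ?, ?, ?)",
    ("Speculative Decoding",
     "Use small draft model to propose tokens...",
     "~/papers/speculative-decoding.md",
     json.dumps(["Download Qwen-2.5 0.5B GGUF",
                 "Configure llama-server with --draft-max 8",
                 "Benchmark against baseline"]),
     json.dumps(["inference", "optimization"])),
)
# Tag lookup: match the quoted tag inside the JSON-encoded list
rows = conn.execute(
    "SELECT name FROM knowledge WHERE tags LIKE ?", ('%"inference"%',)
).fetchall()
```

Matching on the quoted tag (`'%"inference"%'`) avoids false hits from tags that merely contain the substring.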
4. Prompt Cache Warming (#85)
Location: timmy-local/scripts/warmup_cache.py
Size: 333 lines
Features:
- Pre-process system prompts to populate KV cache
- Three prompt tiers: minimal, standard, deep
- Benchmark cached vs uncached performance
- Save/load cache state
Usage:
```bash
# Warm specific prompt tier
python3 scripts/warmup_cache.py --prompt standard

# Warm all tiers
python3 scripts/warmup_cache.py --all

# Benchmark improvement
python3 scripts/warmup_cache.py --benchmark

# Save cache state
python3 scripts/warmup_cache.py --all --save ~/.timmy/cache/state.json
```
Expected Improvement:
- Cold cache: ~10s time-to-first-token
- Warm cache: ~1s time-to-first-token
- 50-70% faster on repeated requests
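Conceptually, warming amounts to posting each prompt tier to llama-server's `/completion` endpoint with `n_predict` set to 0, so the server evaluates the prompt into its KV cache without generating any tokens. The sketch below assumes llama.cpp server conventions (`/completion`, `cache_prompt`); `warmup_cache.py`'s actual interface may differ.

```python
import json
import urllib.request

def build_warmup_payload(prompt: str) -> bytes:
    # Evaluate the prompt (populating the KV cache) but generate zero tokens.
    # cache_prompt asks llama-server to retain the evaluated prefix.
    return json.dumps({"prompt": prompt, "n_predict": 0, "cache_prompt": True}).encode()

def warm_prompt(prompt: str, url: str = "http://127.0.0.1:8080/completion") -> dict:
    # Hypothetical helper; requires a running llama-server at `url`.
    req = urllib.request.Request(
        url,
        data=build_warmup_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Subsequent requests that share the warmed prefix then skip re-evaluating those tokens, which is where the time-to-first-token win comes from.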
5. Installation & Setup
Location: timmy-local/setup-local-timmy.sh
Size: 203 lines
Creates:
- `~/.timmy/cache/` — Cache databases
- `~/.timmy/logs/` — Log files
- `~/.timmy/config/` — Configuration files
- `~/.timmy/templates/` — Prompt templates
- `~/.timmy/data/` — Knowledge and pattern databases
Configuration Files:
- `cache.yaml` — Cache tier settings
- `timmy.yaml` — Main configuration
- Templates: `minimal.txt`, `standard.txt`, `deep.txt`
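For orientation, a plausible shape for `cache.yaml` is sketched below. Every key here is hypothetical; the authoritative settings are whatever `cache_config.py` and the setup script actually write.

```yaml
# Hypothetical cache.yaml sketch — real keys are defined by cache_config.py
tool_cache:
  ttl_seconds: 300        # tier 3: stable tool outputs
  max_entries: 1000
response_cache:
  ttl_seconds: 3600       # tier 2: full LLM responses
embedding_cache:
  key_on: file_mtime      # tier 4: invalidate when the source file changes
```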
Quick Start:
```bash
# Run setup
./setup-local-timmy.sh

# Start llama-server
llama-server -m ~/models/hermes4-14b.gguf -c 8192 --jinja -ngl 99

# Test
python3 -c "from cache.agent_cache import cache_manager; print(cache_manager.get_all_stats())"
```
File Structure
```
timmy-local/
├── cache/
│   ├── agent_cache.py        # 6-tier cache implementation
│   └── cache_config.py       # TTL and configuration
│
├── evennia/
│   ├── typeclasses/
│   │   ├── characters.py     # Timmy, KnowledgeItem, etc.
│   │   └── rooms.py          # Workshop, Library, etc.
│   ├── commands/
│   │   └── tools.py          # In-world tool commands
│   └── world/
│       └── build.py          # World construction
│
├── scripts/
│   ├── ingest.py             # Knowledge ingestion pipeline
│   └── warmup_cache.py       # Prompt cache warming
│
├── setup-local-timmy.sh      # Installation script
└── README.md                 # Complete usage guide
```
Issues Addressed
| Issue | Title | Status |
|---|---|---|
| #103 | Build comprehensive caching layer | ✅ Complete |
| #83 | Install Evennia and scaffold Timmy's world | ✅ Complete |
| #84 | Bridge Timmy's tool library into Evennia Commands | ✅ Complete |
| #87 | Build knowledge ingestion pipeline | ✅ Complete |
| #85 | Implement prompt caching and KV cache reuse | ✅ Complete |
Performance Targets
| Metric | Target | How Achieved |
|---|---|---|
| Cache hit rate | > 30% | Multi-tier caching |
| TTFT improvement | 50-70% | Prompt warming + KV cache |
| Knowledge retrieval | < 100ms | SQLite + LRU |
| Tool execution | < 5s | Local inference + caching |
Integration
```
┌─────────────────────────────────────────────────────────────┐
│                        LOCAL TIMMY                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │  Cache   │  │ Evennia  │  │ Knowledge│  │  Tools   │     │
│  │  Layer   │  │  World   │  │   Base   │  │          │     │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘     │
│       └─────────────┴─────────────┴─────────────┘           │
│                          │                                  │
│                     ┌────┴────┐                             │
│                     │  Timmy  │ ← Sovereign, local-first    │
│                     └────┬────┘                             │
└──────────────────────────┼──────────────────────────────────┘
                           │
               ┌───────────┼───────────┐
               │           │           │
          ┌────┴───┐  ┌────┴───┐  ┌────┴───┐
          │  Ezra  │  │Allegro │  │Bezalel │
          │ (Cloud)│  │ (Cloud)│  │ (Cloud)│
          │Research│  │ Bridge │  │ Build  │
          └────────┘  └────────┘  └────────┘
```
Local Timmy operates sovereignly. Cloud backends provide additional capacity, but Timmy survives and functions without them.
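One way to picture that guarantee in code (a hypothetical helper, not the actual dispatch logic): a cloud backend is tried opportunistically for extra capacity, and any failure degrades gracefully to the sovereign local path.

```python
def run_task(task, local_runner, cloud_runner=None):
    """Local-first dispatch sketch.

    Cloud capacity is optional: if no cloud runner is configured, or the
    cloud call fails for any reason, the task runs on the local path.
    """
    if cloud_runner is not None:
        try:
            return cloud_runner(task)
        except Exception:
            pass  # cloud unavailable: degrade gracefully, never block on it
    return local_runner(task)
```

The essential property is that the local path needs nothing from the cloud branch: deleting the `if` block entirely still yields a working system.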
Next Steps for Timmy
Immediate (Run These)
1. **Setup Local Environment**
   ```bash
   cd timmy-local
   ./setup-local-timmy.sh
   ```
2. **Start llama-server**
   ```bash
   llama-server -m ~/models/hermes4-14b.gguf -c 8192 --jinja -ngl 99
   ```
3. **Warm Cache**
   ```bash
   python3 scripts/warmup_cache.py --all
   ```
4. **Ingest Knowledge**
   ```bash
   python3 scripts/ingest.py --batch ~/papers/
   ```
Short-Term
1. **Setup Evennia World**
   ```bash
   cd evennia
   python evennia_launcher.py shell -f world/build.py
   ```
2. **Configure Gitea Integration**
   ```bash
   export TIMMY_GITEA_TOKEN=your_token_here
   ```
Ongoing
1. **Monitor Cache Performance**
   ```bash
   python3 -c "from cache.agent_cache import cache_manager; import json; print(json.dumps(cache_manager.get_all_stats(), indent=2))"
   ```
2. **Review and Approve PRs**
   - Branch: `feature/uni-wizard-v4-production`
   - URL: http://143.198.27.163:3000/Timmy_Foundation/timmy-home/pulls
Sovereignty Guarantees
✅ All code runs locally
✅ No cloud dependencies for core functionality
✅ Graceful degradation when cloud unavailable
✅ Local inference via llama.cpp
✅ Local SQLite for all storage
✅ No telemetry without explicit consent
Artifacts
| Artifact | Location | Lines |
|---|---|---|
| Cache Layer | `timmy-local/cache/` | 767 |
| Evennia World | `timmy-local/evennia/` | 1,649 |
| Knowledge Pipeline | `timmy-local/scripts/ingest.py` | 497 |
| Cache Warming | `timmy-local/scripts/warmup_cache.py` | 333 |
| Setup Script | `timmy-local/setup-local-timmy.sh` | 203 |
| Documentation | `timmy-local/README.md` | 234 |
| **Total** | | **~3,683** |
Plus Uni-Wizard v4 architecture (already delivered): ~8,000 lines
Grand Total: ~11,700 lines of architecture, code, and documentation
Report generated by: Allegro
Lane: Tempo-and-Dispatch
Status: Ready for Timmy deployment