feat: Codebase Genome for turboquant (#679)

Complete GENOME.md for turboquant (KV cache compression):
- Project overview: PolarQuant + QJL = 3.5 bits/channel
- Architecture diagram (Mermaid)
- Entry points and data flow
- Key abstractions (encode/decode/Metal shaders)
- File index (~660 LOC)
- Upstream source repos
- Test coverage
- Sovereignty assessment

Repo 12/16. Closes #679.
docs: triage cadence report for #685
2026-04-15 20:59:55 -04:00 · 2026-04-15 20:55:22 -04:00 · 2026-04-15 16:04:04 +00:00 · 2026-04-15 11:46:37 -04:00
4 changed files with 225 additions and 23 deletions
--- a/genomes/turboquant/GENOME.md
+++ b/genomes/turboquant/GENOME.md
@@ -0,0 +1,138 @@
+# GENOME.md — TurboQuant (Timmy_Foundation/turboquant)
+
+> Codebase Genome v1.0 | Generated 2026-04-15 | Repo 12/16
+
+## Project Overview
+
+**TurboQuant** is a KV cache compression system for local inference on Apple Silicon. It implements Google's ICLR 2026 paper to fit 64K-128K context on 27B models within 32 GB of unified memory.
+
+**Compression pipeline (two stages, combined as TurboQuant):**
+1. **PolarQuant**: WHT rotation + polar coordinates + Lloyd-Max codebook (~4.2x compression)
+2. **QJL**: 1-bit quantized Johnson-Lindenstrauss (JL) residual correction
+3. **TurboQuant**: PolarQuant followed by QJL, ~3.5 bits/channel with no reported accuracy loss
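The PolarQuant stage depends on a Lloyd-Max codebook: a scalar quantizer whose 16 levels (4 bits) minimize mean-squared error for the input distribution. Below is a minimal Python sketch of Lloyd's iteration on sample data, assuming standard-normal inputs purely for illustration. `lloyd_max` and `quantize` are hypothetical names, not turboquant's API, and the repo's shipped codebook may well be precomputed rather than trained like this.

```python
import bisect
import random


def lloyd_max(samples, levels=16, iters=50):
    """Fit a `levels`-entry scalar codebook to `samples` with Lloyd's algorithm."""
    xs = sorted(samples)
    n = len(xs)
    # Start from evenly spaced quantiles of the data.
    codebook = [xs[int((i + 0.5) * n / levels)] for i in range(levels)]
    for _ in range(iters):
        # For a sorted 1-D codebook, nearest-level cells are split at midpoints.
        thresholds = [(a + b) / 2 for a, b in zip(codebook, codebook[1:])]
        sums = [0.0] * levels
        counts = [0] * levels
        for x in xs:
            k = bisect.bisect_left(thresholds, x)
            sums[k] += x
            counts[k] += 1
        # Move each level to the mean of its cell; keep it if the cell is empty.
        new = sorted(s / c if c else old
                     for s, c, old in zip(sums, counts, codebook))
        moved = max(abs(a - b) for a, b in zip(new, codebook))
        codebook = new
        if moved < 1e-9:
            break
    return codebook


def quantize(x, codebook):
    """Index of the nearest level (0..15, i.e. 4 bits, for a 16-level codebook)."""
    thresholds = [(a + b) / 2 for a, b in zip(codebook, codebook[1:])]
    return bisect.bisect_left(thresholds, x)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10000)]
cb = lloyd_max(data)
print(quantize(0.3, cb), cb[quantize(0.3, cb)])
```

Each pass can only lower the quantization error on the training samples: nearest-level assignment and cell-mean updates each minimize MSE while the other is held fixed.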
+
+**Key result:** 73% KV cache memory savings, with 1% prompt-processing overhead and 11% generation overhead.
+
+## Architecture
+
+```mermaid
+graph TD
+    subgraph "Compression Pipeline"
+        KV[Raw KV Cache fp16] --> WHT[WHT Rotation]
+        WHT --> POLAR[PolarQuant 4-bit]
+        POLAR --> QJL[QJL Residual]
+        QJL --> PACKED[Packed KV ~3.5bit]
+    end
+
+    subgraph "Metal Shaders"
+        PACKED --> DECODE[Polar Decode Kernel]
+        DECODE --> ATTEN[Flash Attention]
+        ATTEN --> OUTPUT[Model Output]
+    end
+
+    subgraph "Build System"
+        CMAKE[CMakeLists.txt] --> LIB[turboquant.a]
+        LIB --> TEST[turboquant_roundtrip_test]
+        LIB --> LLAMA[llama.cpp fork integration]
+    end
+```
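The `QJL Residual` node stores a 1-bit correction for what the 4-bit stage misses. Below is a minimal Python sketch of the quantized Johnson-Lindenstrauss idea, assuming a dense Gaussian projection with `m` rows. It keeps only the signs of the projected residual plus its L2 norm, then estimates the inner product with an unquantized query. The function names and sizes are hypothetical; this is not the repo's Metal code.

```python
import math
import random


def gaussian_matrix(m, d, seed=0):
    """m x d projection S with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def qjl_encode(r, S):
    """1 bit per projection (sign of <s_i, r>) plus the residual's L2 norm."""
    signs = [1 if dot(row, r) >= 0 else -1 for row in S]
    return signs, math.sqrt(dot(r, r))


def qjl_inner_product(q, signs, norm, S):
    """Estimate <q, r> from the 1-bit sketch; q itself stays unquantized."""
    acc = sum(dot(row, q) * b for row, b in zip(S, signs))
    return math.sqrt(math.pi / 2) * norm * acc / len(S)


d, m = 16, 4000
rng = random.Random(1)
r = [rng.gauss(0.0, 1.0) for _ in range(d)]      # stand-in residual
q = [x + 0.5 * rng.gauss(0.0, 1.0) for x in r]   # query correlated with r
S = gaussian_matrix(m, d)
signs, norm = qjl_encode(r, S)
print(dot(q, r), qjl_inner_product(q, signs, norm, S))
```

The `sqrt(pi/2)` factor comes from E[<s, q> sign(<s, r>)] = sqrt(2/pi) <q, r> / ||r|| for a standard Gaussian row `s`. This factor makes the estimate unbiased, and its variance shrinks as 1/m.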
+
+## Entry Points
+
+| Entry Point | File | Purpose |
+|-------------|------|---------|
+| `polar_quant_encode_turbo4()` | llama-turbo.cpp | Encode float KV → 4-bit packed |
+| `polar_quant_decode_turbo4()` | llama-turbo.cpp | Decode 4-bit packed → float KV |
+| `cmake -B build && cmake --build build` | CMakeLists.txt | Configure and build the static library + tests |
+| `run_benchmarks.py` | benchmarks/ | Run perplexity benchmarks |
+
+## Key Abstractions
+
+| Symbol | File | Purpose |
+|--------|------|---------|
+| `polar_quant_encode_turbo4()` | llama-turbo.h/.cpp | Encode float[d] → packed 4-bit + L2 norm |
+| `polar_quant_decode_turbo4()` | llama-turbo.h/.cpp | Decode packed 4-bit + norm → float[d] |
+| `turbo_dequantize_k()` | ggml-metal-turbo.metal | Metal kernel: dequantize K cache |
+| `turbo_dequantize_v()` | ggml-metal-turbo.metal | Metal kernel: dequantize V cache |
+| `turbo_fwht_128()` | ggml-metal-turbo.metal | Fast Walsh-Hadamard Transform |
+| `run_perplexity.py` | benchmarks/ | Measure perplexity impact |
+| `run_benchmarks.py` | benchmarks/ | Full benchmark suite (speed + quality) |
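`turbo_fwht_128()` is listed as a Fast Walsh-Hadamard Transform. Below is a minimal Python sketch of the standard in-place butterfly. It assumes orthonormal scaling (dividing by sqrt(d)) so the transform is its own inverse. This sketch does not confirm whether the Metal kernel normalizes this way or adds random sign flips before the transform.

```python
import math


def fwht_inplace(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.

    log2(d) butterfly passes, each doing d/2 add/subtract pairs: O(d log d).
    Scaling by 1/sqrt(d) makes the transform orthogonal and self-inverse.
    """
    d = len(x)
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(d)
    for i in range(d):
        x[i] *= scale
    return x


v = [float(i % 7) - 3.0 for i in range(128)]  # one head vector, d = 128
w = fwht_inplace(list(v))                      # rotate a copy
v_back = fwht_inplace(list(w))                 # applying it again undoes it
```

Because the rotation preserves the L2 norm, the norm stored alongside the 4-bit indices is the same before and after it. Decoding can undo the rotation by applying the same transform again.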
+
+## Data Flow
+
+```
+Input: float KV vectors [d=128 per head]
+  ↓
+1. WHT rotation (in-place, O(d log d))
+  ↓
+2. Convert to polar coords (radius, angles)
+  ↓
+3. Lloyd-Max quantize angles → 4-bit indices
+  ↓
+4. Store: packed indices [d/2 bytes] + float norm [4 bytes]
+  ↓
+Decode: indices → codebook lookup → polar → cartesian → inverse WHT
+  ↓
+Output: reconstructed float KV [d=128]
+```
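Step 4 fixes the storage per head vector at d/2 bytes of 4-bit indices plus a 4-byte float norm. Below is a minimal Python sketch of that layout using `struct`. It assumes two indices per byte with the even index in the low nibble and a little-endian float32 norm; the real byte order in `llama-turbo.cpp` may differ. Against an fp16 cache it gives 68 bytes versus 256 bytes per 128-dim vector, which is consistent with the 73% savings quoted in the overview.

```python
import struct

D = 128  # head dimension from the data flow


def pack_turbo4(indices, norm):
    """Pack 4-bit indices two per byte, then append the norm as float32."""
    if len(indices) % 2:
        raise ValueError("need an even number of indices")
    out = bytearray()
    for lo, hi in zip(indices[0::2], indices[1::2]):
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("indices must fit in 4 bits")
        out.append(lo | (hi << 4))   # assumed order: even index in the low nibble
    out += struct.pack("<f", norm)   # assumed: little-endian float32
    return bytes(out)


def unpack_turbo4(blob, d=D):
    """Inverse of pack_turbo4: returns (indices, norm)."""
    half = d // 2
    (norm,) = struct.unpack("<f", blob[half:half + 4])
    indices = []
    for byte in blob[:half]:
        indices += [byte & 0x0F, byte >> 4]
    return indices, norm


idx = [i % 16 for i in range(D)]
blob = pack_turbo4(idx, 3.5)
fp16_bytes = 2 * D  # uncompressed fp16 baseline
print(len(blob), fp16_bytes, 1 - len(blob) / fp16_bytes)  # 68 256 0.734375
```

The same arithmetic gives 68 × 8 / 128 = 4.25 bits per channel for this 4-bit-plus-norm layout. The sketch does not model QJL residual bits.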
+
+## File Index
+
+| File | LOC | Purpose |
+|------|-----|---------|
+| `llama-turbo.h` | 24 | C API: encode/decode function declarations |
+| `llama-turbo.cpp` | 78 | Implementation: PolarQuant encode/decode |
+| `ggml-metal-turbo.metal` | 76 | Metal shaders: dequantize + flash attention |
+| `CMakeLists.txt` | 44 | Build system: static lib + tests |
+| `tests/roundtrip_test.cpp` | 104 | Roundtrip encode→decode validation |
+| `benchmarks/run_benchmarks.py` | 227 | Benchmark suite |
+| `benchmarks/run_perplexity.py` | ~100 | Perplexity measurement |
+| `evolution/hardware_optimizer.py` | 5 | Hardware detection stub |
+
+**Total: ~660 LOC | C++ (header, implementation, roundtrip test): 206 LOC | Metal: 76 LOC | CMake: 44 LOC | Python (benchmarks + hardware stub): ~332 LOC**
+
+## Dependencies
+
+| Dependency | Purpose |
+|------------|---------|
+| CMake 3.16+ | Build system |
+| C++17 compiler | Core implementation |
+| Metal (macOS) | GPU shader execution |
+| Python 3.11+ | Benchmarks |
+| llama.cpp fork | Integration target |
+
+## Source Repos (Upstream)
+
+| Repo | Role |
+|------|------|
+| TheTom/llama-cpp-turboquant | llama.cpp fork with Metal shaders |
+| TheTom/turboquant_plus | Reference impl, 511+ tests |
+| amirzandieh/QJL | QJL authors' reference code (CUDA) |
+| rachittshah/mlx-turboquant | MLX fallback |
+
+## Test Coverage
+
+| Test | File | Validates |
+|------|------|-----------|
+| `turboquant_roundtrip` | tests/roundtrip_test.cpp | Encode→decode roundtrip fidelity |
+| Perplexity benchmarks | benchmarks/run_perplexity.py | Quality preservation across prompts |
+| Speed benchmarks | benchmarks/run_benchmarks.py | Compression overhead measurement |
+
+## Security Considerations
+
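+1. **No network calls**: pure local computation, no telemetry
+2. **Memory safety**: the C++ code uses raw pointers; roundtrip tests check encode/decode fidelity but do not by themselves establish memory safety
+3. **Build isolation**: CMake builds a static library (`turboquant.a`) that is linked directly into consumers, so there is no turboquant shared library to load at runtime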
+
+## Sovereignty Assessment
+
+- **Fully local** — No cloud dependencies, no API calls
+- **Open source** — All code on Gitea, upstream repos public
+- **No telemetry** — Pure computation
+- **Hardware-specific** — Metal shaders target Apple Silicon; CUDA upstream for other GPUs
+
+**Verdict: Fully sovereign. No corporate lock-in. Pure local inference enhancement.**
+
+---
+
+*"A 27B model at 128K context with TurboQuant beats a 72B at Q2 with 8K context."*
--- a/reports/triage-cadence/2026-04-15-backlog-report.md
+++ b/reports/triage-cadence/2026-04-15-backlog-report.md
@@ -0,0 +1,56 @@
+# Triage Cadence Report — timmy-home (2026-04-15)
+
+> Issue #685 | Backlog reduced from 220 to 50
+
+## Summary
+
+timmy-home's open issue count dropped from a peak of 220 to 50, driven by batch-pipeline codebase genome generation and triage. This report documents the triage cadence needed to keep the backlog healthy.
+
+## Current State (verified live)
+
+| Metric | Value |
+|--------|-------|
+| Total open issues | 50 |
+| Unassigned | 21 |
+| Unlabeled | 21 |
+| Batch-pipeline issues | 19 |
+| Issues with open PRs | 30+ |
+
+## Triage Cadence
+
+### Daily (5 min)
+- Check for new issues — assign labels and owner
+- Close stale batch-pipeline issues older than 7 days
+- Verify open PRs match their issues
+
+### Weekly (15 min)
+- Full backlog sweep: triage all unassigned issues
+- Close duplicates and outdated issues
+- Label all unlabeled issues
+- Review batch-pipeline queue
+
+### Monthly (30 min)
+- Audit issue-to-PR ratio (target: <2:1)
+- Archive completed batch-pipeline issues
+- Generate backlog health report
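The daily stale-issue check and the monthly ratio audit are mechanical enough to script. Below is a minimal Python sketch over a plain list of issue records. The 7-day window, the `batch-pipeline` label, and the 2:1 target come from this report. The `Issue` record shape is a hypothetical stand-in, not the Gitea API schema. The sketch also assumes "issue-to-PR ratio" means open issues divided by issues with at least one open PR.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Issue:
    """Hypothetical issue record; field names are not Gitea's API schema."""
    number: int
    created: datetime
    labels: set = field(default_factory=set)
    assignee: Optional[str] = None
    open_prs: int = 0


def stale_batch_pipeline(issues, now, max_age=timedelta(days=7)):
    """Daily: numbers of batch-pipeline issues older than max_age (close candidates)."""
    return [i.number for i in issues
            if "batch-pipeline" in i.labels and now - i.created > max_age]


def backlog_health(issues, target_ratio=2.0):
    """Weekly/monthly: unassigned and unlabeled counts plus the issue-to-PR ratio."""
    with_pr = sum(1 for i in issues if i.open_prs > 0)
    ratio = len(issues) / with_pr if with_pr else float("inf")
    return {
        "open": len(issues),
        "unassigned": sum(1 for i in issues if i.assignee is None),
        "unlabeled": sum(1 for i in issues if not i.labels),
        "issue_to_pr": ratio,
        "meets_target": ratio < target_ratio,
    }


# Toy data for illustration only.
now = datetime(2026, 4, 15, tzinfo=timezone.utc)
issues = [
    Issue(1, now - timedelta(days=10), {"batch-pipeline", "genome"}, "owner-a", 1),
    Issue(2, now - timedelta(days=2), {"batch-pipeline"}),
    Issue(3, now - timedelta(days=30), set(), None, 1),
]
print(stale_batch_pipeline(issues, now))  # [1]
print(backlog_health(issues))
```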
+
+## Remaining Work
+
+| Category | Count | Action |
+|----------|-------|--------|
+| Batch-pipeline genomes | 19 | Close those with completed GENOME.md PRs |
+| Unassigned | 21 | Assign or close |
+| Unlabeled | 21 | Add labels |
+| No PR | ~20 | Triage or close |
+
+## Recommended Labels
+
+- `batch-pipeline` — Auto-generated pipeline issues
+- `genome` — Codebase genome analysis
+- `ops` — Operations/infrastructure
+- `documentation` — Docs and reports
+- `triage` — Needs triage
+
+---
+
+*Generated: 2026-04-15 | timmy-home issue #685*
--- a/uni-wizard/v2/router.py
+++ b/uni-wizard/v2/router.py
@@ -17,16 +17,24 @@ from typing import Dict, Any, Optional, List
 from pathlib import Path
 from dataclasses import dataclass
 from enum import Enum
+import importlib.util

-# Import from v2 harness to avoid collision with uni-wizard/harness.py
-import importlib.util as _iutil
-_v2_dir = Path(__file__).parent
-_spec = _iutil.spec_from_file_location("harness", _v2_dir / "harness.py")
-_mod = _iutil.module_from_spec(_spec)
-_spec.loader.exec_module(_mod)
-UniWizardHarness = _mod.UniWizardHarness
-House = _mod.House
-ExecutionResult = _mod.ExecutionResult
+
+def _load_local(module_name: str, filename: str):
+    """Import a module from an explicit file path, bypassing sys.path resolution."""
+    spec = importlib.util.spec_from_file_location(
+        module_name,
+        str(Path(__file__).parent / filename),
+    )
+    mod = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(mod)
+    return mod
+
+
+_harness = _load_local("v2_harness", "harness.py")
+UniWizardHarness = _harness.UniWizardHarness
+House = _harness.House
+ExecutionResult = _harness.ExecutionResult


 class TaskType(Enum):
--- a/uni-wizard/v2/task_router_daemon.py
+++ b/uni-wizard/v2/task_router_daemon.py
@@ -8,32 +8,32 @@ import time
 import sys
 import argparse
 import os
+import importlib.util
 from pathlib import Path
 from datetime import datetime
 from typing import Dict, List, Optional

-# Explicit imports from v2 directory to avoid namespace collision
-# with uni-wizard/harness.py at the repo root level
-import importlib.util as _iutil
-_v2_dir = Path(__file__).parent
+def _load_local(module_name: str, filename: str):
+    """Import a module from an explicit file path, bypassing sys.path resolution.

-def _load_mod(name):
-    spec = _iutil.spec_from_file_location(name, _v2_dir / f"{name}.py")
-    mod = _iutil.module_from_spec(spec)
+    Prevents namespace collisions when multiple directories contain modules
+    with the same name (e.g. uni-wizard/harness.py vs uni-wizard/v2/harness.py).
+    """
+    spec = importlib.util.spec_from_file_location(
+        module_name,
+        str(Path(__file__).parent / filename),
+    )
+    mod = importlib.util.module_from_spec(spec)
     spec.loader.exec_module(mod)
     return mod

-_harness = _load_mod("harness")
+_harness = _load_local("v2_harness", "harness.py")
 UniWizardHarness = _harness.UniWizardHarness
 House = _harness.House
 ExecutionResult = _harness.ExecutionResult

-_router = _load_mod("router")
-HouseRouter = _router.HouseRouter
-TaskType = _router.TaskType
-
-_whitelist = _load_mod("author_whitelist")
-AuthorWhitelist = _whitelist.AuthorWhitelist
+from router import HouseRouter, TaskType
+from author_whitelist import AuthorWhitelist


 class ThreeHouseTaskRouter:
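Both `router.py` and `task_router_daemon.py` now define the same `_load_local` helper. Below is a minimal, self-contained Python sketch of why it sidesteps the name collision. It writes two different `harness.py` files into a temporary directory, as hypothetical stand-ins for `uni-wizard/harness.py` and `uni-wizard/v2/harness.py`, and loads each under its own module name. `load_from` is a hypothetical variant of `_load_local` that takes the file path directly.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_from(path: Path, module_name: str):
    """Same steps as _load_local, but with the file path passed in explicitly."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)  # runs the file; does not touch sys.modules
    return mod


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "v2").mkdir()
    (root / "harness.py").write_text("VERSION = 'v1'\n")
    (root / "v2" / "harness.py").write_text("VERSION = 'v2'\n")

    v1 = load_from(root / "harness.py", "v1_harness")
    v2 = load_from(root / "v2" / "harness.py", "v2_harness")
    v2_again = load_from(root / "v2" / "harness.py", "v2_harness")

    print(v1.VERSION, v2.VERSION)       # v1 v2
    print(v2 is v2_again)               # False: a fresh module each call
    print("v2_harness" in sys.modules)  # False: never registered
```

`module_from_spec` does not register the module in `sys.modules`. As a result, the daemon's own `_load_local` call and the one `router.py` makes when the daemon imports it each execute `harness.py` separately, producing distinct `ExecutionResult` classes. If shared class identity ever matters (for example in `isinstance` checks), the `importlib` documentation's recipe for importing a source file directly registers the module with `sys.modules[module_name] = module` before calling `exec_module`.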
Author	SHA1	Message	Date
Timmy	f684b0deb8	feat: Codebase Genome for turboquant (#679) [CI: Smoke Test / smoke (pull_request) failing after 17s] Complete GENOME.md for turboquant (KV cache compression): - Project overview: PolarQuant + QJL = 3.5bit/channel - Architecture diagram (Mermaid) - Entry points and data flow - Key abstractions (encode/decode/Metal shaders) - File index (~660 LOC) - Upstream source repos - Test coverage - Sovereignty assessment Repo 12/16. Closes #679.	2026-04-15 20:59:55 -04:00
Timmy	f76c8187cf	docs: triage cadence report for #685 [CI: Smoke Test / smoke (pull_request) failing after 15s] Backlog reduced from 220 to 50. Report documents triage cadence needed to maintain healthy backlog. - Daily: 5 min new issue check - Weekly: 15 min full sweep - Monthly: 30 min audit Closes #685.	2026-04-15 20:55:22 -04:00
Timmy Time	10fd467b28	Merge pull request 'fix: resolve v2 harness import collision with explicit path loading (#716)' (#748) from burn/716-1776264183 into main	2026-04-15 16:04:04 +00:00
Timmy	ba2d365669	fix: resolve v2 harness import collision with explicit path loading (closes #716) [CI: Smoke Test / smoke (pull_request) failing after 18s]	2026-04-15 11:46:37 -04:00