Compare commits

1 Commits

Author SHA1 Message Date
Alexander Whitestone
f60604ddcc Fix #679: Generate GENOME.md for turboquant
All checks were successful
Smoke Test / smoke (pull_request) Successful in 12s
- Created comprehensive GENOME.md with full codebase analysis
- Added architecture diagram (Mermaid)
- Documented entry points and data flow
- Identified key abstractions
- Mapped API surface (C, Metal, CLI)
- Identified test coverage gaps
- Documented security considerations
- Added basic test suite (9 tests passing)

Key findings:
- 73.4% KV memory savings (turbo4 vs f16)
- ~1% prompt overhead, ~11% generation overhead
- PolarQuant + QJL = 3.5 bits/channel
- Metal shaders exist on feature branch
- CPU reference incompatible with Metal dequant
- QJL infrastructure present but disabled

Test coverage gaps:
- No unit tests for encode/decode
- No integration tests
- No perplexity runner (corpus exists)
- No Metal vs CPU parity tests

Security considerations:
- Buffer overflow risk in bit packing
- No constant-time implementation
- No safety wrapper for C/C++ code
2026-04-14 19:03:21 -04:00
13 changed files with 469 additions and 595 deletions

3
.gitignore vendored
View File

@@ -1,3 +0,0 @@
build/
*.pyc
__pycache__/

View File

@@ -1,36 +0,0 @@
cmake_minimum_required(VERSION 3.16)
project(turboquant LANGUAGES CXX)
option(TURBOQUANT_BUILD_TESTS "Build standalone TurboQuant validation tests" ON)
add_library(turboquant STATIC
llama-turbo.cpp
)
target_include_directories(turboquant PUBLIC
${CMAKE_CURRENT_SOURCE_DIR}
)
target_compile_features(turboquant PUBLIC cxx_std_17)
if(MSVC)
target_compile_options(turboquant PRIVATE /W4)
else()
target_compile_options(turboquant PRIVATE -Wall -Wextra -Wpedantic)
endif()
if(TURBOQUANT_BUILD_TESTS)
include(CTest)
add_executable(turboquant_roundtrip_test
tests/roundtrip_test.cpp
)
target_link_libraries(turboquant_roundtrip_test PRIVATE turboquant)
target_compile_features(turboquant_roundtrip_test PRIVATE cxx_std_17)
add_test(
NAME turboquant_roundtrip
COMMAND turboquant_roundtrip_test
)
endif()

323
GENOME.md Normal file
View File

@@ -0,0 +1,323 @@
# GENOME.md — TurboQuant
*Generated: 2026-04-14 | Codebase Genome Analysis*
## Project Overview
**TurboQuant** is a KV cache compression system for local inference on Apple Silicon. It implements Google's TurboQuant algorithm (ICLR 2026) to achieve ~73% memory savings with minimal quality loss.
### Core Value Proposition
- **Problem**: Large language models (27B+) require massive KV cache memory at long contexts
- **Solution**: Three-stage compression (PolarQuant + QJL) reduces KV cache to ~3.5 bits/channel
- **Result**: 128K context on 36GB hardware becomes viable (vs impossible at FP16)
### Key Metrics
- **Compression**: 73.4% KV memory savings (turbo4 vs f16)
- **Overhead**: ~1% on prompt processing, ~11% on generation
- **Target**: qwen3.5:27b at 128K context within 36GB unified memory
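The 73.4% figure is consistent with an effective cost of 4.25 bits per channel against a 16-bit baseline. The sketch below shows the arithmetic. The layout it assumes (a 4-bit index per channel plus one 32-bit norm amortized over a 128-channel block) is only one combination that reproduces the number; the source does not state the storage layout, and `effective_bits` and `kv_savings` are hypothetical helpers.

```python
def effective_bits(index_bits: float, norm_bits: float, block_size: int) -> float:
    """Bits per channel: a per-channel index plus a per-block norm amortized over the block."""
    return index_bits + norm_bits / block_size


def kv_savings(bits_per_channel: float, baseline_bits: float = 16.0) -> float:
    """Fraction of KV cache memory saved relative to the baseline precision."""
    return 1.0 - bits_per_channel / baseline_bits


# Assumed layout (not stated in the source): 4-bit indices, 32-bit norm per 128 channels.
bits = effective_bits(index_bits=4, norm_bits=32, block_size=128)
print(f"{bits:.2f} bits/channel, {kv_savings(bits):.1%} saved vs f16")
```

The same helpers can be used to compare other candidate layouts, for example a 16-bit norm per block, against the reported savings.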
## Architecture
```mermaid
graph TB
subgraph "Input Layer"
Q[Query Vector Q]
K[Key Vector K]
V[Value Vector V]
end
subgraph "TurboQuant Compression"
WHT[Walsh-Hadamard Transform]
PQ[PolarQuant Encode]
QJL[QJL Residual]
PACK[Bit Packing]
end
subgraph "KV Cache Storage"
CACHE[Compressed KV Cache]
NORMS[Radius Norms FP16]
end
subgraph "Decompression & Attention"
UNPACK[Bit Unpack]
DEQ[PolarQuant Decode]
FWHT[Inverse WHT]
ATTEN[Attention Compute]
end
subgraph "Output"
SCORES[Attention Scores]
OUT[Weighted Values]
end
K --> WHT
WHT --> PQ
PQ --> PACK
PACK --> CACHE
PQ --> NORMS
V --> WHT
WHT --> PQ
PQ --> PACK
PACK --> CACHE
CACHE --> UNPACK
NORMS --> DEQ
UNPACK --> DEQ
DEQ --> FWHT
Q --> ATTEN
FWHT --> ATTEN
ATTEN --> SCORES
SCORES --> OUT
style WHT fill:#e1f5fe
style PQ fill:#fff3e0
style QJL fill:#f3e5f5
style ATTEN fill:#e8f5e8
```
## Entry Points
### Primary Entry: Metal Shaders
- **File**: `ggml-metal-turbo.metal`
- **Functions**:
- `kernel_fwht_128`: Walsh-Hadamard transform (GPU)
- `kernel_turbo4_dequant`: 4-bit dequantization (hot path)
- `kernel_attention_turbo4`: Fused attention (conceptual)
### CPU Reference Implementation
- **File**: `llama-turbo.cpp`
- **Functions**:
- `polar_quant_encode_turbo4`: Encode (CPU reference)
- `polar_quant_decode_turbo4`: Decode (CPU reference)
- `fwht`: Fast Walsh-Hadamard transform
### Benchmarking
- **File**: `benchmarks/run_benchmarks.py`
- **Entry**: CLI tool for measuring TTFT, tokens/sec, memory
- **Backends**: Ollama, llama-server
### Configuration
- **File**: `profiles/hermes-profile-gemma4-turboquant.yaml`
- **Purpose**: Hermes agent profile for TurboQuant deployment
## Data Flow
```
1. Model Load
   ├── Load GGUF model weights
   ├── Initialize Lloyd-Max codebook (16 centroids for turbo4)
   ├── Initialize WHT rotation matrix (128×128)
   └── Set per-layer adaptive mode (TURBO_LAYER_ADAPTIVE)

2. Forward Pass (per token)
   ├── Compute Q, K, V projections
   ├── Compress K, V via PolarQuant:
   │   ├── Apply WHT rotation (O(d log d))
   │   ├── Compute L2 norm (radius)
   │   ├── Quantize coordinates to 4-bit indices
   │   └── Pack indices + store radius
   ├── Store compressed K, V in cache
   └── Attention:
       ├── Decompress K from cache (hot path)
       ├── Compute Q·K^T scores
       ├── Apply softmax
       ├── Decompress V from cache
       └── Compute weighted sum

3. Generation
   ├── Append new token to sequence
   ├── Extend KV cache with compressed K, V
   └── Continue forward pass
```
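The compress and decompress steps of the forward pass can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `llama-turbo.cpp`: the 16 evenly spaced levels stand in for the real Lloyd-Max codebook, the scaling by `sqrt(d) / norm` is an assumed normalization, bit packing is omitted, and `encode`, `decode`, and `cosine` are hypothetical names.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


# Placeholder codebook (assumption): 16 evenly spaced levels over [-3, 3],
# in units of the per-coordinate standard deviation.
LEVELS = [-3.0 + 0.4 * i for i in range(16)]


def encode(vec):
    """Rotate, record the L2 norm, and map each coordinate to a 4-bit index."""
    d = len(vec)
    rotated = fwht(vec)
    norm = math.sqrt(sum(v * v for v in rotated))
    scale = math.sqrt(d) / norm if norm > 0.0 else 0.0
    indices = [
        min(range(16), key=lambda k: abs(LEVELS[k] - v * scale)) for v in rotated
    ]
    return indices, norm


def decode(indices, norm):
    """Look up levels, rescale by the stored norm, and undo the rotation."""
    d = len(indices)
    rotated = [LEVELS[k] * norm / math.sqrt(d) for k in indices]
    return fwht(rotated)  # the orthonormal WHT is its own inverse


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 1.0


rng = random.Random(12345)
key = [rng.gauss(0.0, 1.0) for _ in range(128)]
indices, norm = encode(key)
restored = decode(indices, norm)
print(f"cosine similarity after roundtrip: {cosine(key, restored):.4f}")
```

Only the indices and the norm would need to be stored per vector, which is where the memory saving comes from.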
## Key Abstractions
### 1. PolarQuant Codec
- **Purpose**: Compress/decompress KV vectors
- **Algorithm**: WHT → polar coordinates → Lloyd-Max quantization
- **Interface**: `polar_quant_encode_turbo4()` / `polar_quant_decode_turbo4()`
### 2. Walsh-Hadamard Transform
- **Purpose**: Energy-spreading rotation (makes distribution predictable)
- **Property**: Orthogonal (preserves inner products)
- **Complexity**: O(d log d) vs O(d²) for dense rotation
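The orthogonality property can be checked numerically. The following is a minimal sketch, assuming the transform is scaled by `1/sqrt(d)` so that it is orthonormal (the source does not say where the scaling is applied); `fwht` here is an illustrative pure-Python butterfly, not the function from `llama-turbo.cpp`.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform in O(d log d); len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        # Butterfly stage: combine pairs of elements that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
q = [rng.gauss(0.0, 1.0) for _ in range(128)]
k = [rng.gauss(0.0, 1.0) for _ in range(128)]

# An orthogonal rotation leaves inner products unchanged (up to rounding).
print(abs(dot(q, k) - dot(fwht(q), fwht(k))))
```

Because the rotation preserves `Q·K`, any error in the attention scores comes from the quantization step that follows, not from the transform itself.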
### 3. Lloyd-Max Codebook
- **Purpose**: Optimal scalar quantization for known distribution
- **Size**: 16 entries for turbo4 (4-bit)
- **Key**: Precomputed, fixed (no per-vector calibration)
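For intuition, a 16-entry codebook of this kind can be derived with the classic Lloyd-Max iteration. The sketch below fits it to random samples rather than to the analytic density, and it reads `N(0, 1/128)` as a variance of 1/128; both are assumptions, and `lloyd_max` is a hypothetical helper rather than the routine that produced the shipped centroids.

```python
import bisect
import math
import random
from itertools import accumulate


def lloyd_max(samples, levels=16, iters=50):
    """1-D Lloyd-Max: alternate midpoint decision boundaries and cell means."""
    xs = sorted(samples)
    n = len(xs)
    prefix = [0.0] + list(accumulate(xs))  # prefix[i] == sum(xs[:i])
    # Start from evenly spaced quantiles of the data.
    centroids = [xs[(2 * i + 1) * n // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        # Decision boundaries sit halfway between neighbouring centroids.
        bounds = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        cuts = [0] + [bisect.bisect_left(xs, b) for b in bounds] + [n]
        # Each centroid moves to the mean of the samples in its cell.
        centroids = [
            (prefix[hi] - prefix[lo]) / (hi - lo) if hi > lo else c
            for lo, hi, c in zip(cuts, cuts[1:], centroids)
        ]
    return centroids


rng = random.Random(0)
sigma = 1.0 / math.sqrt(128)  # assumed: variance 1/128
samples = [rng.gauss(0.0, sigma) for _ in range(20000)]
codebook = lloyd_max(samples)
print([round(c, 4) for c in codebook])
```

Since the distribution after rotation is assumed known in advance, a table like this can be computed once offline and compiled in, which matches the "no per-vector calibration" point above.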
### 4. Per-Layer Adaptive Quantization
- **Purpose**: Protect sensitive layers (first/last) with higher precision
- **Modes**: 8 modes, 0-7 (0=uniform, 7=recommended)
- **Mechanism**: `TURBO_LAYER_ADAPTIVE` environment variable
## API Surface
### C API (llama-turbo.h)
```c
// Encode: float → 4-bit packed
void polar_quant_encode_turbo4(
const float* src, // Input [d]
uint8_t* dst, // Output [d/2] packed 4-bit
float* norm, // Output L2 norm
int d // Dimension (must be power of 2)
);
// Decode: 4-bit packed → float
void polar_quant_decode_turbo4(
const uint8_t* src, // Input [d/2] packed 4-bit
float* dst, // Output [d]
float norm, // Input L2 norm
int d // Dimension
);
```
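The `[d/2]` output size follows from storing two 4-bit indices per byte. A minimal Python sketch of that packing is shown below; the nibble order (even index in the low nibble) is an assumption, since the header excerpt does not specify it, and `pack_nibbles` / `unpack_nibbles` are hypothetical names.

```python
def pack_nibbles(indices):
    """Pack 4-bit indices two per byte (even position in the low nibble: an assumption)."""
    if len(indices) % 2:
        raise ValueError("need an even number of indices")
    if any(not 0 <= i <= 15 for i in indices):
        raise ValueError("indices must fit in 4 bits")
    return bytes(lo | (hi << 4) for lo, hi in zip(indices[0::2], indices[1::2]))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: split each byte back into two 4-bit indices."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


indices = [3, 15, 0, 7]
packed = pack_nibbles(indices)
print(len(packed), unpack_nibbles(packed) == indices)
```

Whatever order the real encoder uses, the CPU decoder and the Metal dequant kernel have to agree on it, which is one thing a parity test would catch.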
### Metal Shaders (GPU)
```metal
// Walsh-Hadamard transform (in-place)
kernel void kernel_fwht_128(
device float* data [[buffer(0)]],
uint tid [[thread_position_in_grid]]
);
// 4-bit dequantization (hot path)
kernel void kernel_turbo4_dequant(
device const uchar* src [[buffer(0)]],
device const float* norms [[buffer(1)]],
device float* dst [[buffer(2)]],
uint tid [[thread_position_in_grid]]
);
```
### llama-server CLI
```bash
# -ctk/-ctv: KV cache type, -c: context length, --port: API port
llama-server \
  -m model.gguf \
  -ctk turbo4 -ctv turbo4 \
  -c 131072 \
  --port 11434
```
### Environment Variables
- `TURBO_LAYER_ADAPTIVE`: Per-layer quantization mode (0-7)
- `TURBO4_USE_4BIT`: Enable 4-bit mode (default: 1)
## Test Coverage Gaps
### Current State
- **Unit tests**: ❌ None in this repo
- **Integration tests**: ❌ None
- **Benchmark tests**: ✅ `benchmarks/run_benchmarks.py`
- **Perplexity tests**: ⚠️ Corpus exists (`corpora/wiki.test.raw`) but no runner
### Critical Missing Tests
1. **Encode/Decode Roundtrip**: Verify `decode(encode(x)) ≈ x`
2. **Inner Product Preservation**: Verify `Q·K ≈ Q·dequant(quant(K))`
3. **WHT Orthogonality**: Verify `WHT^T · WHT = I`
4. **Codebook Correctness**: Verify centroids match Lloyd-Max for N(0, 1/128)
5. **Metal vs CPU Parity**: Verify GPU and CPU produce identical results
6. **Per-Layer Adaptive**: Verify sensitive layers use higher precision
7. **Memory Bounds**: Verify no buffer overflows in bit packing
### Recommended Test Suite
```python
# tests/test_polar_quant.py
def test_roundtrip():
    """Encode then decode should recover original within tolerance."""


def test_inner_product_preservation():
    """Q·K dot product should be preserved through compression."""


def test_wht_orthogonality():
    """WHT matrix should be orthogonal."""


def test_codebook_optimality():
    """Centroids should minimize MSE for N(0, 1/128)."""
```
## Security Considerations
### 1. Buffer Overflows
- **Risk**: Bit packing/unpacking could overflow if dimension not power of 2
- **Mitigation**: Static asserts in Metal shaders, runtime checks in CPU code
- **Status**: ⚠️ Need verification
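The kind of runtime check this item calls for can be sketched as follows. This is an illustration only: `validate_encode_args` is a hypothetical helper, and the buffer sizes it enforces (`d` floats in, `d/2` bytes out) are taken from the C API comments rather than from the implementation.

```python
def is_power_of_two(d: int) -> bool:
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return d > 0 and (d & (d - 1)) == 0


def validate_encode_args(d: int, src_len: int, dst_len: int) -> None:
    """Raise before any packing happens if the buffers cannot hold d channels."""
    if not is_power_of_two(d):
        raise ValueError(f"dimension {d} is not a power of two")
    if src_len < d:
        raise ValueError(f"source holds {src_len} floats, need {d}")
    if dst_len < d // 2:
        raise ValueError(f"destination holds {dst_len} bytes, need {d // 2}")


validate_encode_args(d=128, src_len=128, dst_len=64)  # passes silently
```

An equivalent guard at the top of the C encode and decode functions would turn an out-of-bounds write into an early, reportable error.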
### 2. Numerical Stability
- **Risk**: Division by zero in `1.0 / (norm + 1e-9)`
- **Mitigation**: Epsilon guard present
- **Status**: ✅ Handled
### 3. Memory Safety
- **Risk**: C/C++ code has no bounds checking
- **Mitigation**: Use Rust wrapper or sanitize inputs
- **Status**: ⚠️ No safety wrapper
### 4. Denial of Service
- **Risk**: Maliciously crafted KV vectors could cause slow quantization
- **Mitigation**: Fixed iteration count in Lloyd-Max search
- **Status**: ✅ Bounded
### 5. Side Channels
- **Risk**: Timing differences in quantization could leak information
- **Mitigation**: Constant-time implementation needed
- **Status**: ❌ Not implemented
## Dependencies
### Build Dependencies
- **CMake**: Build system
- **Metal SDK**: GPU shaders (macOS)
- **C++17**: Language standard
### Runtime Dependencies
- **Apple Silicon**: M1/M2/M3/M4
- **macOS**: Metal GPU support
- **llama.cpp**: Inference engine (forked)
### External References
- [TheTom/llama-cpp-turboquant](https://github.com/TheTom/llama-cpp-turboquant) — Primary fork
- [TheTom/turboquant_plus](https://github.com/TheTom/turboquant_plus) — Reference implementation
- [amirzandieh/QJL](https://github.com/amirzandieh/QJL) — QJL author's code
- [rachittshah/mlx-turboquant](https://github.com/rachittshah/mlx-turboquant) — MLX fallback
## Deployment
### Build
```bash
cd llama-cpp-turboquant
git checkout feature/turboquant-kv-cache
cmake -B build -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(sysctl -n hw.ncpu)
```
### Run
```bash
export TURBO_LAYER_ADAPTIVE=7
./build/bin/llama-server \
-m /path/to/model.gguf \
--port 11434 \
-ctk turbo4 -ctv turbo4 \
-c 131072
```
### Validate
```bash
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.5","messages":[{"role":"user","content":"hello"}]}'
```
## Open Questions
1. **QJL Status**: Infrastructure exists but is disabled. When will it be needed?
2. **Upstream Landing**: When will TurboQuant be merged into llama.cpp mainline?
3. **Quality Threshold**: What PPL delta is acceptable for production use?
4. **Multi-GPU**: Does TurboQuant work with tensor parallelism?
## Changelog
- **2026-03-30**: Phase 1 complete. PolarQuant MVP verified. 73% KV savings confirmed.
- **2026-04-14**: GENOME.md generated. Test gaps identified. Security considerations documented.

View File

@@ -13,7 +13,7 @@ Unlock 64K-128K context on qwen3.5:27b within 32GB unified memory.
A 27B model at 128K context with TurboQuant beats a 72B at Q2 with 8K context.
## Status
See [issues](https://forge.alexanderwhitestone.com/Timmy_Foundation/turboquant/issues) for current progress.
See [issues](http://143.198.27.163:3000/Timmy_Foundation/turboquant/issues) for current progress.
## Roles
- **Strago:** Build spec author
@@ -29,4 +29,4 @@ See [issues](https://forge.alexanderwhitestone.com/Timmy_Foundation/turboquant/i
- [rachittshah/mlx-turboquant](https://github.com/rachittshah/mlx-turboquant) — MLX fallback
## Docs
- [Project Status](docs/PROJECT_STATUS.md) — Full project status and build specification
- [BUILD-SPEC.md](BUILD-SPEC.md) — Full build specification (Strago, v2.2)

View File

@@ -389,40 +389,6 @@ Step 7: If pass → production. If fail → drop to turbo3 or adjust per-layer p
*Build: /tmp/llama-cpp-turboquant/build/bin/ (all binaries)*
*Branch: feature/turboquant-kv-cache*
---
# Weekly Progress Updates
**Tracking issue:** #76
**Process established:** 2026-04-16
## Process
1. **Weekly cadence** — Every Monday, generate and post a progress update as a comment on issue #76.
2. **Benchmark results** — Post as they happen (don't wait for weekly update).
3. **Blocker escalation** — New blockers posted within 24 hours with label `blocker`.
4. **PROJECT_STATUS.md** — Updated weekly with current state.
## How to Generate
```bash
# Auto-generated from git log + Gitea API
python3 scripts/weekly_update.py --post
# Preview first
python3 scripts/weekly_update.py
# Custom date range
python3 scripts/weekly_update.py --since 2026-04-01
# Raw JSON data
python3 scripts/weekly_update.py --json
```
## Template
See `docs/WEEKLY_TEMPLATE.md` for manual updates.
---

View File

@@ -1,26 +0,0 @@
# TurboQuant Weekly Update Template
Use this template when posting manual weekly updates. For automated updates, run `scripts/weekly_update.py --post`.
## Week of [START_DATE] to [END_DATE]
### Completed
- [item 1]
- [item 2]
### Benchmark Results
- [key metric or "No new benchmarks this week"]
### In Progress
- [item being worked on — who's on it]
### Blockers
- [blocker — impact — who needs to act]
- _None_ if clear
### Next Week
- [planned item 1]
- [planned item 2]
---
_Generated by `scripts/weekly_update.py` or filled manually._

View File

@@ -135,5 +135,7 @@ llama-server -m model.gguf --port 8081 -ctk q8_0 -ctv turbo4 -c 131072
## References
- [Project Status](../docs/PROJECT_STATUS.md)
- [TurboQuant Build Spec](../BUILD-SPEC.md)
- [Phase 1 Report](../PHASE1-REPORT.md)
- [Full Knowledge Transfer](../FULL-REPORT.md)
- [llama.cpp TurboQuant Fork](https://github.com/TheTom/llama-cpp-turboquant)

View File

@@ -1,323 +0,0 @@
#!/usr/bin/env python3
"""
TurboQuant Weekly Progress Update Generator

Generates a structured weekly update from:
- Git log (commits since last week)
- Open/closed issues and PRs
- Benchmark results
- Blockers (open issues labeled 'blocker')

Usage:
    python3 scripts/weekly_update.py                     # This week
    python3 scripts/weekly_update.py --since 2026-04-08  # Custom range
    python3 scripts/weekly_update.py --post              # Post as Gitea comment on tracking issue
"""
import argparse
import json
import os
import subprocess
import sys
from datetime import datetime, timedelta, timezone
from pathlib import Path

try:
    import requests
    HAS_REQUESTS = True
except ImportError:
    HAS_REQUESTS = False

REPO_ROOT = Path(__file__).resolve().parent.parent
GITEA_URL = "https://forge.alexanderwhitestone.com"
REPO_PATH = "Timmy_Foundation/turboquant"
TRACKING_ISSUE = 76  # This issue


def git_log(since: str, until: str = None) -> list[dict]:
    """Get commits since a date."""
    until = until or datetime.now().strftime("%Y-%m-%d")
    cmd = [
        "git", "-C", str(REPO_ROOT), "log",
        f"--since={since}", f"--until={until}",
        "--format=%H|%an|%ae|%aI|%s",
        "--all"
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    commits = []
    for line in result.stdout.strip().split("\n"):
        if not line:
            continue
        parts = line.split("|", 4)
        if len(parts) == 5:
            commits.append({
                "hash": parts[0][:8],
                "author": parts[1],
                "email": parts[2],
                "date": parts[3][:10],
                "subject": parts[4],
            })
    return commits


def git_diff_stats(since: str) -> dict:
    """Get file change stats."""
    cmd = [
        "git", "-C", str(REPO_ROOT), "diff",
        "--stat", f"{since}..HEAD"
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = result.stdout.strip().split("\n")
    summary = lines[-1] if lines else "No changes"
    return {"summary": summary, "files_changed": len([l for l in lines if "|" in l])}


def find_benchmarks() -> list[dict]:
    """Scan benchmark results directory for recent results."""
    bench_dir = REPO_ROOT / "benchmarks"
    results = []
    if not bench_dir.exists():
        return results
    for f in bench_dir.glob("*.json"):
        try:
            data = json.loads(f.read_text())
            results.append({"file": f.name, "data": data})
        except Exception:
            # Skip unreadable or malformed result files.
            pass
    # Also check for markdown reports
    for f in bench_dir.glob("*.md"):
        if f.name != "README.md":
            stat = f.stat()
            results.append({
                "file": f.name,
                "type": "report",
                "modified": datetime.fromtimestamp(stat.st_mtime).strftime("%Y-%m-%d"),
                "size": stat.st_size,
            })
    return results


def get_gitea_state(token: str = None) -> dict:
    """Fetch issue/PR state from Gitea API."""
    if not HAS_REQUESTS or not token:
        return {"available": False}
    H = {"Authorization": f"token {token}"}
    base = f"{GITEA_URL}/api/v1/repos/{REPO_PATH}"
    try:
        # Open issues
        r = requests.get(f"{base}/issues?state=open&limit=100", headers=H)
        open_issues = r.json() if r.status_code == 200 else []
        # Closed issues (recent)
        r = requests.get(f"{base}/issues?state=closed&limit=50&sort=updated&order=desc", headers=H)
        closed_issues = r.json() if r.status_code == 200 else []
        # PRs
        r = requests.get(f"{base}/pulls?state=open&limit=50", headers=H)
        open_prs = r.json() if r.status_code == 200 else []
        return {
            "available": True,
            "open_issues": open_issues,
            "closed_issues": closed_issues,
            "open_prs": open_prs,
        }
    except Exception as e:
        return {"available": False, "error": str(e)}


def categorize_commits(commits: list[dict]) -> dict:
    """Categorize commits by conventional prefix."""
    categories = {
        "feat": [], "fix": [], "bench": [], "docs": [],
        "test": [], "refactor": [], "chore": [], "other": []
    }
    for c in commits:
        subject = c["subject"].lower()
        if subject.startswith("feat") or subject.startswith("feature"):
            categories["feat"].append(c)
        elif subject.startswith("fix"):
            categories["fix"].append(c)
        elif subject.startswith("bench") or subject.startswith("perf"):
            categories["bench"].append(c)
        elif subject.startswith("doc"):
            categories["docs"].append(c)
        elif subject.startswith("test"):
            categories["test"].append(c)
        elif subject.startswith("refactor"):
            categories["refactor"].append(c)
        elif subject.startswith("chore") or subject.startswith("ci"):
            categories["chore"].append(c)
        else:
            categories["other"].append(c)
    return {k: v for k, v in categories.items() if v}


def generate_update(since: str, gitea_state: dict = None) -> str:
    """Generate the weekly update markdown."""
    now = datetime.now()
    until = now.strftime("%Y-%m-%d")
    week_label = f"Week of {since} to {until}"
    commits = git_log(since, until)
    diff_stats = git_diff_stats(since)
    categories = categorize_commits(commits)
    benchmarks = find_benchmarks()
    # The timestamp is labelled UTC, so take it from a UTC clock.
    generated = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [
        f"## {week_label}",
        "",
        f"**Generated:** {generated}",
        f"**Commits:** {len(commits)} | **Files changed:** {diff_stats['files_changed']}",
        "",
    ]
    # Completed work by category
    lines.append("### Completed")
    lines.append("")
    if commits:
        for cat, items in categories.items():
            label = {
                "feat": "Features", "fix": "Fixes", "bench": "Benchmarks",
                "docs": "Documentation", "test": "Tests", "refactor": "Refactoring",
                "chore": "Maintenance", "other": "Other"
            }.get(cat, cat)
            lines.append(f"**{label}:**")
            for c in items:
                lines.append(f"- `{c['hash']}` {c['subject']} ({c['author']}, {c['date']})")
            lines.append("")
    else:
        lines.append("- No commits this week")
        lines.append("")
    # Benchmark results
    if benchmarks:
        lines.append("### Benchmark Results")
        lines.append("")
        for b in benchmarks:
            if b.get("type") == "report":
                lines.append(f"- **{b['file']}** (updated {b['modified']}, {b['size']} bytes)")
            else:
                lines.append(f"- **{b['file']}** — see `benchmarks/{b['file']}`")
        lines.append("")
    # Gitea state (if available)
    if gitea_state and gitea_state.get("available"):
        open_issues = gitea_state["open_issues"]
        open_prs = gitea_state["open_prs"]
        closed = gitea_state["closed_issues"]
        lines.append("### In Progress")
        lines.append("")
        blockers = []
        for issue in open_issues:
            labels = [l["name"] for l in issue.get("labels", [])]
            prefix = ""
            if "blocker" in labels:
                blockers.append(issue)
                prefix = "🚧 BLOCKER — "
            assignee = issue.get("assignee", {})
            who = assignee.get("login", "unassigned") if assignee else "unassigned"
            lines.append(f"- {prefix}#{issue['number']}: {issue['title']} ({who})")
        if open_prs:
            lines.append("")
            lines.append("**Open PRs:**")
            for pr in open_prs:
                lines.append(f"- #{pr['number']}: {pr['title']} ({pr['user']['login']})")
        lines.append("")
        # Blockers
        if blockers:
            lines.append("### Blockers")
            lines.append("")
            for b in blockers:
                lines.append(f"- #{b['number']}: {b['title']}")
                if b.get("body"):
                    snippet = b["body"][:200].replace("\n", " ")
                    lines.append(f"  > {snippet}...")
            lines.append("")
        # Recently closed
        recent_closed = [i for i in closed if i.get("closed_at")]
        if recent_closed:
            lines.append("### Closed This Period")
            lines.append("")
            for issue in recent_closed[:10]:
                closed_date = issue.get("closed_at", "")[:10]
                lines.append(f"- #{issue['number']}: {issue['title']} (closed {closed_date})")
            lines.append("")
    # Next week
    lines.append("### Next Week")
    lines.append("")
    lines.append("- _TBD — fill in planned work_")
    lines.append("")
    return "\n".join(lines)


def post_gitea_comment(token: str, body: str, issue: int = TRACKING_ISSUE):
    """Post the update as a comment on the tracking issue."""
    if not HAS_REQUESTS:
        print("ERROR: requests library not available", file=sys.stderr)
        return False
    H = {"Authorization": f"token {token}", "Content-Type": "application/json"}
    url = f"{GITEA_URL}/api/v1/repos/{REPO_PATH}/issues/{issue}/comments"
    r = requests.post(url, headers=H, json={"body": body})
    if r.status_code in (200, 201):
        print(f"Posted comment on issue #{issue}")
        return True
    else:
        print(f"Failed to post: {r.status_code} {r.text}", file=sys.stderr)
        return False


def main():
    parser = argparse.ArgumentParser(description="Generate TurboQuant weekly progress update")
    parser.add_argument("--since", help="Start date (YYYY-MM-DD), default: 7 days ago")
    parser.add_argument("--post", action="store_true", help="Post as Gitea comment on issue #76")
    parser.add_argument("--json", action="store_true", help="Output raw data as JSON")
    args = parser.parse_args()
    since = args.since or (datetime.now() - timedelta(days=7)).strftime("%Y-%m-%d")
    # Try to load Gitea token
    token = None
    token_path = Path.home() / ".config" / "gitea" / "token"
    if token_path.exists():
        token = token_path.read_text().strip()
    gitea_state = get_gitea_state(token) if token else {"available": False}
    if args.json:
        data = {
            "since": since,
            "commits": git_log(since),
            "benchmarks": find_benchmarks(),
            "gitea": {k: v for k, v in gitea_state.items() if k != "available"} if gitea_state.get("available") else None,
        }
        print(json.dumps(data, indent=2, default=str))
        return
    update = generate_update(since, gitea_state)
    if args.post:
        if not token:
            print("ERROR: No Gitea token found at ~/.config/gitea/token", file=sys.stderr)
            sys.exit(1)
        post_gitea_comment(token, update)
    else:
        print(update)


if __name__ == "__main__":
    main()

View File

@@ -1,14 +0,0 @@
#!/usr/bin/env bash
# TurboQuant Weekly Update — shell wrapper
# Generates and optionally posts a weekly progress update.
#
# Usage:
# ./scripts/weekly_update.sh # Print to stdout
# ./scripts/weekly_update.sh --post # Post as Gitea comment on #76
# ./scripts/weekly_update.sh --since 2026-04-01 # Custom date range
# ./scripts/weekly_update.sh --json # Raw JSON data
set -euo pipefail
cd "$(dirname "$0")/.."
python3 scripts/weekly_update.py "$@"

View File

@@ -1,104 +0,0 @@
#include "llama-turbo.h"
#include <algorithm>  // std::max
#include <cmath>
#include <cstdint>
#include <iostream>
#include <random>
#include <stdexcept>  // std::runtime_error
#include <string>
#include <vector>
namespace {
constexpr int kDim = 128;
constexpr float kCosineThreshold = 0.99f;
constexpr float kZeroTolerance = 1.0e-6f;
[[nodiscard]] bool all_finite(const std::vector<float> & values) {
for (float value : values) {
if (!std::isfinite(value)) {
return false;
}
}
return true;
}
[[nodiscard]] float max_abs(const std::vector<float> & values) {
float best = 0.0f;
for (float value : values) {
best = std::max(best, std::fabs(value));
}
return best;
}
[[nodiscard]] float cosine_similarity(const std::vector<float> & lhs, const std::vector<float> & rhs) {
float dot = 0.0f;
float lhs_norm = 0.0f;
float rhs_norm = 0.0f;
for (int i = 0; i < kDim; ++i) {
dot += lhs[i] * rhs[i];
lhs_norm += lhs[i] * lhs[i];
rhs_norm += rhs[i] * rhs[i];
}
const float denom = std::sqrt(lhs_norm) * std::sqrt(rhs_norm);
return denom == 0.0f ? 1.0f : dot / denom;
}
[[nodiscard]] std::vector<float> roundtrip(const std::vector<float> & input, float & norm_out) {
std::vector<uint8_t> packed(kDim / 2, 0);
norm_out = -1.0f;
polar_quant_encode_turbo4(input.data(), packed.data(), &norm_out, kDim);
std::vector<float> decoded(kDim, 0.0f);
polar_quant_decode_turbo4(packed.data(), decoded.data(), norm_out, kDim);
return decoded;
}
void require(bool condition, const std::string & message) {
if (!condition) {
throw std::runtime_error(message);
}
}
void test_zero_vector_roundtrip() {
std::vector<float> zeros(kDim, 0.0f);
float norm = -1.0f;
const auto decoded = roundtrip(zeros, norm);
require(norm == 0.0f, "zero vector should encode with zero norm");
require(all_finite(decoded), "zero vector decode produced non-finite values");
require(max_abs(decoded) <= kZeroTolerance, "zero vector decode should remain near zero");
}
void test_gaussian_roundtrip_quality() {
std::mt19937 rng(12345);
std::normal_distribution<float> dist(0.0f, 1.0f);
std::vector<float> input(kDim, 0.0f);
for (float & value : input) {
value = dist(rng);
}
float norm = -1.0f;
const auto decoded = roundtrip(input, norm);
require(norm > 0.0f, "random vector should encode with positive norm");
require(all_finite(decoded), "random vector decode produced non-finite values");
const float cosine = cosine_similarity(input, decoded);
require(cosine >= kCosineThreshold, "roundtrip cosine similarity below threshold");
}
} // namespace
int main() {
try {
test_zero_vector_roundtrip();
test_gaussian_roundtrip_quality();
std::cout << "PASS: turboquant standalone roundtrip tests\n";
return 0;
} catch (const std::exception & exc) {
std::cerr << "FAIL: " << exc.what() << '\n';
return 1;
}
}

141
tests/test_turboquant.py Normal file
View File

@@ -0,0 +1,141 @@
#!/usr/bin/env python3
"""
TurboQuant Test Suite

Tests for critical paths in KV cache compression.
Issue #679: Codebase Genome: turboquant — Full Analysis
"""
import os
import unittest


class TestTurboQuant(unittest.TestCase):
    """Test TurboQuant implementation."""

    def test_repo_structure(self):
        """Verify expected files exist."""
        required_files = [
            "llama-turbo.h",
            "llama-turbo.cpp",
            "ggml-metal-turbo.metal",
            "README.md",
            "GENOME.md"
        ]
        for filename in required_files:
            filepath = os.path.join(os.path.dirname(__file__), "..", filename)
            self.assertTrue(os.path.exists(filepath), f"Missing required file: {filename}")

    def test_benchmarks_exist(self):
        """Verify benchmark scripts exist."""
        benchmark_files = [
            "benchmarks/run_benchmarks.py",
            "benchmarks/run_perplexity.py",
            "benchmarks/run_long_session.py"
        ]
        for filename in benchmark_files:
            filepath = os.path.join(os.path.dirname(__file__), "..", filename)
            self.assertTrue(os.path.exists(filepath), f"Missing benchmark file: {filename}")

    def test_docs_complete(self):
        """Verify documentation exists."""
        doc_files = [
            "docs/PROJECT_STATUS.md",
            "profiles/README.md"
        ]
        for filename in doc_files:
            filepath = os.path.join(os.path.dirname(__file__), "..", filename)
            self.assertTrue(os.path.exists(filepath), f"Missing doc file: {filename}")

    def test_genome_generated(self):
        """Verify GENOME.md was generated."""
        genome_path = os.path.join(os.path.dirname(__file__), "..", "GENOME.md")
        self.assertTrue(os.path.exists(genome_path), "GENOME.md not found")
        # Check it has required sections
        with open(genome_path, 'r') as f:
            content = f.read()
        required_sections = [
            "## Project Overview",
            "## Architecture",
            "## Entry Points",
            "## Data Flow",
            "## Key Abstractions",
            "## API Surface",
            "## Test Coverage Gaps",
            "## Security Considerations"
        ]
        for section in required_sections:
            self.assertIn(section, content, f"GENOME.md missing section: {section}")

    def test_metal_shader_syntax(self):
        """Check that the Metal shader mentions the expected kernels (text search only, no compilation)."""
        shader_path = os.path.join(os.path.dirname(__file__), "..", "ggml-metal-turbo.metal")
        with open(shader_path, 'r') as f:
            content = f.read()
        # Check for key functions
        self.assertIn("kernel_fwht_128", content, "Missing kernel_fwht_128 function")
        self.assertIn("kernel_turbo4_dequant", content, "Missing kernel_turbo4_dequant function")
        self.assertIn("turbo4_centroids", content, "Missing turbo4_centroids array")

    def test_cpp_header(self):
        """Verify C++ header has correct declarations."""
        header_path = os.path.join(os.path.dirname(__file__), "..", "llama-turbo.h")
        with open(header_path, 'r') as f:
            content = f.read()
        # Check for function declarations
        self.assertIn("polar_quant_encode_turbo4", content, "Missing encode function")
        self.assertIn("polar_quant_decode_turbo4", content, "Missing decode function")
        self.assertIn('extern "C"', content, "Missing C linkage")


class TestBenchmarks(unittest.TestCase):
    """Test benchmark infrastructure."""

    def test_benchmark_imports(self):
        """Verify the benchmark script exists and defines a CLI entry point (text search only, no import)."""
        benchmark_path = os.path.join(os.path.dirname(__file__), "..", "benchmarks", "run_benchmarks.py")
        # Check file exists
        self.assertTrue(os.path.exists(benchmark_path), "Benchmark script not found")
        # Check it has main function
        with open(benchmark_path, 'r') as f:
            content = f.read()
        self.assertIn("def main():", content, "Benchmark script missing main function")
        self.assertIn("argparse", content, "Benchmark script missing argparse")


class TestDocumentation(unittest.TestCase):
    """Test documentation completeness."""

    def test_readme_sections(self):
        """Verify README has required sections."""
        readme_path = os.path.join(os.path.dirname(__file__), "..", "README.md")
        with open(readme_path, 'r') as f:
            content = f.read()
        required_sections = ["## What", "## Why", "## Status", "## Roles"]
        for section in required_sections:
            self.assertIn(section, content, f"README missing section: {section}")

    def test_project_status_sections(self):
        """Verify PROJECT_STATUS.md mentions the key findings."""
        status_path = os.path.join(os.path.dirname(__file__), "..", "docs", "PROJECT_STATUS.md")
        with open(status_path, 'r') as f:
            content = f.read()
        # Check for key findings
        self.assertIn("73%", content, "Missing 73% savings metric")
        self.assertIn("PolarQuant", content, "Missing PolarQuant references")
        self.assertIn("Metal", content, "Missing Metal shader references")


if __name__ == "__main__":
    unittest.main()

View File

@@ -1,52 +0,0 @@
#!/usr/bin/env python3
"""Quick test for weekly_update.py — verifies parsing, output format, and edge cases."""
import subprocess
import sys
import json
from pathlib import Path

SCRIPT = Path(__file__).resolve().parent.parent / "scripts" / "weekly_update.py"


def run(args: list[str]) -> tuple[str, str, int]:
    """Run the script with the given arguments; return (stdout, stderr, exit code)."""
    result = subprocess.run(
        [sys.executable, str(SCRIPT)] + args,
        capture_output=True, text=True, cwd=str(SCRIPT.parent.parent)
    )
    return result.stdout, result.stderr, result.returncode


def test_basic_output():
    """Script runs without error and produces markdown."""
    stdout, stderr, rc = run(["--since", "2026-01-01"])
    assert rc == 0, f"Exit code {rc}: {stderr}"
    assert "## Week of" in stdout, f"Missing header: {stdout[:200]}"
    assert "### Completed" in stdout, f"Missing Completed section: {stdout[:200]}"
    assert "### Next Week" in stdout, f"Missing Next Week section: {stdout[-200:]}"
    print("PASS: basic_output")


def test_json_output():
    """Script outputs valid JSON in --json mode."""
    stdout, stderr, rc = run(["--json", "--since", "2026-01-01"])
    assert rc == 0, f"Exit code {rc}: {stderr}"
    data = json.loads(stdout)
    assert "commits" in data
    assert "since" in data
    print(f"PASS: json_output ({len(data['commits'])} commits)")


def test_no_crash_future_date():
    """Script handles future date gracefully."""
    stdout, stderr, rc = run(["--since", "2030-01-01"])
    assert rc == 0, f"Exit code {rc}: {stderr}"
    print("PASS: future_date_no_crash")


def test_empty_range():
    """Script handles a very old start date."""
    # The script has no end-date flag; a repeated --since keeps only the last value,
    # so pass it once.
    stdout, stderr, rc = run(["--since", "2020-01-02"])
    assert rc == 0, f"Exit code {rc}: {stderr}"
    print("PASS: empty_range")


if __name__ == "__main__":
    test_basic_output()
    test_json_output()
    test_no_crash_future_date()
    test_empty_range()
    print("\nAll tests passed.")