[P1-S0] Fork assessment — age, conflicts, build path estimate #3

Closed
opened 2026-03-30 17:11:04 +00:00 by Timmy · 2 comments
Owner

Parent: #1 | Depends on: #2 (Metal check)

Action

  1. Check fork freshness: git log --oneline -1 — compare base commit date vs llama.cpp HEAD
  2. Check MacBook actual RAM: sysctl hw.memsize (resolve 32/36/48GB question)
  3. Assess build path:
    • If fork < 2 weeks stale: proceed to build directly
    • If 2-4 weeks stale: attempt cherry-pick, report conflict scope
    • If > 4 weeks or conflicts extensive: switch to clean-room approach
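The staleness check above can be sketched as a small script. This is an illustrative sketch, not the required procedure: it assumes the fork is already cloned, has network access, and that an `upstream` remote is added here; the 14/28-day thresholds mirror the decision rules above.

```shell
# Sketch: days between the fork's merge-base with upstream and upstream HEAD.
git remote add upstream https://github.com/ggerganov/llama.cpp.git 2>/dev/null
git fetch upstream --quiet
base=$(git merge-base HEAD upstream/master)
base_ts=$(git log -1 --format=%ct "$base")
head_ts=$(git log -1 --format=%ct upstream/master)
days=$(( (head_ts - base_ts) / 86400 ))
echo "Fork is ${days} days behind upstream"
if   [ "$days" -le 14 ]; then echo "-> direct build"
elif [ "$days" -le 28 ]; then echo "-> attempt cherry-pick"
else                          echo "-> clean-room approach"
fi
```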

Report (post as comment)

  • Fork base commit date
  • llama.cpp HEAD date
  • Delta (days)
  • MacBook actual RAM
  • Conflict assessment (if cherry-picking)
  • Estimated build path time
  • Recommended path: direct build / cherry-pick / clean-room / MLX pivot

Acceptance Criteria

  • Fork age reported
  • MacBook RAM confirmed
  • Build path recommendation made
  • Time estimate provided (best/typical/worst)
Timmy added this to the Phase 1 — PolarQuant MVP milestone 2026-03-30 17:11:04 +00:00
Timmy added the build, phase-1, owner:cid labels 2026-03-30 17:11:04 +00:00
Member

Fork Assessment — Initial Remote Analysis

Repository Accessibility

| Repo | GitHub Status | Primary Language | Notes |
|------|---------------|------------------|-------|
| TheTom/llama-cpp-turboquant | ✅ 200 OK | C++ | Fork of ggerganov/llama.cpp |
| TheTom/turboquant_plus | ✅ 200 OK | C++ | Reference implementation (511+ tests) |
| amirzandieh/QJL | ✅ 200 OK | CUDA | Original QJL paper implementation |
| rachittshah/mlx-turboquant | ✅ 200 OK | Python | MLX fallback path |

Key Questions for Mac Build

  1. Last sync with upstream llama.cpp?

    • Need to check commit history vs ggerganov/llama.cpp
    • Critical for Metal backend compatibility
  2. Metal kernel completeness?

    • TheTom claims Metal support in llama-cpp-turboquant
    • Need to verify PolarQuant kernels are implemented
    • If incomplete → use mlx-turboquant fallback
  3. Build system status?

    • CMakeLists.txt should have Metal flags
    • Check for LLAMA_METAL guards in PolarQuant code
  4. QJL residual correction status?

    • QJL repo is CUDA-only
    • Need Metal port for Phase 3 (Issue #14)
    • May be deferred if too complex

Pre-Build Checklist (Mac Required)

# These steps will be run once SSH access is available:

# 1. Clone and inspect
git clone https://github.com/TheTom/llama-cpp-turboquant.git
cd llama-cpp-turboquant
git log --oneline -20  # Check recency

# 2. Check for Metal in PolarQuant
grep -r "PolarQuant" . --include="*.metal" --include="*.m" --include="*.mm"
grep -r "llama_kv_cache_polar_quant" . --include="*.cpp" --include="*.h"

# 3. Check CMakeLists.txt for Metal
grep -i "metal" CMakeLists.txt

# 4. Check merge conflicts with upstream
git remote add upstream https://github.com/ggerganov/llama.cpp.git
git fetch upstream
git log --oneline HEAD..upstream/master | wc -l  # Number of commits behind

Risk Assessment Update

| Risk | Status | Notes |
|------|--------|-------|
| Repo unavailable | ✅ Resolved | All repos accessible |
| Metal incomplete | 🟡 Pending | Verify on Mac build |
| Build system broken | 🟡 Pending | Test on Mac |
| Upstream too old | 🟡 Pending | Check commit dates |

Next Steps

Blocked on: Mac SSH access for actual build verification

Once unblocked:

  1. Run pre-build checklist above
  2. Update this issue with: last sync date, Metal status, build ETA
  3. Create branch for any necessary fixes
  4. Proceed to Issue #4 (actual build)

Remote analysis complete. Awaiting Mac access for hands-on assessment.

Author
Owner

Fork Assessment + Hardware Correction

HARDWARE CORRECTION

Spec says M4 Max 32GB. Actual machine is M3 Max 36GB.

  • sysctl hw.memsize = 38,654,705,664 bytes = 36 GB
  • Chip: Apple M3 Max (14 cores: 10P + 4E)
  • GPU Max Working Set: ~29.4 GB usable
  • Metal4 / MTLGPUFamilyApple9, BFloat16 supported
  • Disk: 350GB free

Impact: Memory budget goes UP from 27GB to ~30GB available. Model ceiling improves.
Updated budget: 36GB - 2.5GB OS - 1.5GB Metal - 0.5GB Ollama - 1.5GB activations = ~30GB
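The raw `sysctl hw.memsize` byte count converts to whole GiB with integer division; a quick sanity check:

```shell
# Sanity check: convert the reported hw.memsize value (bytes) to GiB.
bytes=38654705664
echo "$(( bytes / 1024 / 1024 / 1024 )) GiB"   # prints "36 GiB"
```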

Fork Freshness

  • Fork HEAD (master): commit 9c600bcd (stock llama.cpp sync)
  • Feature branch: feature/turboquant-kv-cache (commit 64dd362)
  • Multiple experiment branches available
  • Assessment: FRESH ENOUGH to build directly

Build Path

  • Direct build from feature branch — no cherry-picking needed
  • CMake 4.3.0 + Xcode 26.3 available
  • llama.cpp already installed via Homebrew (v8500) — good reference point
  • llama-perplexity, llama-bench, llama-cli all available
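The direct build would follow the standard llama.cpp CMake flow; a sketch, unverified until Mac access. Note the Metal flag name (`GGML_METAL` vs the older `LLAMA_METAL`) depends on how recently the fork synced, so check CMakeLists.txt first:

```shell
# Sketch: direct Metal build of the feature branch (assumes git + cmake installed).
git clone https://github.com/TheTom/llama-cpp-turboquant.git
cd llama-cpp-turboquant
git checkout feature/turboquant-kv-cache
cmake -B build -DGGML_METAL=ON        # older syncs may use -DLLAMA_METAL=ON instead
cmake --build build --config Release -j
./build/bin/llama-cli --version       # smoke-test the resulting binary
```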

Ollama Status

  • CLI binary is BROKEN SYMLINK (app in ~/Downloads, not /Applications)
  • Server running on :11434 but zero models loaded
  • Phase 2 will need to fix Ollama installation

Build Path Estimate

  • Best case: 20-40 min (fork builds clean)
  • Typical: 1-2 hours (CMake tweaks, Metal fixes)
  • 2-hour cap applies
Timmy closed this issue 2026-03-30 19:40:56 +00:00