[P2-1] Download wikitext-2-raw and run perplexity quality gate #21

Open
opened 2026-03-31 04:34:05 +00:00 by Timmy · 3 comments
Owner

Parent: #1 | BLOCKING — nothing else matters if quality fails

Why First

The Phase 1 report deferred the most important test: does turbo4 compression degrade text quality? Throughput benchmarks passed but perplexity is the real gate. If PPL delta exceeds 0.5, turbo4 is not production-ready regardless of how fast it is.

Steps

  1. Download wikitext-2-raw-v1 test corpus:
cd ~/turboquant
mkdir -p corpora
curl -L "https://huggingface.co/datasets/Salesforce/wikitext/resolve/main/wikitext-2-raw/test-00000-of-00001.parquet" -o corpora/wikitext2-test.parquet
# Or the raw text version:
curl -L "https://huggingface.co/datasets/wikitext/resolve/main/wikitext-2-raw-v1/wiki.test.raw" -o corpora/wiki.test.raw
  1. Run perplexity baseline (f16 KV):
cd ~/turboquant/llama.cpp-fork
./build/bin/llama-perplexity \
  -m ~/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf \
  -f ~/turboquant/corpora/wiki.test.raw \
  -c 2048 --kv-type f16

Record: PPL value, time taken.

  1. Run perplexity with turbo4:
./build/bin/llama-perplexity \
  -m ~/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf \
  -f ~/turboquant/corpora/wiki.test.raw \
  -c 2048 --kv-type turbo4

Record: PPL value, time taken.

  1. Compare: PPL_delta = turbo4_ppl - baseline_ppl

Acceptance Criteria

  • wikitext-2-raw corpus downloaded
  • Baseline PPL recorded (f16 KV)
  • turbo4 PPL recorded
  • Delta calculated
  • PASS if delta ≤ 0.5 — proceed to Phase 2
  • FAIL if delta > 0.5 — investigate turbo3 or asymmetric configs before proceeding
  • Results committed to benchmarks/perplexity_results.json

Time estimate: 30-60 minutes (perplexity runs are slow)

## Parent: #1 | BLOCKING — nothing else matters if quality fails ### Why First The Phase 1 report deferred the most important test: does turbo4 compression degrade text quality? Throughput benchmarks passed but perplexity is the real gate. If PPL delta exceeds 0.5, turbo4 is not production-ready regardless of how fast it is. ### Steps 1. Download wikitext-2-raw-v1 test corpus: ```bash cd ~/turboquant mkdir -p corpora curl -L "https://huggingface.co/datasets/Salesforce/wikitext/resolve/main/wikitext-2-raw/test-00000-of-00001.parquet" -o corpora/wikitext2-test.parquet # Or the raw text version: curl -L "https://huggingface.co/datasets/wikitext/resolve/main/wikitext-2-raw-v1/wiki.test.raw" -o corpora/wiki.test.raw ``` 2. Run perplexity baseline (f16 KV): ```bash cd ~/turboquant/llama.cpp-fork ./build/bin/llama-perplexity \ -m ~/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf \ -f ~/turboquant/corpora/wiki.test.raw \ -c 2048 --kv-type f16 ``` Record: PPL value, time taken. 3. Run perplexity with turbo4: ```bash ./build/bin/llama-perplexity \ -m ~/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf \ -f ~/turboquant/corpora/wiki.test.raw \ -c 2048 --kv-type turbo4 ``` Record: PPL value, time taken. 4. Compare: `PPL_delta = turbo4_ppl - baseline_ppl` ### Acceptance Criteria - [ ] wikitext-2-raw corpus downloaded - [ ] Baseline PPL recorded (f16 KV) - [ ] turbo4 PPL recorded - [ ] Delta calculated - [ ] **PASS if delta ≤ 0.5** — proceed to Phase 2 - [ ] **FAIL if delta > 0.5** — investigate turbo3 or asymmetric configs before proceeding - [ ] Results committed to `benchmarks/perplexity_results.json` ### Time estimate: 30-60 minutes (perplexity runs are slow)
Timmy self-assigned this 2026-03-31 04:34:05 +00:00
Author
Owner

Progress Update — 2026-03-31

Baseline Perplexity Established

  • f16 KV baseline PPL: 5.9879 ± 0.06158 on hermes4:14b (wikitext-2-raw, 50 chunks, 2048 ctx)
  • Speed: 296 tok/s
  • Corpus downloaded to ~/turboquant/corpora/wiki.test.raw

Wrong Fork Issue

Initial attempt built the wrong fork (unixsysdev/llama-turboquant) which has upstream tq1/tq2/tq3 ternary types, NOT PolarQuant turbo2/turbo3/turbo4. That fork is also missing Metal SET_ROWS kernels.

The correct fork is TheTom/llama-cpp-turboquant which has:

  • turbo2/turbo3/turbo4 GGML types (PolarQuant + Metal)
  • Metal kernels for Walsh-Hadamard Transform (turbo-wht.h)
  • Layer-adaptive mode (TURBO_LAYER_ADAPTIVE env var)
  • Confirmed working in earlier session (processes ran with -ctk turbo4 -ctv turbo4)

Next Step

Clone correct fork, build, run turbo4 perplexity against the established 5.9879 baseline. Pass criteria: delta < 0.5.

Build+benchmark instructions saved to ~/turboquant/NEXT_STEPS.md.

## Progress Update — 2026-03-31 ### Baseline Perplexity Established - **f16 KV baseline PPL: 5.9879 ± 0.06158** on hermes4:14b (wikitext-2-raw, 50 chunks, 2048 ctx) - Speed: 296 tok/s - Corpus downloaded to ~/turboquant/corpora/wiki.test.raw ### Wrong Fork Issue Initial attempt built the **wrong fork** (unixsysdev/llama-turboquant) which has upstream tq1/tq2/tq3 ternary types, NOT PolarQuant turbo2/turbo3/turbo4. That fork is also missing Metal SET_ROWS kernels. The correct fork is **TheTom/llama-cpp-turboquant** which has: - turbo2/turbo3/turbo4 GGML types (PolarQuant + Metal) - Metal kernels for Walsh-Hadamard Transform (turbo-wht.h) - Layer-adaptive mode (TURBO_LAYER_ADAPTIVE env var) - Confirmed working in earlier session (processes ran with -ctk turbo4 -ctv turbo4) ### Next Step Clone correct fork, build, run turbo4 perplexity against the established 5.9879 baseline. Pass criteria: delta < 0.5. Build+benchmark instructions saved to ~/turboquant/NEXT_STEPS.md.
Author
Owner
Analyzed: This issue is not stale. URL: http://143.198.27.163:3000/Timmy_Foundation/turboquant/issues/21
Author
Owner

Triaged during backlog cleanup — priority confirmed. Needs owner assignment.

Triaged during backlog cleanup — priority confirmed. Needs owner assignment.
Sign in to join this conversation.