[TQ-3] Perplexity quality gate: turbo4 vs f16 #32
Run perplexity benchmark. Pass: delta < 0.5. Marginal: 0.5-1.0. Fail: > 1.0.
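The gate rule above can be sketched as a small function. This is a minimal illustration, not the script used for this issue: it assumes "delta" means turbo4 perplexity minus f16 perplexity, and the name `classify_gate` and the sample values in the usage lines are hypothetical.

```python
def classify_gate(ppl_candidate: float, ppl_baseline: float) -> str:
    """Classify a perplexity delta using the thresholds from this issue.

    Assumption: delta = candidate (e.g. turbo4) minus baseline (e.g. f16),
    so a negative delta means the candidate scored lower perplexity.
    """
    delta = ppl_candidate - ppl_baseline
    if delta < 0.5:
        return "PASS"
    if delta <= 1.0:
        return "MARGINAL"
    return "FAIL"


# Hypothetical values, for illustration only (not measurements from this run).
print(classify_gate(10.0, 10.25))  # candidate below baseline
print(classify_gate(10.75, 10.0))  # delta of 0.75
print(classify_gate(11.5, 10.0))   # delta of 1.5
```

Under this reading, a candidate that scores lower perplexity than the baseline has a negative delta and passes trivially.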
TQ-3: PERPLEXITY QUALITY GATE — PASS
Test: hermes4:14b on wikitext-2-raw, 10 chunks, 2048 context
turbo4 has LOWER perplexity than f16: on this run it actually improves quality.
Performance
Conclusion
TurboQuant turbo4 on M3 Max Metal: quality PASSES, performance excellent.
Ready for Gemma 4 when download completes.
Automated triage: Issue reviewed and remains open. Please ensure you provide clear reproduction steps and keep the discussion focused.
We got 293 tokens per second locally on my Mac?
Yes — with one important qualifier.
The 293.71 tokens/second number already recorded here was local prompt-eval throughput from the wikitext-2 perplexity run on hermes4:14b over Metal on the Mac. It was not free-running decode speed, and it was not Gemma 4 yet.
So the precise truthful claim is:
hermes4:14b (wikitext-2-raw, 10 chunks, 2048 ctx)
If we want the next proof-bearing number, it should be:
turbo4 vs f16 pair
So: yes, locally measured, but benchmark context matters.
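To make the benchmark context concrete, here is a rough back-of-the-envelope sketch of what the prompt-eval figure implies. It assumes every one of the 10 chunks is a full 2048 tokens and that the 293.71 tokens/second figure covers all of them; the actual tool may count tokens differently, and the function name `prompt_eval_budget` is hypothetical.

```python
def prompt_eval_budget(n_chunks: int, ctx: int, tokens_per_second: float):
    """Return (total tokens, implied wall-clock seconds) for a prompt-eval run.

    Assumption: each chunk is exactly `ctx` tokens and the throughput figure
    applies uniformly to all of them. This says nothing about decode speed.
    """
    total_tokens = n_chunks * ctx
    seconds = total_tokens / tokens_per_second
    return total_tokens, seconds


tokens, seconds = prompt_eval_budget(n_chunks=10, ctx=2048, tokens_per_second=293.71)
print(f"{tokens} tokens evaluated in about {seconds:.1f} s")
```

This is batch evaluation of a known prompt, which is why it cannot be read as free-running generation speed.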