turboquant/benchmarks
Alexander Whitestone 78352b0a91
Smoke Test / smoke (pull_request) Failing after 4s
feat: add long-session quality test (Issue #12)
50-turn multi-phase conversation test that detects quality degradation
under sustained context pressure. Supports Ollama and llama-server
backends with KV cache type configuration.

Phases: code_gen -> debug -> refactor -> test -> iterate
Metrics: quality score, coherence drift, hallucinated references,
repetition ratio, prompt relevance.
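
As a rough illustration of two pieces named above, the sketch below maps a turn index to a phase and computes a repetition ratio. The commit message does not say how either is implemented, so everything here is an assumption: the equal split of turns across the five phases, the word n-gram definition of repetition, and the helper names `phase_for_turn` and `repetition_ratio`.

```python
from collections import Counter

# Phase order taken from the commit message.
PHASES = ["code_gen", "debug", "refactor", "test", "iterate"]


def phase_for_turn(turn: int, total_turns: int = 50) -> str:
    """Map a 0-indexed turn to a phase, assuming equal-length phases (hypothetical)."""
    if not 0 <= turn < total_turns:
        raise ValueError("turn out of range")
    return PHASES[turn * len(PHASES) // total_turns]


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the same text.

    This is one plausible definition, not necessarily the one the test uses.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)
```

A ratio that climbs in later turns would be one possible signal of degradation under context pressure; any threshold would have to be chosen empirically.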

Includes --compare mode for side-by-side KV type comparison.
Acceptance: run with both the TurboQuant and FP16 KV cache types and compare the results.
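
A side-by-side comparison of two runs could look roughly like the sketch below. It assumes each run yields a flat mapping from metric name to a numeric score; the function names, the dictionary layout, and the column format are hypothetical and are not taken from the script's actual --compare output.

```python
def compare_runs(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Return candidate minus baseline for every metric present in both runs."""
    return {
        name: candidate[name] - baseline[name]
        for name in baseline
        if name in candidate
    }


def format_comparison(
    baseline_label: str,
    candidate_label: str,
    baseline: dict[str, float],
    candidate: dict[str, float],
) -> str:
    """Render a plain-text table with one row per shared metric."""
    deltas = compare_runs(baseline, candidate)
    header = f"{'metric':<24}{baseline_label:>12}{candidate_label:>12}{'delta':>12}"
    rows = [
        f"{name:<24}{baseline[name]:>12.3f}{candidate[name]:>12.3f}{delta:>+12.3f}"
        for name, delta in deltas.items()
    ]
    return "\n".join([header, *rows])
```

Calling `format_comparison("fp16", "turboquant", fp16_metrics, turboquant_metrics)` with the metric dictionaries from two runs would print one row per shared metric, which makes regressions on any single metric easy to spot.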
2026-04-13 15:16:35 -04:00