Some checks failed
Smoke Test / smoke (pull_request) Failing after 4s
50-turn multi-phase conversation test that detects quality degradation under sustained context pressure. Supports Ollama and llama-server backends with KV cache type configuration. Phases: code_gen -> debug -> refactor -> test -> iterate Metrics: quality score, coherence drift, hallucinated references, repetition ratio, prompt relevance. Includes --compare mode for side-by-side KV type comparison. Acceptance: run on both TurboQuant and FP16, compare results.