Commit Graph

1 Commit

Author SHA1 Message Date
Alexander Whitestone
78352b0a91 feat: add long-session quality test (Issue #12)
Checks: Smoke Test / smoke (pull_request) failing after 4s
Adds a 50-turn, multi-phase conversation test that detects quality
degradation under sustained context pressure. Supports the Ollama and
llama-server backends, with a configurable KV cache type.

Phases: code_gen -> debug -> refactor -> test -> iterate
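
As a rough illustration of how 50 turns could map onto the five phases above, here is a minimal sketch. It assumes the turns are split evenly across phases (10 each); the commit message does not say how the test actually divides them, and `phase_for_turn` is a hypothetical helper name, not taken from the commit.

```python
# Phase names as listed in the commit message.
PHASES = ["code_gen", "debug", "refactor", "test", "iterate"]


def phase_for_turn(turn: int, total_turns: int = 50) -> str:
    """Return the phase for a 0-based turn index.

    Assumption: turns are divided evenly across phases, in order.
    """
    if not 0 <= turn < total_turns:
        raise ValueError(f"turn must be in [0, {total_turns}), got {turn}")
    # Integer arithmetic avoids float rounding at phase boundaries.
    index = turn * len(PHASES) // total_turns
    return PHASES[index]


if __name__ == "__main__":
    for t in (0, 9, 10, 49):
        print(t, phase_for_turn(t))
```

With this assumed split, turns 0-9 fall in code_gen and turns 40-49 in iterate.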
Metrics: quality score, coherence drift, hallucinated references,
repetition ratio, prompt relevance.
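
The commit message does not define these metrics. As one plausible reading of "repetition ratio", the sketch below measures the fraction of word n-grams in a response that repeat an earlier n-gram. The function name, the choice of trigrams, and the whitespace tokenization are all assumptions made for illustration, not the test's actual implementation.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram.

    Assumptions: lowercase whitespace tokenization, n defaults to 3.
    Returns 0.0 when the text is too short to form a single n-gram.
    """
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Each occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


if __name__ == "__main__":
    print(repetition_ratio("fix the bug fix the bug"))
```

A value that climbs in later turns would suggest the model is looping on earlier output, which is the kind of degradation the test is described as detecting.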

Includes a --compare mode for side-by-side comparison of KV cache types.
Acceptance: run on both TurboQuant and FP16, then compare the results.
2026-04-13 15:16:35 -04:00