Reusable harness for replaying JSONL corpora against live agents. Supports Ollama, hermes, and mock backends. Captures transcripts, scores responses, auto-files P0 issues. Closes #652
23 KiB
23 KiB