Commit Graph

1 Commit

Author SHA1 Message Date
Alexander Payne
c0dc4052a3 feat: add memory bakeoff runner for baseline vs MemPalace vs Hindsight evaluation
Checks failed: Test / pytest (pull_request), failing after 9s
Implements issue #230 by creating:
- prompts/matrix.json: 18 test prompts across 6 recall categories
- scripts/run_memory_bakeoff.py: orchestrates the evaluation, captures raw artifacts,
  applies simple heuristic scoring, and produces a markdown report.
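
As a rough illustration of what a "simple heuristic" score could look like, the sketch below computes the fraction of expected keywords that appear in an answer. This is an assumption for illustration only: the function name `keyword_recall_score` and the idea of an expected-keyword list per prompt are hypothetical and are not taken from scripts/run_memory_bakeoff.py.

```python
def keyword_recall_score(answer: str, expected_keywords: list[str]) -> float:
    """Return the fraction of expected keywords found in the answer.

    Hypothetical heuristic: case-insensitive substring matching.
    An empty keyword list scores 0.0 rather than dividing by zero.
    """
    if not expected_keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


if __name__ == "__main__":
    # Two of the three keywords appear in the answer.
    score = keyword_recall_score(
        "The deploy uses Docker behind Nginx.",
        ["docker", "nginx", "redis"],
    )
    print(f"{score:.2f}")
```

Substring matching is crude (it cannot tell "redis" from "redistribute"), which is one reason to keep the raw answers around for manual review.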

Backends:
- Baseline: knowledge/index.json bootstrap (keyword-match retrieval)
- MemPalace: via nexus.mempalace.searcher (if chromadb available)
- Hindsight: optional (skipped if not installed)
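
One way the optional backends above might be detected is by probing for their dependencies without importing them. The sketch below uses the standard library's `importlib.util.find_spec`. The helper names `module_available` and `select_backends`, and the module name `hindsight`, are assumptions for illustration; the commit does not state how the runner performs this check.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parent packages or odd sys.modules entries.
        return False


def select_backends() -> list[str]:
    """Pick the backends to run; the baseline is always included."""
    backends = ["baseline"]
    if module_available("chromadb"):
        backends.append("mempalace")
    # "hindsight" is a hypothetical module name used only for this sketch.
    if module_available("hindsight"):
        backends.append("hindsight")
    return backends


if __name__ == "__main__":
    print(select_backends())
```

Probing with `find_spec` keeps a missing dependency from aborting the run, so an unavailable backend can be reported as skipped instead.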

Accepts CLI options for matrix, category, limit, model, and dry-run.
Captures context and answers for downstream manual review.
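
The CLI surface described above could be expressed with `argparse` roughly as follows. The exact flag spellings, defaults, and help strings are assumptions; only the option names (matrix, category, limit, model, dry-run) and the prompts/matrix.json path come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options named in the commit message."""
    parser = argparse.ArgumentParser(
        description="Memory bakeoff runner (illustrative sketch)"
    )
    # Defaulting to prompts/matrix.json is an assumption based on the file list.
    parser.add_argument("--matrix", default="prompts/matrix.json",
                        help="path to the prompt matrix JSON file")
    parser.add_argument("--category", default=None,
                        help="run only prompts in this recall category")
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of prompts to run")
    parser.add_argument("--model", default=None,
                        help="model identifier used to generate answers")
    parser.add_argument("--dry-run", action="store_true",
                        help="list what would run without calling any backend")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With these assumed flags, an invocation might look like `python scripts/run_memory_bakeoff.py --limit 3 --dry-run`.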

Closes #230
2026-04-29 18:00:00 -04:00