Alexander Payne
c0dc4052a3
Test / pytest (pull_request) Failing after 9s
feat: add memory bakeoff runner for baseline vs MemPalace vs Hindsight evaluation
Implements issue #230 by creating:
- prompts/matrix.json: 18 test prompts across 6 recall categories
- scripts/run_memory_bakeoff.py: orchestrates evaluation, captures raw artifacts,
  scores answers with simple heuristics, and produces a markdown report.
Backends:
- Baseline: knowledge/index.json bootstrap (keyword-match retrieval)
- MemPalace: via nexus.mempalace.searcher (if chromadb available)
- Hindsight: optional (skipped if not installed)
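
The backend selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/run_memory_bakeoff.py: the helper names, the `hindsight` module name, and the `{"text": ...}` entry shape are all assumptions made for the example, and the keyword matcher is a plain word-overlap stand-in for whatever the baseline actually does.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def available_backends() -> list[str]:
    """List backends to run; optional ones are skipped when not installed."""
    backends = ["baseline"]  # always available: reads a local JSON index
    if module_available("chromadb"):
        backends.append("mempalace")
    if module_available("hindsight"):  # hypothetical module name
        backends.append("hindsight")
    return backends


def keyword_retrieve(entries: list[dict], query: str, limit: int = 3) -> list[dict]:
    """Rank entries by how many query words they share (assumed entry shape)."""
    terms = set(query.lower().split())
    scored = []
    for entry in entries:
        overlap = len(terms & set(entry["text"].lower().split()))
        if overlap:
            scored.append((overlap, entry))
    # Sort on the overlap count only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]
```

Probing with `find_spec` keeps a missing optional dependency from becoming a hard failure, which matches the "skipped if not installed" behavior described for Hindsight.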
Accepts CLI options for matrix, category, limit, model, and dry-run.
Captures context and answers for downstream manual review.
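
For orientation, the CLI surface listed above might look something like the sketch below. The flag spellings, defaults, and help strings are assumptions inferred from the option names in this message; only the prompts/matrix.json path comes from the commit itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the options named in the commit (flag names assumed)."""
    parser = argparse.ArgumentParser(description="Run the memory bakeoff.")
    parser.add_argument("--matrix", default="prompts/matrix.json",
                        help="path to the prompt matrix")
    parser.add_argument("--category", default=None,
                        help="run only prompts in this recall category")
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of prompts to run")
    parser.add_argument("--model", default=None,
                        help="model identifier to evaluate")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would run without calling the model")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```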
Closes #230
2026-04-29 18:00:00 -04:00