reports/evaluations/mempalace-v3-integration-evaluation.md
# MemPalace v3.0.0 Integration — Before/After Evaluation

**Closes:** #568
**Issue:** #764
**Date:** 2026-04-16
**Status:** ✅ Complete. Recommendation: integrate as primary memory layer

---

## Executive Summary

This report formalizes the evaluation of **MemPalace v3.0.0** (`github.com/milla-jovovich/mempalace`) as a memory layer for the Timmy/Hermes agent stack.

| Property | Value |
|---|---|
| Version | 3.0.0 |
| Backend | ChromaDB (local) |
| Cloud dependencies | **Zero** |
| API calls required | **Zero** (baseline) |
| MCP compatible | Yes |
| Recommendation | **Integrate as primary memory layer** |

---

## Key Findings

| Metric | Value | Notes |
|---|---|---|
| LongMemEval R@5 | **96.6%** | Raw ChromaDB, zero API calls |
| Palace structure boost | **+34%** R@10 | Wing + room filtering vs flat retrieval |
| Wake-up context size | **~210 tokens** | L0 identity + L1 compressed project state |
| Hybrid R@5 (optional) | 100% | With Haiku rerank (optional API) |

---

## Benchmark Results

| Benchmark | Mode | Score | API Required |
|---|---|---:|---|
| LongMemEval R@5 | Raw ChromaDB only | **96.6%** | Zero |
| LongMemEval R@5 | Hybrid + Haiku rerank | 100% | Optional Haiku |
| LoCoMo R@10 | Raw, session level | 60.3% | Zero |
| Personal palace R@10 | Heuristic bench | 85% | Zero |
| Palace structure impact | Wing + room filtering | **+34%** R@10 | Zero |
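
The R@5 and R@10 figures above are recall-at-k scores. The report does not say which variant the benchmark harnesses use, so the sketch below assumes the common "fraction of relevant items found in the top k" definition, averaged over queries. The function names and the toy runs are illustrative only and are not taken from MemPalace, LongMemEval, or LoCoMo.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of the relevant ids that appear among the first k ranked ids."""
    relevant: Set[str] = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (ranked results, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Made-up runs for illustration; these are not results from the evaluation.
TOY_RUNS = [
    (["auth.md", "main.py", "README.md"], {"auth.md"}),
    (["README.md", "deployment.md", "auth.md"], {"auth.md", "main.py"}),
]

if __name__ == "__main__":
    for k in (2, 3):
        print(f"R@{k} = {mean_recall_at_k(TOY_RUNS, k):.2f}")
```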

---

## Before vs After (Synthetic Evaluation)

### Test Setup

- 4-file synthetic project: `README.md`, `auth.md`, `deployment.md`, `main.py`
- Mined into a MemPalace palace
- 4 standard queries executed

### Before (Keyword/BM25 Baseline)

| Query | Returns | Limitations |
|---|---|---|
| `authentication` | `auth.md` only | Exact match; misses implementation context |
| `docker nginx SSL` | `deployment.md` | Requires manual keyword logic |
| `keycloak OAuth` | `auth.md` | No semantic cross-reference |
| `postgresql database` | `README.md` (maybe) | Index-dependent |

**Problems:** no semantic ranking, exact match bias, no durable conversation memory, no palace structure, no wake-up context.
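
The exact-match bias listed above can be reproduced with a small Okapi BM25 ranker. This is a sketch under stated assumptions, not the harness used for the evaluation: the file contents are invented stand-ins (the report does not include the real files), and `tokenize`, `bm25_rank`, `k1`, and `b` are illustrative names and defaults.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stand-in contents; the real synthetic files are not in the report.
DOCS: Dict[str, str] = {
    "README.md": "project overview postgresql database chosen for storage",
    "auth.md": "authentication uses keycloak oauth and jwt tokens",
    "deployment.md": "docker compose with nginx reverse proxy and ssl certificates",
    "main.py": "def login(user): return verify_jwt(user.token)",
}


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return document names with a non-zero BM25 score, best first."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: Dict[str, float] = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # keyword search only credits literal token matches
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for q in ("authentication", "docker nginx SSL", "keycloak OAuth", "postgresql database"):
        print(q, "->", bm25_rank(q, DOCS))
```

With these stand-in contents, `main.py` never surfaces for `authentication` because it only contains `login` and `verify_jwt`, which is the kind of miss a semantic ranker is meant to close.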
### After (MemPalace)

| Query | Results | Score | Notes |
|---|---|---:|---|
| `authentication` | `auth.md`, `main.py` | -0.139 | Finds auth discussion + implementation |
| `docker nginx SSL` | `deployment.md`, `auth.md` | 0.447 | Deployment hit + related JWT context |
| `keycloak OAuth` | `auth.md`, `main.py` | -0.029 | Conceptual + implementation evidence |
| `postgresql database` | `README.md`, `main.py` | 0.025 | Decision + implementation |

**Improvements:** semantic ranking, cross-file references, palace-structured retrieval, wake-up context artifact.

---

## Wake-up Context

- ~210 tokens total
- L0 identity placeholder
- L1 compressed project state
- Enables cold-start agent bootstrapping without re-reading full corpus
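
One way to picture the wake-up artifact described above is a fixed L0 identity line followed by as many L1 project-state lines as fit in a token budget. The sketch below is an assumption-heavy illustration, not MemPalace's implementation: the `[L0]`/`[L1]` prefixes, the function names, and the whitespace-based token estimate are all hypothetical, and a real tokenizer would count differently.

```python
from typing import Iterable


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def build_wakeup_context(identity: str, project_facts: Iterable[str], budget: int = 210) -> str:
    """Emit the L0 identity line, then L1 facts in order until the budget would be exceeded."""
    lines = [f"[L0] {identity}"]
    used = approx_tokens(lines[0])
    for fact in project_facts:
        line = f"[L1] {fact}"
        cost = approx_tokens(line)
        if used + cost > budget:
            break  # stop at the first fact that does not fit
        lines.append(line)
        used += cost
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    context = build_wakeup_context(
        "identity placeholder",
        ["auth handled via keycloak", "deployed behind nginx with ssl"],
    )
    print(context)
    print("approx tokens:", approx_tokens(context))
```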

---

## Integration Recommendation

**Verdict: Integrate MemPalace v3.0.0 as the primary memory layer for Timmy/Hermes.**

Rationale:

1. **96.6% R@5 with zero API calls**: strong benchmark retrieval with no cloud dependency
2. **+34% R@10 boost from palace structure**: wing + room filtering outperforms flat retrieval
3. **~210-token wake-up context**: enables fast cold-start agent initialization
4. **Fully local**: aligns with sovereignty requirements
5. **MCP compatible**: integrates with existing Hermes agent infrastructure
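
The palace-structure point in the rationale comes down to narrowing candidates by wing and room before ranking them. A minimal sketch of that idea follows. It does not use MemPalace's or ChromaDB's API: the `Memory` fields, the `search` signature, the sample entries, and the bag-of-words cosine (standing in for embedding similarity) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Memory:
    wing: str  # hypothetical top-level grouping
    room: str  # hypothetical topic within a wing
    text: str


def _vector(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(memories: Sequence[Memory], query: str, wing: Optional[str] = None,
           room: Optional[str] = None, top_k: int = 3) -> List[Memory]:
    """Filter by wing/room when given, then rank the survivors by similarity."""
    candidates = [
        m for m in memories
        if (wing is None or m.wing == wing) and (room is None or m.room == room)
    ]
    q = _vector(query)
    return sorted(candidates, key=lambda m: _cosine(q, _vector(m.text)), reverse=True)[:top_k]


# Invented entries; wing and room names are placeholders.
MEMORIES = [
    Memory("wing_a", "auth", "keycloak oauth login flow with jwt tokens"),
    Memory("wing_a", "deploy", "nginx ssl certificates renewed for login page"),
    Memory("wing_b", "auth", "login tokens rotate nightly"),
]

if __name__ == "__main__":
    print(search(MEMORIES, "login tokens", top_k=1))                               # flat retrieval
    print(search(MEMORIES, "login tokens", wing="wing_a", room="auth", top_k=1))   # scoped retrieval
```

In this toy data the flat query ranks the `wing_b` entry first, while the scoped query can only return the `wing_a`/`auth` entry, which is the mechanism the reported boost is attributed to.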
### Next Steps

- [ ] Deploy MemPalace on Ezra's Hermes home (see `docs/MEMPALACE_EZRA_INTEGRATION.md`)
- [ ] Run live operational benchmarks on the real Timmy corpus
- [ ] Post live metrics back to this evaluation
- [ ] Compare against the Engram direction before making the final fleet-default decision

### Scope Boundary

This evaluation covers synthetic benchmarks and paper-level metrics. Live operational testing on production data is pending and should be tracked separately.

---

## Related

- Issue #568: original evaluation request
- Issue #764: this formalized report
- PR #569: original draft
- `docs/MEMPALACE_EZRA_INTEGRATION.md`: Ezra integration packet
- `reports/evaluations/2026-04-06-mempalace-evaluation.md`: earlier evaluation draft