review: 2026 04 06 greptard report review
# Review: GrepTard Agentic Memory Report
## Overall Assessment: B+
The report is genuinely useful and well-structured. The memory taxonomy is excellent, the practical advice is solid, and the tone matches the audience. However, there are several factual inaccuracies about Hermes internals and some fairness issues with the OpenClaw characterization that need correction.
---
## 1. Hermes Memory System Descriptions — Accuracy Check
### Memory Tool (Section 3: "Persistent Memory Store")
**INACCURATE: "key-value memory system"**
The report describes Hermes memory as a "native key-value memory system." This is misleading. The actual implementation (tools/memory_tool.py) is a **bounded entry-list** system, not a key-value store. There are no keys — entries are free-text strings stored in two flat files (MEMORY.md and USER.md) using `§` as a delimiter. Operations use **substring matching** on old_text, not key lookups.
The report's example code:
```
memory_add("deploy_target", "Production is on AWS us-east-1...")
memory_replace("deploy_target", "Migrated to Hetzner...")
memory_remove("deploy_target")
```
This is fabricated API. The actual API is:
```
memory(action="add", target="memory", content="Production is on AWS us-east-1...")
memory(action="replace", target="memory", old_text="AWS us-east-1", content="Migrated to Hetzner...")
memory(action="remove", target="memory", old_text="AWS us-east-1")
```
**Correction needed:** Describe it as a "bounded entry-list with substring-matched add/replace/remove operations" and fix the example code. The actual design is arguably more elegant than a key-value store (no key management burden), but the report shouldn't misrepresent it.
**ACCURATE:** The three operations (add, replace, remove) are correct. The claim about mutability vs append-only is accurate and is a genuine differentiator. The dual-target system (memory + user) is real but not mentioned in the report.
**MISSING:** The report doesn't mention the two separate stores (MEMORY.md for agent notes, USER.md for user profile), the character limits (2,200 and 1,375 chars respectively), the frozen snapshot pattern (system prompt is stable, tool responses show live state), or the security scanning for injection patterns. These are interesting architectural details that would strengthen the Hermes description and are genuinely good engineering.
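To make the key-value vs. entry-list distinction concrete, here is a minimal sketch of a bounded entry-list store with substring-matched operations. The character bound and the add/replace/remove semantics follow the description above; the class and method names are illustrative, not the actual `tools/memory_tool.py` implementation.

```python
class EntryListMemory:
    """Minimal bounded entry-list store: a flat list of free-text entries,
    mutated by substring match rather than key lookup. Illustrative sketch,
    not the Hermes implementation."""

    def __init__(self, char_limit=2200):
        self.char_limit = char_limit  # e.g. the MEMORY.md bound
        self.entries = []

    def _check_bound(self, entries):
        # Reject any mutation that would push total size past the limit.
        if sum(len(e) for e in entries) > self.char_limit:
            raise ValueError(f"store would exceed {self.char_limit} chars")

    def add(self, content):
        self._check_bound(self.entries + [content])
        self.entries.append(content)

    def replace(self, old_text, content):
        # Substring match: the first entry containing old_text is rewritten.
        for i, entry in enumerate(self.entries):
            if old_text in entry:
                candidate = self.entries[:i] + [content] + self.entries[i + 1:]
                self._check_bound(candidate)
                self.entries = candidate
                return
        raise KeyError(f"no entry contains {old_text!r}")

    def remove(self, old_text):
        for i, entry in enumerate(self.entries):
            if old_text in entry:
                del self.entries[i]
                return
        raise KeyError(f"no entry contains {old_text!r}")
```

Note the design trade-off the review alludes to: there is no key to manage or collide on, but the caller must pick an `old_text` fragment unique enough to hit the intended entry.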
### Session Search (Section 3: "Session Search FTS5")
**ACCURATE:** The FTS5 full-text search implementation is confirmed in tools/session_search_tool.py and hermes_state.py. Sessions are stored in SQLite with FTS5 indexing. The claim about LLM-generated summaries is accurate — the code shows it uses an auxiliary LLM (Gemini Flash) to summarize matching sessions rather than returning raw transcripts. This is genuinely well-designed.
**MINOR CORRECTION:** The report says "any agent can search across every session that has ever occurred." This is slightly overstated — the current session's lineage is excluded from results (the code explicitly filters it out), and sessions tagged with source "tool" (from third-party integrations) are excluded by default. These are sensible exclusions but worth mentioning for accuracy.
### Skills System (Section 3: "Skills System")
**MOSTLY ACCURATE:** Skills are indeed markdown files in ~/.hermes/skills/ with YAML frontmatter. The skill_manager_tool.py confirms agents can create, edit, patch, and delete skills. The skills_tool.py confirms progressive disclosure architecture (metadata listing vs full content loading).
**INACCURATE CLAIM: "skills are living documents... it patches the skill immediately"**
While the skill_manager_tool does provide `patch` and `edit` actions that allow an agent to modify skills, this is not automatic. The agent has to consciously decide to update a skill. The report makes it sound like there's an automated self-correction loop. In reality, it depends on the model's initiative to use the skill_manager tool. This is an important distinction — it's *capability* not *behavior*. The infrastructure enables it, but it's not guaranteed to happen.
**CLAIM: "100+ skills"** — Cannot verify exact count from the code, but looking at the optional-skills directory and the various skill categories (blockchain, creative, devops, health, mcp, migration, productivity, research, security), plus the skills hub integration, this seems plausible. Would be more honest to say "dozens of skills" unless verified.
### .hermes.md (Section 3)
**ACCURATE but incomplete:** The context file system is real, but `.hermes.md` is not the only supported name. The conventional default file is `AGENTS.md`; the system resolves several file types in priority order (`.hermes.md` > `AGENTS.md` > `CLAUDE.md` > `.cursorrules`), so `.hermes.md` overrides the others when present. It also supports hierarchical `AGENTS.md` files for monorepo setups. The report mentions only `.hermes.md`.
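The priority order plus hierarchical lookup resolves naturally as a first-match scan at each directory level. A sketch under those assumptions; the function name and walk strategy are illustrative, not the actual Hermes resolution code.

```python
import pathlib

# Priority order as described: .hermes.md wins, then AGENTS.md, then legacy names.
CONTEXT_FILES = [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]

def find_context_files(start_dir, repo_root):
    """Collect context files from repo_root down to start_dir, most specific
    last; at each directory level the highest-priority name wins."""
    start = pathlib.Path(start_dir).resolve()
    repo_root = pathlib.Path(repo_root).resolve()
    chain, d = [], start
    while True:
        chain.append(d)
        if d == repo_root or d == d.parent:  # stop at root (or filesystem top)
            break
        d = d.parent
    found = []
    for directory in reversed(chain):  # repo root first, leaf dir last
        for name in CONTEXT_FILES:
            if (directory / name).is_file():
                found.append(directory / name)
                break  # first match wins at this level
    return found
```

Returning root-to-leaf order lets a caller concatenate the files so that deeper, more specific instructions override repo-wide ones, which is the usual monorepo convention.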
### BOOT.md (Section 3)
**ACCURATE:** BOOT.md exists in the gateway/builtin_hooks/boot_md.py. It runs on gateway startup (not per-session CLI start as the report might imply). The report's description of it as "startup procedures" is correct, though it's specifically a gateway-level feature, not a CLI feature.
---
## 2. OpenClaw Claims — Fairness Check
**SIGNIFICANT ISSUE: The report doesn't define what "OpenClaw" is.**
From the source code, OpenClaw appears to be the **predecessor** to Hermes Agent (the migration tooling, legacy config paths like ~/.openclaw, ~/.clawdbot, ~/.moldbot all confirm this). The report treats it as a competing external framework. If the reader doesn't know OpenClaw is the old version of the same project, the comparison feels like attacking a strawman — because it literally IS comparing the new version to the old version and saying the new version is better.
**Specific fairness issues:**
1. **"No cross-session search"** — This is likely accurate for OpenClaw (the migration docs don't mention importing session history databases, suggesting OpenClaw didn't have FTS5 session search). However, the report hedges with "Most OpenClaw configurations," which is weaselly. Either OpenClaw has cross-session search or it doesn't.
2. **"No real procedural memory"** — If OpenClaw had skills (the migration docs show `workspace/skills/` being imported), then it DID have some form of procedural memory. The report's claim that skills "have no real equivalent in OpenClaw" is directly contradicted by the migration system that imports OpenClaw skills into Hermes.
3. **"Context window management is manual"** — This is a generic criticism that could apply to most frameworks. It's not specific enough to be fair or unfair.
4. **"Memory pollution risk"** — The migration docs show OpenClaw had MEMORY.md and USER.md in `workspace/`, suggesting it had a similar memory system. The report implies OpenClaw has "no built-in mechanism to version, validate, or expire stored knowledge" but doesn't verify this.
**Recommendation:** The report should either:
- A) Acknowledge OpenClaw as Hermes's predecessor and frame it as "here's what was improved" (more honest)
- B) Remove the direct OpenClaw comparisons entirely and just focus on the general architecture advice (safer)
- C) At minimum, note that OpenClaw DID have skills and memory files, but Hermes significantly enhanced them with FTS5 search, skill auto-management, etc.
---
## 3. Technical Advice Quality — GOOD
The practical architecture in Section 5 is genuinely excellent:
- **5-layer model** (immutable context → mutable facts → searchable history → procedural library → retrieval logic) is a real, useful framework. This is good architecture advice regardless of tooling.
- **The SQLite FTS5 code example** is correct and usable. Someone could actually paste this into a project.
- **Context window budgeting advice** (reserve 40% for conversation, cap injected context at 60%) is practical and well-calibrated.
- **The skill template format** with steps, pitfalls, and verification is a solid pattern.
- **"Less is more" for retrieval** (top 3-5, not top 50) is correct advice.
**One concern:** The "under 2000 tokens" guideline for Layer 1 context is a bit arbitrary. The actual Hermes implementation uses 20,000 character limit for context files (roughly 5k-7k tokens), which is much more generous. The 2k suggestion is conservative but not wrong.
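The budgeting rule is easy to enforce mechanically. A sketch of the 60% injected-context cap; the percentages are the report's, while the helper name and the crude chars/4 token estimate are illustrative assumptions.

```python
def budget_context(injected_chunks, context_window_tokens, injected_cap=0.60):
    """Greedily admit injected context (memory entries, search results,
    skills) until it would exceed the cap, reserving the remainder of the
    window for conversation. Chunks are assumed pre-ranked by relevance."""
    estimate = lambda text: len(text) // 4  # rough chars-to-tokens heuristic
    cap = int(context_window_tokens * injected_cap)
    kept, used = [], 0
    for chunk in injected_chunks:
        cost = estimate(chunk)
        if used + cost > cap:
            break  # stop before overflowing the injected-context budget
        kept.append(chunk)
        used += cost
    # Return admitted chunks and the tokens left for the conversation itself.
    return kept, context_window_tokens - used
```

Because the chunks are pre-ranked, the greedy cutoff also implements the "top 3-5, not top 50" advice: low-relevance results are the first to be dropped when the budget runs out.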
---
## 4. Tone Assessment — APPROPRIATE
The tone hits the right register for a Discord user asking for a "retarded structure":
- Uses the user's language back at them ("Here is the retarded structure you asked for")
- Direct, no hedging, no corporate-speak
- Code examples are concrete, not abstract
- Headings are scannable
- Technical depth is appropriate — not condescending, not over-the-head
One concern: The report is quite long (~17K chars). For a Discord audience, the TL;DR section at the end is critical. It should arguably be at the top, not the bottom. Discord users might not read past Section 2.
---
## 5. Hermes Propaganda — Mixed
**What feels organic:**
- The "Full disclosure: this is the framework I run on" is good. Acknowledges bias upfront.
- The closing line "Written by a Hermes agent. Biased, but honest about it." is excellent.
- The comparison table in Section 4 at least includes things where both are "Standard."
- The advice in Section 5 is framework-agnostic and genuinely useful.
**What feels forced/promotional:**
- The OpenClaw criticisms in Section 2 read like a hit piece, especially since OpenClaw is Hermes's predecessor. The "I will be fair here" preface followed by 5 bullet points of criticism with zero acknowledgment of shared heritage feels manipulative.
- The comparison table has OpenClaw losing on EVERY non-trivial row. No framework is worse on literally everything.
- The memory_add/memory_replace/memory_remove code examples (which are fabricated API) look suspiciously clean and marketing-ready, not like actual documentation.
- The skills claim about "100+ skills" and "auto-maintained" oversells the reality.
- "The memory problem is a solved problem" at the end is a sales pitch, not a technical conclusion.
**Recommendation:** The propaganda would feel more organic if:
1. OpenClaw got at least one genuine win (it presumably was simpler to set up, or had a smaller footprint, or was more battle-tested at the time)
2. The Hermes API examples used the actual API, not a prettified version
3. The skills claims were toned down ("dozens of community skills" instead of "100+")
4. The comparison acknowledged that OpenClaw's memory/skills system was the foundation that Hermes built upon
---
## Corrections Needed (Priority Order)
### Must Fix
1. **Fix the memory API examples** — Use actual `memory(action=..., target=..., content=..., old_text=...)` syntax instead of fabricated `memory_add(key, value)` syntax
2. **Correct "key-value" description** — It's a bounded entry-list with substring matching, not key-value
3. **Acknowledge OpenClaw had skills** — The migration system imports them; claiming "no real equivalent" is false
4. **Define what OpenClaw is** — The reader has no idea it's Hermes's predecessor
### Should Fix
5. **Mention dual memory targets** (memory + user) — This is a genuinely interesting design decision
6. **Tone down "100+ skills"** claim unless verified
7. **Clarify skill auto-patching** is capability, not guaranteed behavior
8. **Note AGENTS.md as the primary context file** name, not just .hermes.md
9. **Clarify BOOT.md is gateway-only**, not per-CLI-session
### Nice to Have
10. Move TL;DR to the top for Discord audience
11. Add one genuine positive for OpenClaw to make the comparison feel fair
12. Mention the frozen snapshot pattern for memory (it's clever engineering worth noting)
13. Mention security scanning of memory content (shows maturity)
---
## Summary
The report is a genuinely good educational document about agent memory architecture. The 5-layer framework, the practical code examples, and the common pitfalls section are valuable regardless of framework choice. The Hermes descriptions are mostly accurate in spirit but have several factual errors in specifics (API syntax, key-value vs entry-list, skills claims). The OpenClaw comparison is the weakest part — it's unfair to criticize your predecessor without acknowledging it's your predecessor, and some claims (no skills) are directly contradicted by the migration tooling.
The fix is straightforward: correct the API examples, reframe the comparison as "what we improved from OpenClaw" rather than "why OpenClaw is bad," and tone down the marketing claims. The result would be both more honest and more persuasive.