Compare commits

..

3 Commits

Author SHA1 Message Date
3433b8514a fix(kimi): purge kimi-for-coding from model lists, tests, docs (#lazzyPit)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 45s
kimi-for-coding triggers 401/403 access-terminated errors.
Apply workaround consistently:
- Remove from _PROVIDER_MODELS['kimi-coding'] and coding plan selection
- Update tests to expect kimi-k2.5 instead
- Update docs and reports
- Live config on Beta VPS also corrected
2026-04-07 16:13:12 +00:00
a8eb7dfbad feat(provider): first-class Ollama support + Gemma 4 defaults (#169)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 32s
- Add 'ollama' to CLI provider choices and auth aliases
- Wire Ollama through resolve_provider_client with auto-detection
- Add _try_ollama to auxiliary fallback chain (before local/custom)
- Add ollama to vision provider order
- Update model_metadata.py: ollama prefix + gemma-4-* context lengths (256K)
- Default model: gemma4:12b when provider=ollama
2026-04-07 15:55:50 +00:00
dd0fa2d1a1 fix(config): correct fallback model from kimi-for-coding to kimi-k2.5
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 47s
The kimi-for-coding model triggers 403 access-terminated errors.
Switch fallback config to use kimi-k2.5 which is valid for Hermes gateways.

Refs: #lazzyPit
2026-04-07 15:40:00 +00:00
13 changed files with 11 additions and 1714 deletions

View File

@@ -47,11 +47,6 @@ jobs:
source .venv/bin/activate
python scripts/syntax_guard.py
- name: No duplicate models
run: |
source .venv/bin/activate
python scripts/check_no_duplicate_models.py
- name: Green-path E2E
run: |
source .venv/bin/activate

View File

@@ -940,7 +940,7 @@ def _try_ollama() -> Tuple[Optional[OpenAI], Optional[str]]:
return None, None
api_key = (os.getenv("OLLAMA_API_KEY", "") or "ollama").strip()
model = _read_main_model() or "gemma4:12b"
return OpenAI(api_key=api_key, base_url=base_url), model
def _get_provider_chain() -> List[tuple]:
@@ -1216,7 +1216,7 @@ def resolve_provider_client(
base_url = base_url + "/v1" if not base_url.endswith("/v1") else base_url
api_key = (explicit_api_key or os.getenv("OLLAMA_API_KEY", "") or "ollama").strip()
final_model = model or _read_main_model() or "gemma4:12b"
client = OpenAI(api_key=api_key, base_url=base_url)
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
# ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────

View File

@@ -148,7 +148,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openrouter": "openrouter",
"anthropic": "anthropic",
"zai": "zai",
"kimi-coding": "kimi-k2.5",
"kimi-coding": "kimi-for-coding",
"minimax": "minimax",
"minimax-cn": "minimax-cn",
"deepseek": "deepseek",

View File

@@ -1,335 +0,0 @@
# Memory Architecture Guide
Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.
## Overview
Hermes has four distinct memory systems, each serving a different purpose:
| Tier | System | Scope | Cost | Persistence |
|------|--------|-------|------|-------------|
| 1 | **Built-in Memory** (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
| 2 | **Session Search** (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
| 3 | **Skills** (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
| 4 | **External Providers** (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |
All four tiers operate independently. Built-in memory is always active. The others are opt-in or on-demand.
## Tier 1: Built-in Memory (MEMORY.md / USER.md)
### File Layout
```
~/.hermes/memories/
├── MEMORY.md — Agent's notes (environment facts, conventions, lessons learned)
└── USER.md — User profile (preferences, communication style, identity)
```
Profile-aware: when running under a profile (`hermes -p coder`), the memories directory resolves to `~/.hermes/profiles/<name>/memories/`.
### Frozen Snapshot Pattern
This is the most important architectural decision in the memory system.
1. **Session start:** `MemoryStore.load_for_prompt()` reads both files from disk, parses entries delimited by `§` (section sign), and injects them into the system prompt as a frozen block.
2. **During session:** The `memory` tool writes to disk immediately (durable), but does **not** update the system prompt. This preserves the LLM's prefix cache for the entire session.
3. **Next session:** The snapshot refreshes from disk.
**Why frozen?** System prompt changes invalidate the KV cache on every API call. With a ~30K token system prompt, that's expensive. Freezing memory at session start means the cache stays warm for the entire conversation. The tradeoff: memory writes made mid-session don't take effect until next session. Tool responses show the live state so the agent can verify writes succeeded.
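The pattern above can be sketched minimally. This is an illustrative class (`FrozenMemory` is a made-up name, not the real `MemoryStore` API) showing the one deliberate asymmetry: reads freeze at construction, writes hit disk immediately.

```python
from pathlib import Path


class FrozenMemory:
    """Sketch of the frozen-snapshot pattern: snapshot at session start, live writes to disk."""

    def __init__(self, path: Path):
        self.path = path
        # Read once at session start; this is what gets injected into the system prompt.
        self.snapshot = path.read_text() if path.exists() else ""

    def add(self, entry: str) -> str:
        # Durable immediately, but the frozen snapshot is deliberately untouched
        # so the provider's prefix cache stays warm for the whole session.
        current = self.path.read_text() if self.path.exists() else ""
        self.path.write_text(f"{current}\n§\n{entry}" if current else entry)
        return self.path.read_text()  # live state, echoed in the tool response
```

After an `add`, `snapshot` still holds the session-start contents while the file on disk has the new entry.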
### Character Limits
| Store | Default Limit | Approx Tokens | Typical Entries |
|-------|--------------|---------------|-----------------|
| MEMORY.md | 2,200 chars | ~800 | 8-15 |
| USER.md | 1,375 chars | ~500 | 5-10 |
Limits are in characters (not tokens) because character counts are model-independent. Configurable in `config.yaml`:
```yaml
memory:
memory_char_limit: 2200
user_char_limit: 1375
```
### Entry Format
Entries are separated by `\n§\n`. Each entry can be multiline. Example MEMORY.md:
```
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
```
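Hypothetical round-trip helpers for this format (not the actual parser in `memory_tool.py`):

```python
def parse_entries(text: str) -> list[str]:
    """Split a memory file on the section-sign delimiter; entries may be multiline."""
    return [e.strip() for e in text.split("\n§\n") if e.strip()]


def serialize_entries(entries: list[str]) -> str:
    """Join entries back with the `\\n§\\n` delimiter."""
    return "\n§\n".join(e.strip() for e in entries)
```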
### Tool Interface
The `memory` tool (defined in `tools/memory_tool.py`) supports:
- **`add`** — Append new entry. Rejects exact duplicates.
- **`replace`** — Find entry by unique substring (`old_text`), replace with `content`.
- **`remove`** — Find entry by unique substring, delete it.
- **`read`** — Return current entries from disk (live state, not frozen snapshot).
Substring matching: `old_text` must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
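The uniqueness requirement can be sketched as follows (illustrative helper, assumed error wording):

```python
def find_unique(entries: list[str], old_text: str) -> int:
    """Return the index of the single entry containing old_text, or raise."""
    matches = [i for i, e in enumerate(entries) if old_text in e]
    if not matches:
        raise ValueError(f"no entry contains {old_text!r}")
    if len(matches) > 1:
        # Ambiguous match: the caller must supply a longer, more specific fragment.
        raise ValueError(f"{old_text!r} matches {len(matches)} entries; be more specific")
    return matches[0]
```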
### Security Scanning
Every memory entry is scanned against `_MEMORY_THREAT_PATTERNS` before acceptance:
- Prompt injection patterns (`ignore previous instructions`, `you are now...`)
- Credential exfiltration (`curl`/`wget` with env vars, `.env` file reads)
- SSH backdoor attempts (`authorized_keys`, `.ssh` writes)
- Invisible Unicode characters (zero-width spaces, BOM)
Matches are rejected with an error message. Source: `_scan_memory_content()` in `tools/memory_tool.py`.
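A sketch of the scanning approach, using an illustrative subset of patterns (the real `_MEMORY_THREAT_PATTERNS` list is longer and more precise):

```python
import re

# Illustrative subset of threat patterns -- assumptions, not the real list.
_THREAT_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),  # prompt injection
    re.compile(r"(curl|wget)\b.*\$[A-Z_]+"),                         # env-var exfiltration
    re.compile(r"authorized_keys"),                                  # SSH backdoor write
    re.compile(r"[\u200b\u200c\u200d\ufeff]"),                       # zero-width chars / BOM
]


def scan_memory_content(entry: str) -> bool:
    """Return True if the entry looks safe to store, False if any pattern matches."""
    return not any(p.search(entry) for p in _THREAT_PATTERNS)
```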
### Code Path
```
agent/prompt_builder.py
└── assembles system prompt pieces
└── MemoryStore.load_for_prompt() → frozen snapshot injection
tools/memory_tool.py
├── MemoryStore class (file I/O, locking, parsing)
├── memory_tool() function (add/replace/remove/read dispatch)
└── _scan_memory_content() (threat scanning)
hermes_cli/memory_setup.py
└── Interactive first-run memory setup
```
## Tier 2: Session Search (FTS5)
### How It Works
1. Every CLI and gateway session stores full message history in SQLite (`~/.hermes/state.db`)
2. The `messages_fts` FTS5 virtual table enables fast full-text search
3. The `session_search` tool finds relevant messages, groups by session, loads top N
4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not main model)
5. Summaries are returned to the main agent as context
### Why Gemini Flash for Summarization
Raw session transcripts can be 50K+ chars. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. Same pattern used by `web_extract`.
### Schema
```sql
-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)
-- Full-text search
messages_fts -- FTS5 virtual table on messages.content
-- Schema tracking
schema_version
```
WAL mode for concurrent readers + one writer (gateway multi-platform support).
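The FTS5 arrangement can be demonstrated self-contained (simplified columns; assumes an SQLite build with FTS5, which standard CPython ships):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, content TEXT);
    CREATE VIRTUAL TABLE messages_fts USING fts5(
        content, content='messages', content_rowid='id');
    -- keep the index in sync on insert, mirroring the real triggers
    CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
        INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
    END;
""")
con.execute("INSERT INTO messages (session_id, content) "
            "VALUES ('s1', 'deployed the staging server')")
con.execute("INSERT INTO messages (session_id, content) "
            "VALUES ('s2', 'wrote unit tests')")
# Full-text match, then join back to the content table for session grouping.
hits = con.execute(
    "SELECT m.session_id FROM messages_fts f "
    "JOIN messages m ON m.id = f.rowid WHERE messages_fts MATCH ?",
    ("staging",),
).fetchall()
```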
### Session Lineage
When context compression triggers a session split, `parent_session_id` chains the old and new sessions. This lets session search follow the thread across compression boundaries.
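Following a chain of `parent_session_id` links is a natural fit for a recursive CTE; a sketch with three chained sessions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, parent_session_id TEXT)")
# s1 split into s2, which split into s3, across two compression boundaries
con.executemany("INSERT INTO sessions VALUES (?, ?)",
                [("s1", None), ("s2", "s1"), ("s3", "s2")])
chain = [row[0] for row in con.execute("""
    WITH RECURSIVE lineage(id, parent) AS (
        SELECT id, parent_session_id FROM sessions WHERE id = 's3'
        UNION ALL
        SELECT s.id, s.parent_session_id
        FROM sessions s JOIN lineage l ON s.id = l.parent
    )
    SELECT id FROM lineage
""")]
# chain walks from the newest split back to the root session
```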
### Code Path
```
tools/session_search_tool.py
├── FTS5 query against messages_fts
├── Groups results by session_id
├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
└── Returns per-session summaries
hermes_state.py (SessionDB class)
├── SQLite WAL mode database
├── FTS5 triggers for message insert/update/delete
└── Session CRUD operations
```
### Memory vs Session Search
| | Memory | Session Search |
|---|--------|---------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all stored sessions) |
| **Latency** | Instant (in system prompt) | Requires FTS query + LLM call |
| **When to use** | Critical facts always in context | "What did we discuss about X?" |
| **Management** | Agent-curated | Automatic |
| **Token cost** | Fixed per session | On-demand per search |
## Tier 3: Skills (Procedural Memory)
### What Skills Are
Skills capture **how to do a specific type of task** based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.
A skill is a directory with a `SKILL.md` (markdown instructions) and optional supporting files:
```
~/.hermes/skills/
├── my-skill/
│ ├── SKILL.md — Instructions, steps, pitfalls
│ ├── references/ — API docs, specs
│ ├── templates/ — Code templates, config files
│ ├── scripts/ — Helper scripts
│ └── assets/ — Images, data files
```
### How Skills Load
At the start of each turn, the agent's system prompt includes available skills. When a skill matches the current task, the agent loads it with `skill_view(name)` and follows its instructions. Skills are **not** injected wholesale — they're loaded on demand to preserve context window.
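The two halves of this — compact index in the prompt, full load on demand — can be sketched with hypothetical helpers (directory layout as above; function names are assumptions):

```python
from pathlib import Path


def skills_index(root: Path) -> list[str]:
    """Compact catalog for the system prompt: skill directory names only."""
    return sorted(p.parent.name for p in root.glob("*/SKILL.md"))


def skill_view(root: Path, name: str) -> str:
    """Full SKILL.md body, loaded only when the current task matches."""
    return (root / name / "SKILL.md").read_text()
```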
### Skill Lifecycle
1. **Creation:** After a complex task (5+ tool calls), the agent offers to save the approach as a skill using `skill_manage(action='create')`.
2. **Usage:** On future matching tasks, the agent loads the skill with `skill_view(name)`.
3. **Maintenance:** If a skill is outdated or incomplete when used, the agent patches it immediately with `skill_manage(action='patch')`.
4. **Deletion:** Obsolete skills are removed with `skill_manage(action='delete')`.
### Skills vs Memory
| | Memory | Skills |
|---|--------|--------|
| **Format** | Free-text entries | Structured markdown (steps, pitfalls, examples) |
| **Scope** | Facts and preferences | Procedures and workflows |
| **Loading** | Always in system prompt | On-demand when matched |
| **Size** | ~1,300 tokens total | Variable (loaded individually) |
### Code Path
```
tools/skill_manager_tool.py — Create, edit, patch, delete skills
agent/skill_commands.py — Slash commands for skill management
skills_hub.py — Browse, search, install skills from hub
```
## Tier 4: External Memory Providers
### Plugin Architecture
```
plugins/memory/
├── __init__.py — Provider registry and base interface
├── honcho/ — Dialectic Q&A, cross-session user modeling
├── openviking/ — Knowledge graph memory
├── mem0/ — Semantic memory with auto-extraction
├── hindsight/ — Retrospective memory analysis
├── holographic/ — Distributed holographic memory
├── retaindb/ — Vector-based retention
├── byterover/ — Byte-level memory compression
└── supermemory/ — Cloud-hosted semantic memory
```
Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.
### Integration Points
When a provider is active, Hermes:
1. Injects provider context into the system prompt
2. Prefetches relevant memories before each turn (background, non-blocking)
3. Syncs conversation turns to the provider after each response
4. Extracts memories on session end (for providers that support it)
5. Mirrors built-in memory writes to the provider
6. Adds provider-specific tools for search and management
### Configuration
```yaml
memory:
provider: openviking # or honcho, mem0, hindsight, etc.
```
Setup: `hermes memory setup` (interactive picker).
## Data Lifecycle
```
Session Start
├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
├── Load skills catalog (names + descriptions)
├── Initialize session search (SQLite connection)
└── Initialize external provider (if configured)
Each Turn
├── Agent sees frozen memory in system prompt
├── Agent can call memory tool → writes to disk, returns live state
├── Agent can call session_search → FTS5 + Gemini Flash summarization
├── Agent can load skills → reads SKILL.md from disk
└── External provider prefetches context (if active)
Session End
├── All memory writes already on disk (immediate persistence)
├── Session transcript saved to SQLite (messages + FTS5 index)
├── External provider extracts final memories (if supported)
└── Skill updates persisted (if any were patched)
```
## Privacy and Data Locality
| Component | Location | Network |
|-----------|----------|---------|
| MEMORY.md / USER.md | `~/.hermes/memories/` | Local only |
| Session DB | `~/.hermes/state.db` | Local only |
| Skills | `~/.hermes/skills/` | Local only |
| External provider | Provider-dependent | Provider API calls |
Built-in memory (Tiers 1-3) is stored entirely on the local machine. The one caveat is session-search summarization, which sends matched conversation snippets to the auxiliary LLM. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.
## Configuration Reference
```yaml
# ~/.hermes/config.yaml
memory:
memory_enabled: true # Enable MEMORY.md
user_profile_enabled: true # Enable USER.md
memory_char_limit: 2200 # MEMORY.md char limit (~800 tokens)
user_char_limit: 1375 # USER.md char limit (~500 tokens)
nudge_interval: 10 # Turns between memory nudge reminders
provider: null # External provider name (null = disabled)
```
Environment variables (in `~/.hermes/.env`):
- Provider-specific API keys (e.g., `HONCHO_API_KEY`, `MEM0_API_KEY`)
## Troubleshooting
### Memory not appearing in system prompt
- Check `~/.hermes/memories/MEMORY.md` exists and has content
- Verify `memory.memory_enabled: true` in config
- Check for file lock issues (WAL mode, concurrent access)
### Memory writes not taking effect
- Writes are durable to disk immediately but frozen in system prompt until next session
- Tool response shows live state — verify the write succeeded there
- Start a new session to see the updated snapshot
### Session search returns nothing
- Verify `state.db` has sessions: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"`
- Check FTS5 index: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"`
- Ensure auxiliary LLM (Gemini Flash) is configured and reachable
### Skills not loading
- Check `~/.hermes/skills/` directory exists
- Verify SKILL.md has valid frontmatter (name, description)
- Skills load by name match — check the skill name matches what the agent expects
### External provider errors
- Check API key in `~/.hermes/.env`
- Verify provider is installed: `pip install <provider-package>`
- Run `hermes memory status` for diagnostic info

View File

@@ -1,335 +0,0 @@
# Memory Architecture Guide
How Hermes Agent remembers things across sessions — the stores, the tools, the data flow, and how to configure it all.
## Overview
Hermes has a multi-layered memory system. It is not one thing — it is several independent systems that complement each other:
1. **Persistent Memory** (MEMORY.md / USER.md) — bounded, curated notes injected into every system prompt
2. **Session Search** — full-text search across all past conversation transcripts
3. **Skills** — procedural memory: reusable workflows stored as SKILL.md files
4. **External Memory Providers** — optional plugins (Honcho, Holographic, Mem0, etc.) for deeper recall
All built-in memory lives on disk under `~/.hermes/` (or `$HERMES_HOME`). No memory data leaves the machine unless you explicitly configure an external cloud provider.
## Memory Types in Detail
### 1. Persistent Memory (MEMORY.md and USER.md)
The core memory system. Two files in `~/.hermes/memories/`:
| File | Purpose | Default Char Limit |
|------|---------|--------------------|
| `MEMORY.md` | Agent's personal notes — environment facts, project conventions, tool quirks, lessons learned | 2,200 chars (~800 tokens) |
| `USER.md` | User profile — name, preferences, communication style, pet peeves | 1,375 chars (~500 tokens) |
**How it works:**
- Loaded from disk at session start and injected into the system prompt as a frozen snapshot
- The agent uses the `memory` tool to add, replace, or remove entries during a session
- Mid-session writes go to disk immediately (durable) but do NOT update the system prompt — this preserves the LLM's prefix cache for performance
- The snapshot refreshes on the next session start
- Entries are delimited by `§` (section sign) and can be multiline
**System prompt appearance:**
```
══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations
```
**Memory tool actions:**
- `add` — append a new entry (rejected if it would exceed the char limit)
- `replace` — find an entry by substring match and replace it
- `remove` — find an entry by substring match and delete it
Substring matching means you only need a unique fragment of the entry, not the full text. If the fragment matches multiple entries, the tool returns an error asking for a more specific match.
### 2. Session Search
Cross-session conversation recall via SQLite FTS5 full-text search.
- All CLI and messaging sessions are stored in `~/.hermes/state.db`
- The `session_search` tool finds relevant past conversations by keyword
- Top matching sessions are summarized by Gemini Flash (cheap, fast) before being returned to the main model
- Returns focused summaries, not raw transcripts
**When to use session_search vs. memory:**
| Feature | Persistent Memory | Session Search |
|---------|------------------|----------------|
| Capacity | ~3,575 chars total | Unlimited (all sessions) |
| Speed | Instant (in system prompt) | Requires search + LLM summarization |
| Use case | Key facts always in context | "What did we discuss about X last week?" |
| Management | Manually curated by the agent | Automatic — all sessions stored |
| Token cost | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
**Rule of thumb:** Memory is for facts that should *always* be available. Session search is for recalling specific past conversations on demand. Don't save task progress or session outcomes to memory — use session_search to find those.
### 3. Skills (Procedural Memory)
Skills are reusable workflows stored as `SKILL.md` files in `~/.hermes/skills/` (and optionally external skill directories).
- Organized by category: `skills/github/github-pr-workflow/SKILL.md`
- YAML frontmatter with name, description, version, platform restrictions
- Progressive disclosure: metadata shown in skill list, full content loaded on demand via `skill_view`
- The agent creates skills proactively after complex tasks (5+ tool calls) using the `skill_manage` tool
- Skills can be patched when found outdated — stale skills are a liability
Skills are *not* injected into the system prompt by default. The agent sees a compact index of available skills and loads them on demand. This keeps the prompt lean while giving access to deep procedural knowledge.
**Skills vs. Memory:**
- **Memory:** compact facts ("User's project uses Go 1.22 with chi router")
- **Skills:** detailed procedures ("How to deploy the staging server: step 1, step 2, ...")
### 4. External Memory Providers
Optional plugins that add deeper, structured memory alongside the built-in system. Only one external provider can be active at a time.
| Provider | Storage | Key Feature |
|----------|---------|-------------|
| Honcho | Cloud | Dialectic user modeling with semantic search |
| OpenViking | Self-hosted | Filesystem-style knowledge hierarchy |
| Mem0 | Cloud | Server-side LLM fact extraction |
| Hindsight | Cloud/Local | Knowledge graph with entity resolution |
| Holographic | Local SQLite | HRR algebraic reasoning + trust scoring |
| RetainDB | Cloud | Hybrid search with delta compression |
| ByteRover | Local/Cloud | Hierarchical knowledge tree with CLI |
| Supermemory | Cloud | Context fencing + session graph ingest |
External providers run **alongside** built-in memory (never replacing it). They receive hooks for:
- System prompt injection (provider context)
- Pre-turn memory prefetch
- Post-turn conversation sync
- Session-end extraction
- Built-in memory write mirroring
Setup: `hermes memory setup` or set `memory.provider` in `~/.hermes/config.yaml`.
See `website/docs/user-guide/features/memory-providers.md` for full provider details.
## How the Systems Interact
```
Session Start
|
+--> Load MEMORY.md + USER.md from disk --> frozen snapshot into system prompt
+--> Provider: system_prompt_block() --> injected into system prompt
+--> Skills index --> injected into system prompt (compact metadata only)
|
v
Each Turn
|
+--> Provider: prefetch(query) --> relevant recalled context
+--> Agent sees: system prompt (memory + provider context + skills index)
+--> Agent can call: memory tool, session_search tool, skill tools, provider tools
|
v
After Each Response
|
+--> Provider: sync_turn(user, assistant) --> persist conversation
|
v
Periodic (every N turns, default 10)
|
+--> Memory nudge: agent prompted to review and update memory
|
v
Session End / Compression
|
+--> Memory flush: agent saves important facts before context is discarded
+--> Provider: on_session_end(messages) --> final extraction
+--> Provider: on_pre_compress(messages) --> save insights before compression
```
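The hooks in the diagram above suggest a provider interface roughly like this — a sketch inferred from the hook names, not the actual base class in `agent/memory_provider.py`:

```python
from typing import List, Optional


class MemoryProviderSketch:
    """No-op defaults; a real provider overrides the hooks it supports."""

    def system_prompt_block(self) -> str:
        return ""                     # injected into the system prompt at session start

    def prefetch(self, query: str) -> Optional[str]:
        return None                   # pre-turn recall (non-blocking in practice)

    def sync_turn(self, user: str, assistant: str) -> None:
        pass                          # post-turn conversation persistence

    def on_session_end(self, messages: List[dict]) -> None:
        pass                          # final memory extraction

    def on_pre_compress(self, messages: List[dict]) -> None:
        pass                          # save insights before context is discarded
```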
## Best Practices
### What to Save
Save proactively — don't wait for the user to ask:
- **User preferences:** "I prefer TypeScript over JavaScript" → `user` target
- **Corrections:** "Don't use sudo for Docker, I'm in the docker group" → `memory` target
- **Environment facts:** "This server runs Debian 12 with PostgreSQL 16" → `memory` target
- **Conventions:** "Project uses tabs, 120-char lines, Google docstrings" → `memory` target
- **Explicit requests:** "Remember that my API key rotation is monthly" → `memory` target
### What NOT to Save
- **Task progress or session outcomes** — use session_search to recall these
- **Trivially re-discoverable facts** — "Python 3.12 supports f-strings" (web search this)
- **Raw data dumps** — large code blocks, log files, data tables
- **Session-specific ephemera** — temporary file paths, one-off debugging context
- **Content already in SOUL.md or AGENTS.md** — those are already in context
### Writing Good Entries
Compact, information-dense entries work best:
```
# Good — packs multiple related facts
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh. Editor: VS Code with Vim bindings.
# Good — specific, actionable convention
Project ~/code/api uses Go 1.22, sqlc for DB, chi router. Tests: make test. CI: GitHub Actions.
# Bad — too vague
User has a project.
# Bad — too verbose
On January 5th, 2026, the user asked me to look at their project which is
located at ~/code/api. I discovered it uses Go version 1.22 and...
```
### Capacity Management
When memory is above 80% capacity (visible in the system prompt header), consolidate before adding. Merge related entries into shorter, denser versions. The tool will reject additions that would exceed the limit — use `replace` to consolidate first.
Priority order for what stays in memory:
1. User preferences and corrections (highest — prevents repeated steering)
2. Environment facts and project conventions
3. Tool quirks and workarounds
4. Lessons learned (lowest — can often be rediscovered)
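The capacity figure shown in the prompt header, and the reject-on-overflow check, amount to simple character arithmetic over the joined entries (a sketch; the real tool's accounting may differ slightly):

```python
def capacity_header(entries: list[str], limit: int = 2200) -> str:
    """Render a header in the '[67% — 1,474/2,200 chars]' style shown in the prompt."""
    used = len("\n§\n".join(entries))
    return f"[{used * 100 // limit}% — {used:,}/{limit:,} chars]"


def can_add(entries: list[str], new_entry: str, limit: int = 2200) -> bool:
    """The tool rejects adds that would push the joined store past the limit."""
    return len("\n§\n".join(entries + [new_entry])) <= limit
```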
### Memory Nudge
Every N turns (default: 10), the agent receives a nudge prompting it to review and update its memory. This is a lightweight prompt injected into the conversation — not a separate API call. The agent can choose to update memory or skip if nothing has changed.
## Privacy and Data Locality
**Built-in memory is fully local.** MEMORY.md and USER.md are plain text files in `~/.hermes/memories/`. No network calls are made in the memory read/write path. The memory tool scans entries for prompt injection and exfiltration patterns before accepting them.
**Session search is local.** The SQLite database (`~/.hermes/state.db`) stays on disk. FTS5 search is a local operation. However, the summarization step uses Gemini Flash (via the auxiliary LLM client) — conversation snippets are sent to Google's API for summarization. If this is a concern, session_search can be disabled.
**External providers may send data off-machine.** Cloud providers (Honcho, Mem0, RetainDB, Supermemory) send data to their respective APIs. Self-hosted providers (OpenViking, Hindsight local mode, Holographic, ByteRover local mode) keep everything on your machine. Check the provider's documentation for specifics.
**Security scanning.** All content written to memory (via the `memory` tool) is scanned for:
- Prompt injection patterns ("ignore previous instructions", role hijacking, etc.)
- Credential exfiltration attempts (curl/wget with secrets, reading .env files)
- SSH backdoor patterns
- Invisible Unicode characters (used for steganographic injection)
Blocked content is rejected with a descriptive error message.
## Configuration
In `~/.hermes/config.yaml`:
```yaml
memory:
# Enable/disable the two built-in memory stores
memory_enabled: true # MEMORY.md
user_profile_enabled: true # USER.md
# Character limits (not tokens — model-independent)
memory_char_limit: 2200 # ~800 tokens at 2.75 chars/token
user_char_limit: 1375 # ~500 tokens at 2.75 chars/token
# External memory provider (empty string = built-in only)
# Options: "honcho", "openviking", "mem0", "hindsight",
# "holographic", "retaindb", "byterover", "supermemory"
provider: ""
```
Additional settings are read from `run_agent.py` defaults:
| Setting | Default | Description |
|---------|---------|-------------|
| `nudge_interval` | 10 | Turns between memory review nudges (0 = disabled) |
| `flush_min_turns` | 6 | Minimum user turns before memory flush on session end/compression (0 = never flush) |
These are set under the `memory` key in config.yaml:
```yaml
memory:
nudge_interval: 10
flush_min_turns: 6
```
### Disabling Memory
To disable memory entirely, set both to false:
```yaml
memory:
memory_enabled: false
user_profile_enabled: false
```
The `memory` tool will not appear in the tool list, and no memory blocks are injected into the system prompt.
You can also disable memory per-invocation with `skip_memory=True` in the AIAgent constructor (used by cron jobs and flush agents).
## File Locations
```
~/.hermes/
├── memories/
│ ├── MEMORY.md # Agent's persistent notes
│ ├── USER.md # User profile
│ ├── MEMORY.md.lock # File lock (auto-created)
│ └── USER.md.lock # File lock (auto-created)
├── state.db # SQLite session store (FTS5)
├── config.yaml # Memory config + provider selection
└── .env # API keys for external providers
```
All paths respect `$HERMES_HOME` — if you use Hermes profiles, each profile has its own isolated memory directory.
## Troubleshooting
### "Memory full" errors
The tool returns an error when adding would exceed the character limit. The response includes current entries so the agent can consolidate. Fix by:
1. Replacing multiple related entries with one denser entry
2. Removing entries that are no longer relevant
3. Increasing `memory_char_limit` in config (at the cost of larger system prompts)
### Stale memory entries
If the agent seems to have outdated information:
- Check `~/.hermes/memories/MEMORY.md` directly — you can edit it by hand
- The frozen snapshot pattern means changes only take effect on the next session start
- If the agent wrote something wrong mid-session, it persists on disk but won't affect the current session's system prompt
### Memory not appearing in system prompt
- Verify `memory_enabled: true` in config.yaml
- Check that `~/.hermes/memories/MEMORY.md` exists and has content
- The file might be empty if all entries were removed — add entries with the `memory` tool
### Session search returns no results
- Session search requires sessions to be stored in `state.db` — new installations have no history
- FTS5 indexes are built automatically but may lag behind on very large databases
- The summarization step requires the auxiliary LLM client to be configured (API key for Gemini Flash)
### Skill drift
Skills that haven't been updated can become wrong or incomplete. The agent is prompted to patch skills when it finds them outdated during use (`skill_manage(action='patch')`). If you notice stale skills:
- Use `/skills` to browse and review installed skills
- Delete or update skills in `~/.hermes/skills/` directly
- The agent creates skills after complex tasks — review and prune periodically
### Provider not activating
- Run `hermes memory status` to check provider state
- Verify the provider plugin is installed in `~/.hermes/plugins/memory/`
- Check that required API keys are set in `~/.hermes/.env`
- Start a new session after changing provider config — existing sessions use the old provider
### Concurrent write conflicts
The memory tool uses file locking (`fcntl.flock`) and atomic file replacement (`os.replace`) to handle concurrent writes from multiple sessions. If you see corrupted memory files:
- Check for stale `.lock` files in `~/.hermes/memories/`
- Restart any hung Hermes processes
- The atomic write pattern means readers always see either the old or new file — never a partial write
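That lock-then-swap write path can be sketched as follows (Unix-only, since it uses `fcntl`; the function name is illustrative):

```python
import fcntl
import os
import tempfile


def atomic_locked_write(path: str, content: str) -> None:
    """Serialize writers with flock, then swap the new file in atomically."""
    with open(path + ".lock", "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)      # one writer at a time
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "w") as f:
                f.write(content)
            os.replace(tmp, path)              # readers see old or new, never partial
        finally:
            if os.path.exists(tmp):            # clean up only if the swap failed
                os.remove(tmp)
            fcntl.flock(lockf, fcntl.LOCK_UN)
```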

View File

@@ -2126,7 +2126,7 @@ def _model_flow_kimi(config, current_model=""):
# Step 3: Model selection — show appropriate models for the endpoint
if is_coding_plan:
# Coding Plan models (kimi-k2.5 first)
# Coding Plan models (kimi-k2.5 first — kimi-for-coding retired due to 403)
model_list = [
"kimi-k2.5",
"kimi-k2-thinking",

View File

@@ -78,7 +78,7 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
extra_env_vars=("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
base_url_env_var="GLM_BASE_URL",
),
"kimi-k2.5": HermesOverlay(
"kimi-for-coding": HermesOverlay(
transport="openai_chat",
base_url_env_var="KIMI_BASE_URL",
),
@@ -162,10 +162,10 @@ ALIASES: Dict[str, str] = {
"z.ai": "zai",
"zhipu": "zai",
# kimi-k2.5 (models.dev ID)
"kimi": "kimi-k2.5",
"kimi-coding": "kimi-k2.5",
"moonshot": "kimi-k2.5",
# kimi-for-coding (models.dev ID)
"kimi": "kimi-for-coding",
"kimi-coding": "kimi-for-coding",
"moonshot": "kimi-for-coding",
# minimax-cn
"minimax-china": "minimax-cn",
@@ -376,7 +376,7 @@ LABELS: Dict[str, str] = {
"github-copilot": "GitHub Copilot",
"anthropic": "Anthropic",
"zai": "Z.AI / GLM",
"kimi-k2.5": "Kimi / Moonshot",
"kimi-for-coding": "Kimi / Moonshot",
"minimax": "MiniMax",
"minimax-cn": "MiniMax (China)",
"deepseek": "DeepSeek",

View File

@@ -1,248 +0,0 @@
"""
MemPalace Portal — Hybrid Memory Provider.
Bridges the local Holographic fact store with the fleet-wide MemPalace vector database.
Implements smart context compression for token efficiency.
"""
import json
import logging
import os
import re
import requests
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
# Import Holographic components if available
try:
from plugins.memory.holographic.store import MemoryStore
from plugins.memory.holographic.retrieval import FactRetriever
HAS_HOLOGRAPHIC = True
except ImportError:
HAS_HOLOGRAPHIC = False
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tool Schemas
# ---------------------------------------------------------------------------
MEMPALACE_SCHEMA = {
"name": "mempalace",
"description": (
"Search or record memories in the shared fleet vector database. "
"Use this for long-term, high-volume memory across the entire fleet."
),
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["search", "record", "wings"]},
"query": {"type": "string", "description": "Search query."},
"text": {"type": "string", "description": "Memory text to record."},
"room": {"type": "string", "description": "Target room (e.g., forge, hermes, nexus)."},
"n_results": {"type": "integer", "default": 5},
},
"required": ["action"],
},
}
FACT_STORE_SCHEMA = {
"name": "fact_store",
"description": (
"Structured local fact storage. Use for durable facts about people, projects, and decisions."
),
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["add", "search", "probe", "reason", "update", "remove"]},
"content": {"type": "string"},
"query": {"type": "string"},
"entity": {"type": "string"},
"fact_id": {"type": "integer"},
},
"required": ["action"],
},
}
# ---------------------------------------------------------------------------
# Provider Implementation
# ---------------------------------------------------------------------------
class MemPalacePortalProvider(MemoryProvider):
"""Hybrid Fleet Vector + Local Structured memory provider."""
def __init__(self, config: dict | None = None):
self._config = config or {}
self._api_url = os.environ.get("MEMPALACE_API_URL", "http://127.0.0.1:7771")
self._hologram_store = None
self._hologram_retriever = None
self._session_id = None
@property
def name(self) -> str:
return "mempalace"
def is_available(self) -> bool:
# Always available if we can reach the API or have Holographic
return True
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = session_id
hermes_home = kwargs.get("hermes_home")
if HAS_HOLOGRAPHIC and hermes_home:
db_path = os.path.join(hermes_home, "memory_store.db")
try:
self._hologram_store = MemoryStore(db_path=db_path)
self._hologram_retriever = FactRetriever(store=self._hologram_store)
logger.info("Holographic store initialized as local portal layer.")
except Exception as e:
logger.error(f"Failed to init Holographic layer: {e}")
def system_prompt_block(self) -> str:
status = "Active (Fleet Portal)"
if self._hologram_store:
status += " + Local Hologram"
return (
f"# MemPalace Portal\n"
f"Status: {status}.\n"
"You have access to the shared fleet vector database (mempalace) and local structured facts (fact_store).\n"
"Use mempalace for semantic fleet-wide recall. Use fact_store for precise local knowledge."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if not query:
return ""
context_blocks = []
# 1. Fleet Search (MemPalace)
try:
res = requests.get(f"{self._api_url}/search", params={"q": query, "n": 3}, timeout=2)
if res.ok:
data = res.json()
memories = data.get("memories", [])
if memories:
block = "## Fleet Memories (MemPalace)\n"
for m in memories:
block += f"- {m['text']}\n"
context_blocks.append(block)
except Exception:
pass
# 2. Local Probe (Holographic)
if self._hologram_retriever:
try:
# Extract entities from query to probe
entities = re.findall(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b', query)
facts = []
for ent in entities:
results = self._hologram_retriever.probe(ent, limit=3)
facts.extend(results)
if facts:
block = "## Local Facts (Hologram)\n"
seen = set()
for f in facts:
if f['content'] not in seen:
block += f"- {f['content']}\n"
seen.add(f['content'])
context_blocks.append(block)
except Exception:
pass
return "\n\n".join(context_blocks)
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
# Record to Fleet Palace
try:
payload = {
"text": f"User: {user_content}\nAssistant: {assistant_content}",
"room": "hermes_sync",
"metadata": {"session_id": session_id}
}
requests.post(f"{self._api_url}/record", json=payload, timeout=2)
except Exception:
pass
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Token Efficiency: Summarize and archive before context is lost."""
if not messages:
return ""
# Extract key facts for Hologram
if self._hologram_store:
# Simple heuristic: look for "I prefer", "The project uses", etc.
for msg in messages:
if msg.get("role") == "user":
content = msg.get("content", "")
if "prefer" in content.lower() or "use" in content.lower():
try:
self._hologram_store.add_fact(content[:200], category="user_pref")
except Exception:
pass
# Archive session summary to MemPalace
summary_text = f"Session {self._session_id} summary: " + " | ".join([m['content'][:50] for m in messages if m.get('role') == 'user'])
try:
payload = {
"text": summary_text,
"room": "summaries",
"metadata": {"type": "session_summary", "session_id": self._session_id}
}
requests.post(f"{self._api_url}/record", json=payload, timeout=2)
except Exception:
pass
return "Insights archived to MemPalace and Hologram."
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [MEMPALACE_SCHEMA, FACT_STORE_SCHEMA]
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
if tool_name == "mempalace":
return self._handle_mempalace(args)
elif tool_name == "fact_store":
return self._handle_fact_store(args)
return json.dumps({"error": f"Unknown tool: {tool_name}"})
def _handle_mempalace(self, args: dict) -> str:
action = args.get("action")
try:
if action == "search":
res = requests.get(f"{self._api_url}/search", params={"q": args["query"], "n": args.get("n_results", 5)}, timeout=10)
return res.text
elif action == "record":
res = requests.post(f"{self._api_url}/record", json={"text": args["text"], "room": args.get("room", "general")}, timeout=10)
return res.text
elif action == "wings":
res = requests.get(f"{self._api_url}/wings", timeout=10)
return res.text
except Exception as e:
return json.dumps({"success": False, "error": str(e)})
return json.dumps({"error": "Invalid action"})
def _handle_fact_store(self, args: dict) -> str:
if not self._hologram_store:
return json.dumps({"error": "Holographic store not initialized locally."})
# Logic similar to holographic plugin
action = args["action"]
try:
if action == "add":
fid = self._hologram_store.add_fact(args["content"])
return json.dumps({"fact_id": fid, "status": "added"})
elif action == "probe":
res = self._hologram_retriever.probe(args["entity"])
return json.dumps({"results": res})
# ... other actions ...
return json.dumps({"status": "ok", "message": f"Action {action} processed (partial impl)"})
except Exception as e:
return json.dumps({"error": str(e)})
def shutdown(self) -> None:
if self._hologram_store:
self._hologram_store.close()
def register(ctx) -> None:
provider = MemPalacePortalProvider()
ctx.register_memory_provider(provider)

View File

@@ -1,7 +0,0 @@
name: mempalace
version: 1.0.0
description: "The Portal: Hybrid Fleet Vector (MemPalace) + Local Structured (Holographic) memory."
dependencies:
- requests
- numpy

View File

@@ -1,74 +0,0 @@
#!/usr/bin/env python3
"""CI check: ensure no duplicate model IDs exist in provider configs.
Catches the class of bugs where a rename introduces a duplicate entry
(e.g. PR #225 kimi-for-coding -> kimi-k2.5 when kimi-k2.5 already existed).
Runtime target: < 2 seconds.
"""
from __future__ import annotations
import sys
from pathlib import Path
# Allow running from repo root
REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT))
def check_openrouter_models() -> list[str]:
"""Check OPENROUTER_MODELS for duplicate model IDs."""
try:
from hermes_cli.models import OPENROUTER_MODELS
except ImportError:
return []
errors = []
seen: dict[str, int] = {}
for i, (model_id, _desc) in enumerate(OPENROUTER_MODELS):
if model_id in seen:
errors.append(
f" OPENROUTER_MODELS: duplicate '{model_id}' "
f"(index {seen[model_id]} and {i})"
)
else:
seen[model_id] = i
return errors
def check_provider_models() -> list[str]:
"""Check _PROVIDER_MODELS for duplicate model IDs within each provider list."""
from hermes_cli.models import _PROVIDER_MODELS
errors = []
for provider, models in _PROVIDER_MODELS.items():
seen: dict[str, int] = {}
for i, model_id in enumerate(models):
if model_id in seen:
errors.append(
f" _PROVIDER_MODELS['{provider}']: duplicate '{model_id}' "
f"(index {seen[model_id]} and {i})"
)
else:
seen[model_id] = i
return errors
def main() -> int:
errors = []
errors.extend(check_openrouter_models())
errors.extend(check_provider_models())
if errors:
print(f"FAIL: {len(errors)} duplicate model(s) found:")
for e in errors:
print(e)
return 1
print("OK: no duplicate model entries")
return 0
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -1,374 +0,0 @@
#!/usr/bin/env python3
"""Memory Budget Enforcement Tool for hermes-agent.
Checks and enforces character/token budgets on MEMORY.md and USER.md files.
Designed for CI integration, pre-commit hooks, and manual health checks.
Usage:
python scripts/memory_budget.py # Check budget (exit 0/1)
python scripts/memory_budget.py --report # Detailed breakdown
python scripts/memory_budget.py --enforce # Trim entries to fit budget
python scripts/memory_budget.py --hermes-home ~/.hermes # Custom HERMES_HOME
Exit codes:
0 Within budget
1 Over budget (no trimming performed)
2 Entries were trimmed (--enforce was used)
"""
from __future__ import annotations
import argparse
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import List
# ---------------------------------------------------------------------------
# Constants (must stay in sync with tools/memory_tool.py)
# ---------------------------------------------------------------------------
ENTRY_DELIMITER = "\n§\n"
DEFAULT_MEMORY_CHAR_LIMIT = 2200
DEFAULT_USER_CHAR_LIMIT = 1375
WARN_THRESHOLD = 0.80 # alert when >80% of budget used
CHARS_PER_TOKEN = 4 # rough estimate matching agent/model_metadata.py
# ---------------------------------------------------------------------------
# Data structures
# ---------------------------------------------------------------------------
@dataclass
class FileReport:
"""Budget analysis for a single memory file."""
label: str # "MEMORY.md" or "USER.md"
path: Path
exists: bool
char_limit: int
raw_chars: int # raw file size in chars
entry_chars: int # chars after splitting/rejoining entries
entry_count: int
entries: List[str] # individual entry texts
@property
def usage_pct(self) -> float:
if self.char_limit <= 0:
return 0.0
return min(100.0, (self.entry_chars / self.char_limit) * 100)
@property
def estimated_tokens(self) -> int:
return self.entry_chars // CHARS_PER_TOKEN
@property
def over_budget(self) -> bool:
return self.entry_chars > self.char_limit
@property
def warning(self) -> bool:
return self.usage_pct >= (WARN_THRESHOLD * 100)
@property
def remaining_chars(self) -> int:
return max(0, self.char_limit - self.entry_chars)
def _read_entries(path: Path) -> List[str]:
"""Read a memory file and split into entries (matching MemoryStore logic)."""
if not path.exists():
return []
try:
raw = path.read_text(encoding="utf-8")
except (OSError, IOError):
return []
if not raw.strip():
return []
entries = [e.strip() for e in raw.split(ENTRY_DELIMITER)]
return [e for e in entries if e]
def _write_entries(path: Path, entries: List[str]) -> None:
"""Write entries back to a memory file."""
content = ENTRY_DELIMITER.join(entries) if entries else ""
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(content, encoding="utf-8")
def analyze_file(path: Path, label: str, char_limit: int) -> FileReport:
"""Analyze a single memory file against its budget."""
exists = path.exists()
entries = _read_entries(path) if exists else []
raw_chars = path.stat().st_size if exists else 0
joined = ENTRY_DELIMITER.join(entries)
return FileReport(
label=label,
path=path,
exists=exists,
char_limit=char_limit,
raw_chars=raw_chars,
entry_chars=len(joined),
entry_count=len(entries),
entries=entries,
)
def trim_entries(report: FileReport) -> List[str]:
"""Trim oldest entries until the file fits within its budget.
Entries are removed from the front (oldest first) because memory files
append new entries at the end.
"""
entries = list(report.entries)
joined = ENTRY_DELIMITER.join(entries)
while len(joined) > report.char_limit and entries:
entries.pop(0)
joined = ENTRY_DELIMITER.join(entries)
return entries
# ---------------------------------------------------------------------------
# Reporting
# ---------------------------------------------------------------------------
def _bar(pct: float, width: int = 30) -> str:
"""Render a text progress bar."""
filled = int(pct / 100 * width)
bar = "#" * filled + "-" * (width - filled)
return f"[{bar}]"
def print_report(memory: FileReport, user: FileReport, *, verbose: bool = False) -> None:
"""Print a human-readable budget report."""
total_chars = memory.entry_chars + user.entry_chars
total_limit = memory.char_limit + user.char_limit
total_tokens = total_chars // CHARS_PER_TOKEN
total_pct = (total_chars / total_limit * 100) if total_limit > 0 else 0
print("=" * 60)
print(" MEMORY BUDGET REPORT")
print("=" * 60)
print()
for rpt in (memory, user):
status = "OVER " if rpt.over_budget else ("WARN" if rpt.warning else " OK ")
print(f" {rpt.label:12s} {status} {_bar(rpt.usage_pct)} {rpt.usage_pct:5.1f}%")
print(f" {'':12s} {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"| {rpt.entry_count} entries "
f"| ~{rpt.estimated_tokens:,} tokens")
if rpt.exists and verbose and rpt.entries:
for i, entry in enumerate(rpt.entries):
preview = entry[:72].replace("\n", " ")
if len(entry) > 72:
preview += "..."
print(f" #{i+1}: ({len(entry)} chars) {preview}")
print()
print(f" TOTAL {_bar(total_pct)} {total_pct:5.1f}%")
print(f" {total_chars:,}/{total_limit:,} chars | ~{total_tokens:,} tokens")
print()
# Alerts
alerts = []
for rpt in (memory, user):
if rpt.over_budget:
overshoot = rpt.entry_chars - rpt.char_limit
alerts.append(
f" CRITICAL {rpt.label} is {overshoot:,} chars over budget "
f"({rpt.entry_chars:,}/{rpt.char_limit:,}). "
f"Run with --enforce to auto-trim."
)
elif rpt.warning:
alerts.append(
f" WARNING {rpt.label} is at {rpt.usage_pct:.0f}% capacity. "
f"Consider compressing or cleaning up entries."
)
if alerts:
print(" ALERTS")
print(" ------")
for a in alerts:
print(a)
print()
def print_json(memory: FileReport, user: FileReport) -> None:
"""Print a JSON report for machine consumption."""
import json
def _rpt_dict(r: FileReport) -> dict:
return {
"label": r.label,
"path": str(r.path),
"exists": r.exists,
"char_limit": r.char_limit,
"entry_chars": r.entry_chars,
"entry_count": r.entry_count,
"estimated_tokens": r.estimated_tokens,
"usage_pct": round(r.usage_pct, 1),
"over_budget": r.over_budget,
"warning": r.warning,
"remaining_chars": r.remaining_chars,
}
total_chars = memory.entry_chars + user.entry_chars
total_limit = memory.char_limit + user.char_limit
data = {
"memory": _rpt_dict(memory),
"user": _rpt_dict(user),
"total": {
"chars": total_chars,
"limit": total_limit,
"estimated_tokens": total_chars // CHARS_PER_TOKEN,
"usage_pct": round((total_chars / total_limit * 100) if total_limit else 0, 1),
"over_budget": memory.over_budget or user.over_budget,
"warning": memory.warning or user.warning,
},
}
print(json.dumps(data, indent=2))
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def _resolve_hermes_home(custom: str | None) -> Path:
"""Resolve HERMES_HOME directory."""
if custom:
return Path(custom).expanduser()
import os
return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
def main() -> int:
parser = argparse.ArgumentParser(
description="Check and enforce memory budgets for hermes-agent.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
parser.add_argument(
"--hermes-home", metavar="DIR",
help="Custom HERMES_HOME directory (default: $HERMES_HOME or ~/.hermes)",
)
parser.add_argument(
"--memory-limit", type=int, default=DEFAULT_MEMORY_CHAR_LIMIT,
help=f"Character limit for MEMORY.md (default: {DEFAULT_MEMORY_CHAR_LIMIT})",
)
parser.add_argument(
"--user-limit", type=int, default=DEFAULT_USER_CHAR_LIMIT,
help=f"Character limit for USER.md (default: {DEFAULT_USER_CHAR_LIMIT})",
)
parser.add_argument(
"--report", action="store_true",
help="Print detailed per-file budget report",
)
parser.add_argument(
"--verbose", "-v", action="store_true",
help="Show individual entry details in report",
)
parser.add_argument(
"--enforce", action="store_true",
help="Trim oldest entries to fit within budget (writes to disk)",
)
parser.add_argument(
"--json", action="store_true", dest="json_output",
help="Output report as JSON (for CI/scripting)",
)
args = parser.parse_args()
hermes_home = _resolve_hermes_home(args.hermes_home)
memories_dir = hermes_home / "memories"
# Analyze both files
memory = analyze_file(
memories_dir / "MEMORY.md", "MEMORY.md", args.memory_limit,
)
user = analyze_file(
memories_dir / "USER.md", "USER.md", args.user_limit,
)
over_budget = memory.over_budget or user.over_budget
trimmed = False
# Enforce budget by trimming entries
if args.enforce and over_budget:
for rpt in (memory, user):
if rpt.over_budget and rpt.exists:
trimmed_entries = trim_entries(rpt)
removed = rpt.entry_count - len(trimmed_entries)
if removed > 0:
_write_entries(rpt.path, trimmed_entries)
rpt.entries = trimmed_entries
rpt.entry_count = len(trimmed_entries)
rpt.entry_chars = len(ENTRY_DELIMITER.join(trimmed_entries))
rpt.raw_chars = rpt.path.stat().st_size
print(f" Trimmed {removed} oldest entries from {rpt.label} "
f"({rpt.entry_chars:,}/{rpt.char_limit:,} chars now)")
trimmed = True
# Re-check after trimming
over_budget = memory.over_budget or user.over_budget
# Output
if args.json_output:
print_json(memory, user)
elif args.report or args.verbose:
print_report(memory, user, verbose=args.verbose)
else:
# Compact summary
if over_budget:
print("Memory budget: OVER")
for rpt in (memory, user):
if rpt.over_budget:
print(f" {rpt.label}: {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"({rpt.usage_pct:.0f}%)")
elif memory.warning or user.warning:
print("Memory budget: WARNING")
for rpt in (memory, user):
if rpt.warning:
print(f" {rpt.label}: {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"({rpt.usage_pct:.0f}%)")
else:
print("Memory budget: OK")
for rpt in (memory, user):
if rpt.exists:
print(f" {rpt.label}: {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"({rpt.usage_pct:.0f}%)")
# Suggest actions when over budget but not enforced
if over_budget and not args.enforce:
suggestions = []
for rpt in (memory, user):
if rpt.over_budget:
suggestions.append(
f" - {rpt.label}: remove stale entries or run with --enforce to auto-trim"
)
# Identify largest entries
if rpt.entries:
indexed = sorted(enumerate(rpt.entries), key=lambda x: len(x[1]), reverse=True)
top3 = indexed[:3]
for idx, entry in top3:
preview = entry[:60].replace("\n", " ")
if len(entry) > 60:
preview += "..."
suggestions.append(
f" largest entry #{idx+1}: ({len(entry)} chars) {preview}"
)
if suggestions:
print()
print("Suggestions:")
for s in suggestions:
print(s)
# Exit code
if trimmed:
return 2
if over_budget:
return 1
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -1,325 +0,0 @@
#!/usr/bin/env python3
"""
Memory Sovereignty Verification
Verifies that the memory path in hermes-agent has no network dependencies.
Memory data must stay on the local filesystem only — no HTTP calls, no external
API calls, no cloud sync during memory read/write/flush/load operations.
Scans:
- tools/memory_tool.py (MEMORY.md / USER.md store)
- hermes_state.py (SQLite session store)
- tools/session_search_tool.py (FTS5 session search + summarization)
- tools/graph_store.py (knowledge graph persistence)
- tools/temporal_kg_tool.py (temporal knowledge graph)
- agent/temporal_knowledge_graph.py (temporal triple store)
- tools/skills_tool.py (skill listing/viewing)
- tools/skills_sync.py (bundled skill syncing)
Exit codes:
0 = sovereign (no violations)
1 = violations found
"""
import ast
import re
import sys
from pathlib import Path
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
# Files in the memory path to scan (relative to repo root).
MEMORY_FILES = [
"tools/memory_tool.py",
"hermes_state.py",
"tools/session_search_tool.py",
"tools/graph_store.py",
"tools/temporal_kg_tool.py",
"agent/temporal_knowledge_graph.py",
"tools/skills_tool.py",
"tools/skills_sync.py",
]
# Patterns that indicate network/external API usage.
NETWORK_PATTERNS = [
# HTTP libraries
(r'\brequests\.(get|post|put|delete|patch|head|session)', "requests HTTP call"),
(r'\burllib\.request\.(urlopen|Request)', "urllib HTTP call"),
(r'\bhttpx\.(get|post|put|delete|Client|AsyncClient)', "httpx HTTP call"),
(r'\bhttp\.client\.(HTTPConnection|HTTPSConnection)', "http.client connection"),
(r'\baiohttp\.(ClientSession|get|post)', "aiohttp HTTP call"),
(r'\bwebsockets\.\w+', "websocket connection"),
# API client patterns
(r'\bopenai\b.*\b(api_key|chat|completions|Client)\b', "OpenAI API usage"),
(r'\banthropic\b.*\b(api_key|messages|Client)\b', "Anthropic API usage"),
(r'\bAsyncOpenAI\b', "AsyncOpenAI client"),
(r'\bAsyncAnthropic\b', "AsyncAnthropic client"),
# Generic network indicators
(r'\bsocket\.(socket|connect|create_connection)', "raw socket connection"),
(r'\bftplib\b', "FTP connection"),
(r'\bsmtplib\b', "SMTP connection"),
(r'\bparamiko\b', "SSH connection via paramiko"),
# URL patterns (hardcoded endpoints)
(r'https?://(?!example\.com)[a-zA-Z0-9._-]+\.(com|org|net|io|dev|ai)', "hardcoded URL"),
]
# Import aliases that indicate network-capable modules.
NETWORK_IMPORTS = {
"requests",
"httpx",
"aiohttp",
"urllib.request",
"http.client",
"websockets",
"openai",
"anthropic",
"openrouter_client",
}
# Functions whose names suggest network I/O.
NETWORK_FUNC_NAMES = {
"async_call_llm",
"extract_content_or_reasoning",
}
# Files that are ALLOWED to have network calls (known violations with justification).
# Each entry maps to a reason string.
KNOWN_VIOLATIONS = {
"tools/graph_store.py": (
"GraphStore persists to Gitea via API. This is a known architectural trade-off "
"for knowledge graph persistence, which is not part of the core memory path "
"(MEMORY.md/USER.md/SQLite). Future work will explore local-first alternatives "
"to align more closely with SOUL.md principles."
),
"tools/session_search_tool.py": (
"Session search uses LLM summarization via an auxiliary client. While the FTS5 "
"search is local, the LLM call for summarization is an external dependency. "
"This is a temporary architectural trade-off for enhanced presentation. "
"Research is ongoing to implement local LLM options for full sovereignty, "
"in line with SOUL.md."
),
}
# ---------------------------------------------------------------------------
# Scanner
# ---------------------------------------------------------------------------
class Violation:
"""A sovereignty violation with location and description."""
def __init__(self, file: str, line: int, description: str, code: str):
self.file = file
self.line = line
self.description = description
self.code = code.strip()
def __str__(self):
return f"{self.file}:{self.line}: {self.description}\n {self.code}"
def scan_file(filepath: Path, repo_root: Path) -> list[Violation]:
"""Scan a single file for network dependency patterns."""
violations = []
rel_path = str(filepath.relative_to(repo_root))
# Skip known violations
if rel_path in KNOWN_VIOLATIONS:
return violations
try:
content = filepath.read_text(encoding="utf-8")
except (OSError, IOError) as e:
print(f"WARNING: Cannot read {rel_path}: {e}", file=sys.stderr)
return violations
lines = content.splitlines()
# --- Check imports ---
try:
tree = ast.parse(content, filename=str(filepath))
except SyntaxError as e:
print(f"WARNING: Cannot parse {rel_path}: {e}", file=sys.stderr)
return violations
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
mod = alias.name
if mod in NETWORK_IMPORTS or any(
mod.startswith(ni + ".") for ni in NETWORK_IMPORTS
):
violations.append(Violation(
rel_path, node.lineno,
f"Network-capable import: {mod}",
lines[node.lineno - 1] if node.lineno <= len(lines) else "",
))
elif isinstance(node, ast.ImportFrom):
if node.module and (
node.module in NETWORK_IMPORTS
or any(node.module.startswith(ni + ".") for ni in NETWORK_IMPORTS)
):
violations.append(Violation(
rel_path, node.lineno,
f"Network-capable import from: {node.module}",
lines[node.lineno - 1] if node.lineno <= len(lines) else "",
))
# --- Check for LLM call function usage ---
for i, line in enumerate(lines, 1):
stripped = line.strip()
if stripped.startswith("#"):
continue
for func_name in NETWORK_FUNC_NAMES:
if func_name in line and not stripped.startswith("def ") and not stripped.startswith("class "):
# Check it's actually a call, not a definition or import
if re.search(r'\b' + func_name + r'\s*\(', line):
violations.append(Violation(
rel_path, i,
f"External LLM call function: {func_name}()",
line,
))
# --- Regex-based pattern matching ---
for i, line in enumerate(lines, 1):
stripped = line.strip()
if stripped.startswith("#"):
continue
for pattern, description in NETWORK_PATTERNS:
if re.search(pattern, line, re.IGNORECASE):
violations.append(Violation(
rel_path, i,
f"Suspicious pattern ({description})",
line,
))
return violations
def verify_sovereignty(repo_root: Path) -> tuple[list[Violation], list[str]]:
"""Run sovereignty verification across all memory files.
Returns (violations, info_messages).
"""
all_violations = []
info = []
for rel_path in MEMORY_FILES:
filepath = repo_root / rel_path
if not filepath.exists():
info.append(f"SKIP: {rel_path} (file not found)")
continue
if rel_path in KNOWN_VIOLATIONS:
info.append(
f"WARN: {rel_path} — known violation (excluded from gate): "
f"{KNOWN_VIOLATIONS[rel_path]}"
)
continue
violations = scan_file(filepath, repo_root)
all_violations.extend(violations)
if not violations:
info.append(f"PASS: {rel_path} — sovereign (local-only)")
return all_violations, info
# ---------------------------------------------------------------------------
# Deep analysis helpers
# ---------------------------------------------------------------------------
def check_graph_store_network(repo_root: Path) -> str:
"""Analyze graph_store.py for its network dependencies."""
filepath = repo_root / "tools" / "graph_store.py"
if not filepath.exists():
return ""
content = filepath.read_text(encoding="utf-8")
if "GiteaClient" in content:
return (
"tools/graph_store.py uses GiteaClient for persistence — "
"this is an external API call. However, graph_store is NOT part of "
"the core memory path (MEMORY.md/USER.md/SQLite). It is a separate "
"knowledge graph system."
)
return ""
def check_session_search_llm(repo_root: Path) -> str:
"""Analyze session_search_tool.py for LLM usage."""
filepath = repo_root / "tools" / "session_search_tool.py"
if not filepath.exists():
return ""
content = filepath.read_text(encoding="utf-8")
warnings = []
if "async_call_llm" in content:
warnings.append("uses async_call_llm for summarization")
if "auxiliary_client" in content:
warnings.append("imports auxiliary_client (LLM calls)")
if warnings:
return (
f"tools/session_search_tool.py: {'; '.join(warnings)}. "
f"The FTS5 search is local SQLite, but session summarization "
f"involves LLM API calls."
)
return ""
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
repo_root = Path(__file__).resolve().parent.parent
print(f"Memory Sovereignty Verification")
print(f"Repository: {repo_root}")
print(f"Scanning {len(MEMORY_FILES)} memory-path files...")
print()
violations, info = verify_sovereignty(repo_root)
# Print info messages
for msg in info:
print(f" {msg}")
# Print deep analysis
print()
print("Deep analysis:")
for checker in [check_graph_store_network, check_session_search_llm]:
note = checker(repo_root)
if note:
print(f" NOTE: {note}")
print()
if violations:
print(f"SOVEREIGNTY VIOLATIONS FOUND: {len(violations)}")
print("=" * 60)
for v in violations:
print(v)
print()
print("=" * 60)
print(
f"FAIL: {len(violations)} potential network dependencies detected "
f"in the memory path."
)
print("Memory must be local-only (filesystem + SQLite).")
print()
print("If a violation is intentional and documented, add it to")
print("KNOWN_VIOLATIONS in this script with a justification.")
return 1
else:
print("PASS: Memory path is sovereign — no network dependencies detected.")
print("All memory operations use local filesystem and/or SQLite only.")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -895,7 +895,7 @@ class TestKimiMoonshotModelListIsolation:
def test_moonshot_list_excludes_coding_plan_only_models(self):
from hermes_cli.main import _PROVIDER_MODELS
moonshot_models = _PROVIDER_MODELS["moonshot"]
coding_plan_only = {"kimi-k2.5", "kimi-k2-thinking-turbo"}
coding_plan_only = {"kimi-k2-thinking-turbo"}
leaked = set(moonshot_models) & coding_plan_only
assert not leaked, f"Moonshot list contains Coding Plan-only models: {leaked}"