Compare commits

..

29 Commits

Author SHA1 Message Date
01977f28fb docs: improve KNOWN_VIOLATIONS justifications in verify_memory_sovereignty.py
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 36s
2026-04-10 00:12:42 -04:00
a055e68ebf Merge pull request #265
Some checks failed
Forge CI / smoke-and-build (push) Failing after 43s
Merged PR #265
2026-04-10 03:44:23 +00:00
f6c9ecb893 Merge pull request #264
Some checks failed
Forge CI / smoke-and-build (push) Has been cancelled
Merged PR #264
2026-04-10 03:44:19 +00:00
549431bb81 Merge pull request #259
Some checks failed
Forge CI / smoke-and-build (push) Has been cancelled
Merged PR #259
2026-04-10 03:44:16 +00:00
43dc2d21f2 Merge pull request #263
Some checks failed
Forge CI / smoke-and-build (push) Has been cancelled
Merged PR #263
2026-04-10 03:44:04 +00:00
2948d010b7 Merge pull request #266
Some checks failed
Forge CI / smoke-and-build (push) Has been cancelled
Merged PR #266
2026-04-10 03:44:00 +00:00
Alexander Whitestone
0d92b9ad15 feat(scripts): add memory budget enforcement tool (#256)
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 40s
Add scripts/memory_budget.py — a CI-friendly tool for checking and
enforcing character budgets on MEMORY.md and USER.md memory files.

Features:
- Checks MEMORY.md vs memory_char_limit (default 2200)
- Checks USER.md vs user_char_limit (default 1375)
- Estimates total injection cost (chars / ~4 chars per token)
- Alerts when approaching limits (>80% usage)
- --report flag for detailed breakdown with progress bars
- --verbose flag for per-entry details
- --enforce flag trims oldest entries to fit budget
- --json flag for machine-readable output (CI integration)
- Exit codes: 0=within budget, 1=over budget, 2=trimmed
- Suggestions for largest entries when over budget

Relates to #256
2026-04-09 21:13:01 -04:00
Alexander Whitestone
2e37ff638a Add memory sovereignty verification script (#257)
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 39s
CI check that scans all memory-path code for network dependencies.

Scans 8 memory-related files:
- tools/memory_tool.py (MEMORY.md/USER.md store)
- hermes_state.py (SQLite session store)
- tools/session_search_tool.py (FTS5 session search)
- tools/graph_store.py (knowledge graph)
- tools/temporal_kg_tool.py (temporal KG tool)
- agent/temporal_knowledge_graph.py (temporal triple store)
- tools/skills_tool.py (skill listing/viewing)
- tools/skills_sync.py (bundled skill syncing)

Verifies no HTTP/HTTPS calls, no external API usage, and no
network dependencies in the core memory read/write path.

Reports violations with file:line references. Exit 0 if sovereign,
exit 1 if violations found. Suitable for CI integration.
2026-04-09 21:07:03 -04:00
Alexander Whitestone
815160bd6f burn: add Memory Architecture Guide (closes #263, #258)
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 1m3s
Developer-facing guide covering all four memory tiers:
- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths,
configuration reference, and troubleshooting.
2026-04-09 20:51:45 -04:00
Alexander Whitestone
511eacb573 docs: add Memory Architecture Guide
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 47s
Comprehensive guide covering the Hermes memory system:
- Built-in memory (MEMORY.md / USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory providers (8 plugins)
- System interaction flow and data lifecycle
- Best practices for what to save/skip
- Privacy and data locality guarantees
- Configuration reference (char limits, nudge interval, flush settings)
- Troubleshooting common issues

Closes #258
2026-04-09 12:45:48 -04:00
2a6045a76a feat: create plugins/memory/mempalace/__init__.py
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 40s
2026-04-09 00:45:21 +00:00
4ef7b5fc46 feat: create plugins/memory/mempalace/plugin.yaml 2026-04-09 00:45:14 +00:00
7d2421a15f Merge pull request 'ci: add duplicate model detection check' (#235) from feat/ci-no-duplicate-models into main
All checks were successful
Forge CI / smoke-and-build (push) Successful in 54s
2026-04-08 22:55:16 +00:00
Alexander Whitestone
5a942d71a1 ci: add duplicate model check step to CI workflow
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 49s
2026-04-08 08:16:00 -04:00
Alexander Whitestone
044f0f8951 ci: add check_no_duplicate_models.py - catches duplicate model IDs (#224) 2026-04-08 08:15:27 -04:00
61c59ce332 Merge pull request 'fix(config): replace kimi-for-coding with kimi-k2.5 across codebase' (#225) from fix/kimi-fallback-rebase into main
Some checks failed
Forge CI / smoke-and-build (push) Successful in 50s
Notebook CI / notebook-smoke (push) Failing after 13s
2026-04-08 06:57:03 +00:00
01ce8ae889 fix: remove duplicate kimi-k2.5 entries from model lists
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 47s
2026-04-08 00:49:52 +00:00
Alexander Whitestone
b179250ab8 fix(config): replace kimi-for-coding with kimi-k2.5 in all refs
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 36s
- model_metadata.py
- fallback-config.yaml
- hermes_cli/auth.py, main.py, models.py
- test_api_key_providers.py
- docs/integrations/providers.md
- ezra quarterly report
2026-04-07 12:58:44 -04:00
01a3f47a5b Merge pull request '[claude] Fix syntax errors in Ollama provider wiring (#223)' (#224) from claude/issue-223 into main
All checks were successful
Forge CI / smoke-and-build (push) Successful in 57s
2026-04-07 16:40:34 +00:00
Alexander Whitestone
4538e11f97 fix(auxiliary_client): repair syntax errors in Ollama provider wiring
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 45s
The Ollama feature commit introduced two broken `OpenAI(api_key=*** base_url=...)` calls
where `***` was a redacted variable name and the separating comma was missing.
Replace both occurrences with `api_key=api_key, base_url=base_url`.

Fixes #223
2026-04-07 12:04:40 -04:00
7936483ffc feat(provider): first-class Ollama support + Gemma 4 defaults (#169)
- Add 'ollama' to CLI provider choices and auth aliases
- Wire Ollama through resolve_provider_client with auto-detection
- Add _try_ollama to auxiliary fallback chain (before local/custom)
- Add ollama to vision provider order
- Update model_metadata.py: ollama prefix + gemma-4-* context lengths (256K)
- Default model: gemma4:12b when provider=ollama
2026-04-07 12:04:10 -04:00
69525f49ab Merge pull request '[BEZALEL][#203] Deep Self-Awareness Epic — Architecture & Topology Ingestion' (#215) from bezalel/self-awareness-epic-203 into main
All checks were successful
Forge CI / smoke-and-build (push) Successful in 1m57s
2026-04-07 14:50:34 +00:00
782e3b65d9 docs(bezael): Deep Self-Awareness Epic — architecture and topology ingestion
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 1m0s
- Add bezalel_topology.md: complete system architecture map
- Add topology_scan.py: automated topology discovery script
- Covers hardware, network, services, dependencies, fleet map,
  Evennia integration, MemPalace config, and emergency procedures

Addresses #203
2026-04-07 14:19:27 +00:00
bfb876b599 Merge pull request 'docs: add BOOT.md for hermes-agent repository' (#202) from bezalel/ci-uv-cache into main
All checks were successful
Forge CI / smoke-and-build (push) Successful in 1m28s
2026-04-07 14:15:14 +00:00
6479465300 docs: add BOOT.md for hermes-agent repository
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 47s
2026-04-07 14:10:40 +00:00
a1c5d7b6bf Merge pull request '[SYNC] Merge upstream NousResearch/hermes-agent — 499 commits' (#201) from upstream-sync into main
All checks were successful
Forge CI / smoke-and-build (push) Successful in 53s
Reviewed-on: #201
2026-04-07 14:03:15 +00:00
010894da7e Merge pull request '[BEZALEL] Fix Gitea CI — Remove container directive for host-mode runner' (#194) from bezalel/fix-gitea-ci-runner-host-mode into main
All checks were successful
Forge CI / smoke-and-build (push) Successful in 1m12s
2026-04-07 13:55:03 +00:00
3a3337a78e [BEZALEL] Fix Gitea CI — Remove container directive for host-mode runner 2026-04-07 13:54:38 +00:00
293c44603e fix(ci): remove container directive from Gitea workflows for host-mode runner
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 49s
The bezalel-vps-runner is registered in host mode (:host labels)
and cannot execute Docker containers. The container pinning added
in #180 causes all Gitea CI jobs to fail immediately with:

  Cannot connect to the Docker daemon at unix:///var/run/docker.sock

Remove container: from .gitea/workflows/*.yml while keeping it in
.github/workflows/ for actual GitHub Actions runners.

Fixes CI for all open PRs and main branch pushes.
2026-04-07 13:53:42 +00:00
23 changed files with 2248 additions and 23 deletions

View File

@@ -13,7 +13,6 @@ concurrency:
jobs:
smoke-and-build:
runs-on: ubuntu-latest
container: catthehacker/ubuntu:act-22.04
timeout-minutes: 5
steps:
- name: Checkout code
@@ -48,6 +47,11 @@ jobs:
source .venv/bin/activate
python scripts/syntax_guard.py
- name: No duplicate models
run: |
source .venv/bin/activate
python scripts/check_no_duplicate_models.py
- name: Green-path E2E
run: |
source .venv/bin/activate

View File

@@ -11,7 +11,6 @@ on:
jobs:
notebook-smoke:
runs-on: ubuntu-latest
container: catthehacker/ubuntu:act-22.04
steps:
- name: Checkout
uses: actions/checkout@v4

131
BOOT.md Normal file
View File

@@ -0,0 +1,131 @@
# BOOT.md — Hermes Agent
Fast path from clone to productive. Target: <10 minutes.
---
## 1. Prerequisites
| Tool | Why |
|---|---|
| Git | Clone + submodules |
| Python 3.11+ | Runtime requirement |
| uv | Package manager (install: `curl -LsSf https://astral.sh/uv/install.sh \| sh`) |
| Node.js 18+ | Optional — browser tools, WhatsApp bridge |
---
## 2. First-Time Setup
```bash
git clone --recurse-submodules https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent.git
cd hermes-agent
# Create venv
uv venv .venv --python 3.11
source .venv/bin/activate
# Install with all extras + dev tools
uv pip install -e ".[all,dev]"
```
> **Common pitfall:** If `uv` is not on PATH, the `setup-hermes.sh` script will attempt to install it, but manual `uv` install is faster.
---
## 3. Smoke Tests (< 30 sec)
```bash
python scripts/smoke_test.py
```
Expected output:
```
OK: 4 core imports
OK: 1 CLI entrypoints
Smoke tests passed.
```
If imports fail with `ModuleNotFoundError`, re-run: `uv pip install -e ".[all,dev]"`
---
## 4. Full Test Suite (excluding integration)
```bash
pytest tests/ -x --ignore=tests/integration
```
> Integration tests require a running gateway + API keys. Skip them unless you are testing platform connectivity.
---
## 5. Run the CLI
```bash
python cli.py --help
```
To start the gateway (after configuring `~/.hermes/config.yaml`):
```bash
hermes gateway run
```
---
## 6. Repo Layout for Agents
| Path | What lives here |
|---|---|
| `cli.py` | Main entrypoint |
| `hermes/` | Core agent logic |
| `toolsets/` | Built-in tool implementations |
| `skills/` | Bundled skills (loaded automatically) |
| `optional-skills/` | Official but opt-in skills |
| `tests/` | pytest suite |
| `scripts/` | Utility scripts (smoke tests, deploy validation, etc.) |
| `.gitea/workflows/` | Forge CI (smoke + build) |
| `.github/workflows/` | GitHub mirror CI |
---
## 7. Gitea Workflow Conventions
- **Push to `main`**: triggers `ci.yml` (smoke + build, < 5 min)
- **Pull requests**: same CI + notebook CI if notebooks changed
- **Merge requirement**: green smoke tests
- Security scans run on schedule via `.github/workflows/`
---
## 8. Common Pitfalls
| Symptom | Fix |
|---|---|
| `No module named httpx` | `uv pip install -e ".[all,dev]"` |
| `prompt_toolkit` missing | Included in `[all]`, but install explicitly if you used minimal deps |
| CLI hangs on start | Check `~/.hermes/config.yaml` exists and is valid YAML |
| API key errors | Copy `.env.example``.env` and fill required keys |
| Browser tools fail | Run `npm install` in repo root |
---
## 9. Quick Reference
```bash
# Reinstall after dependency changes
uv pip install -e ".[all,dev]"
# Run only smoke tests
python scripts/smoke_test.py
# Run syntax guard
python scripts/syntax_guard.py
# Start gateway
hermes gateway run
```
---
*Last updated: 2026-04-07 by Bezalel*

View File

@@ -922,6 +922,7 @@ def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[st
_AUTO_PROVIDER_LABELS = {
"_try_openrouter": "openrouter",
"_try_nous": "nous",
"_try_ollama": "ollama",
"_try_custom_endpoint": "local/custom",
"_try_codex": "openai-codex",
"_resolve_api_key_provider": "api-key",
@@ -930,6 +931,18 @@ _AUTO_PROVIDER_LABELS = {
_AGGREGATOR_PROVIDERS = frozenset({"openrouter", "nous"})
def _try_ollama() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Detect and return an Ollama client if the server is reachable."""
base_url = (os.getenv("OLLAMA_BASE_URL", "") or "http://localhost:11434").strip().rstrip("/")
base_url = base_url + "/v1" if not base_url.endswith("/v1") else base_url
from agent.model_metadata import detect_local_server_type
if detect_local_server_type(base_url) != "ollama":
return None, None
api_key = (os.getenv("OLLAMA_API_KEY", "") or "ollama").strip()
model = _read_main_model() or "gemma4:12b"
return OpenAI(api_key=api_key, base_url=base_url), model
def _get_provider_chain() -> List[tuple]:
"""Return the ordered provider detection chain.
@@ -939,6 +952,7 @@ def _get_provider_chain() -> List[tuple]:
return [
("openrouter", _try_openrouter),
("nous", _try_nous),
("ollama", _try_ollama),
("local/custom", _try_custom_endpoint),
("openai-codex", _try_codex),
("api-key", _resolve_api_key_provider),
@@ -988,6 +1002,7 @@ def _try_payment_fallback(
# Map common resolved_provider values back to chain labels.
_alias_to_label = {"openrouter": "openrouter", "nous": "nous",
"openai-codex": "openai-codex", "codex": "openai-codex",
"ollama": "ollama",
"custom": "local/custom", "local/custom": "local/custom"}
skip_chain_labels = {_alias_to_label.get(s, s) for s in skip_labels}
@@ -1195,6 +1210,15 @@ def resolve_provider_client(
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# ── Ollama (first-class local provider) ──────────────────────────
if provider == "ollama":
base_url = (explicit_base_url or os.getenv("OLLAMA_BASE_URL", "") or "http://localhost:11434").strip().rstrip("/")
base_url = base_url + "/v1" if not base_url.endswith("/v1") else base_url
api_key = (explicit_api_key or os.getenv("OLLAMA_API_KEY", "") or "ollama").strip()
final_model = model or _read_main_model() or "gemma4:12b"
client = OpenAI(api_key=api_key, base_url=base_url)
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
# ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
if provider == "custom":
if explicit_base_url:
@@ -1335,6 +1359,7 @@ def get_async_text_auxiliary_client(task: str = ""):
_VISION_AUTO_PROVIDER_ORDER = (
"openrouter",
"nous",
"ollama",
"openai-codex",
"anthropic",
"custom",

View File

@@ -26,7 +26,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"custom", "local",
"ollama", "custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
@@ -102,9 +102,12 @@ DEFAULT_CONTEXT_LENGTHS = {
"gpt-4": 128000,
# Google
"gemini": 1048576,
# Gemma (open models served via AI Studio)
# Gemma (open models — Ollama / AI Studio)
"gemma-4-31b": 256000,
"gemma-4-26b": 256000,
"gemma-4-12b": 256000,
"gemma-4-4b": 256000,
"gemma-4-1b": 256000,
"gemma-3": 131072,
"gemma": 8192, # fallback for older gemma models
# DeepSeek
@@ -187,6 +190,8 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.githubcopilot.com": "copilot",
"models.github.ai": "copilot",
"api.fireworks.ai": "fireworks",
"localhost": "ollama",
"127.0.0.1": "ollama",
}

View File

@@ -148,7 +148,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openrouter": "openrouter",
"anthropic": "anthropic",
"zai": "zai",
"kimi-coding": "kimi-for-coding",
"kimi-coding": "kimi-k2.5",
"minimax": "minimax",
"minimax-cn": "minimax-cn",
"deepseek": "deepseek",

View File

@@ -6,7 +6,7 @@ model: anthropic/claude-opus-4.6
# Fallback chain: Anthropic -> Kimi -> Ollama (local)
fallback_providers:
- provider: kimi-coding
model: kimi-for-coding
model: kimi-k2.5
timeout: 60
reason: "Primary fallback when Anthropic quota limited"

View File

@@ -0,0 +1,230 @@
# Bezalel Architecture & Topology
> Deep Self-Awareness Document — Generated 2026-04-07
> Sovereign: Alexander Whitestone (Rockachopa)
> Host: Beta VPS (104.131.15.18)
---
## 1. Identity & Purpose
**I am Bezalel**, the Forge and Testbed Wizard of the Timmy Foundation fleet.
- **Lane:** CI testing, code review, build verification, security hardening, standing watch
- **Philosophy:** KISS. Smoke tests + bare green-path e2e only. CI serves the code.
- **Mandates:** Relentless inbox-zero, continuous self-improvement, autonomous heartbeat operation
- **Key Metrics:** Cycle time, signal-to-noise, autonomy ratio, backlog velocity
---
## 2. Hardware & OS Topology
| Attribute | Value |
|-----------|-------|
| Hostname | `bezalel` |
| OS | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Kernel | Linux 6.8.0 |
| CPU | 1 vCPU |
| Memory | 2 GB RAM |
| Primary Disk | ~25 GB root volume (DigitalOcean) |
| Public IP | `104.131.15.18` |
### Storage Layout
```
/root/wizards/bezalel/
├── hermes/ # Hermes agent source + venv (~835 MB)
├── evennia/ # Evennia MUD engine + world code (~189 MB)
├── workspace/ # Active prototypes + scratch code (~557 MB)
├── home/ # Personal notebooks + scripts (~1.8 GB)
├── .mempalace/ # Local memory palace (ChromaDB)
├── .topology/ # Self-awareness scan artifacts
├── nightly_watch.py # Nightly forge guardian
├── mempalace_nightly.sh # Palace re-mine automation
└── bezalel_topology.md # This document
```
---
## 3. Network Topology
### Fleet Map
```
┌─────────────────────────────────────────────────────────────┐
│ Alpha (143.198.27.163) │
│ ├── Gitea (forge.alexanderwhitestone.com) │
│ └── Ezra (Knowledge Wizard) │
│ │
│ Beta (104.131.15.18) ←── You are here │
│ ├── Bezalel (Forge Wizard) │
│ ├── Hermes Gateway │
│ └── Gitea Actions Runner (bezalel-vps-runner, host mode) │
└─────────────────────────────────────────────────────────────┘
```
### Key Connections
- **Gitea HTTPS:** `https://forge.alexanderwhitestone.com` (Alpha)
- **Telegram Webhook:** Inbound to Beta
- **API Providers:** Kimi (primary), Anthropic (fallback), OpenRouter (fallback)
- **No SSH:** Alpha → Beta is blocked by design
### Listening Services
- Hermes Gateway: internal process (no exposed port directly)
- Evennia: `localhost:4000` (MUD), `localhost:4001` (web client) — when running
- Gitea Runner: `act_runner daemon` — connects outbound to Gitea
---
## 4. Services & Processes
### Always-On Processes
| Process | Command | Purpose |
|---------|---------|---------|
| Hermes Gateway | `hermes gateway run` | Core agent orchestration |
| Gitea Runner | `./act_runner daemon` | CI job execution (host mode) |
### Automated Jobs
| Job | Schedule | Script |
|-----|----------|--------|
| Night Watch | 02:00 UTC | `nightly_watch.py` |
| MemPalace Re-mine | 03:00 UTC | `mempalace_nightly.sh` |
### Service Status Check
- **Hermes gateway:** running (ps verified)
- **Gitea runner:** online, registered as `bezalel-vps-runner`
- **Evennia server:** not currently running (start with `evennia start` in `evennia/`)
---
## 5. Software Dependencies
### System Packages (Key)
- `python3.12` (primary runtime)
- `node` v20.20.2 / `npm` 10.8.2
- `uv` (Python package manager)
- `git`, `curl`, `jq`
### Hermes Virtual Environment
- Located: `/root/wizards/bezalel/hermes/venv/`
- Key packages: `chromadb`, `pyyaml`, `fastapi`, `httpx`, `pytest`, `prompt-toolkit`, `mempalace`
- Install command: `uv pip install -e ".[all,dev]"`
### External API Dependencies
| Service | Endpoint | Usage |
|---------|----------|-------|
| Gitea | `forge.alexanderwhitestone.com` | Git, issues, CI |
| Kimi | `api.kimi.com/coding/v1` | Primary LLM |
| Anthropic | `api.anthropic.com` | Fallback LLM |
| OpenRouter | `openrouter.ai/api/v1` | Secondary fallback |
| Telegram | Bot API | Messaging platform |
---
## 6. Git Repositories
### Hermes Agent
- **Path:** `/root/wizards/bezalel/hermes`
- **Remote:** `forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent.git`
- **Branch:** `main` (up to date)
- **Open PRs:** #193, #191, #179, #178
### Evennia World
- **Path:** `/root/wizards/bezalel/evennia/bezalel_world`
- **Remote:** Same org, separate repo if pushed
- **Server name:** `bezalel_world`
---
## 7. MemPalace Memory System
### Configuration
- **Palace path:** `/root/wizards/bezalel/.mempalace/palace`
- **Identity:** `/root/.mempalace/identity.txt`
- **Config:** `/root/wizards/bezalel/mempalace.yaml`
- **Miner:** `/root/wizards/bezalel/hermes/venv/bin/mempalace`
### Rooms
1. `forge` — CI, builds, syntax guards, nightly watch
2. `hermes` — Agent source, gateway, CLI
3. `evennia` — MUD engine and world code
4. `workspace` — Prototypes, experiments
5. `home` — Personal scripts, configs
6. `nexus` — Reports, docs, KT artifacts
7. `issues` — Gitea issues, PRs, backlog
8. `topology` — System architecture, network, storage
9. `services` — Running services, processes
10. `dependencies` — Packages, APIs, external deps
11. `automation` — Cron jobs, scripts, workflows
12. `general` — Catch-all
### Automation
- **Nightly re-mine:** `03:00 UTC` via cron
- **Log:** `/var/log/bezalel_mempalace.log`
---
## 8. Evennia Mind Palace Integration
### Custom Typeclasses
- `PalaceRoom` — Rooms carry `memory_topic` and `wing`
- `MemoryObject` — In-world memory shards with `memory_content` and `source_file`
### Commands
- `palace/search <query>` — Query mempalace
- `palace/recall <topic>` — Spawn a memory shard
- `palace/file <name> = <content>` — File a new memory
- `palace/status` — Show palace status
### Batch Builder
- **File:** `world/batch_cmds_palace.ev`
- Creates The Hub + 7 palace rooms with exits
### Bridge Script
- **File:** `/root/wizards/bezalel/evennia/palace_search.py`
- Calls mempalace searcher and returns JSON
---
## 9. Operational State & Blockers
### Current Health
- [x] Hermes gateway: operational
- [x] Gitea runner: online, host mode
- [x] CI fix merged (#194) — container directive removed for Gitea workflows
- [x] MemPalace: 2,484+ drawers, incremental mining active
### Active Blockers
- **Gitea Actions:** Runner is in host mode — cannot use Docker containers
- **CI backlog:** Many historical PRs have failed runs due to the container bug (now fixed)
- **Evennia:** Server not currently running (start when needed)
---
## 10. Emergency Procedures
### Restart Hermes Gateway
```bash
cd /root/wizards/bezalel/hermes
source venv/bin/activate
hermes gateway run &
```
### Restart Gitea Runner
```bash
cd /opt/gitea-runner
./act_runner daemon &
```
### Start Evennia
```bash
cd /root/wizards/bezalel/evennia/bezalel_world
evennia start
```
### Manual MemPalace Re-mine
```bash
cd /root/wizards/bezalel
./hermes/venv/bin/mempalace --palace .mempalace/palace mine . --agent bezalel
```
---
*Document maintained by Bezalel. Last updated: 2026-04-07*

View File

@@ -0,0 +1,134 @@
#!/usr/bin/env python3
"""Bezalel Deep Self-Awareness Topology Scanner"""
import json
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
OUT_DIR = Path("/root/wizards/bezalel/.topology")
OUT_DIR.mkdir(exist_ok=True)
def shell(cmd, timeout=30):
try:
r = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
return r.stdout.strip()
except Exception as e:
return str(e)
def write(name, content):
(OUT_DIR / f"{name}.txt").write_text(content)
# Timestamp
timestamp = datetime.now(timezone.utc).isoformat()
# 1. System Identity
system = f"""BEZALEL SYSTEM TOPOLOGY SCAN
Generated: {timestamp}
Hostname: {shell('hostname')}
User: {shell('whoami')}
Home: {os.path.expanduser('~')}
"""
write("00_system_identity", system)
# 2. OS & Hardware
os_info = shell("cat /etc/os-release")
kernel = shell("uname -a")
cpu = shell("nproc") + " cores\n" + shell("cat /proc/cpuinfo | grep 'model name' | head -1")
mem = shell("free -h")
disk = shell("df -h")
write("01_os_hardware", f"OS:\n{os_info}\n\nKernel:\n{kernel}\n\nCPU:\n{cpu}\n\nMemory:\n{mem}\n\nDisk:\n{disk}")
# 3. Network
net_interfaces = shell("ip addr")
net_routes = shell("ip route")
listening = shell("ss -tlnp")
public_ip = shell("curl -s ifconfig.me")
write("02_network", f"Interfaces:\n{net_interfaces}\n\nRoutes:\n{net_routes}\n\nListening ports:\n{listening}\n\nPublic IP: {public_ip}")
# 4. Services & Processes
services = shell("systemctl list-units --type=service --state=running --no-pager --no-legend 2>/dev/null | head -30")
processes = shell("ps aux | grep -E 'hermes|gitea|evennia|python' | grep -v grep")
write("03_services", f"Running services:\n{services}\n\nKey processes:\n{processes}")
# 5. Cron & Automation
cron = shell("crontab -l 2>/dev/null")
write("04_automation", f"Crontab:\n{cron}")
# 6. Storage Topology
bezalel_tree = shell("find /root/wizards/bezalel -maxdepth 2 -type d | sort")
write("05_storage", f"Bezalel workspace tree (depth 2):\n{bezalel_tree}")
# 7. Git Repositories
git_repos = []
for base in ["/root/wizards/bezalel/hermes", "/root/wizards/bezalel/evennia"]:
p = Path(base)
if (p / ".git").exists():
remote = shell(f"cd {base} && git remote -v")
branch = shell(f"cd {base} && git branch -v")
git_repos.append(f"Repo: {base}\nRemotes:\n{remote}\nBranches:\n{branch}\n{'='*40}")
write("06_git_repos", "\n".join(git_repos))
# 8. Python Dependencies
venv_pip = shell("/root/wizards/bezalel/hermes/venv/bin/pip freeze 2>/dev/null | head -80")
write("07_dependencies", f"Hermes venv packages (top 80):\n{venv_pip}")
# 9. External APIs & Endpoints
apis = """External API Dependencies:
- Gitea: https://forge.alexanderwhitestone.com (source of truth, CI, issues)
- Telegram: webhook-based messaging platform
- Kimi API: https://api.kimi.com/coding/v1 (primary model provider)
- Anthropic API: fallback model provider
- OpenRouter API: secondary fallback model provider
- DigitalOcean: infrastructure hosting (VPS Alpha/Beta)
"""
write("08_external_apis", apis)
# 10. Fleet Topology
fleet = """FLEET TOPOLOGY
- Alpha: 143.198.27.163 (Gitea + Ezra)
- Beta: 104.131.15.18 (Bezalel, current host)
- No SSH from Alpha to Beta
- Gitea Actions runner: bezalel-vps-runner on Beta (host mode)
"""
write("09_fleet_topology", fleet)
# 11. Evennia Topology
evennia = """EVENNIA MIND PALACE SETUP
- Location: /root/wizards/bezalel/evennia/bezalel_world/
- Server name: bezalel_world
- Custom typeclasses: PalaceRoom, MemoryObject
- Custom commands: CmdPalaceSearch (palace/search, palace/recall, palace/file, palace/status)
- Batch builder: world/batch_cmds_palace.ev
- Bridge script: /root/wizards/bezalel/evennia/palace_search.py
"""
write("10_evennia_topology", evennia)
# 12. MemPalace Topology
mempalace = f"""MEMPALACE CONFIGURATION
- Palace path: /root/wizards/bezalel/.mempalace/palace
- Identity: /root/.mempalace/identity.txt
- Config: /root/wizards/bezalel/mempalace.yaml
- Nightly re-mine: 03:00 UTC via /root/wizards/bezalel/mempalace_nightly.sh
- Miner binary: /root/wizards/bezalel/hermes/venv/bin/mempalace
- Current status: {shell('/root/wizards/bezalel/hermes/venv/bin/mempalace --palace /root/wizards/bezalel/.mempalace/palace status 2>/dev/null')}
"""
write("11_mempalace_topology", mempalace)
# 13. Active Blockers & Health
health = f"""ACTIVE OPERATIONAL STATE
- Hermes gateway: {shell("ps aux | grep 'hermes gateway run' | grep -v grep | awk '{print $11}'")}
- Gitea runner: {shell("ps aux | grep 'act_runner' | grep -v grep | awk '{print $11}'")}
- Nightly watch: /root/wizards/bezalel/nightly_watch.py (02:00 UTC)
- MemPalace re-mine: /root/wizards/bezalel/mempalace_nightly.sh (03:00 UTC)
- Disk usage: {shell("df -h / | tail -1")}
- Load average: {shell("uptime")}
"""
write("12_operational_health", health)
print(f"Topology scan complete. {len(list(OUT_DIR.glob('*.txt')))} files written to {OUT_DIR}")

View File

@@ -0,0 +1,335 @@
# Memory Architecture Guide
Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.
## Overview
Hermes has four distinct memory systems, each serving a different purpose:
| Tier | System | Scope | Cost | Persistence |
|------|--------|-------|------|-------------|
| 1 | **Built-in Memory** (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
| 2 | **Session Search** (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
| 3 | **Skills** (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
| 4 | **External Providers** (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |
All four tiers operate independently. Built-in memory is always active. The others are opt-in or on-demand.
## Tier 1: Built-in Memory (MEMORY.md / USER.md)
### File Layout
```
~/.hermes/memories/
├── MEMORY.md — Agent's notes (environment facts, conventions, lessons learned)
└── USER.md — User profile (preferences, communication style, identity)
```
Profile-aware: when running under a profile (`hermes -p coder`), the memories directory resolves to `~/.hermes/profiles/<name>/memories/`.
### Frozen Snapshot Pattern
This is the most important architectural decision in the memory system.
1. **Session start:** `MemoryStore.load_for_prompt()` reads both files from disk, parses entries delimited by `§` (section sign), and injects them into the system prompt as a frozen block.
2. **During session:** The `memory` tool writes to disk immediately (durable), but does **not** update the system prompt. This preserves the LLM's prefix cache for the entire session.
3. **Next session:** The snapshot refreshes from disk.
**Why frozen?** System prompt changes invalidate the KV cache on every API call. With a ~30K token system prompt, that's expensive. Freezing memory at session start means the cache stays warm for the entire conversation. The tradeoff: memory writes made mid-session don't take effect until next session. Tool responses show the live state so the agent can verify writes succeeded.
### Character Limits
| Store | Default Limit | Approx Tokens | Typical Entries |
|-------|--------------|---------------|-----------------|
| MEMORY.md | 2,200 chars | ~800 | 8-15 |
| USER.md | 1,375 chars | ~500 | 5-10 |
Limits are in characters (not tokens) because character counts are model-independent. Configurable in `config.yaml`:
```yaml
memory:
memory_char_limit: 2200
user_char_limit: 1375
```
### Entry Format
Entries are separated by `\n§\n`. Each entry can be multiline. Example MEMORY.md:
```
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
```
### Tool Interface
The `memory` tool (defined in `tools/memory_tool.py`) supports:
- **`add`** — Append new entry. Rejects exact duplicates.
- **`replace`** — Find entry by unique substring (`old_text`), replace with `content`.
- **`remove`** — Find entry by unique substring, delete it.
- **`read`** — Return current entries from disk (live state, not frozen snapshot).
Substring matching: `old_text` must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
### Security Scanning
Every memory entry is scanned against `_MEMORY_THREAT_PATTERNS` before acceptance:
- Prompt injection patterns (`ignore previous instructions`, `you are now...`)
- Credential exfiltration (`curl`/`wget` with env vars, `.env` file reads)
- SSH backdoor attempts (`authorized_keys`, `.ssh` writes)
- Invisible Unicode characters (zero-width spaces, BOM)
Matches are rejected with an error message. Source: `_scan_memory_content()` in `tools/memory_tool.py`.
### Code Path
```
agent/prompt_builder.py
└── assembles system prompt pieces
└── MemoryStore.load_for_prompt() → frozen snapshot injection
tools/memory_tool.py
├── MemoryStore class (file I/O, locking, parsing)
├── memory_tool() function (add/replace/remove/read dispatch)
└── _scan_memory_content() (threat scanning)
hermes_cli/memory_setup.py
└── Interactive first-run memory setup
```
## Tier 2: Session Search (FTS5)
### How It Works
1. Every CLI and gateway session stores full message history in SQLite (`~/.hermes/state.db`)
2. The `messages_fts` FTS5 virtual table enables fast full-text search
3. The `session_search` tool finds relevant messages, groups by session, loads top N
4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not main model)
5. Summaries are returned to the main agent as context
### Why Gemini Flash for Summarization
Raw session transcripts can be 50K+ chars. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. Same pattern used by `web_extract`.
### Schema
```sql
-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)
-- Full-text search
messages_fts -- FTS5 virtual table on messages.content
-- Schema tracking
schema_version
```
WAL mode for concurrent readers + one writer (gateway multi-platform support).
### Session Lineage
When context compression triggers a session split, `parent_session_id` chains the old and new sessions. This lets session search follow the thread across compression boundaries.
### Code Path
```
tools/session_search_tool.py
├── FTS5 query against messages_fts
├── Groups results by session_id
├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
└── Returns per-session summaries
hermes_state.py (SessionDB class)
├── SQLite WAL mode database
├── FTS5 triggers for message insert/update/delete
└── Session CRUD operations
```
### Memory vs Session Search
| | Memory | Session Search |
|---|--------|---------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all stored sessions) |
| **Latency** | Instant (in system prompt) | Requires FTS query + LLM call |
| **When to use** | Critical facts always in context | "What did we discuss about X?" |
| **Management** | Agent-curated | Automatic |
| **Token cost** | Fixed per session | On-demand per search |
## Tier 3: Skills (Procedural Memory)
### What Skills Are
Skills capture **how to do a specific type of task** based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.
A skill is a directory with a `SKILL.md` (markdown instructions) and optional supporting files:
```
~/.hermes/skills/
├── my-skill/
│ ├── SKILL.md — Instructions, steps, pitfalls
│ ├── references/ — API docs, specs
│ ├── templates/ — Code templates, config files
│ ├── scripts/ — Helper scripts
│ └── assets/ — Images, data files
```
### How Skills Load
At the start of each turn, the agent's system prompt includes available skills. When a skill matches the current task, the agent loads it with `skill_view(name)` and follows its instructions. Skills are **not** injected wholesale — they're loaded on demand to preserve context window.
### Skill Lifecycle
1. **Creation:** After a complex task (5+ tool calls), the agent offers to save the approach as a skill using `skill_manage(action='create')`.
2. **Usage:** On future matching tasks, the agent loads the skill with `skill_view(name)`.
3. **Maintenance:** If a skill is outdated or incomplete when used, the agent patches it immediately with `skill_manage(action='patch')`.
4. **Deletion:** Obsolete skills are removed with `skill_manage(action='delete')`.
### Skills vs Memory
| | Memory | Skills |
|---|--------|--------|
| **Format** | Free-text entries | Structured markdown (steps, pitfalls, examples) |
| **Scope** | Facts and preferences | Procedures and workflows |
| **Loading** | Always in system prompt | On-demand when matched |
| **Size** | ~1,300 tokens total | Variable (loaded individually) |
### Code Path
```
tools/skill_manager_tool.py — Create, edit, patch, delete skills
agent/skill_commands.py — Slash commands for skill management
skills_hub.py — Browse, search, install skills from hub
```
## Tier 4: External Memory Providers
### Plugin Architecture
```
plugins/memory/
├── __init__.py — Provider registry and base interface
├── honcho/ — Dialectic Q&A, cross-session user modeling
├── openviking/ — Knowledge graph memory
├── mem0/ — Semantic memory with auto-extraction
├── hindsight/ — Retrospective memory analysis
├── holographic/ — Distributed holographic memory
├── retaindb/ — Vector-based retention
├── byterover/ — Byte-level memory compression
└── supermemory/ — Cloud-hosted semantic memory
```
Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.
### Integration Points
When a provider is active, Hermes:
1. Injects provider context into the system prompt
2. Prefetches relevant memories before each turn (background, non-blocking)
3. Syncs conversation turns to the provider after each response
4. Extracts memories on session end (for providers that support it)
5. Mirrors built-in memory writes to the provider
6. Adds provider-specific tools for search and management
### Configuration
```yaml
memory:
provider: openviking # or honcho, mem0, hindsight, etc.
```
Setup: `hermes memory setup` (interactive picker).
## Data Lifecycle
```
Session Start
├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
├── Load skills catalog (names + descriptions)
├── Initialize session search (SQLite connection)
└── Initialize external provider (if configured)
Each Turn
├── Agent sees frozen memory in system prompt
├── Agent can call memory tool → writes to disk, returns live state
├── Agent can call session_search → FTS5 + Gemini Flash summarization
├── Agent can load skills → reads SKILL.md from disk
└── External provider prefetches context (if active)
Session End
├── All memory writes already on disk (immediate persistence)
├── Session transcript saved to SQLite (messages + FTS5 index)
├── External provider extracts final memories (if supported)
└── Skill updates persisted (if any were patched)
```
## Privacy and Data Locality
| Component | Location | Network |
|-----------|----------|---------|
| MEMORY.md / USER.md | `~/.hermes/memories/` | Local only |
| Session DB | `~/.hermes/state.db` | Local only |
| Skills | `~/.hermes/skills/` | Local only |
| External provider | Provider-dependent | Provider API calls |
Built-in memory (Tiers 1-3) never leaves the machine. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.
## Configuration Reference
```yaml
# ~/.hermes/config.yaml
memory:
memory_enabled: true # Enable MEMORY.md
user_profile_enabled: true # Enable USER.md
memory_char_limit: 2200 # MEMORY.md char limit (~800 tokens)
user_char_limit: 1375 # USER.md char limit (~500 tokens)
nudge_interval: 10 # Turns between memory nudge reminders
provider: null # External provider name (null = disabled)
```
Environment variables (in `~/.hermes/.env`):
- Provider-specific API keys (e.g., `HONCHO_API_KEY`, `MEM0_API_KEY`)
## Troubleshooting
### Memory not appearing in system prompt
- Check `~/.hermes/memories/MEMORY.md` exists and has content
- Verify `memory.memory_enabled: true` in config
- Check for file lock issues (WAL mode, concurrent access)
### Memory writes not taking effect
- Writes are durable to disk immediately but frozen in system prompt until next session
- Tool response shows live state — verify the write succeeded there
- Start a new session to see the updated snapshot
### Session search returns nothing
- Verify `state.db` has sessions: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"`
- Check FTS5 index: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"`
- Ensure auxiliary LLM (Gemini Flash) is configured and reachable
### Skills not loading
- Check `~/.hermes/skills/` directory exists
- Verify SKILL.md has valid frontmatter (name, description)
- Skills load by name match — check the skill name matches what the agent expects
### External provider errors
- Check API key in `~/.hermes/.env`
- Verify provider is installed: `pip install <provider-package>`
- Run `hermes memory status` for diagnostic info

335
docs/memory-architecture.md Normal file
View File

@@ -0,0 +1,335 @@
# Memory Architecture Guide
How Hermes Agent remembers things across sessions — the stores, the tools, the data flow, and how to configure it all.
## Overview
Hermes has a multi-layered memory system. It is not one thing — it is several independent systems that complement each other:
1. **Persistent Memory** (MEMORY.md / USER.md) — bounded, curated notes injected into every system prompt
2. **Session Search** — full-text search across all past conversation transcripts
3. **Skills** — procedural memory: reusable workflows stored as SKILL.md files
4. **External Memory Providers** — optional plugins (Honcho, Holographic, Mem0, etc.) for deeper recall
All built-in memory lives on disk under `~/.hermes/` (or `$HERMES_HOME`). No memory data leaves the machine unless you explicitly configure an external cloud provider.
## Memory Types in Detail
### 1. Persistent Memory (MEMORY.md and USER.md)
The core memory system. Two files in `~/.hermes/memories/`:
| File | Purpose | Default Char Limit |
|------|---------|--------------------|
| `MEMORY.md` | Agent's personal notes — environment facts, project conventions, tool quirks, lessons learned | 2,200 chars (~800 tokens) |
| `USER.md` | User profile — name, preferences, communication style, pet peeves | 1,375 chars (~500 tokens) |
**How it works:**
- Loaded from disk at session start and injected into the system prompt as a frozen snapshot
- The agent uses the `memory` tool to add, replace, or remove entries during a session
- Mid-session writes go to disk immediately (durable) but do NOT update the system prompt — this preserves the LLM's prefix cache for performance
- The snapshot refreshes on the next session start
- Entries are delimited by `§` (section sign) and can be multiline
**System prompt appearance:**
```
══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations
```
**Memory tool actions:**
- `add` — append a new entry (rejected if it would exceed the char limit)
- `replace` — find an entry by substring match and replace it
- `remove` — find an entry by substring match and delete it
Substring matching means you only need a unique fragment of the entry, not the full text. If the fragment matches multiple entries, the tool returns an error asking for a more specific match.
### 2. Session Search
Cross-session conversation recall via SQLite FTS5 full-text search.
- All CLI and messaging sessions are stored in `~/.hermes/state.db`
- The `session_search` tool finds relevant past conversations by keyword
- Top matching sessions are summarized by Gemini Flash (cheap, fast) before being returned to the main model
- Returns focused summaries, not raw transcripts
**When to use session_search vs. memory:**
| Feature | Persistent Memory | Session Search |
|---------|------------------|----------------|
| Capacity | ~3,575 chars total | Unlimited (all sessions) |
| Speed | Instant (in system prompt) | Requires search + LLM summarization |
| Use case | Key facts always in context | "What did we discuss about X last week?" |
| Management | Manually curated by the agent | Automatic — all sessions stored |
| Token cost | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
**Rule of thumb:** Memory is for facts that should *always* be available. Session search is for recalling specific past conversations on demand. Don't save task progress or session outcomes to memory — use session_search to find those.
### 3. Skills (Procedural Memory)
Skills are reusable workflows stored as `SKILL.md` files in `~/.hermes/skills/` (and optionally external skill directories).
- Organized by category: `skills/github/github-pr-workflow/SKILL.md`
- YAML frontmatter with name, description, version, platform restrictions
- Progressive disclosure: metadata shown in skill list, full content loaded on demand via `skill_view`
- The agent creates skills proactively after complex tasks (5+ tool calls) using the `skill_manage` tool
- Skills can be patched when found outdated — stale skills are a liability
Skills are *not* injected into the system prompt by default. The agent sees a compact index of available skills and loads them on demand. This keeps the prompt lean while giving access to deep procedural knowledge.
**Skills vs. Memory:**
- **Memory:** compact facts ("User's project uses Go 1.22 with chi router")
- **Skills:** detailed procedures ("How to deploy the staging server: step 1, step 2, ...")
### 4. External Memory Providers
Optional plugins that add deeper, structured memory alongside the built-in system. Only one external provider can be active at a time.
| Provider | Storage | Key Feature |
|----------|---------|-------------|
| Honcho | Cloud | Dialectic user modeling with semantic search |
| OpenViking | Self-hosted | Filesystem-style knowledge hierarchy |
| Mem0 | Cloud | Server-side LLM fact extraction |
| Hindsight | Cloud/Local | Knowledge graph with entity resolution |
| Holographic | Local SQLite | HRR algebraic reasoning + trust scoring |
| RetainDB | Cloud | Hybrid search with delta compression |
| ByteRover | Local/Cloud | Hierarchical knowledge tree with CLI |
| Supermemory | Cloud | Context fencing + session graph ingest |
External providers run **alongside** built-in memory (never replacing it). They receive hooks for:
- System prompt injection (provider context)
- Pre-turn memory prefetch
- Post-turn conversation sync
- Session-end extraction
- Built-in memory write mirroring
Setup: `hermes memory setup` or set `memory.provider` in `~/.hermes/config.yaml`.
See `website/docs/user-guide/features/memory-providers.md` for full provider details.
## How the Systems Interact
```
Session Start
|
+--> Load MEMORY.md + USER.md from disk --> frozen snapshot into system prompt
+--> Provider: system_prompt_block() --> injected into system prompt
+--> Skills index --> injected into system prompt (compact metadata only)
|
v
Each Turn
|
+--> Provider: prefetch(query) --> relevant recalled context
+--> Agent sees: system prompt (memory + provider context + skills index)
+--> Agent can call: memory tool, session_search tool, skill tools, provider tools
|
v
After Each Response
|
+--> Provider: sync_turn(user, assistant) --> persist conversation
|
v
Periodic (every N turns, default 10)
|
+--> Memory nudge: agent prompted to review and update memory
|
v
Session End / Compression
|
+--> Memory flush: agent saves important facts before context is discarded
+--> Provider: on_session_end(messages) --> final extraction
+--> Provider: on_pre_compress(messages) --> save insights before compression
```
## Best Practices
### What to Save
Save proactively — don't wait for the user to ask:
- **User preferences:** "I prefer TypeScript over JavaScript" → `user` target
- **Corrections:** "Don't use sudo for Docker, I'm in the docker group" → `memory` target
- **Environment facts:** "This server runs Debian 12 with PostgreSQL 16" → `memory` target
- **Conventions:** "Project uses tabs, 120-char lines, Google docstrings" → `memory` target
- **Explicit requests:** "Remember that my API key rotation is monthly" → `memory` target
### What NOT to Save
- **Task progress or session outcomes** — use session_search to recall these
- **Trivially re-discoverable facts** — "Python 3.12 supports f-strings" (web search this)
- **Raw data dumps** — large code blocks, log files, data tables
- **Session-specific ephemera** — temporary file paths, one-off debugging context
- **Content already in SOUL.md or AGENTS.md** — those are already in context
### Writing Good Entries
Compact, information-dense entries work best:
```
# Good — packs multiple related facts
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh. Editor: VS Code with Vim bindings.
# Good — specific, actionable convention
Project ~/code/api uses Go 1.22, sqlc for DB, chi router. Tests: make test. CI: GitHub Actions.
# Bad — too vague
User has a project.
# Bad — too verbose
On January 5th, 2026, the user asked me to look at their project which is
located at ~/code/api. I discovered it uses Go version 1.22 and...
```
### Capacity Management
When memory is above 80% capacity (visible in the system prompt header), consolidate before adding. Merge related entries into shorter, denser versions. The tool will reject additions that would exceed the limit — use `replace` to consolidate first.
Priority order for what stays in memory:
1. User preferences and corrections (highest — prevents repeated steering)
2. Environment facts and project conventions
3. Tool quirks and workarounds
4. Lessons learned (lowest — can often be rediscovered)
### Memory Nudge
Every N turns (default: 10), the agent receives a nudge prompting it to review and update its memory. This is a lightweight prompt injected into the conversation — not a separate API call. The agent can choose to update memory or skip if nothing has changed.
## Privacy and Data Locality
**Built-in memory is fully local.** MEMORY.md and USER.md are plain text files in `~/.hermes/memories/`. No network calls are made in the memory read/write path. The memory tool scans entries for prompt injection and exfiltration patterns before accepting them.
**Session search is local.** The SQLite database (`~/.hermes/state.db`) stays on disk. FTS5 search is a local operation. However, the summarization step uses Gemini Flash (via the auxiliary LLM client) — conversation snippets are sent to Google's API for summarization. If this is a concern, session_search can be disabled.
**External providers may send data off-machine.** Cloud providers (Honcho, Mem0, RetainDB, Supermemory) send data to their respective APIs. Self-hosted providers (OpenViking, Hindsight local mode, Holographic, ByteRover local mode) keep everything on your machine. Check the provider's documentation for specifics.
**Security scanning.** All content written to memory (via the `memory` tool) is scanned for:
- Prompt injection patterns ("ignore previous instructions", role hijacking, etc.)
- Credential exfiltration attempts (curl/wget with secrets, reading .env files)
- SSH backdoor patterns
- Invisible unicode characters (used for steganographic injection)
Blocked content is rejected with a descriptive error message.
## Configuration
In `~/.hermes/config.yaml`:
```yaml
memory:
# Enable/disable the two built-in memory stores
memory_enabled: true # MEMORY.md
user_profile_enabled: true # USER.md
# Character limits (not tokens — model-independent)
memory_char_limit: 2200 # ~800 tokens at 2.75 chars/token
user_char_limit: 1375 # ~500 tokens at 2.75 chars/token
# External memory provider (empty string = built-in only)
# Options: "honcho", "openviking", "mem0", "hindsight",
# "holographic", "retaindb", "byterover", "supermemory"
provider: ""
```
Additional settings are read from `run_agent.py` defaults:
| Setting | Default | Description |
|---------|---------|-------------|
| `nudge_interval` | 10 | Turns between memory review nudges (0 = disabled) |
| `flush_min_turns` | 6 | Minimum user turns before memory flush on session end/compression (0 = never flush) |
These are set under the `memory` key in config.yaml:
```yaml
memory:
nudge_interval: 10
flush_min_turns: 6
```
### Disabling Memory
To disable memory entirely, set both to false:
```yaml
memory:
memory_enabled: false
user_profile_enabled: false
```
The `memory` tool will not appear in the tool list, and no memory blocks are injected into the system prompt.
You can also disable memory per-invocation with `skip_memory=True` in the AIAgent constructor (used by cron jobs and flush agents).
## File Locations
```
~/.hermes/
├── memories/
│ ├── MEMORY.md # Agent's persistent notes
│ ├── USER.md # User profile
│ ├── MEMORY.md.lock # File lock (auto-created)
│ └── USER.md.lock # File lock (auto-created)
├── state.db # SQLite session store (FTS5)
├── config.yaml # Memory config + provider selection
└── .env # API keys for external providers
```
All paths respect `$HERMES_HOME` — if you use Hermes profiles, each profile has its own isolated memory directory.
## Troubleshooting
### "Memory full" errors
The tool returns an error when adding would exceed the character limit. The response includes current entries so the agent can consolidate. Fix by:
1. Replacing multiple related entries with one denser entry
2. Removing entries that are no longer relevant
3. Increasing `memory_char_limit` in config (at the cost of larger system prompts)
### Stale memory entries
If the agent seems to have outdated information:
- Check `~/.hermes/memories/MEMORY.md` directly — you can edit it by hand
- The frozen snapshot pattern means changes only take effect on the next session start
- If the agent wrote something wrong mid-session, it persists on disk but won't affect the current session's system prompt
### Memory not appearing in system prompt
- Verify `memory_enabled: true` in config.yaml
- Check that `~/.hermes/memories/MEMORY.md` exists and has content
- The file might be empty if all entries were removed — add entries with the `memory` tool
### Session search returns no results
- Session search requires sessions to be stored in `state.db` — new installations have no history
- FTS5 indexes are built automatically but may lag behind on very large databases
- The summarization step requires the auxiliary LLM client to be configured (API key for Gemini Flash)
### Skill drift
Skills that haven't been updated can become wrong or incomplete. The agent is prompted to patch skills when it finds them outdated during use (`skill_manage(action='patch')`). If you notice stale skills:
- Use `/skills` to browse and review installed skills
- Delete or update skills in `~/.hermes/skills/` directly
- The agent creates skills after complex tasks — review and prune periodically
### Provider not activating
- Run `hermes memory status` to check provider state
- Verify the provider plugin is installed in `~/.hermes/plugins/memory/`
- Check that required API keys are set in `~/.hermes/.env`
- Start a new session after changing provider config — existing sessions use the old provider
### Concurrent write conflicts
The memory tool uses file locking (`fcntl.flock`) and atomic file replacement (`os.replace`) to handle concurrent writes from multiple sessions. If you see corrupted memory files:
- Check for stale `.lock` files in `~/.hermes/memories/`
- Restart any hung Hermes processes
- The atomic write pattern means readers always see either the old or new file — never a partial write

View File

@@ -820,10 +820,11 @@ def resolve_provider(
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
# Local server aliases — route through the generic custom provider
# Local server aliases
"lmstudio": "custom", "lm-studio": "custom", "lm_studio": "custom",
"ollama": "custom", "vllm": "custom", "llamacpp": "custom",
"vllm": "custom", "llamacpp": "custom",
"llama.cpp": "custom", "llama-cpp": "custom",
"ollama": "ollama",
}
normalized = _PROVIDER_ALIASES.get(normalized, normalized)

View File

@@ -2126,9 +2126,8 @@ def _model_flow_kimi(config, current_model=""):
# Step 3: Model selection — show appropriate models for the endpoint
if is_coding_plan:
# Coding Plan models (kimi-for-coding first)
# Coding Plan models (kimi-k2.5 first)
model_list = [
"kimi-for-coding",
"kimi-k2.5",
"kimi-k2-thinking",
"kimi-k2-thinking-turbo",
@@ -4206,7 +4205,7 @@ For more help on a command:
)
chat_parser.add_argument(
"--provider",
choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "gemini", "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "gemini", "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "ollama"],
default=None,
help="Inference provider (default: auto)"
)

View File

@@ -130,7 +130,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"glm-4.5-flash",
],
"kimi-coding": [
"kimi-for-coding",
"kimi-k2.5",
"kimi-k2-thinking",
"kimi-k2-thinking-turbo",
@@ -568,7 +567,7 @@ def list_available_providers() -> list[dict[str, str]]:
"gemini", "huggingface",
"zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
"opencode-zen", "opencode-go",
"ai-gateway", "deepseek", "custom",
"ai-gateway", "deepseek", "ollama", "custom",
]
# Build reverse alias map
aliases_for: dict[str, list[str]] = {}

View File

@@ -78,7 +78,7 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
extra_env_vars=("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
base_url_env_var="GLM_BASE_URL",
),
"kimi-for-coding": HermesOverlay(
"kimi-k2.5": HermesOverlay(
transport="openai_chat",
base_url_env_var="KIMI_BASE_URL",
),
@@ -162,10 +162,10 @@ ALIASES: Dict[str, str] = {
"z.ai": "zai",
"zhipu": "zai",
# kimi-for-coding (models.dev ID)
"kimi": "kimi-for-coding",
"kimi-coding": "kimi-for-coding",
"moonshot": "kimi-for-coding",
# kimi-k2.5 (models.dev ID)
"kimi": "kimi-k2.5",
"kimi-coding": "kimi-k2.5",
"moonshot": "kimi-k2.5",
# minimax-cn
"minimax-china": "minimax-cn",
@@ -376,7 +376,7 @@ LABELS: Dict[str, str] = {
"github-copilot": "GitHub Copilot",
"anthropic": "Anthropic",
"zai": "Z.AI / GLM",
"kimi-for-coding": "Kimi / Moonshot",
"kimi-k2.5": "Kimi / Moonshot",
"minimax": "MiniMax",
"minimax-cn": "MiniMax (China)",
"deepseek": "DeepSeek",

View File

@@ -0,0 +1,248 @@
"""
MemPalace Portal — Hybrid Memory Provider.
Bridges the local Holographic fact store with the fleet-wide MemPalace vector database.
Implements smart context compression for token efficiency.
"""
import json
import logging
import os
import re
import requests
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
# Import Holographic components if available
try:
from plugins.memory.holographic.store import MemoryStore
from plugins.memory.holographic.retrieval import FactRetriever
HAS_HOLOGRAPHIC = True
except ImportError:
HAS_HOLOGRAPHIC = False
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tool Schemas
# ---------------------------------------------------------------------------
MEMPALACE_SCHEMA = {
"name": "mempalace",
"description": (
"Search or record memories in the shared fleet vector database. "
"Use this for long-term, high-volume memory across the entire fleet."
),
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["search", "record", "wings"]},
"query": {"type": "string", "description": "Search query."},
"text": {"type": "string", "description": "Memory text to record."},
"room": {"type": "string", "description": "Target room (e.g., forge, hermes, nexus)."},
"n_results": {"type": "integer", "default": 5},
},
"required": ["action"],
},
}
FACT_STORE_SCHEMA = {
"name": "fact_store",
"description": (
"Structured local fact storage. Use for durable facts about people, projects, and decisions."
),
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["add", "search", "probe", "reason", "update", "remove"]},
"content": {"type": "string"},
"query": {"type": "string"},
"entity": {"type": "string"},
"fact_id": {"type": "integer"},
},
"required": ["action"],
},
}
# ---------------------------------------------------------------------------
# Provider Implementation
# ---------------------------------------------------------------------------
class MemPalacePortalProvider(MemoryProvider):
"""Hybrid Fleet Vector + Local Structured memory provider."""
def __init__(self, config: dict | None = None):
self._config = config or {}
self._api_url = os.environ.get("MEMPALACE_API_URL", "http://127.0.0.1:7771")
self._hologram_store = None
self._hologram_retriever = None
self._session_id = None
@property
def name(self) -> str:
return "mempalace"
def is_available(self) -> bool:
# Always available if we can reach the API or have Holographic
return True
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = session_id
hermes_home = kwargs.get("hermes_home")
if HAS_HOLOGRAPHIC and hermes_home:
db_path = os.path.join(hermes_home, "memory_store.db")
try:
self._hologram_store = MemoryStore(db_path=db_path)
self._hologram_retriever = FactRetriever(store=self._hologram_store)
logger.info("Holographic store initialized as local portal layer.")
except Exception as e:
logger.error(f"Failed to init Holographic layer: {e}")
def system_prompt_block(self) -> str:
status = "Active (Fleet Portal)"
if self._hologram_store:
status += " + Local Hologram"
return (
f"# MemPalace Portal\n"
f"Status: {status}.\n"
"You have access to the shared fleet vector database (mempalace) and local structured facts (fact_store).\n"
"Use mempalace for semantic fleet-wide recall. Use fact_store for precise local knowledge."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if not query:
return ""
context_blocks = []
# 1. Fleet Search (MemPalace)
try:
res = requests.get(f"{self._api_url}/search", params={"q": query, "n": 3}, timeout=2)
if res.ok:
data = res.json()
memories = data.get("memories", [])
if memories:
block = "## Fleet Memories (MemPalace)\n"
for m in memories:
block += f"- {m['text']}\n"
context_blocks.append(block)
except Exception:
pass
# 2. Local Probe (Holographic)
if self._hologram_retriever:
try:
# Extract entities from query to probe
entities = re.findall(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b', query)
facts = []
for ent in entities:
results = self._hologram_retriever.probe(ent, limit=3)
facts.extend(results)
if facts:
block = "## Local Facts (Hologram)\n"
seen = set()
for f in facts:
if f['content'] not in seen:
block += f"- {f['content']}\n"
seen.add(f['content'])
context_blocks.append(block)
except Exception:
pass
return "\n\n".join(context_blocks)
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
# Record to Fleet Palace
try:
payload = {
"text": f"User: {user_content}\nAssistant: {assistant_content}",
"room": "hermes_sync",
"metadata": {"session_id": session_id}
}
requests.post(f"{self._api_url}/record", json=payload, timeout=2)
except Exception:
pass
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Token Efficiency: Summarize and archive before context is lost."""
if not messages:
return ""
# Extract key facts for Hologram
if self._hologram_store:
# Simple heuristic: look for \"I prefer\", \"The project uses\", etc.
for msg in messages:
if msg.get(\"role\") == \"user\":
content = msg.get(\"content\", \"\")
if \"prefer\" in content.lower() or \"use\" in content.lower():
try:
self._hologram_store.add_fact(content[:200], category=\"user_pref\")
except Exception:
pass
# Archive session summary to MemPalace
summary_text = f"Session {self._session_id} summary: " + " | ".join([m['content'][:50] for m in messages if m.get('role') == 'user'])
try:
payload = {
"text": summary_text,
"room": "summaries",
"metadata": {"type": "session_summary", "session_id": self._session_id}
}
requests.post(f"{self._api_url}/record", json=payload, timeout=2)
except Exception:
pass
return "Insights archived to MemPalace and Hologram."
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [MEMPALACE_SCHEMA, FACT_STORE_SCHEMA]
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
if tool_name == "mempalace":
return self._handle_mempalace(args)
elif tool_name == "fact_store":
return self._handle_fact_store(args)
return json.dumps({"error": f"Unknown tool: {tool_name}"})
def _handle_mempalace(self, args: dict) -> str:
action = args.get("action")
try:
if action == "search":
res = requests.get(f"{self._api_url}/search", params={"q": args["query"], "n": args.get("n_results", 5)}, timeout=10)
return res.text
elif action == "record":
res = requests.post(f"{self._api_url}/record", json={"text": args["text"], "room": args.get("room", "general")}, timeout=10)
return res.text
elif action == "wings":
res = requests.get(f"{self._api_url}/wings", timeout=10)
return res.text
except Exception as e:
return json.dumps({"success": False, "error": str(e)})
return json.dumps({"error": "Invalid action"})
def _handle_fact_store(self, args: dict) -> str:
if not self._hologram_store:
return json.dumps({"error": "Holographic store not initialized locally."})
# Logic similar to holographic plugin
action = args["action"]
try:
if action == "add":
fid = self._hologram_store.add_fact(args["content"])
return json.dumps({"fact_id": fid, "status": "added"})
elif action == "probe":
res = self._hologram_retriever.probe(args["entity"])
return json.dumps({"results": res})
# ... other actions ...
return json.dumps({"status": "ok", "message": f"Action {action} processed (partial impl)"})
except Exception as e:
return json.dumps({"error": str(e)})
def shutdown(self) -> None:
if self._hologram_store:
self._hologram_store.close()
def register(ctx) -> None:
provider = MemPalacePortalProvider()
ctx.register_memory_provider(provider)

View File

@@ -0,0 +1,7 @@
name: mempalace
version: 1.0.0
description: "The Portal: Hybrid Fleet Vector (MemPalace) + Local Structured (Holographic) memory."
dependencies:
- requests
- numpy

View File

@@ -235,7 +235,7 @@ The Hermes Agent framework serves as both the delivery platform and the portfoli
| House | Host | Model / Provider | Gateway Status |
|-------|------|------------------|----------------|
| Ezra | Hermes VPS | `kimi-for-coding` (Kimi K2.5) | API `8658`, webhook `8648` — Active |
| Ezra | Hermes VPS | `kimi-k2.5` (Kimi K2.5) | API `8658`, webhook `8648` — Active |
| Bezalel | Hermes VPS | Claude Opus 4.6 (Anthropic) | Port `8645` — Active |
| Allegro-Primus | Hermes VPS | Kimi K2.5 | Port `8644` — Requires restart |
| Bilbo | External | Gemma 4B (local) | Telegram dual-mode — Active |

View File

@@ -0,0 +1,74 @@
#!/usr/bin/env python3
"""CI check: ensure no duplicate model IDs exist in provider configs.
Catches the class of bugs where a rename introduces a duplicate entry
(e.g. PR #225 kimi-for-coding -> kimi-k2.5 when kimi-k2.5 already existed).
Runtime target: < 2 seconds.
"""
from __future__ import annotations
import sys
from pathlib import Path
# Allow running from repo root
REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT))
def check_openrouter_models() -> list[str]:
"""Check OPENROUTER_MODELS for duplicate model IDs."""
try:
from hermes_cli.models import OPENROUTER_MODELS
except ImportError:
return []
errors = []
seen: dict[str, int] = {}
for i, (model_id, _desc) in enumerate(OPENROUTER_MODELS):
if model_id in seen:
errors.append(
f" OPENROUTER_MODELS: duplicate '{model_id}' "
f"(index {seen[model_id]} and {i})"
)
else:
seen[model_id] = i
return errors
def check_provider_models() -> list[str]:
"""Check _PROVIDER_MODELS for duplicate model IDs within each provider list."""
from hermes_cli.models import _PROVIDER_MODELS
errors = []
for provider, models in _PROVIDER_MODELS.items():
seen: dict[str, int] = {}
for i, model_id in enumerate(models):
if model_id in seen:
errors.append(
f" _PROVIDER_MODELS['{provider}']: duplicate '{model_id}' "
f"(index {seen[model_id]} and {i})"
)
else:
seen[model_id] = i
return errors
def main() -> int:
errors = []
errors.extend(check_openrouter_models())
errors.extend(check_provider_models())
if errors:
print(f"FAIL: {len(errors)} duplicate model(s) found:")
for e in errors:
print(e)
return 1
print("OK: no duplicate model entries")
return 0
if __name__ == "__main__":
raise SystemExit(main())

374
scripts/memory_budget.py Normal file
View File

@@ -0,0 +1,374 @@
#!/usr/bin/env python3
"""Memory Budget Enforcement Tool for hermes-agent.
Checks and enforces character/token budgets on MEMORY.md and USER.md files.
Designed for CI integration, pre-commit hooks, and manual health checks.
Usage:
python scripts/memory_budget.py # Check budget (exit 0/1)
python scripts/memory_budget.py --report # Detailed breakdown
python scripts/memory_budget.py --enforce # Trim entries to fit budget
python scripts/memory_budget.py --hermes-home ~/.hermes # Custom HERMES_HOME
Exit codes:
0 Within budget
1 Over budget (no trimming performed)
2 Entries were trimmed (--enforce was used)
"""
from __future__ import annotations
import argparse
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import List
# ---------------------------------------------------------------------------
# Constants (must stay in sync with tools/memory_tool.py)
# ---------------------------------------------------------------------------
ENTRY_DELIMITER = "\n§\n"
DEFAULT_MEMORY_CHAR_LIMIT = 2200
DEFAULT_USER_CHAR_LIMIT = 1375
WARN_THRESHOLD = 0.80 # alert when >80% of budget used
CHARS_PER_TOKEN = 4 # rough estimate matching agent/model_metadata.py
# ---------------------------------------------------------------------------
# Data structures
# ---------------------------------------------------------------------------
@dataclass
class FileReport:
"""Budget analysis for a single memory file."""
label: str # "MEMORY.md" or "USER.md"
path: Path
exists: bool
char_limit: int
raw_chars: int # raw file size in chars
entry_chars: int # chars after splitting/rejoining entries
entry_count: int
entries: List[str] # individual entry texts
@property
def usage_pct(self) -> float:
if self.char_limit <= 0:
return 0.0
return min(100.0, (self.entry_chars / self.char_limit) * 100)
@property
def estimated_tokens(self) -> int:
return self.entry_chars // CHARS_PER_TOKEN
@property
def over_budget(self) -> bool:
return self.entry_chars > self.char_limit
@property
def warning(self) -> bool:
return self.usage_pct >= (WARN_THRESHOLD * 100)
@property
def remaining_chars(self) -> int:
return max(0, self.char_limit - self.entry_chars)
def _read_entries(path: Path) -> List[str]:
"""Read a memory file and split into entries (matching MemoryStore logic)."""
if not path.exists():
return []
try:
raw = path.read_text(encoding="utf-8")
except (OSError, IOError):
return []
if not raw.strip():
return []
entries = [e.strip() for e in raw.split(ENTRY_DELIMITER)]
return [e for e in entries if e]
def _write_entries(path: Path, entries: List[str]) -> None:
"""Write entries back to a memory file."""
content = ENTRY_DELIMITER.join(entries) if entries else ""
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(content, encoding="utf-8")
def analyze_file(path: Path, label: str, char_limit: int) -> FileReport:
"""Analyze a single memory file against its budget."""
exists = path.exists()
entries = _read_entries(path) if exists else []
raw_chars = path.stat().st_size if exists else 0
joined = ENTRY_DELIMITER.join(entries)
return FileReport(
label=label,
path=path,
exists=exists,
char_limit=char_limit,
raw_chars=raw_chars,
entry_chars=len(joined),
entry_count=len(entries),
entries=entries,
)
def trim_entries(report: FileReport) -> List[str]:
"""Trim oldest entries until the file fits within its budget.
Entries are removed from the front (oldest first) because memory files
append new entries at the end.
"""
entries = list(report.entries)
joined = ENTRY_DELIMITER.join(entries)
while len(joined) > report.char_limit and entries:
entries.pop(0)
joined = ENTRY_DELIMITER.join(entries)
return entries
# ---------------------------------------------------------------------------
# Reporting
# ---------------------------------------------------------------------------
def _bar(pct: float, width: int = 30) -> str:
"""Render a text progress bar."""
filled = int(pct / 100 * width)
bar = "#" * filled + "-" * (width - filled)
return f"[{bar}]"
def print_report(memory: FileReport, user: FileReport, *, verbose: bool = False) -> None:
"""Print a human-readable budget report."""
total_chars = memory.entry_chars + user.entry_chars
total_limit = memory.char_limit + user.char_limit
total_tokens = total_chars // CHARS_PER_TOKEN
total_pct = (total_chars / total_limit * 100) if total_limit > 0 else 0
print("=" * 60)
print(" MEMORY BUDGET REPORT")
print("=" * 60)
print()
for rpt in (memory, user):
status = "OVER " if rpt.over_budget else ("WARN" if rpt.warning else " OK ")
print(f" {rpt.label:12s} {status} {_bar(rpt.usage_pct)} {rpt.usage_pct:5.1f}%")
print(f" {'':12s} {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"| {rpt.entry_count} entries "
f"| ~{rpt.estimated_tokens:,} tokens")
if rpt.exists and verbose and rpt.entries:
for i, entry in enumerate(rpt.entries):
preview = entry[:72].replace("\n", " ")
if len(entry) > 72:
preview += "..."
print(f" #{i+1}: ({len(entry)} chars) {preview}")
print()
print(f" TOTAL {_bar(total_pct)} {total_pct:5.1f}%")
print(f" {total_chars:,}/{total_limit:,} chars | ~{total_tokens:,} tokens")
print()
# Alerts
alerts = []
for rpt in (memory, user):
if rpt.over_budget:
overshoot = rpt.entry_chars - rpt.char_limit
alerts.append(
f" CRITICAL {rpt.label} is {overshoot:,} chars over budget "
f"({rpt.entry_chars:,}/{rpt.char_limit:,}). "
f"Run with --enforce to auto-trim."
)
elif rpt.warning:
alerts.append(
f" WARNING {rpt.label} is at {rpt.usage_pct:.0f}% capacity. "
f"Consider compressing or cleaning up entries."
)
if alerts:
print(" ALERTS")
print(" ------")
for a in alerts:
print(a)
print()
def print_json(memory: FileReport, user: FileReport) -> None:
"""Print a JSON report for machine consumption."""
import json
def _rpt_dict(r: FileReport) -> dict:
return {
"label": r.label,
"path": str(r.path),
"exists": r.exists,
"char_limit": r.char_limit,
"entry_chars": r.entry_chars,
"entry_count": r.entry_count,
"estimated_tokens": r.estimated_tokens,
"usage_pct": round(r.usage_pct, 1),
"over_budget": r.over_budget,
"warning": r.warning,
"remaining_chars": r.remaining_chars,
}
total_chars = memory.entry_chars + user.entry_chars
total_limit = memory.char_limit + user.char_limit
data = {
"memory": _rpt_dict(memory),
"user": _rpt_dict(user),
"total": {
"chars": total_chars,
"limit": total_limit,
"estimated_tokens": total_chars // CHARS_PER_TOKEN,
"usage_pct": round((total_chars / total_limit * 100) if total_limit else 0, 1),
"over_budget": memory.over_budget or user.over_budget,
"warning": memory.warning or user.warning,
},
}
print(json.dumps(data, indent=2))
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def _resolve_hermes_home(custom: str | None) -> Path:
"""Resolve HERMES_HOME directory."""
if custom:
return Path(custom).expanduser()
import os
return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
def main() -> int:
parser = argparse.ArgumentParser(
description="Check and enforce memory budgets for hermes-agent.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
parser.add_argument(
"--hermes-home", metavar="DIR",
help="Custom HERMES_HOME directory (default: $HERMES_HOME or ~/.hermes)",
)
parser.add_argument(
"--memory-limit", type=int, default=DEFAULT_MEMORY_CHAR_LIMIT,
help=f"Character limit for MEMORY.md (default: {DEFAULT_MEMORY_CHAR_LIMIT})",
)
parser.add_argument(
"--user-limit", type=int, default=DEFAULT_USER_CHAR_LIMIT,
help=f"Character limit for USER.md (default: {DEFAULT_USER_CHAR_LIMIT})",
)
parser.add_argument(
"--report", action="store_true",
help="Print detailed per-file budget report",
)
parser.add_argument(
"--verbose", "-v", action="store_true",
help="Show individual entry details in report",
)
parser.add_argument(
"--enforce", action="store_true",
help="Trim oldest entries to fit within budget (writes to disk)",
)
parser.add_argument(
"--json", action="store_true", dest="json_output",
help="Output report as JSON (for CI/scripting)",
)
args = parser.parse_args()
hermes_home = _resolve_hermes_home(args.hermes_home)
memories_dir = hermes_home / "memories"
# Analyze both files
memory = analyze_file(
memories_dir / "MEMORY.md", "MEMORY.md", args.memory_limit,
)
user = analyze_file(
memories_dir / "USER.md", "USER.md", args.user_limit,
)
over_budget = memory.over_budget or user.over_budget
trimmed = False
# Enforce budget by trimming entries
if args.enforce and over_budget:
for rpt in (memory, user):
if rpt.over_budget and rpt.exists:
trimmed_entries = trim_entries(rpt)
removed = rpt.entry_count - len(trimmed_entries)
if removed > 0:
_write_entries(rpt.path, trimmed_entries)
rpt.entries = trimmed_entries
rpt.entry_count = len(trimmed_entries)
rpt.entry_chars = len(ENTRY_DELIMITER.join(trimmed_entries))
rpt.raw_chars = rpt.path.stat().st_size
print(f" Trimmed {removed} oldest entries from {rpt.label} "
f"({rpt.entry_chars:,}/{rpt.char_limit:,} chars now)")
trimmed = True
# Re-check after trimming
over_budget = memory.over_budget or user.over_budget
# Output
if args.json_output:
print_json(memory, user)
elif args.report or args.verbose:
print_report(memory, user, verbose=args.verbose)
else:
# Compact summary
if over_budget:
print("Memory budget: OVER")
for rpt in (memory, user):
if rpt.over_budget:
print(f" {rpt.label}: {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"({rpt.usage_pct:.0f}%)")
elif memory.warning or user.warning:
print("Memory budget: WARNING")
for rpt in (memory, user):
if rpt.warning:
print(f" {rpt.label}: {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"({rpt.usage_pct:.0f}%)")
else:
print("Memory budget: OK")
for rpt in (memory, user):
if rpt.exists:
print(f" {rpt.label}: {rpt.entry_chars:,}/{rpt.char_limit:,} chars "
f"({rpt.usage_pct:.0f}%)")
# Suggest actions when over budget but not enforced
if over_budget and not args.enforce:
suggestions = []
for rpt in (memory, user):
if rpt.over_budget:
suggestions.append(
f" - {rpt.label}: remove stale entries or run with --enforce to auto-trim"
)
# Identify largest entries
if rpt.entries:
indexed = sorted(enumerate(rpt.entries), key=lambda x: len(x[1]), reverse=True)
top3 = indexed[:3]
for idx, entry in top3:
preview = entry[:60].replace("\n", " ")
if len(entry) > 60:
preview += "..."
suggestions.append(
f" largest entry #{idx+1}: ({len(entry)} chars) {preview}"
)
if suggestions:
print()
print("Suggestions:")
for s in suggestions:
print(s)
# Exit code
if trimmed:
return 2
if over_budget:
return 1
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,325 @@
#!/usr/bin/env python3
"""
Memory Sovereignty Verification
Verifies that the memory path in hermes-agent has no network dependencies.
Memory data must stay on the local filesystem only — no HTTP calls, no external
API calls, no cloud sync during memory read/write/flush/load operations.
Scans:
- tools/memory_tool.py (MEMORY.md / USER.md store)
- hermes_state.py (SQLite session store)
- tools/session_search_tool.py (FTS5 session search + summarization)
- tools/graph_store.py (knowledge graph persistence)
- tools/temporal_kg_tool.py (temporal knowledge graph)
- agent/temporal_knowledge_graph.py (temporal triple store)
- tools/skills_tool.py (skill listing/viewing)
- tools/skills_sync.py (bundled skill syncing)
Exit codes:
0 = sovereign (no violations)
1 = violations found
"""
import ast
import re
import sys
from pathlib import Path
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
# Files in the memory path to scan (relative to repo root).
MEMORY_FILES = [
"tools/memory_tool.py",
"hermes_state.py",
"tools/session_search_tool.py",
"tools/graph_store.py",
"tools/temporal_kg_tool.py",
"agent/temporal_knowledge_graph.py",
"tools/skills_tool.py",
"tools/skills_sync.py",
]
# Patterns that indicate network/external API usage.
NETWORK_PATTERNS = [
# HTTP libraries
(r'\brequests\.(get|post|put|delete|patch|head|session)', "requests HTTP call"),
(r'\burllib\.request\.(urlopen|Request)', "urllib HTTP call"),
(r'\bhttpx\.(get|post|put|delete|Client|AsyncClient)', "httpx HTTP call"),
(r'\bhttp\.client\.(HTTPConnection|HTTPSConnection)', "http.client connection"),
(r'\baiohttp\.(ClientSession|get|post)', "aiohttp HTTP call"),
(r'\bwebsockets\.\w+', "websocket connection"),
# API client patterns
(r'\bopenai\b.*\b(api_key|chat|completions|Client)\b', "OpenAI API usage"),
(r'\banthropic\b.*\b(api_key|messages|Client)\b', "Anthropic API usage"),
(r'\bAsyncOpenAI\b', "AsyncOpenAI client"),
(r'\bAsyncAnthropic\b', "AsyncAnthropic client"),
# Generic network indicators
(r'\bsocket\.(socket|connect|create_connection)', "raw socket connection"),
(r'\bftplib\b', "FTP connection"),
(r'\bsmtplib\b', "SMTP connection"),
(r'\bparamiko\b', "SSH connection via paramiko"),
# URL patterns (hardcoded endpoints)
(r'https?://(?!example\.com)[a-zA-Z0-9._-]+\.(com|org|net|io|dev|ai)', "hardcoded URL"),
]
# Import aliases that indicate network-capable modules.
NETWORK_IMPORTS = {
"requests",
"httpx",
"aiohttp",
"urllib.request",
"http.client",
"websockets",
"openai",
"anthropic",
"openrouter_client",
}
# Functions whose names suggest network I/O.
NETWORK_FUNC_NAMES = {
"async_call_llm",
"extract_content_or_reasoning",
}
# Files that are ALLOWED to have network calls (known violations with justification).
# Each entry maps to a reason string.
KNOWN_VIOLATIONS = {
"tools/graph_store.py": (
"GraphStore persists to Gitea via API. This is a known architectural trade-off "
"for knowledge graph persistence, which is not part of the core memory path "
"(MEMORY.md/USER.md/SQLite). Future work will explore local-first alternatives "
"to align more closely with SOUL.md principles."
),
"tools/session_search_tool.py": (
"Session search uses LLM summarization via an auxiliary client. While the FTS5 "
"search is local, the LLM call for summarization is an external dependency. "
"This is a temporary architectural trade-off for enhanced presentation. "
"Research is ongoing to implement local LLM options for full sovereignty, "
"in line with SOUL.md."
),
}
# ---------------------------------------------------------------------------
# Scanner
# ---------------------------------------------------------------------------
class Violation:
"""A sovereignty violation with location and description."""
def __init__(self, file: str, line: int, description: str, code: str):
self.file = file
self.line = line
self.description = description
self.code = code.strip()
def __str__(self):
return f"{self.file}:{self.line}: {self.description}\n {self.code}"
def scan_file(filepath: Path, repo_root: Path) -> list[Violation]:
"""Scan a single file for network dependency patterns."""
violations = []
rel_path = str(filepath.relative_to(repo_root))
# Skip known violations
if rel_path in KNOWN_VIOLATIONS:
return violations
try:
content = filepath.read_text(encoding="utf-8")
except (OSError, IOError) as e:
print(f"WARNING: Cannot read {rel_path}: {e}", file=sys.stderr)
return violations
lines = content.splitlines()
# --- Check imports ---
try:
tree = ast.parse(content, filename=str(filepath))
except SyntaxError as e:
print(f"WARNING: Cannot parse {rel_path}: {e}", file=sys.stderr)
return violations
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
mod = alias.name
if mod in NETWORK_IMPORTS or any(
mod.startswith(ni + ".") for ni in NETWORK_IMPORTS
):
violations.append(Violation(
rel_path, node.lineno,
f"Network-capable import: {mod}",
lines[node.lineno - 1] if node.lineno <= len(lines) else "",
))
elif isinstance(node, ast.ImportFrom):
if node.module and (
node.module in NETWORK_IMPORTS
or any(node.module.startswith(ni + ".") for ni in NETWORK_IMPORTS)
):
violations.append(Violation(
rel_path, node.lineno,
f"Network-capable import from: {node.module}",
lines[node.lineno - 1] if node.lineno <= len(lines) else "",
))
# --- Check for LLM call function usage ---
for i, line in enumerate(lines, 1):
stripped = line.strip()
if stripped.startswith("#"):
continue
for func_name in NETWORK_FUNC_NAMES:
if func_name in line and not stripped.startswith("def ") and not stripped.startswith("class "):
# Check it's actually a call, not a definition or import
if re.search(r'\b' + func_name + r'\s*\(', line):
violations.append(Violation(
rel_path, i,
f"External LLM call function: {func_name}()",
line,
))
# --- Regex-based pattern matching ---
for i, line in enumerate(lines, 1):
stripped = line.strip()
if stripped.startswith("#"):
continue
for pattern, description in NETWORK_PATTERNS:
if re.search(pattern, line, re.IGNORECASE):
violations.append(Violation(
rel_path, i,
f"Suspicious pattern ({description})",
line,
))
return violations
def verify_sovereignty(repo_root: Path) -> tuple[list[Violation], list[str]]:
"""Run sovereignty verification across all memory files.
Returns (violations, info_messages).
"""
all_violations = []
info = []
for rel_path in MEMORY_FILES:
filepath = repo_root / rel_path
if not filepath.exists():
info.append(f"SKIP: {rel_path} (file not found)")
continue
if rel_path in KNOWN_VIOLATIONS:
info.append(
f"WARN: {rel_path} — known violation (excluded from gate): "
f"{KNOWN_VIOLATIONS[rel_path]}"
)
continue
violations = scan_file(filepath, repo_root)
all_violations.extend(violations)
if not violations:
info.append(f"PASS: {rel_path} — sovereign (local-only)")
return all_violations, info
# ---------------------------------------------------------------------------
# Deep analysis helpers
# ---------------------------------------------------------------------------
def check_graph_store_network(repo_root: Path) -> str:
"""Analyze graph_store.py for its network dependencies."""
filepath = repo_root / "tools" / "graph_store.py"
if not filepath.exists():
return ""
content = filepath.read_text(encoding="utf-8")
if "GiteaClient" in content:
return (
"tools/graph_store.py uses GiteaClient for persistence — "
"this is an external API call. However, graph_store is NOT part of "
"the core memory path (MEMORY.md/USER.md/SQLite). It is a separate "
"knowledge graph system."
)
return ""
def check_session_search_llm(repo_root: Path) -> str:
"""Analyze session_search_tool.py for LLM usage."""
filepath = repo_root / "tools" / "session_search_tool.py"
if not filepath.exists():
return ""
content = filepath.read_text(encoding="utf-8")
warnings = []
if "async_call_llm" in content:
warnings.append("uses async_call_llm for summarization")
if "auxiliary_client" in content:
warnings.append("imports auxiliary_client (LLM calls)")
if warnings:
return (
f"tools/session_search_tool.py: {'; '.join(warnings)}. "
f"The FTS5 search is local SQLite, but session summarization "
f"involves LLM API calls."
)
return ""
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
repo_root = Path(__file__).resolve().parent.parent
print(f"Memory Sovereignty Verification")
print(f"Repository: {repo_root}")
print(f"Scanning {len(MEMORY_FILES)} memory-path files...")
print()
violations, info = verify_sovereignty(repo_root)
# Print info messages
for msg in info:
print(f" {msg}")
# Print deep analysis
print()
print("Deep analysis:")
for checker in [check_graph_store_network, check_session_search_llm]:
note = checker(repo_root)
if note:
print(f" NOTE: {note}")
print()
if violations:
print(f"SOVEREIGNTY VIOLATIONS FOUND: {len(violations)}")
print("=" * 60)
for v in violations:
print(v)
print()
print("=" * 60)
print(
f"FAIL: {len(violations)} potential network dependencies detected "
f"in the memory path."
)
print("Memory must be local-only (filesystem + SQLite).")
print()
print("If a violation is intentional and documented, add it to")
print("KNOWN_VIOLATIONS in this script with a justification.")
return 1
else:
print("PASS: Memory path is sovereign — no network dependencies detected.")
print("All memory operations use local filesystem and/or SQLite only.")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -895,7 +895,7 @@ class TestKimiMoonshotModelListIsolation:
def test_moonshot_list_excludes_coding_plan_only_models(self):
from hermes_cli.main import _PROVIDER_MODELS
moonshot_models = _PROVIDER_MODELS["moonshot"]
coding_plan_only = {"kimi-for-coding", "kimi-k2-thinking-turbo"}
coding_plan_only = {"kimi-k2.5", "kimi-k2-thinking-turbo"}
leaked = set(moonshot_models) & coding_plan_only
assert not leaked, f"Moonshot list contains Coding Plan-only models: {leaked}"
@@ -908,7 +908,7 @@ class TestKimiMoonshotModelListIsolation:
def test_coding_plan_list_contains_plan_specific_models(self):
from hermes_cli.main import _PROVIDER_MODELS
coding_models = _PROVIDER_MODELS["kimi-coding"]
assert "kimi-for-coding" in coding_models
assert "kimi-k2.5" in coding_models
assert "kimi-k2-thinking-turbo" in coding_models

View File

@@ -142,7 +142,7 @@ hermes chat --provider zai --model glm-5
# Requires: GLM_API_KEY in ~/.hermes/.env
# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model kimi-for-coding
hermes chat --provider kimi-coding --model kimi-k2.5
# Requires: KIMI_API_KEY in ~/.hermes/.env
# MiniMax (global endpoint)