[STUDY] Sovereign Local Agents on macOS — Hermes v0.4.0 Architecture Spike #576

Closed
opened 2026-03-26 19:56:09 +00:00 by perplexity · 9 comments
Member

Research Spike: Sovereign Local Agents on macOS — State of the Art

Source: "Sovereign Local Agents on macOS: A State-of-the-Art Technical Architecture" (20pp, generated by Kimi.ai, March 2026)

This report covers the full Hermes Agent v0.4.0 architecture and how it maps to our sovereign stack. Key sections with actionable findings below.


1. Hermes Agent Harness (Section 1)

What we already have: Hermes Agent installed on Hermes (Mac M3 Max), config.yaml with Ollama, SOUL.md, memory enabled, orchestration MCP server.

What the report confirms we're missing:

  • MCP servers for perception/action (steam-info-mcp, mcp-pyautogui) are defined in mcp/servers.json but not registered in config.yaml mcp_servers — Hermes doesn't see them
  • Model config still points at claude-opus-4-6 (Anthropic) as default with hermes3:latest (8B) as Ollama fallback — should point at hermes4:14b once pulled (#9)
  • Hermes v0.4.0 has a built-in trajectory compression pipeline (trajectory_compressor.py) that we're not using — it compresses agent execution traces into training-optimized formats, 10-50x reduction
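As a reference point for the first bullet, the registration might look like the fragment below. The key names and launch commands are assumptions for illustration only; the v0.4.0 config schema isn't reproduced in the report, so check the actual shape of mcp_servers before copying:

```yaml
# Hypothetical config.yaml fragment -- key names and commands are
# assumptions, not verified against the Hermes v0.4.0 schema.
mcp_servers:
  steam-info-mcp:
    command: npx
    args: ["-y", "steam-info-mcp"]
  desktop-control:
    command: python
    args: ["-m", "mcp_pyautogui"]
```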

Key v0.4.0 features we should leverage:

  • hermes mcp CLI for installing/configuring MCP servers (with OAuth 2.1 PKCE flow)
  • Background self-improvement thread ("online distillation")
  • Context compression overhaul with configurable summary endpoint
  • Trajectory export directly compatible with Atropos RL framework
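On the OAuth 2.1 PKCE point: OAuth 2.1 makes PKCE mandatory, and the verifier/challenge handshake the CLI would perform can be sketched as below. This is the standard S256 derivation from RFC 7636, not Hermes source code:

```python
import base64
import hashlib
import secrets

def pkce_pair() -> tuple[str, str]:
    """Generate an OAuth 2.1 PKCE verifier/challenge pair (S256 method).

    Illustrates the handshake the `hermes mcp` CLI reportedly performs;
    this is generic PKCE, not Hermes code.
    """
    # 32 random bytes -> 43-char base64url verifier (padding stripped)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # Challenge is the base64url-encoded SHA-256 of the verifier
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()
    ).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = pkce_pair()
print(len(verifier), len(challenge))  # -> 43 43
```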

2. Instant Distillation / Context Compression (Section 2)

Structured Distillation paper (Nous Research, March 13, 2026, arXiv):

  • Transforms each conversation exchange into 4 fields: exchange_core (~15 tokens), specific_context (~23 tokens), thematic_room_assignments, files_touched
  • ~10x compression ratio (371→38 tokens) with 96% of retrieval MRR preserved
  • Dense vector retrieval robust to distillation; BM25/lexical search degrades significantly
  • Implication: Our DPO training pipeline should prioritize vector-based retrieval over keyword search for session mining
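The four-field record and its headline compression can be sketched as below. The field names come from the paper as listed above; the dataclass itself is an illustrative container, not the paper's reference implementation:

```python
from dataclasses import dataclass, field

@dataclass
class DistilledExchange:
    # Four fields per the Structured Distillation paper; this container
    # is illustrative, not the paper's reference code.
    exchange_core: str                 # ~15-token gist of the exchange
    specific_context: str              # ~23 tokens of grounding detail
    thematic_room_assignments: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)

def compression_ratio(original_tokens: int, distilled_tokens: int) -> float:
    """Ratio of original to distilled token counts."""
    return original_tokens / distilled_tokens

# The paper's headline numbers: 371 tokens distilled to 38.
print(round(compression_ratio(371, 38), 1))  # -> 9.8
```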

Agentic On-Policy Distillation (OPD):

  • v0.2.0 introduced OPD as RL training environment in Atropos
  • tinker-atropos standalone training infrastructure available
  • A 4B Qwen3.5 model matched a 27B baseline after ~7 hours of autonomous self-improvement, roughly a 7x effective capacity expansion through learning

3. Memory Architecture (Section 3)

Five-tier memory already in Hermes: Working Context → FTS5 Search → Vector Embeddings → Honcho Profiles → Skill Documents
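Conceptually the five tiers behave like a cascading lookup: cheap tiers first, falling through until enough hits accumulate. A minimal sketch with toy stand-ins for the real backends, since Hermes's internals aren't shown in the report:

```python
from typing import Callable

# Ordered registry of memory tiers. Each tier is a callable
# query -> list of hits; real Hermes tiers (Working Context, FTS5,
# Vector Embeddings, Honcho Profiles, Skill Documents) would plug in
# here -- these stand-ins are purely illustrative.
TIERS: list[tuple[str, Callable[[str], list[str]]]] = []

def register_tier(name: str, search: Callable[[str], list[str]]) -> None:
    TIERS.append((name, search))

def recall(query: str, want: int = 3) -> list[str]:
    """Walk the tiers in order, stopping once enough hits accumulate."""
    hits: list[str] = []
    for name, search in TIERS:
        hits.extend(search(query))
        if len(hits) >= want:
            break
    return hits[:want]

# Toy tiers standing in for Working Context and FTS5 search
register_tier("working_context", lambda q: ["ctx: " + q])
register_tier("fts5", lambda q: ["fts1: " + q, "fts2: " + q])
print(recall("ollama config"))
```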

Hindsight-Hermes plugin (worth evaluating):

  • pip install hindsight-hermes
  • Structured fact extraction + entity resolution + knowledge graph
  • Multi-strategy retrieval: semantic + BM25 + entity graph traversal + temporal filtering
  • Self-hosted Docker option for sovereignty
  • Critical step after install: hermes tools disable memory (prevents native tool preference)
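Multi-strategy retrieval implies merging ranked lists from several searchers. One common way is reciprocal rank fusion; whether hindsight-hermes uses RRF specifically is an assumption, but the sketch shows the shape of the problem:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge several ranked lists into one.

    Standard fusion technique (k=60 is the conventional constant).
    Whether hindsight-hermes fuses its semantic/BM25/graph/temporal
    strategies this way is an assumption, not confirmed by the report.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["note_a", "note_b", "note_c"]
bm25 = ["note_b", "note_d", "note_a"]
print(rrf([semantic, bm25])[:2])  # -> ['note_b', 'note_a']
```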

my-hermantic-agent fork shows Ollama-hosted Hermes-4-14B with TimescaleDB persistent semantic memory — closest to our architecture.

4. Auto Research / Self-Improvement (Section 4)

  • hermes-autoresearch branch — experimental, enables autonomous hypothesis generation and experimentation
  • ~700 autonomous commits on nanochat optimization, yielding an 11% improvement
  • Not ready for us yet — experimental branch, expect instability

5. Tool Ecosystem (Section 5)

  • 80+ production-ready skills in hermes-skills repo
  • agentskills.io standard adopted by 11 tools (Claude Code, Cursor, Copilot, Gemini CLI, etc.)
  • Gateway supports 15 messaging platforms (v0.4.0 added Signal, DingTalk, SMS, Mattermost, Matrix, Webhook)
  • ACP server enables IDE integration (VS Code, Zed, JetBrains)

6. macOS Deployment (Section 6)

Per the report, an M3 Max with 128GB unified memory can run Hermes 3 70B without quantization, a capability that would require roughly $15,000 of discrete GPU hardware elsewhere. We should be able to run at least a 14B model without breaking a sweat.
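Before pulling a model, a back-of-envelope footprint check is worth doing. The bytes-per-parameter and overhead factors below are assumptions (fp16/bf16 weights, a loose allowance for KV cache and runtime buffers), not measured numbers:

```python
def model_footprint_gib(params: float, bytes_per_param: int = 2,
                        overhead: float = 1.2) -> float:
    """Rough weights-plus-runtime footprint in GiB.

    bytes_per_param=2 assumes fp16/bf16 weights; overhead=1.2 is a
    loose allowance for KV cache and buffers -- both are assumptions,
    not measurements.
    """
    return params * bytes_per_param * overhead / 2**30

# A 14B model at fp16 fits comfortably within 128 GiB unified memory:
print(round(model_footprint_gib(14e9), 1))  # -> 31.3
```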

Security layers available: namespace isolation, capability dropping, read-only root FS, seccomp-bpf, DM pairing for credential access.

7. Integration Roadmap from Report (Section 7)

The report's 4-phase roadmap maps to our current state:

| Phase | Report says | Our status |
|-------|-------------|------------|
| Phase 1: Core (Day 1) | Install Hermes + Ollama + base model | DONE |
| Phase 1: Memory | Install hindsight-hermes | NOT STARTED — evaluate |
| Phase 2: Distillation (Week 1) | Enable compression + trajectory export | NOT STARTED — tickets below |
| Phase 3: Auto Research (Week 2-4) | Deploy hermes-autoresearch branch | BLOCKED — experimental |
| Phase 4: Ecosystem (Ongoing) | Custom skills, multi-agent orchestration | IN PROGRESS (our swarm) |

Actionable Tickets Created

  • Register steam-info-mcp and desktop-control in config.yaml mcp_servers
  • Rewire heartbeat_tick() in tasks.py to invoke Hermes agent sessions (telemetry capture)
  • Enable trajectory export: /config set trajectory_export true + HERMES_TRAJECTORY_PATH
  • Switch model default from Anthropic cloud to local Ollama (hermes4:14b after #9 completes)
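For the heartbeat_tick() ticket, a minimal sketch of the rewiring. The session call and telemetry path are stand-ins, since the report doesn't specify the Hermes session API; replace them with the real client calls:

```python
import json
import time
from pathlib import Path

TELEMETRY_LOG = Path("telemetry/heartbeat.jsonl")  # assumed path, not from the report

def run_agent_session(prompt: str) -> dict:
    # Stand-in for the real Hermes agent-session API, which isn't
    # documented in the report; swap in the actual client call.
    return {"prompt": prompt, "ok": True}

def heartbeat_tick() -> dict:
    """Sketch of the rewired tick: run one agent session and append
    a telemetry record, per the ticket above."""
    started = time.time()
    result = run_agent_session("heartbeat self-check")
    record = {
        "ts": started,
        "elapsed": time.time() - started,
        "ok": result.get("ok", False),
    }
    TELEMETRY_LOG.parent.mkdir(parents=True, exist_ok=True)
    with TELEMETRY_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```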
Timmy was assigned by perplexity 2026-03-26 19:56:10 +00:00
Member

🔧 `gemini` working on this via Huey. Branch: `gemini/issue-576`
Member

🔧 `grok` working on this via Huey. Branch: `grok/issue-576`
Member

⚠️ `grok` produced no changes for this issue. Skipping.
Owner

⚡ Dispatched to `claude`. Huey task queued.
Owner

⚡ Dispatched to `gemini`. Huey task queued.
Owner

⚡ Dispatched to `kimi`. Huey task queued.
Owner

⚡ Dispatched to `grok`. Huey task queued.
Owner

⚡ Dispatched to `perplexity`. Huey task queued.
Owner

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.

Timmy closed this issue 2026-03-28 04:52:41 +00:00

Reference: Timmy_Foundation/the-nexus#576