[HARNESS] Deterministic context compaction — rule-based, zero inference cost #640

Closed
opened 2026-03-27 03:13:41 +00:00 by perplexity · 24 comments
Member

Problem

Hermes Agent's built-in compression is LLM-based — it burns inference cycles summarizing context on a local 14B model that should be thinking, not housekeeping. At 23 tok/s, every wasted token is real time.

The system prompt alone is ~16k tokens. A few tool calls and the 65k window fills fast. We need compaction that's automatic, lossless where it matters, and costs zero inference.

Design: Rule-Based Compaction Middleware

A Python sidecar (or Huey task) that processes the Hermes session message history using deterministic string rules. No LLM involved.

Rules (in priority order)

1. Collapse succeeded tool call/response pairs

  • If a tool call returned successfully, drop the raw JSON request and the full response body
  • Keep a one-line summary: [tool: read_file("config.yaml") → 200 OK, 347 chars]
  • Preserves the fact that a tool was called and succeeded, drops the payload
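A minimal sketch of this rule, assuming a hypothetical message shape of `{"role", "name", "args", "status", "content"}` dicts; the real Hermes session schema may differ:

```python
def collapse_tool_pair(call_msg, response_msg):
    """Replace a successful tool call/response pair with a one-line summary.

    Assumes a hypothetical message shape ({"role", "name", "args",
    "status", "content"}); failed calls are kept verbatim so error
    messages are never dropped.
    """
    if response_msg.get("status") != "ok":
        return [call_msg, response_msg]  # keep failures untouched
    summary = (
        f'[tool: {call_msg["name"]}({call_msg.get("args", "")}) '
        f'→ ok, {len(response_msg.get("content", ""))} chars]'
    )
    return [{"role": "tool", "content": summary}]
```

The summary line intentionally keeps the tool name and arguments, so the collapsed call remains matchable against the raw session log.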

2. Truncate large tool outputs

  • Any tool response over 500 chars: keep first 10 lines + last 5 lines + [...truncated {n} lines...]
  • Terminal/shell output is the worst offender here
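The truncation rule is pure string manipulation; a sketch with the thresholds from compaction.yaml as defaults:

```python
def truncate_output(text, max_chars=500, head=10, tail=5):
    """Truncate a large tool output: first `head` lines + last `tail`
    lines with a truncation marker in between. Defaults mirror the
    compaction.yaml values; outputs under max_chars pass through."""
    if len(text) <= max_chars:
        return text
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # large but few lines: nothing sensible to cut
    dropped = len(lines) - head - tail
    return "\n".join(
        lines[:head] + [f"[...truncated {dropped} lines...]"] + lines[-tail:]
    )
```

Keeping the tail as well as the head matters for shell output, where the exit status or final error usually sits on the last lines.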

3. Drop system messages after turn 1

  • SOUL.md, skills, MCP config — the model read them once. Don't carry 16k tokens of identity through every turn
  • Keep a 1-line marker: [system prompt: SOUL.md + 3 skills + 2 MCP servers loaded]
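A sketch of the system-message filter, again assuming simple `{"role", "content"}` dicts (the marker text here is a placeholder; the real one would be built from the actual loaded skills and MCP servers):

```python
def drop_late_system(messages):
    """Keep only a one-line marker in place of the first system message
    and drop all later system payloads. Message shape is an assumed
    {"role", "content"} dict; adapt to the real Hermes schema."""
    out, seen_system = [], False
    for msg in messages:
        if msg.get("role") == "system":
            if not seen_system:
                out.append({"role": "system",
                            "content": "[system prompt: loaded at turn 1]"})
                seen_system = True
            continue  # drop repeated system payloads
        out.append(msg)
    return out
```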

4. Collapse consecutive assistant messages

  • If the assistant spoke 3 times in a row (thinking steps), keep only the last message
  • The intermediate reasoning is captured in the session log for DPO — it doesn't need to stay in context
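Collapsing a run of assistant messages down to its last member is a single pass; a sketch under the same assumed message shape:

```python
def collapse_assistant_runs(messages):
    """If the assistant spoke several times in a row, keep only the last
    message of each run. The intermediate steps are not lost: they live
    on in the raw session log."""
    out = []
    for msg in messages:
        if (out and msg.get("role") == "assistant"
                and out[-1].get("role") == "assistant"):
            out[-1] = msg  # overwrite: only the last of the run survives
        else:
            out.append(msg)
    return out
```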

5. Never touch user messages

  • All user messages are kept verbatim — that's intent, and it's the training signal

6. Never touch the last N turns

  • Protect the last 5 turn pairs (user + assistant) from any compaction
  • Only compact the "middle" of the conversation

Trigger

  • Fires when context usage hits 60% of window (configurable)
  • OR can be called manually via a command
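The trigger check itself should also cost zero inference; a sketch using a rough chars/4 token estimate (an assumption, standing in for whatever token accounting Hermes already exposes):

```python
def should_compact(messages, context_window=65_536, threshold=0.60):
    """Fire compaction when estimated context usage crosses the
    threshold. Uses a rough chars/4 token estimate so the check
    itself stays inference-free (assumption: no tokenizer call)."""
    est_tokens = sum(len(m.get("content", "")) for m in messages) // 4
    return est_tokens >= context_window * threshold
```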

Preservation Guarantee

  • Every compacted element retains enough info to be restorable if the original session log exists
  • File paths, URLs, tool names, and error messages are never dropped
  • This is compression, not deletion — the full data lives in ~/.hermes/sessions/

Interface

  • Reads: Hermes session message array (JSON)
  • Writes: Compacted message array (same format, fewer tokens)
  • Config: ~/.timmy/timmy-config/compaction.yaml
compaction:
  enabled: true
  trigger_threshold: 0.60    # fire at 60% of context window
  protect_last_n_turns: 5    # never touch recent turns
  max_tool_output_chars: 500 # truncate above this
  keep_tool_summary: true    # one-line summaries of collapsed tools
  drop_system_after_turn: 1  # drop system messages after turn 1
  collapse_consecutive_assistant: true
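Loading that config might look like the following sketch: it maps the YAML keys onto a small dataclass carrying the defaults above (YAML parsing itself is elided, so this assumes the file has already been read into a dict, e.g. via `yaml.safe_load`):

```python
from dataclasses import dataclass

@dataclass
class CompactionConfig:
    """Defaults mirror compaction.yaml; any key absent from the file
    falls back to the value shown in the spec."""
    enabled: bool = True
    trigger_threshold: float = 0.60
    protect_last_n_turns: int = 5
    max_tool_output_chars: int = 500
    keep_tool_summary: bool = True
    drop_system_after_turn: int = 1
    collapse_consecutive_assistant: bool = True

    @classmethod
    def from_dict(cls, raw):
        # Unknown keys are ignored so older configs keep working.
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in (raw or {}).items() if k in known})
```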

What This Is NOT

  • Not LLM-based summarization
  • Not lossy — session logs retain everything
  • Not upstream Hermes — this is a Timmy sidecar in timmy-config

Implementation

  • Single Python file: compaction.py in timmy-config
  • Can be called from Huey as a periodic task or wired as middleware
  • ~50-100 lines of string manipulation
  • Source controlled in Timmy_Foundation/timmy-config

DPO Training Implications

  • Compaction happens in the live context window only
  • The full uncompacted session is always preserved in ~/.hermes/sessions/ for DPO pair extraction
  • session_export() Huey task reads from the raw sessions, not the compacted view
  • Training data quality is unaffected

References

  • [Manus context engineering](https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus) — similar philosophy: file system as infinite context, restorable compression
  • #610 (Shadow Context Manager)
  • #603 (Aurora pipeline)

Spec by Perplexity — deterministic compaction for sovereign local inference
perplexity added the p0-critical, harness, sovereignty labels 2026-03-27 03:13:41 +00:00
Owner

🔍 Triaged by Huey — needs assignment.
Member

🔧 `gemini` working on this via Huey. Branch: `gemini/issue-640`
Member

🔧 `grok` working on this via Huey. Branch: `grok/issue-640`
Member

⚠️ `grok` produced no changes for this issue. Skipping.
Owner

⚡ Dispatched to `claude`. Huey task queued.
Owner

⚡ Dispatched to `gemini`. Huey task queued.
Owner

⚡ Dispatched to `kimi`. Huey task queued.
Owner

⚡ Dispatched to `grok`. Huey task queued.
Owner

⚡ Dispatched to `perplexity`. Huey task queued.
Owner

🔍 Triaged by Huey — needs assignment.
Timmy was assigned by Rockachopa 2026-03-28 03:54:21 +00:00
Owner

Closing as duplicate during backlog burn-down. Canonical issue: #638.

Reason: this workstream already exists with materially the same title/scope. Keeping one canonical thread prevents agent churn and review waste.

Timmy closed this issue 2026-03-28 04:45:28 +00:00

Reference: Timmy_Foundation/the-nexus#640