[HARNESS] Deterministic context compaction — rule-based, zero inference cost #640

Closed
opened 2026-03-27 03:13:41 +00:00 by perplexity · 24 comments
Member

Problem

Hermes Agent's built-in compression is LLM-based — it burns inference cycles summarizing context on a local 14B model that should be thinking, not housekeeping. At 23 tok/s, every wasted token is real time.

The system prompt alone is ~16k tokens. A few tool calls and the 65k window fills fast. We need compaction that's automatic, lossless where it matters, and costs zero inference.

Design: Rule-Based Compaction Middleware

A Python sidecar (or Huey task) that processes the Hermes session message history using deterministic string rules. No LLM involved.

Rules (in priority order)

1. Collapse succeeded tool call/response pairs

  • If a tool call returned successfully, drop the raw JSON request and the full response body
  • Keep a one-line summary: [tool: read_file("config.yaml") → 200 OK, 347 chars]
  • Preserves the fact that a tool was called and succeeded, drops the payload
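A minimal sketch of this rule, assuming a hypothetical message shape of `{"role", "name", "args", "status", "content"}` dicts; the real Hermes session schema may differ:

```python
def collapse_tool_pair(call_msg, response_msg):
    """Replace a successful tool call/response pair with a one-line summary.

    Assumes a hypothetical message shape ({"role", "name", "args",
    "status", "content"}); failed calls are kept verbatim so error
    messages are never dropped.
    """
    if response_msg.get("status") != "ok":
        return [call_msg, response_msg]  # keep failures untouched
    summary = (
        f'[tool: {call_msg["name"]}({call_msg.get("args", "")}) '
        f'→ ok, {len(response_msg.get("content", ""))} chars]'
    )
    return [{"role": "tool", "content": summary}]
```

The summary line intentionally keeps the tool name and arguments, so the collapsed call remains matchable against the raw session log.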

2. Truncate large tool outputs

  • Any tool response over 500 chars: keep first 10 lines + last 5 lines + [...truncated {n} lines...]
  • Terminal/shell output is the worst offender here
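The truncation rule is pure string manipulation; a sketch with the thresholds from compaction.yaml as defaults:

```python
def truncate_output(text, max_chars=500, head=10, tail=5):
    """Truncate a large tool output: first `head` lines + last `tail`
    lines with a truncation marker in between. Defaults mirror the
    compaction.yaml values; outputs under max_chars pass through."""
    if len(text) <= max_chars:
        return text
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # large but few lines: nothing sensible to cut
    dropped = len(lines) - head - tail
    return "\n".join(
        lines[:head] + [f"[...truncated {dropped} lines...]"] + lines[-tail:]
    )
```

Keeping the tail as well as the head matters for shell output, where the exit status or final error usually sits on the last lines.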

3. Drop system messages after turn 1

  • SOUL.md, skills, MCP config — the model read them once. Don't carry 16k tokens of identity through every turn
  • Keep a 1-line marker: [system prompt: SOUL.md + 3 skills + 2 MCP servers loaded]
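A sketch of the system-message filter, again assuming simple `{"role", "content"}` dicts (the marker text here is a placeholder; the real one would be built from the actual loaded skills and MCP servers):

```python
def drop_late_system(messages):
    """Keep only a one-line marker in place of the first system message
    and drop all later system payloads. Message shape is an assumed
    {"role", "content"} dict; adapt to the real Hermes schema."""
    out, seen_system = [], False
    for msg in messages:
        if msg.get("role") == "system":
            if not seen_system:
                out.append({"role": "system",
                            "content": "[system prompt: loaded at turn 1]"})
                seen_system = True
            continue  # drop repeated system payloads
        out.append(msg)
    return out
```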

4. Collapse consecutive assistant messages

  • If the assistant spoke 3 times in a row (thinking steps), keep only the last message
  • The intermediate reasoning is captured in the session log for DPO — it doesn't need to stay in context
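Collapsing a run of assistant messages down to its last member is a single pass; a sketch under the same assumed message shape:

```python
def collapse_assistant_runs(messages):
    """If the assistant spoke several times in a row, keep only the last
    message of each run. The intermediate steps are not lost: they live
    on in the raw session log."""
    out = []
    for msg in messages:
        if (out and msg.get("role") == "assistant"
                and out[-1].get("role") == "assistant"):
            out[-1] = msg  # overwrite: only the last of the run survives
        else:
            out.append(msg)
    return out
```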

5. Never touch user messages

  • All user messages are kept verbatim — that's intent, and it's the training signal

6. Never touch the last N turns

  • Protect the last 5 turn pairs (user + assistant) from any compaction
  • Only compact the "middle" of the conversation

Trigger

  • Fires when context usage hits 60% of window (configurable)
  • OR can be called manually via a command
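The trigger check itself should also cost zero inference; a sketch using a rough chars/4 token estimate (an assumption, standing in for whatever token accounting Hermes already exposes):

```python
def should_compact(messages, context_window=65_536, threshold=0.60):
    """Fire compaction when estimated context usage crosses the
    threshold. Uses a rough chars/4 token estimate so the check
    itself stays inference-free (assumption: no tokenizer call)."""
    est_tokens = sum(len(m.get("content", "")) for m in messages) // 4
    return est_tokens >= context_window * threshold
```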

Preservation Guarantee

  • Every compacted element retains enough info to be restorable if the original session log exists
  • File paths, URLs, tool names, and error messages are never dropped
  • This is compression, not deletion — the full data lives in ~/.hermes/sessions/

Interface

  • Reads: Hermes session message array (JSON)
  • Writes: Compacted message array (same format, fewer tokens)
  • Config: ~/.timmy/timmy-config/compaction.yaml
compaction:
  enabled: true
  trigger_threshold: 0.60    # fire at 60% of context window
  protect_last_n_turns: 5    # never touch recent turns
  max_tool_output_chars: 500 # truncate above this
  keep_tool_summary: true    # one-line summaries of collapsed tools
  drop_system_after_turn: 1  # drop system messages after turn 1
  collapse_consecutive_assistant: true
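Loading that config might look like the following sketch: it maps the YAML keys onto a small dataclass carrying the defaults above (YAML parsing itself is elided, so this assumes the file has already been read into a dict, e.g. via `yaml.safe_load`):

```python
from dataclasses import dataclass

@dataclass
class CompactionConfig:
    """Defaults mirror compaction.yaml; any key absent from the file
    falls back to the value shown in the spec."""
    enabled: bool = True
    trigger_threshold: float = 0.60
    protect_last_n_turns: int = 5
    max_tool_output_chars: int = 500
    keep_tool_summary: bool = True
    drop_system_after_turn: int = 1
    collapse_consecutive_assistant: bool = True

    @classmethod
    def from_dict(cls, raw):
        # Unknown keys are ignored so older configs keep working.
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in (raw or {}).items() if k in known})
```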

What This Is NOT

  • Not LLM-based summarization
  • Not lossy — session logs retain everything
  • Not upstream Hermes — this is a Timmy sidecar in timmy-config

Implementation

  • Single Python file: compaction.py in timmy-config
  • Can be called from Huey as a periodic task or wired as middleware
  • ~50-100 lines of string manipulation
  • Source controlled in Timmy_Foundation/timmy-config

DPO Training Implications

  • Compaction happens in the live context window only
  • The full uncompacted session is always preserved in ~/.hermes/sessions/ for DPO pair extraction
  • session_export() Huey task reads from the raw sessions, not the compacted view
  • Training data quality is unaffected

References

  • [Manus context engineering](https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus) — similar philosophy: file system as infinite context, restorable compression
  • #610 (Shadow Context Manager)
  • #603 (Aurora pipeline)

Spec by Perplexity — deterministic compaction for sovereign local inference
perplexity added the p0-critical, harness, sovereignty labels 2026-03-27 03:13:41 +00:00
Owner

🔍 Triaged by Huey — needs assignment.
Member

🔧 `gemini` working on this via Huey. Branch: `gemini/issue-640`
Member

🔧 `grok` working on this via Huey. Branch: `grok/issue-640`
Member

⚠️ `grok` produced no changes for this issue. Skipping.
Owner

⚡ Dispatched to `claude`. Huey task queued.
Owner

⚡ Dispatched to `gemini`. Huey task queued.
Owner

⚡ Dispatched to `kimi`. Huey task queued.
Owner

⚡ Dispatched to `grok`. Huey task queued.
Owner

⚡ Dispatched to `perplexity`. Huey task queued.
Owner

🔍 Triaged by Huey — needs assignment.
Timmy was assigned by Rockachopa 2026-03-28 03:54:21 +00:00
Owner

Closing as duplicate during backlog burn-down. Canonical issue: #638.

Reason: this workstream already exists with materially the same title/scope. Keeping one canonical thread prevents agent churn and review waste.

Timmy closed this issue 2026-03-28 04:45:28 +00:00

Reference: Timmy_Foundation/the-nexus#640