- add ACP user and developer docs covering setup, lifecycle, callbacks, permissions, tool rendering, and runtime behavior - add developer guides for agent loop, provider runtime resolution, prompt assembly, context caching/compression, gateway internals, session storage, tools runtime, trajectories, and cron internals - refresh architecture, quickstart, installation, CLI reference, and environments docs to link the new implementation pages and ACP support
73 lines
2.1 KiB
Markdown
73 lines
2.1 KiB
Markdown
---
|
|
sidebar_position: 6
|
|
title: "Context Compression & Prompt Caching"
|
|
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
|
|
---
|
|
|
|
# Context Compression & Prompt Caching
|
|
|
|
Hermes manages long conversations with two complementary mechanisms:
|
|
|
|
- prompt caching
|
|
- context compression
|
|
|
|
Primary files:
|
|
|
|
- `agent/prompt_caching.py`
|
|
- `agent/context_compressor.py`
|
|
- `run_agent.py`
|
|
|
|
## Prompt caching
|
|
|
|
For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
|
|
|
|
Current strategy:
|
|
|
|
- cache the system prompt
|
|
- cache the last 3 non-system messages
|
|
- default TTL is 5 minutes unless explicitly extended
|
|
|
|
This is implemented in `agent/prompt_caching.py`.
|
|
|
|
## Why prompt stability matters
|
|
|
|
Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
|
|
|
|
## Compression trigger
|
|
|
|
Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
|
|
|
|
## Compression algorithm
|
|
|
|
The compressor protects:
|
|
|
|
- the first N turns
|
|
- the last N turns
|
|
|
|
and summarizes the middle section.
|
|
|
|
It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
|
|
|
|
## Pre-compression memory flush
|
|
|
|
Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.
|
|
|
|
## Session lineage after compression
|
|
|
|
Compression can split the session into a new session ID while preserving parent lineage in the state DB.
|
|
|
|
This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
|
|
|
|
## Re-injected state after compression
|
|
|
|
After compression, Hermes may re-inject compact operational state such as:
|
|
|
|
- todo snapshot
|
|
- prior-read-files summary
|
|
|
|
## Related docs
|
|
|
|
- [Prompt Assembly](./prompt-assembly.md)
|
|
- [Session Storage](./session-storage.md)
|
|
- [Agent Loop Internals](./agent-loop.md)
|