Add context compression feature for long conversations
- Implemented automatic context compression to manage long conversations that approach the model's context limit. - Configured the feature to summarize middle turns while protecting the first three and last four turns, ensuring important context is retained. - Added configuration options in `cli-config.yaml` and environment variables for enabling/disabling compression and setting thresholds. - Updated documentation in `README.md`, `cli.md`, and `.env.example` to explain the context compression functionality and its configuration. - Enhanced the `cli.py` to load compression settings into environment variables, ensuring seamless integration with the CLI. - Completed the implementation of context compression as outlined in the TODO list, marking it as a significant enhancement to conversation management.
This commit is contained in:
32
docs/cli.md
32
docs/cli.md
@@ -250,6 +250,38 @@ This is useful for:
|
||||
- Replaying conversations
|
||||
- Training data inspection
|
||||
|
||||
### Context Compression
|
||||
|
||||
Long conversations can exceed model context limits. The CLI automatically compresses context when approaching the limit:
|
||||
|
||||
```yaml
|
||||
# In cli-config.yaml
|
||||
compression:
|
||||
enabled: true # Enable auto-compression
|
||||
threshold: 0.85 # Compress at 85% of context limit
|
||||
summary_model: "google/gemini-2.0-flash-001"
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
1. Tracks actual token usage from each API response
|
||||
2. When tokens reach threshold, middle turns are summarized
|
||||
3. First 3 and last 4 turns are always protected
|
||||
4. Conversation continues seamlessly after compression
|
||||
|
||||
**When compression triggers:**
|
||||
```
|
||||
📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
|
||||
📊 Model context limit: 200,000 tokens (85% = 170,000)
|
||||
🗜️ Summarizing turns 4-15 (12 turns)
|
||||
✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
|
||||
```
|
||||
|
||||
To disable compression:
|
||||
```yaml
|
||||
compression:
|
||||
enabled: false
|
||||
```
|
||||
|
||||
## Quiet Mode
|
||||
|
||||
The CLI runs in "quiet mode" (`HERMES_QUIET=1`), which:
|
||||
|
||||
Reference in New Issue
Block a user