Add context compression feature for long conversations
- Implemented automatic context compression to manage long conversations that approach the model's context limit. - Configured the feature to summarize middle turns while protecting the first three and last four turns, ensuring important context is retained. - Added configuration options in `cli-config.yaml` and environment variables for enabling/disabling compression and setting thresholds. - Updated documentation in `README.md`, `cli.md`, and `.env.example` to explain the context compression functionality and its configuration. - Enhanced the `cli.py` to load compression settings into environment variables, ensuring seamless integration with the CLI. - Completed the implementation of context compression as outlined in the TODO list, marking it as a significant enhancement to conversation management.
This commit is contained in:
10
.env.example
10
.env.example
@@ -154,3 +154,13 @@ WEB_TOOLS_DEBUG=false
|
||||
VISION_TOOLS_DEBUG=false
|
||||
MOA_TOOLS_DEBUG=false
|
||||
IMAGE_TOOLS_DEBUG=false
|
||||
|
||||
# =============================================================================
|
||||
# CONTEXT COMPRESSION (Auto-shrinks long conversations)
|
||||
# =============================================================================
|
||||
# When conversation approaches model's context limit, middle turns are
|
||||
# automatically summarized to free up space.
|
||||
#
|
||||
# CONTEXT_COMPRESSION_ENABLED=true # Enable auto-compression (default: true)
|
||||
# CONTEXT_COMPRESSION_THRESHOLD=0.85 # Compress at 85% of context limit
|
||||
# CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001 # Fast model for summaries
|
||||
|
||||
Reference in New Issue
Block a user