Update agent configuration for maximum tool-calling iterations

- Increased the default maximum tool-calling iterations from 20 to 60 in the CLI configuration and related files, allowing for more complex tasks.
- Updated documentation and comments to reflect the new recommended range for iterations, enhancing user guidance.
- Implemented backward compatibility for loading max iterations from the root-level configuration, ensuring a smooth transition for existing users.
- Adjusted the setup wizard to prompt for the maximum iterations setting, improving user experience during configuration.
Author: teknium1
Date: 2026-02-03 14:48:19 -08:00
parent 17a5efb416
commit 7eac4ee9fe
7 changed files with 246 additions and 17 deletions

TODO.md

@@ -441,4 +441,149 @@
---
## 14. Learning Machine / Dynamic Memory System 🧠
*Inspired by [Dash](~/agent-codebases/dash) - a self-learning data agent.*
**Problem:** Agent starts fresh every session. Valuable learnings from debugging, error patterns, successful approaches, and user preferences are lost.
**Dash's Key Insight:** Separate **Knowledge** (static, curated) from **Learnings** (dynamic, discovered):

| System | What It Stores | How It Evolves |
|--------|---------------|----------------|
| **Knowledge** (Skills) | Validated approaches, templates, best practices | Curated by user |
| **Learnings** | Error patterns, gotchas, discovered fixes | Managed automatically |

**Tools to implement:**
- [ ] `save_learning(topic, learning, context?)` - Record a discovered pattern
```python
save_learning(
topic="python-ssl",
learning="On Ubuntu 22.04, SSL certificate errors often fixed by: apt install ca-certificates",
context="Debugging requests SSL failure"
)
```
- [ ] `search_learnings(query)` - Find relevant past learnings
```python
search_learnings("SSL certificate error Python")
# Returns: "On Ubuntu 22.04, SSL certificate errors often fixed by..."
```
**User Profile & Memory:**
- [ ] `user_profile` - Structured facts about user preferences
```yaml
# ~/.hermes/user_profile.yaml
coding_style:
python_formatter: black
type_hints: always
test_framework: pytest
preferences:
verbosity: detailed
confirm_destructive: true
environment:
os: linux
shell: bash
default_python: 3.11
```
- [ ] `user_memory` - Unstructured observations the agent learns
```yaml
# ~/.hermes/user_memory.yaml
- "User prefers tabs over spaces despite black's defaults"
- "User's main project is ~/work/myapp - a Django app"
- "User often works late - don't ask about timezone"
```
**When to learn:**
- After fixing an error that took multiple attempts
- When user corrects the agent's approach
- When a workaround is discovered for a tool limitation
- When user expresses a preference
**Storage:** Vector database (ChromaDB) or simple YAML with embedding search.
**Files to create:** `tools/learning_tools.py`, `learning/store.py`, `~/.hermes/learnings/`
---
## 15. Layered Context Architecture 📊
*Inspired by Dash's "Six Layers of Context" - grounding responses in multiple sources.*
**Problem:** Context sources are ad-hoc. No clear hierarchy or strategy for what context to include when.
**Proposed Layers for Hermes:**

| Layer | Source | When Loaded | Example |
|-------|--------|-------------|---------|
| 1. **Project Context** | `.hermes/context.md` | Auto on cwd | "This is a FastAPI project using PostgreSQL" |
| 2. **Skills** | `skills/*.md` | On request | "How to set up React project" |
| 3. **User Profile** | `~/.hermes/user_profile.yaml` | Always | "User prefers pytest, uses black" |
| 4. **Learnings** | `~/.hermes/learnings/` | Semantic search | "SSL fix for Ubuntu" |
| 5. **External Knowledge** | Web search, docs | On demand | Current API docs, Stack Overflow |
| 6. **Runtime Introspection** | Tool calls | Real-time | File contents, terminal output |

**Benefits:**
- Clear mental model for what context is available
- Prioritization: local > learned > external
- Debugging: "Why did agent do X?" → check which layers contributed
**Files to modify:** `run_agent.py` (context loading), new `context/layers.py`
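A hypothetical shape for `context/layers.py`, just to make the priority ordering concrete — the `ContextLayer` type and `build_context` helper are assumptions, not existing code:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ContextLayer:
    name: str
    priority: int                          # lower number = more trusted, injected first
    loader: Callable[[], Optional[str]]    # returns text, or None when unavailable

def build_context(layers: List[ContextLayer]) -> str:
    """Assemble the system context from whichever layers can provide text."""
    parts = []
    for layer in sorted(layers, key=lambda l: l.priority):
        text = layer.loader()
        if text:
            parts.append(f"## {layer.name}\n{text}")
    return "\n\n".join(parts)
```

With this shape, "why did the agent do X?" becomes a matter of inspecting which loaders returned text for a given turn.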
---
## 16. Evaluation System with LLM Grading 📏
*Inspired by Dash's evaluation framework.*
**Problem:** `batch_runner.py` runs test cases but lacks quality assessment.
**Dash's Approach:**
- **String matching** (default) - Check if expected strings appear
- **LLM grader** (-g flag) - GPT evaluates response quality
- **Result comparison** (-r flag) - Compare against golden output
**Implementation for Hermes:**
- [ ] **Test case format:**
```python
TestCase(
name="create_python_project",
prompt="Create a new Python project with FastAPI and tests",
expected_strings=["requirements.txt", "main.py", "test_"], # Basic check
golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"],
grader_criteria="Should create complete project structure with working code"
)
```
- [ ] **LLM grader mode:**
```python
from dataclasses import dataclass

@dataclass
class Grade:
    score: int        # 1-5
    explanation: str

def grade_response(response: str, criteria: str) -> Grade:
    """Use an LLM to evaluate response quality against free-form criteria."""
    prompt = f"""
    Evaluate this agent response against the criteria.
    Criteria: {criteria}
    Response: {response}
    Score (1-5) and explain why.
    """
    # Send `prompt` to the grading model and parse score/explanation from its reply, e.g.:
    # Grade(score=4, explanation="Created all files but tests are minimal")
```
- [ ] **Action comparison mode:**
- Record tool calls made during test
- Compare against expected actions
- "Expected terminal call to pip install, got npm install"
- [ ] **CLI flags:**
```bash
python batch_runner.py eval test_cases.yaml # String matching
python batch_runner.py eval test_cases.yaml -g # + LLM grading
python batch_runner.py eval test_cases.yaml -r # + Result comparison
python batch_runner.py eval test_cases.yaml -v # Verbose (show responses)
```
**Files to modify:** `batch_runner.py`, new `evals/test_cases.py`, new `evals/grader.py`
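To make the default mode concrete, string matching can be sketched in a few lines (function and field names are assumptions; `batch_runner.py` may organize this differently):

```python
from typing import List, Tuple

def eval_string_match(response: str, expected_strings: List[str]) -> Tuple[bool, List[str]]:
    """Default eval mode: pass iff every expected string appears in the agent's response."""
    missing = [s for s in expected_strings if s not in response]
    return (not missing, missing)
```

The `-g` and `-r` modes would layer on top: run string matching first, then invoke the LLM grader or compare recorded actions only when the flag is set.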
---
*Last updated: $(date +%Y-%m-%d)* 🤖


@@ -146,8 +146,10 @@ compression:
# Agent Behavior
# =============================================================================
agent:
# Maximum conversation turns before stopping
max_turns: 20
# Maximum tool-calling iterations per conversation
# Higher = more room for complex tasks, but costs more tokens
# Recommended: 20-30 for focused tasks, 50-100 for open exploration
max_turns: 60
# Enable verbose logging
verbose: false

cli.py

@@ -95,7 +95,7 @@ def load_cli_config() -> Dict[str, Any]:
"summary_model": "google/gemini-2.0-flash-001", # Fast/cheap model for summaries
},
"agent": {
"max_turns": 20,
"max_turns": 60, # Default max tool-calling iterations
"verbose": False,
"system_prompt": "",
"personalities": {
@@ -145,6 +145,10 @@ def load_cli_config() -> Dict[str, Any]:
defaults[key].update(file_config[key])
else:
defaults[key] = file_config[key]
# Handle root-level max_turns (backwards compat) - copy to agent.max_turns
if "max_turns" in file_config and "agent" not in file_config:
defaults["agent"]["max_turns"] = file_config["max_turns"]
except Exception as e:
print(f"[Warning] Failed to load cli-config.yaml: {e}")
@@ -547,7 +551,7 @@ class HermesCLI:
toolsets: List[str] = None,
api_key: str = None,
base_url: str = None,
max_turns: int = 20,
max_turns: int = 60,
verbose: bool = False,
compact: bool = False,
):
@@ -559,7 +563,7 @@ class HermesCLI:
toolsets: List of toolsets to enable (default: all)
api_key: API key (default: from environment)
base_url: API base URL (default: OpenRouter)
max_turns: Maximum conversation turns
max_turns: Maximum tool-calling iterations (default: 60)
verbose: Enable verbose logging
compact: Use compact display mode
"""
@@ -577,7 +581,17 @@ class HermesCLI:
# API key: custom endpoint (OPENAI_API_KEY) takes precedence over OpenRouter
self.api_key = api_key or os.getenv("OPENAI_API_KEY") or os.getenv("OPENROUTER_API_KEY")
self.max_turns = max_turns if max_turns != 20 else CLI_CONFIG["agent"].get("max_turns", 20)
# Max turns priority: CLI arg > env var > config file (agent.max_turns or root max_turns) > default
if max_turns != 60: # CLI arg was explicitly set
self.max_turns = max_turns
elif os.getenv("HERMES_MAX_ITERATIONS"):
self.max_turns = int(os.getenv("HERMES_MAX_ITERATIONS"))
elif CLI_CONFIG["agent"].get("max_turns"):
self.max_turns = CLI_CONFIG["agent"]["max_turns"]
elif CLI_CONFIG.get("max_turns"): # Backwards compat: root-level max_turns
self.max_turns = CLI_CONFIG["max_turns"]
else:
self.max_turns = 60
# Parse and validate toolsets
self.enabled_toolsets = toolsets
@@ -1377,7 +1391,7 @@ def main(
model: str = None,
api_key: str = None,
base_url: str = None,
max_turns: int = 20,
max_turns: int = 60,
verbose: bool = False,
compact: bool = False,
list_tools: bool = False,
@@ -1396,7 +1410,7 @@ def main(
model: Model to use (default: anthropic/claude-opus-4-20250514)
api_key: API key for authentication
base_url: Base URL for the API
max_turns: Maximum conversation turns (default: 20)
max_turns: Maximum tool-calling iterations (default: 60)
verbose: Enable verbose logging
compact: Use compact display mode
list_tools: List available tools and exit


@@ -360,8 +360,12 @@ class GatewayRunner:
toolset = toolset_map.get(source.platform, "hermes-telegram")
def run_sync():
# Read from env var or use default (same as CLI)
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "60"))
agent = AIAgent(
model=os.getenv("HERMES_MODEL", "anthropic/claude-sonnet-4"),
max_iterations=max_iterations,
quiet_mode=True,
enabled_toolsets=[toolset],
ephemeral_system_prompt=context_prompt,


@@ -201,6 +201,13 @@ OPTIONAL_ENV_VARS = {
"url": None,
"password": True,
},
# Agent configuration
"HERMES_MAX_ITERATIONS": {
"description": "Maximum tool-calling iterations per conversation (default: 60)",
"prompt": "Max iterations",
"url": None,
"password": False,
},
}


@@ -693,7 +693,28 @@ def run_setup_wizard(args):
# else: Keep current (selected_backend is None)
# =========================================================================
# Step 5: Context Compression
# Step 5: Agent Settings
# =========================================================================
print_header("Agent Settings")
# Max iterations
current_max = get_env_value('HERMES_MAX_ITERATIONS') or '60'
print_info("Maximum tool-calling iterations per conversation.")
print_info("Higher = more complex tasks, but costs more tokens.")
print_info("Recommended: 30-60 for most tasks, 100+ for open exploration.")
max_iter_str = prompt("Max iterations", current_max)
try:
max_iter = int(max_iter_str)
if max_iter > 0:
save_env_value("HERMES_MAX_ITERATIONS", str(max_iter))
config['max_turns'] = max_iter
print_success(f"Max iterations set to {max_iter}")
except ValueError:
print_warning("Invalid number, keeping current value")
# =========================================================================
# Step 6: Context Compression
# =========================================================================
print_header("Context Compression")
print_info("Automatically summarize old messages when context gets too long.")
@@ -718,7 +739,7 @@ def run_setup_wizard(args):
config.setdefault('compression', {})['enabled'] = False
# =========================================================================
# Step 6: Messaging Platforms (Optional)
# Step 7: Messaging Platforms (Optional)
# =========================================================================
print_header("Messaging Platforms (Optional)")
print_info("Connect to messaging platforms to chat with Hermes from anywhere.")
@@ -812,7 +833,7 @@ def run_setup_wizard(args):
print_success("Discord allowlist configured")
# =========================================================================
# Step 7: Additional Tools (Optional)
# Step 8: Additional Tools (Optional)
# =========================================================================
print_header("Additional Tools (Optional)")
print_info("These tools extend the agent's capabilities.")


@@ -585,7 +585,7 @@ class AIAgent:
base_url: str = None,
api_key: str = None,
model: str = "anthropic/claude-sonnet-4-20250514", # OpenRouter format
max_iterations: int = 10,
max_iterations: int = 60, # Default tool-calling iterations
tool_delay: float = 1.0,
enabled_toolsets: List[str] = None,
disabled_toolsets: List[str] = None,
@@ -1966,11 +1966,47 @@ class AIAgent:
final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
break
# Handle max iterations reached
if api_call_count >= self.max_iterations:
print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Stopping to prevent infinite loop.")
if final_response is None:
final_response = "I've reached the maximum number of iterations. Here's what I found so far."
# Handle max iterations reached - ask model to summarize what it found
if api_call_count >= self.max_iterations and final_response is None:
print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Requesting summary...")
# Inject a user message asking for a summary
summary_request = (
"You've reached the maximum number of tool-calling iterations allowed. "
"Please provide a final response summarizing what you've found and accomplished so far, "
"without calling any more tools."
)
messages.append({"role": "user", "content": summary_request})
# Make one final API call WITHOUT tools to force a text response
try:
api_messages = messages.copy()
if self.ephemeral_system_prompt:
api_messages = [{"role": "system", "content": self.ephemeral_system_prompt}] + api_messages
summary_response = self.client.chat.completions.create(
model=self.model,
messages=api_messages,
# No tools parameter - forces text response
extra_headers=self.extra_headers,
extra_body=self.extra_body,
)
if summary_response.choices and summary_response.choices[0].message.content:
final_response = summary_response.choices[0].message.content
# Strip think blocks from final response
if "<think>" in final_response:
import re
final_response = re.sub(r'<think>.*?</think>\s*', '', final_response, flags=re.DOTALL).strip()
# Add to messages for session continuity
messages.append({"role": "assistant", "content": final_response})
else:
final_response = "I reached the iteration limit and couldn't generate a summary."
except Exception as e:
logging.warning(f"Failed to get summary response: {e}")
final_response = f"I reached the maximum iterations ({self.max_iterations}) but couldn't summarize. Error: {str(e)}"
# Determine if conversation completed successfully
completed = final_response is not None and api_call_count < self.max_iterations