Update agent configuration for maximum tool-calling iterations

- Increased the default maximum tool-calling iterations from 20 to 60 in the CLI configuration and related files, allowing for more complex tasks.
- Updated documentation and comments to reflect the new recommended range for iterations, enhancing user guidance.
- Implemented backward compatibility for loading max iterations from the root-level configuration, ensuring a smooth transition for existing users.
- Adjusted the setup wizard to prompt for the maximum iterations setting, improving user experience during configuration.
Author: teknium1
Date: 2026-02-03 14:48:19 -08:00
parent 17a5efb416
commit 7eac4ee9fe
7 changed files with 246 additions and 17 deletions

TODO.md

@@ -441,4 +441,149 @@
---
## 14. Learning Machine / Dynamic Memory System 🧠
*Inspired by [Dash](~/agent-codebases/dash) - a self-learning data agent.*
**Problem:** Agent starts fresh every session. Valuable learnings from debugging, error patterns, successful approaches, and user preferences are lost.
**Dash's Key Insight:** Separate **Knowledge** (static, curated) from **Learnings** (dynamic, discovered):

| System | What It Stores | How It Evolves |
|--------|---------------|----------------|
| **Knowledge** (Skills) | Validated approaches, templates, best practices | Curated by user |
| **Learnings** | Error patterns, gotchas, discovered fixes | Managed automatically |

**Tools to implement:**
- [ ] `save_learning(topic, learning, context?)` - Record a discovered pattern
```python
save_learning(
topic="python-ssl",
learning="On Ubuntu 22.04, SSL certificate errors often fixed by: apt install ca-certificates",
context="Debugging requests SSL failure"
)
```
- [ ] `search_learnings(query)` - Find relevant past learnings
```python
search_learnings("SSL certificate error Python")
# Returns: "On Ubuntu 22.04, SSL certificate errors often fixed by..."
```
**User Profile & Memory:**
- [ ] `user_profile` - Structured facts about user preferences
```yaml
# ~/.hermes/user_profile.yaml
coding_style:
python_formatter: black
type_hints: always
test_framework: pytest
preferences:
verbosity: detailed
confirm_destructive: true
environment:
os: linux
shell: bash
default_python: 3.11
```
- [ ] `user_memory` - Unstructured observations the agent learns
```yaml
# ~/.hermes/user_memory.yaml
- "User prefers tabs over spaces despite black's defaults"
- "User's main project is ~/work/myapp - a Django app"
- "User often works late - don't ask about timezone"
```
**When to learn:**
- After fixing an error that took multiple attempts
- When user corrects the agent's approach
- When a workaround is discovered for a tool limitation
- When user expresses a preference
**Storage:** Vector database (ChromaDB) or simple YAML with embedding search.
**Files to create:** `tools/learning_tools.py`, `learning/store.py`, `~/.hermes/learnings/`
---
## 15. Layered Context Architecture 📊
*Inspired by Dash's "Six Layers of Context" - grounding responses in multiple sources.*
**Problem:** Context sources are ad-hoc. No clear hierarchy or strategy for what context to include when.
**Proposed Layers for Hermes:**

| Layer | Source | When Loaded | Example |
|-------|--------|-------------|---------|
| 1. **Project Context** | `.hermes/context.md` | Auto on cwd | "This is a FastAPI project using PostgreSQL" |
| 2. **Skills** | `skills/*.md` | On request | "How to set up React project" |
| 3. **User Profile** | `~/.hermes/user_profile.yaml` | Always | "User prefers pytest, uses black" |
| 4. **Learnings** | `~/.hermes/learnings/` | Semantic search | "SSL fix for Ubuntu" |
| 5. **External Knowledge** | Web search, docs | On demand | Current API docs, Stack Overflow |
| 6. **Runtime Introspection** | Tool calls | Real-time | File contents, terminal output |

**Benefits:**
- Clear mental model for what context is available
- Prioritization: local > learned > external
- Debugging: "Why did agent do X?" → check which layers contributed
**Files to modify:** `run_agent.py` (context loading), new `context/layers.py`
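A hypothetical shape for `context/layers.py`, just to make the priority ordering concrete — the `ContextLayer` type and `build_context` helper are assumptions, not existing code:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ContextLayer:
    name: str
    priority: int                          # lower number = more trusted, injected first
    loader: Callable[[], Optional[str]]    # returns text, or None when unavailable

def build_context(layers: List[ContextLayer]) -> str:
    """Assemble the system context from whichever layers can provide text."""
    parts = []
    for layer in sorted(layers, key=lambda l: l.priority):
        text = layer.loader()
        if text:
            parts.append(f"## {layer.name}\n{text}")
    return "\n\n".join(parts)
```

With this shape, "why did the agent do X?" becomes a matter of inspecting which loaders returned text for a given turn.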
---
## 16. Evaluation System with LLM Grading 📏
*Inspired by Dash's evaluation framework.*
**Problem:** `batch_runner.py` runs test cases but lacks quality assessment.
**Dash's Approach:**
- **String matching** (default) - Check if expected strings appear
- **LLM grader** (-g flag) - GPT evaluates response quality
- **Result comparison** (-r flag) - Compare against golden output
**Implementation for Hermes:**
- [ ] **Test case format:**
```python
TestCase(
name="create_python_project",
prompt="Create a new Python project with FastAPI and tests",
expected_strings=["requirements.txt", "main.py", "test_"], # Basic check
golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"],
grader_criteria="Should create complete project structure with working code"
)
```
- [ ] **LLM grader mode:**
```python
from dataclasses import dataclass

@dataclass
class Grade:
    score: int        # 1-5
    explanation: str

def grade_response(response: str, criteria: str) -> Grade:
    """Use an LLM to evaluate response quality against free-form criteria."""
    prompt = f"""
    Evaluate this agent response against the criteria.
    Criteria: {criteria}
    Response: {response}
    Score (1-5) and explain why.
    """
    # Send `prompt` to the grading model and parse score/explanation from its reply, e.g.:
    # Grade(score=4, explanation="Created all files but tests are minimal")
```
- [ ] **Action comparison mode:**
- Record tool calls made during test
- Compare against expected actions
- "Expected terminal call to pip install, got npm install"
- [ ] **CLI flags:**
```bash
python batch_runner.py eval test_cases.yaml # String matching
python batch_runner.py eval test_cases.yaml -g # + LLM grading
python batch_runner.py eval test_cases.yaml -r # + Result comparison
python batch_runner.py eval test_cases.yaml -v # Verbose (show responses)
```
**Files to modify:** `batch_runner.py`, new `evals/test_cases.py`, new `evals/grader.py`
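To make the default mode concrete, string matching can be sketched in a few lines (function and field names are assumptions; `batch_runner.py` may organize this differently):

```python
from typing import List, Tuple

def eval_string_match(response: str, expected_strings: List[str]) -> Tuple[bool, List[str]]:
    """Default eval mode: pass iff every expected string appears in the agent's response."""
    missing = [s for s in expected_strings if s not in response]
    return (not missing, missing)
```

The `-g` and `-r` modes would layer on top: run string matching first, then invoke the LLM grader or compare recorded actions only when the flag is set.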
---
*Last updated: $(date +%Y-%m-%d)* 🤖


@@ -146,8 +146,10 @@ compression:
# Agent Behavior
# =============================================================================
agent:
# Maximum conversation turns before stopping
max_turns: 20
# Maximum tool-calling iterations per conversation
# Higher = more room for complex tasks, but costs more tokens
# Recommended: 20-30 for focused tasks, 50-100 for open exploration
max_turns: 60
# Enable verbose logging
verbose: false

cli.py

@@ -95,7 +95,7 @@ def load_cli_config() -> Dict[str, Any]:
"summary_model": "google/gemini-2.0-flash-001", # Fast/cheap model for summaries
},
"agent": {
"max_turns": 20,
"max_turns": 60, # Default max tool-calling iterations
"verbose": False,
"system_prompt": "",
"personalities": {
@@ -145,6 +145,10 @@ def load_cli_config() -> Dict[str, Any]:
defaults[key].update(file_config[key])
else:
defaults[key] = file_config[key]
# Handle root-level max_turns (backwards compat) - copy to agent.max_turns
if "max_turns" in file_config and "agent" not in file_config:
defaults["agent"]["max_turns"] = file_config["max_turns"]
except Exception as e:
print(f"[Warning] Failed to load cli-config.yaml: {e}")
@@ -547,7 +551,7 @@ class HermesCLI:
toolsets: List[str] = None,
api_key: str = None,
base_url: str = None,
max_turns: int = 20,
max_turns: int = 60,
verbose: bool = False,
compact: bool = False,
):
@@ -559,7 +563,7 @@ class HermesCLI:
toolsets: List of toolsets to enable (default: all)
api_key: API key (default: from environment)
base_url: API base URL (default: OpenRouter)
max_turns: Maximum conversation turns
max_turns: Maximum tool-calling iterations (default: 60)
verbose: Enable verbose logging
compact: Use compact display mode
"""
@@ -577,7 +581,17 @@ class HermesCLI:
# API key: custom endpoint (OPENAI_API_KEY) takes precedence over OpenRouter
self.api_key = api_key or os.getenv("OPENAI_API_KEY") or os.getenv("OPENROUTER_API_KEY")
self.max_turns = max_turns if max_turns != 20 else CLI_CONFIG["agent"].get("max_turns", 20)
# Max turns priority: CLI arg > env var > config file (agent.max_turns or root max_turns) > default
if max_turns != 60: # CLI arg was explicitly set
self.max_turns = max_turns
elif os.getenv("HERMES_MAX_ITERATIONS"):
self.max_turns = int(os.getenv("HERMES_MAX_ITERATIONS"))
elif CLI_CONFIG["agent"].get("max_turns"):
self.max_turns = CLI_CONFIG["agent"]["max_turns"]
elif CLI_CONFIG.get("max_turns"): # Backwards compat: root-level max_turns
self.max_turns = CLI_CONFIG["max_turns"]
else:
self.max_turns = 60
# Parse and validate toolsets
self.enabled_toolsets = toolsets
@@ -1377,7 +1391,7 @@ def main(
model: str = None,
api_key: str = None,
base_url: str = None,
max_turns: int = 20,
max_turns: int = 60,
verbose: bool = False,
compact: bool = False,
list_tools: bool = False,
@@ -1396,7 +1410,7 @@ def main(
model: Model to use (default: anthropic/claude-opus-4-20250514)
api_key: API key for authentication
base_url: Base URL for the API
max_turns: Maximum conversation turns (default: 20)
max_turns: Maximum tool-calling iterations (default: 60)
verbose: Enable verbose logging
compact: Use compact display mode
list_tools: List available tools and exit


@@ -360,8 +360,12 @@ class GatewayRunner:
toolset = toolset_map.get(source.platform, "hermes-telegram")
def run_sync():
# Read from env var or use default (same as CLI)
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "60"))
agent = AIAgent(
model=os.getenv("HERMES_MODEL", "anthropic/claude-sonnet-4"),
max_iterations=max_iterations,
quiet_mode=True,
enabled_toolsets=[toolset],
ephemeral_system_prompt=context_prompt,


@@ -201,6 +201,13 @@ OPTIONAL_ENV_VARS = {
"url": None,
"password": True,
},
# Agent configuration
"HERMES_MAX_ITERATIONS": {
"description": "Maximum tool-calling iterations per conversation (default: 60)",
"prompt": "Max iterations",
"url": None,
"password": False,
},
}


@@ -693,7 +693,28 @@ def run_setup_wizard(args):
# else: Keep current (selected_backend is None)
# =========================================================================
# Step 5: Context Compression
# Step 5: Agent Settings
# =========================================================================
print_header("Agent Settings")
# Max iterations
current_max = get_env_value('HERMES_MAX_ITERATIONS') or '60'
print_info("Maximum tool-calling iterations per conversation.")
print_info("Higher = more complex tasks, but costs more tokens.")
print_info("Recommended: 30-60 for most tasks, 100+ for open exploration.")
max_iter_str = prompt("Max iterations", current_max)
try:
max_iter = int(max_iter_str)
if max_iter > 0:
save_env_value("HERMES_MAX_ITERATIONS", str(max_iter))
config['max_turns'] = max_iter
print_success(f"Max iterations set to {max_iter}")
except ValueError:
print_warning("Invalid number, keeping current value")
# =========================================================================
# Step 6: Context Compression
# =========================================================================
print_header("Context Compression")
print_info("Automatically summarize old messages when context gets too long.")
@@ -718,7 +739,7 @@ def run_setup_wizard(args):
config.setdefault('compression', {})['enabled'] = False
# =========================================================================
# Step 6: Messaging Platforms (Optional)
# Step 7: Messaging Platforms (Optional)
# =========================================================================
print_header("Messaging Platforms (Optional)")
print_info("Connect to messaging platforms to chat with Hermes from anywhere.")
@@ -812,7 +833,7 @@ def run_setup_wizard(args):
print_success("Discord allowlist configured")
# =========================================================================
# Step 7: Additional Tools (Optional)
# Step 8: Additional Tools (Optional)
# =========================================================================
print_header("Additional Tools (Optional)")
print_info("These tools extend the agent's capabilities.")


@@ -585,7 +585,7 @@ class AIAgent:
base_url: str = None,
api_key: str = None,
model: str = "anthropic/claude-sonnet-4-20250514", # OpenRouter format
max_iterations: int = 10,
max_iterations: int = 60, # Default tool-calling iterations
tool_delay: float = 1.0,
enabled_toolsets: List[str] = None,
disabled_toolsets: List[str] = None,
@@ -1966,11 +1966,47 @@ class AIAgent:
final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
break
# Handle max iterations reached
if api_call_count >= self.max_iterations:
print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Stopping to prevent infinite loop.")
if final_response is None:
final_response = "I've reached the maximum number of iterations. Here's what I found so far."
# Handle max iterations reached - ask model to summarize what it found
if api_call_count >= self.max_iterations and final_response is None:
print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Requesting summary...")
# Inject a user message asking for a summary
summary_request = (
"You've reached the maximum number of tool-calling iterations allowed. "
"Please provide a final response summarizing what you've found and accomplished so far, "
"without calling any more tools."
)
messages.append({"role": "user", "content": summary_request})
# Make one final API call WITHOUT tools to force a text response
try:
api_messages = messages.copy()
if self.ephemeral_system_prompt:
api_messages = [{"role": "system", "content": self.ephemeral_system_prompt}] + api_messages
summary_response = self.client.chat.completions.create(
model=self.model,
messages=api_messages,
# No tools parameter - forces text response
extra_headers=self.extra_headers,
extra_body=self.extra_body,
)
if summary_response.choices and summary_response.choices[0].message.content:
final_response = summary_response.choices[0].message.content
# Strip think blocks from final response
if "<think>" in final_response:
import re
final_response = re.sub(r'<think>.*?</think>\s*', '', final_response, flags=re.DOTALL).strip()
# Add to messages for session continuity
messages.append({"role": "assistant", "content": final_response})
else:
final_response = "I reached the iteration limit and couldn't generate a summary."
except Exception as e:
logging.warning(f"Failed to get summary response: {e}")
final_response = f"I reached the maximum iterations ({self.max_iterations}) but couldn't summarize. Error: {str(e)}"
# Determine if conversation completed successfully
completed = final_response is not None and api_call_count < self.max_iterations