diff --git a/TODO.md b/TODO.md index dc116539b..454c7ff4b 100644 --- a/TODO.md +++ b/TODO.md @@ -441,4 +441,149 @@ --- +## 14. Learning Machine / Dynamic Memory System 🧠 + +*Inspired by [Dash](~/agent-codebases/dash) - a self-learning data agent.* + +**Problem:** Agent starts fresh every session. Valuable learnings from debugging, error patterns, successful approaches, and user preferences are lost. + +**Dash's Key Insight:** Separate **Knowledge** (static, curated) from **Learnings** (dynamic, discovered): + +| System | What It Stores | How It Evolves | +|--------|---------------|----------------| +| **Knowledge** (Skills) | Validated approaches, templates, best practices | Curated by user | +| **Learnings** | Error patterns, gotchas, discovered fixes | Managed automatically | + +**Tools to implement:** +- [ ] `save_learning(topic, learning, context?)` - Record a discovered pattern + ```python + save_learning( + topic="python-ssl", + learning="On Ubuntu 22.04, SSL certificate errors often fixed by: apt install ca-certificates", + context="Debugging requests SSL failure" + ) + ``` +- [ ] `search_learnings(query)` - Find relevant past learnings + ```python + search_learnings("SSL certificate error Python") + # Returns: "On Ubuntu 22.04, SSL certificate errors often fixed by..." 
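+
+    # A possible backing lookup (hypothetical sketch, assuming a ChromaDB store;
+    # names below are illustrative, not existing Hermes code):
+    #   collection = chromadb.Client().get_or_create_collection("learnings")
+    #   collection.query(query_texts=[query], n_results=3)["documents"][0]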
+ ``` + +**User Profile & Memory:** +- [ ] `user_profile` - Structured facts about user preferences + ```yaml + # ~/.hermes/user_profile.yaml + coding_style: + python_formatter: black + type_hints: always + test_framework: pytest + preferences: + verbosity: detailed + confirm_destructive: true + environment: + os: linux + shell: bash + default_python: 3.11 + ``` +- [ ] `user_memory` - Unstructured observations the agent learns + ```yaml + # ~/.hermes/user_memory.yaml + - "User prefers tabs over spaces despite black's defaults" + - "User's main project is ~/work/myapp - a Django app" + - "User often works late - don't ask about timezone" + ``` + +**When to learn:** +- After fixing an error that took multiple attempts +- When user corrects the agent's approach +- When a workaround is discovered for a tool limitation +- When user expresses a preference + +**Storage:** Vector database (ChromaDB) or simple YAML with embedding search. + +**Files to create:** `tools/learning_tools.py`, `learning/store.py`, `~/.hermes/learnings/` + +--- + +## 15. Layered Context Architecture 📊 + +*Inspired by Dash's "Six Layers of Context" - grounding responses in multiple sources.* + +**Problem:** Context sources are ad-hoc. No clear hierarchy or strategy for what context to include when. + +**Proposed Layers for Hermes:** + +| Layer | Source | When Loaded | Example | +|-------|--------|-------------|---------| +| 1. **Project Context** | `.hermes/context.md` | Auto on cwd | "This is a FastAPI project using PostgreSQL" | +| 2. **Skills** | `skills/*.md` | On request | "How to set up React project" | +| 3. **User Profile** | `~/.hermes/user_profile.yaml` | Always | "User prefers pytest, uses black" | +| 4. **Learnings** | `~/.hermes/learnings/` | Semantic search | "SSL fix for Ubuntu" | +| 5. **External Knowledge** | Web search, docs | On demand | Current API docs, Stack Overflow | +| 6. 
**Runtime Introspection** | Tool calls | Real-time | File contents, terminal output | + +**Benefits:** +- Clear mental model for what context is available +- Prioritization: local > learned > external +- Debugging: "Why did agent do X?" → check which layers contributed + +**Files to modify:** `run_agent.py` (context loading), new `context/layers.py` + +--- + +## 16. Evaluation System with LLM Grading 📏 + +*Inspired by Dash's evaluation framework.* + +**Problem:** `batch_runner.py` runs test cases but lacks quality assessment. + +**Dash's Approach:** +- **String matching** (default) - Check if expected strings appear +- **LLM grader** (-g flag) - GPT evaluates response quality +- **Result comparison** (-r flag) - Compare against golden output + +**Implementation for Hermes:** + +- [ ] **Test case format:** + ```python + TestCase( + name="create_python_project", + prompt="Create a new Python project with FastAPI and tests", + expected_strings=["requirements.txt", "main.py", "test_"], # Basic check + golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"], + grader_criteria="Should create complete project structure with working code" + ) + ``` + +- [ ] **LLM grader mode:** + ```python + def grade_response(response: str, criteria: str) -> Grade: + """Use GPT to evaluate response quality.""" + prompt = f""" + Evaluate this agent response against the criteria. + Criteria: {criteria} + Response: {response} + + Score (1-5) and explain why. 
+ """ + # Returns: Grade(score=4, explanation="Created all files but tests are minimal") + ``` + +- [ ] **Action comparison mode:** + - Record tool calls made during test + - Compare against expected actions + - "Expected terminal call to pip install, got npm install" + +- [ ] **CLI flags:** + ```bash + python batch_runner.py eval test_cases.yaml # String matching + python batch_runner.py eval test_cases.yaml -g # + LLM grading + python batch_runner.py eval test_cases.yaml -r # + Result comparison + python batch_runner.py eval test_cases.yaml -v # Verbose (show responses) + ``` + +**Files to modify:** `batch_runner.py`, new `evals/test_cases.py`, new `evals/grader.py` + +--- + *Last updated: $(date +%Y-%m-%d)* 🤖 diff --git a/cli-config.yaml.example b/cli-config.yaml.example index 5b4f16ada..63e4f7555 100644 --- a/cli-config.yaml.example +++ b/cli-config.yaml.example @@ -146,8 +146,10 @@ compression: # Agent Behavior # ============================================================================= agent: - # Maximum conversation turns before stopping - max_turns: 20 + # Maximum tool-calling iterations per conversation + # Higher = more room for complex tasks, but costs more tokens + # Recommended: 20-30 for focused tasks, 50-100 for open exploration + max_turns: 60 # Enable verbose logging verbose: false diff --git a/cli.py b/cli.py index 15d1f8ed6..9718ebea0 100755 --- a/cli.py +++ b/cli.py @@ -95,7 +95,7 @@ def load_cli_config() -> Dict[str, Any]: "summary_model": "google/gemini-2.0-flash-001", # Fast/cheap model for summaries }, "agent": { - "max_turns": 20, + "max_turns": 60, # Default max tool-calling iterations "verbose": False, "system_prompt": "", "personalities": { @@ -145,6 +145,10 @@ def load_cli_config() -> Dict[str, Any]: defaults[key].update(file_config[key]) else: defaults[key] = file_config[key] + + # Handle root-level max_turns (backwards compat) - copy to agent.max_turns + if "max_turns" in file_config and "agent" not in file_config: + 
defaults["agent"]["max_turns"] = file_config["max_turns"] except Exception as e: print(f"[Warning] Failed to load cli-config.yaml: {e}") @@ -547,7 +551,7 @@ class HermesCLI: toolsets: List[str] = None, api_key: str = None, base_url: str = None, - max_turns: int = 20, + max_turns: int = 60, verbose: bool = False, compact: bool = False, ): @@ -559,7 +563,7 @@ class HermesCLI: toolsets: List of toolsets to enable (default: all) api_key: API key (default: from environment) base_url: API base URL (default: OpenRouter) - max_turns: Maximum conversation turns + max_turns: Maximum tool-calling iterations (default: 60) verbose: Enable verbose logging compact: Use compact display mode """ @@ -577,7 +581,17 @@ class HermesCLI: # API key: custom endpoint (OPENAI_API_KEY) takes precedence over OpenRouter self.api_key = api_key or os.getenv("OPENAI_API_KEY") or os.getenv("OPENROUTER_API_KEY") - self.max_turns = max_turns if max_turns != 20 else CLI_CONFIG["agent"].get("max_turns", 20) + # Max turns priority: CLI arg > env var > config file (agent.max_turns or root max_turns) > default + if max_turns != 60: # CLI arg was explicitly set + self.max_turns = max_turns + elif os.getenv("HERMES_MAX_ITERATIONS"): + self.max_turns = int(os.getenv("HERMES_MAX_ITERATIONS")) + elif CLI_CONFIG["agent"].get("max_turns"): + self.max_turns = CLI_CONFIG["agent"]["max_turns"] + elif CLI_CONFIG.get("max_turns"): # Backwards compat: root-level max_turns + self.max_turns = CLI_CONFIG["max_turns"] + else: + self.max_turns = 60 # Parse and validate toolsets self.enabled_toolsets = toolsets @@ -1377,7 +1391,7 @@ def main( model: str = None, api_key: str = None, base_url: str = None, - max_turns: int = 20, + max_turns: int = 60, verbose: bool = False, compact: bool = False, list_tools: bool = False, @@ -1396,7 +1410,7 @@ def main( model: Model to use (default: anthropic/claude-opus-4-20250514) api_key: API key for authentication base_url: Base URL for the API - max_turns: Maximum conversation turns 
(default: 20)
+        max_turns: Maximum tool-calling iterations (default: 60)
         verbose: Enable verbose logging
         compact: Use compact display mode
         list_tools: List available tools and exit
diff --git a/gateway/run.py b/gateway/run.py
index de7dd8447..3bb32239f 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -360,8 +360,12 @@ class GatewayRunner:
                 toolset = toolset_map.get(source.platform, "hermes-telegram")
 
                 def run_sync():
+                    # Read from env var or use default (same as CLI)
+                    max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "60"))
+
                     agent = AIAgent(
                         model=os.getenv("HERMES_MODEL", "anthropic/claude-sonnet-4"),
+                        max_iterations=max_iterations,
                         quiet_mode=True,
                         enabled_toolsets=[toolset],
                         ephemeral_system_prompt=context_prompt,
diff --git a/hermes_cli/config.py b/hermes_cli/config.py
index e24e2a3a4..2a5833fd4 100644
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -201,6 +201,13 @@ OPTIONAL_ENV_VARS = {
         "url": None,
         "password": True,
     },
+    # Agent configuration
+    "HERMES_MAX_ITERATIONS": {
+        "description": "Maximum tool-calling iterations per conversation (default: 60)",
+        "prompt": "Max iterations",
+        "url": None,
+        "password": False,
+    },
 }
diff --git a/hermes_cli/setup.py b/hermes_cli/setup.py
index 98420725e..fddf8e583 100644
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -693,7 +693,28 @@ def run_setup_wizard(args):
     # else: Keep current (selected_backend is None)
 
     # =========================================================================
-    # Step 5: Context Compression
+    # Step 5: Agent Settings
+    # =========================================================================
+    print_header("Agent Settings")
+
+    # Max iterations
+    current_max = get_env_value('HERMES_MAX_ITERATIONS') or '60'
+    print_info("Maximum tool-calling iterations per conversation.")
+    print_info("Higher = more room for complex tasks, but costs more tokens.")
+    print_info("Recommended: 20-30 for focused tasks, 50-100 for open exploration.")
+
+    max_iter_str = 
prompt("Max iterations", current_max)
+    try:
+        max_iter = int(max_iter_str)
+        if max_iter <= 0: raise ValueError  # reject zero and negative values too
+        save_env_value("HERMES_MAX_ITERATIONS", str(max_iter))
+        config['max_turns'] = max_iter
+        print_success(f"Max iterations set to {max_iter}")
+    except ValueError:
+        print_warning("Invalid number, keeping current value")
+
+    # =========================================================================
+    # Step 6: Context Compression
     # =========================================================================
     print_header("Context Compression")
     print_info("Automatically summarize old messages when context gets too long.")
@@ -718,7 +739,7 @@ def run_setup_wizard(args):
             config.setdefault('compression', {})['enabled'] = False
 
     # =========================================================================
-    # Step 6: Messaging Platforms (Optional)
+    # Step 7: Messaging Platforms (Optional)
     # =========================================================================
     print_header("Messaging Platforms (Optional)")
     print_info("Connect to messaging platforms to chat with Hermes from anywhere.")
@@ -812,7 +833,7 @@ def run_setup_wizard(args):
             print_success("Discord allowlist configured")
 
     # =========================================================================
-    # Step 7: Additional Tools (Optional)
+    # Step 8: Additional Tools (Optional)
     # =========================================================================
     print_header("Additional Tools (Optional)")
     print_info("These tools extend the agent's capabilities.")
diff --git a/run_agent.py b/run_agent.py
index 963a9db4f..b8cbad581 100644
--- a/run_agent.py
+++ b/run_agent.py
@@ -585,7 +585,7 @@ class AIAgent:
         base_url: str = None,
         api_key: str = None,
         model: str = "anthropic/claude-sonnet-4-20250514",  # OpenRouter format
-        max_iterations: int = 10,
+        max_iterations: int = 60,  # Default tool-calling iterations
         tool_delay: float = 1.0,
         enabled_toolsets: List[str] = None,
         disabled_toolsets: List[str] = None,
@@ -1966,11 +1966,47 @@ class 
AIAgent:
                     final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
                     break
 
-            # Handle max iterations reached
-            if api_call_count >= self.max_iterations:
-                print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Stopping to prevent infinite loop.")
-                if final_response is None:
-                    final_response = "I've reached the maximum number of iterations. Here's what I found so far."
+            # Handle max iterations reached - ask model to summarize what it found
+            if api_call_count >= self.max_iterations and final_response is None:
+                print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Requesting summary...")
+
+                # Inject a user message asking for a summary
+                summary_request = (
+                    "You've reached the maximum number of tool-calling iterations allowed. "
+                    "Please provide a final response summarizing what you've found and accomplished so far, "
+                    "without calling any more tools."
+                )
+                messages.append({"role": "user", "content": summary_request})
+
+                # Make one final API call WITHOUT tools to force a text response
+                try:
+                    api_messages = messages.copy()
+                    if self.ephemeral_system_prompt:
+                        api_messages = [{"role": "system", "content": self.ephemeral_system_prompt}] + api_messages
+
+                    summary_response = self.client.chat.completions.create(
+                        model=self.model,
+                        messages=api_messages,
+                        # No tools parameter - forces text response
+                        extra_headers=self.extra_headers,
+                        extra_body=self.extra_body,
+                    )
+
+                    if summary_response.choices and summary_response.choices[0].message.content:
+                        final_response = summary_response.choices[0].message.content
+                        # Strip <think> blocks from final response
+                        if "<think>" in final_response:
+                            import re
+                            final_response = re.sub(r'<think>.*?</think>\s*', '', final_response, flags=re.DOTALL).strip()
+
+                        # Add to messages for session continuity
+                        messages.append({"role": "assistant", "content": final_response})
+                    else:
+                        final_response = "I reached the iteration limit and couldn't generate a summary."
+ + except Exception as e: + logging.warning(f"Failed to get summary response: {e}") + final_response = f"I reached the maximum iterations ({self.max_iterations}) but couldn't summarize. Error: {str(e)}" # Determine if conversation completed successfully completed = final_response is not None and api_call_count < self.max_iterations
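
---

The iteration-cap fallback added to `run_agent.py` is worth pinning down in isolation. The sketch below is a minimal, self-contained version of the pattern; `call_model` is a hypothetical stand-in for the OpenAI `chat.completions.create` call (returning a dict with `content` and `tool_calls`), and `strip_think_blocks` shows the kind of `<think>`-block stripping applied to the final response:

```python
import re

# <think>...</think> reasoning blocks, plus any trailing whitespace
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> reasoning blocks so only the answer remains."""
    return THINK_RE.sub("", text).strip()

def run_with_iteration_cap(call_model, messages, max_iterations=60):
    """Tool-call loop that forces a tool-free summary turn once the cap is hit."""
    for _ in range(max_iterations):
        reply = call_model(messages, tools_enabled=True)
        if not reply.get("tool_calls"):
            # Model produced a final text answer on its own
            return strip_think_blocks(reply["content"])
        messages.append(reply)  # record the tool round and continue
    # Cap reached: inject a summary request and disable tools to force text
    messages.append({
        "role": "user",
        "content": (
            "You've reached the maximum number of tool-calling iterations. "
            "Summarize what you've found so far without calling any more tools."
        ),
    })
    reply = call_model(messages, tools_enabled=False)
    return strip_think_blocks(reply["content"])
```

Disabling tools on the final call is the key design choice: it guarantees a text response rather than yet another tool round, which is why the real implementation omits the `tools` parameter entirely.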