diff --git a/AGENTS.md b/AGENTS.md
index 4d7f733ad..f9a66ed51 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -285,6 +285,9 @@ When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
 - `šŸ’» \`ls -la\`...` (terminal commands show the actual command)
 - `šŸ” web_search...`
 - `šŸ“„ web_extract...`
+- `šŸ execute_code...` (programmatic tool calling sandbox)
+- `šŸ”€ delegate_task...` (subagent delegation)
+- `ā“ clarify...` (user question, CLI-only)
 
 Modes:
 - `new`: Only when switching to a different tool (less spam)
diff --git a/README.md b/README.md
index db163135c..cf260a01a 100644
--- a/README.md
+++ b/README.md
@@ -334,7 +334,7 @@ hermes --toolsets "web,terminal"
 hermes --list-tools
 ```
 
-**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, and more.
+**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, and more.
 
 ### šŸ–„ļø Terminal & Process Management
@@ -682,6 +682,62 @@ How to confirm it worked.
 └── .bundled_manifest  # Tracks which bundled skills have been offered
 ```
 
+### šŸ Code Execution (Programmatic Tool Calling)
+
+The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating with the parent via Unix domain socket RPC.
+
+```python
+# The agent can write scripts like:
+from hermes_tools import web_search, web_extract
+results = web_search("Python 3.13 features", limit=5)
+for r in results["data"]["web"]:
+    content = web_extract([r["url"]])
+    # ... filter and process ...
+print(summary)
+```
+
+**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search`, `patch`, `terminal` (foreground only).
+
+**When the agent uses this:** 3+ tool calls with processing logic between them, bulk data filtering, conditional branching, loops. The intermediate tool results never enter the context window -- only the final `print()` output comes back.
+
+Configure via `~/.hermes/config.yaml`:
+```yaml
+code_execution:
+  timeout: 300        # Max seconds per script (default: 300)
+  max_tool_calls: 50  # Max tool calls per execution (default: 50)
+```
+
+### šŸ”€ Subagents (Task Delegation)
+
+The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently -- only its final summary enters the parent's context.
+
+**Single task:**
+```
+delegate_task(goal="Debug why tests fail", context="Error: assertion in test_foo.py line 42", toolsets=["terminal", "file"])
+```
+
+**Parallel batch (up to 3 concurrent):**
+```
+delegate_task(tasks=[
+    {"goal": "Research topic A", "toolsets": ["web"]},
+    {"goal": "Research topic B", "toolsets": ["web"]},
+    {"goal": "Fix the build", "toolsets": ["terminal", "file"]}
+])
+```
+
+**Key properties:**
+- Each subagent gets its own terminal session (separate from the parent)
+- Depth limit of 2 (no grandchildren)
+- Subagents cannot call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
+- Interrupt propagation: interrupting the parent interrupts all active children
+
+Configure via `~/.hermes/config.yaml`:
+```yaml
+delegation:
+  max_iterations: 25  # Max turns per child (default: 25)
+  default_toolsets: ["terminal", "file", "web"]  # Default toolsets
+```
+
 ### šŸ¤– RL Training (Tinker + Atropos)
 
 > **āš ļø In Development** — RL training integration is not yet functional. The tools and environments below are under active development.
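As a reader-side illustration of the result shape documented in the subagent section above (a `results` array plus `total_duration_seconds`), here is a minimal hedged sketch of consuming the JSON that `delegate_task` returns; the payload values are hypothetical, only the field names come from the docs:

```python
import json

# Hypothetical payload matching the documented delegate_task result shape.
raw = json.dumps({
    "results": [
        {"task_index": 0, "status": "completed", "summary": "Fixed the build",
         "api_calls": 4, "duration_seconds": 12.5},
        {"task_index": 1, "status": "error", "summary": None,
         "error": "timeout", "api_calls": 0, "duration_seconds": 0.5},
    ],
    "total_duration_seconds": 13.0,
})

payload = json.loads(raw)
# Collect summaries of successful children and errors of failed ones.
summaries = [r["summary"] for r in payload["results"] if r["status"] == "completed"]
failures = [r["error"] for r in payload["results"] if r["status"] in ("error", "failed")]
print(summaries, failures)
```

Because results are sorted by `task_index`, entries line up with the order of the submitted tasks.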
diff --git a/TODO.md b/TODO.md
index 546f9d633..8244b333c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -40,7 +40,7 @@ These four systems form a continuum of agent intelligence. They should be though
 
 ## 1. Subagent Architecture (Context Isolation) šŸŽÆ
 
-**Status:** Not started
+**Status:** Implemented āœ…
 **Priority:** High -- this is foundational for scaling to complex tasks
 
 The main agent becomes an orchestrator that delegates context-heavy tasks to subagents with isolated context. Each subagent returns a summary, keeping the orchestrator's context clean.
@@ -598,11 +598,11 @@ This goes in the tool description:
 - **SQLite State Store & Session Search.** `~/.hermes/state.db` with sessions, messages, FTS5 search, `session_search` tool.
 - **Interactive Clarifying Questions.** `clarify` tool with arrow-key selection UI in CLI, configurable timeout, CLI-only.
 - **Programmatic Tool Calling.** `execute_code` tool -- sandbox child process with UDS RPC bridge to 7 tools (`web_search`, `web_extract`, `read_file`, `write_file`, `search`, `patch`, `terminal`). Configurable timeout and tool call limits via `config.yaml`.
+- **Subagent Architecture.** `delegate_task` tool -- spawn child AIAgents with isolated context and terminal sessions. Single-task and batch (up to 3 parallel) modes via ThreadPoolExecutor. Depth limit of 2, blocked tool enforcement, interrupt propagation.
 
 ### Tier 1: Next Up
 
-1. Subagent Architecture -- #1
-2. MCP Support -- #6
+1. MCP Support -- #6
 
 ### Tier 2: Quality of Life
diff --git a/cli-config.yaml.example b/cli-config.yaml.example
index 7fe334ed2..ed1493346 100644
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -385,6 +385,24 @@ stt:
 # No configuration needed - logging is always enabled.
 # To disable, you would need to modify the source code.
+# =============================================================================
+# Code Execution Sandbox (Programmatic Tool Calling)
+# =============================================================================
+# The execute_code tool runs Python scripts that call Hermes tools via RPC.
+# Intermediate tool results stay out of the LLM's context window.
+code_execution:
+  timeout: 300        # Max seconds per script before kill (default: 300 = 5 min)
+  max_tool_calls: 50  # Max RPC tool calls per execution (default: 50)
+
+# =============================================================================
+# Subagent Delegation
+# =============================================================================
+# The delegate_task tool spawns child agents with isolated context.
+# Supports single tasks and batch mode (up to 3 parallel).
+delegation:
+  max_iterations: 25  # Max tool-calling turns per child (default: 25)
+  default_toolsets: ["terminal", "file", "web"]  # Default toolsets for subagents
+
 # =============================================================================
 # Display
 # =============================================================================
diff --git a/cli.py b/cli.py
index 65daaad52..ca6bcc539 100755
--- a/cli.py
+++ b/cli.py
@@ -139,6 +139,10 @@ def load_cli_config() -> Dict[str, Any]:
         "timeout": 300,  # Max seconds a sandbox script can run before being killed (5 min)
         "max_tool_calls": 50,  # Max RPC tool calls per execution
     },
+    "delegation": {
+        "max_iterations": 25,  # Max tool-calling turns per child agent
+        "default_toolsets": ["terminal", "file", "web"],  # Default toolsets for subagents
+    },
 }
 
 # Track whether the config file explicitly set terminal config.
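The defaults registered in `load_cli_config` above are overridden per-key by the user's `config.yaml`. A minimal sketch of that merge, using a hypothetical `merge_delegation_config` helper (not the actual cli.py code):

```python
# Defaults mirroring the cli.py snippet above; the helper name is hypothetical.
DELEGATION_DEFAULTS = {
    "max_iterations": 25,
    "default_toolsets": ["terminal", "file", "web"],
}

def merge_delegation_config(user_cfg: dict) -> dict:
    # Start from defaults, then let user-supplied keys override them.
    merged = dict(DELEGATION_DEFAULTS)
    merged.update(user_cfg.get("delegation", {}) or {})
    return merged

# A user who only sets max_iterations keeps the default toolsets.
cfg = merge_delegation_config({"delegation": {"max_iterations": 50}})
print(cfg["max_iterations"], cfg["default_toolsets"])
```

This keeps partial user configs valid: unspecified keys fall back to the shipped defaults.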
diff --git a/docs/tools.md b/docs/tools.md index 7f6f616b9..9f986d5e8 100644 --- a/docs/tools.md +++ b/docs/tools.md @@ -52,6 +52,9 @@ async def web_search(query: str) -> dict: | **Session Search** | `session_search_tool.py` | `session_search` (search + summarize past conversations) | | **Cronjob** | `cronjob_tools.py` | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` | | **RL Training** | `rl_training_tool.py` | `rl_list_environments`, `rl_start_training`, `rl_check_status`, etc. | +| **Clarify** | `clarify_tool.py` | `clarify` (interactive multiple-choice / open-ended questions, CLI-only) | +| **Code Execution** | `code_execution_tool.py` | `execute_code` (run Python scripts that call tools via RPC sandbox) | +| **Delegation** | `delegate_tool.py` | `delegate_task` (spawn subagents with isolated context, single + parallel batch) | ## Tool Registration diff --git a/model_tools.py b/model_tools.py index 71016af52..787470a98 100644 --- a/model_tools.py +++ b/model_tools.py @@ -97,6 +97,8 @@ from tools.session_search_tool import session_search, check_session_search_requi from tools.clarify_tool import clarify_tool, check_clarify_requirements, CLARIFY_SCHEMA # Code execution sandbox (programmatic tool calling) from tools.code_execution_tool import execute_code, check_sandbox_requirements, EXECUTE_CODE_SCHEMA +# Subagent delegation +from tools.delegate_tool import delegate_task, check_delegate_requirements, DELEGATE_TASK_SCHEMA from toolsets import ( get_toolset, resolve_toolset, resolve_multiple_toolsets, get_all_toolsets, get_toolset_names, validate_toolset, @@ -221,6 +223,13 @@ TOOLSET_REQUIREMENTS = { "setup_url": None, "tools": ["execute_code"], }, + "delegation": { + "name": "Subagent Delegation", + "env_vars": [], # Uses existing AIAgent class, no external deps + "check_fn": check_delegate_requirements, + "setup_url": None, + "tools": ["delegate_task"], + }, } @@ -1023,6 +1032,13 @@ def get_execute_code_tool_definitions() -> List[Dict[str, Any]]: return 
[{"type": "function", "function": EXECUTE_CODE_SCHEMA}] +def get_delegate_tool_definitions() -> List[Dict[str, Any]]: + """ + Get tool definitions for the subagent delegation tool. + """ + return [{"type": "function", "function": DELEGATE_TASK_SCHEMA}] + + def get_send_message_tool_definitions(): """Tool definitions for cross-channel messaging.""" return [ @@ -1196,6 +1212,10 @@ def get_all_tool_names() -> List[str]: if check_sandbox_requirements(): tool_names.extend(["execute_code"]) + # Subagent delegation + if check_delegate_requirements(): + tool_names.extend(["delegate_task"]) + # Cross-channel messaging (always available on messaging platforms) tool_names.extend(["send_message"]) @@ -1262,6 +1282,8 @@ TOOL_TO_TOOLSET_MAP = { "clarify": "clarify_tools", # Code execution sandbox "execute_code": "code_execution_tools", + # Subagent delegation + "delegate_task": "delegation_tools", } @@ -1400,6 +1422,11 @@ def get_tool_definitions( for tool in get_execute_code_tool_definitions(): all_available_tools_map[tool["function"]["name"]] = tool + # Subagent delegation + if check_delegate_requirements(): + for tool in get_delegate_tool_definitions(): + all_available_tools_map[tool["function"]["name"]] = tool + # Cross-channel messaging (always available on messaging platforms) for tool in get_send_message_tool_definitions(): all_available_tools_map[tool["function"]["name"]] = tool @@ -2313,6 +2340,10 @@ def handle_function_call( elif function_name == "session_search": return json.dumps({"error": "Session search is not available. The session database may not be initialized."}) + # Delegate task -- handled by the agent loop (needs parent AIAgent instance). 
+ elif function_name == "delegate_task": + return json.dumps({"error": "delegate_task must be handled by the agent loop"}) + else: error_msg = f"Unknown function: {function_name}" print(f"āŒ {error_msg}") @@ -2426,6 +2457,12 @@ def get_available_toolsets() -> Dict[str, Dict[str, Any]]: "tools": ["execute_code"], "description": "Code execution sandbox: run Python scripts that call tools programmatically", "requirements": ["Linux or macOS (Unix domain sockets)"] + }, + "delegation_tools": { + "available": check_delegate_requirements(), + "tools": ["delegate_task"], + "description": "Subagent delegation: spawn child agents with isolated context for complex subtasks", + "requirements": [] } } @@ -2450,6 +2487,7 @@ def check_toolset_requirements() -> Dict[str, bool]: "file_tools": check_file_requirements(), "tts_tools": check_tts_requirements(), "code_execution_tools": check_sandbox_requirements(), + "delegation_tools": check_delegate_requirements(), } if __name__ == "__main__": diff --git a/run_agent.py b/run_agent.py index 0f01f9e5e..5b0c40e4d 100644 --- a/run_agent.py +++ b/run_agent.py @@ -1142,6 +1142,10 @@ class AIAgent: self._interrupt_requested = False self._interrupt_message = None # Optional message that triggered interrupt + # Subagent delegation state + self._delegate_depth = 0 # 0 = top-level agent, incremented for children + self._active_children = [] # Running child AIAgents (for interrupt propagation) + # Store OpenRouter provider preferences self.providers_allowed = providers_allowed self.providers_ignored = providers_ignored @@ -1600,6 +1604,14 @@ class AIAgent: first_line = code.strip().split("\n")[0] if code.strip() else "" return f"ā”Š šŸ exec {_trunc(first_line, 35)} {dur}" + # ── Subagent Delegation ── + if tool_name == "delegate_task": + tasks = args.get("tasks") + if tasks and isinstance(tasks, list): + return f"ā”Š šŸ”€ delegate {len(tasks)} parallel tasks {dur}" + goal = _trunc(args.get("goal", ""), 35) + return f"ā”Š šŸ”€ delegate {goal} 
{dur}" + # ── Fallback ── preview = _build_tool_preview(tool_name, args) or "" return f"ā”Š ⚔ {tool_name[:9]:9} {_trunc(preview, 35)} {dur}" @@ -2091,6 +2103,12 @@ class AIAgent: self._interrupt_message = message # Signal the terminal tool to kill any running subprocess immediately _set_terminal_interrupt(True) + # Propagate interrupt to any running child agents (subagent delegation) + for child in self._active_children: + try: + child.interrupt(message) + except Exception: + pass if not self.quiet_mode: print(f"\n⚔ Interrupt requested" + (f": '{message[:40]}...'" if message and len(message) > 40 else f": '{message}'" if message else "")) @@ -2957,6 +2975,21 @@ class AIAgent: tool_duration = time.time() - tool_start_time if self.quiet_mode: print(f" {self._get_cute_tool_message('clarify', function_args, tool_duration)}") + # Delegate task -- spawn child agent(s) with isolated context + elif function_name == "delegate_task": + from tools.delegate_tool import delegate_task as _delegate_task + function_result = _delegate_task( + goal=function_args.get("goal"), + context=function_args.get("context"), + toolsets=function_args.get("toolsets"), + tasks=function_args.get("tasks"), + model=function_args.get("model"), + max_iterations=function_args.get("max_iterations"), + parent_agent=self, + ) + tool_duration = time.time() - tool_start_time + if self.quiet_mode: + print(f" {self._get_cute_tool_message('delegate_task', function_args, tool_duration)}") # Execute other tools - with animated kawaii spinner in quiet mode # The face is "alive" while the tool works, then vanishes # and is replaced by the clean result line. 
@@ -2976,7 +3009,7 @@ class AIAgent: 'skills_list': 'šŸ“š', 'skill_view': 'šŸ“š', 'schedule_cronjob': 'ā°', 'list_cronjobs': 'ā°', 'remove_cronjob': 'ā°', 'send_message': 'šŸ“Ø', 'todo': 'šŸ“‹', 'memory': '🧠', 'session_search': 'šŸ”', - 'clarify': 'ā“', 'execute_code': 'šŸ', + 'clarify': 'ā“', 'execute_code': 'šŸ', 'delegate_task': 'šŸ”€', } emoji = tool_emoji_map.get(function_name, '⚔') preview = _build_tool_preview(function_name, function_args) or function_name diff --git a/tests/test_delegate.py b/tests/test_delegate.py new file mode 100644 index 000000000..fea5cc9f6 --- /dev/null +++ b/tests/test_delegate.py @@ -0,0 +1,237 @@ +#!/usr/bin/env python3 +""" +Tests for the subagent delegation tool. + +Uses mock AIAgent instances to test the delegation logic without +requiring API keys or real LLM calls. + +Run with: python -m pytest tests/test_delegate.py -v + or: python tests/test_delegate.py +""" + +import json +import os +import sys +import time +import unittest +from unittest.mock import MagicMock, patch + +sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) + +from tools.delegate_tool import ( + DELEGATE_BLOCKED_TOOLS, + DELEGATE_TASK_SCHEMA, + MAX_CONCURRENT_CHILDREN, + MAX_DEPTH, + check_delegate_requirements, + delegate_task, + _build_child_system_prompt, + _strip_blocked_tools, +) + + +def _make_mock_parent(depth=0): + """Create a mock parent agent with the fields delegate_task expects.""" + parent = MagicMock() + parent.base_url = "https://openrouter.ai/api/v1" + parent.model = "anthropic/claude-sonnet-4" + parent.platform = "cli" + parent.providers_allowed = None + parent.providers_ignored = None + parent.providers_order = None + parent.provider_sort = None + parent._session_db = None + parent._delegate_depth = depth + parent._active_children = [] + return parent + + +class TestDelegateRequirements(unittest.TestCase): + def test_always_available(self): + self.assertTrue(check_delegate_requirements()) + + def 
test_schema_valid(self): + self.assertEqual(DELEGATE_TASK_SCHEMA["name"], "delegate_task") + props = DELEGATE_TASK_SCHEMA["parameters"]["properties"] + self.assertIn("goal", props) + self.assertIn("tasks", props) + self.assertIn("context", props) + self.assertIn("toolsets", props) + self.assertIn("model", props) + self.assertIn("max_iterations", props) + self.assertEqual(props["tasks"]["maxItems"], 3) + + +class TestChildSystemPrompt(unittest.TestCase): + def test_goal_only(self): + prompt = _build_child_system_prompt("Fix the tests") + self.assertIn("Fix the tests", prompt) + self.assertIn("YOUR TASK", prompt) + self.assertNotIn("CONTEXT", prompt) + + def test_goal_with_context(self): + prompt = _build_child_system_prompt("Fix the tests", "Error: assertion failed in test_foo.py line 42") + self.assertIn("Fix the tests", prompt) + self.assertIn("CONTEXT", prompt) + self.assertIn("assertion failed", prompt) + + def test_empty_context_ignored(self): + prompt = _build_child_system_prompt("Do something", " ") + self.assertNotIn("CONTEXT", prompt) + + +class TestStripBlockedTools(unittest.TestCase): + def test_removes_blocked_toolsets(self): + result = _strip_blocked_tools(["terminal", "file", "delegation", "clarify", "memory", "code_execution"]) + self.assertEqual(sorted(result), ["file", "terminal"]) + + def test_preserves_allowed_toolsets(self): + result = _strip_blocked_tools(["terminal", "file", "web", "browser"]) + self.assertEqual(sorted(result), ["browser", "file", "terminal", "web"]) + + def test_empty_input(self): + result = _strip_blocked_tools([]) + self.assertEqual(result, []) + + +class TestDelegateTask(unittest.TestCase): + def test_no_parent_agent(self): + result = json.loads(delegate_task(goal="test")) + self.assertIn("error", result) + self.assertIn("parent agent", result["error"]) + + def test_depth_limit(self): + parent = _make_mock_parent(depth=2) + result = json.loads(delegate_task(goal="test", parent_agent=parent)) + self.assertIn("error", result) 
+ self.assertIn("depth limit", result["error"].lower()) + + def test_no_goal_or_tasks(self): + parent = _make_mock_parent() + result = json.loads(delegate_task(parent_agent=parent)) + self.assertIn("error", result) + + def test_empty_goal(self): + parent = _make_mock_parent() + result = json.loads(delegate_task(goal=" ", parent_agent=parent)) + self.assertIn("error", result) + + def test_task_missing_goal(self): + parent = _make_mock_parent() + result = json.loads(delegate_task(tasks=[{"context": "no goal here"}], parent_agent=parent)) + self.assertIn("error", result) + + @patch("tools.delegate_tool._run_single_child") + def test_single_task_mode(self, mock_run): + mock_run.return_value = { + "task_index": 0, "status": "completed", + "summary": "Done!", "api_calls": 3, "duration_seconds": 5.0 + } + parent = _make_mock_parent() + result = json.loads(delegate_task(goal="Fix tests", context="error log...", parent_agent=parent)) + self.assertIn("results", result) + self.assertEqual(len(result["results"]), 1) + self.assertEqual(result["results"][0]["status"], "completed") + self.assertEqual(result["results"][0]["summary"], "Done!") + mock_run.assert_called_once() + + @patch("tools.delegate_tool._run_single_child") + def test_batch_mode(self, mock_run): + mock_run.side_effect = [ + {"task_index": 0, "status": "completed", "summary": "Result A", "api_calls": 2, "duration_seconds": 3.0}, + {"task_index": 1, "status": "completed", "summary": "Result B", "api_calls": 4, "duration_seconds": 6.0}, + ] + parent = _make_mock_parent() + tasks = [ + {"goal": "Research topic A"}, + {"goal": "Research topic B"}, + ] + result = json.loads(delegate_task(tasks=tasks, parent_agent=parent)) + self.assertIn("results", result) + self.assertEqual(len(result["results"]), 2) + self.assertEqual(result["results"][0]["summary"], "Result A") + self.assertEqual(result["results"][1]["summary"], "Result B") + self.assertIn("total_duration_seconds", result) + + 
@patch("tools.delegate_tool._run_single_child") + def test_batch_capped_at_3(self, mock_run): + mock_run.return_value = { + "task_index": 0, "status": "completed", + "summary": "Done", "api_calls": 1, "duration_seconds": 1.0 + } + parent = _make_mock_parent() + tasks = [{"goal": f"Task {i}"} for i in range(5)] + result = json.loads(delegate_task(tasks=tasks, parent_agent=parent)) + # Should only run 3 tasks (MAX_CONCURRENT_CHILDREN) + self.assertEqual(mock_run.call_count, 3) + + @patch("tools.delegate_tool._run_single_child") + def test_batch_ignores_toplevel_goal(self, mock_run): + """When tasks array is provided, top-level goal/context/toolsets are ignored.""" + mock_run.return_value = { + "task_index": 0, "status": "completed", + "summary": "Done", "api_calls": 1, "duration_seconds": 1.0 + } + parent = _make_mock_parent() + result = json.loads(delegate_task( + goal="This should be ignored", + tasks=[{"goal": "Actual task"}], + parent_agent=parent, + )) + # _run_single_child is always invoked with keyword arguments, so the + # delegated goal is in call_args.kwargs, not positional args. + self.assertEqual(mock_run.call_args.kwargs.get("goal"), "Actual task") + + @patch("tools.delegate_tool._run_single_child") + def test_failed_child_included_in_results(self, mock_run): + mock_run.return_value = { + "task_index": 0, "status": "error", + "summary": None, "error": "Something broke", + "api_calls": 0, "duration_seconds": 0.5 + } + parent = _make_mock_parent() + result = json.loads(delegate_task(goal="Break things", parent_agent=parent)) + self.assertEqual(result["results"][0]["status"], "error") + self.assertIn("Something broke", result["results"][0]["error"]) + + def test_depth_increments(self): + """Verify child gets parent's depth + 1.""" + parent = _make_mock_parent(depth=0) + + with patch("run_agent.AIAgent") as MockAgent: + mock_child = MagicMock() + mock_child.run_conversation.return_value = { + "final_response": 
"done", "completed": True, "api_calls": 1 + } + MockAgent.return_value = mock_child + + delegate_task(goal="Test depth", parent_agent=parent) + self.assertEqual(mock_child._delegate_depth, 1) + + def test_active_children_tracking(self): + """Verify children are registered/unregistered for interrupt propagation.""" + parent = _make_mock_parent(depth=0) + + with patch("run_agent.AIAgent") as MockAgent: + mock_child = MagicMock() + mock_child.run_conversation.return_value = { + "final_response": "done", "completed": True, "api_calls": 1 + } + MockAgent.return_value = mock_child + + delegate_task(goal="Test tracking", parent_agent=parent) + self.assertEqual(len(parent._active_children), 0) + + +class TestBlockedTools(unittest.TestCase): + def test_blocked_tools_constant(self): + for tool in ["delegate_task", "clarify", "memory", "send_message", "execute_code"]: + self.assertIn(tool, DELEGATE_BLOCKED_TOOLS) + + def test_constants(self): + self.assertEqual(MAX_CONCURRENT_CHILDREN, 3) + self.assertEqual(MAX_DEPTH, 2) + + +if __name__ == "__main__": + unittest.main() diff --git a/tools/__init__.py b/tools/__init__.py index 5704de530..ea1bd12bb 100644 --- a/tools/__init__.py +++ b/tools/__init__.py @@ -156,6 +156,13 @@ from .code_execution_tool import ( EXECUTE_CODE_SCHEMA, ) +# Subagent delegation (spawn child agents with isolated context) +from .delegate_tool import ( + delegate_task, + check_delegate_requirements, + DELEGATE_TASK_SCHEMA, +) + # File tools have no external requirements - they use the terminal backend def check_file_requirements(): """File tools only require terminal backend to be available.""" @@ -261,5 +268,9 @@ __all__ = [ 'execute_code', 'check_sandbox_requirements', 'EXECUTE_CODE_SCHEMA', + # Subagent delegation + 'delegate_task', + 'check_delegate_requirements', + 'DELEGATE_TASK_SCHEMA', ] diff --git a/tools/delegate_tool.py b/tools/delegate_tool.py new file mode 100644 index 000000000..ffd7f17d4 --- /dev/null +++ b/tools/delegate_tool.py @@ -0,0 
+1,398 @@ +#!/usr/bin/env python3 +""" +Delegate Tool -- Subagent Architecture + +Spawns child AIAgent instances with isolated context, restricted toolsets, +and their own terminal sessions. Supports single-task and batch (parallel) +modes. The parent blocks until all children complete. + +Each child gets: + - A fresh conversation (no parent history) + - Its own task_id (own terminal session, file ops cache) + - A restricted toolset (configurable, with blocked tools always stripped) + - A focused system prompt built from the delegated goal + context + +The parent's context only sees the delegation call and the summary result, +never the child's intermediate tool calls or reasoning. +""" + +import contextlib +import io +import json +import logging +import os +import time +from concurrent.futures import ThreadPoolExecutor, as_completed +from typing import Any, Dict, List, Optional + + +# Tools that children must never have access to +DELEGATE_BLOCKED_TOOLS = frozenset([ + "delegate_task", # no recursive delegation + "clarify", # no user interaction + "memory", # no writes to shared MEMORY.md + "send_message", # no cross-platform side effects + "execute_code", # children should reason step-by-step, not write scripts +]) + +MAX_CONCURRENT_CHILDREN = 3 +MAX_DEPTH = 2 # parent (0) -> child (1) -> grandchild rejected (2) +DEFAULT_MAX_ITERATIONS = 25 +DEFAULT_TOOLSETS = ["terminal", "file", "web"] + + +def check_delegate_requirements() -> bool: + """Delegation has no external requirements -- always available.""" + return True + + +def _build_child_system_prompt(goal: str, context: Optional[str] = None) -> str: + """Build a focused system prompt for a child agent.""" + parts = [ + "You are a focused subagent working on a specific delegated task.", + "", + f"YOUR TASK:\n{goal}", + ] + if context and context.strip(): + parts.append(f"\nCONTEXT:\n{context}") + parts.append( + "\nComplete this task using the tools available to you. 
" + "When finished, provide a clear, concise summary of:\n" + "- What you did\n" + "- What you found or accomplished\n" + "- Any files you created or modified\n" + "- Any issues encountered\n\n" + "Be thorough but concise -- your response is returned to the " + "parent agent as a summary." + ) + return "\n".join(parts) + + +def _strip_blocked_tools(toolsets: List[str]) -> List[str]: + """Remove toolsets that contain only blocked tools.""" + blocked_toolset_names = { + "delegation", "clarify", "memory", "code_execution", + } + return [t for t in toolsets if t not in blocked_toolset_names] + + +def _run_single_child( + task_index: int, + goal: str, + context: Optional[str], + toolsets: Optional[List[str]], + model: Optional[str], + max_iterations: int, + parent_agent, +) -> Dict[str, Any]: + """ + Spawn and run a single child agent. Called from within a thread. + Returns a structured result dict. + """ + from run_agent import AIAgent + + child_start = time.monotonic() + + child_toolsets = _strip_blocked_tools(toolsets or DEFAULT_TOOLSETS) + + child_prompt = _build_child_system_prompt(goal, context) + + try: + child = AIAgent( + base_url=parent_agent.base_url, + model=model or parent_agent.model, + max_iterations=max_iterations, + enabled_toolsets=child_toolsets, + quiet_mode=True, + ephemeral_system_prompt=child_prompt, + log_prefix=f"[subagent-{task_index}]", + platform=parent_agent.platform, + skip_context_files=True, + skip_memory=True, + clarify_callback=None, + session_db=getattr(parent_agent, '_session_db', None), + providers_allowed=parent_agent.providers_allowed, + providers_ignored=parent_agent.providers_ignored, + providers_order=parent_agent.providers_order, + provider_sort=parent_agent.provider_sort, + ) + + # Set delegation depth so children can't spawn grandchildren + child._delegate_depth = getattr(parent_agent, '_delegate_depth', 0) + 1 + + # Register child for interrupt propagation + if hasattr(parent_agent, '_active_children'): + 
parent_agent._active_children.append(child) + + # Run with stdout/stderr suppressed to prevent interleaved output + devnull = io.StringIO() + with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull): + result = child.run_conversation(user_message=goal) + + duration = round(time.monotonic() - child_start, 2) + + summary = result.get("final_response") or "" + completed = result.get("completed", False) + interrupted = result.get("interrupted", False) + api_calls = result.get("api_calls", 0) + + if interrupted: + status = "interrupted" + elif completed and summary: + status = "completed" + else: + status = "failed" + + entry: Dict[str, Any] = { + "task_index": task_index, + "status": status, + "summary": summary, + "api_calls": api_calls, + "duration_seconds": duration, + } + if status == "failed": + entry["error"] = result.get("error", "Subagent did not produce a response.") + + return entry + + except Exception as exc: + duration = round(time.monotonic() - child_start, 2) + logging.exception(f"[subagent-{task_index}] failed") + return { + "task_index": task_index, + "status": "error", + "summary": None, + "error": str(exc), + "api_calls": 0, + "duration_seconds": duration, + } + + finally: + # Unregister child from interrupt propagation + if hasattr(parent_agent, '_active_children'): + try: + parent_agent._active_children.remove(child) + except (ValueError, UnboundLocalError): + pass + + +def delegate_task( + goal: Optional[str] = None, + context: Optional[str] = None, + toolsets: Optional[List[str]] = None, + tasks: Optional[List[Dict[str, Any]]] = None, + model: Optional[str] = None, + max_iterations: Optional[int] = None, + parent_agent=None, +) -> str: + """ + Spawn one or more child agents to handle delegated tasks. + + Supports two modes: + - Single: provide goal (+ optional context, toolsets) + - Batch: provide tasks array [{goal, context, toolsets}, ...] + + Returns JSON with results array, one entry per task. 
+ """ + if parent_agent is None: + return json.dumps({"error": "delegate_task requires a parent agent context."}) + + # Depth limit + depth = getattr(parent_agent, '_delegate_depth', 0) + if depth >= MAX_DEPTH: + return json.dumps({ + "error": ( + f"Delegation depth limit reached ({MAX_DEPTH}). " + "Subagents cannot spawn further subagents." + ) + }) + + # Load config + cfg = _load_config() + default_max_iter = cfg.get("max_iterations", DEFAULT_MAX_ITERATIONS) + effective_max_iter = max_iterations or default_max_iter + + # Normalize to task list + if tasks and isinstance(tasks, list): + task_list = tasks[:MAX_CONCURRENT_CHILDREN] + elif goal and isinstance(goal, str) and goal.strip(): + task_list = [{"goal": goal, "context": context, "toolsets": toolsets}] + else: + return json.dumps({"error": "Provide either 'goal' (single task) or 'tasks' (batch)."}) + + if not task_list: + return json.dumps({"error": "No tasks provided."}) + + # Validate each task has a goal (guard against an explicit null goal too) + for i, task in enumerate(task_list): + if not (task.get("goal") or "").strip(): + return json.dumps({"error": f"Task {i} is missing a 'goal'."}) + + overall_start = time.monotonic() + results = [] + + if len(task_list) == 1: + # Single task -- run directly (no thread pool overhead) + t = task_list[0] + result = _run_single_child( + task_index=0, + goal=t["goal"], + context=t.get("context"), + toolsets=t.get("toolsets") or toolsets, + model=model, + max_iterations=effective_max_iter, + parent_agent=parent_agent, + ) + results.append(result) + else: + # Batch -- run in parallel + with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_CHILDREN) as executor: + futures = {} + for i, t in enumerate(task_list): + future = executor.submit( + _run_single_child, + task_index=i, + goal=t["goal"], + context=t.get("context"), + toolsets=t.get("toolsets") or toolsets, + model=model, + max_iterations=effective_max_iter, + parent_agent=parent_agent, + ) + futures[future] = i + + for future in as_completed(futures): + try: + 
+                    results.append(future.result())
+                except Exception as exc:
+                    idx = futures[future]
+                    results.append({
+                        "task_index": idx,
+                        "status": "error",
+                        "summary": None,
+                        "error": str(exc),
+                        "api_calls": 0,
+                        "duration_seconds": 0,
+                    })
+
+    # Sort by task_index so results match input order
+    results.sort(key=lambda r: r["task_index"])
+
+    total_duration = round(time.monotonic() - overall_start, 2)
+
+    return json.dumps({
+        "results": results,
+        "total_duration_seconds": total_duration,
+    }, ensure_ascii=False)
+
+
+def _load_config() -> dict:
+    """Load delegation config from CLI_CONFIG if available."""
+    try:
+        from cli import CLI_CONFIG
+        return CLI_CONFIG.get("delegation", {})
+    except Exception:
+        return {}
+
+
+# ---------------------------------------------------------------------------
+# OpenAI Function-Calling Schema
+# ---------------------------------------------------------------------------
+
+DELEGATE_TASK_SCHEMA = {
+    "name": "delegate_task",
+    "description": (
+        "Spawn one or more subagents to work on tasks in isolated contexts. "
+        "Each subagent gets its own conversation, terminal session, and toolset. "
+        "Only the final summary is returned -- intermediate tool results "
+        "never enter your context window.\n\n"
+        "TWO MODES:\n"
+        "1. Single task: provide 'goal' (+ optional context, toolsets)\n"
+        "2. Batch (parallel): provide 'tasks' array with up to 3 items. "
+        "All run concurrently and results are returned together.\n\n"
+        "WHEN TO USE delegate_task:\n"
+        "- Reasoning-heavy subtasks (debugging, code review, research synthesis)\n"
+        "- Tasks that would flood your context with intermediate data\n"
+        "- Parallel independent workstreams (research A and B simultaneously)\n\n"
+        "WHEN NOT TO USE (use these instead):\n"
+        "- Mechanical multi-step work with no reasoning needed -> use execute_code\n"
+        "- Single tool call -> just call the tool directly\n"
+        "- Tasks needing user interaction -> subagents cannot use clarify\n\n"
+        "IMPORTANT:\n"
+        "- Subagents have NO memory of your conversation. Pass all relevant "
+        "info (file paths, error messages, constraints) via the 'context' field.\n"
+        "- Subagents CANNOT call: delegate_task, clarify, memory, send_message, "
+        "execute_code.\n"
+        "- Each subagent gets its own terminal session (separate working directory and state).\n"
+        "- Results are always returned as an array, one entry per task."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "goal": {
+                "type": "string",
+                "description": (
+                    "What the subagent should accomplish. Be specific and "
+                    "self-contained -- the subagent knows nothing about your "
+                    "conversation history."
+                ),
+            },
+            "context": {
+                "type": "string",
+                "description": (
+                    "Background information the subagent needs: file paths, "
+                    "error messages, project structure, constraints. The more "
+                    "specific you are, the better the subagent performs."
+                ),
+            },
+            "toolsets": {
+                "type": "array",
+                "items": {"type": "string"},
+                "description": (
+                    "Toolsets to enable for this subagent. "
+                    "Default: ['terminal', 'file', 'web']. "
+                    "Common patterns: ['terminal', 'file'] for code work, "
+                    "['web'] for research, ['terminal', 'file', 'web'] for "
+                    "full-stack tasks."
+                ),
+            },
+            "tasks": {
+                "type": "array",
+                "items": {
+                    "type": "object",
+                    "properties": {
+                        "goal": {"type": "string", "description": "Task goal"},
+                        "context": {"type": "string", "description": "Task-specific context"},
+                        "toolsets": {
+                            "type": "array",
+                            "items": {"type": "string"},
+                            "description": "Toolsets for this specific task",
+                        },
+                    },
+                    "required": ["goal"],
+                },
+                "maxItems": 3,
+                "description": (
+                    "Batch mode: up to 3 tasks to run in parallel. Each gets "
+                    "its own subagent with isolated context and terminal session. "
+                    "When provided, top-level goal/context/toolsets are ignored."
+                ),
+            },
+            "model": {
+                "type": "string",
+                "description": (
+                    "Model override for the subagent(s). Omit to use your "
+                    "same model. Use a cheaper/faster model for simple subtasks."
+                ),
+            },
+            "max_iterations": {
+                "type": "integer",
+                "description": (
+                    "Max tool-calling turns per subagent (default: 25). "
+                    "Lower for simple tasks, higher for complex ones."
+                ),
+            },
+        },
+        "required": [],
+    },
+}
diff --git a/toolsets.py b/toolsets.py
index 838f3a8c3..2a2daa7e9 100644
--- a/toolsets.py
+++ b/toolsets.py
@@ -144,6 +144,12 @@ TOOLSETS = {
         "includes": []
     },
 
+    "delegation": {
+        "description": "Spawn subagents with isolated context for complex subtasks",
+        "tools": ["delegate_task"],
+        "includes": []
+    },
+
     # Scenario-specific toolsets
 
@@ -197,6 +203,8 @@ TOOLSETS = {
         "clarify",
         # Code execution sandbox (programmatic tool calling)
         "execute_code",
+        # Subagent delegation
+        "delegate_task",
         # Cronjob management (CLI-only)
         "schedule_cronjob", "list_cronjobs", "remove_cronjob"
     ],
@@ -237,6 +245,8 @@ TOOLSETS = {
         "session_search",
         # Code execution sandbox (programmatic tool calling)
         "execute_code",
+        # Subagent delegation
+        "delegate_task",
         # Cronjob management - let users schedule tasks
         "schedule_cronjob", "list_cronjobs", "remove_cronjob",
         # Cross-channel messaging
@@ -275,6 +285,8 @@ TOOLSETS = {
         "session_search",
         # Code execution sandbox (programmatic tool calling)
         "execute_code",
+        # Subagent delegation
+        "delegate_task",
         # Cronjob management - let users schedule tasks
         "schedule_cronjob", "list_cronjobs", "remove_cronjob",
         # Cross-channel messaging
@@ -313,6 +325,8 @@ TOOLSETS = {
         "session_search",
         # Code execution sandbox (programmatic tool calling)
         "execute_code",
+        # Subagent delegation
+        "delegate_task",
         # Cronjob management
         "schedule_cronjob", "list_cronjobs", "remove_cronjob",
         # Cross-channel messaging
@@ -351,6 +365,8 @@ TOOLSETS = {
         "session_search",
         # Code execution sandbox (programmatic tool calling)
         "execute_code",
+        # Subagent delegation
+        "delegate_task",
         # Cronjob management - let users schedule tasks
         "schedule_cronjob", "list_cronjobs", "remove_cronjob",
         # Cross-channel messaging
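The batch path in `delegate_task` fans work out with a thread pool and then restores input order. A minimal standalone sketch of that pattern follows; `_work` is a hypothetical stand-in for `_run_single_child`, not the shipped implementation:

```python
# Sketch of the fan-out pattern used in delegate_task's batch mode:
# submit up to MAX_CONCURRENT_CHILDREN tasks, collect results as they
# complete, then sort by task_index so output matches input order.
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_CHILDREN = 3  # mirrors the module constant

def _work(task_index: int, goal: str) -> dict:
    # Stand-in for _run_single_child; a real child runs a full agent loop.
    return {"task_index": task_index, "status": "completed", "summary": goal.upper()}

goals = ["alpha", "beta", "gamma"]
results = []
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_CHILDREN) as executor:
    futures = {executor.submit(_work, i, g): i for i, g in enumerate(goals)}
    for future in as_completed(futures):
        try:
            results.append(future.result())
        except Exception as exc:  # a crashed child still yields an entry
            results.append({"task_index": futures[future],
                            "status": "error", "error": str(exc)})

results.sort(key=lambda r: r["task_index"])
print([r["summary"] for r in results])  # ['ALPHA', 'BETA', 'GAMMA']
```

Sorting after `as_completed` is what lets children finish in any order while the caller still receives one entry per task, in input order.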
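`delegate_task` returns a JSON string whose envelope (a `results` array plus `total_duration_seconds`) matches the `json.dumps` call above. A hedged sketch of how a caller might consume it; the sample payload values are invented for illustration, though the entry fields mirror those built in `_run_single_child`:

```python
# Parse the delegate_task result envelope and report per-task outcomes.
import json

# Invented sample payload; field names follow the module's entry structure.
payload = json.dumps({
    "results": [
        {"task_index": 0, "status": "completed", "summary": "Tests pass.",
         "api_calls": 4, "duration_seconds": 12.3},
        {"task_index": 1, "status": "failed", "summary": "",
         "api_calls": 2, "duration_seconds": 5.1,
         "error": "Subagent did not produce a response."},
    ],
    "total_duration_seconds": 12.4,
})

data = json.loads(payload)
for entry in data["results"]:  # already sorted by task_index
    if entry["status"] == "completed":
        print(f"[{entry['task_index']}] ok in {entry['duration_seconds']}s")
    else:
        print(f"[{entry['task_index']}] {entry['status']}: {entry.get('error', 'n/a')}")
```

Note that failed and errored entries carry an `error` field while completed ones do not, so callers should use `.get("error")` rather than direct indexing.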