Moved "architecture" dir to "docs" for clarity

2026-01-30 07:54:51 +00:00
parent b292192467
commit 4b68d30b0e
4 changed files with 0 additions and 0 deletions
--- a/docs/agents.md
+++ b/docs/agents.md
@@ -0,0 +1,104 @@
+# Agents
+
+The agent is the core loop that orchestrates LLM calls and tool execution.
+
+## AIAgent Class
+
+The main agent is implemented in `run_agent.py`:
+
+```python
+class AIAgent:
+    def __init__(
+        self,
+        model: str = "anthropic/claude-sonnet-4",
+        api_key: str = None,
+        base_url: str = "https://openrouter.ai/api/v1",
+        max_turns: int = 20,
+        enabled_toolsets: list = None,
+        disabled_toolsets: list = None,
+        verbose_logging: bool = False,
+    ):
+        # Initialize OpenAI client, load tools based on toolsets
+        ...
+    
+    def chat(self, user_message: str, task_id: str = None) -> str:
+        # Main entry point - runs the agent loop
+        ...
+```
+
+## Agent Loop
+
+The core loop in `_run_agent_loop()`:
+
+```
+1. Add user message to conversation
+2. Call LLM with tools
+3. If LLM returns tool calls:
+   - Execute each tool
+   - Add tool results to conversation
+   - Go to step 2
+4. If LLM returns text response:
+   - Return response to user
+```
+
+```python
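+while turns < max_turns:
+    response = client.chat.completions.create(
+        model=model,
+        messages=messages,
+        tools=tool_schemas,
+    )
+    # The SDK returns the assistant message under choices[0]
+    message = response.choices[0].message
+
+    if message.tool_calls:
+        # Record the assistant turn so every tool result has a matching call
+        messages.append(message)
+        for tool_call in message.tool_calls:
+            result = await execute_tool(tool_call)
+            messages.append(tool_result_message(result))
+        turns += 1
+    else:
+        return message.content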
+```
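+
+The control flow above can be exercised without network access. The following is a minimal sketch, assuming a hypothetical `fake_llm` stand-in for the chat completions call and a one-entry `TOOLS` table; neither name comes from `run_agent.py`, and returning `None` when the turn budget runs out is also an assumption.
+
+```python
+import json
+
+# Hypothetical tool table; the real handlers live in tools/.
+TOOLS = {"add": lambda a, b: a + b}
+
+
+def fake_llm(messages):
+    """Stand-in for the LLM call: request one tool call, then answer."""
+    if not any(m["role"] == "tool" for m in messages):
+        call = {
+            "id": "call_1",
+            "type": "function",
+            "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
+        }
+        return {"role": "assistant", "content": None, "tool_calls": [call]}
+    return {"role": "assistant", "content": "The sum is " + messages[-1]["content"]}
+
+
+def run_loop(user_message, max_turns=20):
+    messages = [{"role": "user", "content": user_message}]
+    for _ in range(max_turns):
+        reply = fake_llm(messages)
+        messages.append(reply)
+        if not reply.get("tool_calls"):
+            return reply["content"], messages
+        # Execute each requested tool and feed the result back to the model
+        for call in reply["tool_calls"]:
+            args = json.loads(call["function"]["arguments"])
+            result = TOOLS[call["function"]["name"]](**args)
+            messages.append(
+                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
+            )
+    return None, messages  # turn budget exhausted
+```
+
+Calling `run_loop("What is 2 + 3?")` walks through one tool round trip before the text answer is returned.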
+
+## Conversation Management
+
+Messages are stored as a list of dicts following the OpenAI chat format:
+
+```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant..."},
+    {"role": "user", "content": "Search for Python tutorials"},
+    {"role": "assistant", "content": None, "tool_calls": [...]},
+    {"role": "tool", "tool_call_id": "...", "content": "..."},
+    {"role": "assistant", "content": "Here's what I found..."},
+]
+```
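+
+Each `tool` message has to point back at a tool call issued by an earlier assistant message. A small consistency check can make that invariant explicit; this is an illustrative sketch, and `unmatched_tool_results` is a hypothetical helper rather than part of the codebase.
+
+```python
+def unmatched_tool_results(messages):
+    """Return tool_call_ids of tool messages with no preceding assistant call."""
+    seen, unmatched = set(), []
+    for msg in messages:
+        if msg["role"] == "assistant":
+            # content may be None, and tool_calls may be absent or None
+            for call in msg.get("tool_calls") or []:
+                seen.add(call["id"])
+        elif msg["role"] == "tool" and msg["tool_call_id"] not in seen:
+            unmatched.append(msg["tool_call_id"])
+    return unmatched
+```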
+
+## Reasoning Context
+
+For models that support reasoning (chain-of-thought), the agent:
+1. Extracts `reasoning_content` from API responses
+2. Stores it in `assistant_msg["reasoning"]` for trajectory export
+3. Passes it back via `reasoning_content` field on subsequent turns
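+
+The three steps can be sketched as a pair of conversions. This is a minimal illustration under stated assumptions: `to_assistant_msg` and `to_api_msg` are hypothetical names, and the response object is assumed to expose `content` and, optionally, `reasoning_content` as attributes.
+
+```python
+def to_assistant_msg(api_message):
+    """Steps 1 and 2: extract reasoning_content and store it under "reasoning"."""
+    msg = {"role": "assistant", "content": api_message.content}
+    reasoning = getattr(api_message, "reasoning_content", None)
+    if reasoning:
+        msg["reasoning"] = reasoning
+    return msg
+
+
+def to_api_msg(stored):
+    """Step 3: rebuild the outgoing message with a reasoning_content field."""
+    out = {k: v for k, v in stored.items() if k != "reasoning"}
+    if "reasoning" in stored:
+        out["reasoning_content"] = stored["reasoning"]
+    return out
+```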
+
+## Trajectory Export
+
+Conversations can be exported for training:
+
+```python
+agent = AIAgent(save_trajectories=True)
+agent.chat("Do something")
+# Saves to trajectories/*.jsonl in ShareGPT format
+```
+
+## Batch Processing
+
+For processing multiple prompts, use `batch_runner.py`:
+
+```bash
+python batch_runner.py \
+    --dataset_file=prompts.jsonl \
+    --batch_size=20 \
+    --num_workers=4 \
+    --run_name=my_run
+```
+
+See `batch_runner.py` for parallel execution with checkpointing.
--- a/docs/llm_client.md
+++ b/docs/llm_client.md
@@ -0,0 +1,124 @@
+# LLM Client
+
+Hermes Agent uses the OpenAI Python SDK with OpenRouter as the backend, providing access to many models through a single API.
+
+## Configuration
+
+```python
+import os
+
+from openai import OpenAI
+
+client = OpenAI(
+    api_key=os.getenv("OPENROUTER_API_KEY"),
+    base_url="https://openrouter.ai/api/v1"
+)
+```
+
+## Supported Models
+
+Any model available on [OpenRouter](https://openrouter.ai/models):
+
+```python
+# Anthropic
+model = "anthropic/claude-sonnet-4"
+model = "anthropic/claude-opus-4"
+
+# OpenAI
+model = "openai/gpt-4o"
+model = "openai/o1"
+
+# Google
+model = "google/gemini-2.0-flash"
+
+# Open models
+model = "meta-llama/llama-3.3-70b-instruct"
+model = "deepseek/deepseek-chat-v3"
+model = "moonshotai/kimi-k2.5"
+```
+
+## Tool Calling
+
+Standard OpenAI function calling format:
+
+```python
+import json
+
+response = client.chat.completions.create(
+    model=model,
+    messages=messages,
+    tools=[
+        {
+            "type": "function",
+            "function": {
+                "name": "web_search",
+                "description": "Search the web",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "query": {"type": "string"}
+                    },
+                    "required": ["query"]
+                }
+            }
+        }
+    ],
+)
+
+# Check for tool calls
+if response.choices[0].message.tool_calls:
+    for tool_call in response.choices[0].message.tool_calls:
+        name = tool_call.function.name
+        args = json.loads(tool_call.function.arguments)
+        # Execute tool...
+```
+
+## Reasoning Models
+
+Some models return reasoning/thinking content:
+
+```python
+# Access reasoning if available
+message = response.choices[0].message
+reasoning = getattr(message, "reasoning_content", None)
+if reasoning:
+    # Store for trajectory export
+```
+
+## Provider Selection
+
+OpenRouter allows selecting specific providers:
+
+```python
+response = client.chat.completions.create(
+    model=model,
+    messages=messages,
+    extra_body={
+        "provider": {
+            "order": ["Anthropic", "Google"],  # Preferred providers
+            "ignore": ["Novita"],              # Providers to skip
+        }
+    }
+)
+```
+
+## Error Handling
+
+Common errors and handling:
+
+```python
+import openai
+
+try:
+    response = client.chat.completions.create(...)
+except openai.RateLimitError:
+    # Back off and retry
+    ...
+except openai.APIStatusError as e:
+    # Check e.status_code for specific errors
+    # 400 = bad request (often provider-specific)
+    # 502 = bad gateway (retry with a different provider)
+    ...
+```
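+
+"Back off and retry" is commonly implemented as exponential backoff with jitter. The sketch below is one way to do it, not the project's own retry logic: `TransientError` is a hypothetical stand-in for a retryable exception such as `openai.RateLimitError`, and the attempt count and delays are arbitrary defaults.
+
+```python
+import random
+import time
+
+
+class TransientError(Exception):
+    """Hypothetical stand-in for a retryable error such as a rate limit."""
+
+
+def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
+    """Run call(), retrying on TransientError with exponentially growing delays."""
+    for attempt in range(max_attempts):
+        try:
+            return call()
+        except TransientError:
+            if attempt == max_attempts - 1:
+                raise  # out of attempts, surface the error
+            # Delay doubles each attempt (about 1s, 2s, 4s, ...) plus a little jitter
+            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
+```
+
+The `sleep` parameter is injectable so the delays can be observed or skipped in tests.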
+
+## Cost Tracking
+
+OpenRouter returns usage info:
+
+```python
+usage = response.usage
+print(f"Tokens: {usage.prompt_tokens} + {usage.completion_tokens}")
+cost = getattr(usage, "cost", None)  # Only present if the backend reports it
+if cost is not None:
+    print(f"Cost: ${cost:.6f}")
+```
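+
+Since one agent run makes several LLM calls, per-call usage usually needs to be summed. A minimal sketch, assuming a hypothetical `UsageTotals` accumulator and usage objects with the attribute names shown above:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class UsageTotals:
+    prompt_tokens: int = 0
+    completion_tokens: int = 0
+    cost: float = 0.0
+
+    def add(self, usage):
+        """Fold one response's usage into the running totals."""
+        self.prompt_tokens += usage.prompt_tokens
+        self.completion_tokens += usage.completion_tokens
+        # cost is optional; a missing or None value counts as zero
+        self.cost += getattr(usage, "cost", None) or 0.0
+```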
--- a/docs/message_graph.md
+++ b/docs/message_graph.md
@@ -0,0 +1,121 @@
+# Message Format & Trajectories
+
+Hermes Agent uses two message formats: the **API format** for LLM calls and the **trajectory format** for training data export.
+
+## API Message Format
+
+Standard OpenAI chat format used during execution:
+
+```python
+messages = [
+    # System prompt
+    {"role": "system", "content": "You are a helpful assistant with tools..."},
+    
+    # User query
+    {"role": "user", "content": "Search for Python tutorials"},
+    
+    # Assistant with tool call
+    {
+        "role": "assistant",
+        "content": None,
+        "tool_calls": [{
+            "id": "call_abc123",
+            "type": "function",
+            "function": {
+                "name": "web_search",
+                "arguments": "{\"query\": \"Python tutorials\"}"
+            }
+        }]
+    },
+    
+    # Tool result
+    {
+        "role": "tool",
+        "tool_call_id": "call_abc123",
+        "content": "{\"results\": [...]}"
+    },
+    
+    # Final response
+    {"role": "assistant", "content": "Here's what I found..."}
+]
+```
+
+## Trajectory Format (ShareGPT)
+
+Exported for training in ShareGPT format:
+
+```json
+{
+    "conversations": [
+        {"from": "system", "value": "You are a helpful assistant..."},
+        {"from": "human", "value": "Search for Python tutorials"},
+        {"from": "gpt", "value": "<tool_call>\n{\"name\": \"web_search\", \"arguments\": {\"query\": \"Python tutorials\"}}\n</tool_call>"},
+        {"from": "tool", "value": "<tool_response>\n{\"results\": [...]}\n</tool_response>"},
+        {"from": "gpt", "value": "Here's what I found..."}
+    ],
+    "tools": "[{\"type\": \"function\", \"function\": {...}}]",
+    "source": "hermes-agent"
+}
+```
+
+## Reasoning Content
+
+For models that output reasoning/chain-of-thought:
+
+**During execution** (API format):
+```python
+# Stored internally but not sent back to model in content
+assistant_msg = {
+    "role": "assistant",
+    "content": "Here's what I found...",
+    "reasoning": "Let me think about this step by step..."  # Internal only
+}
+```
+
+**In trajectory export** (reasoning wrapped in tags):
+```json
+{
+    "from": "gpt",
+    "value": "<think>\nLet me think about this step by step...\n</think>\nHere's what I found..."
+}
+```
+
+## Conversion Flow
+
+```
+API Response → Internal Storage → Trajectory Export
+     ↓              ↓                    ↓
+tool_calls    reasoning field      <tool_call> tags
+reasoning_content                  <think> tags
+```
+
+The conversion happens in `_convert_to_trajectory_format()` in `run_agent.py`.
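+
+The mapping can be sketched in a few lines. This is an illustration based only on the examples in this document, not the body of `_convert_to_trajectory_format()`: the function name `to_sharegpt`, the `ROLE_MAP` table, and the ordering of content before tool calls are assumptions, and the `tools` field is omitted.
+
+```python
+import json
+
+ROLE_MAP = {"system": "system", "user": "human", "assistant": "gpt", "tool": "tool"}
+
+
+def to_sharegpt(messages):
+    """Convert API-format messages into a ShareGPT-style trajectory dict."""
+    conversations = []
+    for msg in messages:
+        if msg["role"] == "tool":
+            value = f"<tool_response>\n{msg['content']}\n</tool_response>"
+        else:
+            parts = []
+            if msg.get("reasoning"):
+                parts.append(f"<think>\n{msg['reasoning']}\n</think>")
+            if msg.get("content"):
+                parts.append(msg["content"])
+            for call in msg.get("tool_calls") or []:
+                # API arguments are a JSON string; the trajectory embeds them as an object
+                payload = {
+                    "name": call["function"]["name"],
+                    "arguments": json.loads(call["function"]["arguments"]),
+                }
+                parts.append(f"<tool_call>\n{json.dumps(payload)}\n</tool_call>")
+            value = "\n".join(parts)
+        conversations.append({"from": ROLE_MAP[msg["role"]], "value": value})
+    return {"conversations": conversations, "source": "hermes-agent"}
+```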
+
+## Ephemeral System Prompts
+
+Batch processing supports ephemeral system prompts that guide behavior during execution but are NOT saved to trajectories:
+
+```python
+# During execution: full system prompt + ephemeral guidance
+messages = [
+    {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + ephemeral_prompt},
+    ...
+]
+
+# In saved trajectory: only the base system prompt
+trajectory = {
+    "conversations": [
+        {"from": "system", "value": SYSTEM_PROMPT},  # No ephemeral
+        ...
+    ]
+}
+```
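+
+One way to keep the two views apart is to build the execution messages and the saved copy with separate helpers. The names `build_messages` and `strip_ephemeral` are hypothetical; the sketch only assumes the concatenation shown above.
+
+```python
+def build_messages(system_prompt, user_message, ephemeral_prompt=None):
+    """Messages sent to the model: base prompt plus optional ephemeral guidance."""
+    content = system_prompt
+    if ephemeral_prompt:
+        content = system_prompt + "\n\n" + ephemeral_prompt
+    return [
+        {"role": "system", "content": content},
+        {"role": "user", "content": user_message},
+    ]
+
+
+def strip_ephemeral(messages, system_prompt):
+    """Copy of messages whose system entry holds only the base prompt."""
+    return [
+        dict(m, content=system_prompt) if m["role"] == "system" else dict(m)
+        for m in messages
+    ]
+```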
+
+## Trajectory Compression
+
+Long trajectories can be compressed for training using `trajectory_compressor.py`:
+
+- Protects first/last N turns
+- Summarizes middle turns with an LLM
+- Targets a specific token budget
+- See `configs/trajectory_compression.yaml` for settings
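+
+The protect-and-summarize idea can be sketched as follows. This is not the logic of `trajectory_compressor.py`: `compress` and its parameters are hypothetical, `summarize` stands in for the LLM call, the `from` role given to the summary turn is an assumption, and token budgeting is left out.
+
+```python
+def compress(turns, keep_first, keep_last, summarize):
+    """Keep the first and last turns verbatim and replace the middle with a summary."""
+    if len(turns) <= keep_first + keep_last:
+        return list(turns)  # nothing to compress
+    tail_start = len(turns) - keep_last
+    middle = turns[keep_first:tail_start]
+    # Assumption: the summary is stored as a single "human" turn
+    summary = {"from": "human", "value": summarize(middle)}
+    return turns[:keep_first] + [summary] + turns[tail_start:]
+```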
--- a/docs/tools.md
+++ b/docs/tools.md
@@ -0,0 +1,133 @@
+# Tools
+
+Tools are functions that extend the agent's capabilities. Each tool is defined with an OpenAI-compatible JSON schema and an async handler function.
+
+## Tool Structure
+
+Each tool module in `tools/` exports:
+1. **Schema definitions** - OpenAI function-calling format
+2. **Handler functions** - Async functions that execute the tool
+
+```python
+# Example: tools/web_tools.py
+
+# Schema definition
+WEB_SEARCH_SCHEMA = {
+    "type": "function",
+    "function": {
+        "name": "web_search",
+        "description": "Search the web for information",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "query": {"type": "string", "description": "Search query"}
+            },
+            "required": ["query"]
+        }
+    }
+}
+
+# Handler function
+async def web_search(query: str) -> dict:
+    """Execute web search and return results."""
+    # Implementation...
+    return {"results": [...]}
+```
+
+## Tool Categories
+
+| Category | Module | Tools |
+|----------|--------|-------|
+| **Web** | `web_tools.py` | `web_search`, `web_extract`, `web_crawl` |
+| **Terminal** | `terminal_tool.py` | `terminal` (local/docker/singularity/modal backends) |
+| **Browser** | `browser_tool.py` | `browser_navigate`, `browser_click`, `browser_type`, etc. |
+| **Vision** | `vision_tools.py` | `vision_analyze` |
+| **Image Gen** | `image_generation_tool.py` | `image_generate` |
+| **Reasoning** | `mixture_of_agents_tool.py` | `mixture_of_agents` |
+| **Skills** | `skills_tool.py` | `skills_categories`, `skills_list`, `skill_view` |
+
+## Tool Registration
+
+Tools are registered in `model_tools.py`:
+
+```python
+# model_tools.py
+TOOL_SCHEMAS = [
+    *WEB_TOOL_SCHEMAS,
+    *TERMINAL_TOOL_SCHEMAS,
+    *BROWSER_TOOL_SCHEMAS,
+    # ...
+]
+
+TOOL_HANDLERS = {
+    "web_search": web_search,
+    "terminal": terminal_tool,
+    "browser_navigate": browser_navigate,
+    # ...
+}
+```
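+
+A handler table like this is typically consumed by a small dispatcher. The following is a hedged sketch rather than the code in `model_tools.py`: `execute_tool`, its signature, and the error payloads are assumptions, and `web_search` is stubbed out.
+
+```python
+import asyncio
+import json
+
+
+async def web_search(query: str) -> dict:
+    """Stub handler; a real one would call a search backend."""
+    return {"results": [f"result for {query}"]}
+
+
+TOOL_HANDLERS = {"web_search": web_search}
+
+
+async def execute_tool(name: str, arguments: str) -> str:
+    """Look up the handler, decode the JSON arguments, and return a JSON string."""
+    handler = TOOL_HANDLERS.get(name)
+    if handler is None:
+        return json.dumps({"error": f"unknown tool: {name}"})
+    try:
+        kwargs = json.loads(arguments)
+    except json.JSONDecodeError as exc:
+        return json.dumps({"error": f"invalid arguments: {exc}"})
+    return json.dumps(await handler(**kwargs))
+```
+
+Returning errors as JSON instead of raising lets the model see the failure as a tool result and try again.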
+
+## Toolsets
+
+Tools are grouped into **toolsets** for logical organization (see `toolsets.py`):
+
+```python
+TOOLSETS = {
+    "web": {
+        "description": "Web search and content extraction",
+        "tools": ["web_search", "web_extract", "web_crawl"]
+    },
+    "terminal": {
+        "description": "Command execution",
+        "tools": ["terminal"]
+    },
+    # ...
+}
+```
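+
+The `enabled_toolsets` and `disabled_toolsets` arguments of `AIAgent` suggest a resolution step from toolset names to tool names. A minimal sketch, assuming that no enabled list means "all toolsets" and that disabled entries win; `resolve_tools` is a hypothetical helper, not a function from `toolsets.py`.
+
+```python
+TOOLSETS = {
+    "web": {
+        "description": "Web search and content extraction",
+        "tools": ["web_search", "web_extract", "web_crawl"],
+    },
+    "terminal": {
+        "description": "Command execution",
+        "tools": ["terminal"],
+    },
+}
+
+
+def resolve_tools(enabled=None, disabled=None):
+    """Expand toolset names into an ordered, de-duplicated list of tool names."""
+    names = enabled if enabled is not None else list(TOOLSETS)
+    skipped = set(disabled or [])
+    tools = []
+    for name in names:
+        if name in skipped:
+            continue
+        for tool in TOOLSETS[name]["tools"]:
+            if tool not in tools:
+                tools.append(tool)
+    return tools
+```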
+
+## Adding a New Tool
+
+1. Create handler function in `tools/your_tool.py`
+2. Define JSON schema following OpenAI format
+3. Register in `model_tools.py` (schemas and handlers)
+4. Add to appropriate toolset in `toolsets.py`
+5. Update `tools/__init__.py` exports
+
+## Stateful Tools
+
+Some tools maintain state across calls within a session:
+
+- **Terminal**: Keeps container/sandbox running between commands
+- **Browser**: Maintains browser session for multi-step navigation
+
+State is managed per `task_id` and cleaned up automatically.
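+
+Per-task state of this kind can be modeled as a registry keyed by `task_id`. The sketch below is illustrative only: `SessionRegistry` and its `create`/`close` callbacks are hypothetical, and cleanup is an explicit call here rather than automatic.
+
+```python
+class SessionRegistry:
+    """Keeps one live resource per task_id and releases it on cleanup."""
+
+    def __init__(self, create, close):
+        self._create = create
+        self._close = close
+        self._sessions = {}
+
+    def get(self, task_id):
+        # Reuse the existing session so state survives between tool calls
+        if task_id not in self._sessions:
+            self._sessions[task_id] = self._create(task_id)
+        return self._sessions[task_id]
+
+    def cleanup(self, task_id):
+        # Close and forget the session; a later get() starts fresh
+        session = self._sessions.pop(task_id, None)
+        if session is not None:
+            self._close(session)
+```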
+
+## Skills Tools (Progressive Disclosure)
+
+Skills are on-demand knowledge documents. They use **progressive disclosure** to minimize tokens:
+
+```
+Level 0: skills_categories()     → ["mlops", "devops"]           (~50 tokens)
+Level 1: skills_list(category)   → [{name, description}, ...]   (~3k tokens)
+Level 2: skill_view(name)        → Full content + metadata       (varies)
+Level 3: skill_view(name, path)  → Specific reference file       (varies)
+```
+
+Skill directory structure:
+```
+skills/
+└── mlops/
+    └── axolotl/
+        ├── SKILL.md           # Main instructions (required)
+        ├── references/        # Additional docs
+        └── templates/         # Output formats, configs
+```
+
+SKILL.md uses YAML frontmatter:
+```yaml
+---
+name: axolotl
+description: Fine-tuning LLMs with Axolotl
+tags: [Fine-Tuning, LoRA, DPO]
+---
+```
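+
+The disclosure levels map naturally onto directory reads. The following sketch works under several assumptions: the functions take an explicit `root` (and `skill_view` an explicit `category`), which the real tools do not show; `read_frontmatter` handles only flat `key: value` lines and is not a full YAML parser.
+
+```python
+from pathlib import Path
+
+
+def read_frontmatter(text):
+    """Parse simple `key: value` frontmatter between the leading --- markers."""
+    lines = text.splitlines()
+    if not lines or lines[0].strip() != "---":
+        return {}
+    meta = {}
+    for line in lines[1:]:
+        if line.strip() == "---":
+            break
+        key, sep, value = line.partition(":")
+        if sep:
+            meta[key.strip()] = value.strip()
+    return meta
+
+
+def skills_categories(root):
+    """Level 0: category names only."""
+    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())
+
+
+def skills_list(root, category):
+    """Level 1: name and description from each skill's frontmatter."""
+    skills = []
+    for skill_md in sorted((Path(root) / category).glob("*/SKILL.md")):
+        meta = read_frontmatter(skill_md.read_text())
+        skills.append({
+            "name": meta.get("name", skill_md.parent.name),
+            "description": meta.get("description", ""),
+        })
+    return skills
+
+
+def skill_view(root, category, name, path="SKILL.md"):
+    """Levels 2 and 3: full SKILL.md, or a specific file under the skill."""
+    return (Path(root) / category / name / path).read_text()
+```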