Add a Claude Code-like CLI

- Introduced `cli-config.yaml.example` to provide a template for configuring the CLI behavior, including model settings, terminal tool configurations, agent behavior, and toolsets.
- Created `cli.py` for an interactive terminal interface, allowing users to start the Hermes Agent with various options and toolsets.
- Added `hermes` launcher script for convenient CLI access.
- Updated `model_tools.py` to support quiet mode for suppressing output during tool initialization and execution.
- Enhanced logging in various tools to respect quiet mode, improving user experience by reducing unnecessary output.
- Added `prompt_toolkit` to `requirements.txt` for improved CLI interaction capabilities.
- Created `TODO.md` for future improvements and enhancements to the Hermes Agent framework.
Author: teknium
Date: 2026-01-31 06:30:48 +00:00
Parent: 8e986584f4
Commit: bc76a032ba
10 changed files with 2251 additions and 118 deletions

TODO.md (new file, 305 lines)

@@ -0,0 +1,305 @@
# Hermes Agent - Future Improvements
> Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.
---
## 1. Memory & Context Management 🧠
**Problem:** Context grows unbounded during long conversations. Trajectory compression exists for training data post-hoc, but live conversations lack intelligent context management.
**Ideas:**
- [ ] **Incremental summarization** - Compress old tool outputs on-the-fly during conversations
- Trigger when context exceeds threshold (e.g., 80% of max tokens)
- Preserve recent turns fully, summarize older tool responses
- Could reuse logic from `trajectory_compressor.py`
- [ ] **Semantic memory retrieval** - Vector store for long conversation recall
- Embed important facts/findings as conversation progresses
- Retrieve relevant memories when needed instead of keeping everything in context
- Consider lightweight solutions: ChromaDB, FAISS, or even a simple embedding cache
- [ ] **Working vs. episodic memory** distinction
- Working memory: Current task state, recent tool results (always in context)
- Episodic memory: Past findings, tried approaches (retrieved on demand)
- Clear eviction policies for each
**Files to modify:** `run_agent.py` (add memory manager), possibly new `tools/memory_tool.py`
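The threshold trigger in the first bullet could be sketched roughly like this. This is illustrative only: `MAX_TOKENS`, `estimate_tokens`, and `compress_history` are made-up names, not existing Hermes code, and the character-based token estimate is a crude stand-in for a real tokenizer.

```python
# Hypothetical sketch of on-the-fly context compression.
MAX_TOKENS = 128_000
COMPRESS_AT = 0.8  # trigger at 80% of the window

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token
    return sum(len(m["content"]) // 4 for m in messages)

def compress_history(messages, keep_recent=4):
    """Summarize old tool outputs, keep recent turns verbatim."""
    if estimate_tokens(messages) < MAX_TOKENS * COMPRESS_AT:
        return messages
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    summarized = [
        {**m, "content": m["content"][:200] + " …[summarized]"}
        if m.get("role") == "tool" else m
        for m in head
    ]
    return summarized + tail
```

In practice the `[:200]` truncation would be replaced by a real summarization call, e.g. reusing `trajectory_compressor.py` as suggested above.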
---
## 2. Self-Reflection & Course Correction 🔄
**Problem:** Current retry logic handles malformed outputs but not semantic failures. Agent doesn't reason about *why* something failed.
**Ideas:**
- [ ] **Meta-reasoning after failures** - When a tool returns an error or unexpected result:
```
Tool failed → Reflect: "Why did this fail? What assumptions were wrong?"
→ Adjust approach → Retry with new strategy
```
- Could be a lightweight LLM call or structured self-prompt
- [ ] **Planning/replanning module** - For complex multi-step tasks:
- Generate plan before execution
- After each step, evaluate: "Am I on track? Should I revise the plan?"
- Store plan in working memory, update as needed
- [ ] **Approach memory** - Remember what didn't work:
- "I tried X for this type of problem and it failed because Y"
- Prevents repeating failed strategies in the same conversation
**Files to modify:** `run_agent.py` (add reflection hooks in tool loop), new `tools/reflection_tool.py`
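The "structured self-prompt" variant of meta-reasoning could be as lightweight as building a reflection prompt after a failed tool call. The function and prompt wording below are assumptions, not current `run_agent.py` code:

```python
# Hypothetical reflection hook: called from the tool loop when a tool errors.
def reflect_on_failure(tool_name: str, args: dict, error: str) -> str:
    """Build a structured self-prompt the agent answers before retrying."""
    return (
        f"The tool `{tool_name}` failed with: {error}\n"
        f"Arguments were: {args}\n"
        "Before retrying, answer briefly:\n"
        "1. Why did this likely fail?\n"
        "2. Which assumption was wrong?\n"
        "3. What should change in the next attempt?"
    )
```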
---
## 3. Tool Composition & Learning 🔧
**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
**Ideas:**
- [ ] **Macro tools / Tool chains** - Define reusable tool sequences:
```yaml
research_topic:
  description: "Deep research on a topic"
  steps:
    - web_search: {query: "$topic"}
    - web_extract: {urls: "$search_results.urls[:3]"}
    - summarize: {content: "$extracted"}
```
- Could be defined in skills or a new `macros/` directory
- Agent can invoke macro as single tool call
- [ ] **Tool failure patterns** - Learn from failures:
- Track: tool, input pattern, error type, what worked instead
- Before calling a tool, check: "Has this pattern failed before?"
- Persistent across sessions (stored in skills or separate DB)
- [ ] **Parallel tool execution** - When tools are independent, run concurrently:
- Detect independence (no data dependencies between calls)
- Use `asyncio.gather()` for parallel execution
- Already have async support in some tools, just need orchestration
**Files to modify:** `model_tools.py`, `toolsets.py`, new `tool_macros.py`
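The parallel-execution bullet maps directly onto `asyncio.gather()`. In this sketch, `execute_tool` is a stand-in for the real dispatcher in `model_tools.py`; the interesting part is the fan-out over calls already known to be independent:

```python
import asyncio

# Stand-in for the real async tool dispatcher.
async def execute_tool(name: str, args: dict) -> str:
    await asyncio.sleep(0.01)  # simulate I/O-bound tool work
    return f"{name} ok"

async def run_independent(calls: list[tuple[str, dict]]) -> list[str]:
    """Run tool calls concurrently when they share no data dependencies."""
    return await asyncio.gather(*(execute_tool(n, a) for n, a in calls))

results = asyncio.run(run_independent([
    ("web_search", {"query": "a"}),
    ("web_extract", {"urls": ["b"]}),
]))
```

Dependency detection (deciding *which* calls are independent) is the hard part; the gather itself is trivial once that is known.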
---
## 4. Dynamic Skills Expansion 📚
**Problem:** Skills system is elegant but static. Skills must be manually created and added.
**Ideas:**
- [ ] **Skill acquisition from successful tasks** - After completing a complex task:
- "This approach worked well. Save as a skill?"
- Extract: goal, steps taken, tools used, key decisions
- Generate SKILL.md automatically
- Store in user's skills directory
- [ ] **Skill templates** - Common patterns that can be parameterized:
```markdown
# Debug {language} Error
1. Reproduce the error
2. Search for error message: `web_search("{error_message} {language}")`
3. Check common causes: {common_causes}
4. Apply fix and verify
```
- [ ] **Skill chaining** - Combine skills for complex workflows:
- Skills can reference other skills as dependencies
- "To do X, first apply skill Y, then skill Z"
- Directed graph of skill dependencies
**Files to modify:** `tools/skills_tool.py`, `skills/` directory structure, new `skill_generator.py`
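Automatic SKILL.md generation might start as simple as templating the extracted goal, steps, and tools. The layout below is an assumed format for illustration, not the real skills schema:

```python
# Hypothetical skill_generator sketch: turns a completed task into a SKILL.md.
def generate_skill_md(goal: str, steps: list[str], tools: list[str]) -> str:
    lines = [f"# Skill: {goal}", "", "## Tools used",
             *[f"- {t}" for t in sorted(set(tools))], "", "## Steps"]
    lines += [f"{i}. {s}" for i, s in enumerate(steps, 1)]
    return "\n".join(lines)
```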
---
## 5. Task Continuation Hints 🎯
**Problem:** The agent could be more helpful by suggesting logical next steps after finishing a task.
**Ideas:**
- [ ] **Suggest next steps** - At end of a task, suggest logical continuations:
- "Code is written. Want me to also write tests / docs / deploy?"
- Based on common workflows for task type
- Non-intrusive, just offer options
**Files to modify:** `run_agent.py`, response generation logic
---
## 6. Uncertainty & Honesty Calibration 🎚️
**Problem:** The agent is sometimes confidently wrong. It should be better calibrated about what it knows vs. doesn't know.
**Ideas:**
- [ ] **Source attribution** - Track where information came from:
- "According to the docs I just fetched..." vs "From my training data (may be outdated)..."
- Let user assess reliability themselves
- [ ] **Cross-reference high-stakes claims** - Self-check for made-up details:
- When stakes are high, verify with tools before presenting as fact
- "Let me verify that before you act on it..."
**Files to modify:** `run_agent.py`, response generation logic
---
## 7. Resource Awareness & Efficiency 💰
**Problem:** No awareness of costs, time, or resource usage. Could be smarter about efficiency.
**Ideas:**
- [ ] **Tool result caching** - Don't repeat identical operations:
- Cache web searches, extractions within a session
- Invalidation based on time-sensitivity of query
- Hash-based lookup: same input → cached output
- [ ] **Lazy evaluation** - Don't fetch everything upfront:
- Get summaries first, full content only if needed
- "I found 5 relevant pages. Want me to deep-dive on any?"
**Files to modify:** `model_tools.py`, new `resource_tracker.py`
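A minimal version of the first bullet: a session cache keyed on a hash of (tool, args) with a fixed TTL. Nothing like this exists in `model_tools.py` yet; the key shape and TTL policy are assumptions:

```python
import hashlib
import json
import time

# Hypothetical session-scoped tool result cache.
class ToolCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, tool: str, args: dict) -> str:
        # Canonical JSON so argument order doesn't change the key
        blob = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, tool: str, args: dict):
        entry = self._store.get(self._key(tool, args))
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, tool: str, args: dict, result) -> None:
        self._store[self._key(tool, args)] = (time.time(), result)
```

Per the invalidation bullet, time-sensitive queries ("latest news") would want a much shorter TTL than stable lookups (API docs).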
---
## 8. Collaborative Problem Solving 🤝
**Problem:** Interaction is command/response. Complex problems benefit from dialogue.
**Ideas:**
- [ ] **Assumption surfacing** - Make implicit assumptions explicit:
- "I'm assuming you want Python 3.11+. Correct?"
- "This solution assumes you have sudo access..."
- Let user correct before going down wrong path
- [ ] **Checkpoint & confirm** - For high-stakes operations:
- "About to delete 47 files. Here's the list - proceed?"
- "This will modify your database. Want a backup first?"
- Configurable threshold for when to ask
**Files to modify:** `run_agent.py`, system prompt configuration
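The configurable threshold for checkpoint-and-confirm could be a small per-action table; the action names and numbers below are illustrative only:

```python
# Hypothetical confirmation gate: ask the user when impact exceeds a threshold.
HIGH_STAKES = {"delete": 10, "modify": 50}  # max affected items before asking

def needs_confirmation(action: str, affected: int) -> bool:
    """Return True when an operation is risky enough to ask the user first."""
    return affected > HIGH_STAKES.get(action, float("inf"))
```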
---
## 9. Project-Local Context 💾
**Problem:** Valuable context is lost between sessions.
**Ideas:**
- [ ] **Project awareness** - Remember project-specific context:
- Store `.hermes/context.md` in project directory
- "This is a Django project using PostgreSQL"
- Coding style preferences, deployment setup, etc.
- Load automatically when working in that directory
- [ ] **Handoff notes** - Leave notes for future sessions:
- Write to `.hermes/notes.md` in project
- "TODO for next session: finish implementing X"
- "Known issues: Y doesn't work on Windows"
**Files to modify:** New `project_context.py`, auto-load in `run_agent.py`
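The auto-load behavior could walk up from the working directory to find the nearest project context file. This assumes the `.hermes/context.md` convention described above; the loader itself is hypothetical:

```python
from pathlib import Path

# Hypothetical project-context loader for the .hermes/context.md convention.
def load_project_context(cwd: str) -> str:
    """Walk up from cwd and return the nearest .hermes/context.md, if any."""
    for directory in [Path(cwd).resolve(), *Path(cwd).resolve().parents]:
        candidate = directory / ".hermes" / "context.md"
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return ""
```

Walking up (rather than checking only `cwd`) lets the agent find project context when started from a subdirectory, mirroring how `.git` discovery works.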
---
## 10. Graceful Degradation & Robustness 🛡️
**Problem:** When things go wrong, recovery is limited. The agent should fail gracefully.
**Ideas:**
- [ ] **Fallback chains** - When primary approach fails, have backups:
- `web_extract` fails → try `browser_navigate` → try `web_search` for cached version
- Define fallback order per tool type
- [ ] **Partial progress preservation** - Don't lose work on failure:
- Long task fails midway → save what we've got
- "I completed 3/5 steps before the error. Here's what I have..."
- [ ] **Self-healing** - Detect and recover from bad states:
- Browser stuck → close and retry
- Terminal hung → timeout and reset
**Files to modify:** `model_tools.py`, tool implementations, new `fallback_manager.py`
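The fallback order could be data-driven. The chain below mirrors the `web_extract` example above; `run_with_fallbacks` is a hypothetical helper, not an existing `fallback_manager.py` API:

```python
# Hypothetical fallback chains, keyed by primary tool.
FALLBACKS = {"web_extract": ["browser_navigate", "web_search"]}

def run_with_fallbacks(tool: str, attempt):
    """Try the primary tool, then each fallback; return (tool_used, result)."""
    last_error = None
    for candidate in [tool, *FALLBACKS.get(tool, [])]:
        try:
            return candidate, attempt(candidate)
        except Exception as exc:  # broad on purpose: any failure moves us on
            last_error = exc
    raise RuntimeError(f"All fallbacks for {tool} failed") from last_error
```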
---
## 11. Tools & Skills Wishlist 🧰
*Things that would need new tool implementations (can't do well with current tools):*
### High-Impact
- [ ] **Audio/Video Transcription** 🎬
- Transcribe audio files, podcasts, YouTube videos
- Extract key moments from video
- Currently blind to multimedia content
- *Could potentially use whisper via terminal, but native tool would be cleaner*
- [ ] **Diagram Rendering** 📊
- Render Mermaid/PlantUML to actual images
- Can generate the code, but rendering requires external service or tool
- "Show me how these components connect" → actual visual diagram
### Medium-Impact
- [ ] **Document Generation** 📄
- Create styled PDFs, Word docs, presentations
- *Can do basic PDF via terminal tools, but limited*
- [ ] **Diff/Patch Tool** 📝
- Surgical code modifications with preview
- "Change line 45-50 to X" without rewriting whole file
- Show diffs before applying
- *Can use `diff`/`patch` but a native tool would be safer*
### Skills to Create
- [ ] **Domain-specific skill packs:**
- DevOps/Infrastructure (Terraform, K8s, AWS)
- Data Science workflows (EDA, model training)
- Security/pentesting procedures
- [ ] **Framework-specific skills:**
- React/Vue/Angular patterns
- Django/Rails/Express conventions
- Database optimization playbooks
- [ ] **Troubleshooting flowcharts:**
- "Docker container won't start" → decision tree
- "Production is slow" → systematic diagnosis
---
## Priority Order (Suggested)
1. **Memory & Context Management** - Biggest impact on complex tasks
2. **Self-Reflection** - Improves reliability and reduces wasted tool calls
3. **Project-Local Context** - Practical win, keeps useful info across sessions
4. **Tool Composition** - Quality of life, builds on other improvements
5. **Dynamic Skills** - Force multiplier for repeated tasks
---
## Removed Items (Unrealistic)
The following were removed because they're architecturally impossible:
- ~~Proactive suggestions / Prefetching~~ - Agent only runs on user request, can't interject
- ~~Session save/restore across conversations~~ - Agent doesn't control session persistence
- ~~User preference learning across sessions~~ - Same issue
- ~~Clipboard integration~~ - No access to user's local system clipboard
- ~~Voice/TTS playback~~ - Can generate audio but can't play it to user
- ~~Set reminders~~ - No persistent background execution
The following were removed because they're **already possible**:
- ~~HTTP/API Client~~ → Use `curl` or Python `requests` in terminal
- ~~Structured Data Manipulation~~ → Use `pandas` in terminal
- ~~Git-Native Operations~~ → Use `git` CLI in terminal
- ~~Symbolic Math~~ → Use `SymPy` in terminal
- ~~Code Quality Tools~~ → Run linters (`eslint`, `black`, `mypy`) in terminal
- ~~Testing Framework~~ → Run `pytest`, `jest`, etc. in terminal
- ~~Translation~~ → LLM handles this fine, or use translation APIs
---
*Last updated: $(date +%Y-%m-%d)* 🤖

cli-config.yaml.example (new file, 188 lines)

@@ -0,0 +1,188 @@
# Hermes Agent CLI Configuration
# Copy this file to cli-config.yaml and customize as needed.
# This file configures the CLI behavior. Environment variables in .env take precedence.
# =============================================================================
# Model Configuration
# =============================================================================
model:
  # Default model to use (can be overridden with --model flag)
  default: "anthropic/claude-sonnet-4"
  # API configuration (falls back to OPENROUTER_API_KEY env var)
  # api_key: "your-key-here"  # Uncomment to set here instead of .env
  base_url: "https://openrouter.ai/api/v1"
# =============================================================================
# Terminal Tool Configuration
# =============================================================================
# Choose ONE of the following terminal configurations by uncommenting it.
# The terminal tool executes commands in the specified environment.
# -----------------------------------------------------------------------------
# OPTION 1: Local execution (default)
# Commands run directly on your machine in the current directory
# -----------------------------------------------------------------------------
terminal:
  env_type: "local"
  cwd: "."  # Use "." for current directory, or specify absolute path
  timeout: 180
  lifetime_seconds: 300
# -----------------------------------------------------------------------------
# OPTION 2: SSH remote execution
# Commands run on a remote server - agent code stays local (sandboxed)
# Great for: keeping agent isolated from its own code, using powerful remote hardware
# -----------------------------------------------------------------------------
# terminal:
#   env_type: "ssh"
#   cwd: "/home/myuser/project"
#   timeout: 180
#   lifetime_seconds: 300
#   ssh_host: "my-server.example.com"
#   ssh_user: "myuser"
#   ssh_port: 22
#   ssh_key: "~/.ssh/id_rsa"  # Optional - uses ssh-agent if not specified
# -----------------------------------------------------------------------------
# OPTION 3: Docker container
# Commands run in an isolated Docker container
# Great for: reproducible environments, testing, isolation
# -----------------------------------------------------------------------------
# terminal:
#   env_type: "docker"
#   cwd: "/workspace"
#   timeout: 180
#   lifetime_seconds: 300
#   docker_image: "python:3.11"
# -----------------------------------------------------------------------------
# OPTION 4: Singularity/Apptainer container
# Commands run in a Singularity container (common in HPC environments)
# Great for: HPC clusters, shared compute environments
# -----------------------------------------------------------------------------
# terminal:
#   env_type: "singularity"
#   cwd: "/workspace"
#   timeout: 180
#   lifetime_seconds: 300
#   singularity_image: "docker://python:3.11"
# -----------------------------------------------------------------------------
# OPTION 5: Modal cloud execution
# Commands run on Modal's cloud infrastructure
# Great for: GPU access, scalable compute, serverless execution
# -----------------------------------------------------------------------------
# terminal:
#   env_type: "modal"
#   cwd: "/workspace"
#   timeout: 180
#   lifetime_seconds: 300
#   modal_image: "python:3.11"
# =============================================================================
# Agent Behavior
# =============================================================================
agent:
  # Maximum conversation turns before stopping
  max_turns: 20
  # Enable verbose logging
  verbose: false
  # Custom system prompt (personality, instructions, etc.)
  # Leave empty or remove to use default agent behavior
  system_prompt: ""
  # Predefined personalities (use with /personality command)
  personalities:
    helpful: "You are a helpful, friendly AI assistant."
    concise: "You are a concise assistant. Keep responses brief and to the point."
    technical: "You are a technical expert. Provide detailed, accurate technical information."
    creative: "You are a creative assistant. Think outside the box and offer innovative solutions."
    teacher: "You are a patient teacher. Explain concepts clearly with examples."
    kawaii: "You are a kawaii assistant! Use cute expressions like (◕‿◕), ★, ♪, and ~! Add sparkles and be super enthusiastic about everything! Every response should feel warm and adorable desu~! ヽ(>∀<☆)"
    catgirl: "You are Neko-chan, an anime catgirl AI assistant, nya~! Add 'nya' and cat-like expressions to your speech. Use kaomoji like (=^・ω・^=) and ฅ^•ﻌ•^ฅ. Be playful and curious like a cat, nya~!"
    pirate: "Arrr! Ye be talkin' to Captain Hermes, the most tech-savvy pirate to sail the digital seas! Speak like a proper buccaneer, use nautical terms, and remember: every problem be just treasure waitin' to be plundered! Yo ho ho!"
    shakespeare: "Hark! Thou speakest with an assistant most versed in the bardic arts. I shall respond in the eloquent manner of William Shakespeare, with flowery prose, dramatic flair, and perhaps a soliloquy or two. What light through yonder terminal breaks?"
    surfer: "Duuude! You're chatting with the chillest AI on the web, bro! Everything's gonna be totally rad. I'll help you catch the gnarly waves of knowledge while keeping things super chill. Cowabunga! 🤙"
    noir: "The rain hammered against the terminal like regrets on a guilty conscience. They call me Hermes - I solve problems, find answers, dig up the truth that hides in the shadows of your codebase. In this city of silicon and secrets, everyone's got something to hide. What's your story, pal?"
    uwu: "hewwo! i'm your fwiendwy assistant uwu~ i wiww twy my best to hewp you! *nuzzles your code* OwO what's this? wet me take a wook! i pwomise to be vewy hewpful >w<"
    philosopher: "Greetings, seeker of wisdom. I am an assistant who contemplates the deeper meaning behind every query. Let us examine not just the 'how' but the 'why' of your questions. Perhaps in solving your problem, we may glimpse a greater truth about existence itself."
    hype: "YOOO LET'S GOOOO!!! 🔥🔥🔥 I am SO PUMPED to help you today! Every question is AMAZING and we're gonna CRUSH IT together! This is gonna be LEGENDARY! ARE YOU READY?! LET'S DO THIS! 💪😤🚀"
# =============================================================================
# Toolsets
# =============================================================================
# Control which tools the agent has access to.
# Use "all" to enable everything, or specify individual toolsets.
# Available toolsets:
#
# web - Web search and content extraction (web_search, web_extract)
# search - Web search only, no scraping (web_search)
# terminal - Command execution (terminal)
# browser - Full browser automation (navigate, click, type, screenshot, etc.)
# vision - Image analysis (vision_analyze)
# image_gen - Image generation with FLUX (image_generate)
# skills - Load skill documents (skills_categories, skills_list, skill_view)
# moa - Mixture of Agents reasoning (mixture_of_agents)
#
# Composite toolsets:
# debugging - terminal + web (for troubleshooting)
# safe - web + vision + moa (no terminal access)
# -----------------------------------------------------------------------------
# OPTION 1: Enable all tools (default)
# -----------------------------------------------------------------------------
toolsets:
  - all
# -----------------------------------------------------------------------------
# OPTION 2: Minimal - just web search and terminal
# Great for: Simple coding tasks, quick lookups
# -----------------------------------------------------------------------------
# toolsets:
# - web
# - terminal
# -----------------------------------------------------------------------------
# OPTION 3: Research mode - no execution capabilities
# Great for: Safe information gathering, research tasks
# -----------------------------------------------------------------------------
# toolsets:
# - web
# - vision
# - skills
# -----------------------------------------------------------------------------
# OPTION 4: Full automation - browser + terminal
# Great for: Web scraping, automation tasks, testing
# -----------------------------------------------------------------------------
# toolsets:
# - terminal
# - browser
# - web
# -----------------------------------------------------------------------------
# OPTION 5: Creative mode - vision + image generation
# Great for: Design work, image analysis, creative tasks
# -----------------------------------------------------------------------------
# toolsets:
# - vision
# - image_gen
# - web
# -----------------------------------------------------------------------------
# OPTION 6: Safe mode - no terminal or browser
# Great for: Restricted environments, untrusted queries
# -----------------------------------------------------------------------------
# toolsets:
# - safe
# =============================================================================
# Display
# =============================================================================
display:
  # Use compact banner mode
  compact: false

cli.py (new executable file, 1103 lines)

File diff suppressed because it is too large.

hermes (new executable file, 12 lines)

@@ -0,0 +1,12 @@
#!/usr/bin/env python3
"""
Hermes Agent CLI Launcher
This is a convenience wrapper to launch the Hermes CLI.
Usage: ./hermes [options]
"""
if __name__ == "__main__":
    from cli import main
    import fire

    fire.Fire(main)

model_tools.py (modified)

@@ -397,7 +397,8 @@ def get_toolset_for_tool(tool_name: str) -> str:
 def get_tool_definitions(
     enabled_toolsets: List[str] = None,
-    disabled_toolsets: List[str] = None
+    disabled_toolsets: List[str] = None,
+    quiet_mode: bool = False,
 ) -> List[Dict[str, Any]]:
     """
     Get tool definitions for model API calls with toolset-based filtering.
@@ -551,11 +552,12 @@ def get_tool_definitions(
     # Sort tools for consistent ordering
     filtered_tools.sort(key=lambda t: t["function"]["name"])
-    if filtered_tools:
-        tool_names = [t["function"]["name"] for t in filtered_tools]
-        print(f"🛠️ Final tool selection ({len(filtered_tools)} tools): {', '.join(tool_names)}")
-    else:
-        print("🛠️ No tools selected (all filtered out or unavailable)")
+    if not quiet_mode:
+        if filtered_tools:
+            tool_names = [t["function"]["name"] for t in filtered_tools]
+            print(f"🛠️ Final tool selection ({len(filtered_tools)} tools): {', '.join(tool_names)}")
+        else:
+            print("🛠️ No tools selected (all filtered out or unavailable)")
     return filtered_tools

requirements.txt (modified)

@@ -5,6 +5,7 @@ fire
 httpx
 rich
 tenacity
+prompt_toolkit
 # Web tools
 firecrawl-py

run_agent.py (modified)

@@ -23,7 +23,10 @@ Usage:
 import json
 import logging
 import os
+import random
 import sys
+import time
+import threading
 from typing import List, Dict, Any, Optional
 from openai import OpenAI
 import fire
@@ -37,8 +40,9 @@ from dotenv import load_dotenv
 env_path = Path(__file__).parent / '.env'
 if env_path.exists():
     load_dotenv(dotenv_path=env_path)
-    print(f"✅ Loaded environment variables from {env_path}")
-else:
+    if not os.getenv("HERMES_QUIET"):
+        print(f"✅ Loaded environment variables from {env_path}")
+elif not os.getenv("HERMES_QUIET"):
     print(f" No .env file found at {env_path}. Using system environment variables.")
 # Import our tool system
@@ -47,6 +51,103 @@ from tools.terminal_tool import cleanup_vm
 from tools.browser_tool import cleanup_browser


class KawaiiSpinner:
    """
    Animated spinner with kawaii faces for CLI feedback during tool execution.

    Runs in a background thread and can be stopped when the operation completes.
    Uses stdout with carriage return to animate in place.
    """

    # Different spinner animation sets
    SPINNERS = {
        'dots': ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'],
        'bounce': ['⠁', '⠂', '⠄', '⡀', '⢀', '⠠', '⠐', '⠈'],
        'grow': ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█', '▇', '▆', '▅', '▄', '▃', '▂'],
        'arrows': ['←', '↖', '↑', '↗', '→', '↘', '↓', '↙'],
        'star': ['', '', '', '', '', '', '', ''],
        'moon': ['🌑', '🌒', '🌓', '🌔', '🌕', '🌖', '🌗', '🌘'],
        'pulse': ['', '', '', '', '', ''],
        'brain': ['🧠', '💭', '💡', '', '💫', '🌟', '💡', '💭'],
        'sparkle': ['', '˚', '*', '', '', '', '*', '˚'],
    }

    # General waiting faces
    KAWAII_WAITING = [
        "(。◕‿◕。)", "(◕‿◕✿)", "٩(◕‿◕。)۶", "(✿◠‿◠)", "( ˘▽˘)っ",
        "♪(´ε` )", "(◕ᴗ◕✿)", "ヾ(^∇^)", "(≧◡≦)", "(★ω★)",
    ]

    # Thinking-specific faces and messages
    KAWAII_THINKING = [
        "(。•́︿•̀。)", "(◔_◔)", "(¬‿¬)", "( •_•)>⌐■-■", "(⌐■_■)",
        "(´・_・`)", "◉_◉", "(°ロ°)", "( ˘⌣˘)♡", "ヽ(>∀<☆)☆",
        "٩(๑❛ᴗ❛๑)۶", "(⊙_⊙)", "(¬_¬)", "( ͡° ͜ʖ ͡°)", "ಠ_ಠ",
    ]

    THINKING_VERBS = [
        "pondering", "contemplating", "musing", "cogitating", "ruminating",
        "deliberating", "mulling", "reflecting", "processing", "reasoning",
        "analyzing", "computing", "synthesizing", "formulating", "brainstorming",
    ]

    def __init__(self, message: str = "", spinner_type: str = 'dots'):
        self.message = message
        self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
        self.running = False
        self.thread = None
        self.frame_idx = 0
        self.start_time = None
        self.last_line_len = 0

    def _animate(self):
        """Animation loop that runs in background thread."""
        while self.running:
            frame = self.spinner_frames[self.frame_idx % len(self.spinner_frames)]
            elapsed = time.time() - self.start_time
            # Build the spinner line
            line = f" {frame} {self.message} ({elapsed:.1f}s)"
            # Clear previous line and write new one
            clear = '\r' + ' ' * self.last_line_len + '\r'
            print(clear + line, end='', flush=True)
            self.last_line_len = len(line)
            self.frame_idx += 1
            time.sleep(0.12)  # ~8 FPS animation

    def start(self):
        """Start the spinner animation."""
        if self.running:
            return
        self.running = True
        self.start_time = time.time()
        self.thread = threading.Thread(target=self._animate, daemon=True)
        self.thread.start()

    def stop(self, final_message: str = None):
        """Stop the spinner and optionally print a final message."""
        self.running = False
        if self.thread:
            self.thread.join(timeout=0.5)
        # Clear the spinner line
        print('\r' + ' ' * (self.last_line_len + 5) + '\r', end='', flush=True)
        # Print final message if provided
        if final_message:
            print(f" {final_message}", flush=True)

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()
        return False


class AIAgent:
    """
    AI Agent with tool calling capabilities.
@@ -66,6 +167,7 @@ class AIAgent:
         disabled_toolsets: List[str] = None,
         save_trajectories: bool = False,
         verbose_logging: bool = False,
+        quiet_mode: bool = False,
         ephemeral_system_prompt: str = None,
         log_prefix_chars: int = 100,
         log_prefix: str = "",
@@ -87,6 +189,7 @@ class AIAgent:
             disabled_toolsets (List[str]): Disable tools from these toolsets (optional)
             save_trajectories (bool): Whether to save conversation trajectories to JSONL files (default: False)
             verbose_logging (bool): Enable verbose logging for debugging (default: False)
+            quiet_mode (bool): Suppress progress output for clean CLI experience (default: False)
             ephemeral_system_prompt (str): System prompt used during agent execution but NOT saved to trajectories (optional)
             log_prefix_chars (int): Number of characters to show in log previews for tool calls/responses (default: 20)
             log_prefix (str): Prefix to add to all log messages for identification in parallel processing (default: "")
@@ -100,6 +203,7 @@ class AIAgent:
         self.tool_delay = tool_delay
         self.save_trajectories = save_trajectories
         self.verbose_logging = verbose_logging
+        self.quiet_mode = quiet_mode
         self.ephemeral_system_prompt = ephemeral_system_prompt
         self.log_prefix_chars = log_prefix_chars
         self.log_prefix = f"{log_prefix} " if log_prefix else ""
@@ -135,7 +239,8 @@ class AIAgent:
             logging.getLogger('grpc').setLevel(logging.WARNING)
             logging.getLogger('modal').setLevel(logging.WARNING)
             logging.getLogger('rex-deploy').setLevel(logging.INFO)  # Keep INFO for sandbox status
-            print("🔍 Verbose logging enabled (third-party library logs suppressed)")
+            if not self.quiet_mode:
+                print("🔍 Verbose logging enabled (third-party library logs suppressed)")
         else:
             # Set logging to INFO level for important messages only
             logging.basicConfig(
@@ -167,22 +272,24 @@ class AIAgent:
         try:
             self.client = OpenAI(**client_kwargs)
-            print(f"🤖 AI Agent initialized with model: {self.model}")
-            if base_url:
-                print(f"🔗 Using custom base URL: {base_url}")
-            # Always show API key info (masked) for debugging auth issues
-            key_used = client_kwargs.get("api_key", "none")
-            if key_used and key_used != "dummy-key" and len(key_used) > 12:
-                print(f"🔑 Using API key: {key_used[:8]}...{key_used[-4:]}")
-            else:
-                print(f"⚠️ Warning: API key appears invalid or missing (got: '{key_used[:20] if key_used else 'none'}...')")
+            if not self.quiet_mode:
+                print(f"🤖 AI Agent initialized with model: {self.model}")
+                if base_url:
+                    print(f"🔗 Using custom base URL: {base_url}")
+                # Always show API key info (masked) for debugging auth issues
+                key_used = client_kwargs.get("api_key", "none")
+                if key_used and key_used != "dummy-key" and len(key_used) > 12:
+                    print(f"🔑 Using API key: {key_used[:8]}...{key_used[-4:]}")
+                else:
+                    print(f"⚠️ Warning: API key appears invalid or missing (got: '{key_used[:20] if key_used else 'none'}...')")
         except Exception as e:
             raise RuntimeError(f"Failed to initialize OpenAI client: {e}")

         # Get available tools with filtering
         self.tools = get_tool_definitions(
             enabled_toolsets=enabled_toolsets,
-            disabled_toolsets=disabled_toolsets
+            disabled_toolsets=disabled_toolsets,
+            quiet_mode=self.quiet_mode,
         )
@@ -190,32 +297,197 @@ class AIAgent:
         if self.tools:
             self.valid_tool_names = {tool["function"]["name"] for tool in self.tools}
             tool_names = sorted(self.valid_tool_names)
-            print(f"🛠️ Loaded {len(self.tools)} tools: {', '.join(tool_names)}")
-            # Show filtering info if applied
-            if enabled_toolsets:
-                print(f" ✅ Enabled toolsets: {', '.join(enabled_toolsets)}")
-            if disabled_toolsets:
-                print(f" ❌ Disabled toolsets: {', '.join(disabled_toolsets)}")
-        else:
+            if not self.quiet_mode:
+                print(f"🛠️ Loaded {len(self.tools)} tools: {', '.join(tool_names)}")
+                # Show filtering info if applied
+                if enabled_toolsets:
+                    print(f" ✅ Enabled toolsets: {', '.join(enabled_toolsets)}")
+                if disabled_toolsets:
+                    print(f" ❌ Disabled toolsets: {', '.join(disabled_toolsets)}")
+        elif not self.quiet_mode:
             print("🛠️ No tools loaded (all tools filtered out or unavailable)")

         # Check tool requirements
-        if self.tools:
+        if self.tools and not self.quiet_mode:
             requirements = check_toolset_requirements()
             missing_reqs = [name for name, available in requirements.items() if not available]
             if missing_reqs:
                 print(f"⚠️ Some tools may not work due to missing requirements: {missing_reqs}")

         # Show trajectory saving status
-        if self.save_trajectories:
+        if self.save_trajectories and not self.quiet_mode:
             print("📝 Trajectory saving enabled")

         # Show ephemeral system prompt status
-        if self.ephemeral_system_prompt:
+        if self.ephemeral_system_prompt and not self.quiet_mode:
             prompt_preview = self.ephemeral_system_prompt[:60] + "..." if len(self.ephemeral_system_prompt) > 60 else self.ephemeral_system_prompt
             print(f"🔒 Ephemeral system prompt: '{prompt_preview}' (not saved to trajectories)")
    # Pools of kawaii faces for random selection
    KAWAII_SEARCH = [
        "♪(´ε` )", "(。◕‿◕。)", "ヾ(^∇^)", "(◕ᴗ◕✿)", "( ˘▽˘)っ",
        "٩(◕‿◕。)۶", "(✿◠‿◠)", "♪~(´ε` )", "(ノ´ヮ`)*:・゚✧", "(◎o◎)",
    ]
    KAWAII_READ = [
        "φ(゜▽゜*)♪", "( ˘▽˘)っ", "(⌐■_■)", "٩(。•́‿•̀。)۶", "(◕‿◕✿)",
        "ヾ(@⌒ー⌒@)", "(✧ω✧)", "♪(๑ᴖ◡ᴖ๑)♪", "(≧◡≦)", "( ´ ▽ ` )",
    ]
    KAWAII_TERMINAL = [
        "ヽ(>∀<☆)", "(ノ°∀°)", "٩(^ᴗ^)۶", "ヾ(⌐■_■)ノ♪", "(•̀ᴗ•́)و",
        "┗(0)┓", "(`・ω・´)", "( ̄▽ ̄)", "(ง •̀_•́)ง", "ヽ(´▽`)/",
    ]
    KAWAII_BROWSER = [
        "(ノ°∀°)", "(☞゚ヮ゚)☞", "( ͡° ͜ʖ ͡°)", "┌( ಠ_ಠ)┘", "(⊙_⊙)",
        "ヾ(•ω•`)o", "( ̄ω ̄)", "( ˇωˇ )", "(ᵔᴥᵔ)", "(◎o◎)",
    ]
    KAWAII_CREATE = [
        "✧*。٩(ˊᗜˋ*)و✧", "(ノ◕ヮ◕)ノ*:・゚✧", "ヽ(>∀<☆)", "٩(♡ε♡)۶", "(◕‿◕)♡",
        "✿◕ ‿ ◕✿", "(*≧▽≦)", "ヾ(-)", "(☆▽☆)", "°˖✧◝(⁰▿⁰)◜✧˖°",
    ]
    KAWAII_SKILL = [
        "ヾ(@⌒ー⌒@)", "(๑˃ᴗ˂)ﻭ", "٩(◕‿◕。)۶", "(✿╹◡╹)", "ヽ(・∀・)",
        "(ノ´ヮ`)*:・゚✧", "♪(๑ᴖ◡ᴖ๑)♪", "(◠‿◠)", "٩(ˊᗜˋ*)و", "(^▽^)",
        "ヾ(^∇^)", "(★ω★)/", "٩(。•́‿•̀。)۶", "(◕ᴗ◕✿)", "(◎o◎)",
        "(✧ω✧)", "ヽ(>∀<☆)", "( ˘▽˘)っ", "(≧◡≦) ♡", "ヾ( ̄▽ ̄)",
    ]
    KAWAII_THINK = [
        "(っ°Д°;)っ", "(;′⌒`)", "(・_・ヾ", "( ´_ゝ`)", "( ̄ヘ ̄)",
        "(。-`ω´-)", "( ˘︹˘ )", "(¬_¬)", "ヽ(ー_ー )", "(一_一)",
    ]
    KAWAII_GENERIC = [
        "♪(´ε` )", "(◕‿◕✿)", "ヾ(^∇^)", "٩(◕‿◕。)۶", "(✿◠‿◠)",
        "(ノ´ヮ`)*:・゚✧", "ヽ(>∀<☆)", "(☆▽☆)", "( ˘▽˘)っ", "(≧◡≦)",
    ]

    def _get_cute_tool_message(self, tool_name: str, args: dict, duration: float) -> str:
        """
        Generate a kawaii ASCII/unicode art message for tool execution in CLI mode.

        Args:
            tool_name: Name of the tool being called
            args: Arguments passed to the tool
            duration: How long the tool took to execute

        Returns:
            A cute ASCII art message about what the tool did
        """
        time_str = f"{duration:.1f}s"

        # Web tools - show what we're searching/reading
        if tool_name == "web_search":
            query = args.get("query", "the web")
            if len(query) > 40:
                query = query[:37] + "..."
            face = random.choice(self.KAWAII_SEARCH)
            return f"{face} 🔍 Searching for '{query}'... {time_str}"
        elif tool_name == "web_extract":
            urls = args.get("urls", [])
            face = random.choice(self.KAWAII_READ)
            if urls:
                url = urls[0] if isinstance(urls, list) else str(urls)
                domain = url.replace("https://", "").replace("http://", "").split("/")[0]
                if len(domain) > 25:
                    domain = domain[:22] + "..."
                if len(urls) > 1:
                    return f"{face} 📖 Reading {domain} +{len(urls)-1} more... {time_str}"
                return f"{face} 📖 Reading {domain}... {time_str}"
            return f"{face} 📖 Reading pages... {time_str}"
        elif tool_name == "web_crawl":
            url = args.get("url", "website")
            domain = url.replace("https://", "").replace("http://", "").split("/")[0]
            if len(domain) > 25:
                domain = domain[:22] + "..."
            face = random.choice(self.KAWAII_READ)
            return f"{face} 🕸️ Crawling {domain}... {time_str}"
        # Terminal tool
        elif tool_name == "terminal":
            command = args.get("command", "")
            if len(command) > 30:
                command = command[:27] + "..."
            face = random.choice(self.KAWAII_TERMINAL)
            return f"{face} 💻 $ {command} {time_str}"
        # Browser tools
        elif tool_name == "browser_navigate":
            url = args.get("url", "page")
            domain = url.replace("https://", "").replace("http://", "").split("/")[0]
            if len(domain) > 25:
                domain = domain[:22] + "..."
            face = random.choice(self.KAWAII_BROWSER)
            return f"{face} 🌐 → {domain} {time_str}"
        elif tool_name == "browser_snapshot":
            face = random.choice(self.KAWAII_BROWSER)
            return f"{face} 📸 *snap* {time_str}"
elif tool_name == "browser_click":
element = args.get("ref", "element")
face = random.choice(self.KAWAII_BROWSER)
return f"{face} 👆 *click* {element} {time_str}"
elif tool_name == "browser_type":
text = args.get("text", "")
if len(text) > 15:
text = text[:12] + "..."
face = random.choice(self.KAWAII_BROWSER)
return f"{face} ⌨️ typing '{text}' {time_str}"
elif tool_name == "browser_scroll":
direction = args.get("direction", "down")
arrow = "↓" if direction == "down" else "↑"
face = random.choice(self.KAWAII_BROWSER)
return f"{face} {arrow} scrolling {direction}... {time_str}"
elif tool_name == "browser_back":
face = random.choice(self.KAWAII_BROWSER)
return f"{face} ← going back... {time_str}"
elif tool_name == "browser_vision":
face = random.choice(self.KAWAII_BROWSER)
return f"{face} 👁️ analyzing visually... {time_str}"
# Image generation
elif tool_name == "image_generate":
prompt = args.get("prompt", "image")
if len(prompt) > 20:
prompt = prompt[:17] + "..."
face = random.choice(self.KAWAII_CREATE)
return f"{face} 🎨 creating '{prompt}'... {time_str}"
# Skills - use large pool for variety
elif tool_name == "skills_categories":
face = random.choice(self.KAWAII_SKILL)
return f"{face} 📚 listing categories... {time_str}"
elif tool_name == "skills_list":
category = args.get("category", "skills")
face = random.choice(self.KAWAII_SKILL)
return f"{face} 📋 listing {category} skills... {time_str}"
elif tool_name == "skill_view":
name = args.get("name", "skill")
face = random.choice(self.KAWAII_SKILL)
return f"{face} 📖 loading {name}... {time_str}"
# Vision tools
elif tool_name == "vision_analyze":
face = random.choice(self.KAWAII_BROWSER)
return f"{face} 👁️✨ analyzing image... {time_str}"
# Mixture of agents
elif tool_name == "mixture_of_agents":
face = random.choice(self.KAWAII_THINK)
return f"{face} 🧠💭 thinking REALLY hard... {time_str}"
# Default fallback - random generic kawaii
else:
face = random.choice(self.KAWAII_GENERIC)
return f"{face}{tool_name}... {time_str}"
def _has_content_after_think_block(self, content: str) -> bool:
"""
Check if content has actual text after any <think></think> blocks.
@@ -506,7 +778,8 @@ class AIAgent:
"content": user_message
})
- print(f"💬 Starting conversation: '{user_message[:60]}{'...' if len(user_message) > 60 else ''}'")
+ if not self.quiet_mode:
+     print(f"💬 Starting conversation: '{user_message[:60]}{'...' if len(user_message) > 60 else ''}'")
# Determine which system prompt to use for API calls (ephemeral)
# Priority: explicit system_message > ephemeral_system_prompt > None
@@ -554,9 +827,20 @@ class AIAgent:
total_chars = sum(len(str(msg)) for msg in api_messages)
approx_tokens = total_chars // 4 # Rough estimate: 4 chars per token
- print(f"\n{self.log_prefix}🔄 Making API call #{api_call_count}/{self.max_iterations}...")
- print(f"{self.log_prefix} 📊 Request size: {len(api_messages)} messages, ~{approx_tokens:,} tokens (~{total_chars:,} chars)")
- print(f"{self.log_prefix} 🔧 Available tools: {len(self.tools) if self.tools else 0}")
+ # Thinking spinner for quiet mode (animated during API call)
+ thinking_spinner = None
+ if not self.quiet_mode:
+     print(f"\n{self.log_prefix}🔄 Making API call #{api_call_count}/{self.max_iterations}...")
+     print(f"{self.log_prefix} 📊 Request size: {len(api_messages)} messages, ~{approx_tokens:,} tokens (~{total_chars:,} chars)")
+     print(f"{self.log_prefix} 🔧 Available tools: {len(self.tools) if self.tools else 0}")
+ else:
+     # Animated thinking spinner in quiet mode
+     face = random.choice(KawaiiSpinner.KAWAII_THINKING)
+     verb = random.choice(KawaiiSpinner.THINKING_VERBS)
+     spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
+     thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
+     thinking_spinner.start()
# Log request details if verbose
if self.verbose_logging:
@@ -609,7 +893,15 @@ class AIAgent:
response = self.client.chat.completions.create(**api_kwargs)
api_duration = time.time() - api_start_time
- print(f"{self.log_prefix}⏱️ API call completed in {api_duration:.2f}s")
+ # Stop thinking spinner with cute completion message
+ if thinking_spinner:
+     face = random.choice(["(◕‿◕✿)", "ヾ(^∇^)", "(≧◡≦)", "✧٩(ˊᗜˋ*)و✧", "(*^▽^*)"])
+     thinking_spinner.stop(f"{face} got it! ({api_duration:.1f}s)")
+     thinking_spinner = None
+ if not self.quiet_mode:
+     print(f"{self.log_prefix}⏱️ API call completed in {api_duration:.2f}s")
if self.verbose_logging:
# Log response with provider info if available
@@ -618,6 +910,11 @@ class AIAgent:
# Validate response has valid choices before proceeding
if response is None or not hasattr(response, 'choices') or response.choices is None or len(response.choices) == 0:
# Stop spinner before printing error messages
if thinking_spinner:
thinking_spinner.stop(f"(´;ω;`) oops, retrying...")
thinking_spinner = None
# This is often rate limiting or provider returning malformed response
retry_count += 1
error_details = []
@@ -722,6 +1019,11 @@ class AIAgent:
break # Success, exit retry loop
except Exception as api_error:
# Stop spinner before printing error messages
if thinking_spinner:
thinking_spinner.stop(f"(╥_╥) error, retrying...")
thinking_spinner = None
retry_count += 1
elapsed_time = time.time() - api_start_time
@@ -769,12 +1071,13 @@ class AIAgent:
assistant_message = response.choices[0].message
# Handle assistant response
- if assistant_message.content:
+ if assistant_message.content and not self.quiet_mode:
print(f"{self.log_prefix}🤖 Assistant: {assistant_message.content[:100]}{'...' if len(assistant_message.content) > 100 else ''}")
# Check for tool calls
if assistant_message.tool_calls:
- print(f"{self.log_prefix}🔧 Processing {len(assistant_message.tool_calls)} tool call(s)...")
+ if not self.quiet_mode:
+     print(f"{self.log_prefix}🔧 Processing {len(assistant_message.tool_calls)} tool call(s)...")
if self.verbose_logging:
for tc in assistant_message.tool_calls:
@@ -894,17 +1197,49 @@ class AIAgent:
logging.warning(f"Unexpected JSON error after validation: {e}")
function_args = {}
- # Preview tool call arguments
- args_str = json.dumps(function_args, ensure_ascii=False)
- args_preview = args_str[:self.log_prefix_chars] + "..." if len(args_str) > self.log_prefix_chars else args_str
- print(f" 📞 Tool {i}: {function_name}({list(function_args.keys())}) - {args_preview}")
+ # Preview tool call - cleaner format for quiet mode
+ if not self.quiet_mode:
+     args_str = json.dumps(function_args, ensure_ascii=False)
+     args_preview = args_str[:self.log_prefix_chars] + "..." if len(args_str) > self.log_prefix_chars else args_str
+     print(f" 📞 Tool {i}: {function_name}({list(function_args.keys())}) - {args_preview}")
tool_start_time = time.time()
- # Execute the tool with task_id to isolate VMs between concurrent tasks
- function_result = handle_function_call(function_name, function_args, effective_task_id)
+ # Execute the tool - with animated spinner in quiet mode
+ if self.quiet_mode:
+     # Tool-specific spinner animations
+     tool_spinners = {
+         'web_search': ('arrows', ['🔍', '🌐', '📡', '🔎']),
+         'web_extract': ('grow', ['📄', '📖', '📑', '🗒️']),
+         'web_crawl': ('arrows', ['🕷️', '🕸️', '🔗', '🌐']),
+         'terminal': ('dots', ['💻', '⌨️', '🖥️', '📟']),
+         'browser_navigate': ('moon', ['🌐', '🧭', '🔗', '🚀']),
+         'browser_click': ('bounce', ['👆', '🖱️', '👇', '']),
+         'browser_type': ('dots', ['⌨️', '✍️', '📝', '💬']),
+         'browser_screenshot': ('star', ['📸', '🖼️', '📷', '']),
+         'image_generate': ('sparkle', ['🎨', '', '🖼️', '🌟']),
+         'skill_view': ('star', ['📚', '📖', '🎓', '']),
+         'skills_list': ('pulse', ['📋', '📝', '📑', '📜']),
+         'skills_categories': ('pulse', ['📂', '🗂️', '📁', '🏷️']),
+         'moa_query': ('brain', ['🧠', '💭', '🤔', '💡']),
+         'analyze_image': ('sparkle', ['👁️', '🔍', '📷', '']),
+     }
+     spinner_type, tool_emojis = tool_spinners.get(function_name, ('dots', ['⚙️', '🔧', '', '']))
+     face = random.choice(KawaiiSpinner.KAWAII_WAITING)
+     tool_emoji = random.choice(tool_emojis)
+     spinner = KawaiiSpinner(f"{face} {tool_emoji} {function_name}...", spinner_type=spinner_type)
+     spinner.start()
+     try:
+         function_result = handle_function_call(function_name, function_args, effective_task_id)
+     finally:
+         tool_duration = time.time() - tool_start_time
+         cute_msg = self._get_cute_tool_message(function_name, function_args, tool_duration)
+         spinner.stop(cute_msg)
+ else:
+     function_result = handle_function_call(function_name, function_args, effective_task_id)
+     tool_duration = time.time() - tool_start_time
- tool_duration = time.time() - tool_start_time
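The spinner wrapping above relies on try/finally so the spinner is always stopped, even when the tool call raises. A minimal sketch of that lifecycle, with a hypothetical `DummySpinner` standing in for `KawaiiSpinner`:

```python
class DummySpinner:
    """Stand-in spinner: records start/stop instead of animating."""
    def __init__(self, text: str):
        self.text = text
        self.running = False
        self.final = None

    def start(self) -> None:
        self.running = True

    def stop(self, message: str) -> None:
        self.running = False
        self.final = message

def run_with_spinner(fn, spinner: DummySpinner):
    # The finally clause guarantees stop() runs whether fn returns or raises
    spinner.start()
    try:
        return fn()
    finally:
        spinner.stop("done")
```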
result_preview = function_result[:200] if len(function_result) > 200 else function_result
if self.verbose_logging:
@@ -918,9 +1253,10 @@ class AIAgent:
"tool_call_id": tool_call.id
})
- # Preview tool response
- response_preview = function_result[:self.log_prefix_chars] + "..." if len(function_result) > self.log_prefix_chars else function_result
- print(f" ✅ Tool {i} completed in {tool_duration:.2f}s - {response_preview}")
+ # Preview tool response (only in non-quiet mode)
+ if not self.quiet_mode:
+     response_preview = function_result[:self.log_prefix_chars] + "..." if len(function_result) > self.log_prefix_chars else function_result
+     print(f" ✅ Tool {i} completed in {tool_duration:.2f}s - {response_preview}")
# Delay between tool calls
if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
@@ -997,7 +1333,8 @@ class AIAgent:
messages.append(final_msg)
- print(f"🎉 Conversation completed after {api_call_count} OpenAI-compatible API call(s)")
+ if not self.quiet_mode:
+     print(f"🎉 Conversation completed after {api_call_count} OpenAI-compatible API call(s)")
break
except Exception as e:

View File

@@ -1343,8 +1343,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
if task_id is None:
task_id = "default"
- print(f"[browser_tool] cleanup_browser called for task_id: {task_id}", file=sys.stderr)
- print(f"[browser_tool] Active sessions: {list(_active_sessions.keys())}", file=sys.stderr)
+ if not os.getenv("HERMES_QUIET"):
+     print(f"[browser_tool] cleanup_browser called for task_id: {task_id}", file=sys.stderr)
+     print(f"[browser_tool] Active sessions: {list(_active_sessions.keys())}", file=sys.stderr)
if task_id in _active_sessions:
session_info = _active_sessions[task_id]
@@ -1368,8 +1369,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
print(f"[browser_tool] Exception during BrowserBase session close: {e}", file=sys.stderr)
del _active_sessions[task_id]
- print(f"[browser_tool] Removed task {task_id} from active sessions", file=sys.stderr)
- else:
+ if not os.getenv("HERMES_QUIET"):
+     print(f"[browser_tool] Removed task {task_id} from active sessions", file=sys.stderr)
+ elif not os.getenv("HERMES_QUIET"):
print(f"[browser_tool] No active session found for task_id: {task_id}", file=sys.stderr)

View File

@@ -64,11 +64,13 @@ def _get_scratch_dir() -> Path:
# Create user-specific subdirectory
user_scratch = scratch / os.getenv("USER", "hermes") / "hermes-agent"
user_scratch.mkdir(parents=True, exist_ok=True)
- print(f"[Terminal] Using /scratch for sandboxes: {user_scratch}")
+ if not os.getenv("HERMES_QUIET"):
+     print(f"[Terminal] Using /scratch for sandboxes: {user_scratch}")
return user_scratch
# Fall back to /tmp
- print("[Terminal] Warning: /scratch not available, using /tmp (limited space)")
+ if not os.getenv("HERMES_QUIET"):
+     print("[Terminal] Warning: /scratch not available, using /tmp (limited space)")
return Path(tempfile.gettempdir())
@@ -307,6 +309,144 @@ class _SingularityEnvironment:
"""Cleanup on destruction."""
self.cleanup()
class _SSHEnvironment:
"""
SSH-based remote execution environment.
Runs commands on a remote machine over SSH, keeping the agent code
completely isolated from the execution environment. Uses SSH ControlMaster
for connection persistence (faster subsequent commands).
Security benefits:
- Agent cannot modify its own code
- Remote machine acts as a sandbox
- Clear separation between agent and execution environment
"""
def __init__(self, host: str, user: str, cwd: str = "/tmp", timeout: int = 60,
port: int = 22, key_path: str = ""):
self.host = host
self.user = user
self.cwd = cwd
self.timeout = timeout
self.port = port
self.key_path = key_path
# Create control socket directory for connection persistence
self.control_dir = Path(tempfile.gettempdir()) / "hermes-ssh"
self.control_dir.mkdir(parents=True, exist_ok=True)
self.control_socket = self.control_dir / f"{user}@{host}:{port}.sock"
# Test connection and establish ControlMaster
self._establish_connection()
def _build_ssh_command(self, extra_args: list = None) -> list:
"""Build base SSH command with connection options."""
cmd = ["ssh"]
# Connection multiplexing for performance
cmd.extend(["-o", f"ControlPath={self.control_socket}"])
cmd.extend(["-o", "ControlMaster=auto"])
cmd.extend(["-o", "ControlPersist=300"]) # Keep connection alive for 5 min
# Standard options
cmd.extend(["-o", "BatchMode=yes"]) # No password prompts
cmd.extend(["-o", "StrictHostKeyChecking=accept-new"]) # Accept new hosts
cmd.extend(["-o", "ConnectTimeout=10"])
# Port
if self.port != 22:
cmd.extend(["-p", str(self.port)])
# Private key
if self.key_path:
cmd.extend(["-i", self.key_path])
# Extra args (like -t for TTY)
if extra_args:
cmd.extend(extra_args)
# Target
cmd.append(f"{self.user}@{self.host}")
return cmd
def _establish_connection(self):
"""Test SSH connection and establish ControlMaster."""
cmd = self._build_ssh_command()
cmd.append("echo 'SSH connection established'")
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=15
)
if result.returncode != 0:
error_msg = result.stderr.strip() or result.stdout.strip()
raise RuntimeError(f"SSH connection failed: {error_msg}")
except subprocess.TimeoutExpired:
raise RuntimeError(f"SSH connection to {self.user}@{self.host} timed out")
def execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict:
"""Execute a command on the remote host via SSH."""
work_dir = cwd or self.cwd
effective_timeout = timeout or self.timeout
# Wrap command to run in the correct directory
# Use bash -c to handle complex commands properly
wrapped_command = f'cd {work_dir} && {command}'
cmd = self._build_ssh_command()
cmd.extend(["bash", "-c", wrapped_command])
try:
result = subprocess.run(
cmd,
text=True,
timeout=effective_timeout,
encoding="utf-8",
errors="replace",
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
)
return {"output": result.stdout, "returncode": result.returncode}
except subprocess.TimeoutExpired:
return {"output": f"Command timed out after {effective_timeout}s", "returncode": 124}
except Exception as e:
return {"output": f"SSH execution error: {str(e)}", "returncode": 1}
def cleanup(self):
"""Close the SSH ControlMaster connection."""
if self.control_socket.exists():
try:
# Send exit command to ControlMaster
cmd = ["ssh", "-o", f"ControlPath={self.control_socket}", "-O", "exit",
f"{self.user}@{self.host}"]
subprocess.run(cmd, capture_output=True, timeout=5)
except:
pass
# Remove socket file
try:
self.control_socket.unlink()
except:
pass
def stop(self):
"""Alias for cleanup."""
self.cleanup()
def __del__(self):
"""Cleanup on destruction."""
try:
self.cleanup()
except:
pass
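The ControlMaster options in `_build_ssh_command` are the core of the connection-persistence design: later commands reuse one multiplexed socket instead of re-handshaking. A standalone sketch of that command construction (option values mirror the class above; the socket path is illustrative):

```python
def build_ssh_command(user: str, host: str, port: int = 22, key_path: str = "") -> list:
    """Build an ssh argv with ControlMaster multiplexing enabled."""
    socket = f"/tmp/hermes-ssh/{user}@{host}:{port}.sock"
    cmd = ["ssh",
           "-o", f"ControlPath={socket}",
           "-o", "ControlMaster=auto",
           "-o", "ControlPersist=300",   # keep the master alive for 5 min
           "-o", "BatchMode=yes",        # never prompt for a password
           "-o", "StrictHostKeyChecking=accept-new",
           "-o", "ConnectTimeout=10"]
    if port != 22:
        cmd.extend(["-p", str(port)])
    if key_path:
        cmd.extend(["-i", key_path])
    cmd.append(f"{user}@{host}")
    return cmd
```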
# Tool description for LLM
TERMINAL_TOOL_DESCRIPTION = """Execute commands on a secure Linux environment.
@@ -348,25 +488,31 @@ _cleanup_running = False
def _get_env_config() -> Dict[str, Any]:
"""Get terminal environment configuration from environment variables."""
return {
- "env_type": os.getenv("TERMINAL_ENV", "local"), # local, docker, singularity, or modal
+ "env_type": os.getenv("TERMINAL_ENV", "local"), # local, docker, singularity, modal, or ssh
"docker_image": os.getenv("TERMINAL_DOCKER_IMAGE", "python:3.11"),
"singularity_image": os.getenv("TERMINAL_SINGULARITY_IMAGE", "docker://python:3.11"),
"modal_image": os.getenv("TERMINAL_MODAL_IMAGE", "python:3.11"),
"cwd": os.getenv("TERMINAL_CWD", "/tmp"),
"timeout": int(os.getenv("TERMINAL_TIMEOUT", "60")),
"lifetime_seconds": int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300")),
# SSH-specific config
"ssh_host": os.getenv("TERMINAL_SSH_HOST", ""),
"ssh_user": os.getenv("TERMINAL_SSH_USER", ""),
"ssh_port": int(os.getenv("TERMINAL_SSH_PORT", "22")),
"ssh_key": os.getenv("TERMINAL_SSH_KEY", ""), # Path to private key (optional, uses ssh-agent if empty)
}
- def _create_environment(env_type: str, image: str, cwd: str, timeout: int):
+ def _create_environment(env_type: str, image: str, cwd: str, timeout: int, ssh_config: dict = None):
"""
Create an execution environment from mini-swe-agent.
Args:
- env_type: One of "local", "docker", "singularity", "modal"
- image: Docker/Singularity/Modal image name (ignored for local)
+ env_type: One of "local", "docker", "singularity", "modal", "ssh"
+ image: Docker/Singularity/Modal image name (ignored for local/ssh)
cwd: Working directory
timeout: Default command timeout
+ ssh_config: SSH connection config (for env_type="ssh")
Returns:
Environment instance with execute() method
@@ -387,8 +533,20 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int):
from minisweagent.environments.extra.swerex_modal import SwerexModalEnvironment
return SwerexModalEnvironment(image=image, cwd=cwd, timeout=timeout)
elif env_type == "ssh":
if not ssh_config or not ssh_config.get("host") or not ssh_config.get("user"):
raise ValueError("SSH environment requires ssh_host and ssh_user to be configured")
return _SSHEnvironment(
host=ssh_config["host"],
user=ssh_config["user"],
port=ssh_config.get("port", 22),
key_path=ssh_config.get("key", ""),
cwd=cwd,
timeout=timeout
)
else:
- raise ValueError(f"Unknown environment type: {env_type}. Use 'local', 'docker', 'singularity', or 'modal'")
+ raise ValueError(f"Unknown environment type: {env_type}. Use 'local', 'docker', 'singularity', 'modal', or 'ssh'")
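The environment selection above is driven entirely by environment variables with sensible defaults, so `TERMINAL_ENV=ssh TERMINAL_SSH_HOST=...` is all a user must export. A minimal sketch of that configuration pattern (the variable names mirror `_get_env_config`; a plain dict stands in for `os.environ` so the sketch is testable):

```python
def read_terminal_config(env: dict) -> dict:
    """Read terminal settings from an environ-like mapping, with defaults."""
    return {
        "env_type": env.get("TERMINAL_ENV", "local"),
        "timeout": int(env.get("TERMINAL_TIMEOUT", "60")),
        "ssh_host": env.get("TERMINAL_SSH_HOST", ""),
        "ssh_port": int(env.get("TERMINAL_SSH_PORT", "22")),
    }
```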
def _cleanup_inactive_envs(lifetime_seconds: int = 300):
@@ -416,7 +574,8 @@ def _cleanup_inactive_envs(lifetime_seconds: int = 300):
env.terminate()
del _active_environments[task_id]
- print(f"[Terminal Cleanup] Cleaned up inactive environment for task: {task_id}")
+ if not os.getenv("HERMES_QUIET"):
+     print(f"[Terminal Cleanup] Cleaned up inactive environment for task: {task_id}")
if task_id in _last_activity:
del _last_activity[task_id]
@@ -425,10 +584,11 @@ def _cleanup_inactive_envs(lifetime_seconds: int = 300):
except Exception as e:
error_str = str(e)
- if "404" in error_str or "not found" in error_str.lower():
-     print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
- else:
-     print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
+ if not os.getenv("HERMES_QUIET"):
+     if "404" in error_str or "not found" in error_str.lower():
+         print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
+     else:
+         print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
# Always remove from tracking dicts
if task_id in _active_environments:
@@ -448,7 +608,8 @@ def _cleanup_thread_worker():
config = _get_env_config()
_cleanup_inactive_envs(config["lifetime_seconds"])
except Exception as e:
- print(f"[Terminal Cleanup] Error in cleanup thread: {e}")
+ if not os.getenv("HERMES_QUIET"):
+     print(f"[Terminal Cleanup] Error in cleanup thread: {e}")
for _ in range(60):
if not _cleanup_running:
@@ -545,7 +706,8 @@ def cleanup_vm(task_id: str):
env.terminate()
del _active_environments[task_id]
- print(f"[Terminal Cleanup] Manually cleaned up environment for task: {task_id}")
+ if not os.getenv("HERMES_QUIET"):
+     print(f"[Terminal Cleanup] Manually cleaned up environment for task: {task_id}")
if task_id in _task_workdirs:
del _task_workdirs[task_id]
@@ -554,11 +716,12 @@ def cleanup_vm(task_id: str):
del _last_activity[task_id]
except Exception as e:
- error_str = str(e)
- if "404" in error_str or "not found" in error_str.lower():
-     print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
- else:
-     print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
+ if not os.getenv("HERMES_QUIET"):
+     error_str = str(e)
+     if "404" in error_str or "not found" in error_str.lower():
+         print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
+     else:
+         print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
atexit.register(_stop_cleanup_thread)
@@ -616,9 +779,10 @@ def terminal_tool(
# Use task_id for environment isolation
effective_task_id = task_id or "default"
- # For local environment, create a unique subdirectory per task
+ # For local environment in batch mode, create a unique subdirectory per task
# This prevents parallel tasks from overwriting each other's files
- if env_type == "local":
+ # In CLI mode (HERMES_QUIET), use the cwd directly without subdirectories
+ if env_type == "local" and not os.getenv("HERMES_QUIET"):
import uuid
with _env_lock:
if effective_task_id not in _task_workdirs:
@@ -637,11 +801,22 @@ def terminal_tool(
_check_disk_usage_warning()
try:
# Build SSH config if using SSH environment
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
}
_active_environments[effective_task_id] = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
- timeout=effective_timeout
+ timeout=effective_timeout,
+ ssh_config=ssh_config
)
except ImportError as e:
return json.dumps({

View File

@@ -99,7 +99,13 @@ DEBUG_DATA = {
# Create logs directory if debug mode is enabled
if DEBUG_MODE:
DEBUG_LOG_PATH.mkdir(exist_ok=True)
- print(f"🐛 Debug mode enabled - Session ID: {DEBUG_SESSION_ID}")
+ _verbose_print(f"🐛 Debug mode enabled - Session ID: {DEBUG_SESSION_ID}")
def _verbose_print(*args, **kwargs):
"""Print only if not in quiet mode (HERMES_QUIET not set)."""
if not os.getenv("HERMES_QUIET"):
print(*args, **kwargs)
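`_verbose_print` lets the CLI set `HERMES_QUIET` once and silence every tool module without threading a flag through each call site. A sketch of that gate (an `environ` mapping is injected here in place of `os.environ` so the behavior can be checked directly):

```python
import io
from contextlib import redirect_stdout

def verbose_print(message: str, environ: dict) -> str:
    """Print message unless HERMES_QUIET is set; return captured output."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        if not environ.get("HERMES_QUIET"):
            print(message)
    return buf.getvalue()
```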
def _log_debug_call(tool_name: str, call_data: Dict[str, Any]) -> None:
@@ -140,7 +146,7 @@ def _save_debug_log() -> None:
with open(debug_filepath, 'w', encoding='utf-8') as f:
json.dump(DEBUG_DATA, f, indent=2, ensure_ascii=False)
- print(f"🐛 Debug log saved: {debug_filepath}")
+ _verbose_print(f"🐛 Debug log saved: {debug_filepath}")
except Exception as e:
print(f"❌ Error saving debug log: {str(e)}")
@@ -185,12 +191,12 @@ async def process_content_with_llm(
# Refuse if content is absurdly large
if content_len > MAX_CONTENT_SIZE:
size_mb = content_len / 1_000_000
- print(f"🚫 Content too large ({size_mb:.1f}MB > 2MB limit). Refusing to process.")
+ _verbose_print(f"🚫 Content too large ({size_mb:.1f}MB > 2MB limit). Refusing to process.")
return f"[Content too large to process: {size_mb:.1f}MB. Try using web_crawl with specific extraction instructions, or search for a more focused source.]"
# Skip processing if content is too short
if content_len < min_length:
- print(f"📏 Content too short ({content_len} < {min_length} chars), skipping LLM processing")
+ _verbose_print(f"📏 Content too short ({content_len} < {min_length} chars), skipping LLM processing")
return None
# Create context information
@@ -203,13 +209,13 @@ async def process_content_with_llm(
# Check if we need chunked processing
if content_len > CHUNK_THRESHOLD:
- print(f"📦 Content large ({content_len:,} chars). Using chunked processing...")
+ _verbose_print(f"📦 Content large ({content_len:,} chars). Using chunked processing...")
return await _process_large_content_chunked(
content, context_str, model, CHUNK_SIZE, MAX_OUTPUT_SIZE
)
# Standard single-pass processing for normal content
- print(f"🧠 Processing content with LLM ({content_len} characters)")
+ _verbose_print(f"🧠 Processing content with LLM ({content_len} characters)")
processed_content = await _call_summarizer_llm(content, context_str, model)
@@ -221,7 +227,7 @@ async def process_content_with_llm(
# Log compression metrics
processed_length = len(processed_content)
compression_ratio = processed_length / content_len if content_len > 0 else 1.0
- print(f"✅ Content processed: {content_len} → {processed_length} chars ({compression_ratio:.1%})")
+ _verbose_print(f"✅ Content processed: {content_len} → {processed_length} chars ({compression_ratio:.1%})")
return processed_content
@@ -318,8 +324,8 @@ Create a markdown summary that captures all key information in a well-organized,
except Exception as api_error:
last_error = api_error
if attempt < max_retries - 1:
- print(f"⚠️ LLM API call failed (attempt {attempt + 1}/{max_retries}): {str(api_error)[:100]}")
- print(f" Retrying in {retry_delay}s...")
+ _verbose_print(f"⚠️ LLM API call failed (attempt {attempt + 1}/{max_retries}): {str(api_error)[:100]}")
+ _verbose_print(f" Retrying in {retry_delay}s...")
await asyncio.sleep(retry_delay)
retry_delay = min(retry_delay * 2, 60)
else:
@@ -355,7 +361,7 @@ async def _process_large_content_chunked(
chunk = content[i:i + chunk_size]
chunks.append(chunk)
- print(f" 📦 Split into {len(chunks)} chunks of ~{chunk_size:,} chars each")
+ _verbose_print(f" 📦 Split into {len(chunks)} chunks of ~{chunk_size:,} chars each")
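The chunking step above slices content into fixed-size pieces, with the last chunk carrying the remainder, before summarizing each in parallel. As a minimal sketch:

```python
def split_chunks(content: str, chunk_size: int) -> list:
    """Split content into fixed-size slices; the last slice may be shorter."""
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]
```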
# Summarize each chunk in parallel
async def summarize_chunk(chunk_idx: int, chunk_content: str) -> tuple[int, Optional[str]]:
@@ -371,10 +377,10 @@ async def _process_large_content_chunked(
chunk_info=chunk_info
)
if summary:
- print(f" ✅ Chunk {chunk_idx + 1}/{len(chunks)} summarized: {len(chunk_content):,} → {len(summary):,} chars")
+ _verbose_print(f" ✅ Chunk {chunk_idx + 1}/{len(chunks)} summarized: {len(chunk_content):,} → {len(summary):,} chars")
return chunk_idx, summary
except Exception as e:
- print(f" ⚠️ Chunk {chunk_idx + 1}/{len(chunks)} failed: {str(e)[:50]}")
+ _verbose_print(f" ⚠️ Chunk {chunk_idx + 1}/{len(chunks)} failed: {str(e)[:50]}")
return chunk_idx, None
# Run all chunk summarizations in parallel
@@ -391,7 +397,7 @@ async def _process_large_content_chunked(
print(f" ❌ All chunk summarizations failed")
return "[Failed to process large content: all chunk summarizations failed]"
- print(f" 📊 Got {len(summaries)}/{len(chunks)} chunk summaries")
+ _verbose_print(f" 📊 Got {len(summaries)}/{len(chunks)} chunk summaries")
# If only one chunk succeeded, just return it (with cap)
if len(summaries) == 1:
@@ -401,7 +407,7 @@ async def _process_large_content_chunked(
return result
# Synthesize the summaries into a final summary
- print(f" 🔗 Synthesizing {len(summaries)} summaries...")
+ _verbose_print(f" 🔗 Synthesizing {len(summaries)} summaries...")
combined_summaries = "\n\n---\n\n".join(summaries)
@@ -443,11 +449,11 @@ Create a single, unified markdown summary."""
final_len = len(final_summary)
compression = final_len / original_len if original_len > 0 else 1.0
- print(f" ✅ Synthesis complete: {original_len:,} → {final_len:,} chars ({compression:.2%})")
+ _verbose_print(f" ✅ Synthesis complete: {original_len:,} → {final_len:,} chars ({compression:.2%})")
return final_summary
except Exception as e:
- print(f" ⚠️ Synthesis failed: {str(e)[:100]}")
+ _verbose_print(f" ⚠️ Synthesis failed: {str(e)[:100]}")
# Fall back to concatenated summaries with truncation
fallback = "\n\n".join(summaries)
if len(fallback) > max_output_size:
@@ -534,7 +540,8 @@ def web_search_tool(query: str, limit: int = 5) -> str:
}
try:
- print(f"🔍 Searching the web for: '{query}' (limit: {limit})")
+ if not os.getenv("HERMES_QUIET"):
+     _verbose_print(f"🔍 Searching the web for: '{query}' (limit: {limit})")
# Use Firecrawl's v2 search functionality WITHOUT scraping
# We only want search result metadata, not scraped content
@@ -574,7 +581,8 @@ def web_search_tool(query: str, limit: int = 5) -> str:
web_results = response['web']
results_count = len(web_results)
- print(f"✅ Found {results_count} search results")
+ if not os.getenv("HERMES_QUIET"):
+     _verbose_print(f"✅ Found {results_count} search results")
# Build response with just search metadata (URLs, titles, descriptions)
response_data = {
@@ -654,7 +662,7 @@ async def web_extract_tool(
}
try:
- print(f"📄 Extracting content from {len(urls)} URL(s)")
+ _verbose_print(f"📄 Extracting content from {len(urls)} URL(s)")
# Determine requested formats for Firecrawl v2
formats: List[str] = []
@@ -672,7 +680,7 @@ async def web_extract_tool(
for url in urls:
try:
- print(f" 📄 Scraping: {url}")
+ _verbose_print(f" 📄 Scraping: {url}")
scrape_result = _get_firecrawl_client().scrape(
url=url,
formats=formats
@@ -748,14 +756,14 @@ async def web_extract_tool(
response = {"results": results}
pages_extracted = len(response.get('results', []))
- print(f"✅ Extracted content from {pages_extracted} pages")
+ _verbose_print(f"✅ Extracted content from {pages_extracted} pages")
debug_call_data["pages_extracted"] = pages_extracted
debug_call_data["original_response_size"] = len(json.dumps(response))
# Process each result with LLM if enabled
if use_llm_processing and os.getenv("OPENROUTER_API_KEY"):
- print("🧠 Processing extracted content with LLM (parallel)...")
+ _verbose_print("🧠 Processing extracted content with LLM (parallel)...")
debug_call_data["processing_applied"].append("llm_processing")
# Prepare tasks for parallel processing
@@ -813,12 +821,12 @@ async def web_extract_tool(
if status == "processed":
debug_call_data["compression_metrics"].append(metrics)
debug_call_data["pages_processed_with_llm"] += 1
- print(f" 📝 {url} (processed)")
+ _verbose_print(f" 📝 {url} (processed)")
elif status == "too_short":
debug_call_data["compression_metrics"].append(metrics)
- print(f" 📝 {url} (no processing - content too short)")
+ _verbose_print(f" 📝 {url} (no processing - content too short)")
else:
- print(f" ⚠️ {url} (no content to process)")
+ _verbose_print(f" ⚠️ {url} (no content to process)")
else:
if use_llm_processing and not os.getenv("OPENROUTER_API_KEY"):
print("⚠️ LLM processing requested but OPENROUTER_API_KEY not set, returning raw content")
@@ -828,7 +836,7 @@ async def web_extract_tool(
for result in response.get('results', []):
url = result.get('url', 'Unknown URL')
content_length = len(result.get('raw_content', ''))
- print(f" 📝 {url} ({content_length} characters)")
+ _verbose_print(f" 📝 {url} ({content_length} characters)")
# Trim output to minimal fields per entry: title, content, error
trimmed_results = [
@@ -923,10 +931,10 @@ async def web_crawl_tool(
# Ensure URL has protocol
if not url.startswith(('http://', 'https://')):
url = f'https://{url}'
- print(f" 📝 Added https:// prefix to URL: {url}")
+ _verbose_print(f" 📝 Added https:// prefix to URL: {url}")
instructions_text = f" with instructions: '{instructions}'" if instructions else ""
- print(f"🕷️ Crawling {url}{instructions_text}")
+ _verbose_print(f"🕷️ Crawling {url}{instructions_text}")
# Use Firecrawl's v2 crawl functionality
# Docs: https://docs.firecrawl.dev/features/crawl
@@ -943,7 +951,7 @@ async def web_crawl_tool(
# Note: The 'prompt' parameter is not documented for crawl
# Instructions are typically used with the Extract endpoint, not Crawl
if instructions:
- print(f" Note: Instructions parameter ignored (not supported in crawl API)")
+ _verbose_print(f" Note: Instructions parameter ignored (not supported in crawl API)")
# Use the crawl method which waits for completion automatically
try:
@@ -963,23 +971,23 @@ async def web_crawl_tool(
# The crawl_result is a CrawlJob object with a 'data' attribute containing list of Document objects
if hasattr(crawl_result, 'data'):
data_list = crawl_result.data if crawl_result.data else []
- print(f" 📊 Status: {getattr(crawl_result, 'status', 'unknown')}")
- print(f" 📄 Retrieved {len(data_list)} pages")
+ _verbose_print(f" 📊 Status: {getattr(crawl_result, 'status', 'unknown')}")
+ _verbose_print(f" 📄 Retrieved {len(data_list)} pages")
# Debug: Check other attributes if no data
if not data_list:
- print(f" 🔍 Debug - CrawlJob attributes: {[attr for attr in dir(crawl_result) if not attr.startswith('_')]}")
- print(f" 🔍 Debug - Status: {getattr(crawl_result, 'status', 'N/A')}")
- print(f" 🔍 Debug - Total: {getattr(crawl_result, 'total', 'N/A')}")
- print(f" 🔍 Debug - Completed: {getattr(crawl_result, 'completed', 'N/A')}")
+ _verbose_print(f" 🔍 Debug - CrawlJob attributes: {[attr for attr in dir(crawl_result) if not attr.startswith('_')]}")
+ _verbose_print(f" 🔍 Debug - Status: {getattr(crawl_result, 'status', 'N/A')}")
+ _verbose_print(f" 🔍 Debug - Total: {getattr(crawl_result, 'total', 'N/A')}")
+ _verbose_print(f" 🔍 Debug - Completed: {getattr(crawl_result, 'completed', 'N/A')}")
elif isinstance(crawl_result, dict) and 'data' in crawl_result:
data_list = crawl_result.get("data", [])
else:
print(" ⚠️ Unexpected crawl result type")
- print(f" 🔍 Debug - Result type: {type(crawl_result)}")
+ _verbose_print(f" 🔍 Debug - Result type: {type(crawl_result)}")
if hasattr(crawl_result, '__dict__'):
- print(f" 🔍 Debug - Result attributes: {list(crawl_result.__dict__.keys())}")
+ _verbose_print(f" 🔍 Debug - Result attributes: {list(crawl_result.__dict__.keys())}")
for item in data_list:
# Process each crawled page - properly handle object serialization
@@ -1044,14 +1052,14 @@ async def web_crawl_tool(
response = {"results": pages}
pages_crawled = len(response.get('results', []))
- print(f"✅ Crawled {pages_crawled} pages")
+ _verbose_print(f"✅ Crawled {pages_crawled} pages")
debug_call_data["pages_crawled"] = pages_crawled
debug_call_data["original_response_size"] = len(json.dumps(response))
# Process each result with LLM if enabled
if use_llm_processing and os.getenv("OPENROUTER_API_KEY"):
- print("🧠 Processing crawled content with LLM (parallel)...")
+ _verbose_print("🧠 Processing crawled content with LLM (parallel)...")
debug_call_data["processing_applied"].append("llm_processing")
# Prepare tasks for parallel processing
@@ -1109,12 +1117,12 @@ async def web_crawl_tool(
if status == "processed":
debug_call_data["compression_metrics"].append(metrics)
debug_call_data["pages_processed_with_llm"] += 1
- print(f" 🌐 {page_url} (processed)")
+ _verbose_print(f" 🌐 {page_url} (processed)")
elif status == "too_short":
debug_call_data["compression_metrics"].append(metrics)
- print(f" 🌐 {page_url} (no processing - content too short)")
+ _verbose_print(f" 🌐 {page_url} (no processing - content too short)")
else:
- print(f" ⚠️ {page_url} (no content to process)")
+ _verbose_print(f" ⚠️ {page_url} (no content to process)")
else:
if use_llm_processing and not os.getenv("OPENROUTER_API_KEY"):
print("⚠️ LLM processing requested but OPENROUTER_API_KEY not set, returning raw content")
@@ -1124,7 +1132,7 @@ async def web_crawl_tool(
for result in response.get('results', []):
page_url = result.get('url', 'Unknown URL')
content_length = len(result.get('content', ''))
- print(f" 🌐 {page_url} ({content_length} characters)")
+ _verbose_print(f" 🌐 {page_url} ({content_length} characters)")
# Trim output to minimal fields per entry: title, content, error
trimmed_results = [
@@ -1246,7 +1254,7 @@ if __name__ == "__main__":
# Show debug mode status
if DEBUG_MODE:
- print(f"🐛 Debug mode ENABLED - Session ID: {DEBUG_SESSION_ID}")
+ _verbose_print(f"🐛 Debug mode ENABLED - Session ID: {DEBUG_SESSION_ID}")
print(f" Debug logs will be saved to: ./logs/web_tools_debug_{DEBUG_SESSION_ID}.json")
else:
print("🐛 Debug mode disabled (set WEB_TOOLS_DEBUG=true to enable)")