
Hermes Agent - Future Improvements

Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.


🚨 HIGH PRIORITY - Immediate Fixes

These items need to be addressed ASAP:

1. SUDO Breaking Terminal Tool 🔐 COMPLETE

  • Problem: SUDO commands break the terminal tool execution (hangs indefinitely)
  • Fix: Created custom environment wrappers in tools/terminal_tool.py
    • stdin=subprocess.DEVNULL prevents hanging on interactive prompts
    • Sudo fails gracefully with clear error if no password configured
    • Same UX as Claude Code - agent sees error, tells user to run it themselves
  • All 5 environments now have consistent behavior:
    • _LocalEnvironment - local execution
    • _DockerEnvironment - Docker containers
    • _SingularityEnvironment - Singularity/Apptainer containers
    • _ModalEnvironment - Modal cloud sandboxes
    • _SSHEnvironment - remote SSH execution
  • Optional sudo support via SUDO_PASSWORD env var:
    • Shared _transform_sudo_command() helper used by all environments
    • If set, auto-transforms sudo cmd → pipes password via sudo -S
    • Documented in .env.example, cli-config.yaml, and README
    • Works for chained commands: cmd1 && sudo cmd2
  • Interactive sudo prompt in CLI mode:
    • When sudo detected and no password configured, prompts user
    • 45-second timeout (auto-skips if no input)
    • Hidden password input via getpass (password not visible)
    • Password cached for session (don't ask repeatedly)
    • Spinner pauses during prompt for clean UX
    • Uses HERMES_INTERACTIVE env var to detect CLI mode

2. Fix browser_get_images Tool 🖼️ VERIFIED WORKING

  • Tested: Tool works correctly on multiple sites
  • Results: Successfully extracts image URLs, alt text, dimensions
  • Note: Some sites (Pixabay, etc.) have Cloudflare bot protection that blocks headless browsers - this is expected behavior, not a bug

3. Better Action Logging for Debugging 📝 COMPLETE

  • Problem: Need better logging of agent actions for debugging
  • Implementation:
    • Save full session trajectories to logs/ directory as JSON
    • Each session gets a unique file: session_YYYYMMDD_HHMMSS_UUID.json
    • Logs all messages, tool calls with inputs/outputs, timestamps
    • Structured JSON format for easy parsing and replay
    • Automatic on CLI runs (configurable)
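
The logging scheme above can be sketched roughly as follows (function names are illustrative, not the actual API):

```python
import json
import uuid
from datetime import datetime
from pathlib import Path

def new_session_path(log_dir: str = "logs") -> Path:
    """Build the per-session filename: session_YYYYMMDD_HHMMSS_UUID.json."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(log_dir) / f"session_{stamp}_{uuid.uuid4().hex[:8]}.json"

def save_trajectory(path: Path, messages: list, tool_calls: list) -> None:
    """Dump the full session as structured JSON for easy parsing and replay."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "saved_at": datetime.now().isoformat(),
        "messages": messages,        # full conversation turns
        "tool_calls": tool_calls,    # tool inputs/outputs with timestamps
    }
    path.write_text(json.dumps(record, indent=2, default=str))
```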

4. Automatic Context Compression 🗜️ COMPLETE

  • Problem: Long conversations exceed model context limits, causing errors
  • Solution: Auto-compress middle turns when approaching limit
  • Implementation:
    • Fetches model context lengths from OpenRouter /api/v1/models API (cached 1hr)
    • Tracks actual token usage from API responses (usage.prompt_tokens)
    • Triggers at 85% of model's context limit (configurable)
    • Protects first 3 turns (system, initial request, first response)
    • Protects last 4 turns (recent context most relevant)
    • Summarizes middle turns using fast model (Gemini Flash)
    • Inserts summary as user message, conversation continues seamlessly
    • If context error occurs, attempts compression before failing
  • Configuration (cli-config.yaml / env vars):
    • CONTEXT_COMPRESSION_ENABLED (default: true)
    • CONTEXT_COMPRESSION_THRESHOLD (default: 0.85 = 85%)
    • CONTEXT_COMPRESSION_MODEL (default: google/gemini-2.0-flash-001)
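
The trigger and turn-protection logic amounts to something like this sketch (helper names are hypothetical; the real implementation lives around run_agent.py):

```python
def should_compress(prompt_tokens: int, context_limit: int,
                    threshold: float = 0.85) -> bool:
    """Trigger compression once actual usage crosses the threshold."""
    return prompt_tokens >= int(context_limit * threshold)

def split_for_compression(messages: list, protect_head: int = 3,
                          protect_tail: int = 4):
    """Protect the first 3 and last 4 turns; everything in between is
    summarized by the fast model and replaced with one user message."""
    if len(messages) <= protect_head + protect_tail:
        return messages, [], []   # nothing safe to compress
    head = messages[:protect_head]
    middle = messages[protect_head:len(messages) - protect_tail]
    tail = messages[len(messages) - protect_tail:]
    return head, middle, tail
```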

5. Stream Thinking Summaries in Real-Time 💭 ⏸️ DEFERRED

  • Problem: Thinking/reasoning summaries not shown while streaming
  • Complexity: This is a significant refactor - leaving for later

OpenRouter Streaming Info:

  • Uses stream=True with OpenAI SDK
  • Reasoning comes in choices[].delta.reasoning_details chunks
  • Types: reasoning.summary, reasoning.text, reasoning.encrypted
  • Tool call arguments stream as partial JSON (need accumulation)
  • Items paradigm: same ID emitted multiple times with updated content

Key Challenges:

  • Tool call JSON accumulation (arguments arrive in partial chunks, e.g. `{"query": "wea` then `ther"}`, which must be accumulated into `{"query": "weather"}` before parsing)
  • Multiple concurrent outputs (thinking + tool calls + text simultaneously)
  • State management for partial responses
  • Error handling if connection drops mid-stream
  • Deciding when tool calls are "complete" enough to execute
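
The accumulation and "complete enough to execute" questions could be handled together with a buffer keyed by tool-call id, treating a call as complete once its JSON finally parses (class name is a hypothetical sketch):

```python
import json

class ToolCallAccumulator:
    """Accumulate partial tool-call argument chunks by call id, and decide
    when a call is complete enough to execute (its JSON finally parses)."""
    def __init__(self):
        self.buffers: dict[str, str] = {}

    def feed(self, call_id: str, fragment: str) -> None:
        self.buffers[call_id] = self.buffers.get(call_id, "") + fragment

    def try_finish(self, call_id: str):
        """Return parsed arguments once the buffer is valid JSON, else None."""
        try:
            return json.loads(self.buffers.get(call_id, ""))
        except json.JSONDecodeError:
            return None
```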

UX Questions to Resolve:

  • Show raw thinking text or summarized?
  • Live expanding text vs. spinner replacement?
  • Markdown rendering while streaming?
  • How to handle thinking + tool call display simultaneously?

Implementation Options:

  • New run_conversation_streaming() method (keep non-streaming as fallback)
  • Wrapper that handles streaming internally
  • Big refactor of existing run_conversation()


1. Subagent Architecture (Context Isolation) 🎯

Problem: Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single ls -la can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.

Solution: The main agent becomes an orchestrator that delegates context-heavy tasks to subagents.

Architecture:

┌─────────────────────────────────────────────────────────────────┐
│  ORCHESTRATOR (main agent)                                      │
│  - Receives user request                                        │
│  - Plans approach                                               │
│  - Delegates heavy tasks to subagents                           │
│  - Receives summarized results                                  │
│  - Maintains clean, focused context                             │
└─────────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ TERMINAL AGENT  │  │ BROWSER AGENT   │  │ CODE AGENT      │
│ - terminal tool │  │ - browser tools │  │ - file tools    │
│ - file tools    │  │ - web_search    │  │ - terminal      │
│                 │  │ - web_extract   │  │                 │
│ Isolated context│  │ Isolated context│  │ Isolated context│
│ Returns summary │  │ Returns summary │  │ Returns summary │
└─────────────────┘  └─────────────────┘  └─────────────────┘

How it works:

  1. User asks: "Set up a new Python project with FastAPI and tests"
  2. Orchestrator plans: "I need to create files, install deps, write code"
  3. Orchestrator calls: terminal_task(goal="Create venv, install fastapi pytest", context="New project in ~/myapp")
  4. Subagent spawns with fresh context, only terminal/file tools
  5. Subagent iterates (may take 10+ tool calls, lots of output)
  6. Subagent completes → returns summary: "Created venv, installed fastapi==0.109.0, pytest==8.0.0"
  7. Orchestrator receives only the summary, context stays clean
  8. Orchestrator continues with next subtask

Key tools to implement:

  • terminal_task(goal, context, cwd?) - Delegate terminal/shell work
  • browser_task(goal, context, start_url?) - Delegate web research/automation
  • code_task(goal, context, files?) - Delegate code writing/modification
  • Generic delegate_task(goal, context, toolsets=[]) - Flexible delegation

Implementation details:

  • Subagent uses same run_agent.py but with:
    • Fresh/empty conversation history
    • Limited toolset (only what's needed)
    • Smaller max_iterations (focused task)
    • Task-specific system prompt
  • Subagent returns structured result:
    {
      "success": True,
      "summary": "Installed 3 packages, created 2 files",
      "details": "Optional longer explanation if needed",
      "artifacts": ["~/myapp/requirements.txt", "~/myapp/main.py"],  # Files created
      "errors": []  # Any issues encountered
    }
    
  • Orchestrator sees only the summary in its context
  • Full subagent transcript saved separately for debugging
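
The structured result above maps naturally onto a small dataclass; only the summary line ever enters the orchestrator's context (this is a sketch, not the decided interface):

```python
from dataclasses import dataclass, field

@dataclass
class SubagentResult:
    """Structured result a subagent hands back to the orchestrator."""
    success: bool
    summary: str
    details: str = ""
    artifacts: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    def to_context_message(self) -> str:
        """The one line the orchestrator actually sees in its context."""
        status = "done" if self.success else "FAILED"
        note = f" (errors: {'; '.join(self.errors)})" if self.errors else ""
        return f"[subagent {status}] {self.summary}{note}"
```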

Benefits:

  • 🧹 Clean context - Orchestrator stays focused, doesn't drown in tool output
  • 📊 Better token efficiency - 50 terminal outputs → 1 summary paragraph
  • 🎯 Focused subagents - Each agent has just the tools it needs
  • 🔄 Parallel potential - Independent subtasks could run concurrently
  • 🐛 Easier debugging - Each subtask has its own isolated transcript

When to use subagents vs direct tools:

  • Subagent: Multi-step tasks, iteration likely, lots of output expected
  • Direct: Quick one-off commands, simple file reads, user needs to see output

Files to modify: run_agent.py (add orchestration mode), new tools/delegate_tools.py, new subagent_runner.py


2. Planning & Task Management 📋

Problem: Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.

Ideas:

  • Task decomposition tool - Break complex requests into subtasks:

    User: "Set up a new Python project with FastAPI, tests, and Docker"
    
    Agent creates plan:
    ├── 1. Create project structure and requirements.txt
    ├── 2. Implement FastAPI app skeleton
    ├── 3. Add pytest configuration and initial tests
    ├── 4. Create Dockerfile and docker-compose.yml
    └── 5. Verify everything works together
    
    • Each subtask becomes a trackable unit
    • Agent can report progress: "Completed 3/5 tasks"
  • Progress checkpoints - Periodic self-assessment:

    • After N tool calls or time elapsed, pause to evaluate
    • "What have I accomplished? What remains? Am I on track?"
    • Detect if stuck in loops or making no progress
    • Could trigger replanning if approach isn't working
  • Explicit plan storage - Persist plan in conversation:

    • Store as structured data (not just in context)
    • Update status as tasks complete
    • User can ask "What's the plan?" or "What's left?"
    • Survives context compression (plans are protected)
  • Failure recovery with replanning - When things go wrong:

    • Record what failed and why
    • Revise plan to work around the issue
    • "Step 3 failed because X, adjusting approach to Y"
    • Prevents repeating failed strategies
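
The explicit-plan idea could be as simple as a persisted dataclass with completion tracking (names here are illustrative, ahead of any real tools/planning_tool.py):

```python
from dataclasses import dataclass, field

@dataclass
class TaskPlan:
    """Plan stored as structured data (not just in context) so it can be
    updated as tasks complete and protected from context compression."""
    goal: str
    steps: list[str]
    done: set[int] = field(default_factory=set)

    def complete(self, index: int) -> None:
        self.done.add(index)

    def progress(self) -> str:
        return f"Completed {len(self.done)}/{len(self.steps)} tasks"

    def remaining(self) -> list[str]:
        return [s for i, s in enumerate(self.steps) if i not in self.done]
```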

Files to modify: run_agent.py (add planning hooks), new tools/planning_tool.py


3. Tool Composition & Learning 🔧

Problem: Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.

Ideas:

  • Macro tools / Tool chains - Define reusable tool sequences:

    research_topic:
      description: "Deep research on a topic"
      steps:
        - web_search: {query: "$topic"}
        - web_extract: {urls: "$search_results.urls[:3]"}
        - summarize: {content: "$extracted"}
    
    • Could be defined in skills or a new macros/ directory
    • Agent can invoke macro as single tool call
  • Tool failure patterns - Learn from failures:

    • Track: tool, input pattern, error type, what worked instead
    • Before calling a tool, check: "Has this pattern failed before?"
    • Persistent across sessions (stored in skills or separate DB)
  • Parallel tool execution - When tools are independent, run concurrently:

    • Detect independence (no data dependencies between calls)
    • Use asyncio.gather() for parallel execution
    • Already have async support in some tools, just need orchestration
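
Once independence is established, the orchestration itself is small; a sketch using `asyncio.gather` with exceptions captured per call rather than raised:

```python
import asyncio

async def run_independent_tools(calls):
    """Run tool calls with no data dependencies concurrently.
    `calls` is a list of (async_fn, kwargs) pairs; results come back in
    the same order, with exceptions captured instead of raised."""
    tasks = [fn(**kwargs) for fn, kwargs in calls]
    return await asyncio.gather(*tasks, return_exceptions=True)
```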

Files to modify: model_tools.py, toolsets.py, new tool_macros.py


4. Dynamic Skills Expansion 📚

Problem: Skills system is elegant but static. Skills must be manually created and added.

Ideas:

  • Skill acquisition from successful tasks - After completing a complex task:

    • "This approach worked well. Save as a skill?"
    • Extract: goal, steps taken, tools used, key decisions
    • Generate SKILL.md automatically
    • Store in user's skills directory
  • Skill templates - Common patterns that can be parameterized:

    # Debug {language} Error
    1. Reproduce the error
    2. Search for error message: `web_search("{error_message} {language}")`
    3. Check common causes: {common_causes}
    4. Apply fix and verify
    
  • Skill chaining - Combine skills for complex workflows:

    • Skills can reference other skills as dependencies
    • "To do X, first apply skill Y, then skill Z"
    • Directed graph of skill dependencies

Files to modify: tools/skills_tool.py, skills/ directory structure, new skill_generator.py


5. Interactive Clarifying Questions Tool

Problem: Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.

Ideas:

  • Multiple-choice prompt tool - Let agent present structured choices to user:

    ask_user_choice(
      question="Should the language switcher enable only German or all languages?",
      choices=[
        "Only enable German - works immediately",
        "Enable all, mark untranslated - show fallback notice",
        "Let me specify something else"
      ]
    )
    
    • Renders as interactive terminal UI with arrow key / Tab navigation
    • User selects option, result returned to agent
    • Up to 4 choices + optional free-text option
  • Implementation:

    • Use inquirer or questionary Python library for rich terminal prompts
    • Tool returns selected option text (or user's custom input)
    • CLI-only - only works when running via cli.py (not API/programmatic use)
    • Graceful fallback: if not in interactive mode, return error asking agent to rephrase as text
  • Use cases:

    • Clarify ambiguous requirements before starting work
    • Confirm destructive operations with clear options
    • Let user choose between implementation approaches
    • Checkpoint complex multi-step workflows
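
The graceful-fallback behavior could look like this sketch, reusing the HERMES_INTERACTIVE flag mentioned above (`_prompt_select` is hypothetical; the interactive path would be backed by questionary or inquirer):

```python
import os

def ask_user_choice(question: str, choices: list[str]) -> dict:
    """Present up to 4 choices in CLI mode; outside interactive mode,
    return an error telling the agent to rephrase as plain text."""
    if len(choices) > 4:
        return {"error": "at most 4 choices supported"}
    if os.environ.get("HERMES_INTERACTIVE") != "1":
        return {"error": "not in interactive mode - ask the user in plain text instead"}
    # Interactive path, e.g. with questionary:
    #   answer = questionary.select(question, choices=choices).ask()
    answer = _prompt_select(question, choices)  # hypothetical terminal UI helper
    return {"selected": answer}
```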

Files to modify: New tools/ask_user_tool.py, cli.py (detect interactive mode), model_tools.py


6. Collaborative Problem Solving 🤝

Problem: Interaction is command/response. Complex problems benefit from dialogue.

Ideas:

  • Assumption surfacing - Make implicit assumptions explicit:

    • "I'm assuming you want Python 3.11+. Correct?"
    • "This solution assumes you have sudo access..."
    • Let user correct before going down wrong path
  • Checkpoint & confirm - For high-stakes operations:

    • "About to delete 47 files. Here's the list - proceed?"
    • "This will modify your database. Want a backup first?"
    • Configurable threshold for when to ask

Files to modify: run_agent.py, system prompt configuration


7. Project-Local Context 💾

Problem: Valuable context lost between sessions.

Ideas:

  • Project awareness - Remember project-specific context:

    • Store .hermes/context.md in project directory
    • "This is a Django project using PostgreSQL"
    • Coding style preferences, deployment setup, etc.
    • Load automatically when working in that directory
  • Handoff notes - Leave notes for future sessions:

    • Write to .hermes/notes.md in project
    • "TODO for next session: finish implementing X"
    • "Known issues: Y doesn't work on Windows"
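
Auto-loading both files might look like this sketch (a hypothetical helper for the proposed project_context.py, injected into the system prompt at startup):

```python
from pathlib import Path

def load_project_context(project_dir: str) -> str:
    """Collect .hermes/context.md and .hermes/notes.md from the project
    directory, if present, for injection into the system prompt."""
    parts = []
    for name in ("context.md", "notes.md"):
        f = Path(project_dir) / ".hermes" / name
        if f.is_file():
            parts.append(f"## {name}\n{f.read_text().strip()}")
    return "\n\n".join(parts)
```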

Files to modify: New project_context.py, auto-load in run_agent.py


8. Graceful Degradation & Robustness 🛡️

Problem: When things go wrong, recovery is limited. Should fail gracefully.

Ideas:

  • Fallback chains - When primary approach fails, have backups:

    • web_extract fails → try browser_navigate → try web_search for cached version
    • Define fallback order per tool type
  • Partial progress preservation - Don't lose work on failure:

    • Long task fails midway → save what we've got
    • "I completed 3/5 steps before the error. Here's what I have..."
  • Self-healing - Detect and recover from bad states:

    • Browser stuck → close and retry
    • Terminal hung → timeout and reset

Files to modify: model_tools.py, tool implementations, new fallback_manager.py


9. Tools & Skills Wishlist 🧰

Things that would need new tool implementations (can't do well with current tools):

High-Impact

  • Audio/Video Transcription 🎬 (See also: Section 16 for detailed spec)

    • Transcribe audio files, podcasts, YouTube videos
    • Extract key moments from video
    • Voice memo transcription for messaging integrations
    • Provider options: Whisper API, Deepgram, local Whisper
  • Diagram Rendering 📊

    • Render Mermaid/PlantUML to actual images
    • Can generate the code, but rendering requires external service or tool
    • "Show me how these components connect" → actual visual diagram

Medium-Impact

  • Canvas / Visual Workspace 🖼️

    • Agent-controlled visual panel for rendering interactive UI
    • Inspired by OpenClaw's Canvas feature
    • Capabilities:
      • present / hide - Show/hide the canvas panel
      • navigate - Load HTML files or URLs into the canvas
      • eval - Execute JavaScript in the canvas context
      • snapshot - Capture the rendered UI as an image
    • Use cases:
      • Display generated HTML/CSS/JS previews
      • Show interactive data visualizations (charts, graphs)
      • Render diagrams (Mermaid → rendered output)
      • Present structured information in rich format
      • A2UI-style component system for structured agent UI
    • Implementation options:
      • Electron-based panel for CLI
      • WebSocket-connected web app
      • VS Code webview extension
    • Would let agent "show" things rather than just describe them
  • Document Generation 📄

    • Create styled PDFs, Word docs, presentations
    • Can do basic PDF via terminal tools, but limited
  • Diff/Patch Tool 📝

    • Surgical code modifications with preview
    • "Change line 45-50 to X" without rewriting whole file
    • Show diffs before applying
    • Can use diff/patch but a native tool would be safer

Skills to Create

  • Domain-specific skill packs:

    • DevOps/Infrastructure (Terraform, K8s, AWS)
    • Data Science workflows (EDA, model training)
    • Security/pentesting procedures
  • Framework-specific skills:

    • React/Vue/Angular patterns
    • Django/Rails/Express conventions
    • Database optimization playbooks
  • Troubleshooting flowcharts:

    • "Docker container won't start" → decision tree
    • "Production is slow" → systematic diagnosis

10. Messaging Platform Integrations 💬 COMPLETE

Problem: Agent currently only works via cli.py, which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.

Architecture:

  • run_agent.py already accepts conversation_history parameter and returns updated messages
  • Need: persistent session storage, platform monitors, session key resolution

Implementation approach:

┌─────────────────────────────────────────────────────────────┐
│  Platform Monitor (e.g., telegram_monitor.py)               │
│  ├─ Long-running daemon connecting to messaging platform    │
│  ├─ On message: resolve session key → load history from disk│
│  ├─ Call run_agent.py with loaded history                   │
│  ├─ Save updated history back to disk (JSONL)               │
│  └─ Send response back to platform                          │
└─────────────────────────────────────────────────────────────┘

Platform support (each user sets up their own credentials):

  • Telegram - via python-telegram-bot
    • Bot token from @BotFather
    • Easiest to set up, good for personal use
  • Discord - via discord.py
    • Bot token from Discord Developer Portal
    • Can work in servers (group sessions) or DMs
  • WhatsApp - via Node.js bridge (whatsapp-web.js/baileys)
    • Requires Node.js bridge setup
    • More complex, but reaches most people

Session management:

  • Session store - JSONL persistence per session key
    • ~/.hermes/sessions/{session_id}.jsonl
    • Session keys: agent:main:telegram:dm, agent:main:discord:group:123, etc.
  • Session expiry - Configurable reset policies
    • Daily reset (default 4am) OR idle timeout (default 2 hours)
    • Manual reset via /reset or /new command in chat
    • Per-platform and per-type overrides
  • Session continuity - Conversations persist across messages until reset

Files created: gateway/, gateway/platforms/, gateway/config.py, gateway/session.py, gateway/delivery.py, gateway/run.py

Configuration:

  • Environment variables: TELEGRAM_BOT_TOKEN, DISCORD_BOT_TOKEN, etc.
  • Config file: ~/.hermes/gateway.json
  • CLI commands: /platforms to check status, --gateway to start

Dynamic context injection:

  • Agent knows its source platform and chat
  • Agent knows connected platforms and home channels
  • Agent can deliver cron outputs to specific platforms

11. Scheduled Tasks / Cron Jobs COMPLETE

Problem: Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).

Solution Implemented:

  • Cron-style scheduler - Run agent turns on a schedule

    • Jobs stored in ~/.hermes/cron/jobs.json
    • Each job: { id, name, prompt, schedule, repeat, enabled, next_run_at, ... }
    • Built-in scheduler daemon or system cron integration
  • Schedule formats:

    • Duration: 30m, 2h, 1d (one-shot delay)
    • Interval: every 30m, every 2h (recurring)
    • Cron expression: 0 9 * * * (requires croniter package)
    • ISO timestamp: 2026-02-03T14:00:00 (one-shot at specific time)
  • Repeat options:

    • repeat=None (or omit): One-shot schedules run once; intervals/cron run forever
    • repeat=1: Run once then auto-delete
    • repeat=N: Run exactly N times then auto-delete
  • CLI interface:

    # List scheduled jobs
    /cron
    /cron list
    
    # Add a one-shot job (runs once in 30 minutes)
    /cron add 30m "Remind me to check the build status"
    
    # Add a recurring job (every 2 hours)
    /cron add "every 2h" "Check server status at 192.168.1.100"
    
    # Add a cron expression (daily at 9am)
    /cron add "0 9 * * *" "Generate morning briefing"
    
    # Remove a job
    /cron remove <job_id>
    
  • Agent self-scheduling tools (hermes-cli toolset):

    • schedule_cronjob(prompt, schedule, name?, repeat?) - Create a scheduled task
    • list_cronjobs() - View all scheduled jobs
    • remove_cronjob(job_id) - Cancel a job
    • Tool descriptions emphasize: cronjobs run in isolated sessions with NO context
  • Daemon modes:

    # Built-in daemon (checks every 60 seconds)
    python cli.py --cron-daemon
    
    # Single tick for system cron integration
    python cli.py --cron-tick-once
    
  • Output storage: ~/.hermes/cron/output/{job_id}/{timestamp}.md
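
The four schedule formats can be classified with a small parser; a sketch of the idea (the actual cron/jobs.py parser may differ, and real cron expressions are delegated to croniter):

```python
import re
from datetime import datetime, timedelta

def parse_schedule(spec: str, now: datetime):
    """Classify a schedule spec and return (kind, next_run_at).
    Handles durations (30m), intervals (every 2h), and ISO timestamps;
    anything else is assumed to be a cron expression for croniter."""
    m = re.fullmatch(r"(every\s+)?(\d+)([mhd])", spec.strip())
    if m:
        unit = {"m": "minutes", "h": "hours", "d": "days"}[m.group(3)]
        delta = timedelta(**{unit: int(m.group(2))})
        kind = "interval" if m.group(1) else "one-shot"
        return kind, now + delta
    try:
        return "one-shot", datetime.fromisoformat(spec)
    except ValueError:
        return "cron", None  # e.g. "0 9 * * *" -> next_run_at via croniter
```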

Files created: cron/__init__.py, cron/jobs.py, cron/scheduler.py, tools/cronjob_tools.py

Toolset: hermes-cli (default for CLI) includes cronjob tools; not in batch runner toolsets


12. Text-to-Speech (TTS) 🔊

Problem: Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).

Ideas:

  • TTS tool - Generate audio files from text

    tts_generate(text="Here's your summary...", voice="nova", output="summary.mp3")
    
    • Returns path to generated audio file
    • For messaging integrations: can send as voice message
  • Provider options:

    • Edge TTS (free, good quality, many voices)
    • OpenAI TTS (paid, excellent quality)
    • ElevenLabs (paid, best quality, voice cloning)
    • Local options (Coqui TTS, Bark)
  • Modes:

    • On-demand: User explicitly asks "read this to me"
    • Auto-TTS: Configurable to always generate audio for responses
    • Long-text handling: Summarize or chunk very long responses
  • Integration with messaging:

    • When enabled, can send voice notes instead of/alongside text
    • User preference per channel

Files to create: tools/tts_tool.py, config in cli-config.yaml


13. Speech-to-Text / Audio Transcription 🎤

Problem: Users may want to send voice memos instead of typing. Agent is blind to audio content.

Ideas:

  • Voice memo transcription - For messaging integrations

    • User sends voice message → transcribe → process as text
    • Seamless: user speaks, agent responds
  • Audio/video file transcription - Existing idea, expanded:

    • Transcribe local audio files (mp3, wav, m4a)
    • Transcribe YouTube videos (download audio → transcribe)
    • Extract key moments with timestamps
  • Provider options:

    • OpenAI Whisper API (good quality, cheap)
    • Deepgram (fast, good for real-time)
    • Local Whisper (free, runs on GPU)
    • Groq Whisper (fast, free tier available)
  • Tool interface:

    transcribe(source="audio.mp3")  # Local file
    transcribe(source="https://youtube.com/...")  # YouTube
    transcribe(source="voice_message", data=bytes)  # Voice memo
    

Files to create: tools/transcribe_tool.py, integrate with messaging monitors


Priority Order (Suggested)

  1. 🎯 Subagent Architecture - Critical for context management, enables everything else
  2. Memory & Context Management - Complements subagents for remaining context
  3. Self-Reflection - Improves reliability and reduces wasted tool calls
  4. Project-Local Context - Practical win, keeps useful info across sessions
  5. Messaging Integrations - Unlocks mobile access, new interaction patterns
  6. Scheduled Tasks / Cron Jobs - Enables automation, reminders, monitoring
  7. Tool Composition - Quality of life, builds on other improvements
  8. Dynamic Skills - Force multiplier for repeated tasks
  9. Interactive Clarifying Questions - Better UX for ambiguous tasks
  10. TTS / Audio Transcription - Accessibility, hands-free use

Removed Items (Unrealistic)

The following were removed because they're architecturally impossible:

  • Proactive suggestions / Prefetching - Agent only runs on user request, can't interject
  • Clipboard integration - No access to user's local system clipboard

The following moved to active TODO (now possible with new architecture):

  • Session save/restore → See Messaging Integrations (session persistence)
  • Voice/TTS playback → See TTS (can generate audio files, send via messaging)
  • Set reminders → See Scheduled Tasks / Cron Jobs

The following were removed because they're already possible:

  • HTTP/API Client → Use curl or Python requests in terminal
  • Structured Data Manipulation → Use pandas in terminal
  • Git-Native Operations → Use git CLI in terminal
  • Symbolic Math → Use SymPy in terminal
  • Code Quality Tools → Run linters (eslint, black, mypy) in terminal
  • Testing Framework → Run pytest, jest, etc. in terminal
  • Translation → LLM handles this fine, or use translation APIs


🧪 Brainstorm Ideas (Not Yet Fleshed Out)

These are early-stage ideas that need more thinking before implementation. Captured here so they don't get lost.

Remote/Distributed Execution 🌐

Concept: Run agent on a powerful remote server while interacting from a thin client.

Why interesting:

  • Run on beefy GPU server for local LLM inference
  • Agent has access to remote machine's resources (files, tools, internet)
  • User interacts via lightweight client (phone, low-power laptop)

Open questions:

  • How does this differ from just SSH + running cli.py on remote?
  • Would need secure communication channel (WebSocket? gRPC?)
  • How to handle tool outputs that reference remote paths?
  • Credential management for remote execution
  • Latency considerations for interactive use

Possible architecture:

┌─────────────┐         ┌─────────────────────────┐
│ Thin Client │ ◄─────► │ Remote Hermes Server    │
│ (phone/web) │  WS/API │ - Full agent + tools    │
└─────────────┘         │ - GPU for local LLM     │
                        │ - Access to server files│
                        └─────────────────────────┘

Related to: Messaging integrations (could be the "server" that monitors receive from)


Multi-Agent Parallel Execution 🤖🤖

Concept: Extension of Subagent Architecture (Section 1) - run multiple subagents in parallel.

Why interesting:

  • Independent subtasks don't need to wait for each other
  • "Research X while setting up Y" - both run simultaneously
  • Faster completion for complex multi-part tasks

Open questions:

  • How to detect which tasks are truly independent?
  • Resource management (API rate limits, concurrent connections)
  • How to merge results when parallel tasks have conflicts?
  • Cost implications of multiple parallel LLM calls

Note: Basic subagent delegation (Section 1) should be implemented first, parallel execution is an optimization on top.


Plugin/Extension System 🔌

Concept: Allow users to add custom tools/skills without modifying core code.

Why interesting:

  • Community contributions
  • Organization-specific tools
  • Clean separation of core vs. extensions

Open questions:

  • Security implications of loading arbitrary code
  • Versioning and compatibility
  • Discovery and installation UX

Last updated: $(date +%Y-%m-%d) 🤖