
Hermes Agent - Future Improvements

Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.


🚨 HIGH PRIORITY - Immediate Fixes

These items need to be addressed ASAP:

1. SUDO Breaking Terminal Tool 🔐 COMPLETE

  • Problem: SUDO commands break the terminal tool execution (hangs indefinitely)
  • Fix: Created custom environment wrappers in tools/terminal_tool.py
    • stdin=subprocess.DEVNULL prevents hanging on interactive prompts
    • Sudo fails gracefully with clear error if no password configured
    • Same UX as Claude Code - agent sees error, tells user to run it themselves
  • All 5 environments now have consistent behavior:
    • _LocalEnvironment - local execution
    • _DockerEnvironment - Docker containers
    • _SingularityEnvironment - Singularity/Apptainer containers
    • _ModalEnvironment - Modal cloud sandboxes
    • _SSHEnvironment - remote SSH execution
  • Optional sudo support via SUDO_PASSWORD env var:
    • Shared _transform_sudo_command() helper used by all environments
    • If set, auto-transforms sudo cmd → pipes password via sudo -S
    • Documented in .env.example, cli-config.yaml, and README
    • Works for chained commands: cmd1 && sudo cmd2
  • Interactive sudo prompt in CLI mode:
    • When sudo detected and no password configured, prompts user
    • 45-second timeout (auto-skips if no input)
    • Hidden password input via getpass (password not visible)
    • Password cached for session (don't ask repeatedly)
    • Spinner pauses during prompt for clean UX
    • Uses HERMES_INTERACTIVE env var to detect CLI mode

2. Fix browser_get_images Tool 🖼️ VERIFIED WORKING

  • Tested: Tool works correctly on multiple sites
  • Results: Successfully extracts image URLs, alt text, dimensions
  • Note: Some sites (Pixabay, etc.) have Cloudflare bot protection that blocks headless browsers - this is expected behavior, not a bug

3. Better Action Logging for Debugging 📝 COMPLETE

  • Problem: Need better logging of agent actions for debugging
  • Implementation:
    • Save full session trajectories to logs/ directory as JSON
    • Each session gets a unique file: session_YYYYMMDD_HHMMSS_UUID.json
    • Logs all messages, tool calls with inputs/outputs, timestamps
    • Structured JSON format for easy parsing and replay
    • Automatic on CLI runs (configurable)
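
The logging scheme above can be sketched roughly as follows (function names are illustrative, not the actual API):

```python
import json
import uuid
from datetime import datetime
from pathlib import Path

def new_session_path(log_dir: str = "logs") -> Path:
    """Build the per-session filename: session_YYYYMMDD_HHMMSS_UUID.json."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(log_dir) / f"session_{stamp}_{uuid.uuid4().hex[:8]}.json"

def save_trajectory(path: Path, messages: list, tool_calls: list) -> None:
    """Dump the full session as structured JSON for easy parsing and replay."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "saved_at": datetime.now().isoformat(),
        "messages": messages,        # full conversation turns
        "tool_calls": tool_calls,    # tool inputs/outputs with timestamps
    }
    path.write_text(json.dumps(record, indent=2, default=str))
```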

4. Automatic Context Compression 🗜️ COMPLETE

  • Problem: Long conversations exceed model context limits, causing errors
  • Solution: Auto-compress middle turns when approaching limit
  • Implementation:
    • Fetches model context lengths from OpenRouter /api/v1/models API (cached 1hr)
    • Tracks actual token usage from API responses (usage.prompt_tokens)
    • Triggers at 85% of model's context limit (configurable)
    • Protects first 3 turns (system, initial request, first response)
    • Protects last 4 turns (recent context most relevant)
    • Summarizes middle turns using fast model (Gemini Flash)
    • Inserts summary as user message, conversation continues seamlessly
    • If context error occurs, attempts compression before failing
  • Configuration (cli-config.yaml / env vars):
    • CONTEXT_COMPRESSION_ENABLED (default: true)
    • CONTEXT_COMPRESSION_THRESHOLD (default: 0.85 = 85%)
    • CONTEXT_COMPRESSION_MODEL (default: google/gemini-2.0-flash-001)
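
The trigger and turn-protection logic amounts to something like this sketch (helper names are hypothetical; the real implementation lives around run_agent.py):

```python
def should_compress(prompt_tokens: int, context_limit: int,
                    threshold: float = 0.85) -> bool:
    """Trigger compression once actual usage crosses the threshold."""
    return prompt_tokens >= int(context_limit * threshold)

def split_for_compression(messages: list, protect_head: int = 3,
                          protect_tail: int = 4):
    """Protect the first 3 and last 4 turns; everything in between is
    summarized by the fast model and replaced with one user message."""
    if len(messages) <= protect_head + protect_tail:
        return messages, [], []   # nothing safe to compress
    head = messages[:protect_head]
    middle = messages[protect_head:len(messages) - protect_tail]
    tail = messages[len(messages) - protect_tail:]
    return head, middle, tail
```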

5. Stream Thinking Summaries in Real-Time 💭 ⏸️ DEFERRED

  • Problem: Thinking/reasoning summaries not shown while streaming
  • Complexity: This is a significant refactor - leaving for later

OpenRouter Streaming Info:

  • Uses stream=True with OpenAI SDK
  • Reasoning comes in choices[].delta.reasoning_details chunks
  • Types: reasoning.summary, reasoning.text, reasoning.encrypted
  • Tool call arguments stream as partial JSON (need accumulation)
  • Items paradigm: same ID emitted multiple times with updated content

Key Challenges:

  • Tool call JSON accumulation (arguments arrive in partial chunks, e.g. `{"query": "wea` then `ther"}`, which must be accumulated into `{"query": "weather"}` before parsing)
  • Multiple concurrent outputs (thinking + tool calls + text simultaneously)
  • State management for partial responses
  • Error handling if connection drops mid-stream
  • Deciding when tool calls are "complete" enough to execute
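
The accumulation and "complete enough to execute" questions could be handled together with a buffer keyed by tool-call id, treating a call as complete once its JSON finally parses (class name is a hypothetical sketch):

```python
import json

class ToolCallAccumulator:
    """Accumulate partial tool-call argument chunks by call id, and decide
    when a call is complete enough to execute (its JSON finally parses)."""
    def __init__(self):
        self.buffers: dict[str, str] = {}

    def feed(self, call_id: str, fragment: str) -> None:
        self.buffers[call_id] = self.buffers.get(call_id, "") + fragment

    def try_finish(self, call_id: str):
        """Return parsed arguments once the buffer is valid JSON, else None."""
        try:
            return json.loads(self.buffers.get(call_id, ""))
        except json.JSONDecodeError:
            return None
```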

UX Questions to Resolve:

  • Show raw thinking text or summarized?
  • Live expanding text vs. spinner replacement?
  • Markdown rendering while streaming?
  • How to handle thinking + tool call display simultaneously?

Implementation Options:

  • New run_conversation_streaming() method (keep non-streaming as fallback)
  • Wrapper that handles streaming internally
  • Big refactor of existing run_conversation()


1. Subagent Architecture (Context Isolation) 🎯

Problem: Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single ls -la can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.

Solution: The main agent becomes an orchestrator that delegates context-heavy tasks to subagents.

Architecture:

┌─────────────────────────────────────────────────────────────────┐
│  ORCHESTRATOR (main agent)                                      │
│  - Receives user request                                        │
│  - Plans approach                                               │
│  - Delegates heavy tasks to subagents                           │
│  - Receives summarized results                                  │
│  - Maintains clean, focused context                             │
└─────────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ TERMINAL AGENT  │  │ BROWSER AGENT   │  │ CODE AGENT      │
│ - terminal tool │  │ - browser tools │  │ - file tools    │
│ - file tools    │  │ - web_search    │  │ - terminal      │
│                 │  │ - web_extract   │  │                 │
│ Isolated context│  │ Isolated context│  │ Isolated context│
│ Returns summary │  │ Returns summary │  │ Returns summary │
└─────────────────┘  └─────────────────┘  └─────────────────┘

How it works:

  1. User asks: "Set up a new Python project with FastAPI and tests"
  2. Orchestrator plans: "I need to create files, install deps, write code"
  3. Orchestrator calls: terminal_task(goal="Create venv, install fastapi pytest", context="New project in ~/myapp")
  4. Subagent spawns with fresh context, only terminal/file tools
  5. Subagent iterates (may take 10+ tool calls, lots of output)
  6. Subagent completes → returns summary: "Created venv, installed fastapi==0.109.0, pytest==8.0.0"
  7. Orchestrator receives only the summary, context stays clean
  8. Orchestrator continues with next subtask

Key tools to implement:

  • terminal_task(goal, context, cwd?) - Delegate terminal/shell work
  • browser_task(goal, context, start_url?) - Delegate web research/automation
  • code_task(goal, context, files?) - Delegate code writing/modification
  • Generic delegate_task(goal, context, toolsets=[]) - Flexible delegation

Implementation details:

  • Subagent uses same run_agent.py but with:
    • Fresh/empty conversation history
    • Limited toolset (only what's needed)
    • Smaller max_iterations (focused task)
    • Task-specific system prompt
  • Subagent returns structured result:
    {
      "success": True,
      "summary": "Installed 3 packages, created 2 files",
      "details": "Optional longer explanation if needed",
      "artifacts": ["~/myapp/requirements.txt", "~/myapp/main.py"],  # Files created
      "errors": []  # Any issues encountered
    }
    
  • Orchestrator sees only the summary in its context
  • Full subagent transcript saved separately for debugging
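
The structured result above maps naturally onto a small dataclass; only the summary line ever enters the orchestrator's context (this is a sketch, not the decided interface):

```python
from dataclasses import dataclass, field

@dataclass
class SubagentResult:
    """Structured result a subagent hands back to the orchestrator."""
    success: bool
    summary: str
    details: str = ""
    artifacts: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    def to_context_message(self) -> str:
        """The one line the orchestrator actually sees in its context."""
        status = "done" if self.success else "FAILED"
        note = f" (errors: {'; '.join(self.errors)})" if self.errors else ""
        return f"[subagent {status}] {self.summary}{note}"
```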

Benefits:

  • 🧹 Clean context - Orchestrator stays focused, doesn't drown in tool output
  • 📊 Better token efficiency - 50 terminal outputs → 1 summary paragraph
  • 🎯 Focused subagents - Each agent has just the tools it needs
  • 🔄 Parallel potential - Independent subtasks could run concurrently
  • 🐛 Easier debugging - Each subtask has its own isolated transcript

When to use subagents vs direct tools:

  • Subagent: Multi-step tasks, iteration likely, lots of output expected
  • Direct: Quick one-off commands, simple file reads, user needs to see output

Files to modify: run_agent.py (add orchestration mode), new tools/delegate_tools.py, new subagent_runner.py


2. Planning & Task Management 📋

Problem: Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.

Ideas:

  • Task decomposition tool - Break complex requests into subtasks:

    User: "Set up a new Python project with FastAPI, tests, and Docker"
    
    Agent creates plan:
    ├── 1. Create project structure and requirements.txt
    ├── 2. Implement FastAPI app skeleton
    ├── 3. Add pytest configuration and initial tests
    ├── 4. Create Dockerfile and docker-compose.yml
    └── 5. Verify everything works together
    
    • Each subtask becomes a trackable unit
    • Agent can report progress: "Completed 3/5 tasks"
  • Progress checkpoints - Periodic self-assessment:

    • After N tool calls or time elapsed, pause to evaluate
    • "What have I accomplished? What remains? Am I on track?"
    • Detect if stuck in loops or making no progress
    • Could trigger replanning if approach isn't working
  • Explicit plan storage - Persist plan in conversation:

    • Store as structured data (not just in context)
    • Update status as tasks complete
    • User can ask "What's the plan?" or "What's left?"
    • Survives context compression (plans are protected)
  • Failure recovery with replanning - When things go wrong:

    • Record what failed and why
    • Revise plan to work around the issue
    • "Step 3 failed because X, adjusting approach to Y"
    • Prevents repeating failed strategies
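
The explicit-plan idea could be as simple as a persisted dataclass with completion tracking (names here are illustrative, ahead of any real tools/planning_tool.py):

```python
from dataclasses import dataclass, field

@dataclass
class TaskPlan:
    """Plan stored as structured data (not just in context) so it can be
    updated as tasks complete and protected from context compression."""
    goal: str
    steps: list[str]
    done: set[int] = field(default_factory=set)

    def complete(self, index: int) -> None:
        self.done.add(index)

    def progress(self) -> str:
        return f"Completed {len(self.done)}/{len(self.steps)} tasks"

    def remaining(self) -> list[str]:
        return [s for i, s in enumerate(self.steps) if i not in self.done]
```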

Files to modify: run_agent.py (add planning hooks), new tools/planning_tool.py


3. Tool Composition & Learning 🔧

Problem: Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.

Ideas:

  • Macro tools / Tool chains - Define reusable tool sequences:

    research_topic:
      description: "Deep research on a topic"
      steps:
        - web_search: {query: "$topic"}
        - web_extract: {urls: "$search_results.urls[:3]"}
        - summarize: {content: "$extracted"}
    
    • Could be defined in skills or a new macros/ directory
    • Agent can invoke macro as single tool call
  • Tool failure patterns - Learn from failures:

    • Track: tool, input pattern, error type, what worked instead
    • Before calling a tool, check: "Has this pattern failed before?"
    • Persistent across sessions (stored in skills or separate DB)
  • Parallel tool execution - When tools are independent, run concurrently:

    • Detect independence (no data dependencies between calls)
    • Use asyncio.gather() for parallel execution
    • Already have async support in some tools, just need orchestration
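
Once independence is established, the orchestration itself is small; a sketch using `asyncio.gather` with exceptions captured per call rather than raised:

```python
import asyncio

async def run_independent_tools(calls):
    """Run tool calls with no data dependencies concurrently.
    `calls` is a list of (async_fn, kwargs) pairs; results come back in
    the same order, with exceptions captured instead of raised."""
    tasks = [fn(**kwargs) for fn, kwargs in calls]
    return await asyncio.gather(*tasks, return_exceptions=True)
```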

Files to modify: model_tools.py, toolsets.py, new tool_macros.py


4. Dynamic Skills Expansion 📚

Problem: Skills system is elegant but static. Skills must be manually created and added.

Ideas:

  • Skill acquisition from successful tasks - After completing a complex task:

    • "This approach worked well. Save as a skill?"
    • Extract: goal, steps taken, tools used, key decisions
    • Generate SKILL.md automatically
    • Store in user's skills directory
  • Skill templates - Common patterns that can be parameterized:

    # Debug {language} Error
    1. Reproduce the error
    2. Search for error message: `web_search("{error_message} {language}")`
    3. Check common causes: {common_causes}
    4. Apply fix and verify
    
  • Skill chaining - Combine skills for complex workflows:

    • Skills can reference other skills as dependencies
    • "To do X, first apply skill Y, then skill Z"
    • Directed graph of skill dependencies

Files to modify: tools/skills_tool.py, skills/ directory structure, new skill_generator.py


5. Interactive Clarifying Questions Tool

Problem: Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.

Ideas:

  • Multiple-choice prompt tool - Let agent present structured choices to user:

    ask_user_choice(
      question="Should the language switcher enable only German or all languages?",
      choices=[
        "Only enable German - works immediately",
        "Enable all, mark untranslated - show fallback notice",
        "Let me specify something else"
      ]
    )
    
    • Renders as interactive terminal UI with arrow key / Tab navigation
    • User selects option, result returned to agent
    • Up to 4 choices + optional free-text option
  • Implementation:

    • Use inquirer or questionary Python library for rich terminal prompts
    • Tool returns selected option text (or user's custom input)
    • CLI-only - only works when running via cli.py (not API/programmatic use)
    • Graceful fallback: if not in interactive mode, return error asking agent to rephrase as text
  • Use cases:

    • Clarify ambiguous requirements before starting work
    • Confirm destructive operations with clear options
    • Let user choose between implementation approaches
    • Checkpoint complex multi-step workflows
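
The graceful-fallback behavior could look like this sketch, reusing the HERMES_INTERACTIVE flag mentioned above (`_prompt_select` is hypothetical; the interactive path would be backed by questionary or inquirer):

```python
import os

def ask_user_choice(question: str, choices: list[str]) -> dict:
    """Present up to 4 choices in CLI mode; outside interactive mode,
    return an error telling the agent to rephrase as plain text."""
    if len(choices) > 4:
        return {"error": "at most 4 choices supported"}
    if os.environ.get("HERMES_INTERACTIVE") != "1":
        return {"error": "not in interactive mode - ask the user in plain text instead"}
    # Interactive path, e.g. with questionary:
    #   answer = questionary.select(question, choices=choices).ask()
    answer = _prompt_select(question, choices)  # hypothetical terminal UI helper
    return {"selected": answer}
```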

Files to modify: New tools/ask_user_tool.py, cli.py (detect interactive mode), model_tools.py


6. Collaborative Problem Solving 🤝

Problem: Interaction is command/response. Complex problems benefit from dialogue.

Ideas:

  • Assumption surfacing - Make implicit assumptions explicit:

    • "I'm assuming you want Python 3.11+. Correct?"
    • "This solution assumes you have sudo access..."
    • Let user correct before going down wrong path
  • Checkpoint & confirm - For high-stakes operations:

    • "About to delete 47 files. Here's the list - proceed?"
    • "This will modify your database. Want a backup first?"
    • Configurable threshold for when to ask

Files to modify: run_agent.py, system prompt configuration


7. Project-Local Context 💾

Problem: Valuable context lost between sessions.

Ideas:

  • Project awareness - Remember project-specific context:

    • Store .hermes/context.md in project directory
    • "This is a Django project using PostgreSQL"
    • Coding style preferences, deployment setup, etc.
    • Load automatically when working in that directory
  • Handoff notes - Leave notes for future sessions:

    • Write to .hermes/notes.md in project
    • "TODO for next session: finish implementing X"
    • "Known issues: Y doesn't work on Windows"
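
Auto-loading both files might look like this sketch (a hypothetical helper for the proposed project_context.py, injected into the system prompt at startup):

```python
from pathlib import Path

def load_project_context(project_dir: str) -> str:
    """Collect .hermes/context.md and .hermes/notes.md from the project
    directory, if present, for injection into the system prompt."""
    parts = []
    for name in ("context.md", "notes.md"):
        f = Path(project_dir) / ".hermes" / name
        if f.is_file():
            parts.append(f"## {name}\n{f.read_text().strip()}")
    return "\n\n".join(parts)
```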

Files to modify: New project_context.py, auto-load in run_agent.py


8. Graceful Degradation & Robustness 🛡️

Problem: When things go wrong, recovery is limited. Should fail gracefully.

Ideas:

  • Fallback chains - When primary approach fails, have backups:

    • web_extract fails → try browser_navigate → try web_search for cached version
    • Define fallback order per tool type
  • Partial progress preservation - Don't lose work on failure:

    • Long task fails midway → save what we've got
    • "I completed 3/5 steps before the error. Here's what I have..."
  • Self-healing - Detect and recover from bad states:

    • Browser stuck → close and retry
    • Terminal hung → timeout and reset

Files to modify: model_tools.py, tool implementations, new fallback_manager.py


9. Tools & Skills Wishlist 🧰

Things that would need new tool implementations (can't do well with current tools):

High-Impact

  • Audio/Video Transcription 🎬 (See also: Section 16 for detailed spec)

    • Transcribe audio files, podcasts, YouTube videos
    • Extract key moments from video
    • Voice memo transcription for messaging integrations
    • Provider options: Whisper API, Deepgram, local Whisper
  • Diagram Rendering 📊

    • Render Mermaid/PlantUML to actual images
    • Can generate the code, but rendering requires external service or tool
    • "Show me how these components connect" → actual visual diagram

Medium-Impact

  • Canvas / Visual Workspace 🖼️

    • Agent-controlled visual panel for rendering interactive UI
    • Inspired by OpenClaw's Canvas feature
    • Capabilities:
      • present / hide - Show/hide the canvas panel
      • navigate - Load HTML files or URLs into the canvas
      • eval - Execute JavaScript in the canvas context
      • snapshot - Capture the rendered UI as an image
    • Use cases:
      • Display generated HTML/CSS/JS previews
      • Show interactive data visualizations (charts, graphs)
      • Render diagrams (Mermaid → rendered output)
      • Present structured information in rich format
      • A2UI-style component system for structured agent UI
    • Implementation options:
      • Electron-based panel for CLI
      • WebSocket-connected web app
      • VS Code webview extension
    • Would let agent "show" things rather than just describe them
  • Document Generation 📄

    • Create styled PDFs, Word docs, presentations
    • Can do basic PDF via terminal tools, but limited
  • Diff/Patch Tool 📝

    • Surgical code modifications with preview
    • "Change line 45-50 to X" without rewriting whole file
    • Show diffs before applying
    • Can use diff/patch but a native tool would be safer

Skills to Create

  • Domain-specific skill packs:

    • DevOps/Infrastructure (Terraform, K8s, AWS)
    • Data Science workflows (EDA, model training)
    • Security/pentesting procedures
  • Framework-specific skills:

    • React/Vue/Angular patterns
    • Django/Rails/Express conventions
    • Database optimization playbooks
  • Troubleshooting flowcharts:

    • "Docker container won't start" → decision tree
    • "Production is slow" → systematic diagnosis

10. Messaging Platform Integrations 💬 COMPLETE

Problem: Agent currently only works via cli.py, which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.

Architecture:

  • run_agent.py already accepts conversation_history parameter and returns updated messages
  • Need: persistent session storage, platform monitors, session key resolution

Implementation approach:

┌─────────────────────────────────────────────────────────────┐
│  Platform Monitor (e.g., telegram_monitor.py)               │
│  ├─ Long-running daemon connecting to messaging platform    │
│  ├─ On message: resolve session key → load history from disk│
│  ├─ Call run_agent.py with loaded history                   │
│  ├─ Save updated history back to disk (JSONL)               │
│  └─ Send response back to platform                          │
└─────────────────────────────────────────────────────────────┘

Platform support (each user sets up their own credentials):

  • Telegram - via python-telegram-bot
    • Bot token from @BotFather
    • Easiest to set up, good for personal use
  • Discord - via discord.py
    • Bot token from Discord Developer Portal
    • Can work in servers (group sessions) or DMs
  • WhatsApp - via Node.js bridge (whatsapp-web.js/baileys)
    • Requires Node.js bridge setup
    • More complex, but reaches most people

Session management:

  • Session store - JSONL persistence per session key
    • ~/.hermes/sessions/{session_id}.jsonl
    • Session keys: agent:main:telegram:dm, agent:main:discord:group:123, etc.
  • Session expiry - Configurable reset policies
    • Daily reset (default 4am) OR idle timeout (default 2 hours)
    • Manual reset via /reset or /new command in chat
    • Per-platform and per-type overrides
  • Session continuity - Conversations persist across messages until reset

Files created: gateway/, gateway/platforms/, gateway/config.py, gateway/session.py, gateway/delivery.py, gateway/run.py

Configuration:

  • Environment variables: TELEGRAM_BOT_TOKEN, DISCORD_BOT_TOKEN, etc.
  • Config file: ~/.hermes/gateway.json
  • CLI commands: /platforms to check status, --gateway to start

Dynamic context injection:

  • Agent knows its source platform and chat
  • Agent knows connected platforms and home channels
  • Agent can deliver cron outputs to specific platforms

11. Scheduled Tasks / Cron Jobs COMPLETE

Problem: Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).

Solution Implemented:

  • Cron-style scheduler - Run agent turns on a schedule

    • Jobs stored in ~/.hermes/cron/jobs.json
    • Each job: { id, name, prompt, schedule, repeat, enabled, next_run_at, ... }
    • Built-in scheduler daemon or system cron integration
  • Schedule formats:

    • Duration: 30m, 2h, 1d (one-shot delay)
    • Interval: every 30m, every 2h (recurring)
    • Cron expression: 0 9 * * * (requires croniter package)
    • ISO timestamp: 2026-02-03T14:00:00 (one-shot at specific time)
  • Repeat options:

    • repeat=None (or omit): One-shot schedules run once; intervals/cron run forever
    • repeat=1: Run once then auto-delete
    • repeat=N: Run exactly N times then auto-delete
  • CLI interface:

    # List scheduled jobs
    /cron
    /cron list
    
    # Add a one-shot job (runs once in 30 minutes)
    /cron add 30m "Remind me to check the build status"
    
    # Add a recurring job (every 2 hours)
    /cron add "every 2h" "Check server status at 192.168.1.100"
    
    # Add a cron expression (daily at 9am)
    /cron add "0 9 * * *" "Generate morning briefing"
    
    # Remove a job
    /cron remove <job_id>
    
  • Agent self-scheduling tools (hermes-cli toolset):

    • schedule_cronjob(prompt, schedule, name?, repeat?) - Create a scheduled task
    • list_cronjobs() - View all scheduled jobs
    • remove_cronjob(job_id) - Cancel a job
    • Tool descriptions emphasize: cronjobs run in isolated sessions with NO context
  • Daemon modes:

    # Built-in daemon (checks every 60 seconds)
    python cli.py --cron-daemon
    
    # Single tick for system cron integration
    python cli.py --cron-tick-once
    
  • Output storage: ~/.hermes/cron/output/{job_id}/{timestamp}.md
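
The four schedule formats can be classified with a small parser; a sketch of the idea (the actual cron/jobs.py parser may differ, and real cron expressions are delegated to croniter):

```python
import re
from datetime import datetime, timedelta

def parse_schedule(spec: str, now: datetime):
    """Classify a schedule spec and return (kind, next_run_at).
    Handles durations (30m), intervals (every 2h), and ISO timestamps;
    anything else is assumed to be a cron expression for croniter."""
    m = re.fullmatch(r"(every\s+)?(\d+)([mhd])", spec.strip())
    if m:
        unit = {"m": "minutes", "h": "hours", "d": "days"}[m.group(3)]
        delta = timedelta(**{unit: int(m.group(2))})
        kind = "interval" if m.group(1) else "one-shot"
        return kind, now + delta
    try:
        return "one-shot", datetime.fromisoformat(spec)
    except ValueError:
        return "cron", None  # e.g. "0 9 * * *" -> next_run_at via croniter
```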

Files created: cron/__init__.py, cron/jobs.py, cron/scheduler.py, tools/cronjob_tools.py

Toolset: hermes-cli (default for CLI) includes cronjob tools; not in batch runner toolsets


12. Text-to-Speech (TTS) 🔊

Problem: Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).

Ideas:

  • TTS tool - Generate audio files from text

    tts_generate(text="Here's your summary...", voice="nova", output="summary.mp3")
    
    • Returns path to generated audio file
    • For messaging integrations: can send as voice message
  • Provider options:

    • Edge TTS (free, good quality, many voices)
    • OpenAI TTS (paid, excellent quality)
    • ElevenLabs (paid, best quality, voice cloning)
    • Local options (Coqui TTS, Bark)
  • Modes:

    • On-demand: User explicitly asks "read this to me"
    • Auto-TTS: Configurable to always generate audio for responses
    • Long-text handling: Summarize or chunk very long responses
  • Integration with messaging:

    • When enabled, can send voice notes instead of/alongside text
    • User preference per channel

Files to create: tools/tts_tool.py, config in cli-config.yaml


13. Speech-to-Text / Audio Transcription 🎤

Problem: Users may want to send voice memos instead of typing. Agent is blind to audio content.

Ideas:

  • Voice memo transcription - For messaging integrations

    • User sends voice message → transcribe → process as text
    • Seamless: user speaks, agent responds
  • Audio/video file transcription - Existing idea, expanded:

    • Transcribe local audio files (mp3, wav, m4a)
    • Transcribe YouTube videos (download audio → transcribe)
    • Extract key moments with timestamps
  • Provider options:

    • OpenAI Whisper API (good quality, cheap)
    • Deepgram (fast, good for real-time)
    • Local Whisper (free, runs on GPU)
    • Groq Whisper (fast, free tier available)
  • Tool interface:

    transcribe(source="audio.mp3")  # Local file
    transcribe(source="https://youtube.com/...")  # YouTube
    transcribe(source="voice_message", data=bytes)  # Voice memo
    

Files to create: tools/transcribe_tool.py, integrate with messaging monitors


Priority Order (Suggested)

  1. 🎯 Subagent Architecture - Critical for context management, enables everything else
  2. Memory & Context Management - Complements subagents for remaining context
  3. Self-Reflection - Improves reliability and reduces wasted tool calls
  4. Project-Local Context - Practical win, keeps useful info across sessions
  5. Messaging Integrations - Unlocks mobile access, new interaction patterns
  6. Scheduled Tasks / Cron Jobs - Enables automation, reminders, monitoring
  7. Tool Composition - Quality of life, builds on other improvements
  8. Dynamic Skills - Force multiplier for repeated tasks
  9. Interactive Clarifying Questions - Better UX for ambiguous tasks
  10. TTS / Audio Transcription - Accessibility, hands-free use

Removed Items (Unrealistic)

The following were removed because they're architecturally impossible:

  • Proactive suggestions / Prefetching - Agent only runs on user request, can't interject
  • Clipboard integration - No access to user's local system clipboard

The following moved to active TODO (now possible with new architecture):

  • Session save/restore → See Messaging Integrations (session persistence)
  • Voice/TTS playback → See TTS (can generate audio files, send via messaging)
  • Set reminders → See Scheduled Tasks / Cron Jobs

The following were removed because they're already possible:

  • HTTP/API Client → Use curl or Python requests in terminal
  • Structured Data Manipulation → Use pandas in terminal
  • Git-Native Operations → Use git CLI in terminal
  • Symbolic Math → Use SymPy in terminal
  • Code Quality Tools → Run linters (eslint, black, mypy) in terminal
  • Testing Framework → Run pytest, jest, etc. in terminal
  • Translation → LLM handles this fine, or use translation APIs


🧪 Brainstorm Ideas (Not Yet Fleshed Out)

These are early-stage ideas that need more thinking before implementation. Captured here so they don't get lost.

Remote/Distributed Execution 🌐

Concept: Run agent on a powerful remote server while interacting from a thin client.

Why interesting:

  • Run on beefy GPU server for local LLM inference
  • Agent has access to remote machine's resources (files, tools, internet)
  • User interacts via lightweight client (phone, low-power laptop)

Open questions:

  • How does this differ from just SSH + running cli.py on remote?
  • Would need secure communication channel (WebSocket? gRPC?)
  • How to handle tool outputs that reference remote paths?
  • Credential management for remote execution
  • Latency considerations for interactive use

Possible architecture:

┌─────────────┐         ┌─────────────────────────┐
│ Thin Client │ ◄─────► │ Remote Hermes Server    │
│ (phone/web) │  WS/API │ - Full agent + tools    │
└─────────────┘         │ - GPU for local LLM     │
                        │ - Access to server files│
                        └─────────────────────────┘

Related to: Messaging integrations (could be the "server" that monitors receive from)


Multi-Agent Parallel Execution 🤖🤖

Concept: Extension of Subagent Architecture (Section 1) - run multiple subagents in parallel.

Why interesting:

  • Independent subtasks don't need to wait for each other
  • "Research X while setting up Y" - both run simultaneously
  • Faster completion for complex multi-part tasks

Open questions:

  • How to detect which tasks are truly independent?
  • Resource management (API rate limits, concurrent connections)
  • How to merge results when parallel tasks have conflicts?
  • Cost implications of multiple parallel LLM calls

Note: Basic subagent delegation (Section 1) should be implemented first, parallel execution is an optimization on top.


Plugin/Extension System 🔌

Concept: Allow users to add custom tools/skills without modifying core code.

Why interesting:

  • Community contributions
  • Organization-specific tools
  • Clean separation of core vs. extensions

Open questions:

  • Security implications of loading arbitrary code
  • Versioning and compatibility
  • Discovery and installation UX

Last updated: $(date +%Y-%m-%d) 🤖