Refactor TODO list and remove completed items

Removed high-priority immediate fixes section and reorganized the TODO list. Updated various sections to reflect new priorities and ideas.
2026-02-02 23:08:27 -08:00
parent c9011fc7e1
commit be91af7551
1 changed files with 7 additions and 317 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -4,101 +4,6 @@

 ---

-## 🚨 HIGH PRIORITY - Immediate Fixes
-
-These items need to be addressed ASAP:
-
-### 1. SUDO Breaking Terminal Tool 🔐 ✅ COMPLETE
- [x] **Problem:** SUDO commands break the terminal tool execution (hangs indefinitely)
- [x] **Fix:** Created custom environment wrappers in `tools/terminal_tool.py`
-  - `stdin=subprocess.DEVNULL` prevents hanging on interactive prompts
-  - Sudo fails gracefully with clear error if no password configured
-  - Same UX as Claude Code - agent sees error, tells user to run it themselves
- [x] **All 5 environments now have consistent behavior:**
-  - `_LocalEnvironment` - local execution
-  - `_DockerEnvironment` - Docker containers
-  - `_SingularityEnvironment` - Singularity/Apptainer containers
-  - `_ModalEnvironment` - Modal cloud sandboxes
-  - `_SSHEnvironment` - remote SSH execution
- [x] **Optional sudo support via `SUDO_PASSWORD` env var:**
-  - Shared `_transform_sudo_command()` helper used by all environments
-  - If set, auto-transforms `sudo cmd` → pipes password via `sudo -S`
-  - Documented in `.env.example`, `cli-config.yaml`, and README
-  - Works for chained commands: `cmd1 && sudo cmd2`
- [x] **Interactive sudo prompt in CLI mode:**
-  - When sudo detected and no password configured, prompts user
-  - 45-second timeout (auto-skips if no input)
-  - Hidden password input via `getpass` (password not visible)
-  - Password cached for session (don't ask repeatedly)
-  - Spinner pauses during prompt for clean UX
-  - Uses `HERMES_INTERACTIVE` env var to detect CLI mode
-
-### 2. Fix `browser_get_images` Tool 🖼️ ✅ VERIFIED WORKING
- [x] **Tested:** Tool works correctly on multiple sites
- [x] **Results:** Successfully extracts image URLs, alt text, dimensions
- [x] **Note:** Some sites (Pixabay, etc.) have Cloudflare bot protection that blocks headless browsers - this is expected behavior, not a bug
-
-### 3. Better Action Logging for Debugging 📝 ✅ COMPLETE
- [x] **Problem:** Need better logging of agent actions for debugging
- [x] **Implementation:**
-  - Save full session trajectories to `logs/` directory as JSON
-  - Each session gets a unique file: `session_YYYYMMDD_HHMMSS_UUID.json`
-  - Logs all messages, tool calls with inputs/outputs, timestamps
-  - Structured JSON format for easy parsing and replay
-  - Automatic on CLI runs (configurable)
-
-### 4. Automatic Context Compression 🗜️ ✅ COMPLETE
- [x] **Problem:** Long conversations exceed model context limits, causing errors
- [x] **Solution:** Auto-compress middle turns when approaching limit
- [x] **Implementation:**
-  - Fetches model context lengths from OpenRouter `/api/v1/models` API (cached 1hr)
-  - Tracks actual token usage from API responses (`usage.prompt_tokens`)
-  - Triggers at 85% of model's context limit (configurable)
-  - Protects first 3 turns (system, initial request, first response)
-  - Protects last 4 turns (recent context most relevant)
-  - Summarizes middle turns using fast model (Gemini Flash)
-  - Inserts summary as user message, conversation continues seamlessly
-  - If context error occurs, attempts compression before failing
- [x] **Configuration (cli-config.yaml / env vars):**
-  - `CONTEXT_COMPRESSION_ENABLED` (default: true)
-  - `CONTEXT_COMPRESSION_THRESHOLD` (default: 0.85 = 85%)
-  - `CONTEXT_COMPRESSION_MODEL` (default: google/gemini-2.0-flash-001)
-
-### 5. Stream Thinking Summaries in Real-Time 💭 ⏸️ DEFERRED
- [ ] **Problem:** Thinking/reasoning summaries not shown while streaming
- [ ] **Complexity:** This is a significant refactor - leaving for later
-
-**OpenRouter Streaming Info:**
- Uses `stream=True` with OpenAI SDK
- Reasoning comes in `choices[].delta.reasoning_details` chunks
- Types: `reasoning.summary`, `reasoning.text`, `reasoning.encrypted`
- Tool call arguments stream as partial JSON (need accumulation)
- Items paradigm: same ID emitted multiple times with updated content
-
-**Key Challenges:**
- Tool call JSON accumulation (partial `{"query": "wea` → `{"query": "weather"}`)
- Multiple concurrent outputs (thinking + tool calls + text simultaneously)
- State management for partial responses
- Error handling if connection drops mid-stream
- Deciding when tool calls are "complete" enough to execute
-
-**UX Questions to Resolve:**
- Show raw thinking text or summarized?
- Live expanding text vs. spinner replacement?
- Markdown rendering while streaming?
- How to handle thinking + tool call display simultaneously?
-
-**Implementation Options:**
- New `run_conversation_streaming()` method (keep non-streaming as fallback)
- Wrapper that handles streaming internally
- Big refactor of existing `run_conversation()`
-
-**References:**
- https://openrouter.ai/docs/api/reference/streaming
- https://openrouter.ai/docs/guides/best-practices/reasoning-tokens#streaming-response
-
---
-
 ## 1. Subagent Architecture (Context Isolation) 🎯

 **Problem:** Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single `ls -la` can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.
@@ -218,38 +123,7 @@ These items need to be addressed ASAP:

 ---

-## 3. Tool Composition & Learning 🔧
-
-**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
-
-**Ideas:**
- [ ] **Macro tools / Tool chains** - Define reusable tool sequences:
-  ```yaml
-  research_topic:
-    description: "Deep research on a topic"
-    steps:
-      - web_search: {query: "$topic"}
-      - web_extract: {urls: "$search_results.urls[:3]"}
-      - summarize: {content: "$extracted"}
-  ```
-  - Could be defined in skills or a new `macros/` directory
-  - Agent can invoke macro as single tool call
-  
- [ ] **Tool failure patterns** - Learn from failures:
-  - Track: tool, input pattern, error type, what worked instead
-  - Before calling a tool, check: "Has this pattern failed before?"
-  - Persistent across sessions (stored in skills or separate DB)
-  
- [ ] **Parallel tool execution** - When tools are independent, run concurrently:
-  - Detect independence (no data dependencies between calls)
-  - Use `asyncio.gather()` for parallel execution
-  - Already have async support in some tools, just need orchestration
-
-**Files to modify:** `model_tools.py`, `toolsets.py`, new `tool_macros.py`
-
---
-
-## 4. Dynamic Skills Expansion 📚
+## 3. Dynamic Skills Expansion 📚

 **Problem:** Skills system is elegant but static. Skills must be manually created and added.

@@ -278,7 +152,7 @@ These items need to be addressed ASAP:

 ---

-## 5. Interactive Clarifying Questions Tool ❓
+## 4. Interactive Clarifying Questions Tool ❓

 **Problem:** Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.

@@ -314,7 +188,7 @@ These items need to be addressed ASAP:

 ---

-## 6. Collaborative Problem Solving 🤝
+## 5. Collaborative Problem Solving 🤝

 **Problem:** Interaction is command/response. Complex problems benefit from dialogue.

@@ -333,7 +207,7 @@ These items need to be addressed ASAP:

 ---

-## 7. Project-Local Context 💾
+## 6. Project-Local Context 💾

 **Problem:** Valuable context lost between sessions.

@@ -351,30 +225,7 @@ These items need to be addressed ASAP:

 **Files to modify:** New `project_context.py`, auto-load in `run_agent.py`

---
-
-## 8. Graceful Degradation & Robustness 🛡️
-
-**Problem:** When things go wrong, recovery is limited. Should fail gracefully.
-
-**Ideas:**
- [ ] **Fallback chains** - When primary approach fails, have backups:
-  - `web_extract` fails → try `browser_navigate` → try `web_search` for cached version
-  - Define fallback order per tool type
-  
- [ ] **Partial progress preservation** - Don't lose work on failure:
-  - Long task fails midway → save what we've got
-  - "I completed 3/5 steps before the error. Here's what I have..."
-  
- [ ] **Self-healing** - Detect and recover from bad states:
-  - Browser stuck → close and retry
-  - Terminal hung → timeout and reset
-
-**Files to modify:** `model_tools.py`, tool implementations, new `fallback_manager.py`
-
---
-
-## 9. Tools & Skills Wishlist 🧰
+## 6. Tools & Skills Wishlist 🧰

 *Things that would need new tool implementations (can't do well with current tools):*

@@ -441,7 +292,7 @@ These items need to be addressed ASAP:

 ---

-## 10. Messaging Platform Integrations 💬 ✅ COMPLETE
+## 7. Messaging Platform Integrations 💬 ✅ COMPLETE

 **Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.

@@ -496,71 +347,7 @@ These items need to be addressed ASAP:

 ---

-## 11. Scheduled Tasks / Cron Jobs ⏰ ✅ COMPLETE
-
-**Problem:** Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).
-
-**Solution Implemented:**
-
- [x] **Cron-style scheduler** - Run agent turns on a schedule
-  - Jobs stored in `~/.hermes/cron/jobs.json`
-  - Each job: `{ id, name, prompt, schedule, repeat, enabled, next_run_at, ... }`
-  - Built-in scheduler daemon or system cron integration
-  
- [x] **Schedule formats:**
-  - Duration: `30m`, `2h`, `1d` (one-shot delay)
-  - Interval: `every 30m`, `every 2h` (recurring)
-  - Cron expression: `0 9 * * *` (requires `croniter` package)
-  - ISO timestamp: `2026-02-03T14:00:00` (one-shot at specific time)
-
- [x] **Repeat options:**
-  - `repeat=None` (or omit): One-shot schedules run once; intervals/cron run forever
-  - `repeat=1`: Run once then auto-delete
-  - `repeat=N`: Run exactly N times then auto-delete
-  
- [x] **CLI interface:**
-  ```bash
-  # List scheduled jobs
-  /cron
-  /cron list
-  
-  # Add a one-shot job (runs once in 30 minutes)
-  /cron add 30m "Remind me to check the build status"
-  
-  # Add a recurring job (every 2 hours)
-  /cron add "every 2h" "Check server status at 192.168.1.100"
-  
-  # Add a cron expression (daily at 9am)
-  /cron add "0 9 * * *" "Generate morning briefing"
-  
-  # Remove a job
-  /cron remove <job_id>
-  ```
-
- [x] **Agent self-scheduling tools** (hermes-cli toolset):
-  - `schedule_cronjob(prompt, schedule, name?, repeat?)` - Create a scheduled task
-  - `list_cronjobs()` - View all scheduled jobs
-  - `remove_cronjob(job_id)` - Cancel a job
-  - Tool descriptions emphasize: **cronjobs run in isolated sessions with NO context**
-
- [x] **Daemon modes:**
-  ```bash
-  # Built-in daemon (checks every 60 seconds)
-  python cli.py --cron-daemon
-  
-  # Single tick for system cron integration
-  python cli.py --cron-tick-once
-  ```
-
- [x] **Output storage:** `~/.hermes/cron/output/{job_id}/{timestamp}.md`
-
-**Files created:** `cron/__init__.py`, `cron/jobs.py`, `cron/scheduler.py`, `tools/cronjob_tools.py`
-
-**Toolset:** `hermes-cli` (default for CLI) includes cronjob tools; not in batch runner toolsets
-
---
-
-## 12. Text-to-Speech (TTS) 🔊
+## 8. Text-to-Speech (TTS) 🔊

 **Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).

@@ -620,103 +407,6 @@ These items need to be addressed ASAP:

 **Files to create:** `tools/transcribe_tool.py`, integrate with messaging monitors

---
-
-## Priority Order (Suggested)
-
-1. **🎯 Subagent Architecture** - Critical for context management, enables everything else
-2. **Memory & Context Management** - Complements subagents for remaining context
-3. **Self-Reflection** - Improves reliability and reduces wasted tool calls  
-4. **Project-Local Context** - Practical win, keeps useful info across sessions
-5. **Messaging Integrations** - Unlocks mobile access, new interaction patterns
-6. **Scheduled Tasks / Cron Jobs** - Enables automation, reminders, monitoring
-7. **Tool Composition** - Quality of life, builds on other improvements
-8. **Dynamic Skills** - Force multiplier for repeated tasks
-9. **Interactive Clarifying Questions** - Better UX for ambiguous tasks
-10. **TTS / Audio Transcription** - Accessibility, hands-free use
-
---
-
-## Removed Items (Unrealistic)
-
-The following were removed because they're architecturally impossible:
-
- ~~Proactive suggestions / Prefetching~~ - Agent only runs on user request, can't interject
- ~~Clipboard integration~~ - No access to user's local system clipboard
-
-The following **moved to active TODO** (now possible with new architecture):
-
- ~~Session save/restore~~ → See **Messaging Integrations** (session persistence)
- ~~Voice/TTS playback~~ → See **TTS** (can generate audio files, send via messaging)
- ~~Set reminders~~ → See **Scheduled Tasks / Cron Jobs**
-
-The following were removed because they're **already possible**:
-
- ~~HTTP/API Client~~ → Use `curl` or Python `requests` in terminal
- ~~Structured Data Manipulation~~ → Use `pandas` in terminal
- ~~Git-Native Operations~~ → Use `git` CLI in terminal
- ~~Symbolic Math~~ → Use `SymPy` in terminal
- ~~Code Quality Tools~~ → Run linters (`eslint`, `black`, `mypy`) in terminal
- ~~Testing Framework~~ → Run `pytest`, `jest`, etc. in terminal
- ~~Translation~~ → LLM handles this fine, or use translation APIs
-
---
-
---
-
-## 🧪 Brainstorm Ideas (Not Yet Fleshed Out)
-
-*These are early-stage ideas that need more thinking before implementation. Captured here so they don't get lost.*
-
-### Remote/Distributed Execution 🌐
-
-**Concept:** Run agent on a powerful remote server while interacting from a thin client.
-
-**Why interesting:**
- Run on beefy GPU server for local LLM inference
- Agent has access to remote machine's resources (files, tools, internet)
- User interacts via lightweight client (phone, low-power laptop)
-
-**Open questions:**
- How does this differ from just SSH + running cli.py on remote?
- Would need secure communication channel (WebSocket? gRPC?)
- How to handle tool outputs that reference remote paths?
- Credential management for remote execution
- Latency considerations for interactive use
-
-**Possible architecture:**
-```
-┌─────────────┐         ┌─────────────────────────┐
-│ Thin Client │ ◄─────► │ Remote Hermes Server    │
-│ (phone/web) │  WS/API │ - Full agent + tools    │
-└─────────────┘         │ - GPU for local LLM     │
-                        │ - Access to server files│
-                        └─────────────────────────┘
-```
-
-**Related to:** Messaging integrations (could be the "server" that monitors receive from)
-
---
-
-### Multi-Agent Parallel Execution 🤖🤖
-
-**Concept:** Extension of Subagent Architecture (Section 1) - run multiple subagents in parallel.
-
-**Why interesting:**
- Independent subtasks don't need to wait for each other
- "Research X while setting up Y" - both run simultaneously
- Faster completion for complex multi-part tasks
-
-**Open questions:**
- How to detect which tasks are truly independent?
- Resource management (API rate limits, concurrent connections)
- How to merge results when parallel tasks have conflicts?
- Cost implications of multiple parallel LLM calls
-
-*Note: Basic subagent delegation (Section 1) should be implemented first, parallel execution is an optimization on top.*
-
---
-
 ### Plugin/Extension System 🔌

 **Concept:** Allow users to add custom tools/skills without modifying core code.