Refactor TODO list and remove completed items

Removed high-priority immediate fixes section and reorganized the TODO list. Updated various sections to reflect new priorities and ideas.
Teknium
2026-02-02 23:08:27 -08:00
committed by GitHub
parent c9011fc7e1
commit be91af7551

324
TODO.md

@@ -4,101 +4,6 @@
---
## 🚨 HIGH PRIORITY - Immediate Fixes
These items need to be addressed ASAP:
### 1. SUDO Breaking Terminal Tool 🔐 ✅ COMPLETE
- [x] **Problem:** SUDO commands break the terminal tool execution (hangs indefinitely)
- [x] **Fix:** Created custom environment wrappers in `tools/terminal_tool.py`
- `stdin=subprocess.DEVNULL` prevents hanging on interactive prompts
- Sudo fails gracefully with clear error if no password configured
- Same UX as Claude Code - agent sees error, tells user to run it themselves
- [x] **All 5 environments now have consistent behavior:**
- `_LocalEnvironment` - local execution
- `_DockerEnvironment` - Docker containers
- `_SingularityEnvironment` - Singularity/Apptainer containers
- `_ModalEnvironment` - Modal cloud sandboxes
- `_SSHEnvironment` - remote SSH execution
- [x] **Optional sudo support via `SUDO_PASSWORD` env var:**
- Shared `_transform_sudo_command()` helper used by all environments
- If set, auto-transforms `sudo cmd` → pipes password via `sudo -S`
- Documented in `.env.example`, `cli-config.yaml`, and README
- Works for chained commands: `cmd1 && sudo cmd2`
- [x] **Interactive sudo prompt in CLI mode:**
- When sudo detected and no password configured, prompts user
- 45-second timeout (auto-skips if no input)
- Hidden password input via `getpass` (password not visible)
- Password cached for session (don't ask repeatedly)
- Spinner pauses during prompt for clean UX
- Uses `HERMES_INTERACTIVE` env var to detect CLI mode
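**Sketch (illustrative only):** the `sudo cmd` → `sudo -S` rewrite described above could look roughly like this. The function name `transform_sudo_command` and the regex are assumptions made for this example; the actual `_transform_sudo_command()` helper in `tools/terminal_tool.py` may work differently.

```python
import re
import shlex
from typing import Optional

# Naive match for a standalone `sudo` word (assumption: good enough for a sketch;
# it would also rewrite the word "sudo" inside a quoted string).
_SUDO_RE = re.compile(r"(?<![\w/.-])sudo(?=\s)")


def transform_sudo_command(command: str, password: Optional[str]) -> str:
    """Rewrite each `sudo` call so the password is piped in via `sudo -S`."""
    if not password:
        # No password configured: leave the command alone so sudo fails cleanly.
        return command
    # -S reads the password from stdin; -p '' suppresses the prompt text.
    feed = "printf '%s\\n' " + shlex.quote(password) + " | sudo -S -p ''"
    # A callable replacement avoids backslash processing in the replacement text.
    return _SUDO_RE.sub(lambda _match: feed, command)


print(transform_sudo_command("cmd1 && sudo cmd2", "pw"))
```

Because `|` binds tighter than `&&` in the shell, the chained form still runs `cmd1` first and only then the piped `sudo`. Note that in this sketch the password ends up inside the command string, so a real implementation would need to keep it out of logs.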
### 2. Fix `browser_get_images` Tool 🖼️ ✅ VERIFIED WORKING
- [x] **Tested:** Tool works correctly on multiple sites
- [x] **Results:** Successfully extracts image URLs, alt text, dimensions
- [x] **Note:** Some sites (Pixabay, etc.) have Cloudflare bot protection that blocks headless browsers - this is expected behavior, not a bug
### 3. Better Action Logging for Debugging 📝 ✅ COMPLETE
- [x] **Problem:** Need better logging of agent actions for debugging
- [x] **Implementation:**
- Save full session trajectories to `logs/` directory as JSON
- Each session gets a unique file: `session_YYYYMMDD_HHMMSS_UUID.json`
- Logs all messages, tool calls with inputs/outputs, timestamps
- Structured JSON format for easy parsing and replay
- Automatic on CLI runs (configurable)
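**Sketch (illustrative only):** one way to produce the `session_YYYYMMDD_HHMMSS_UUID.json` files described above. The helper names, the 8-character UUID suffix, and the `{"messages": [...]}` layout are assumptions for this example, not the shipped format.

```python
import json
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional


def session_log_path(log_dir: str, now: Optional[datetime] = None) -> Path:
    """Build a unique per-session log path under `log_dir`."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Assumption: a short random suffix is enough to avoid same-second collisions.
    return Path(log_dir) / f"session_{stamp}_{uuid.uuid4().hex[:8]}.json"


def save_trajectory(path: Path, messages: list) -> None:
    """Write the full message list as structured JSON for later parsing or replay."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"saved_at": datetime.now().isoformat(), "messages": messages}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
```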
### 4. Automatic Context Compression 🗜️ ✅ COMPLETE
- [x] **Problem:** Long conversations exceed model context limits, causing errors
- [x] **Solution:** Auto-compress middle turns when approaching limit
- [x] **Implementation:**
- Fetches model context lengths from OpenRouter `/api/v1/models` API (cached 1hr)
- Tracks actual token usage from API responses (`usage.prompt_tokens`)
- Triggers at 85% of model's context limit (configurable)
- Protects first 3 turns (system, initial request, first response)
- Protects last 4 turns (recent context most relevant)
- Summarizes middle turns using fast model (Gemini Flash)
- Inserts summary as user message, conversation continues seamlessly
- If context error occurs, attempts compression before failing
- [x] **Configuration (cli-config.yaml / env vars):**
- `CONTEXT_COMPRESSION_ENABLED` (default: true)
- `CONTEXT_COMPRESSION_THRESHOLD` (default: 0.85 = 85%)
- `CONTEXT_COMPRESSION_MODEL` (default: google/gemini-2.0-flash-001)
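**Sketch (illustrative only):** the turn-protection rule above (keep the first 3 and last 4 turns, summarize the middle) can be expressed as a simple split. The function names are hypothetical, and the summarization call itself is left out.

```python
from typing import List, Tuple


def should_compress(prompt_tokens: int, context_limit: int, threshold: float = 0.85) -> bool:
    """True once reported prompt tokens reach the configured share of the limit."""
    return prompt_tokens >= threshold * context_limit


def split_for_compression(
    messages: List[dict], protect_first: int = 3, protect_last: int = 4
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Return (head, middle, tail); only `middle` would be summarized."""
    if len(messages) <= protect_first + protect_last:
        # Nothing in the middle to compress.
        return list(messages), [], []
    cut = len(messages) - protect_last
    return messages[:protect_first], messages[protect_first:cut], messages[cut:]
```

After summarizing `middle`, the conversation would be rebuilt as `head + [summary_message] + tail`, matching the "insert summary as user message" step.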
### 5. Stream Thinking Summaries in Real-Time 💭 ⏸️ DEFERRED
- [ ] **Problem:** Thinking/reasoning summaries not shown while streaming
- [ ] **Complexity:** This is a significant refactor - leaving for later
**OpenRouter Streaming Info:**
- Uses `stream=True` with OpenAI SDK
- Reasoning comes in `choices[].delta.reasoning_details` chunks
- Types: `reasoning.summary`, `reasoning.text`, `reasoning.encrypted`
- Tool call arguments stream as partial JSON (need accumulation)
- Items paradigm: same ID emitted multiple times with updated content
**Key Challenges:**
- Tool call JSON accumulation (partial `{"query": "wea` → `{"query": "weather"}`)
- Multiple concurrent outputs (thinking + tool calls + text simultaneously)
- State management for partial responses
- Error handling if connection drops mid-stream
- Deciding when tool calls are "complete" enough to execute
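**Sketch (illustrative only):** the JSON-accumulation challenge above could be handled by buffering argument fragments per tool call and parsing only once the stream signals completion. This assumes OpenAI-style deltas where each tool call fragment carries an `index`; the class name `ToolCallAccumulator` is hypothetical.

```python
import json
from typing import Dict, List, Optional


class ToolCallAccumulator:
    """Collects partial tool call fragments until they form complete JSON."""

    def __init__(self) -> None:
        self._calls: Dict[int, Dict[str, str]] = {}

    def add_delta(self, index: int, name: Optional[str] = None, arguments: str = "") -> None:
        call = self._calls.setdefault(index, {"name": "", "arguments": ""})
        if name:
            call["name"] = name
        # Fragments are plain string pieces; they are not valid JSON on their own.
        call["arguments"] += arguments

    def finish(self) -> List[dict]:
        """Parse buffered arguments; raises json.JSONDecodeError if the stream was cut off."""
        return [
            {"name": call["name"], "arguments": json.loads(call["arguments"] or "{}")}
            for _, call in sorted(self._calls.items())
        ]


acc = ToolCallAccumulator()
acc.add_delta(0, name="web_search", arguments='{"query": "wea')
acc.add_delta(0, arguments='ther"}')
print(acc.finish())
```

Deferring `json.loads` to `finish()` sidesteps the question of when a call is "complete enough": nothing executes until the whole argument string has arrived.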
**UX Questions to Resolve:**
- Show raw thinking text or summarized?
- Live expanding text vs. spinner replacement?
- Markdown rendering while streaming?
- How to handle thinking + tool call display simultaneously?
**Implementation Options:**
- New `run_conversation_streaming()` method (keep non-streaming as fallback)
- Wrapper that handles streaming internally
- Big refactor of existing `run_conversation()`
**References:**
- https://openrouter.ai/docs/api/reference/streaming
- https://openrouter.ai/docs/guides/best-practices/reasoning-tokens#streaming-response
---
## 1. Subagent Architecture (Context Isolation) 🎯
**Problem:** Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single `ls -la` can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.
@@ -218,38 +123,7 @@ These items need to be addressed ASAP:
---
## 3. Tool Composition & Learning 🔧
**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
**Ideas:**
- [ ] **Macro tools / Tool chains** - Define reusable tool sequences:
```yaml
research_topic:
description: "Deep research on a topic"
steps:
- web_search: {query: "$topic"}
- web_extract: {urls: "$search_results.urls[:3]"}
- summarize: {content: "$extracted"}
```
- Could be defined in skills or a new `macros/` directory
- Agent can invoke macro as single tool call
- [ ] **Tool failure patterns** - Learn from failures:
- Track: tool, input pattern, error type, what worked instead
- Before calling a tool, check: "Has this pattern failed before?"
- Persistent across sessions (stored in skills or separate DB)
- [ ] **Parallel tool execution** - When tools are independent, run concurrently:
- Detect independence (no data dependencies between calls)
- Use `asyncio.gather()` for parallel execution
- Already have async support in some tools, just need orchestration
**Files to modify:** `model_tools.py`, `toolsets.py`, new `tool_macros.py`
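**Sketch (illustrative only):** the parallel-execution idea above, using `asyncio.gather()` for calls that are already known to be independent. `run_tools_parallel` and `fake_tool` are hypothetical names; dependency detection is not shown.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

ToolCall = Tuple[Callable[..., Awaitable[Any]], Dict[str, Any]]


async def run_tools_parallel(calls: List[ToolCall]) -> List[Any]:
    """Run independent async tool calls concurrently; results keep input order."""
    return await asyncio.gather(
        *(fn(**kwargs) for fn, kwargs in calls),
        return_exceptions=True,  # one failing tool should not cancel the others
    )


async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for a real async tool."""
    await asyncio.sleep(delay)
    return f"{name} done"


results = asyncio.run(
    run_tools_parallel([
        (fake_tool, {"name": "web_search", "delay": 0.02}),
        (fake_tool, {"name": "web_extract", "delay": 0.01}),
    ])
)
print(results)
```

With `return_exceptions=True`, a failed call shows up as an exception object in its slot, so the caller can report partial results.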
---
## 4. Dynamic Skills Expansion 📚
## 3. Dynamic Skills Expansion 📚
**Problem:** Skills system is elegant but static. Skills must be manually created and added.
@@ -278,7 +152,7 @@ These items need to be addressed ASAP:
---
## 5. Interactive Clarifying Questions Tool ❓
## 4. Interactive Clarifying Questions Tool ❓
**Problem:** Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.
@@ -314,7 +188,7 @@ These items need to be addressed ASAP:
---
## 6. Collaborative Problem Solving 🤝
## 5. Collaborative Problem Solving 🤝
**Problem:** Interaction is command/response. Complex problems benefit from dialogue.
@@ -333,7 +207,7 @@ These items need to be addressed ASAP:
---
## 7. Project-Local Context 💾
## 6. Project-Local Context 💾
**Problem:** Valuable context lost between sessions.
@@ -351,30 +225,7 @@ These items need to be addressed ASAP:
**Files to modify:** New `project_context.py`, auto-load in `run_agent.py`
---
## 8. Graceful Degradation & Robustness 🛡️
**Problem:** When things go wrong, recovery is limited. Should fail gracefully.
**Ideas:**
- [ ] **Fallback chains** - When primary approach fails, have backups:
- `web_extract` fails → try `browser_navigate` → try `web_search` for cached version
- Define fallback order per tool type
- [ ] **Partial progress preservation** - Don't lose work on failure:
- Long task fails midway → save what we've got
- "I completed 3/5 steps before the error. Here's what I have..."
- [ ] **Self-healing** - Detect and recover from bad states:
- Browser stuck → close and retry
- Terminal hung → timeout and reset
**Files to modify:** `model_tools.py`, tool implementations, new `fallback_manager.py`
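**Sketch (illustrative only):** a fallback chain like `web_extract` → `browser_navigate` → `web_search` could be a simple ordered loop. `run_with_fallbacks` is a hypothetical helper, and the tool functions in the usage example are stand-ins, not the real tools.

```python
from typing import Any, Callable, List, Tuple


def run_with_fallbacks(steps: List[Tuple[str, Callable[..., Any]]], *args: Any, **kwargs: Any):
    """Try each (name, fn) in order; return (name, result) from the first that succeeds."""
    errors = []
    for name, fn in steps:
        try:
            return name, fn(*args, **kwargs)
        except Exception as exc:  # broad on purpose: any tool failure triggers the next step
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all fallbacks failed: " + "; ".join(errors))


def failing_extract(url: str) -> str:
    raise TimeoutError("extract timed out")


def working_browser(url: str) -> str:
    return f"page content of {url}"


print(run_with_fallbacks(
    [("web_extract", failing_extract), ("browser_navigate", working_browser)],
    "https://example.com",
))
```

Collecting the per-step errors keeps the final failure message useful when every option in the chain has been exhausted.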
---
## 9. Tools & Skills Wishlist 🧰
## 6. Tools & Skills Wishlist 🧰
*Things that would need new tool implementations (can't do well with current tools):*
@@ -441,7 +292,7 @@ These items need to be addressed ASAP:
---
## 10. Messaging Platform Integrations 💬 ✅ COMPLETE
## 7. Messaging Platform Integrations 💬 ✅ COMPLETE
**Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.
@@ -496,71 +347,7 @@ These items need to be addressed ASAP:
---
## 11. Scheduled Tasks / Cron Jobs ⏰ ✅ COMPLETE
**Problem:** Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).
**Solution Implemented:**
- [x] **Cron-style scheduler** - Run agent turns on a schedule
- Jobs stored in `~/.hermes/cron/jobs.json`
- Each job: `{ id, name, prompt, schedule, repeat, enabled, next_run_at, ... }`
- Built-in scheduler daemon or system cron integration
- [x] **Schedule formats:**
- Duration: `30m`, `2h`, `1d` (one-shot delay)
- Interval: `every 30m`, `every 2h` (recurring)
- Cron expression: `0 9 * * *` (requires `croniter` package)
- ISO timestamp: `2026-02-03T14:00:00` (one-shot at specific time)
- [x] **Repeat options:**
- `repeat=None` (or omit): One-shot schedules run once; intervals/cron run forever
- `repeat=1`: Run once then auto-delete
- `repeat=N`: Run exactly N times then auto-delete
- [x] **CLI interface:**
```bash
# List scheduled jobs
/cron
/cron list
# Add a one-shot job (runs once in 30 minutes)
/cron add 30m "Remind me to check the build status"
# Add a recurring job (every 2 hours)
/cron add "every 2h" "Check server status at 192.168.1.100"
# Add a cron expression (daily at 9am)
/cron add "0 9 * * *" "Generate morning briefing"
# Remove a job
/cron remove <job_id>
```
- [x] **Agent self-scheduling tools** (hermes-cli toolset):
- `schedule_cronjob(prompt, schedule, name?, repeat?)` - Create a scheduled task
- `list_cronjobs()` - View all scheduled jobs
- `remove_cronjob(job_id)` - Cancel a job
- Tool descriptions emphasize: **cronjobs run in isolated sessions with NO context**
- [x] **Daemon modes:**
```bash
# Built-in daemon (checks every 60 seconds)
python cli.py --cron-daemon
# Single tick for system cron integration
python cli.py --cron-tick-once
```
- [x] **Output storage:** `~/.hermes/cron/output/{job_id}/{timestamp}.md`
**Files created:** `cron/__init__.py`, `cron/jobs.py`, `cron/scheduler.py`, `tools/cronjob_tools.py`
**Toolset:** `hermes-cli` (default for CLI) includes cronjob tools; not in batch runner toolsets
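**Sketch (illustrative only):** how the four schedule formats listed above might be told apart. `parse_schedule` and its return shape are assumptions for this example and are not taken from `cron/jobs.py`; cron expressions are only recognized here, not evaluated (that is where `croniter` would come in).

```python
import re
from datetime import datetime, timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_DURATION_RE = re.compile(r"^(\d+)([mhd])$")


def _parse_duration(text: str) -> timedelta:
    match = _DURATION_RE.match(text.strip())
    if not match:
        raise ValueError(f"not a duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def parse_schedule(spec: str) -> dict:
    """Classify a schedule string as interval, one-shot delay, cron, or ISO timestamp."""
    spec = spec.strip()
    if spec.startswith("every "):
        return {"kind": "interval", "delta": _parse_duration(spec[len("every "):])}
    if _DURATION_RE.match(spec):
        return {"kind": "once", "delta": _parse_duration(spec)}
    if len(spec.split()) == 5:
        # Five whitespace-separated fields: treat as a cron expression.
        return {"kind": "cron", "expr": spec}
    # Anything else must be an ISO timestamp; fromisoformat raises ValueError otherwise.
    return {"kind": "once", "at": datetime.fromisoformat(spec)}
```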
---
## 12. Text-to-Speech (TTS) 🔊
## 8. Text-to-Speech (TTS) 🔊
**Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).
@@ -620,103 +407,6 @@ These items need to be addressed ASAP:
**Files to create:** `tools/transcribe_tool.py`, integrate with messaging monitors
---
## Priority Order (Suggested)
1. **🎯 Subagent Architecture** - Critical for context management, enables everything else
2. **Memory & Context Management** - Complements subagents for remaining context
3. **Self-Reflection** - Improves reliability and reduces wasted tool calls
4. **Project-Local Context** - Practical win, keeps useful info across sessions
5. **Messaging Integrations** - Unlocks mobile access, new interaction patterns
6. **Scheduled Tasks / Cron Jobs** - Enables automation, reminders, monitoring
7. **Tool Composition** - Quality of life, builds on other improvements
8. **Dynamic Skills** - Force multiplier for repeated tasks
9. **Interactive Clarifying Questions** - Better UX for ambiguous tasks
10. **TTS / Audio Transcription** - Accessibility, hands-free use
---
## Removed Items (Unrealistic)
The following were removed because they're architecturally impossible:
- ~~Proactive suggestions / Prefetching~~ - Agent only runs on user request, can't interject
- ~~Clipboard integration~~ - No access to user's local system clipboard
The following **moved to active TODO** (now possible with new architecture):
- ~~Session save/restore~~ → See **Messaging Integrations** (session persistence)
- ~~Voice/TTS playback~~ → See **TTS** (can generate audio files, send via messaging)
- ~~Set reminders~~ → See **Scheduled Tasks / Cron Jobs**
The following were removed because they're **already possible**:
- ~~HTTP/API Client~~ → Use `curl` or Python `requests` in terminal
- ~~Structured Data Manipulation~~ → Use `pandas` in terminal
- ~~Git-Native Operations~~ → Use `git` CLI in terminal
- ~~Symbolic Math~~ → Use `SymPy` in terminal
- ~~Code Quality Tools~~ → Run linters (`eslint`, `black`, `mypy`) in terminal
- ~~Testing Framework~~ → Run `pytest`, `jest`, etc. in terminal
- ~~Translation~~ → LLM handles this fine, or use translation APIs
---
---
## 🧪 Brainstorm Ideas (Not Yet Fleshed Out)
*These are early-stage ideas that need more thinking before implementation. Captured here so they don't get lost.*
### Remote/Distributed Execution 🌐
**Concept:** Run agent on a powerful remote server while interacting from a thin client.
**Why interesting:**
- Run on beefy GPU server for local LLM inference
- Agent has access to remote machine's resources (files, tools, internet)
- User interacts via lightweight client (phone, low-power laptop)
**Open questions:**
- How does this differ from just SSH + running `cli.py` on the remote machine?
- Would need secure communication channel (WebSocket? gRPC?)
- How to handle tool outputs that reference remote paths?
- Credential management for remote execution
- Latency considerations for interactive use
**Possible architecture:**
```
┌─────────────┐         ┌─────────────────────────┐
│ Thin Client │ ◄─────► │ Remote Hermes Server    │
│ (phone/web) │  WS/API │ - Full agent + tools    │
└─────────────┘         │ - GPU for local LLM     │
                        │ - Access to server files│
                        └─────────────────────────┘
```
**Related to:** Messaging integrations (the remote server could be the "server" that the messaging monitors receive from)
---
### Multi-Agent Parallel Execution 🤖🤖
**Concept:** Extension of Subagent Architecture (Section 1) - run multiple subagents in parallel.
**Why interesting:**
- Independent subtasks don't need to wait for each other
- "Research X while setting up Y" - both run simultaneously
- Faster completion for complex multi-part tasks
**Open questions:**
- How to detect which tasks are truly independent?
- Resource management (API rate limits, concurrent connections)
- How to merge results when parallel tasks have conflicts?
- Cost implications of multiple parallel LLM calls
*Note: Basic subagent delegation (Section 1) should be implemented first, parallel execution is an optimization on top.*
---
### Plugin/Extension System 🔌
**Concept:** Allow users to add custom tools/skills without modifying core code.