diff --git a/TODO.md b/TODO.md
index 9b35e3b80..ed58a7b2f 100644
--- a/TODO.md
+++ b/TODO.md
@@ -177,56 +177,48 @@ These items need to be addressed ASAP:
 
 ---
 
-## 2. Context Management (complements Subagents)
+## 2. Planning & Task Management 📋
 
-**Problem:** Context grows unbounded during long conversations. Trajectory compression exists for training data post-hoc, but live conversations lack intelligent context management.
+**Problem:** Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.
 
 **Ideas:**
-- [ ] **Incremental summarization** - Compress old tool outputs on-the-fly during conversations
-  - Trigger when context exceeds threshold (e.g., 80% of max tokens)
-  - Preserve recent turns fully, summarize older tool responses
-  - Could reuse logic from `trajectory_compressor.py`
+- [ ] **Task decomposition tool** - Break complex requests into subtasks:
+  ```
+  User: "Set up a new Python project with FastAPI, tests, and Docker"
 
-- [ ] **Semantic memory retrieval** - Vector store for long conversation recall
-  - Embed important facts/findings as conversation progresses
-  - Retrieve relevant memories when needed instead of keeping everything in context
-  - Consider lightweight solutions: ChromaDB, FAISS, or even a simple embedding cache
+  Agent creates plan:
+  ├── 1. Create project structure and requirements.txt
+  ├── 2. Implement FastAPI app skeleton
+  ├── 3. Add pytest configuration and initial tests
+  ├── 4. Create Dockerfile and docker-compose.yml
+  └── 5. Verify everything works together
+  ```
+  - Each subtask becomes a trackable unit
+  - Agent can report progress: "Completed 3/5 tasks"
 
-- [ ] **Working vs. episodic memory** distinction
-  - Working memory: Current task state, recent tool results (always in context)
-  - Episodic memory: Past findings, tried approaches (retrieved on demand)
-  - Clear eviction policies for each
+- [ ] **Progress checkpoints** - Periodic self-assessment:
+  - After N tool calls or time elapsed, pause to evaluate
+  - "What have I accomplished? What remains? Am I on track?"
+  - Detect if stuck in loops or making no progress
+  - Could trigger replanning if approach isn't working
+
+- [ ] **Explicit plan storage** - Persist plan in conversation:
+  - Store as structured data (not just in context)
+  - Update status as tasks complete
+  - User can ask "What's the plan?" or "What's left?"
+  - Survives context compression (plans are protected)
 
-**Files to modify:** `run_agent.py` (add memory manager), possibly new `tools/memory_tool.py`
+- [ ] **Failure recovery with replanning** - When things go wrong:
+  - Record what failed and why
+  - Revise plan to work around the issue
+  - "Step 3 failed because X, adjusting approach to Y"
+  - Prevents repeating failed strategies
+
+**Files to modify:** `run_agent.py` (add planning hooks), new `tools/planning_tool.py`
 
 ---
 
-## 3. Self-Reflection & Course Correction 🔄
-
-**Problem:** Current retry logic handles malformed outputs but not semantic failures. Agent doesn't reason about *why* something failed.
-
-**Ideas:**
-- [ ] **Meta-reasoning after failures** - When a tool returns an error or unexpected result:
-  ```
-  Tool failed → Reflect: "Why did this fail? What assumptions were wrong?"
-  → Adjust approach → Retry with new strategy
-  ```
-  - Could be a lightweight LLM call or structured self-prompt
-
-- [ ] **Planning/replanning module** - For complex multi-step tasks:
-  - Generate plan before execution
-  - After each step, evaluate: "Am I on track? Should I revise the plan?"
-  - Store plan in working memory, update as needed
-
-- [ ] **Approach memory** - Remember what didn't work:
-  - "I tried X for this type of problem and it failed because Y"
-  - Prevents repeating failed strategies in the same conversation
-
-**Files to modify:** `run_agent.py` (add reflection hooks in tool loop), new `tools/reflection_tool.py`
-
----
-
-## 4. Tool Composition & Learning 🔧
+## 3. Tool Composition & Learning 🔧
 
 **Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
 
@@ -257,7 +249,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 5. Dynamic Skills Expansion 📚
+## 4. Dynamic Skills Expansion 📚
 
 **Problem:** Skills system is elegant but static. Skills must be manually created and added.
 
@@ -286,7 +278,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 6. Task Continuation Hints 🎯
+## 5. Task Continuation Hints 🎯
 
 **Problem:** Could be more helpful by suggesting logical next steps.
 
@@ -336,7 +328,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 8. Resource Awareness & Efficiency 💰
+## 6. Resource Awareness & Efficiency 💰
 
 **Problem:** No awareness of costs, time, or resource usage. Could be smarter about efficiency.
 
@@ -373,7 +365,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 10. Project-Local Context 💾
+## 7. Project-Local Context 💾
 
 **Problem:** Valuable context lost between sessions.
 
@@ -393,7 +385,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 11. Graceful Degradation & Robustness 🛡️
+## 8. Graceful Degradation & Robustness 🛡️
 
 **Problem:** When things go wrong, recovery is limited. Should fail gracefully.
 
@@ -414,7 +406,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 12. Tools & Skills Wishlist 🧰
+## 9. Tools & Skills Wishlist 🧰
 
 *Things that would need new tool implementations (can't do well with current tools):*
 
@@ -481,7 +473,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 13. Messaging Platform Integrations 💬
+## 10. Messaging Platform Integrations 💬
 
 **Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.
 
@@ -525,7 +517,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 14. Scheduled Tasks / Cron Jobs ⏰
+## 11. Scheduled Tasks / Cron Jobs ⏰
 
 **Problem:** Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).
 
@@ -570,7 +562,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 15. Text-to-Speech (TTS) 🔊
+## 12. Text-to-Speech (TTS) 🔊
 
 **Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).
 
@@ -601,7 +593,7 @@ These items need to be addressed ASAP:
 
 ---
 
-## 16. Speech-to-Text / Audio Transcription 🎤
+## 13. Speech-to-Text / Audio Transcription 🎤
 
 **Problem:** Users may want to send voice memos instead of typing. Agent is blind to audio content.
diff --git a/run_agent.py b/run_agent.py
index 2bd68d321..963a9db4f 100644
--- a/run_agent.py
+++ b/run_agent.py
@@ -980,6 +980,49 @@ class AIAgent:
         # Check if there's any non-whitespace content remaining
         return bool(cleaned.strip())
 
+    def _extract_reasoning(self, assistant_message) -> Optional[str]:
+        """
+        Extract reasoning/thinking content from an assistant message.
+
+        OpenRouter and various providers can return reasoning in multiple formats:
+        1. message.reasoning - Direct reasoning field (DeepSeek, Qwen, etc.)
+        2. message.reasoning_content - Alternative field (Moonshot AI, Novita, etc.)
+        3. message.reasoning_details - Array of {type, summary, ...} objects (OpenRouter unified)
+
+        Args:
+            assistant_message: The assistant message object from the API response
+
+        Returns:
+            Combined reasoning text, or None if no reasoning found
+        """
+        reasoning_parts = []
+
+        # Check direct reasoning field
+        if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
+            reasoning_parts.append(assistant_message.reasoning)
+
+        # Check reasoning_content field (alternative name used by some providers)
+        if hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
+            # Don't duplicate if same as reasoning
+            if assistant_message.reasoning_content not in reasoning_parts:
+                reasoning_parts.append(assistant_message.reasoning_content)
+
+        # Check reasoning_details array (OpenRouter unified format)
+        # Format: [{"type": "reasoning.summary", "summary": "...", ...}, ...]
+        if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
+            for detail in assistant_message.reasoning_details:
+                if isinstance(detail, dict):
+                    # Extract summary from reasoning detail object
+                    summary = detail.get('summary') or detail.get('content') or detail.get('text')
+                    if summary and summary not in reasoning_parts:
+                        reasoning_parts.append(summary)
+
+        # Combine all reasoning parts
+        if reasoning_parts:
+            return "\n\n".join(reasoning_parts)
+
+        return None
+
     def _get_messages_up_to_last_assistant(self, messages: List[Dict]) -> List[Dict]:
         """
         Get messages up to (but not including) the last assistant turn.
@@ -1318,22 +1361,20 @@ class AIAgent:
         for msg in messages:
             api_msg = msg.copy()
 
-            # For assistant messages with tool_calls, providers require 'reasoning_content' field
-            # Extract reasoning from our stored 'reasoning' field and add it as 'reasoning_content'
-            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+            # For ALL assistant messages, pass reasoning back to the API
+            # This ensures multi-turn reasoning context is preserved
+            if msg.get("role") == "assistant":
                 reasoning_text = msg.get("reasoning")
                 if reasoning_text:
-                    # Add reasoning_content for API compatibility (Moonshot AI, Novita, etc.)
+                    # Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
                     api_msg["reasoning_content"] = reasoning_text
 
                 # Remove 'reasoning' field - it's for trajectory storage only
-                # The reasoning is already in the content via tags AND
-                # we've added reasoning_content for API compatibility above
+                # We've copied it to 'reasoning_content' for the API above
                 if "reasoning" in api_msg:
                     api_msg.pop("reasoning")
 
-                # Remove 'reasoning_details' if present - we use reasoning_content instead
-                if "reasoning_details" in api_msg:
-                    api_msg.pop("reasoning_details")
+                # Keep 'reasoning_details' - OpenRouter uses this for multi-turn reasoning context
+                # The signature field helps maintain reasoning continuity
 
             api_messages.append(api_msg)
 
         if active_system_prompt:
@@ -1694,14 +1735,16 @@ class AIAgent:
             # Reset retry counter on successful JSON validation
             self._invalid_json_retries = 0
 
-            # Extract reasoning from response if available (for reasoning models like minimax, kimi, etc.)
-            # Extract reasoning from response for storage
-            # The reasoning_content field will be added when preparing API messages
-            reasoning_text = None
-            if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
-                reasoning_text = assistant_message.reasoning
-            elif hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
-                reasoning_text = assistant_message.reasoning_content
+            # Extract reasoning from response if available
+            # OpenRouter can return reasoning in multiple formats:
+            # 1. message.reasoning - direct reasoning field
+            # 2. message.reasoning_content - alternative field (some providers)
+            # 3. message.reasoning_details - array with {summary: "..."} objects
+            reasoning_text = self._extract_reasoning(assistant_message)
+
+            if reasoning_text and self.verbose_logging:
+                preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
+                logging.debug(f"Captured reasoning ({len(reasoning_text)} chars): {preview}")
 
             # Build assistant message with tool calls
             # Content stays as-is; reasoning is stored separately and will be passed
@@ -1723,6 +1766,14 @@ class AIAgent:
                 ]
             }
 
+            # Store reasoning_details for multi-turn reasoning context (OpenRouter)
+            if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
+                assistant_msg["reasoning_details"] = [
+                    {"type": d.get("type"), "text": d.get("text"), "signature": d.get("signature")}
+                    for d in assistant_message.reasoning_details
+                    if isinstance(d, dict)
+                ]
+
             messages.append(assistant_msg)
 
             # Execute each tool call
@@ -1810,6 +1861,10 @@ class AIAgent:
                     current_tokens=self.context_compressor.last_prompt_tokens
                 )
 
+            # Save session log incrementally (so progress is visible even if interrupted)
+            self._session_messages = messages
+            self._save_session_log(messages)
+
             # Continue loop for next response
             continue
 
@@ -1865,11 +1920,11 @@ class AIAgent:
             self._empty_content_retries = 0
 
             # Extract reasoning from response if available
-            reasoning_text = None
-            if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
-                reasoning_text = assistant_message.reasoning
-            elif hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
-                reasoning_text = assistant_message.reasoning_content
+            reasoning_text = self._extract_reasoning(assistant_message)
+
+            if reasoning_text and self.verbose_logging:
+                preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
+                logging.debug(f"Captured final reasoning ({len(reasoning_text)} chars): {preview}")
 
             # Build final assistant message
             # Content stays as-is; reasoning stored separately for trajectory extraction
@@ -1879,6 +1934,14 @@ class AIAgent:
                 "role": "assistant",
                 "reasoning": reasoning_text  # Stored for trajectory extraction
             }
 
+            # Store reasoning_details for multi-turn reasoning context (OpenRouter)
+            if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
+                final_msg["reasoning_details"] = [
+                    {"type": d.get("type"), "text": d.get("text"), "signature": d.get("signature")}
+                    for d in assistant_message.reasoning_details
+                    if isinstance(d, dict)
+                ]
+
             messages.append(final_msg)
 
             if not self.quiet_mode:
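
Review note: the merge precedence in the new `_extract_reasoning` (direct `reasoning` first, then `reasoning_content`, then each `reasoning_details` entry, with duplicate suppression) can be exercised outside the agent. A minimal standalone sketch, assuming a hypothetical `extract_reasoning` helper that mirrors the patched logic, with `types.SimpleNamespace` standing in for the SDK's message object:

```python
from types import SimpleNamespace


def extract_reasoning(message):
    """Merge provider reasoning fields, mirroring the patch's precedence.

    Order: message.reasoning, then message.reasoning_content, then each
    reasoning_details entry's summary/content/text. Duplicates are skipped.
    """
    parts = []
    if getattr(message, "reasoning", None):
        parts.append(message.reasoning)
    rc = getattr(message, "reasoning_content", None)
    if rc and rc not in parts:
        parts.append(rc)  # skipped when identical to message.reasoning
    for detail in getattr(message, "reasoning_details", None) or []:
        if isinstance(detail, dict):
            summary = detail.get("summary") or detail.get("content") or detail.get("text")
            if summary and summary not in parts:
                parts.append(summary)
    return "\n\n".join(parts) if parts else None


# A message carrying all three formats, with reasoning_content duplicating reasoning
msg = SimpleNamespace(
    reasoning="step one",
    reasoning_content="step one",  # duplicate, dropped
    reasoning_details=[{"type": "reasoning.summary", "summary": "step two"}],
)
print(extract_reasoning(msg))  # prints "step one", a blank line, then "step two"
```

One behavioral consequence worth noting in review: because `reasoning_content` is compared against the whole `parts` list, a provider that returns both fields with identical text contributes it once, but differing texts are concatenated, which may or may not be intended.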