Implement reasoning extraction and enhance assistant message handling
- Added a new method `_extract_reasoning` to extract reasoning content from assistant messages, accommodating multiple formats from various providers. - Updated message handling to ensure all assistant messages include reasoning content for API compatibility, preserving multi-turn reasoning context. - Enhanced logging to capture reasoning details for debugging and analysis. - Modified the TODO.md to reflect changes in planning and task management, emphasizing the need for structured task decomposition and progress tracking.
This commit is contained in:
94
TODO.md
94
TODO.md
@@ -177,56 +177,48 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 2. Context Management (complements Subagents)
|
||||
## 2. Planning & Task Management 📋
|
||||
|
||||
**Problem:** Context grows unbounded during long conversations. Trajectory compression exists for training data post-hoc, but live conversations lack intelligent context management.
|
||||
**Problem:** Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.
|
||||
|
||||
**Ideas:**
|
||||
- [ ] **Incremental summarization** - Compress old tool outputs on-the-fly during conversations
|
||||
- Trigger when context exceeds threshold (e.g., 80% of max tokens)
|
||||
- Preserve recent turns fully, summarize older tool responses
|
||||
- Could reuse logic from `trajectory_compressor.py`
|
||||
- [ ] **Task decomposition tool** - Break complex requests into subtasks:
|
||||
```
|
||||
User: "Set up a new Python project with FastAPI, tests, and Docker"
|
||||
|
||||
- [ ] **Semantic memory retrieval** - Vector store for long conversation recall
|
||||
- Embed important facts/findings as conversation progresses
|
||||
- Retrieve relevant memories when needed instead of keeping everything in context
|
||||
- Consider lightweight solutions: ChromaDB, FAISS, or even a simple embedding cache
|
||||
Agent creates plan:
|
||||
├── 1. Create project structure and requirements.txt
|
||||
├── 2. Implement FastAPI app skeleton
|
||||
├── 3. Add pytest configuration and initial tests
|
||||
├── 4. Create Dockerfile and docker-compose.yml
|
||||
└── 5. Verify everything works together
|
||||
```
|
||||
- Each subtask becomes a trackable unit
|
||||
- Agent can report progress: "Completed 3/5 tasks"
|
||||
|
||||
- [ ] **Working vs. episodic memory** distinction
|
||||
- Working memory: Current task state, recent tool results (always in context)
|
||||
- Episodic memory: Past findings, tried approaches (retrieved on demand)
|
||||
- Clear eviction policies for each
|
||||
- [ ] **Progress checkpoints** - Periodic self-assessment:
|
||||
- After N tool calls or time elapsed, pause to evaluate
|
||||
- "What have I accomplished? What remains? Am I on track?"
|
||||
- Detect if stuck in loops or making no progress
|
||||
- Could trigger replanning if approach isn't working
|
||||
|
||||
- [ ] **Explicit plan storage** - Persist plan in conversation:
|
||||
- Store as structured data (not just in context)
|
||||
- Update status as tasks complete
|
||||
- User can ask "What's the plan?" or "What's left?"
|
||||
- Survives context compression (plans are protected)
|
||||
|
||||
**Files to modify:** `run_agent.py` (add memory manager), possibly new `tools/memory_tool.py`
|
||||
- [ ] **Failure recovery with replanning** - When things go wrong:
|
||||
- Record what failed and why
|
||||
- Revise plan to work around the issue
|
||||
- "Step 3 failed because X, adjusting approach to Y"
|
||||
- Prevents repeating failed strategies
|
||||
|
||||
**Files to modify:** `run_agent.py` (add planning hooks), new `tools/planning_tool.py`
|
||||
|
||||
---
|
||||
|
||||
## 3. Self-Reflection & Course Correction 🔄
|
||||
|
||||
**Problem:** Current retry logic handles malformed outputs but not semantic failures. Agent doesn't reason about *why* something failed.
|
||||
|
||||
**Ideas:**
|
||||
- [ ] **Meta-reasoning after failures** - When a tool returns an error or unexpected result:
|
||||
```
|
||||
Tool failed → Reflect: "Why did this fail? What assumptions were wrong?"
|
||||
→ Adjust approach → Retry with new strategy
|
||||
```
|
||||
- Could be a lightweight LLM call or structured self-prompt
|
||||
|
||||
- [ ] **Planning/replanning module** - For complex multi-step tasks:
|
||||
- Generate plan before execution
|
||||
- After each step, evaluate: "Am I on track? Should I revise the plan?"
|
||||
- Store plan in working memory, update as needed
|
||||
|
||||
- [ ] **Approach memory** - Remember what didn't work:
|
||||
- "I tried X for this type of problem and it failed because Y"
|
||||
- Prevents repeating failed strategies in the same conversation
|
||||
|
||||
**Files to modify:** `run_agent.py` (add reflection hooks in tool loop), new `tools/reflection_tool.py`
|
||||
|
||||
---
|
||||
|
||||
## 4. Tool Composition & Learning 🔧
|
||||
## 3. Tool Composition & Learning 🔧
|
||||
|
||||
**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
|
||||
|
||||
@@ -257,7 +249,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 5. Dynamic Skills Expansion 📚
|
||||
## 4. Dynamic Skills Expansion 📚
|
||||
|
||||
**Problem:** Skills system is elegant but static. Skills must be manually created and added.
|
||||
|
||||
@@ -286,7 +278,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 6. Task Continuation Hints 🎯
|
||||
## 5. Task Continuation Hints 🎯
|
||||
|
||||
**Problem:** Could be more helpful by suggesting logical next steps.
|
||||
|
||||
@@ -336,7 +328,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 8. Resource Awareness & Efficiency 💰
|
||||
## 6. Resource Awareness & Efficiency 💰
|
||||
|
||||
**Problem:** No awareness of costs, time, or resource usage. Could be smarter about efficiency.
|
||||
|
||||
@@ -373,7 +365,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 10. Project-Local Context 💾
|
||||
## 7. Project-Local Context 💾
|
||||
|
||||
**Problem:** Valuable context lost between sessions.
|
||||
|
||||
@@ -393,7 +385,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 11. Graceful Degradation & Robustness 🛡️
|
||||
## 8. Graceful Degradation & Robustness 🛡️
|
||||
|
||||
**Problem:** When things go wrong, recovery is limited. Should fail gracefully.
|
||||
|
||||
@@ -414,7 +406,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 12. Tools & Skills Wishlist 🧰
|
||||
## 9. Tools & Skills Wishlist 🧰
|
||||
|
||||
*Things that would need new tool implementations (can't do well with current tools):*
|
||||
|
||||
@@ -481,7 +473,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 13. Messaging Platform Integrations 💬
|
||||
## 10. Messaging Platform Integrations 💬
|
||||
|
||||
**Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.
|
||||
|
||||
@@ -525,7 +517,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 14. Scheduled Tasks / Cron Jobs ⏰
|
||||
## 11. Scheduled Tasks / Cron Jobs ⏰
|
||||
|
||||
**Problem:** Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).
|
||||
|
||||
@@ -570,7 +562,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 15. Text-to-Speech (TTS) 🔊
|
||||
## 12. Text-to-Speech (TTS) 🔊
|
||||
|
||||
**Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).
|
||||
|
||||
@@ -601,7 +593,7 @@ These items need to be addressed ASAP:
|
||||
|
||||
---
|
||||
|
||||
## 16. Speech-to-Text / Audio Transcription 🎤
|
||||
## 13. Speech-to-Text / Audio Transcription 🎤
|
||||
|
||||
**Problem:** Users may want to send voice memos instead of typing. Agent is blind to audio content.
|
||||
|
||||
|
||||
107
run_agent.py
107
run_agent.py
@@ -980,6 +980,49 @@ class AIAgent:
|
||||
# Check if there's any non-whitespace content remaining
|
||||
return bool(cleaned.strip())
|
||||
|
||||
def _extract_reasoning(self, assistant_message) -> Optional[str]:
|
||||
"""
|
||||
Extract reasoning/thinking content from an assistant message.
|
||||
|
||||
OpenRouter and various providers can return reasoning in multiple formats:
|
||||
1. message.reasoning - Direct reasoning field (DeepSeek, Qwen, etc.)
|
||||
2. message.reasoning_content - Alternative field (Moonshot AI, Novita, etc.)
|
||||
3. message.reasoning_details - Array of {type, summary, ...} objects (OpenRouter unified)
|
||||
|
||||
Args:
|
||||
assistant_message: The assistant message object from the API response
|
||||
|
||||
Returns:
|
||||
Combined reasoning text, or None if no reasoning found
|
||||
"""
|
||||
reasoning_parts = []
|
||||
|
||||
# Check direct reasoning field
|
||||
if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
|
||||
reasoning_parts.append(assistant_message.reasoning)
|
||||
|
||||
# Check reasoning_content field (alternative name used by some providers)
|
||||
if hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
|
||||
# Don't duplicate if same as reasoning
|
||||
if assistant_message.reasoning_content not in reasoning_parts:
|
||||
reasoning_parts.append(assistant_message.reasoning_content)
|
||||
|
||||
# Check reasoning_details array (OpenRouter unified format)
|
||||
# Format: [{"type": "reasoning.summary", "summary": "...", ...}, ...]
|
||||
if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
|
||||
for detail in assistant_message.reasoning_details:
|
||||
if isinstance(detail, dict):
|
||||
# Extract summary from reasoning detail object
|
||||
summary = detail.get('summary') or detail.get('content') or detail.get('text')
|
||||
if summary and summary not in reasoning_parts:
|
||||
reasoning_parts.append(summary)
|
||||
|
||||
# Combine all reasoning parts
|
||||
if reasoning_parts:
|
||||
return "\n\n".join(reasoning_parts)
|
||||
|
||||
return None
|
||||
|
||||
def _get_messages_up_to_last_assistant(self, messages: List[Dict]) -> List[Dict]:
|
||||
"""
|
||||
Get messages up to (but not including) the last assistant turn.
|
||||
@@ -1318,22 +1361,20 @@ class AIAgent:
|
||||
for msg in messages:
|
||||
api_msg = msg.copy()
|
||||
|
||||
# For assistant messages with tool_calls, providers require 'reasoning_content' field
|
||||
# Extract reasoning from our stored 'reasoning' field and add it as 'reasoning_content'
|
||||
if msg.get("role") == "assistant" and msg.get("tool_calls"):
|
||||
# For ALL assistant messages, pass reasoning back to the API
|
||||
# This ensures multi-turn reasoning context is preserved
|
||||
if msg.get("role") == "assistant":
|
||||
reasoning_text = msg.get("reasoning")
|
||||
if reasoning_text:
|
||||
# Add reasoning_content for API compatibility (Moonshot AI, Novita, etc.)
|
||||
# Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
|
||||
api_msg["reasoning_content"] = reasoning_text
|
||||
|
||||
# Remove 'reasoning' field - it's for trajectory storage only
|
||||
# The reasoning is already in the content via <think> tags AND
|
||||
# we've added reasoning_content for API compatibility above
|
||||
# We've copied it to 'reasoning_content' for the API above
|
||||
if "reasoning" in api_msg:
|
||||
api_msg.pop("reasoning")
|
||||
# Remove 'reasoning_details' if present - we use reasoning_content instead
|
||||
if "reasoning_details" in api_msg:
|
||||
api_msg.pop("reasoning_details")
|
||||
# Keep 'reasoning_details' - OpenRouter uses this for multi-turn reasoning context
|
||||
# The signature field helps maintain reasoning continuity
|
||||
api_messages.append(api_msg)
|
||||
|
||||
if active_system_prompt:
|
||||
@@ -1694,14 +1735,16 @@ class AIAgent:
|
||||
# Reset retry counter on successful JSON validation
|
||||
self._invalid_json_retries = 0
|
||||
|
||||
# Extract reasoning from response if available (for reasoning models like minimax, kimi, etc.)
|
||||
# Extract reasoning from response for storage
|
||||
# The reasoning_content field will be added when preparing API messages
|
||||
reasoning_text = None
|
||||
if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
|
||||
reasoning_text = assistant_message.reasoning
|
||||
elif hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
|
||||
reasoning_text = assistant_message.reasoning_content
|
||||
# Extract reasoning from response if available
|
||||
# OpenRouter can return reasoning in multiple formats:
|
||||
# 1. message.reasoning - direct reasoning field
|
||||
# 2. message.reasoning_content - alternative field (some providers)
|
||||
# 3. message.reasoning_details - array with {summary: "..."} objects
|
||||
reasoning_text = self._extract_reasoning(assistant_message)
|
||||
|
||||
if reasoning_text and self.verbose_logging:
|
||||
preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
|
||||
logging.debug(f"Captured reasoning ({len(reasoning_text)} chars): {preview}")
|
||||
|
||||
# Build assistant message with tool calls
|
||||
# Content stays as-is; reasoning is stored separately and will be passed
|
||||
@@ -1723,6 +1766,14 @@ class AIAgent:
|
||||
]
|
||||
}
|
||||
|
||||
# Store reasoning_details for multi-turn reasoning context (OpenRouter)
|
||||
if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
|
||||
assistant_msg["reasoning_details"] = [
|
||||
{"type": d.get("type"), "text": d.get("text"), "signature": d.get("signature")}
|
||||
for d in assistant_message.reasoning_details
|
||||
if isinstance(d, dict)
|
||||
]
|
||||
|
||||
messages.append(assistant_msg)
|
||||
|
||||
# Execute each tool call
|
||||
@@ -1810,6 +1861,10 @@ class AIAgent:
|
||||
current_tokens=self.context_compressor.last_prompt_tokens
|
||||
)
|
||||
|
||||
# Save session log incrementally (so progress is visible even if interrupted)
|
||||
self._session_messages = messages
|
||||
self._save_session_log(messages)
|
||||
|
||||
# Continue loop for next response
|
||||
continue
|
||||
|
||||
@@ -1865,11 +1920,11 @@ class AIAgent:
|
||||
self._empty_content_retries = 0
|
||||
|
||||
# Extract reasoning from response if available
|
||||
reasoning_text = None
|
||||
if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
|
||||
reasoning_text = assistant_message.reasoning
|
||||
elif hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
|
||||
reasoning_text = assistant_message.reasoning_content
|
||||
reasoning_text = self._extract_reasoning(assistant_message)
|
||||
|
||||
if reasoning_text and self.verbose_logging:
|
||||
preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
|
||||
logging.debug(f"Captured final reasoning ({len(reasoning_text)} chars): {preview}")
|
||||
|
||||
# Build final assistant message
|
||||
# Content stays as-is; reasoning stored separately for trajectory extraction
|
||||
@@ -1879,6 +1934,14 @@ class AIAgent:
|
||||
"reasoning": reasoning_text # Stored for trajectory extraction
|
||||
}
|
||||
|
||||
# Store reasoning_details for multi-turn reasoning context (OpenRouter)
|
||||
if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
|
||||
final_msg["reasoning_details"] = [
|
||||
{"type": d.get("type"), "text": d.get("text"), "signature": d.get("signature")}
|
||||
for d in assistant_message.reasoning_details
|
||||
if isinstance(d, dict)
|
||||
]
|
||||
|
||||
messages.append(final_msg)
|
||||
|
||||
if not self.quiet_mode:
|
||||
|
||||
Reference in New Issue
Block a user