2026-03-24 01:56:38 +00:00
1 changed files with 190 additions and 0 deletions
--- a/docs/research/deerflow-evaluation.md
+++ b/docs/research/deerflow-evaluation.md
@@ -0,0 +1,190 @@
+# DeerFlow Evaluation — Autonomous Research Orchestration Layer
+
+**Status:** No-go for full adoption · Selective borrowing recommended
+**Date:** 2026-03-23
+**Issue:** #1283 (spawned from #1275 screenshot triage)
+**Refs:** #972 (Timmy research pipeline) · #975 (ResearchOrchestrator)
+
+---
+
+## What Is DeerFlow?
+
+DeerFlow (`bytedance/deer-flow`) is an open-source "super-agent harness" built by ByteDance on top of LangGraph. It provides a production-grade multi-agent research and code-execution framework with a web UI, REST API, Docker deployment, and optional IM channel integration (Telegram, Slack, Feishu/Lark).
+
+- **Stars:** ~39,600 · **License:** MIT
+- **Stack:** Python 3.12+ (backend) · TypeScript/Next.js (frontend) · LangGraph runtime
+- **Entry point:** `http://localhost:2026` (Nginx reverse proxy, configurable via `PORT`)
+
+---
+
+## Research Questions — Answers
+
+### 1. Agent Roles
+
+DeerFlow uses a two-tier architecture:
+
+| Role | Description |
+|------|-------------|
+| **Lead Agent** | Entry point; decomposes tasks, dispatches sub-agents, synthesizes results |
+| **Sub-Agent (general-purpose)** | All tools except `task`; spawned dynamically |
+| **Sub-Agent (bash)** | Command-execution specialist |
+
+The lead agent runs through a 12-middleware chain in order: thread setup → uploads → sandbox → tool-call repair → guardrails → summarization → todo tracking → title generation → memory update → image injection → sub-agent concurrency cap → clarification intercept.
+
+**Concurrency:** up to 3 sub-agents in parallel (configurable), 15-minute default timeout each, structured SSE event stream (`task_started` / `task_running` / `task_completed` / `task_failed`).
+
+**Mapping to Timmy personas:** DeerFlow's lead/sub-agent split roughly maps to Timmy's orchestrator + specialist-agent pattern. DeerFlow doesn't have named personas — it routes by capability (tools available to the agent type), not by identity. Timmy's persona system is richer and more opinionated.
+
+---
+
+### 2. API Surface
+
+DeerFlow exposes a full REST API at port 2026 (via Nginx). **No authentication by default.**
+
+**Core integration endpoints:**
+
+| Endpoint | Method | Purpose |
+|----------|--------|---------|
+| `POST /api/langgraph/threads` | | Create conversation thread |
+| `POST /api/langgraph/threads/{id}/runs` | | Submit task (blocking) |
+| `POST /api/langgraph/threads/{id}/runs/stream` | | Submit task (streaming SSE/WS) |
+| `GET /api/langgraph/threads/{id}/state` | | Get full thread state + artifacts |
+| `GET /api/models` | | List configured models |
+| `GET /api/threads/{id}/artifacts/{path}` | | Download generated artifacts |
+| `DELETE /api/threads/{id}` | | Clean up thread data |
+
+These are callable from Timmy with `httpx` — no special client library needed.
+
+---
+
+### 3. LLM Backend Support
+
+DeerFlow uses LangChain model classes declared in `config.yaml`.
+
+**Documented providers:** OpenAI, Anthropic, Google Gemini, DeepSeek, Doubao (ByteDance), Kimi/Moonshot, OpenRouter, MiniMax, Novita AI, Claude Code (OAuth).
+
+**Ollama:** Not in official documentation, but works via the `langchain_openai:ChatOpenAI` class with `base_url: http://localhost:11434/v1` and a dummy API key. Community-confirmed (GitHub issues #37, #1004) with Qwen2.5, Llama 3.1, and DeepSeek-R1.
+
+**vLLM:** Not documented, but architecturally identical — vLLM exposes an OpenAI-compatible endpoint. Should work with the same `base_url` override.
+
+**Practical caveat:** The lead agent requires strong instruction-following for consistent tool use and structured output. Community findings suggest ≥14B parameter models (Qwen2.5-14B minimum) for reliable orchestration. Our current `qwen3:14b` should be viable.
+
+---
+
+### 4. License
+
+**MIT License** — Copyright 2025 ByteDance Ltd. and DeerFlow Authors 2025–2026.
+
+Permissive: use, modify, distribute, commercialize freely. Attribution required. No warranty.
+
+**Compatible with Timmy's use case.** No CLA, no copyleft, no commercial restrictions.
+
+---
+
+### 5. Docker Port Conflicts
+
+DeerFlow's Docker Compose exposes a single host port:
+
+| Service | Host Port | Notes |
+|---------|-----------|-------|
+| Nginx (entry point) | **2026** (configurable via `PORT`) | Only externally exposed port |
+| Frontend (Next.js) | 3000 | Internal only |
+| Gateway API | 8001 | Internal only |
+| LangGraph runtime | 2024 | Internal only |
+| Provisioner (optional) | 8002 | Internal only, Kubernetes mode only |
+
+Timmy's existing Docker Compose exposes:
+- **8000** — dashboard (FastAPI)
+- **8080** — openfang (via `openfang` profile)
+- **11434** — Ollama (host process, not containerized)
+
+**No conflict.** Port 2026 is not used by Timmy. DeerFlow can run alongside the existing stack without modification.
+
+---
+
+## Full Capability Comparison
+
+| Capability | DeerFlow | Timmy (`research.py`) |
+|------------|----------|-----------------------|
+| Multi-agent fan-out | ✅ 3 concurrent sub-agents | ❌ Sequential only |
+| Web search | ✅ Tavily / InfoQuest | ✅ `research_tools.py` |
+| Web fetch | ✅ Jina AI / Firecrawl | ✅ trafilatura |
+| Code execution (sandbox) | ✅ Local / Docker / K8s | ❌ Not implemented |
+| Artifact generation | ✅ HTML, Markdown, slides | ❌ Markdown report only |
+| Document upload + conversion | ✅ PDF, PPT, Excel, Word | ❌ Not implemented |
+| Long-term memory | ✅ LLM-extracted facts, persistent | ✅ SQLite semantic cache |
+| Streaming results | ✅ SSE + WebSocket | ❌ Blocking call |
+| Web UI | ✅ Next.js included | ✅ Jinja2/HTMX dashboard |
+| IM integration | ✅ Telegram, Slack, Feishu | ✅ Telegram, Discord |
+| Ollama backend | ✅ (via config, community-confirmed) | ✅ Native |
+| Persona system | ❌ Role-based only | ✅ Named personas |
+| Semantic cache tier | ❌ Not implemented | ✅ SQLite (Tier 4) |
+| Free-tier cascade | ❌ Not applicable | 🔲 Planned (Groq, #980) |
+| Python version requirement | 3.12+ | 3.11+ |
+| Lock-in | LangGraph + LangChain | None |
+
+---
+
+## Integration Options Assessment
+
+### Option A — Full Adoption (replace `research.py`)
+**Verdict: Not recommended.**
+
+DeerFlow is a substantial full-stack system (Python + Node.js, Docker, Nginx, LangGraph). Adopting it fully would:
+- Replace Timmy's custom cascade tier system (SQLite cache → Ollama → Claude API → Groq) with a single-tier LangChain model config
+- Lose Timmy's persona-aware research routing
+- Add Python 3.12+ dependency (Timmy currently targets 3.11+)
+- Introduce LangGraph/LangChain lock-in for all research tasks
+- Require running a parallel Node.js frontend process (redundant given Timmy's own UI)
+
+### Option B — Sidecar for Heavy Research (call DeerFlow's API from Timmy)
+**Verdict: Viable but over-engineered for current needs.**
+
+DeerFlow could run as an optional sidecar (`docker compose --profile deerflow up`) and Timmy could delegate multi-agent research tasks via `POST /api/langgraph/threads/{id}/runs`. This would unlock parallel sub-agent fan-out and code-execution sandboxing without replacing Timmy's stack.
+
+The integration would be ~50 lines of `httpx` code in a new `DeerFlowClient` adapter. The `ResearchOrchestrator` in `research.py` could route tasks above a complexity threshold to DeerFlow.
+
+**Barrier:** DeerFlow's lack of default authentication means the sidecar would need to be network-isolated (internal Docker network only) or firewalled. Also, DeerFlow's Ollama integration is community-maintained, not officially supported — risk of breaking on upstream updates.
+
+### Option C — Selective Borrowing (copy patterns, not code)
+**Verdict: Recommended.**
+
+DeerFlow's architecture reveals concrete gaps in Timmy's current pipeline that are worth addressing independently:
+
+| DeerFlow Pattern | Timmy Gap to Close | Implementation Path |
+|------------------|--------------------|---------------------|
+| Parallel sub-agent fan-out | Research is sequential | Add `asyncio.gather()` to `ResearchOrchestrator` for concurrent query execution |
+| `SummarizationMiddleware` | Long contexts blow token budget | Add a context-trimming step in the synthesis cascade |
+| `TodoListMiddleware` | No progress tracking during long research | Wire into the dashboard task panel |
+| Artifact storage + serving | Reports are ephemeral (not persistently downloadable) | Add file-based artifact store to `research.py` (issue #976 already planned) |
+| Skill modules (Markdown-based) | Research templates are `.md` files — same pattern | Already done in `skills/research/` |
+| MCP integration | Research tools are hard-coded | Add MCP server discovery to `research_tools.py` for pluggable tool backends |
+
+---
+
+## Recommendation
+
+**No-go for full adoption or sidecar deployment at this stage.**
+
+Timmy's `ResearchOrchestrator` already covers the core pipeline (query → search → fetch → synthesize → store). DeerFlow's value proposition is primarily the parallel sub-agent fan-out and code-execution sandbox — capabilities that are useful but not blocking Timmy's current roadmap.
+
+**Recommended actions:**
+
+1. **Close the parallelism gap (high value, low effort):** Refactor `ResearchOrchestrator` to execute queries concurrently with `asyncio.gather()`. This delivers DeerFlow's most impactful capability without any new dependencies.
+
+2. **Re-evaluate after #980 and #981 are done:** Once Timmy has the Groq free-tier cascade and a sovereignty metrics dashboard, we'll have a clearer picture of whether the custom orchestrator is performing well enough to make DeerFlow unnecessary entirely.
+
+3. **File a follow-up for MCP tool integration:** DeerFlow's use of `langchain-mcp-adapters` for pluggable tool backends is the most architecturally interesting pattern. Adding MCP server discovery to `research_tools.py` would give Timmy the same extensibility without LangGraph lock-in.
+
+4. **Revisit DeerFlow's code-execution sandbox if #978 (Paperclip task runner) proves insufficient:** DeerFlow's sandboxed `bash` tool is production-tested and well-isolated. If Timmy's task runner needs secure code execution, DeerFlow's sandbox implementation is worth borrowing or wrapping.
+
+---
+
+## Follow-up Issues to File
+
+| Issue | Title | Priority |
+|-------|-------|----------|
+| New | Parallelize ResearchOrchestrator query execution (`asyncio.gather`) | Medium |
+| New | Add context-trimming step to synthesis cascade | Low |
+| New | MCP server discovery in `research_tools.py` | Low |
+| #976 | Semantic index for research outputs (already planned) | High |