# DeerFlow Evaluation — Autonomous Research Orchestration Layer

**Status:** No-go for full adoption · Selective borrowing recommended
**Date:** 2026-03-23
**Issue:** #1283 (spawned from #1275 screenshot triage)
**Refs:** #972 (Timmy research pipeline) · #975 (ResearchOrchestrator)

---

## What Is DeerFlow?

DeerFlow (`bytedance/deer-flow`) is an open-source "super-agent harness" built by ByteDance on top of LangGraph. It provides a production-grade multi-agent research and code-execution framework with a web UI, REST API, Docker deployment, and optional IM channel integration (Telegram, Slack, Feishu/Lark).

- **Stars:** ~39,600 · **License:** MIT
- **Stack:** Python 3.12+ (backend) · TypeScript/Next.js (frontend) · LangGraph runtime
- **Entry point:** `http://localhost:2026` (Nginx reverse proxy, configurable via `PORT`)

---

## Research Questions — Answers

### 1. Agent Roles

DeerFlow uses a two-tier architecture:

| Role | Description |
|------|-------------|
| **Lead Agent** | Entry point; decomposes tasks, dispatches sub-agents, synthesizes results |
| **Sub-Agent (general-purpose)** | All tools except `task`; spawned dynamically |
| **Sub-Agent (bash)** | Command-execution specialist |

The lead agent runs through a 12-middleware chain, in order: thread setup → uploads → sandbox → tool-call repair → guardrails → summarization → todo tracking → title generation → memory update → image injection → sub-agent concurrency cap → clarification intercept.

**Concurrency:** up to 3 sub-agents in parallel (configurable), with a 15-minute default timeout each and a structured SSE event stream (`task_started` / `task_running` / `task_completed` / `task_failed`).

**Mapping to Timmy personas:** DeerFlow's lead/sub-agent split roughly maps to Timmy's orchestrator + specialist-agent pattern. DeerFlow has no named personas — it routes by capability (the tools available to each agent type), not by identity. Timmy's persona system is richer and more opinionated.

---

### 2. API Surface

DeerFlow exposes a full REST API at port 2026 (via Nginx). **No authentication by default.**

**Core integration endpoints:**

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/langgraph/threads` | `POST` | Create conversation thread |
| `/api/langgraph/threads/{id}/runs` | `POST` | Submit task (blocking) |
| `/api/langgraph/threads/{id}/runs/stream` | `POST` | Submit task (streaming SSE/WS) |
| `/api/langgraph/threads/{id}/state` | `GET` | Get full thread state + artifacts |
| `/api/models` | `GET` | List configured models |
| `/api/threads/{id}/artifacts/{path}` | `GET` | Download generated artifacts |
| `/api/threads/{id}` | `DELETE` | Clean up thread data |

These are callable from Timmy with `httpx` — no special client library needed.

---

### 3. LLM Backend Support

DeerFlow uses LangChain model classes declared in `config.yaml`.

**Documented providers:** OpenAI, Anthropic, Google Gemini, DeepSeek, Doubao (ByteDance), Kimi/Moonshot, OpenRouter, MiniMax, Novita AI, Claude Code (OAuth).

**Ollama:** Not in the official documentation, but works via the `langchain_openai:ChatOpenAI` class with `base_url: http://localhost:11434/v1` and a dummy API key. Community-confirmed (GitHub issues #37, #1004) with Qwen2.5, Llama 3.1, and DeepSeek-R1.

**vLLM:** Not documented, but architecturally identical — vLLM exposes an OpenAI-compatible endpoint, so the same `base_url` override should work.

**Practical caveat:** The lead agent requires strong instruction-following for consistent tool use and structured output. Community findings suggest models of ≥14B parameters (Qwen2.5-14B at minimum) for reliable orchestration. Our current `qwen3:14b` should be viable.

---

### 4. License

**MIT License** — Copyright 2025 ByteDance Ltd. and DeerFlow Authors 2025–2026. Permissive: use, modify, distribute, commercialize freely. Attribution required. No warranty.

**Compatible with Timmy's use case.** No CLA, no copyleft, no commercial restrictions.

---
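As a concrete illustration of the §3 Ollama override, a hypothetical `config.yaml` fragment might look like the following — the `models`/`params` field names are assumptions, not DeerFlow's documented schema; only the `langchain_openai:ChatOpenAI` class path and `base_url` pattern come from the community reports cited above:

```yaml
# Hypothetical config.yaml fragment — field names are illustrative.
models:
  - name: qwen2.5-14b-local
    class: langchain_openai:ChatOpenAI
    params:
      model: qwen2.5:14b
      base_url: http://localhost:11434/v1  # Ollama's OpenAI-compatible endpoint
      api_key: ollama                      # dummy value; Ollama ignores it
```

The same shape should cover vLLM by swapping in its server URL, since both expose the OpenAI-compatible `/v1` API.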
### 5. Docker Port Conflicts

DeerFlow's Docker Compose exposes a single host port:

| Service | Host Port | Notes |
|---------|-----------|-------|
| Nginx (entry point) | **2026** (configurable via `PORT`) | Only externally exposed port |
| Frontend (Next.js) | 3000 | Internal only |
| Gateway API | 8001 | Internal only |
| LangGraph runtime | 2024 | Internal only |
| Provisioner (optional) | 8002 | Internal only, Kubernetes mode only |

Timmy's existing Docker Compose exposes:

- **8000** — dashboard (FastAPI)
- **8080** — openfang (via `openfang` profile)
- **11434** — Ollama (host process, not containerized)

**No conflict.** Port 2026 is not used by Timmy. DeerFlow can run alongside the existing stack without modification.

---

## Full Capability Comparison

| Capability | DeerFlow | Timmy (`research.py`) |
|------------|----------|-----------------------|
| Multi-agent fan-out | ✅ 3 concurrent sub-agents | ❌ Sequential only |
| Web search | ✅ Tavily / InfoQuest | ✅ `research_tools.py` |
| Web fetch | ✅ Jina AI / Firecrawl | ✅ trafilatura |
| Code execution (sandbox) | ✅ Local / Docker / K8s | ❌ Not implemented |
| Artifact generation | ✅ HTML, Markdown, slides | ❌ Markdown report only |
| Document upload + conversion | ✅ PDF, PPT, Excel, Word | ❌ Not implemented |
| Long-term memory | ✅ LLM-extracted facts, persistent | ✅ SQLite semantic cache |
| Streaming results | ✅ SSE + WebSocket | ❌ Blocking call |
| Web UI | ✅ Next.js included | ✅ Jinja2/HTMX dashboard |
| IM integration | ✅ Telegram, Slack, Feishu | ✅ Telegram, Discord |
| Ollama backend | ✅ (via config, community-confirmed) | ✅ Native |
| Persona system | ❌ Role-based only | ✅ Named personas |
| Semantic cache tier | ❌ Not implemented | ✅ SQLite (Tier 4) |
| Free-tier cascade | ❌ Not applicable | 🔲 Planned (Groq, #980) |
| Python version requirement | 3.12+ | 3.11+ |
| Lock-in | LangGraph + LangChain | None |

---

## Integration Options Assessment

### Option A — Full Adoption (replace `research.py`)

**Verdict: Not recommended.**

DeerFlow is a substantial full-stack system (Python + Node.js, Docker, Nginx, LangGraph). Adopting it fully would:

- Replace Timmy's custom cascade tier system (SQLite cache → Ollama → Claude API → Groq) with a single-tier LangChain model config
- Lose Timmy's persona-aware research routing
- Add a Python 3.12+ dependency (Timmy currently targets 3.11+)
- Introduce LangGraph/LangChain lock-in for all research tasks
- Require running a parallel Node.js frontend process (redundant given Timmy's own UI)

### Option B — Sidecar for Heavy Research (call DeerFlow's API from Timmy)

**Verdict: Viable but over-engineered for current needs.**

DeerFlow could run as an optional sidecar (`docker compose --profile deerflow up`), with Timmy delegating multi-agent research tasks via `POST /api/langgraph/threads/{id}/runs`. This would unlock parallel sub-agent fan-out and code-execution sandboxing without replacing Timmy's stack. The integration would be ~50 lines of `httpx` code in a new `DeerFlowClient` adapter, and the `ResearchOrchestrator` in `research.py` could route tasks above a complexity threshold to DeerFlow.

**Barrier:** DeerFlow's lack of default authentication means the sidecar would need to be network-isolated (internal Docker network only) or firewalled. Also, DeerFlow's Ollama integration is community-maintained, not officially supported — it risks breaking on upstream updates.
### Option C — Selective Borrowing (copy patterns, not code)

**Verdict: Recommended.**

DeerFlow's architecture reveals concrete gaps in Timmy's current pipeline that are worth addressing independently:

| DeerFlow Pattern | Timmy Gap to Close | Implementation Path |
|------------------|--------------------|---------------------|
| Parallel sub-agent fan-out | Research is sequential | Add `asyncio.gather()` to `ResearchOrchestrator` for concurrent query execution |
| `SummarizationMiddleware` | Long contexts blow the token budget | Add a context-trimming step to the synthesis cascade |
| `TodoListMiddleware` | No progress tracking during long research | Wire into the dashboard task panel |
| Artifact storage + serving | Reports are ephemeral (not persistently downloadable) | Add a file-based artifact store to `research.py` (issue #976 already planned) |
| Skill modules (Markdown-based) | Research templates are `.md` files — same pattern | Already done in `skills/research/` |
| MCP integration | Research tools are hard-coded | Add MCP server discovery to `research_tools.py` for pluggable tool backends |

---

## Recommendation

**No-go for full adoption or sidecar deployment at this stage.** Timmy's `ResearchOrchestrator` already covers the core pipeline (query → search → fetch → synthesize → store). DeerFlow's value proposition is primarily the parallel sub-agent fan-out and the code-execution sandbox — capabilities that are useful but not blocking Timmy's current roadmap.

**Recommended actions:**

1. **Close the parallelism gap (high value, low effort):** Refactor `ResearchOrchestrator` to execute queries concurrently with `asyncio.gather()`. This delivers DeerFlow's most impactful capability without any new dependencies.
2. **Re-evaluate after #980 and #981 are done:** Once Timmy has the Groq free-tier cascade and a sovereignty metrics dashboard, we'll have a clearer picture of whether the custom orchestrator performs well enough to make DeerFlow unnecessary entirely.
3. **File a follow-up for MCP tool integration:** DeerFlow's use of `langchain-mcp-adapters` for pluggable tool backends is its most architecturally interesting pattern. Adding MCP server discovery to `research_tools.py` would give Timmy the same extensibility without LangGraph lock-in.
4. **Revisit DeerFlow's code-execution sandbox if #978 (Paperclip task runner) proves insufficient:** DeerFlow's sandboxed `bash` tool is production-tested and well-isolated. If Timmy's task runner needs secure code execution, DeerFlow's sandbox implementation is worth borrowing or wrapping.

---

## Follow-up Issues to File

| Issue | Title | Priority |
|-------|-------|----------|
| New | Parallelize `ResearchOrchestrator` query execution (`asyncio.gather`) | Medium |
| New | Add context-trimming step to synthesis cascade | Low |
| New | MCP server discovery in `research_tools.py` | Low |
| #976 | Semantic index for research outputs (already planned) | High |
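For reference, the parallelism refactor from recommendation 1 (the first follow-up issue above) could be sketched as follows. `run_query` stands in for the real search → fetch → synthesize step; all names are illustrative rather than taken from `research.py`, and the concurrency cap mirrors DeerFlow's 3-sub-agent limit:

```python
"""Sketch: bounded concurrent query execution with asyncio.gather."""
import asyncio


async def run_query(query: str) -> str:
    # Placeholder for the real per-query pipeline step in research.py.
    await asyncio.sleep(0)
    return f"results for {query!r}"


async def run_all(queries: list[str], limit: int = 3) -> list:
    # Semaphore caps fan-out at `limit` concurrent queries.
    sem = asyncio.Semaphore(limit)

    async def bounded(q: str) -> str:
        async with sem:
            return await run_query(q)

    # return_exceptions=True keeps one failed query from sinking the batch;
    # gather preserves the input order of `queries` in its results.
    return await asyncio.gather(*(bounded(q) for q in queries), return_exceptions=True)


results = asyncio.run(run_all(["a", "b", "c", "d"]))
```

The callers' interface stays synchronous-looking: one `asyncio.run(run_all(...))` replaces the current sequential loop, with no new dependencies.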