# DeerFlow Evaluation — Autonomous Research Orchestration Layer
**Status:** No-go for full adoption · Selective borrowing recommended
**Date:** 2026-03-23
**Issue:** #1283 (spawned from #1275 screenshot triage)
**Refs:** #972 (Timmy research pipeline) · #975 (ResearchOrchestrator)
## What Is DeerFlow?
DeerFlow (bytedance/deer-flow) is an open-source "super-agent harness" built by ByteDance on top of LangGraph. It provides a production-grade multi-agent research and code-execution framework with a web UI, REST API, Docker deployment, and optional IM channel integration (Telegram, Slack, Feishu/Lark).
- Stars: ~39,600 · License: MIT
- Stack: Python 3.12+ (backend) · TypeScript/Next.js (frontend) · LangGraph runtime
- Entry point: `http://localhost:2026` (Nginx reverse proxy, configurable via `PORT`)
## Research Questions — Answers
### 1. Agent Roles
DeerFlow uses a two-tier architecture:
| Role | Description |
|---|---|
| Lead Agent | Entry point; decomposes tasks, dispatches sub-agents, synthesizes results |
| Sub-Agent (general-purpose) | All tools except `task`; spawned dynamically |
| Sub-Agent (bash) | Command-execution specialist |
The lead agent runs through a 12-middleware chain in order: thread setup → uploads → sandbox → tool-call repair → guardrails → summarization → todo tracking → title generation → memory update → image injection → sub-agent concurrency cap → clarification intercept.
Concurrency: up to 3 sub-agents in parallel (configurable), 15-minute default timeout each, structured SSE event stream (`task_started` / `task_running` / `task_completed` / `task_failed`).
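The event stream above can be consumed with a small parser. A minimal sketch, assuming standard SSE framing (`event:`/`data:` lines delimited by blank lines) with JSON payloads — DeerFlow's exact wire format is not verified here:

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Parse a raw SSE body into {"event": ..., "data": ...} dicts.

    Assumes standard SSE framing; DeerFlow may add fields (id:, retry:)
    that this sketch simply ignores.
    """
    events: list[dict] = []
    current: dict = {}
    for line in raw.splitlines():
        if line.startswith("event:"):
            current["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            current["data"] = json.loads(line[len("data:"):].strip())
        elif not line and current:
            # blank line terminates one SSE event
            events.append(current)
            current = {}
    if current:
        events.append(current)
    return events
```

In practice this would sit behind an `httpx` streaming response, dispatching on the event name (`task_started`, `task_completed`, …) to update the dashboard.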
Mapping to Timmy personas: DeerFlow's lead/sub-agent split roughly maps to Timmy's orchestrator + specialist-agent pattern. DeerFlow doesn't have named personas — it routes by capability (tools available to the agent type), not by identity. Timmy's persona system is richer and more opinionated.
### 2. API Surface
DeerFlow exposes a full REST API at port 2026 (via Nginx). No authentication by default.
Core integration endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/langgraph/threads` | POST | Create conversation thread |
| `/api/langgraph/threads/{id}/runs` | POST | Submit task (blocking) |
| `/api/langgraph/threads/{id}/runs/stream` | POST | Submit task (streaming SSE/WS) |
| `/api/langgraph/threads/{id}/state` | GET | Get full thread state + artifacts |
| `/api/models` | GET | List configured models |
| `/api/threads/{id}/artifacts/{path}` | GET | Download generated artifacts |
| `/api/threads/{id}` | DELETE | Clean up thread data |
These are callable from Timmy with `httpx` — no special client library needed.
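A minimal blocking-call sketch against the endpoints above. The payload shape (`{"input": {"messages": [...]}}`) and the `"id"` field in the thread-creation response are assumptions based on common LangGraph server conventions, not verified against DeerFlow's schema:

```python
BASE = "http://localhost:2026"  # default entry point; configurable via PORT

def runs_url(thread_id: str, stream: bool = False) -> str:
    """Build the run-submission URL for a thread (blocking or streaming)."""
    suffix = "/stream" if stream else ""
    return f"{BASE}/api/langgraph/threads/{thread_id}/runs{suffix}"

def submit_research(task: str, timeout: float = 900.0) -> dict:
    """Create a thread, submit a blocking run, then fetch the final state.

    Request/response shapes are assumptions; check DeerFlow's actual
    schema before relying on them.
    """
    import httpx  # third-party; deferred so the URL helpers work without it

    with httpx.Client(timeout=timeout) as client:
        thread_id = client.post(f"{BASE}/api/langgraph/threads").json()["id"]
        client.post(
            runs_url(thread_id),
            json={"input": {"messages": [{"role": "user", "content": task}]}},
        )
        return client.get(f"{BASE}/api/langgraph/threads/{thread_id}/state").json()
```

The 900-second default timeout mirrors DeerFlow's 15-minute sub-agent timeout; a streaming variant would hit `runs_url(thread_id, stream=True)` instead.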
### 3. LLM Backend Support
DeerFlow uses LangChain model classes declared in `config.yaml`.
Documented providers: OpenAI, Anthropic, Google Gemini, DeepSeek, Doubao (ByteDance), Kimi/Moonshot, OpenRouter, MiniMax, Novita AI, Claude Code (OAuth).
**Ollama:** Not in official documentation, but works via the `langchain_openai:ChatOpenAI` class with `base_url: http://localhost:11434/v1` and a dummy API key. Community-confirmed (GitHub issues #37, #1004) with Qwen2.5, Llama 3.1, and DeepSeek-R1.

**vLLM:** Not documented, but architecturally identical — vLLM exposes an OpenAI-compatible endpoint. Should work with the same `base_url` override.

**Practical caveat:** The lead agent requires strong instruction-following for consistent tool use and structured output. Community findings suggest ≥14B parameter models (Qwen2.5-14B minimum) for reliable orchestration. Our current `qwen3:14b` should be viable.
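Under those assumptions, a `config.yaml` model entry pointing DeerFlow at local Ollama might look like this. The key names are illustrative, not taken from DeerFlow's documented schema; only the ChatOpenAI-plus-`base_url`-plus-dummy-key pattern is the community-confirmed part:

```yaml
# Hypothetical config.yaml fragment — verify key names against
# DeerFlow's actual config schema before use.
models:
  - name: qwen3-14b-local
    provider: langchain_openai:ChatOpenAI
    model: qwen3:14b
    base_url: http://localhost:11434/v1
    api_key: ollama   # any non-empty string; Ollama ignores it
```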
### 4. License
MIT License — Copyright 2025 ByteDance Ltd. and DeerFlow Authors 2025–2026.
Permissive: use, modify, distribute, commercialize freely. Attribution required. No warranty.
Compatible with Timmy's use case. No CLA, no copyleft, no commercial restrictions.
### 5. Docker Port Conflicts
DeerFlow's Docker Compose exposes a single host port:
| Service | Host Port | Notes |
|---|---|---|
| Nginx (entry point) | 2026 (configurable via `PORT`) | Only externally exposed port |
| Frontend (Next.js) | 3000 | Internal only |
| Gateway API | 8001 | Internal only |
| LangGraph runtime | 2024 | Internal only |
| Provisioner (optional) | 8002 | Internal only, Kubernetes mode only |
Timmy's existing Docker Compose exposes:
- 8000 — dashboard (FastAPI)
- 8080 — openfang (via `openfang` profile)
- 11434 — Ollama (host process, not containerized)
No conflict. Port 2026 is not used by Timmy. DeerFlow can run alongside the existing stack without modification.
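A hypothetical compose fragment showing how DeerFlow could sit behind an opt-in profile next to Timmy's stack. The service name, image tag, and environment variable usage are assumptions; only the `PORT`/2026 entry point and the non-conflicting host ports come from the analysis above:

```yaml
# Sketch of a docker-compose override — image name and env handling
# are illustrative, not DeerFlow's actual compose file.
services:
  deerflow:
    image: deerflow:local        # assumed tag, built from bytedance/deer-flow
    profiles: ["deerflow"]       # opt-in: docker compose --profile deerflow up
    environment:
      PORT: "2026"               # DeerFlow's documented entry-point override
    ports:
      - "2026:2026"              # no clash with Timmy's 8000 / 8080 / 11434
```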
## Full Capability Comparison
| Capability | DeerFlow | Timmy (research.py) |
|---|---|---|
| Multi-agent fan-out | ✅ 3 concurrent sub-agents | ❌ Sequential only |
| Web search | ✅ Tavily / InfoQuest | ✅ research_tools.py |
| Web fetch | ✅ Jina AI / Firecrawl | ✅ trafilatura |
| Code execution (sandbox) | ✅ Local / Docker / K8s | ❌ Not implemented |
| Artifact generation | ✅ HTML, Markdown, slides | ❌ Markdown report only |
| Document upload + conversion | ✅ PDF, PPT, Excel, Word | ❌ Not implemented |
| Long-term memory | ✅ LLM-extracted facts, persistent | ✅ SQLite semantic cache |
| Streaming results | ✅ SSE + WebSocket | ❌ Blocking call |
| Web UI | ✅ Next.js included | ✅ Jinja2/HTMX dashboard |
| IM integration | ✅ Telegram, Slack, Feishu | ✅ Telegram, Discord |
| Ollama backend | ✅ (via config, community-confirmed) | ✅ Native |
| Persona system | ❌ Role-based only | ✅ Named personas |
| Semantic cache tier | ❌ Not implemented | ✅ SQLite (Tier 4) |
| Free-tier cascade | ❌ Not applicable | 🔲 Planned (Groq, #980) |
| Python version requirement | 3.12+ | 3.11+ |
| Lock-in | LangGraph + LangChain | None |
## Integration Options Assessment
### Option A — Full Adoption (replace `research.py`)
Verdict: Not recommended.
DeerFlow is a substantial full-stack system (Python + Node.js, Docker, Nginx, LangGraph). Adopting it fully would:
- Replace Timmy's custom cascade tier system (SQLite cache → Ollama → Claude API → Groq) with a single-tier LangChain model config
- Lose Timmy's persona-aware research routing
- Add Python 3.12+ dependency (Timmy currently targets 3.11+)
- Introduce LangGraph/LangChain lock-in for all research tasks
- Require running a parallel Node.js frontend process (redundant given Timmy's own UI)
### Option B — Sidecar for Heavy Research (call DeerFlow's API from Timmy)
Verdict: Viable but over-engineered for current needs.
DeerFlow could run as an optional sidecar (`docker compose --profile deerflow up`) and Timmy could delegate multi-agent research tasks via `POST /api/langgraph/threads/{id}/runs`. This would unlock parallel sub-agent fan-out and code-execution sandboxing without replacing Timmy's stack.

The integration would be ~50 lines of `httpx` code in a new `DeerFlowClient` adapter. The `ResearchOrchestrator` in `research.py` could route tasks above a complexity threshold to DeerFlow.

**Barrier:** DeerFlow's lack of default authentication means the sidecar would need to be network-isolated (internal Docker network only) or firewalled. Also, DeerFlow's Ollama integration is community-maintained, not officially supported — risk of breaking on upstream updates.
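The complexity-threshold routing could be sketched as follows. The complexity signals, threshold value, and backend names are illustrative — the real heuristic would live in `research.py`:

```python
from dataclasses import dataclass

@dataclass
class ResearchTask:
    query: str
    subquestions: int        # illustrative complexity signal
    needs_code_exec: bool    # sandboxed execution only DeerFlow provides

def complexity_score(task: ResearchTask) -> int:
    """Toy heuristic: fan-out width plus a flat bump for code execution."""
    return task.subquestions + (5 if task.needs_code_exec else 0)

DEERFLOW_THRESHOLD = 6  # illustrative cutoff, would need tuning

def route(task: ResearchTask) -> str:
    """Send heavy tasks to the DeerFlow sidecar, the rest to the local pipeline."""
    return "deerflow" if complexity_score(task) >= DEERFLOW_THRESHOLD else "local"
```

Anything the local pipeline already handles well stays local; only wide fan-out or code-execution tasks would pay the sidecar's latency and operational cost.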
### Option C — Selective Borrowing (copy patterns, not code)
Verdict: Recommended.
DeerFlow's architecture reveals concrete gaps in Timmy's current pipeline that are worth addressing independently:
| DeerFlow Pattern | Timmy Gap to Close | Implementation Path |
|---|---|---|
| Parallel sub-agent fan-out | Research is sequential | Add `asyncio.gather()` to `ResearchOrchestrator` for concurrent query execution |
| `SummarizationMiddleware` | Long contexts blow token budget | Add a context-trimming step in the synthesis cascade |
| `TodoListMiddleware` | No progress tracking during long research | Wire into the dashboard task panel |
| Artifact storage + serving | Reports are ephemeral (not persistently downloadable) | Add file-based artifact store to `research.py` (issue #976 already planned) |
| Skill modules (Markdown-based) | Research templates are `.md` files — same pattern | Already done in `skills/research/` |
| MCP integration | Research tools are hard-coded | Add MCP server discovery to `research_tools.py` for pluggable tool backends |
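The fan-out pattern is the highest-value borrow. A minimal sketch of concurrent query execution with a cap mirroring DeerFlow's 3-sub-agent limit — function names are placeholders for the real `ResearchOrchestrator` internals:

```python
import asyncio

async def run_query(query: str) -> dict:
    """Stand-in for one search+fetch pass; the real version calls research_tools."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"query": query, "hits": []}

async def run_queries(queries: list[str], max_concurrent: int = 3) -> list:
    """Execute research queries concurrently, capped like DeerFlow's sub-agent limit."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(q: str) -> dict:
        async with sem:
            return await run_query(q)

    # return_exceptions=True: one failed query shouldn't sink the whole batch
    return await asyncio.gather(*(bounded(q) for q in queries),
                                return_exceptions=True)
```

`asyncio.gather` preserves input order, so downstream synthesis can still zip results back to their originating queries.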
## Recommendation
No-go for full adoption or sidecar deployment at this stage.
Timmy's ResearchOrchestrator already covers the core pipeline (query → search → fetch → synthesize → store). DeerFlow's value proposition is primarily the parallel sub-agent fan-out and code-execution sandbox — capabilities that are useful but not blocking Timmy's current roadmap.
Recommended actions:
- **Close the parallelism gap** (high value, low effort): Refactor `ResearchOrchestrator` to execute queries concurrently with `asyncio.gather()`. This delivers DeerFlow's most impactful capability without any new dependencies.
- **Re-evaluate after #980 and #981 are done:** Once Timmy has the Groq free-tier cascade and a sovereignty metrics dashboard, we'll have a clearer picture of whether the custom orchestrator is performing well enough to make DeerFlow unnecessary entirely.
- **File a follow-up for MCP tool integration:** DeerFlow's use of `langchain-mcp-adapters` for pluggable tool backends is the most architecturally interesting pattern. Adding MCP server discovery to `research_tools.py` would give Timmy the same extensibility without LangGraph lock-in.
- **Revisit DeerFlow's code-execution sandbox if #978 (Paperclip task runner) proves insufficient:** DeerFlow's sandboxed `bash` tool is production-tested and well-isolated. If Timmy's task runner needs secure code execution, DeerFlow's sandbox implementation is worth borrowing or wrapping.
## Follow-up Issues to File
| Issue | Title | Priority |
|---|---|---|
| New | Parallelize `ResearchOrchestrator` query execution (`asyncio.gather`) | Medium |
| New | Add context-trimming step to synthesis cascade | Low |
| New | MCP server discovery in `research_tools.py` | Low |
| #976 | Semantic index for research outputs (already planned) | High |