
DeerFlow Evaluation — Autonomous Research Orchestration Layer

Status: No-go for full adoption · Selective borrowing recommended
Date: 2026-03-23
Issue: #1283 (spawned from #1275 screenshot triage)
Refs: #972 (Timmy research pipeline) · #975 (ResearchOrchestrator)


What Is DeerFlow?

DeerFlow (bytedance/deer-flow) is an open-source "super-agent harness" built by ByteDance on top of LangGraph. It provides a production-grade multi-agent research and code-execution framework with a web UI, REST API, Docker deployment, and optional IM channel integration (Telegram, Slack, Feishu/Lark).

  • Stars: ~39,600 · License: MIT
  • Stack: Python 3.12+ (backend) · TypeScript/Next.js (frontend) · LangGraph runtime
  • Entry point: http://localhost:2026 (Nginx reverse proxy, configurable via PORT)

Research Questions — Answers

1. Agent Roles

DeerFlow uses a two-tier architecture:

| Role | Description |
| --- | --- |
| Lead Agent | Entry point; decomposes tasks, dispatches sub-agents, synthesizes results |
| Sub-Agent (general-purpose) | All tools except `task`; spawned dynamically |
| Sub-Agent (bash) | Command-execution specialist |

The lead agent runs through a 12-middleware chain in order: thread setup → uploads → sandbox → tool-call repair → guardrails → summarization → todo tracking → title generation → memory update → image injection → sub-agent concurrency cap → clarification intercept.

Concurrency: up to 3 sub-agents in parallel (configurable), 15-minute default timeout each, structured SSE event stream (`task_started` / `task_running` / `task_completed` / `task_failed`).

Mapping to Timmy personas: DeerFlow's lead/sub-agent split roughly maps to Timmy's orchestrator + specialist-agent pattern. DeerFlow doesn't have named personas — it routes by capability (tools available to the agent type), not by identity. Timmy's persona system is richer and more opinionated.


2. API Surface

DeerFlow exposes a full REST API on port 2026 (via Nginx). No authentication by default.

Core integration endpoints:

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | `/api/langgraph/threads` | Create conversation thread |
| POST | `/api/langgraph/threads/{id}/runs` | Submit task (blocking) |
| POST | `/api/langgraph/threads/{id}/runs/stream` | Submit task (streaming SSE/WS) |
| GET | `/api/langgraph/threads/{id}/state` | Get full thread state + artifacts |
| GET | `/api/models` | List configured models |
| GET | `/api/threads/{id}/artifacts/{path}` | Download generated artifacts |
| DELETE | `/api/threads/{id}` | Clean up thread data |

These are callable from Timmy with httpx — no special client library needed.


3. LLM Backend Support

DeerFlow uses LangChain model classes declared in config.yaml.

Documented providers: OpenAI, Anthropic, Google Gemini, DeepSeek, Doubao (ByteDance), Kimi/Moonshot, OpenRouter, MiniMax, Novita AI, Claude Code (OAuth).

Ollama: Not in official documentation, but works via the langchain_openai:ChatOpenAI class with base_url: http://localhost:11434/v1 and a dummy API key. Community-confirmed (GitHub issues #37, #1004) with Qwen2.5, Llama 3.1, and DeepSeek-R1.

vLLM: Not documented, but architecturally identical — vLLM exposes an OpenAI-compatible endpoint. Should work with the same base_url override.

Practical caveat: The lead agent requires strong instruction-following for consistent tool use and structured output. Community findings suggest ≥14B parameter models (Qwen2.5-14B minimum) for reliable orchestration. Our current qwen3:14b should be viable.
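For orientation, an Ollama-backed model entry along the lines the community reports might look like the fragment below — the key names are illustrative assumptions, not DeerFlow's verified schema, so consult the shipped config example before copying:

```yaml
# Illustrative only — key names are assumptions; check DeerFlow's config example.
models:
  - name: local-qwen
    class: langchain_openai:ChatOpenAI     # OpenAI-compatible client class
    model: qwen2.5:14b
    base_url: http://localhost:11434/v1    # Ollama's OpenAI-compatible endpoint
    api_key: ollama                        # dummy value; Ollama ignores it
```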


4. License

MIT License — Copyright 2025–2026 ByteDance Ltd. and DeerFlow Authors.

Permissive: use, modify, distribute, commercialize freely. Attribution required. No warranty.

Compatible with Timmy's use case. No CLA, no copyleft, no commercial restrictions.


5. Docker Port Conflicts

DeerFlow's Docker Compose exposes a single host port:

| Service | Host Port | Notes |
| --- | --- | --- |
| Nginx (entry point) | 2026 (configurable via `PORT`) | Only externally exposed port |
| Frontend (Next.js) | 3000 | Internal only |
| Gateway API | 8001 | Internal only |
| LangGraph runtime | 2024 | Internal only |
| Provisioner (optional) | 8002 | Internal only, Kubernetes mode only |

Timmy's existing Docker Compose exposes:

  • 8000 — dashboard (FastAPI)
  • 8080 — openfang (via openfang profile)
  • 11434 — Ollama (host process, not containerized)

No conflict. Port 2026 is not used by Timmy. DeerFlow can run alongside the existing stack without modification.


Full Capability Comparison

| Capability | DeerFlow | Timmy (`research.py`) |
| --- | --- | --- |
| Multi-agent fan-out | 3 concurrent sub-agents | Sequential only |
| Web search | Tavily / InfoQuest | `research_tools.py` |
| Web fetch | Jina AI / Firecrawl | trafilatura |
| Code execution (sandbox) | Local / Docker / K8s | Not implemented |
| Artifact generation | HTML, Markdown, slides | Markdown report only |
| Document upload + conversion | PDF, PPT, Excel, Word | Not implemented |
| Long-term memory | LLM-extracted facts, persistent | SQLite semantic cache |
| Streaming results | SSE + WebSocket | Blocking call |
| Web UI | Next.js included | Jinja2/HTMX dashboard |
| IM integration | Telegram, Slack, Feishu | Telegram, Discord |
| Ollama backend | Via config, community-confirmed | Native |
| Persona system | Role-based only | Named personas |
| Semantic cache tier | Not implemented | SQLite (Tier 4) |
| Free-tier cascade | Not applicable | 🔲 Planned (Groq, #980) |
| Python version requirement | 3.12+ | 3.11+ |
| Lock-in | LangGraph + LangChain | None |

Integration Options Assessment

Option A — Full Adoption (replace research.py)

Verdict: Not recommended.

DeerFlow is a substantial full-stack system (Python + Node.js, Docker, Nginx, LangGraph). Adopting it fully would:

  • Replace Timmy's custom cascade tier system (SQLite cache → Ollama → Claude API → Groq) with a single-tier LangChain model config
  • Lose Timmy's persona-aware research routing
  • Add Python 3.12+ dependency (Timmy currently targets 3.11+)
  • Introduce LangGraph/LangChain lock-in for all research tasks
  • Require running a parallel Node.js frontend process (redundant given Timmy's own UI)

Option B — Sidecar for Heavy Research (call DeerFlow's API from Timmy)

Verdict: Viable but over-engineered for current needs.

DeerFlow could run as an optional sidecar (docker compose --profile deerflow up) and Timmy could delegate multi-agent research tasks via POST /api/langgraph/threads/{id}/runs. This would unlock parallel sub-agent fan-out and code-execution sandboxing without replacing Timmy's stack.

The integration would be ~50 lines of httpx code in a new DeerFlowClient adapter. The ResearchOrchestrator in research.py could route tasks above a complexity threshold to DeerFlow.
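That routing idea can be sketched in a few lines — `route_research`, `estimate_complexity`, and the threshold value are all hypothetical names for illustration, not existing Timmy code:

```python
import asyncio

COMPLEXITY_THRESHOLD = 0.7  # hypothetical cutoff; tune empirically


async def route_research(task, orchestrator, deerflow_client=None):
    """Send heavy tasks to the DeerFlow sidecar, keep the rest local.

    Both `orchestrator` and `deerflow_client` are assumed to expose an async
    run(task) method; `estimate_complexity` is a placeholder heuristic.
    """
    if deerflow_client is not None and estimate_complexity(task) >= COMPLEXITY_THRESHOLD:
        return await deerflow_client.run(task)   # multi-agent fan-out path
    return await orchestrator.run(task)          # existing sequential path


def estimate_complexity(task) -> float:
    # Placeholder heuristic: longer, multi-question prompts score higher.
    text = str(task)
    return min(1.0, len(text.split()) / 200 + text.count("?") * 0.2)
```

The key design point is that DeerFlow stays optional: with no `deerflow_client` configured, every task falls through to the existing orchestrator unchanged.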

Barrier: DeerFlow's lack of default authentication means the sidecar would need to be network-isolated (internal Docker network only) or firewalled. Also, DeerFlow's Ollama integration is community-maintained, not officially supported — risk of breaking on upstream updates.

Option C — Selective Borrowing (copy patterns, not code)

Verdict: Recommended.

DeerFlow's architecture reveals concrete gaps in Timmy's current pipeline that are worth addressing independently:

| DeerFlow Pattern | Timmy Gap to Close | Implementation Path |
| --- | --- | --- |
| Parallel sub-agent fan-out | Research is sequential | Add `asyncio.gather()` to ResearchOrchestrator for concurrent query execution |
| SummarizationMiddleware | Long contexts blow token budget | Add a context-trimming step in the synthesis cascade |
| TodoListMiddleware | No progress tracking during long research | Wire into the dashboard task panel |
| Artifact storage + serving | Reports are ephemeral (not persistently downloadable) | Add file-based artifact store to `research.py` (issue #976 already planned) |
| Skill modules (Markdown-based) | Research templates are `.md` files — same pattern | Already done in `skills/research/` |
| MCP integration | Research tools are hard-coded | Add MCP server discovery to `research_tools.py` for pluggable tool backends |

Recommendation

No-go for full adoption or sidecar deployment at this stage.

Timmy's ResearchOrchestrator already covers the core pipeline (query → search → fetch → synthesize → store). DeerFlow's value proposition is primarily the parallel sub-agent fan-out and code-execution sandbox — capabilities that are useful but not blocking Timmy's current roadmap.

Recommended actions:

  1. Close the parallelism gap (high value, low effort): Refactor ResearchOrchestrator to execute queries concurrently with asyncio.gather(). This delivers DeerFlow's most impactful capability without any new dependencies.

  2. Re-evaluate after #980 and #981 are done: Once Timmy has the Groq free-tier cascade and a sovereignty metrics dashboard, we'll have a clearer picture of whether the custom orchestrator is performing well enough to make DeerFlow unnecessary entirely.

  3. File a follow-up for MCP tool integration: DeerFlow's use of langchain-mcp-adapters for pluggable tool backends is the most architecturally interesting pattern. Adding MCP server discovery to research_tools.py would give Timmy the same extensibility without LangGraph lock-in.

  4. Revisit DeerFlow's code-execution sandbox if #978 (Paperclip task runner) proves insufficient: DeerFlow's sandboxed bash tool is production-tested and well-isolated. If Timmy's task runner needs secure code execution, DeerFlow's sandbox implementation is worth borrowing or wrapping.
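The parallelism refactor in action 1 can be sketched as follows — `run_query` stands in for whatever per-query coroutine ResearchOrchestrator already has, and all names here are illustrative:

```python
import asyncio


async def run_queries_concurrently(queries, run_query, max_concurrency=3):
    """Fan out independent research queries, mirroring DeerFlow's 3-sub-agent cap.

    `run_query` is an async callable (query -> result). With
    return_exceptions=True, a failed fetch comes back as an exception object
    in the result list instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query):
        async with semaphore:
            return await run_query(query)

    return await asyncio.gather(*(bounded(q) for q in queries),
                                return_exceptions=True)
```

Results come back in input order, so the synthesis step can simply filter out `Exception` instances before merging.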


Follow-up Issues to File

| Issue | Title | Priority |
| --- | --- | --- |
| New | Parallelize ResearchOrchestrator query execution (`asyncio.gather`) | Medium |
| New | Add context-trimming step to synthesis cascade | Low |
| New | MCP server discovery in `research_tools.py` | Low |
| #976 | Semantic index for research outputs (already planned) | High |