
DeerFlow Evaluation — Autonomous Research Orchestration Layer

Status: No-go for full adoption · Selective borrowing recommended
Date: 2026-03-23
Issue: #1283 (spawned from #1275 screenshot triage)
Refs: #972 (Timmy research pipeline) · #975 (ResearchOrchestrator)


What Is DeerFlow?

DeerFlow (bytedance/deer-flow) is an open-source "super-agent harness" built by ByteDance on top of LangGraph. It provides a production-grade multi-agent research and code-execution framework with a web UI, REST API, Docker deployment, and optional IM channel integration (Telegram, Slack, Feishu/Lark).

  • Stars: ~39,600 · License: MIT
  • Stack: Python 3.12+ (backend) · TypeScript/Next.js (frontend) · LangGraph runtime
  • Entry point: http://localhost:2026 (Nginx reverse proxy, configurable via PORT)

Research Questions — Answers

1. Agent Roles

DeerFlow uses a two-tier architecture:

| Role | Description |
| --- | --- |
| Lead Agent | Entry point; decomposes tasks, dispatches sub-agents, synthesizes results |
| Sub-Agent (general-purpose) | All tools except `task`; spawned dynamically |
| Sub-Agent (bash) | Command-execution specialist |

The lead agent runs through a 12-middleware chain in order: thread setup → uploads → sandbox → tool-call repair → guardrails → summarization → todo tracking → title generation → memory update → image injection → sub-agent concurrency cap → clarification intercept.

Concurrency: up to 3 sub-agents in parallel (configurable), 15-minute default timeout each, structured SSE event stream (`task_started` / `task_running` / `task_completed` / `task_failed`).

Mapping to Timmy personas: DeerFlow's lead/sub-agent split roughly maps to Timmy's orchestrator + specialist-agent pattern. DeerFlow doesn't have named personas — it routes by capability (tools available to the agent type), not by identity. Timmy's persona system is richer and more opinionated.


2. API Surface

DeerFlow exposes a full REST API on port 2026 (via Nginx). No authentication by default.

Core integration endpoints:

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | `/api/langgraph/threads` | Create conversation thread |
| POST | `/api/langgraph/threads/{id}/runs` | Submit task (blocking) |
| POST | `/api/langgraph/threads/{id}/runs/stream` | Submit task (streaming SSE/WS) |
| GET | `/api/langgraph/threads/{id}/state` | Get full thread state + artifacts |
| GET | `/api/models` | List configured models |
| GET | `/api/threads/{id}/artifacts/{path}` | Download generated artifacts |
| DELETE | `/api/threads/{id}` | Clean up thread data |

These are callable from Timmy with httpx — no special client library needed.


3. LLM Backend Support

DeerFlow uses LangChain model classes declared in config.yaml.

Documented providers: OpenAI, Anthropic, Google Gemini, DeepSeek, Doubao (ByteDance), Kimi/Moonshot, OpenRouter, MiniMax, Novita AI, Claude Code (OAuth).

Ollama: Not in official documentation, but works via the langchain_openai:ChatOpenAI class with base_url: http://localhost:11434/v1 and a dummy API key. Community-confirmed (GitHub issues #37, #1004) with Qwen2.5, Llama 3.1, and DeepSeek-R1.

vLLM: Not documented, but architecturally identical — vLLM exposes an OpenAI-compatible endpoint. Should work with the same base_url override.

Practical caveat: The lead agent requires strong instruction-following for consistent tool use and structured output. Community findings suggest ≥14B parameter models (Qwen2.5-14B minimum) for reliable orchestration. Our current qwen3:14b should be viable.
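For orientation, an Ollama-backed model entry along the lines the community reports might look like the fragment below — the key names are illustrative assumptions, not DeerFlow's verified schema, so consult the shipped config example before copying:

```yaml
# Illustrative only — key names are assumptions; check DeerFlow's config example.
models:
  - name: local-qwen
    class: langchain_openai:ChatOpenAI     # OpenAI-compatible client class
    model: qwen2.5:14b
    base_url: http://localhost:11434/v1    # Ollama's OpenAI-compatible endpoint
    api_key: ollama                        # dummy value; Ollama ignores it
```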


4. License

MIT License — Copyright 2025–2026 ByteDance Ltd. and DeerFlow Authors.

Permissive: use, modify, distribute, commercialize freely. Attribution required. No warranty.

Compatible with Timmy's use case. No CLA, no copyleft, no commercial restrictions.


5. Docker Port Conflicts

DeerFlow's Docker Compose exposes a single host port:

| Service | Host Port | Notes |
| --- | --- | --- |
| Nginx (entry point) | 2026 (configurable via `PORT`) | Only externally exposed port |
| Frontend (Next.js) | 3000 | Internal only |
| Gateway API | 8001 | Internal only |
| LangGraph runtime | 2024 | Internal only |
| Provisioner (optional) | 8002 | Internal only, Kubernetes mode only |

Timmy's existing Docker Compose exposes:

  • 8000 — dashboard (FastAPI)
  • 8080 — openfang (via openfang profile)
  • 11434 — Ollama (host process, not containerized)

No conflict. Port 2026 is not used by Timmy. DeerFlow can run alongside the existing stack without modification.


Full Capability Comparison

| Capability | DeerFlow | Timmy (`research.py`) |
| --- | --- | --- |
| Multi-agent fan-out | 3 concurrent sub-agents | Sequential only |
| Web search | Tavily / InfoQuest | `research_tools.py` |
| Web fetch | Jina AI / Firecrawl | trafilatura |
| Code execution (sandbox) | Local / Docker / K8s | Not implemented |
| Artifact generation | HTML, Markdown, slides | Markdown report only |
| Document upload + conversion | PDF, PPT, Excel, Word | Not implemented |
| Long-term memory | LLM-extracted facts, persistent | SQLite semantic cache |
| Streaming results | SSE + WebSocket | Blocking call |
| Web UI | Next.js included | Jinja2/HTMX dashboard |
| IM integration | Telegram, Slack, Feishu | Telegram, Discord |
| Ollama backend | Via config, community-confirmed | Native |
| Persona system | Role-based only | Named personas |
| Semantic cache tier | Not implemented | SQLite (Tier 4) |
| Free-tier cascade | Not applicable | 🔲 Planned (Groq, #980) |
| Python version requirement | 3.12+ | 3.11+ |
| Lock-in | LangGraph + LangChain | None |

Integration Options Assessment

Option A — Full Adoption (replace research.py)

Verdict: Not recommended.

DeerFlow is a substantial full-stack system (Python + Node.js, Docker, Nginx, LangGraph). Adopting it fully would:

  • Replace Timmy's custom cascade tier system (SQLite cache → Ollama → Claude API → Groq) with a single-tier LangChain model config
  • Lose Timmy's persona-aware research routing
  • Add Python 3.12+ dependency (Timmy currently targets 3.11+)
  • Introduce LangGraph/LangChain lock-in for all research tasks
  • Require running a parallel Node.js frontend process (redundant given Timmy's own UI)

Option B — Sidecar for Heavy Research (call DeerFlow's API from Timmy)

Verdict: Viable but over-engineered for current needs.

DeerFlow could run as an optional sidecar (docker compose --profile deerflow up) and Timmy could delegate multi-agent research tasks via POST /api/langgraph/threads/{id}/runs. This would unlock parallel sub-agent fan-out and code-execution sandboxing without replacing Timmy's stack.

The integration would be ~50 lines of httpx code in a new DeerFlowClient adapter. The ResearchOrchestrator in research.py could route tasks above a complexity threshold to DeerFlow.
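That routing idea can be sketched in a few lines — `route_research`, `estimate_complexity`, and the threshold value are all hypothetical names for illustration, not existing Timmy code:

```python
import asyncio

COMPLEXITY_THRESHOLD = 0.7  # hypothetical cutoff; tune empirically


async def route_research(task, orchestrator, deerflow_client=None):
    """Send heavy tasks to the DeerFlow sidecar, keep the rest local.

    Both `orchestrator` and `deerflow_client` are assumed to expose an async
    run(task) method; `estimate_complexity` is a placeholder heuristic.
    """
    if deerflow_client is not None and estimate_complexity(task) >= COMPLEXITY_THRESHOLD:
        return await deerflow_client.run(task)   # multi-agent fan-out path
    return await orchestrator.run(task)          # existing sequential path


def estimate_complexity(task) -> float:
    # Placeholder heuristic: longer, multi-question prompts score higher.
    text = str(task)
    return min(1.0, len(text.split()) / 200 + text.count("?") * 0.2)
```

The key design point is that DeerFlow stays optional: with no `deerflow_client` configured, every task falls through to the existing orchestrator unchanged.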

Barrier: DeerFlow's lack of default authentication means the sidecar would need to be network-isolated (internal Docker network only) or firewalled. Also, DeerFlow's Ollama integration is community-maintained, not officially supported — risk of breaking on upstream updates.

Option C — Selective Borrowing (copy patterns, not code)

Verdict: Recommended.

DeerFlow's architecture reveals concrete gaps in Timmy's current pipeline that are worth addressing independently:

| DeerFlow Pattern | Timmy Gap to Close | Implementation Path |
| --- | --- | --- |
| Parallel sub-agent fan-out | Research is sequential | Add `asyncio.gather()` to ResearchOrchestrator for concurrent query execution |
| SummarizationMiddleware | Long contexts blow token budget | Add a context-trimming step in the synthesis cascade |
| TodoListMiddleware | No progress tracking during long research | Wire into the dashboard task panel |
| Artifact storage + serving | Reports are ephemeral (not persistently downloadable) | Add file-based artifact store to `research.py` (issue #976 already planned) |
| Skill modules (Markdown-based) | Research templates are `.md` files — same pattern | Already done in `skills/research/` |
| MCP integration | Research tools are hard-coded | Add MCP server discovery to `research_tools.py` for pluggable tool backends |

Recommendation

No-go for full adoption or sidecar deployment at this stage.

Timmy's ResearchOrchestrator already covers the core pipeline (query → search → fetch → synthesize → store). DeerFlow's value proposition is primarily the parallel sub-agent fan-out and code-execution sandbox — capabilities that are useful but not blocking Timmy's current roadmap.

Recommended actions:

  1. Close the parallelism gap (high value, low effort): Refactor ResearchOrchestrator to execute queries concurrently with asyncio.gather(). This delivers DeerFlow's most impactful capability without any new dependencies.

  2. Re-evaluate after #980 and #981 are done: Once Timmy has the Groq free-tier cascade and a sovereignty metrics dashboard, we'll have a clearer picture of whether the custom orchestrator is performing well enough to make DeerFlow unnecessary entirely.

  3. File a follow-up for MCP tool integration: DeerFlow's use of langchain-mcp-adapters for pluggable tool backends is the most architecturally interesting pattern. Adding MCP server discovery to research_tools.py would give Timmy the same extensibility without LangGraph lock-in.

  4. Revisit DeerFlow's code-execution sandbox if #978 (Paperclip task runner) proves insufficient: DeerFlow's sandboxed bash tool is production-tested and well-isolated. If Timmy's task runner needs secure code execution, DeerFlow's sandbox implementation is worth borrowing or wrapping.
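The parallelism refactor in action 1 can be sketched as follows — `run_query` stands in for whatever per-query coroutine ResearchOrchestrator already has, and all names here are illustrative:

```python
import asyncio


async def run_queries_concurrently(queries, run_query, max_concurrency=3):
    """Fan out independent research queries, mirroring DeerFlow's 3-sub-agent cap.

    `run_query` is an async callable (query -> result). With
    return_exceptions=True, a failed fetch comes back as an exception object
    in the result list instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query):
        async with semaphore:
            return await run_query(query)

    return await asyncio.gather(*(bounded(q) for q in queries),
                                return_exceptions=True)
```

Results come back in input order, so the synthesis step can simply filter out `Exception` instances before merging.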


Follow-up Issues to File

| Issue | Title | Priority |
| --- | --- | --- |
| New | Parallelize ResearchOrchestrator query execution (`asyncio.gather`) | Medium |
| New | Add context-trimming step to synthesis cascade | Low |
| New | MCP server discovery in `research_tools.py` | Low |
| #976 | Semantic index for research outputs (already planned) | High |