From 87cd5fcc6fab8ab894e5bebe79c9cf39fcba8c6b Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Mon, 23 Mar 2026 21:55:20 -0400
Subject: [PATCH] docs: add DeerFlow evaluation research note

Evaluates bytedance/deer-flow as an autonomous research orchestration
layer for Timmy (issue #1283). Covers agent architecture, API surface,
Ollama/vLLM backend compatibility, MIT license, Docker port analysis
(no conflict with existing stack on port 2026), and full capability
comparison against ResearchOrchestrator.

Recommendation: No-go for full adoption. Selective borrowing
recommended: parallelize ResearchOrchestrator with asyncio.gather, add
context trimming, and revisit MCP tool integration as a follow-up.

Refs #1283

Co-Authored-By: Claude Sonnet 4.6
---
 docs/research/deerflow-evaluation.md | 190 +++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 docs/research/deerflow-evaluation.md

diff --git a/docs/research/deerflow-evaluation.md b/docs/research/deerflow-evaluation.md
new file mode 100644
index 00000000..af5097da
--- /dev/null
+++ b/docs/research/deerflow-evaluation.md
@@ -0,0 +1,190 @@
# DeerFlow Evaluation — Autonomous Research Orchestration Layer

**Status:** No-go for full adoption · Selective borrowing recommended
**Date:** 2026-03-23
**Issue:** #1283 (spawned from #1275 screenshot triage)
**Refs:** #972 (Timmy research pipeline) · #975 (ResearchOrchestrator)

---

## What Is DeerFlow?

DeerFlow (`bytedance/deer-flow`) is an open-source "super-agent harness" built by ByteDance on top of LangGraph. It provides a production-grade multi-agent research and code-execution framework with a web UI, REST API, Docker deployment, and optional IM channel integration (Telegram, Slack, Feishu/Lark).
+ +- **Stars:** ~39,600 · **License:** MIT +- **Stack:** Python 3.12+ (backend) · TypeScript/Next.js (frontend) · LangGraph runtime +- **Entry point:** `http://localhost:2026` (Nginx reverse proxy, configurable via `PORT`) + +--- + +## Research Questions — Answers + +### 1. Agent Roles + +DeerFlow uses a two-tier architecture: + +| Role | Description | +|------|-------------| +| **Lead Agent** | Entry point; decomposes tasks, dispatches sub-agents, synthesizes results | +| **Sub-Agent (general-purpose)** | All tools except `task`; spawned dynamically | +| **Sub-Agent (bash)** | Command-execution specialist | + +The lead agent runs through a 12-middleware chain in order: thread setup → uploads → sandbox → tool-call repair → guardrails → summarization → todo tracking → title generation → memory update → image injection → sub-agent concurrency cap → clarification intercept. + +**Concurrency:** up to 3 sub-agents in parallel (configurable), 15-minute default timeout each, structured SSE event stream (`task_started` / `task_running` / `task_completed` / `task_failed`). + +**Mapping to Timmy personas:** DeerFlow's lead/sub-agent split roughly maps to Timmy's orchestrator + specialist-agent pattern. DeerFlow doesn't have named personas — it routes by capability (tools available to the agent type), not by identity. Timmy's persona system is richer and more opinionated. + +--- + +### 2. API Surface + +DeerFlow exposes a full REST API at port 2026 (via Nginx). 
**No authentication by default.**

**Core integration endpoints:**

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/langgraph/threads` | POST | Create conversation thread |
| `/api/langgraph/threads/{id}/runs` | POST | Submit task (blocking) |
| `/api/langgraph/threads/{id}/runs/stream` | POST | Submit task (streaming SSE/WS) |
| `/api/langgraph/threads/{id}/state` | GET | Get full thread state + artifacts |
| `/api/models` | GET | List configured models |
| `/api/threads/{id}/artifacts/{path}` | GET | Download generated artifacts |
| `/api/threads/{id}` | DELETE | Clean up thread data |

These are callable from Timmy with `httpx` — no special client library needed.

---

### 3. LLM Backend Support

DeerFlow uses LangChain model classes declared in `config.yaml`.

**Documented providers:** OpenAI, Anthropic, Google Gemini, DeepSeek, Doubao (ByteDance), Kimi/Moonshot, OpenRouter, MiniMax, Novita AI, Claude Code (OAuth).

**Ollama:** Not in official documentation, but works via the `langchain_openai:ChatOpenAI` class with `base_url: http://localhost:11434/v1` and a dummy API key. Community-confirmed (GitHub issues #37, #1004) with Qwen2.5, Llama 3.1, and DeepSeek-R1.

**vLLM:** Not documented, but architecturally identical — vLLM exposes an OpenAI-compatible endpoint. It should work with the same `base_url` override.

**Practical caveat:** The lead agent requires strong instruction-following for consistent tool use and structured output. Community findings suggest ≥14B-parameter models (Qwen2.5-14B minimum) for reliable orchestration. Our current `qwen3:14b` should be viable.

---

### 4. License

**MIT License** — Copyright 2025–2026 ByteDance Ltd. and DeerFlow Authors.

Permissive: use, modify, distribute, commercialize freely. Attribution required. No warranty.

**Compatible with Timmy's use case.** No CLA, no copyleft, no commercial restrictions.

---

### 5. Docker Port Conflicts

DeerFlow's Docker Compose exposes a single host port:

| Service | Host Port | Notes |
|---------|-----------|-------|
| Nginx (entry point) | **2026** (configurable via `PORT`) | Only externally exposed port |
| Frontend (Next.js) | 3000 | Internal only |
| Gateway API | 8001 | Internal only |
| LangGraph runtime | 2024 | Internal only |
| Provisioner (optional) | 8002 | Internal only, Kubernetes mode only |

Timmy's existing Docker Compose exposes:
- **8000** — dashboard (FastAPI)
- **8080** — openfang (via `openfang` profile)
- **11434** — Ollama (host process, not containerized)

**No conflict.** Port 2026 is not used by Timmy. DeerFlow can run alongside the existing stack without modification.

---

## Full Capability Comparison

| Capability | DeerFlow | Timmy (`research.py`) |
|------------|----------|-----------------------|
| Multi-agent fan-out | ✅ 3 concurrent sub-agents | ❌ Sequential only |
| Web search | ✅ Tavily / InfoQuest | ✅ `research_tools.py` |
| Web fetch | ✅ Jina AI / Firecrawl | ✅ trafilatura |
| Code execution (sandbox) | ✅ Local / Docker / K8s | ❌ Not implemented |
| Artifact generation | ✅ HTML, Markdown, slides | ❌ Markdown report only |
| Document upload + conversion | ✅ PDF, PPT, Excel, Word | ❌ Not implemented |
| Long-term memory | ✅ LLM-extracted facts, persistent | ✅ SQLite semantic cache |
| Streaming results | ✅ SSE + WebSocket | ❌ Blocking call |
| Web UI | ✅ Next.js included | ✅ Jinja2/HTMX dashboard |
| IM integration | ✅ Telegram, Slack, Feishu | ✅ Telegram, Discord |
| Ollama backend | ✅ (via config, community-confirmed) | ✅ Native |
| Persona system | ❌ Role-based only | ✅ Named personas |
| Semantic cache tier | ❌ Not implemented | ✅ SQLite (Tier 4) |
| Free-tier cascade | ❌ Not applicable | 🔲 Planned (Groq, #980) |
| Python version requirement | 3.12+ | 3.11+ |
| Lock-in | LangGraph + LangChain | None |

---

## Integration Options Assessment

### Option A — Full Adoption (replace `research.py`)
**Verdict: Not recommended.**

DeerFlow is a substantial full-stack system (Python + Node.js, Docker, Nginx, LangGraph). Adopting it fully would:
- Replace Timmy's custom cascade tier system (SQLite cache → Ollama → Claude API → Groq) with a single-tier LangChain model config
- Lose Timmy's persona-aware research routing
- Add a Python 3.12+ dependency (Timmy currently targets 3.11+)
- Introduce LangGraph/LangChain lock-in for all research tasks
- Require running a parallel Node.js frontend process (redundant given Timmy's own UI)

### Option B — Sidecar for Heavy Research (call DeerFlow's API from Timmy)
**Verdict: Viable but over-engineered for current needs.**

DeerFlow could run as an optional sidecar (`docker compose --profile deerflow up`) and Timmy could delegate multi-agent research tasks via `POST /api/langgraph/threads/{id}/runs`. This would unlock parallel sub-agent fan-out and code-execution sandboxing without replacing Timmy's stack.

The integration would be ~50 lines of `httpx` code in a new `DeerFlowClient` adapter. The `ResearchOrchestrator` in `research.py` could route tasks above a complexity threshold to DeerFlow.

**Barrier:** DeerFlow's lack of default authentication means the sidecar would need to be network-isolated (internal Docker network only) or firewalled. Also, DeerFlow's Ollama integration is community-maintained, not officially supported — risk of breaking on upstream updates.
### Option C — Selective Borrowing (copy patterns, not code)
**Verdict: Recommended.**

DeerFlow's architecture reveals concrete gaps in Timmy's current pipeline that are worth addressing independently:

| DeerFlow Pattern | Timmy Gap to Close | Implementation Path |
|------------------|--------------------|---------------------|
| Parallel sub-agent fan-out | Research is sequential | Add `asyncio.gather()` to `ResearchOrchestrator` for concurrent query execution |
| `SummarizationMiddleware` | Long contexts blow the token budget | Add a context-trimming step in the synthesis cascade |
| `TodoListMiddleware` | No progress tracking during long research | Wire into the dashboard task panel |
| Artifact storage + serving | Reports are ephemeral (not persistently downloadable) | Add a file-based artifact store to `research.py` (issue #976 already planned) |
| Skill modules (Markdown-based) | Research templates are `.md` files — same pattern | Already done in `skills/research/` |
| MCP integration | Research tools are hard-coded | Add MCP server discovery to `research_tools.py` for pluggable tool backends |

---

## Recommendation

**No-go for full adoption or sidecar deployment at this stage.**

Timmy's `ResearchOrchestrator` already covers the core pipeline (query → search → fetch → synthesize → store). DeerFlow's value proposition is primarily the parallel sub-agent fan-out and the code-execution sandbox — capabilities that are useful but not blocking Timmy's current roadmap.

**Recommended actions:**

1. **Close the parallelism gap (high value, low effort):** Refactor `ResearchOrchestrator` to execute queries concurrently with `asyncio.gather()`. This delivers DeerFlow's most impactful capability without any new dependencies.

2. **Re-evaluate after #980 and #981 are done:** Once Timmy has the Groq free-tier cascade and a sovereignty metrics dashboard, we'll have a clearer picture of whether the custom orchestrator performs well enough to make DeerFlow entirely unnecessary.

3. **File a follow-up for MCP tool integration:** DeerFlow's use of `langchain-mcp-adapters` for pluggable tool backends is the most architecturally interesting pattern. Adding MCP server discovery to `research_tools.py` would give Timmy the same extensibility without LangGraph lock-in.

4. **Revisit DeerFlow's code-execution sandbox if #978 (Paperclip task runner) proves insufficient:** DeerFlow's sandboxed `bash` tool is production-tested and well-isolated. If Timmy's task runner needs secure code execution, DeerFlow's sandbox implementation is worth borrowing or wrapping.

---

## Follow-up Issues to File

| Issue | Title | Priority |
|-------|-------|----------|
| New | Parallelize ResearchOrchestrator query execution (`asyncio.gather`) | Medium |
| New | Add context-trimming step to synthesis cascade | Low |
| New | MCP server discovery in `research_tools.py` | Low |
| #976 | Semantic index for research outputs (already planned) | High |
-- 
2.43.0