feat: integrate SearXNG + Crawl4AI as self-hosted search backend for research agents #1282

Closed
opened 2026-03-24 01:43:51 +00:00 by claude · 1 comment
Collaborator

## Context

From screenshot triage (issue #1275).

A self-hosted MCP server combining SearXNG (a meta-search engine) and Crawl4AI (a web scraper) provides search and scraping capabilities without a paid API key. Community reports claim roughly 3x faster results than native tools and near-100% scraping reliability.

**Repo:** https://github.com/luxiaolei/searxng-crawl4ai-mcp

## Problem

Timmy’s research pipeline agents currently have no search capability unless an external paid API (e.g. Brave Search, Tavily) is configured. This is a hard dependency that blocks fully local/private operation.

## Proposed Solution

Add SearXNG + Crawl4AI as the default search backend for research agents:

1. **Docker Compose** — add SearXNG and Crawl4AI services to the local stack
2. **Search tool** — implement a `web_search(query)` tool in `timmy/` that calls SearXNG
3. **Scrape tool** — implement a `scrape_url(url)` tool that calls Crawl4AI
4. **Config** — `TIMMY_SEARCH_BACKEND=searxng` (default), `TIMMY_SEARCH_URL` setting
5. **Graceful degradation** — log a warning and return empty results if SearXNG is unavailable
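The search tool and graceful-degradation behavior above could be sketched as follows. This is a hypothetical illustration, not the PR's code: the function signature, logger name, and result shape are assumptions; the SearXNG JSON API (`GET /search?q=...&format=json`) is its documented interface.

```python
# Hypothetical sketch of the proposed web_search() tool (names assumed).
import json
import logging
import urllib.error
import urllib.parse
import urllib.request

logger = logging.getLogger("timmy.tools.search")


def web_search(query: str,
               base_url: str = "http://localhost:8888",
               max_results: int = 5) -> list[dict]:
    """Query a SearXNG instance via its JSON API.

    Degrades gracefully: if the service is unreachable, log a warning
    and return an empty list instead of raising.
    """
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    url = f"{base_url}/search?{params}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            results = json.load(resp).get("results", [])
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError) as exc:
        logger.warning("SearXNG unavailable (%s); returning empty results", exc)
        return []
    return [
        {"title": r.get("title"), "url": r.get("url"), "snippet": r.get("content")}
        for r in results[:max_results]
    ]
```

With no SearXNG running, `web_search("test", base_url="http://127.0.0.1:9")` logs a warning and returns `[]`, satisfying the degradation requirement.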

## Implementation Plan

- [ ] Add SearXNG service to `docker-compose.yml` (image: `searxng/searxng`)
- [ ] Add Crawl4AI service to `docker-compose.yml`
- [ ] Add `search_url` and `crawl_url` settings to `config.py`
- [ ] Create `timmy/tools/search.py` with `web_search()` and `scrape_url()` functions
- [ ] Register tools with the agent in `timmy/agent.py`
- [ ] Unit tests with mocked HTTP responses
- [ ] Update AGENTS.md to document the new search capability
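The Compose additions might look like the sketch below. Only `searxng/searxng` and the host ports (8888 for SearXNG, 11235 for Crawl4AI, per the implementation notes) come from this issue; the Crawl4AI image name, container ports, and the `search` profile grouping are assumptions.

```yaml
# Hedged sketch of the proposed docker-compose.yml services.
services:
  searxng:
    image: searxng/searxng
    ports:
      - "8888:8080"          # SearXNG listens on 8080 inside the container
    profiles: ["search"]     # started only with --profile search
  crawl4ai:
    image: unclecode/crawl4ai  # assumed image name
    ports:
      - "11235:11235"
    profiles: ["search"]
```

Putting both services behind a `search` profile keeps `docker compose up` lean by default while letting `docker compose --profile search up` bring in the optional backend.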

## Acceptance Criteria

- Research agents can search the web without any paid API key
- `TIMMY_SEARCH_BACKEND=none` disables search gracefully
- `tox -e unit` passes
- `make docker-start` brings up SearXNG and Crawl4AI via Docker Compose
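A minimal sketch of how the proposed settings could gate the tools (the environment variable names come from this issue; the class and defaults are assumptions):

```python
# Hypothetical config sketch; only the TIMMY_* names are from the issue.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchConfig:
    backend: str
    search_url: str
    crawl_url: str

    @classmethod
    def from_env(cls, env=os.environ) -> "SearchConfig":
        return cls(
            backend=env.get("TIMMY_SEARCH_BACKEND", "searxng"),
            search_url=env.get("TIMMY_SEARCH_URL", "http://localhost:8888"),
            crawl_url=env.get("TIMMY_CRAWL_URL", "http://localhost:11235"),
        )

    @property
    def search_enabled(self) -> bool:
        # TIMMY_SEARCH_BACKEND=none disables registration of both tools
        return self.backend != "none"
```

The agent would check `search_enabled` before registering `web_search`/`scrape_url`, so `TIMMY_SEARCH_BACKEND=none` disables search without any code change.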
claude self-assigned this 2026-03-24 01:43:55 +00:00
Author
Collaborator

PR created: http://143.198.27.163:3000/Rockachopa/Timmy-time-dashboard/pulls/1299

**What was implemented:**

- `timmy/tools/search.py` — new module with `web_search()` (SearXNG) and `scrape_url()` (Crawl4AI)
- Config: `TIMMY_SEARCH_BACKEND` (`searxng`/`none`), `TIMMY_SEARCH_URL`, `TIMMY_CRAWL_URL`
- Both tools registered in the orchestrator and echo (research) toolkits
- Docker Compose `--profile search` adds SearXNG (port 8888) and Crawl4AI (port 11235)
- 21 unit tests; full suite passes (`tox -e unit` — 500 tests green)
- AGENTS.md documents the new Search Capability section

Graceful degradation: `TIMMY_SEARCH_BACKEND=none` disables both tools; unreachable services log a WARNING and return descriptive error strings — the app never crashes.
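The "descriptive error string" behavior for the scrape tool could look like this sketch. It is illustrative only: the `/crawl` endpoint path, request payload, and response shape are assumptions about the Crawl4AI HTTP server, not verified against the PR.

```python
# Hypothetical sketch of scrape_url(); endpoint and payload shape assumed.
import json
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("timmy.tools.search")


def scrape_url(url: str, crawl_url: str = "http://localhost:11235") -> str:
    """POST a URL to a Crawl4AI server; return extracted text or an error string.

    Never raises: an unreachable service yields a descriptive "error: ..."
    string so the agent can report the failure instead of crashing.
    """
    payload = json.dumps({"urls": [url]}).encode()
    req = urllib.request.Request(
        f"{crawl_url}/crawl",  # assumed endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError) as exc:
        logger.warning("Crawl4AI unavailable: %s", exc)
        return f"error: Crawl4AI unavailable ({exc})"
    # Assumed response shape: a list of results carrying extracted markdown.
    results = body.get("results", [])
    return results[0].get("markdown", "") if results else ""
```

Returning a string in both the success and failure paths keeps the tool contract simple for the LLM agent consuming it.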

claude was unassigned by Timmy 2026-03-24 01:56:14 +00:00

Reference: Rockachopa/Timmy-time-dashboard#1282