Research Pipeline Day 1: Deploy SearXNG + LanceDB + Crawl4AI #486
Parent: #445 (research dump)
Objective
Stand up search and storage infrastructure for the autonomous research pipeline.
Tasks
docker run -d -p 8080:8080 searxng/searxng:latest
pip install lancedb trafilatura httpx crawl4ai
ollama pull qwen3-embedding:0.6b
Verification
curl "http://VPS:8080/search?q=test&format=json" returns results
python3 -c "import lancedb; print('ok')"
python3 -c "import trafilatura; print('ok')"
ollama run qwen3-embedding:0.6b "test" returns vector
Ref: Wiring_the_Research_Pipeline.pdf attached to #445
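The verification steps above could also be scripted. The following is a minimal sketch using only the standard library, not the checks actually run for this issue. The function names (`search_url`, `has_results`, `missing_modules`, `check_searxng`) are hypothetical, and `check_searxng` assumes the instance is reachable and has the JSON output format enabled.

```python
import importlib.util
import json
import urllib.parse
import urllib.request


def search_url(base_url: str, query: str) -> str:
    """Build the same JSON search URL that the curl check uses."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def has_results(payload: dict) -> bool:
    """True when a SearXNG JSON response carries at least one result."""
    return bool(payload.get("results"))


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_searxng(base_url: str, query: str = "test", timeout: float = 10.0) -> bool:
    """Query a live SearXNG instance (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(search_url(base_url, query), timeout=timeout) as resp:
        return has_results(json.load(resp))
```

For example, `missing_modules(["lancedb", "trafilatura", "httpx", "crawl4ai"])` lists any dependency that is not importable, and `check_searxng("http://VPS:8080")` mirrors the curl check. If the instance answers with HTTP 403, the JSON format may need to be enabled in the SearXNG settings.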
🔧 Day 1 Progress Update
✅ Completed
1. SearXNG Deployed & Verified
2. Research Ingestion Summary Posted
🔲 Remaining Day 1 Tasks
Infrastructure Status
🔧 Day 1 Progress Update #2
✅ Newly Completed
3. Python Dependencies Installed & Verified
lancedb 0.30.1: installed ✅
trafilatura 2.0.0: installed ✅
crawl4ai 0.8.6: installed ✅
httpx 0.28.1: installed ✅
4. SearXNG Still Running
Up About an hour
🔲 Remaining Day 1 Tasks
📋 Architecture Note
Ollama + embedding models should stay on the Mac (local-first, sovereign). The VPS serves as the relay/search infrastructure (SearXNG, LanceDB storage). Crawl4AI and trafilatura handle web content extraction on the VPS where SearXNG lives.
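As a rough illustration of this split, the sketch below embeds text through a local Ollama server and writes the rows to a LanceDB table. It rests on assumptions the issue does not confirm: Ollama listening on its default port, a hypothetical table name `pages`, a placeholder `db_path`, and a hypothetical row layout (`url`, `text`, `vector`). How rows travel from the Mac to the VPS is not specified in the issue and is left out here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumption: default local Ollama port
MODEL = "qwen3-embedding:0.6b"  # model named in the task list


def embed_payload(texts: list[str], model: str = MODEL) -> bytes:
    """Encode the JSON request body for Ollama's /api/embed endpoint."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def to_rows(urls: list[str], texts: list[str], vectors: list[list[float]]) -> list[dict]:
    """Pair each page with its embedding (hypothetical row layout)."""
    if not (len(urls) == len(texts) == len(vectors)):
        raise ValueError("urls, texts and vectors must have the same length")
    return [{"url": u, "text": t, "vector": v} for u, t, v in zip(urls, texts, vectors)]


def embed_texts(texts: list[str]) -> list[list[float]]:
    """Call the local Ollama server; requires Ollama running with the model pulled."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=embed_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]


def store_rows(rows: list[dict], db_path: str = "./research.lancedb", table: str = "pages") -> None:
    """Write rows to LanceDB, replacing any existing table of that name."""
    import lancedb  # imported lazily so the helpers above work without it

    db = lancedb.connect(db_path)
    db.create_table(table, data=rows, mode="overwrite")
```

A run would look like `store_rows(to_rows(urls, texts, embed_texts(texts)))`. The `mode="overwrite"` choice keeps the sketch short; an ingestion job that accumulates pages would append to the existing table instead.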
Infrastructure Status
Closed per direction shift (#542). Reason: Research pipeline day 1 (SearXNG/LanceDB/Crawl4AI) — custom build, not MCP-standard.
The Nexus has three jobs: Heartbeat, Harness, Portal Interface. This issue doesn't serve any of them.