[claude] Add web_fetch tool (trafilatura) for full-page content extraction (#973) #1004

Merged
claude merged 1 commit from claude/issue-973 into main 2026-03-22 23:03:38 +00:00

1 Commit

Author SHA1 Message Date
Alexander Whitestone
0c5bbb1b4b feat: add web_fetch tool for full-page content extraction (trafilatura)
Some checks failed
Tests / lint (pull_request) Failing after 4s
Tests / test (pull_request) Has been skipped
Implements a web_fetch(url, max_tokens) tool that downloads a URL,
extracts clean readable text via trafilatura, and truncates it to a
token budget. Registered as an Agno tool in the full toolkit.

- Validates URL scheme before attempting fetch
- Uses requests with 15s timeout and TimmyResearchBot/1.0 user-agent
- Degrades gracefully on missing packages, timeouts, HTTP errors, and empty pages
- Adds trafilatura as an optional dependency under the 'research' extra
- 11 unit tests covering all acceptance criteria
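
For readers without the diff at hand, the behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the helper truncate_to_tokens, the 4-characters-per-token estimate, the default max_tokens value, and the wording of the error strings are all assumptions. Only the function name, the 15s timeout, the user-agent, and the use of requests and trafilatura come from the commit message.

```python
from urllib.parse import urlparse

USER_AGENT = "TimmyResearchBot/1.0"  # from the commit message
TIMEOUT_SECONDS = 15                 # from the commit message
CHARS_PER_TOKEN = 4                  # assumption: crude token estimate


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    """Hypothetical helper: cut text to an approximate token budget."""
    limit = max(max_tokens, 0) * CHARS_PER_TOKEN
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "\n[truncated]"


def web_fetch(url: str, max_tokens: int = 4000) -> str:
    """Fetch a page and return its readable text, or an error string."""
    # Validate the URL scheme before attempting any network access.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: invalid URL (expected http:// or https://): {url}"

    # Import lazily so a missing optional dependency yields a message
    # instead of an ImportError at module load time.
    try:
        import requests
        import trafilatura
    except ImportError:
        return "Error: web_fetch needs the 'requests' and 'trafilatura' packages."

    try:
        response = requests.get(
            url,
            headers={"User-Agent": USER_AGENT},
            timeout=TIMEOUT_SECONDS,
        )
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return f"Error: request timed out after {TIMEOUT_SECONDS}s: {url}"
    except requests.exceptions.HTTPError as exc:
        return f"Error: HTTP {exc.response.status_code} for {url}"
    except requests.exceptions.RequestException as exc:
        return f"Error: could not fetch {url}: {exc}"

    # trafilatura.extract returns None when no main content is found.
    text = trafilatura.extract(response.text)
    if not text:
        return f"Error: no readable content extracted from {url}"

    return truncate_to_tokens(text, max_tokens)
```

Returning error strings rather than raising keeps a failed fetch from aborting an agent's tool loop. Whether the merged code does the same is not stated in the commit message.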

Fixes #973

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:03:08 -04:00