fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs

Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues: 1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only warn/block on truly consecutive identical calls. Any other tool call in between (write, patch, terminal, etc.) resets the counter via notify_other_tool_call(), called from handle_function_call() in model_tools.py. This prevents false blocks in read→edit→verify flows. 2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on 4th+ consecutive (was 3rd+). Gives the model more room before intervening. 3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a separate read_history set that only tracks file reads. 4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from web_extract return docs in code_execution_tool.py — the field IS returned by web_tools.py. 5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover consecutive-only behavior, notify_other_tool_call, interleaved read/search, and summary-unaffected-by-searches.
2026-03-10 16:25:41 -07:00
parent b53d5dad67
commit a458b535c9
4 changed files with 223 additions and 55 deletions
--- a/tools/code_execution_tool.py
+++ b/tools/code_execution_tool.py
@@ -78,7 +78,7 @@ _TOOL_STUBS = {
    "web_extract": (
        "web_extract",
        "urls: list",
-        '"""Extract content from URLs. Returns dict with results list of {url, content, error}."""',
+        '"""Extract content from URLs. Returns dict with results list of {url, title, content, error}."""',
        '{"urls": urls}',
    ),
    "read_file": (
@@ -616,7 +616,7 @@ _TOOL_DOC_LINES = [
     "    Returns {\"data\": {\"web\": [{\"url\", \"title\", \"description\"}, ...]}}"),
    ("web_extract",
     "  web_extract(urls: list[str]) -> dict\n"
-     "    Returns {\"results\": [{\"url\", \"content\", \"error\"}, ...]} where content is markdown"),
+     "    Returns {\"results\": [{\"url\", \"title\", \"content\", \"error\"}, ...]} where content is markdown"),
    ("read_file",
     "  read_file(path: str, offset: int = 1, limit: int = 500) -> dict\n"
     "    Lines are 1-indexed. Returns {\"content\": \"...\", \"total_lines\": N}"),