fix: harden Gemma 4 tool-call argument normalization (#797)

- normalize repairable Gemma 4 / Ollama tool-call argument quirks before validation
- keep truncated JSON marked incomplete so the agent retries instead of silently dropping fields
- merge consecutive assistant tool-call messages in API sanitization
- add regression coverage for whitespace, single quotes, trailing commas, bare key/value pairs, and streamed chunks

Closes #797
2026-04-22 10:44:30 -04:00
6 changed files with 314 additions and 1750 deletions
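
Illustrative sketch (not part of the patch): the repair-versus-retry rule from the commit message can be approximated in a standalone function. The name `normalize_args` is hypothetical and the candidate ordering is simplified; the real logic lives in `AIAgent._normalize_tool_call_arguments` in `run_agent.py`. The point it tries to show is that formatting quirks are repaired, while truncated input is returned unchanged with `False` so the caller can retry.

```python
import ast
import json
import re

TRAILING_COMMA = re.compile(r",\s*([}\]])")
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_\-]*)(\s*:)')


def normalize_args(arguments):
    """Return (normalized_text, is_complete). Hypothetical standalone sketch."""
    if isinstance(arguments, (dict, list)):
        return json.dumps(arguments, ensure_ascii=False, separators=(",", ":")), True
    if arguments is None:
        return "{}", True
    text = str(arguments).strip()
    if not text:
        return "{}", True

    # Candidate repairs, from least to most invasive.
    candidates = [text, TRAILING_COMMA.sub(r"\1", text)]
    if ":" in text and not text.startswith(("{", "[")):
        wrapped = "{" + text + "}"                # bare key/value pairs
        quoted = BARE_KEY.sub(r'\1"\2"\3', wrapped)  # quote unquoted keys
        candidates += [wrapped, quoted, TRAILING_COMMA.sub(r"\1", quoted)]

    for candidate in candidates:
        # json.loads handles strict JSON; ast.literal_eval handles
        # Python-style single-quoted dicts.
        for parser in (json.loads, ast.literal_eval):
            try:
                parsed = parser(candidate)
            except (ValueError, SyntaxError, TypeError):
                continue
            if isinstance(parsed, (dict, list)):
                return json.dumps(parsed, ensure_ascii=False, separators=(",", ":")), True

    # Nothing parsed: leave the text as-is and report it incomplete.
    # Truncated objects are deliberately NOT auto-closed.
    return text, False


print(normalize_args("{'path': 'README.md',}"))
print(normalize_args('path: "README.md", mode: "read"'))
print(normalize_args('{"path": "README.md"'))
```

The first two calls should come back as compact JSON marked complete; the third is a truncated object and should come back unchanged and marked incomplete.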
--- a/docs/issue-954-verification.md
+++ b/docs/issue-954-verification.md
@@ -1,100 +0,0 @@
-# Issue #954 Verification — maps skill guest_house / camp_site / bakery
-
-Status: PASS
-
-## Drift noted
-
-Issue #954 asked for validation on `upstream/main` (commit `c5a814b23`).
-Fresh `forge/main` did not contain `skills/productivity/maps/`, so the forge branch was behind upstream for this feature cluster.
-This branch ports the upstream maps skill files into the forge checkout and adds regression coverage.
-
-## Automated verification
-
-Command:
-
-```bash
-pytest -q tests/skills/test_maps_client.py
-```
-
-Result:
-
- 5 passed
-
-Coverage added:
-
- maps skill files exist in the repo
- `guest_house` category maps to `tourism=guest_house`
- `camp_site` category maps to `tourism=camp_site`
- `bakery` expands to both `shop=bakery` and `amenity=bakery`
- dual-key bakery results dedupe correctly
- skill documentation lists the new categories and supersedes `find-nearby`
-
-## Manual evidence
-
-### 1) guest_house lookup
-
-Command:
-
-```bash
-python3 skills/productivity/maps/scripts/maps_client.py nearby --near "Bath, United Kingdom" --category guest_house --limit 3
-```
-
-Observed results:
-
- Henrietta House — 390.3 m
- The Windsor — 437.2 m
- The Old Rectory Bed & Breakfast — 495.7 m
-
-All returned `tourism=guest_house` in the raw tags.
-
-### 2) camp_site lookup
-
-Command:
-
-```bash
-python3 skills/productivity/maps/scripts/maps_client.py nearby --near "Yosemite Valley, California" --category camp_site --limit 5
-```
-
-Observed result:
-
- Yellow Pine Administrative Campground — 90.3 m
-
-Returned `tourism=camp_site` in the raw tags.
-
-### 3) bakery lookup via `shop=bakery`
-
-Command:
-
-```bash
-python3 skills/productivity/maps/scripts/maps_client.py nearby --near "Lawrenceville, New Jersey" --category bakery --radius 5000 --limit 10
-```
-
-Observed results:
-
- The Gingered Peach — 713.8 m
- WildFlour Bakery — 741.9 m
-
-Both returned `shop=bakery` in the raw tags.
-
-### 4) bakery lookup via `amenity=bakery`
-
-Command:
-
-```bash
-python3 skills/productivity/maps/scripts/maps_client.py nearby --near "20735 Stevens Creek Boulevard, Cupertino, CA" --category bakery --radius 600 --limit 5
-```
-
-Observed result:
-
- Paris Baguette — 28.6 m
-
-Returned `amenity=bakery` in the raw tags (and also includes `shop=bakery`), proving the dual-key union query reaches amenity-tagged bakeries too.
-
-## Conclusion
-
-PASS.
-
- `guest_house` resolves correctly
- `camp_site` resolves correctly
- `bakery` resolves through both supported keys
- forge/main drift from upstream/main was real and is addressed on this branch
--- a/run_agent.py
+++ b/run_agent.py
@@ -20,6 +20,7 @@ Usage:
    response = agent.run_conversation("Tell me about the latest Python updates")
 """

+import ast
 import asyncio
 import base64
 import concurrent.futures
@@ -3328,6 +3329,119 @@ class AIAgent:

    _VALID_API_ROLES = frozenset({"system", "user", "assistant", "tool", "function", "developer"})

+    @staticmethod
+    def _normalize_tool_call_arguments(arguments: Any) -> tuple[str, bool]:
+        """Return ``(normalized_text, is_complete)`` for tool-call arguments.
+
+        Conservative by design: repairs harmless formatting quirks common in
+        Gemma 4 / Ollama output (whitespace, trailing commas, Python-style
+        single-quoted dicts, bare key/value pairs) but does NOT auto-close
+        truncated JSON objects. Truly incomplete fragments must remain marked
+        incomplete so the agent can retry instead of silently dropping fields.
+        """
+        if isinstance(arguments, (dict, list)):
+            return json.dumps(arguments, ensure_ascii=False, separators=(",", ":")), True
+        if arguments is None:
+            return "{}", True
+        if not isinstance(arguments, str):
+            arguments = str(arguments)
+
+        text = arguments.strip()
+        if not text:
+            return "{}", True
+
+        def _parse_candidate(candidate: str):
+            try:
+                return json.loads(candidate)
+            except (json.JSONDecodeError, TypeError, ValueError):
+                pass
+            try:
+                return ast.literal_eval(candidate)
+            except (SyntaxError, ValueError):
+                return None
+
+        candidates: list[str] = [text]
+
+        trimmed_trailing_commas = re.sub(r",\s*([}\]])", r"\1", text)
+        if trimmed_trailing_commas != text:
+            candidates.append(trimmed_trailing_commas)
+
+        if ":" in text and not text.startswith(("{", "[")):
+            wrapped = "{" + text + "}"
+            candidates.append(wrapped)
+            quoted_keys = re.sub(
+                r'([\{,]\s*)([A-Za-z_][A-Za-z0-9_\-]*)(\s*:)',
+                r'\1"\2"\3',
+                wrapped,
+            )
+            if quoted_keys != wrapped:
+                candidates.append(quoted_keys)
+                trimmed_quoted_keys = re.sub(r",\s*([}\]])", r"\1", quoted_keys)
+                if trimmed_quoted_keys != quoted_keys:
+                    candidates.append(trimmed_quoted_keys)
+
+        seen: set[str] = set()
+        for candidate in candidates:
+            if candidate in seen:
+                continue
+            seen.add(candidate)
+            parsed = _parse_candidate(candidate)
+            if isinstance(parsed, (dict, list)):
+                return json.dumps(parsed, ensure_ascii=False, separators=(",", ":")), True
+
+        return text, False
+
+    @staticmethod
+    def _merge_consecutive_assistant_tool_call_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+        """Merge adjacent assistant messages that each carry tool_calls.
+
+        Some providers emit parallel tool calls as multiple consecutive assistant
+        messages instead of a single assistant message with multiple tool calls.
+        Merge only adjacent assistant/tool-call messages; any non-assistant
+        boundary flushes the current batch.
+        """
+        merged: List[Dict[str, Any]] = []
+        pending: Optional[Dict[str, Any]] = None
+
+        def _flush_pending() -> None:
+            nonlocal pending
+            if pending is not None:
+                merged.append(pending)
+                pending = None
+
+        for msg in messages:
+            if not isinstance(msg, dict):
+                _flush_pending()
+                merged.append(msg)
+                continue
+
+            role = msg.get("role")
+            tool_calls = msg.get("tool_calls")
+            if role == "assistant" and isinstance(tool_calls, list) and tool_calls:
+                if pending is None:
+                    pending = copy.deepcopy(msg)
+                    continue
+
+                pending_tool_calls = pending.get("tool_calls")
+                if not isinstance(pending_tool_calls, list):
+                    pending_tool_calls = []
+                    pending["tool_calls"] = pending_tool_calls
+                pending_tool_calls.extend(copy.deepcopy(tool_calls))
+
+                pending_content = pending.get("content") or ""
+                current_content = msg.get("content") or ""
+                if pending_content and current_content:
+                    pending["content"] = pending_content + "\n" + current_content
+                elif current_content:
+                    pending["content"] = current_content
+                continue
+
+            _flush_pending()
+            merged.append(msg)
+
+        _flush_pending()
+        return merged
+
    @staticmethod
    def _sanitize_api_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Fix orphaned tool_call / tool_result pairs before every LLM call.
@@ -3347,7 +3461,7 @@ class AIAgent:
                )
                continue
            filtered.append(msg)
-        messages = filtered
+        messages = AIAgent._merge_consecutive_assistant_tool_call_messages(filtered)

        surviving_call_ids: set = set()
        for msg in messages:
@@ -5254,12 +5368,9 @@ class AIAgent:
                mock_tool_calls = []
                for idx in sorted(tool_calls_acc):
                    tc = tool_calls_acc[idx]
-                    arguments = tc["function"]["arguments"]
-                    if arguments and arguments.strip():
-                        try:
-                            json.loads(arguments)
-                        except json.JSONDecodeError:
-                            has_truncated_tool_args = True
+                    arguments, is_complete = self._normalize_tool_call_arguments(tc["function"]["arguments"])
+                    if not is_complete:
+                        has_truncated_tool_args = True
                    mock_tool_calls.append(SimpleNamespace(
                        id=tc["id"],
                        type=tc["type"],
@@ -6563,6 +6674,7 @@ class AIAgent:
                    response_item_id if isinstance(response_item_id, str) else None,
                )

+                normalized_args, _ = self._normalize_tool_call_arguments(tool_call.function.arguments)
                tc_dict = {
                    "id": call_id,
                    "call_id": call_id,
@@ -6570,7 +6682,7 @@ class AIAgent:
                    "type": tool_call.type,
                    "function": {
                        "name": tool_call.function.name,
-                        "arguments": tool_call.function.arguments
+                        "arguments": normalized_args,
                    },
                }
                # Preserve extra_content (e.g. Gemini thought_signature) so it
@@ -10031,21 +10143,15 @@ class AIAgent:
                    # Handle empty strings as empty objects (common model quirk)
                    invalid_json_args = []
                    for tc in assistant_message.tool_calls:
-                        args = tc.function.arguments
-                        if isinstance(args, (dict, list)):
-                            tc.function.arguments = json.dumps(args)
-                            continue
-                        if args is not None and not isinstance(args, str):
-                            tc.function.arguments = str(args)
-                            args = tc.function.arguments
-                        # Treat empty/whitespace strings as empty object
-                        if not args or not args.strip():
-                            tc.function.arguments = "{}"
-                            continue
-                        try:
-                            json.loads(args)
-                        except json.JSONDecodeError as e:
-                            invalid_json_args.append((tc.function.name, str(e)))
+                        normalized_args, is_complete = self._normalize_tool_call_arguments(tc.function.arguments)
+                        tc.function.arguments = normalized_args
+                        if not is_complete:
+                            try:
+                                json.loads(normalized_args)
+                            except json.JSONDecodeError as e:
+                                invalid_json_args.append((tc.function.name, str(e)))
+                            except Exception as e:
+                                invalid_json_args.append((tc.function.name, str(e)))
                    
                    if invalid_json_args:
                        # Check if the invalid JSON is due to truncation rather
--- a/skills/productivity/maps/SKILL.md
+++ b/skills/productivity/maps/SKILL.md
@@ -1,199 +0,0 @@
---
-name: maps
-description: >
-  Location intelligence — geocode a place, reverse-geocode coordinates,
-  find nearby places (46 POI categories), driving/walking/cycling
-  distance + time, turn-by-turn directions, timezone lookup, bounding
-  box + area for a named place, and POI search within a rectangle.
-  Uses OpenStreetMap + Overpass + OSRM. Free, no API key.
-version: 1.2.0
-author: Mibayy
-license: MIT
-metadata:
-  hermes:
-    tags: [maps, geocoding, places, routing, distance, directions, nearby, location, openstreetmap, nominatim, overpass, osrm]
-    category: productivity
-    requires_toolsets: [terminal]
-    supersedes: [find-nearby]
---
-
-# Maps Skill
-
-Location intelligence using free, open data sources. 8 commands, 44 POI
-categories, zero dependencies (Python stdlib only), no API key required.
-
-Data sources: OpenStreetMap/Nominatim, Overpass API, OSRM, TimeAPI.io.
-
-This skill supersedes the old `find-nearby` skill — all of find-nearby's
-functionality is covered by the `nearby` command below, with the same
-`--near "<place>"` shortcut and multi-category support.
-
-## When to Use
-
- User sends a Telegram location pin (latitude/longitude in the message) → `nearby`
- User wants coordinates for a place name → `search`
- User has coordinates and wants the address → `reverse`
- User asks for nearby restaurants, hospitals, pharmacies, hotels, etc. → `nearby`
- User wants driving/walking/cycling distance or travel time → `distance`
- User wants turn-by-turn directions between two places → `directions`
- User wants timezone information for a location → `timezone`
- User wants to search for POIs within a geographic area → `area` + `bbox`
-
-## Prerequisites
-
-Python 3.8+ (stdlib only — no pip installs needed).
-
-Script path: `~/.hermes/skills/maps/scripts/maps_client.py`
-
-## Commands
-
-```bash
-MAPS=~/.hermes/skills/maps/scripts/maps_client.py
-```
-
-### search — Geocode a place name
-
-```bash
-python3 $MAPS search "Eiffel Tower"
-python3 $MAPS search "1600 Pennsylvania Ave, Washington DC"
-```
-
-Returns: lat, lon, display name, type, bounding box, importance score.
-
-### reverse — Coordinates to address
-
-```bash
-python3 $MAPS reverse 48.8584 2.2945
-```
-
-Returns: full address breakdown (street, city, state, country, postcode).
-
-### nearby — Find places by category
-
-```bash
-# By coordinates (from a Telegram location pin, for example)
-python3 $MAPS nearby 48.8584 2.2945 restaurant --limit 10
-python3 $MAPS nearby 40.7128 -74.0060 hospital --radius 2000
-
-# By address / city / zip / landmark — --near auto-geocodes
-python3 $MAPS nearby --near "Times Square, New York" --category cafe
-python3 $MAPS nearby --near "90210" --category pharmacy
-
-# Multiple categories merged into one query
-python3 $MAPS nearby --near "downtown austin" --category restaurant --category bar --limit 10
-```
-
-46 categories: restaurant, cafe, bar, hospital, pharmacy, hotel, guest_house,
-camp_site, supermarket, atm, gas_station, parking, museum, park, school,
-university, bank, police, fire_station, library, airport, train_station,
-bus_stop, church, mosque, synagogue, dentist, doctor, cinema, theatre, gym,
-swimming_pool, post_office, convenience_store, bakery, bookshop, laundry,
-car_wash, car_rental, bicycle_rental, taxi, veterinary, zoo, playground,
-stadium, nightclub.
-
-Each result includes: `name`, `address`, `lat`/`lon`, `distance_m`,
-`maps_url` (clickable Google Maps link), `directions_url` (Google Maps
-directions from the search point), and promoted tags when available —
-`cuisine`, `hours` (opening_hours), `phone`, `website`.
-
-### distance — Travel distance and time
-
-```bash
-python3 $MAPS distance "Paris" --to "Lyon"
-python3 $MAPS distance "New York" --to "Boston" --mode driving
-python3 $MAPS distance "Big Ben" --to "Tower Bridge" --mode walking
-```
-
-Modes: driving (default), walking, cycling. Returns road distance, duration,
-and straight-line distance for comparison.
-
-### directions — Turn-by-turn navigation
-
-```bash
-python3 $MAPS directions "Eiffel Tower" --to "Louvre Museum" --mode walking
-python3 $MAPS directions "JFK Airport" --to "Times Square" --mode driving
-```
-
-Returns numbered steps with instruction, distance, duration, road name, and
-maneuver type (turn, depart, arrive, etc.).
-
-### timezone — Timezone for coordinates
-
-```bash
-python3 $MAPS timezone 48.8584 2.2945
-python3 $MAPS timezone 35.6762 139.6503
-```
-
-Returns timezone name, UTC offset, and current local time.
-
-### area — Bounding box and area for a place
-
-```bash
-python3 $MAPS area "Manhattan, New York"
-python3 $MAPS area "London"
-```
-
-Returns bounding box coordinates, width/height in km, and approximate area.
-Useful as input for the bbox command.
-
-### bbox — Search within a bounding box
-
-```bash
-python3 $MAPS bbox 40.75 -74.00 40.77 -73.98 restaurant --limit 20
-```
-
-Finds POIs within a geographic rectangle. Use `area` first to get the
-bounding box coordinates for a named place.
-
-## Working With Telegram Location Pins
-
-When a user sends a location pin, the message contains `latitude:` and
-`longitude:` fields. Extract those and pass them straight to `nearby`:
-
-```bash
-# User sent a pin at 36.17, -115.14 and asked "find cafes nearby"
-python3 $MAPS nearby 36.17 -115.14 cafe --radius 1500
-```
-
-Present results as a numbered list with names, distances, and the
-`maps_url` field so the user gets a tap-to-open link in chat. For "open
-now?" questions, check the `hours` field; if missing or unclear, verify
-with `web_search` since OSM hours are community-maintained and not always
-current.
-
-## Workflow Examples
-
-**"Find Italian restaurants near the Colosseum":**
-1. `nearby --near "Colosseum Rome" --category restaurant --radius 500`
-   — one command, auto-geocoded
-
-**"What's near this location pin they sent?":**
-1. Extract lat/lon from the Telegram message
-2. `nearby LAT LON cafe --radius 1500`
-
-**"How do I walk from hotel to conference center?":**
-1. `directions "Hotel Name" --to "Conference Center" --mode walking`
-
-**"What restaurants are in downtown Seattle?":**
-1. `area "Downtown Seattle"` → get bounding box
-2. `bbox S W N E restaurant --limit 30`
-
-## Pitfalls
-
- Nominatim ToS: max 1 req/s (handled automatically by the script)
- `nearby` requires lat/lon OR `--near "<address>"` — one of the two is needed
- OSRM routing coverage is best for Europe and North America
- Overpass API can be slow during peak hours; the script automatically
-  falls back between mirrors (overpass-api.de → overpass.kumi.systems)
- `distance` and `directions` use `--to` flag for the destination (not positional)
- If a zip code alone gives ambiguous results globally, include country/state
-
-## Verification
-
-```bash
-python3 ~/.hermes/skills/maps/scripts/maps_client.py search "Statue of Liberty"
-# Should return lat ~40.689, lon ~-74.044
-
-python3 ~/.hermes/skills/maps/scripts/maps_client.py nearby --near "Times Square" --category restaurant --limit 3
-# Should return a list of restaurants within ~500m of Times Square
-```
--- a/skills/productivity/maps/scripts/maps_client.py
+++ b/skills/productivity/maps/scripts/maps_client.py
--- a/tests/run_agent/test_run_agent.py
+++ b/tests/run_agent/test_run_agent.py
@@ -1037,6 +1037,138 @@ class TestBuildAssistantMessage:
        result = agent._build_assistant_message(msg, "tool_calls")
        assert "extra_content" not in result["tool_calls"][0]

+    def test_tool_call_arguments_normalized_from_gemma4_whitespace(self, agent):
+        tc = _mock_tool_call(
+            name="read_file",
+            arguments='  \n  {"path": "README.md"}  \n  ',
+            call_id="c4",
+        )
+        msg = _mock_assistant_msg(content="", tool_calls=[tc])
+        result = agent._build_assistant_message(msg, "tool_calls")
+        assert result["tool_calls"][0]["function"]["arguments"] == '{"path":"README.md"}'
+
+    def test_tool_call_arguments_normalized_from_single_quotes_and_trailing_comma(self, agent):
+        tc = _mock_tool_call(
+            name="read_file",
+            arguments="{'path': 'README.md',}",
+            call_id="c5",
+        )
+        msg = _mock_assistant_msg(content="", tool_calls=[tc])
+        result = agent._build_assistant_message(msg, "tool_calls")
+        assert result["tool_calls"][0]["function"]["arguments"] == '{"path":"README.md"}'
+
+
+class TestNormalizeToolCallArguments:
+    @pytest.mark.parametrize(
+        ("raw_args", "expected"),
+        [
+            ('{"q":"test"}', '{"q":"test"}'),
+            ('  \n  {"q": "test"}  \n  ', '{"q":"test"}'),
+            ('{"q": "test",}', '{"q":"test"}'),
+            ("{'q': 'test'}", '{"q":"test"}'),
+            ("{'path': 'README.md', 'mode': 'read'}", '{"path":"README.md","mode":"read"}'),
+            ('"path": "README.md"', '{"path":"README.md"}'),
+            ('path: "README.md"', '{"path":"README.md"}'),
+            ('path: "README.md", mode: "read"', '{"path":"README.md","mode":"read"}'),
+            ({"path": "README.md"}, '{"path":"README.md"}'),
+            (["README.md", "docs.md"], '["README.md","docs.md"]'),
+            ('\t\n  ', '{}'),
+            ('{"nested": {"path": "README.md"}}', '{"nested":{"path":"README.md"}}'),
+        ],
+    )
+    def test_complete_args_are_normalized(self, raw_args, expected):
+        normalized, is_complete = AIAgent._normalize_tool_call_arguments(raw_args)
+        assert is_complete is True
+        assert normalized == expected
+
+    @pytest.mark.parametrize(
+        "raw_args",
+        [
+            '{"path": "README.md"',
+            '{"a": 1, "b"',
+            '{"path": [1, 2}',
+            "{'path': 'README.md'",
+            'path: "README.md", mode:',
+            '{"command": "echo hello",',
+        ],
+    )
+    def test_incomplete_args_are_not_marked_complete(self, raw_args):
+        normalized, is_complete = AIAgent._normalize_tool_call_arguments(raw_args)
+        assert is_complete is False
+        assert isinstance(normalized, str)
+        assert normalized == raw_args.strip()
+
+
+class TestSanitizeApiMessages:
+    def test_merges_consecutive_assistant_tool_call_messages(self):
+        messages = [
+            {
+                "role": "assistant",
+                "content": "first",
+                "tool_calls": [{"id": "c1", "type": "function", "function": {"name": "read_file", "arguments": '{"path":"a.py"}'}}],
+            },
+            {
+                "role": "assistant",
+                "content": "second",
+                "tool_calls": [{"id": "c2", "type": "function", "function": {"name": "search_files", "arguments": '{"pattern":"TODO"}'}}],
+            },
+            {"role": "tool", "tool_call_id": "c1", "content": "a.py"},
+            {"role": "tool", "tool_call_id": "c2", "content": "matches"},
+        ]
+
+        sanitized = AIAgent._sanitize_api_messages(messages)
+
+        assert len(sanitized) == 3
+        assert sanitized[0]["role"] == "assistant"
+        assert [tc["id"] for tc in sanitized[0]["tool_calls"]] == ["c1", "c2"]
+        assert sanitized[0]["content"] == "first\nsecond"
+
+    def test_does_not_merge_assistant_tool_call_messages_across_non_assistant_boundary(self):
+        messages = [
+            {
+                "role": "assistant",
+                "content": "",
+                "tool_calls": [{"id": "c1", "type": "function", "function": {"name": "read_file", "arguments": '{"path":"a.py"}'}}],
+            },
+            {"role": "tool", "tool_call_id": "c1", "content": "a.py"},
+            {
+                "role": "assistant",
+                "content": "",
+                "tool_calls": [{"id": "c2", "type": "function", "function": {"name": "read_file", "arguments": '{"path":"b.py"}'}}],
+            },
+            {"role": "tool", "tool_call_id": "c2", "content": "b.py"},
+        ]
+
+        sanitized = AIAgent._sanitize_api_messages(messages)
+
+        assistant_msgs = [m for m in sanitized if m.get("role") == "assistant"]
+        assert len(assistant_msgs) == 2
+        assert assistant_msgs[0]["tool_calls"][0]["id"] == "c1"
+        assert assistant_msgs[1]["tool_calls"][0]["id"] == "c2"
+
+    def test_merge_preserves_tool_call_order(self):
+        messages = [
+            {
+                "role": "assistant",
+                "content": "",
+                "tool_calls": [{"id": "c1", "type": "function", "function": {"name": "read_file", "arguments": '{"path":"a.py"}'}}],
+            },
+            {
+                "role": "assistant",
+                "content": "",
+                "tool_calls": [{"id": "c2", "type": "function", "function": {"name": "read_file", "arguments": '{"path":"b.py"}'}}],
+            },
+            {
+                "role": "assistant",
+                "content": "",
+                "tool_calls": [{"id": "c3", "type": "function", "function": {"name": "read_file", "arguments": '{"path":"c.py"}'}}],
+            },
+        ]
+
+        sanitized = AIAgent._sanitize_api_messages(messages)
+
+        assert [tc["id"] for tc in sanitized[0]["tool_calls"]] == ["c1", "c2", "c3"]
+

 class TestFormatToolsForSystemMessage:
    def test_no_tools_returns_empty_array(self, agent):
@@ -3467,6 +3599,59 @@ class TestStreamingApiCall:
        assert tc[0].function.arguments == '{"path":"x.txt","content":"hel'
        assert resp.choices[0].finish_reason == "length"

+    @pytest.mark.parametrize(
+        ("raw_arguments", "expected"),
+        [
+            ('  \n  {"path": "x.txt"}  \n  ', '{"path":"x.txt"}'),
+            ("{'path': 'x.txt',}", '{"path":"x.txt"}'),
+            ('path: "x.txt", mode: "read"', '{"path":"x.txt","mode":"read"}'),
+        ],
+    )
+    def test_repairable_tool_call_args_do_not_upgrade_finish_reason_to_length(self, agent, raw_arguments, expected):
+        chunks = [
+            _make_chunk(tool_calls=[_make_tc_delta(0, "call_1", "read_file", raw_arguments)]),
+            _make_chunk(finish_reason="tool_calls"),
+        ]
+        agent.client.chat.completions.create.return_value = iter(chunks)
+
+        resp = agent._interruptible_streaming_api_call({"messages": []})
+
+        tc = resp.choices[0].message.tool_calls
+        assert len(tc) == 1
+        assert tc[0].function.name == "read_file"
+        assert tc[0].function.arguments == expected
+        assert resp.choices[0].finish_reason == "tool_calls"
+
+    def test_streamed_tool_call_args_single_quotes_across_chunks_normalized(self, agent):
+        chunks = [
+            _make_chunk(tool_calls=[_make_tc_delta(0, "call_1", "read_file", "{'path':")]),
+            _make_chunk(tool_calls=[_make_tc_delta(0, None, None, " 'x.txt',}")]),
+            _make_chunk(finish_reason="tool_calls"),
+        ]
+        agent.client.chat.completions.create.return_value = iter(chunks)
+
+        resp = agent._interruptible_streaming_api_call({"messages": []})
+
+        tc = resp.choices[0].message.tool_calls
+        assert len(tc) == 1
+        assert tc[0].function.arguments == '{"path":"x.txt"}'
+        assert resp.choices[0].finish_reason == "tool_calls"
+
+    def test_streamed_split_json_chunks_still_reassemble(self, agent):
+        chunks = [
+            _make_chunk(tool_calls=[_make_tc_delta(0, "call_1", "read_file", '{"path":')]),
+            _make_chunk(tool_calls=[_make_tc_delta(0, None, None, ' "x.txt"}')]),
+            _make_chunk(finish_reason="tool_calls"),
+        ]
+        agent.client.chat.completions.create.return_value = iter(chunks)
+
+        resp = agent._interruptible_streaming_api_call({"messages": []})
+
+        tc = resp.choices[0].message.tool_calls
+        assert len(tc) == 1
+        assert tc[0].function.arguments == '{"path":"x.txt"}'
+        assert resp.choices[0].finish_reason == "tool_calls"
+
    def test_ollama_reused_index_separate_tool_calls(self, agent):
        """Ollama sends every tool call at index 0 with different ids.

--- a/tests/skills/test_maps_client.py
+++ b/tests/skills/test_maps_client.py
@@ -1,135 +0,0 @@
-"""Regression tests for the bundled maps skill."""
-
-from __future__ import annotations
-
-import importlib.util
-from pathlib import Path
-from types import SimpleNamespace
-
-SCRIPT_PATH = (
-    Path(__file__).resolve().parents[2]
-    / "skills/productivity/maps/scripts/maps_client.py"
-)
-SKILL_PATH = (
-    Path(__file__).resolve().parents[2]
-    / "skills/productivity/maps/SKILL.md"
-)
-
-
-def load_module():
-    assert SCRIPT_PATH.exists(), f"missing maps client script: {SCRIPT_PATH}"
-    spec = importlib.util.spec_from_file_location("maps_client_test", SCRIPT_PATH)
-    module = importlib.util.module_from_spec(spec)
-    assert spec.loader is not None
-    spec.loader.exec_module(module)
-    return module
-
-
-def test_maps_skill_files_exist():
-    assert SCRIPT_PATH.exists()
-    assert SKILL_PATH.exists()
-
-
-def test_category_tags_cover_guest_house_camp_site_and_dual_key_bakery():
-    module = load_module()
-
-    assert module.CATEGORY_TAGS["guest_house"] == ("tourism", "guest_house")
-    assert module.CATEGORY_TAGS["camp_site"] == ("tourism", "camp_site")
-    assert module.CATEGORY_TAGS["bakery"] == [
-        ("shop", "bakery"),
-        ("amenity", "bakery"),
-    ]
-    assert module._tags_for("bakery") == [
-        ("shop", "bakery"),
-        ("amenity", "bakery"),
-    ]
-
-
-def test_build_overpass_queries_include_all_supported_tags():
-    module = load_module()
-
-    bakery_query = module.build_overpass_nearby(
-        None,
-        None,
-        40.0,
-        -74.0,
-        500,
-        10,
-        tag_pairs=module._tags_for("bakery"),
-    )
-    assert 'node["shop"="bakery"]' in bakery_query
-    assert 'way["shop"="bakery"]' in bakery_query
-    assert 'node["amenity"="bakery"]' in bakery_query
-    assert 'way["amenity"="bakery"]' in bakery_query
-
-    guest_house_query = module.build_overpass_nearby(
-        None,
-        None,
-        40.0,
-        -74.0,
-        500,
-        10,
-        tag_pairs=module._tags_for("guest_house"),
-    )
-    assert 'node["tourism"="guest_house"]' in guest_house_query
-    assert 'way["tourism"="guest_house"]' in guest_house_query
-
-    camp_site_bbox = module.build_overpass_bbox(
-        None,
-        None,
-        39.0,
-        -75.0,
-        41.0,
-        -73.0,
-        10,
-        tag_pairs=module._tags_for("camp_site"),
-    )
-    assert 'node["tourism"="camp_site"]' in camp_site_bbox
-    assert 'way["tourism"="camp_site"]' in camp_site_bbox
-
-
-def test_cmd_nearby_dedupes_dual_tag_bakery_results(monkeypatch, capsys):
-    module = load_module()
-
-    duplicate_bakery = {
-        "elements": [
-            {
-                "type": "node",
-                "id": 101,
-                "lat": 40.0,
-                "lon": -74.0,
-                "tags": {"name": "Wild Flour", "shop": "bakery"},
-            },
-            {
-                "type": "node",
-                "id": 101,
-                "lat": 40.0,
-                "lon": -74.0,
-                "tags": {"name": "Wild Flour", "amenity": "bakery"},
-            },
-        ]
-    }
-
-    monkeypatch.setattr(module, "overpass_query", lambda query: duplicate_bakery)
-    args = SimpleNamespace(
-        lat="40.0",
-        lon="-74.0",
-        near=None,
-        category="bakery",
-        category_list=[],
-        radius=500,
-        limit=10,
-    )
-
-    module.cmd_nearby(args)
-    out = capsys.readouterr().out
-    assert '"count": 1' in out
-    assert '"Wild Flour"' in out
-
-
-def test_skill_doc_lists_new_categories_and_supersession():
-    text = SKILL_PATH.read_text(encoding="utf-8")
-    assert "guest_house" in text
-    assert "camp_site" in text
-    assert "bakery" in text
-    assert "supersedes: [find-nearby]" in text