feat: browser screenshot sharing via MEDIA: on all messaging platforms

browser_vision now saves screenshots persistently to ~/.hermes/browser_screenshots/ and returns the screenshot_path in its JSON response. The model can include MEDIA:<path> in its response to share screenshots as native photos.

Changes:
- browser_tool.py: Save screenshots persistently, return screenshot_path, auto-cleanup files older than 24 hours, mkdir moved inside try/except
- telegram.py: Add send_image_file(), which sends local images via bot.send_photo()
- discord.py: Add send_image_file(), which sends local images via discord.File
- slack.py: Add send_image_file(), which sends local images via files_upload_v2()
  (WhatsApp already had send_image_file, so no changes were needed)
- prompt_builder.py: Updated Telegram hint to list image extensions, added Discord and Slack MEDIA: platform hints
- browser.md: Document screenshot sharing and 24h cleanup
- send_file_integration_map.md: Updated to reflect send_image_file is now implemented on Telegram/Discord/Slack
- test_send_image_file.py: 19 tests covering MEDIA: .png extraction, send_image_file on all platforms, and screenshot cleanup

Partially addresses #466 (Phase 0: platform adapter gaps for send_image_file).
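
For readers unfamiliar with the MEDIA:<path> convention mentioned above, here is a minimal sketch of how a gateway might pull such tags out of a model response and decide whether a path should be routed to send_image_file(). It is not the project's BasePlatformAdapter.extract_media(); the helper names (split_media, is_image), the regex, and the extension set are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumption: a tag is "MEDIA:" followed by a run of non-whitespace characters.
MEDIA_TAG = re.compile(r"MEDIA:(\S+)")

# Assumption: extensions treated as images, based on the platform hints in this commit.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def split_media(text: str) -> Tuple[List[str], str]:
    """Hypothetical helper: return (paths, text with MEDIA: tags removed)."""
    paths = MEDIA_TAG.findall(text)
    cleaned = MEDIA_TAG.sub("", text).strip()
    return paths, cleaned


def is_image(path: str) -> bool:
    """Hypothetical helper: True if the path has an image extension."""
    return Path(path).suffix.lower() in IMAGE_EXTS


paths, cleaned = split_media("Here is the screenshot:\nMEDIA:/tmp/shot.png")
for p in paths:
    # An adapter would call send_image_file() for images and another sender otherwise.
    print(p, "image" if is_image(p) else "other")
```

Matching on the file extension keeps the routing decision cheap and independent of the platform, though a real implementation may need to handle paths containing spaces or quoted paths, which this sketch does not.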
2026-03-07 22:57:05 -08:00
parent a680367568
commit b8c3bc7841
8 changed files with 489 additions and 21 deletions
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@@ -103,12 +103,24 @@ PLATFORM_HINTS = {
        "You are on a text messaging communication platform, Telegram. "
        "Please do not use markdown as it does not render. "
        "You can send media files natively: to deliver a file to the user, "
-        "include MEDIA:/absolute/path/to/file in your response. Audio "
-        "(.ogg) sends as voice bubbles. You can also include image URLs "
-        "in markdown format ![alt](url) and they will be sent as native photos."
+        "include MEDIA:/absolute/path/to/file in your response. Images "
+        "(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
+        "bubbles, and videos (.mp4) play inline. You can also include image "
+        "URLs in markdown format ![alt](url) and they will be sent as native photos."
    ),
    "discord": (
-        "You are in a Discord server or group chat communicating with your user."
+        "You are in a Discord server or group chat communicating with your user. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.png, .jpg, .webp) are sent as photo "
+        "attachments, audio as file attachments. You can also include image URLs "
+        "in markdown format ![alt](url) and they will be sent as attachments."
+    ),
+    "slack": (
+        "You are in a Slack workspace communicating with your user. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.png, .jpg, .webp) are uploaded as photo "
+        "attachments, audio as file attachments. You can also include image URLs "
+        "in markdown format ![alt](url) and they will be uploaded as attachments."
    ),
    "cli": (
        "You are a CLI AI Agent. Try not to use markdown but simple text "
--- a/docs/send_file_integration_map.md
+++ b/docs/send_file_integration_map.md
@@ -115,8 +115,9 @@
 - `edit_message(chat_id, message_id, content)` — edit sent messages

 ### What's missing:
- **Telegram:** No override for `send_document` or `send_image_file` — falls back to text!
- **Discord:** No override for `send_document` — falls back to text!
+- **Telegram:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **Discord:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **Slack:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
 - **WhatsApp:** Has `send_document` and `send_image_file` via bridge — COMPLETE.
 - The base class defaults just send "📎 File: /path" as text — useless for actual file delivery.

@@ -126,13 +127,13 @@
 - `send()` — MarkdownV2 text with fallback to plain
 - `send_voice()` — `.ogg`/`.opus` as `send_voice()`, others as `send_audio()`
 - `send_image()` — URL-based via `send_photo()`
+- `send_image_file()` — local file via `send_photo(photo=open(path, 'rb'))` ✅
 - `send_animation()` — GIF via `send_animation()`
 - `send_typing()` — "typing" chat action
 - `edit_message()` — edit text messages

 ### MISSING:
 - **`send_document()` NOT overridden** — Need to add `self._bot.send_document(chat_id, document=open(file_path, 'rb'), ...)`
- **`send_image_file()` NOT overridden** — Need to add `self._bot.send_photo(chat_id, photo=open(path, 'rb'), ...)`
 - **`send_video()` NOT overridden** — Need to add `self._bot.send_video(...)`

 ## 8. gateway/platforms/discord.py — Send Method Analysis
@@ -141,12 +142,12 @@
 - `send()` — text messages with chunking
 - `send_voice()` — discord.File attachment
 - `send_image()` — downloads URL, creates discord.File attachment
+- `send_image_file()` — local file via discord.File attachment ✅
 - `send_typing()` — channel.typing()
 - `edit_message()` — edit text messages

 ### MISSING:
 - **`send_document()` NOT overridden** — Need to add discord.File attachment
- **`send_image_file()` NOT overridden** — Need to add discord.File from local path
 - **`send_video()` NOT overridden** — Need to add discord.File attachment

 ## 9. gateway/run.py — User File Attachment Handling
--- a/gateway/platforms/discord.py
+++ b/gateway/platforms/discord.py
@@ -267,6 +267,43 @@ class DiscordAdapter(BasePlatformAdapter):
            print(f"[{self.name}] Failed to send audio: {e}")
            return await super().send_voice(chat_id, audio_path, caption, reply_to)
    
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import io
+            
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+            
+            filename = os.path.basename(image_path)
+            
+            with open(image_path, "rb") as f:
+                file = discord.File(io.BytesIO(f.read()), filename=filename)
+                msg = await channel.send(
+                    content=caption if caption else None,
+                    file=file,
+                )
+                return SendResult(success=True, message_id=str(msg.id))
+        
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
    async def send_image(
        self,
        chat_id: str,
--- a/gateway/platforms/slack.py
+++ b/gateway/platforms/slack.py
@@ -179,6 +179,35 @@ class SlackAdapter(BasePlatformAdapter):
        """Slack doesn't have a direct typing indicator API for bots."""
        pass

+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file to Slack by uploading it."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import os
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                file=image_path,
+                filename=os.path.basename(image_path),
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
    async def send_image(
        self,
        chat_id: str,
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -306,6 +306,34 @@ class TelegramAdapter(BasePlatformAdapter):
            print(f"[{self.name}] Failed to send voice/audio: {e}")
            return await super().send_voice(chat_id, audio_path, caption, reply_to)
    
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively as a Telegram photo."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import os
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+            
+            with open(image_path, "rb") as image_file:
+                msg = await self._bot.send_photo(
+                    chat_id=int(chat_id),
+                    photo=image_file,
+                    caption=caption[:1024] if caption else None,
+                    reply_to_message_id=int(reply_to) if reply_to else None,
+                )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
    async def send_image(
        self,
        chat_id: str,
--- a/tests/gateway/test_send_image_file.py
+++ b/tests/gateway/test_send_image_file.py
@@ -0,0 +1,335 @@
+"""
+Tests for send_image_file() on Telegram, Discord, and Slack platforms,
+and MEDIA: .png extraction/routing in the base platform adapter.
+
+Covers: local image file sending, file-not-found handling, fallback on error,
+        MEDIA: tag extraction for image extensions, and routing to send_image_file.
+"""
+
+import asyncio
+import os
+import sys
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from gateway.config import PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, SendResult
+
+
+# ---------------------------------------------------------------------------
+# MEDIA: extraction tests for image files
+# ---------------------------------------------------------------------------
+
+
+class TestExtractMediaImages:
+    """Test that MEDIA: tags with image extensions are correctly extracted."""
+
+    def test_png_image_extracted(self):
+        content = "Here is the screenshot:\nMEDIA:/home/user/.hermes/browser_screenshots/shot.png"
+        media, cleaned = BasePlatformAdapter.extract_media(content)
+        assert len(media) == 1
+        assert media[0][0] == "/home/user/.hermes/browser_screenshots/shot.png"
+        assert "MEDIA:" not in cleaned
+        assert "Here is the screenshot" in cleaned
+
+    def test_jpg_image_extracted(self):
+        content = "MEDIA:/tmp/photo.jpg"
+        media, cleaned = BasePlatformAdapter.extract_media(content)
+        assert len(media) == 1
+        assert media[0][0] == "/tmp/photo.jpg"
+
+    def test_webp_image_extracted(self):
+        content = "MEDIA:/tmp/image.webp"
+        media, _ = BasePlatformAdapter.extract_media(content)
+        assert len(media) == 1
+
+    def test_mixed_audio_and_image(self):
+        content = "MEDIA:/audio.ogg\nMEDIA:/screenshot.png"
+        media, _ = BasePlatformAdapter.extract_media(content)
+        assert len(media) == 2
+        paths = [m[0] for m in media]
+        assert "/audio.ogg" in paths
+        assert "/screenshot.png" in paths
+
+
+# ---------------------------------------------------------------------------
+# Telegram send_image_file tests
+# ---------------------------------------------------------------------------
+
+
+def _ensure_telegram_mock():
+    """Install mock telegram modules so TelegramAdapter can be imported."""
+    if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
+        return
+
+    telegram_mod = MagicMock()
+    telegram_mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
+    telegram_mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
+    telegram_mod.constants.ChatType.GROUP = "group"
+    telegram_mod.constants.ChatType.SUPERGROUP = "supergroup"
+    telegram_mod.constants.ChatType.CHANNEL = "channel"
+    telegram_mod.constants.ChatType.PRIVATE = "private"
+
+    for name in ("telegram", "telegram.ext", "telegram.constants"):
+        sys.modules.setdefault(name, telegram_mod)
+
+
+_ensure_telegram_mock()
+
+from gateway.platforms.telegram import TelegramAdapter  # noqa: E402
+
+
+class TestTelegramSendImageFile:
+    @pytest.fixture
+    def adapter(self):
+        config = PlatformConfig(enabled=True, token="fake-token")
+        a = TelegramAdapter(config)
+        a._bot = MagicMock()
+        return a
+
+    def test_sends_local_image_as_photo(self, adapter, tmp_path):
+        """send_image_file should call bot.send_photo with the opened file."""
+        img = tmp_path / "screenshot.png"
+        img.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 100)  # Minimal PNG-like
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 42
+        adapter._bot.send_photo = AsyncMock(return_value=mock_msg)
+
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="12345", image_path=str(img))
+        )
+        assert result.success
+        assert result.message_id == "42"
+        adapter._bot.send_photo.assert_awaited_once()
+
+        # Verify chat_id was converted to int before being passed to send_photo
+        call_kwargs = adapter._bot.send_photo.call_args
+        assert call_kwargs.kwargs["chat_id"] == 12345
+
+    def test_returns_error_when_file_missing(self, adapter):
+        """send_image_file should return error for nonexistent file."""
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="12345", image_path="/nonexistent/image.png")
+        )
+        assert not result.success
+        assert "not found" in result.error
+
+    def test_returns_error_when_not_connected(self, adapter):
+        """send_image_file should return error when bot is None."""
+        adapter._bot = None
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="12345", image_path="/tmp/img.png")
+        )
+        assert not result.success
+        assert "Not connected" in result.error
+
+    def test_caption_truncated_to_1024(self, adapter, tmp_path):
+        """Telegram captions have a 1024 char limit."""
+        img = tmp_path / "shot.png"
+        img.write_bytes(b"\x89PNG" + b"\x00" * 50)
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 1
+        adapter._bot.send_photo = AsyncMock(return_value=mock_msg)
+
+        long_caption = "A" * 2000
+        asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="12345", image_path=str(img), caption=long_caption)
+        )
+
+        call_kwargs = adapter._bot.send_photo.call_args.kwargs
+        assert len(call_kwargs["caption"]) == 1024
+
+
+# ---------------------------------------------------------------------------
+# Discord send_image_file tests
+# ---------------------------------------------------------------------------
+
+
+def _ensure_discord_mock():
+    """Install mock discord module so DiscordAdapter can be imported."""
+    if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
+        return
+
+    discord_mod = MagicMock()
+    discord_mod.Intents.default.return_value = MagicMock()
+    discord_mod.Client = MagicMock
+    discord_mod.File = MagicMock
+
+    for name in ("discord", "discord.ext", "discord.ext.commands"):
+        sys.modules.setdefault(name, discord_mod)
+
+
+_ensure_discord_mock()
+
+import discord as discord_mod_ref  # noqa: E402
+from gateway.platforms.discord import DiscordAdapter  # noqa: E402
+
+
+class TestDiscordSendImageFile:
+    @pytest.fixture
+    def adapter(self):
+        config = PlatformConfig(enabled=True, token="fake-token")
+        a = DiscordAdapter(config)
+        a._client = MagicMock()
+        return a
+
+    def test_sends_local_image_as_attachment(self, adapter, tmp_path):
+        """send_image_file should create discord.File and send to channel."""
+        img = tmp_path / "screenshot.png"
+        img.write_bytes(b"\x89PNG" + b"\x00" * 50)
+
+        mock_channel = MagicMock()
+        mock_msg = MagicMock()
+        mock_msg.id = 99
+        mock_channel.send = AsyncMock(return_value=mock_msg)
+        adapter._client.get_channel = MagicMock(return_value=mock_channel)
+
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="67890", image_path=str(img))
+        )
+        assert result.success
+        assert result.message_id == "99"
+        mock_channel.send.assert_awaited_once()
+
+    def test_returns_error_when_file_missing(self, adapter):
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="67890", image_path="/nonexistent.png")
+        )
+        assert not result.success
+        assert "not found" in result.error
+
+    def test_returns_error_when_not_connected(self, adapter):
+        adapter._client = None
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="67890", image_path="/tmp/img.png")
+        )
+        assert not result.success
+        assert "Not connected" in result.error
+
+    def test_handles_missing_channel(self, adapter):
+        adapter._client.get_channel = MagicMock(return_value=None)
+        adapter._client.fetch_channel = AsyncMock(return_value=None)
+
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="99999", image_path="/tmp/img.png")
+        )
+        assert not result.success
+        assert "not found" in result.error
+
+
+# ---------------------------------------------------------------------------
+# Slack send_image_file tests
+# ---------------------------------------------------------------------------
+
+
+def _ensure_slack_mock():
+    """Install mock slack_bolt module so SlackAdapter can be imported."""
+    if "slack_bolt" in sys.modules and hasattr(sys.modules["slack_bolt"], "__file__"):
+        return
+
+    slack_mod = MagicMock()
+    for name in ("slack_bolt", "slack_bolt.async_app", "slack_sdk", "slack_sdk.web.async_client"):
+        sys.modules.setdefault(name, slack_mod)
+
+
+_ensure_slack_mock()
+
+from gateway.platforms.slack import SlackAdapter  # noqa: E402
+
+
+class TestSlackSendImageFile:
+    @pytest.fixture
+    def adapter(self):
+        config = PlatformConfig(enabled=True, token="xoxb-fake")
+        a = SlackAdapter(config)
+        a._app = MagicMock()
+        return a
+
+    def test_sends_local_image_via_upload(self, adapter, tmp_path):
+        """send_image_file should call files_upload_v2 with the local path."""
+        img = tmp_path / "screenshot.png"
+        img.write_bytes(b"\x89PNG" + b"\x00" * 50)
+
+        mock_result = MagicMock()
+        adapter._app.client.files_upload_v2 = AsyncMock(return_value=mock_result)
+
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="C12345", image_path=str(img))
+        )
+        assert result.success
+        adapter._app.client.files_upload_v2.assert_awaited_once()
+
+        call_kwargs = adapter._app.client.files_upload_v2.call_args.kwargs
+        assert call_kwargs["file"] == str(img)
+        assert call_kwargs["filename"] == "screenshot.png"
+        assert call_kwargs["channel"] == "C12345"
+
+    def test_returns_error_when_file_missing(self, adapter):
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="C12345", image_path="/nonexistent.png")
+        )
+        assert not result.success
+        assert "not found" in result.error
+
+    def test_returns_error_when_not_connected(self, adapter):
+        adapter._app = None
+        result = asyncio.get_event_loop().run_until_complete(
+            adapter.send_image_file(chat_id="C12345", image_path="/tmp/img.png")
+        )
+        assert not result.success
+        assert "Not connected" in result.error
+
+
+# ---------------------------------------------------------------------------
+# browser_vision screenshot cleanup tests
+# ---------------------------------------------------------------------------
+
+
+class TestScreenshotCleanup:
+    def test_cleanup_removes_old_screenshots(self, tmp_path):
+        """_cleanup_old_screenshots should remove files older than max_age_hours."""
+        import time
+        from tools.browser_tool import _cleanup_old_screenshots
+
+        # Create a "fresh" file
+        fresh = tmp_path / "browser_screenshot_fresh.png"
+        fresh.write_bytes(b"new")
+
+        # Create an "old" file and backdate its mtime
+        old = tmp_path / "browser_screenshot_old.png"
+        old.write_bytes(b"old")
+        old_time = time.time() - (25 * 3600)  # 25 hours ago
+        os.utime(str(old), (old_time, old_time))
+
+        _cleanup_old_screenshots(tmp_path, max_age_hours=24)
+
+        assert fresh.exists(), "Fresh screenshot should not be removed"
+        assert not old.exists(), "Old screenshot should be removed"
+
+    def test_cleanup_ignores_non_screenshot_files(self, tmp_path):
+        """Only files matching browser_screenshot_*.png should be cleaned."""
+        import time
+        from tools.browser_tool import _cleanup_old_screenshots
+
+        other_file = tmp_path / "important_data.txt"
+        other_file.write_bytes(b"keep me")
+        old_time = time.time() - (48 * 3600)
+        os.utime(str(other_file), (old_time, old_time))
+
+        _cleanup_old_screenshots(tmp_path, max_age_hours=24)
+
+        assert other_file.exists(), "Non-screenshot files should not be touched"
+
+    def test_cleanup_handles_empty_dir(self, tmp_path):
+        """Cleanup should not fail on empty directory."""
+        from tools.browser_tool import _cleanup_old_screenshots
+        _cleanup_old_screenshots(tmp_path, max_age_hours=24)  # Should not raise
+
+    def test_cleanup_handles_nonexistent_dir(self):
+        """Cleanup should not fail if directory doesn't exist."""
+        from pathlib import Path
+        from tools.browser_tool import _cleanup_old_screenshots
+        _cleanup_old_screenshots(Path("/nonexistent/dir"), max_age_hours=24)  # Should not raise
--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@@ -424,7 +424,7 @@ BROWSER_TOOL_SCHEMAS = [
    },
    {
        "name": "browser_vision",
-        "description": "Take a screenshot of the current page and analyze it with vision AI. Use this when you need to visually understand what's on the page - especially useful for CAPTCHAs, visual verification challenges, complex layouts, or when the text snapshot doesn't capture important visual information. Requires browser_navigate to be called first.",
+        "description": "Take a screenshot of the current page and analyze it with vision AI. Use this when you need to visually understand what's on the page - especially useful for CAPTCHAs, visual verification challenges, complex layouts, or when the text snapshot doesn't capture important visual information. Returns both the AI analysis and a screenshot_path that you can share with the user by including MEDIA:<screenshot_path> in your response. Requires browser_navigate to be called first.",
        "parameters": {
            "type": "object",
            "properties": {
@@ -1289,15 +1289,17 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
    text-based snapshot may not capture (CAPTCHAs, verification challenges,
    images, complex layouts, etc.).
    
+    The screenshot is saved persistently and its file path is returned alongside
+    the analysis, so it can be shared with users via MEDIA:<path> in the response.
+    
    Args:
        question: What you want to know about the page visually
        task_id: Task identifier for session isolation
        
    Returns:
-        JSON string with vision analysis results
+        JSON string with vision analysis results and screenshot_path
    """
    import base64
-    import tempfile
    import uuid as uuid_mod
    from pathlib import Path
    
@@ -1311,11 +1313,17 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
                     "Set OPENROUTER_API_KEY or configure Nous Portal to enable browser vision."
        }, ensure_ascii=False)
    
-    # Create a temporary file for the screenshot
-    temp_dir = Path(tempfile.gettempdir())
-    screenshot_path = temp_dir / f"browser_screenshot_{uuid_mod.uuid4().hex}.png"
+    # Save screenshot to persistent location so it can be shared with users
+    hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+    screenshots_dir = hermes_home / "browser_screenshots"
+    screenshot_path = screenshots_dir / f"browser_screenshot_{uuid_mod.uuid4().hex}.png"
    
    try:
+        screenshots_dir.mkdir(parents=True, exist_ok=True)
+        
+        # Prune old screenshots (older than 24 hours) to prevent unbounded disk growth
+        _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)
+        
        # Take screenshot using agent-browser
        result = _run_browser_command(
            effective_task_id, 
@@ -1372,21 +1380,35 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
        return json.dumps({
            "success": True,
            "analysis": analysis,
+            "screenshot_path": str(screenshot_path),
        }, ensure_ascii=False)
    
    except Exception as e:
-        return json.dumps({
-            "success": False,
-            "error": f"Error during vision analysis: {str(e)}"
-        }, ensure_ascii=False)
-    
-    finally:
-        # Clean up screenshot file
+        # Clean up screenshot on failure
        if screenshot_path.exists():
            try:
                screenshot_path.unlink()
            except Exception:
                pass
+        return json.dumps({
+            "success": False,
+            "error": f"Error during vision analysis: {str(e)}"
+        }, ensure_ascii=False)
+
+
+def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
+    """Remove browser screenshots older than max_age_hours to prevent disk bloat."""
+    import time
+    try:
+        cutoff = time.time() - (max_age_hours * 3600)
+        for f in screenshots_dir.glob("browser_screenshot_*.png"):
+            try:
+                if f.stat().st_mtime < cutoff:
+                    f.unlink()
+            except Exception:
+                pass
+    except Exception:
+        pass  # Non-critical — don't fail the screenshot operation


 # ============================================================================
--- a/website/docs/user-guide/features/browser.md
+++ b/website/docs/user-guide/features/browser.md
@@ -134,10 +134,14 @@ List all images on the current page with their URLs and alt text. Useful for fin

 Take a screenshot and analyze it with vision AI. Use this when text snapshots don't capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.

+The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the `MEDIA:` mechanism.
+
 ```
 What does the chart on this page show?
 ```

+Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.
+
 ### `browser_close`

 Close the browser session and release resources. Call this when done to free up Browserbase session quota.