# send_file Integration Map — Hermes Agent Codebase Deep Dive

## 1. environments/tool_context.py — Base64 File Transfer Implementation

### upload_file() (lines 153-205)
- Reads local file as raw bytes, base64-encodes to ASCII string
- Creates parent dirs in sandbox via `self.terminal(f"mkdir -p {parent}")`
- **Chunk size:** 60,000 chars (~60KB per shell command)
- **Small files (<=60KB b64):** Single `printf '%s' '{b64}' | base64 -d > {remote_path}`
- **Large files:** Writes chunks to `/tmp/_hermes_upload.b64` via `printf >> append`, then `base64 -d` to target
- **Error handling:** Checks local file exists; returns `{exit_code, output}`
- **Size limits:** No explicit limit. Files above ~45KB raw (60,000 base64 chars) are split into chunks, which keeps each command well under the ~2MB shell argument limit
- **No theoretical max**, but very large files would be slow (one terminal round trip per chunk)
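
The chunking scheme described above can be sketched as follows. This is an illustration under assumptions, not the code in `tool_context.py`: the name `build_upload_commands` is hypothetical, the `rm -f` cleanup steps are an addition, and the function only builds the shell commands instead of running them through `self.terminal()`.

```python
import base64
import shlex

CHUNK_SIZE = 60_000  # base64 chars per shell command, as noted above
TMP_B64 = "/tmp/_hermes_upload.b64"


def build_upload_commands(data: bytes, remote_path: str,
                          chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return the shell commands that would recreate `data` at `remote_path`."""
    b64 = base64.b64encode(data).decode("ascii")
    target = shlex.quote(remote_path)
    if len(b64) <= chunk_size:
        # Small file: a single command decodes straight into the target.
        return [f"printf '%s' '{b64}' | base64 -d > {target}"]
    # Large file: append chunks to a temp file, then decode it once.
    commands = [f"rm -f {TMP_B64}"]  # assumption: clear any stale temp file first
    for start in range(0, len(b64), chunk_size):
        chunk = b64[start:start + chunk_size]
        commands.append(f"printf '%s' '{chunk}' >> {TMP_B64}")
    commands.append(f"base64 -d {TMP_B64} > {target} && rm -f {TMP_B64}")
    return commands
```

The base64 alphabet contains no single quotes, so wrapping each chunk in single quotes is safe without further escaping.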

### download_file() (lines 234-278)
- Runs `base64 {remote_path}` inside sandbox, captures stdout
- Strips output, base64-decodes to raw bytes
- Writes to host filesystem with parent dir creation
- **Error handling:** Checks exit code, empty output, decode errors
- Returns `{success: bool, bytes: int}` or `{success: false, error: str}`
- **Size limit:** Bounded by the terminal output buffer (in practice a few MB of base64 output)
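
The decode-and-write half of this flow might look like the sketch below. The function name `save_download` and the exact error strings are assumptions; the real method also runs the `base64` command itself, which is left out here so the example stays self-contained.

```python
import base64
import binascii
from pathlib import Path


def save_download(exit_code: int, output: str, local_path: str) -> dict:
    """Decode `base64 <file>` terminal output and write it to the host."""
    if exit_code != 0:
        return {"success": False, "error": f"base64 exited with code {exit_code}"}
    payload = "".join(output.split())  # drop the line wrapping base64 adds
    if not payload:
        return {"success": False, "error": "empty output"}
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        return {"success": False, "error": f"decode failed: {exc}"}
    dest = Path(local_path)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create parent dirs on host
    dest.write_bytes(raw)
    return {"success": True, "bytes": len(raw)}
```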

### Promotion potential:
- These methods work via `self.terminal()` — they're environment-agnostic
- Could be directly lifted into a new tool that operates on the agent's current sandbox
- For send_file, this `download_file()` pattern is the key: it extracts files from sandbox → host

## 2. tools/environments/base.py — BaseEnvironment Interface

### Current methods:
- `execute(command, cwd, timeout, stdin_data)` → `{output, returncode}`
- `cleanup()` — release resources
- `stop()` — alias for cleanup
- `_prepare_command()` — sudo transformation
- `_build_run_kwargs()` — subprocess kwargs
- `_timeout_result()` — standard timeout dict

### What would need to be added for file transfer:
- **Nothing required at this level.** File transfer can be implemented via `execute()` (base64 over terminal, like ToolContext does) or via environment-specific methods.
- Optional: `upload_file(local_path, remote_path)` and `download_file(remote_path, local_path)` methods could be added to BaseEnvironment for optimized per-backend transfers, but the base64-over-terminal approach already works universally.

## 3. tools/environments/docker.py — Docker Container Details

### Container ID tracking:
- `self._container_id` stored at init from `self._inner.container_id`
- Inner is `minisweagent.environments.docker.DockerEnvironment`
- Container ID is a standard Docker container hash

### docker cp feasibility:
- **YES**, `docker cp` could be used for optimized file transfer:
  - `docker cp {container_id}:{remote_path} {local_path}` (download)
  - `docker cp {local_path} {container_id}:{remote_path}` (upload)
- Much faster than base64-over-terminal for large files
- Container ID is directly accessible via `env._container_id` or `env._inner.container_id`
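
As a rough sketch, the two `docker cp` invocations could be built as argument lists and run with `subprocess`. The helper names are hypothetical, and the result dict only mirrors the `{output, returncode}` shape described for `execute()`; it is not taken from the codebase.

```python
import subprocess


def docker_cp_download_argv(container_id: str, remote_path: str, local_path: str) -> list[str]:
    """Container -> host copy."""
    return ["docker", "cp", f"{container_id}:{remote_path}", local_path]


def docker_cp_upload_argv(container_id: str, local_path: str, remote_path: str) -> list[str]:
    """Host -> container copy."""
    return ["docker", "cp", local_path, f"{container_id}:{remote_path}"]


def run_copy(argv: list[str], timeout: int = 60) -> dict:
    """Run the copy without a shell so paths need no quoting."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {"returncode": proc.returncode, "output": proc.stdout + proc.stderr}
```

Passing an argument list instead of a shell string avoids quoting problems with unusual file names.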

### Volumes mounted:
- **Persistent mode:** Bind mounts at `~/.hermes/sandboxes/docker/{task_id}/workspace` → `/workspace` and `.../home` → `/root`
- **Ephemeral mode:** tmpfs at `/workspace` (10GB), `/home` (1GB), `/root` (1GB)
- **User volumes:** From `config.yaml docker_volumes` (arbitrary `-v` mounts)
- **Security tmpfs:** `/tmp` (512MB), `/var/tmp` (256MB), `/run` (64MB)

### Direct host access for persistent mode:
- If persistent, files at `/workspace/foo.txt` are just `~/.hermes/sandboxes/docker/{task_id}/workspace/foo.txt` on host — no transfer needed!
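
A minimal sketch of that path mapping, assuming only the two bind mounts listed above. The function name `host_path_for` and the `hermes_home` parameter are hypothetical, and user volumes from `config.yaml` are not handled.

```python
from pathlib import Path, PurePosixPath
from typing import Optional


def host_path_for(task_id: str, sandbox_path: str,
                  hermes_home: Optional[str] = None) -> Optional[Path]:
    """Map a sandbox path to its bind-mounted host path, or None if unmapped."""
    base = Path(hermes_home) if hermes_home else Path.home() / ".hermes"
    root = base / "sandboxes" / "docker" / task_id
    mounts = {"/workspace": root / "workspace", "/root": root / "home"}
    path = PurePosixPath(sandbox_path)
    for mount_point, host_dir in mounts.items():
        try:
            rel = path.relative_to(mount_point)
        except ValueError:
            continue  # not under this mount
        if ".." in rel.parts:
            return None  # refuse paths that could escape the mount
        return host_dir.joinpath(*rel.parts)
    return None
```

A `None` result signals that the file lives outside the bind mounts (for example under `/tmp`) and still needs a real transfer.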

## 4. tools/environments/ssh.py — SSH Connection Management

### Connection management:
- Uses SSH ControlMaster for persistent connection
- Control socket at `/tmp/hermes-ssh/{user}@{host}:{port}.sock`
- ControlPersist=300 (5 min keepalive)
- BatchMode=yes (non-interactive)
- Stores: `self.host`, `self.user`, `self.port`, `self.key_path`

### SCP/SFTP feasibility:
- **YES**, SCP can piggyback on the ControlMaster socket:
  - `scp -o ControlPath={socket} {user}@{host}:{remote} {local}` (download)
  - `scp -o ControlPath={socket} {local} {user}@{host}:{remote}` (upload)
- Same SSH key and connection reuse — zero additional auth
- Would be much faster than base64-over-terminal for large files
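
The download command could be assembled as below. This is a sketch: the helper names are hypothetical, and the socket path simply follows the `/tmp/hermes-ssh/{user}@{host}:{port}.sock` layout noted above.

```python
def control_socket_path(user: str, host: str, port: int) -> str:
    """Socket location following the layout described in these notes."""
    return f"/tmp/hermes-ssh/{user}@{host}:{port}.sock"


def scp_download_argv(user: str, host: str, port: int,
                      remote_path: str, local_path: str) -> list[str]:
    """Build an scp command that reuses the existing ControlMaster connection."""
    return [
        "scp",
        "-o", f"ControlPath={control_socket_path(user, host, port)}",
        "-o", "BatchMode=yes",  # never prompt, matching the SSH settings above
        "-P", str(port),        # scp uses capital -P for the port
        f"{user}@{host}:{remote_path}",
        local_path,
    ]
```

For an upload, the last two arguments swap places.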

## 5. tools/environments/modal.py — Modal Sandbox Filesystem

### Filesystem API exposure:
- **Not directly.** The inner `SwerexModalEnvironment` wraps Modal's sandbox
- The sandbox object is accessible at: `env._inner.deployment._sandbox`
- Modal's Python SDK exposes `sandbox.open()` for file I/O — but only via async API
- Currently only used for `snapshot_filesystem()` during cleanup
- **Could use:** `sandbox.open(path, "rb")` to read files or `sandbox.open(path, "wb")` to write
- **Alternative:** Base64-over-terminal already works via `execute()` — simpler, no SDK dependency

## 6. gateway/platforms/base.py — MEDIA: Tag Flow (Complete)

### extract_media() (lines 587-620):
- **Pattern:** `MEDIA:\S+` — extracts file paths after MEDIA: prefix
- **Voice flag:** `[[audio_as_voice]]` global directive sets `is_voice=True` for all media in message
- Returns `List[Tuple[str, bool]]` (path, is_voice) and cleaned content
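
The extraction behaviour described here can be approximated in a few lines. This is a sketch under assumptions, not the body of `extract_media()`: the name `extract_media_sketch` is hypothetical, and how the real method cleans the remaining text may differ.

```python
import re
from typing import List, Tuple

MEDIA_RE = re.compile(r"MEDIA:(\S+)")
VOICE_DIRECTIVE = "[[audio_as_voice]]"


def extract_media_sketch(content: str) -> Tuple[List[Tuple[str, bool]], str]:
    """Return ([(path, is_voice), ...], cleaned_content)."""
    is_voice = VOICE_DIRECTIVE in content  # global flag for the whole message
    cleaned = content.replace(VOICE_DIRECTIVE, "")
    media = [(m.group(1), is_voice) for m in MEDIA_RE.finditer(cleaned)]
    cleaned = MEDIA_RE.sub("", cleaned).strip()
    return media, cleaned
```

Because the pattern is `\S+`, a path containing whitespace would be cut at the first space, so cached file names should avoid spaces.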

### _process_message_background() media routing (lines 752-786):
- After extracting MEDIA tags, routes by file extension:
  - `.ogg .opus .mp3 .wav .m4a` → `send_voice()`
  - `.mp4 .mov .avi .mkv .3gp` → `send_video()`
  - `.jpg .jpeg .png .webp .gif` → `send_image_file()`
  - **Everything else** → `send_document()`
- This routing already supports arbitrary files!
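
The extension routing can be pictured as a simple lookup. The function name `pick_send_method` is hypothetical, and lowercasing the extension is an assumption about how the real code compares suffixes.

```python
from pathlib import Path

AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".wav", ".m4a"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".3gp"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def pick_send_method(path: str) -> str:
    """Return the name of the adapter method that should deliver `path`."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "send_voice"
    if ext in VIDEO_EXTS:
        return "send_video"
    if ext in IMAGE_EXTS:
        return "send_image_file"
    return "send_document"  # catch-all for every other file type
```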

### send_* method inventory (base class):
- `send(chat_id, content, reply_to, metadata)` — ABSTRACT, text
- `send_image(chat_id, image_url, caption, reply_to)` — URL-based images
- `send_animation(chat_id, animation_url, caption, reply_to)` — GIF animations
- `send_voice(chat_id, audio_path, caption, reply_to)` — voice messages
- `send_video(chat_id, video_path, caption, reply_to)` — video files
- `send_document(chat_id, file_path, caption, file_name, reply_to)` — generic files
- `send_image_file(chat_id, image_path, caption, reply_to)` — local image files
- `send_typing(chat_id)` — typing indicator
- `edit_message(chat_id, message_id, content)` — edit sent messages

### What's missing:
- **Telegram:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
- **Discord:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
- **Slack:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
- **WhatsApp:** Has `send_document` and `send_image_file` via bridge — COMPLETE.
- The base class defaults just send "📎 File: /path" as text — useless for actual file delivery.

## 7. gateway/platforms/telegram.py — Send Method Analysis

### Implemented send methods:
- `send()` — MarkdownV2 text with fallback to plain
- `send_voice()` — `.ogg`/`.opus` as `send_voice()`, others as `send_audio()`
- `send_image()` — URL-based via `send_photo()`
- `send_image_file()` — local file via `send_photo(photo=open(path, 'rb'))` ✅
- `send_animation()` — GIF via `send_animation()`
- `send_typing()` — "typing" chat action
- `edit_message()` — edit text messages

### MISSING:
- **`send_document()` NOT overridden** — Need to add `self._bot.send_document(chat_id, document=open(file_path, 'rb'), ...)`
- **`send_video()` NOT overridden** — Need to add `self._bot.send_video(...)`

## 8. gateway/platforms/discord.py — Send Method Analysis

### Implemented send methods:
- `send()` — text messages with chunking
- `send_voice()` — discord.File attachment
- `send_image()` — downloads URL, creates discord.File attachment
- `send_image_file()` — local file via discord.File attachment ✅
- `send_typing()` — channel.typing()
- `edit_message()` — edit text messages

### MISSING:
- **`send_document()` NOT overridden** — Need to add discord.File attachment
- **`send_video()` NOT overridden** — Need to add discord.File attachment

## 9. gateway/run.py — User File Attachment Handling

### Current attachment flow:
1. **Telegram photos** (lines 509-529): Download via `photo.get_file()` → `cache_image_from_bytes()` → vision auto-analysis
2. **Telegram voice** (lines 532-541): Download → `cache_audio_from_bytes()` → STT transcription
3. **Telegram audio** (lines 542-551): Same pattern
4. **Telegram documents** (lines 553-617): Extension validation against `SUPPORTED_DOCUMENT_TYPES`, 20MB limit, content injection for text files
5. **Discord attachments** (lines 717-751): Content-type detection, image/audio caching, URL fallback for other types
6. **Gateway run.py** (lines 818-883): Auto-analyzes images with vision, transcribes audio, enriches document messages with context notes

### Key insight: Files are always cached to host filesystem first, then processed. The agent sees local file paths.

## 10. tools/terminal_tool.py — Terminal Tool & Environment Interaction

### How it manages environments:
- Global dict `_active_environments: Dict[str, Any]` keyed by task_id
- Per-task creation locks prevent duplicate sandbox creation
- Auto-cleanup thread kills idle environments after `TERMINAL_LIFETIME_SECONDS`
- `_get_env_config()` reads all TERMINAL_* env vars for backend selection
- `_create_environment()` factory creates the right backend type

### Could send_file piggyback?
- **YES.** send_file needs access to the same environment to extract files from sandboxes.
- It can reuse `_active_environments[task_id]` to get the environment, then:
  - Docker: Use `docker cp` via `env._container_id`
  - SSH: Use `scp` via `env.control_socket`
  - Local: Just read the file directly
  - Modal: Use base64-over-terminal via `env.execute()`
- The file_tools.py module already does this with `ShellFileOperations` — read_file/write_file/search/patch all share the same env instance.
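
The shared-environment pattern with per-task creation locks might look like this. It is a sketch only: apart from `_active_environments`, every name is hypothetical, and the real module's locking details are not shown in these notes.

```python
import threading
from typing import Any, Callable, Dict

_active_environments: Dict[str, Any] = {}
_creation_locks: Dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()


def get_or_create_environment(task_id: str, factory: Callable[[], Any]) -> Any:
    """Return the task's environment, creating it at most once."""
    with _registry_lock:
        lock = _creation_locks.setdefault(task_id, threading.Lock())
    with lock:  # serialises creation per task, not globally
        env = _active_environments.get(task_id)
        if env is None:
            env = factory()  # potentially slow sandbox start-up
            with _registry_lock:
                _active_environments[task_id] = env
        return env
```

A `send_file` tool would call the same lookup with the current `task_id`, so it sees the sandbox the terminal and file tools are already using.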

## 11. tools/tts_tool.py — Working Example of File Delivery

### Flow:
1. Generate audio file to `~/.hermes/audio_cache/tts_TIMESTAMP.{ogg,mp3}`
2. Return JSON with `media_tag: "MEDIA:/path/to/file"`
3. For Telegram voice: prepend `[[audio_as_voice]]` directive
4. The LLM includes the MEDIA tag in its response text
5. `BasePlatformAdapter._process_message_background()` calls `extract_media()` to find the tag
6. Routes by extension → `send_voice()` for audio files
7. Platform adapter sends the file natively

### Key pattern: Tool saves file to host → returns MEDIA: path → LLM echoes it → gateway extracts → platform delivers
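
The tool-side half of this pattern amounts to returning a small JSON payload. In the sketch below the function name `build_media_result` is hypothetical, and placing the `[[audio_as_voice]]` directive inside `media_tag` is an assumption; the notes only say that it is prepended.

```python
import json


def build_media_result(file_path: str, as_voice: bool = False) -> str:
    """Return the JSON a tool hands back so the LLM can echo the MEDIA: tag."""
    tag = f"MEDIA:{file_path}"
    if as_voice:
        # Directive asks the gateway to deliver audio as a voice message.
        tag = f"[[audio_as_voice]]\n{tag}"
    return json.dumps({"success": True, "media_tag": tag})
```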

## 12. tools/image_generation_tool.py — Working Example of Image Delivery

### Flow:
1. Call FAL.ai API → get image URL
2. Return JSON with `image: "https://fal.media/..."` URL
3. The LLM includes the URL in markdown: `![description](URL)`
4. `BasePlatformAdapter.extract_images()` finds `![alt](url)` patterns
5. Routes through `send_image()` (URL) or `send_animation()` (GIF)
6. Platform downloads and sends natively

### Key difference from TTS: Images are URL-based, not local files. The gateway downloads at send time.
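
For comparison, the markdown-image side can be sketched with one regular expression. The names are hypothetical, and detecting GIFs by URL suffix is an assumption; the notes do not say how `send_animation()` is selected.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

IMAGE_MD_RE = re.compile(r"!\[([^\]]*)\]\((https?://[^\s)]+)\)")


def extract_images_sketch(content: str) -> Tuple[List[Tuple[str, str]], str]:
    """Return ([(url, alt_text), ...], cleaned_content)."""
    images = [(m.group(2), m.group(1)) for m in IMAGE_MD_RE.finditer(content)]
    cleaned = IMAGE_MD_RE.sub("", content).strip()
    return images, cleaned


def pick_image_method(url: str) -> str:
    """Assumed rule: URLs whose path ends in .gif go to send_animation."""
    path = urlparse(url).path.lower()
    return "send_animation" if path.endswith(".gif") else "send_image"
```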

---

# INTEGRATION MAP: Where send_file Hooks In

## Architecture Decision: MEDIA: Tag Protocol vs. New Tool

The MEDIA: tag protocol is already the established pattern for file delivery. Two options:

### Option A: Pure MEDIA: Tag (Minimal Change)
- No new tool needed
- Agent downloads file from sandbox to host using terminal (base64)
- Saves to known location (e.g., `~/.hermes/file_cache/`)
- Includes `MEDIA:/path` in response text
- Existing routing in `_process_message_background()` handles delivery
- **Problem:** The agent has to perform the base64 transfer manually and know about the MEDIA: convention

### Option B: Dedicated send_file Tool (Recommended)
- New tool that the agent calls with `(file_path, caption?)`
- Tool handles the sandbox → host extraction automatically
- Returns MEDIA: tag that gets routed through existing pipeline
- Much cleaner agent experience

## Implementation Plan for Option B

### Files to CREATE:

1. **`tools/send_file_tool.py`** — The new tool
   - Accepts: `file_path` (path in sandbox), `caption` (optional)
   - Detects environment backend from `_active_environments`
   - Extracts file from sandbox:
     - **local:** `shutil.copy()` or direct path
     - **docker:** `docker cp {container_id}:{path} {local_cache}/` 
     - **ssh:** `scp -o ControlPath=... {user}@{host}:{path} {local_cache}/`
     - **modal:** base64-over-terminal via `env.execute("base64 {path}")`
   - Saves to `~/.hermes/file_cache/{uuid}_{filename}`
   - Returns: `MEDIA:/cached/path` in response for gateway to pick up
   - Register with `registry.register(name="send_file", toolset="file", ...)`
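
For the local backend, the caching and result steps could be sketched as follows. Function names are hypothetical, the optional `caption` key in the result is an assumption, and whitespace in file names is replaced because the gateway's `MEDIA:\S+` pattern would otherwise truncate the path.

```python
import json
import re
import shutil
import uuid
from pathlib import Path
from typing import Optional


def cache_local_file(src_path: str, cache_dir: str) -> Path:
    """Copy a file into the cache as {uuid}_{filename} and return the new path."""
    src = Path(src_path)
    if not src.is_file():
        raise FileNotFoundError(src_path)
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    safe_name = re.sub(r"\s+", "_", src.name)  # keep the MEDIA: tag whitespace-free
    dest = cache / f"{uuid.uuid4().hex}_{safe_name}"
    shutil.copy2(src, dest)
    return dest


def send_file_result(cached: Path, caption: Optional[str] = None) -> str:
    """Build the JSON string returned to the agent."""
    result = {"success": True, "media_tag": f"MEDIA:{cached}"}
    if caption:
        result["caption"] = caption  # assumption: caption is passed through
    return json.dumps(result)
```

The other backends would differ only in how the file reaches the cache directory; the result payload stays the same.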

### Files to MODIFY:

2. **`gateway/platforms/telegram.py`**: Add the missing send methods (`send_image_file` is listed as already implemented in section 7 and is shown here only for completeness):
   ```python
   import os  # needed for os.path.basename

   # Note: reply_to is accepted for signature compatibility but is not used below.

   async def send_document(self, chat_id, file_path, caption=None, file_name=None, reply_to=None):
       # Upload any file as a native Telegram document
       with open(file_path, "rb") as f:
           msg = await self._bot.send_document(
               chat_id=int(chat_id), document=f,
               caption=caption, filename=file_name or os.path.basename(file_path))
       return SendResult(success=True, message_id=str(msg.message_id))

   async def send_image_file(self, chat_id, image_path, caption=None, reply_to=None):
       # Upload a local image as a photo
       with open(image_path, "rb") as f:
           msg = await self._bot.send_photo(chat_id=int(chat_id), photo=f, caption=caption)
       return SendResult(success=True, message_id=str(msg.message_id))

   async def send_video(self, chat_id, video_path, caption=None, reply_to=None):
       # Upload a local video so Telegram plays it inline
       with open(video_path, "rb") as f:
           msg = await self._bot.send_video(chat_id=int(chat_id), video=f, caption=caption)
       return SendResult(success=True, message_id=str(msg.message_id))
   ```

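3. **`gateway/platforms/discord.py`**: Add the missing send methods (`send_image_file` is listed as already implemented in section 8; here it simply reuses the attachment path):
   ```python
   import io
   import os

   import discord

   async def send_document(self, chat_id, file_path, caption=None, file_name=None, reply_to=None):
       # Resolve the channel from cache, falling back to an API fetch
       channel = self._client.get_channel(int(chat_id)) or await self._client.fetch_channel(int(chat_id))
       with open(file_path, "rb") as f:
           file = discord.File(io.BytesIO(f.read()), filename=file_name or os.path.basename(file_path))
           msg = await channel.send(content=caption, file=file)
       return SendResult(success=True, message_id=str(msg.id))

   async def send_image_file(self, chat_id, image_path, caption=None, reply_to=None):
       # Same attachment pattern as send_document, keeping the image filename
       return await self.send_document(chat_id, image_path, caption=caption, reply_to=reply_to)

   async def send_video(self, chat_id, video_path, caption=None, reply_to=None):
       # Same pattern; Discord renders video attachments inline
       return await self.send_document(chat_id, video_path, caption=caption, reply_to=reply_to)
   ```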

4. **`toolsets.py`** — Add `"send_file"` to `_HERMES_CORE_TOOLS` list

5. **`agent/prompt_builder.py`** — Update platform hints to mention send_file tool

### Code that can be REUSED (zero rewrite):

- `BasePlatformAdapter.extract_media()` — Already extracts MEDIA: tags
- `BasePlatformAdapter._process_message_background()` — Already routes by extension
- `ToolContext.download_file()` — Base64-over-terminal extraction pattern
- `tools/terminal_tool.py` _active_environments dict — Environment access
- `tools/registry.py` — Tool registration infrastructure
- `gateway/platforms/base.py` send_document/send_image_file/send_video signatures — Already defined

### Code that needs to be WRITTEN from scratch:

1. `tools/send_file_tool.py` (~150 lines):
   - File extraction from each environment backend type
   - Local file cache management
   - Registry registration
   
2. Telegram `send_document` + `send_image_file` + `send_video` overrides (~40 lines)
3. Discord `send_document` + `send_image_file` + `send_video` overrides (~50 lines)

### Total effort: ~240 lines of new code, ~5 lines of config changes

## Key Environment-Specific Extract Strategies

| Backend     | Extract Method                | Speed    | Complexity |
|-------------|-------------------------------|----------|------------|
| local       | `shutil.copy` / direct path   | Instant  | None       |
| docker      | `docker cp container:path .`  | Fast     | Low        |
| docker+vol  | Direct host path access       | Instant  | None       |
| ssh         | `scp -o ControlPath=...`      | Fast     | Low        |
| modal       | base64-over-terminal          | Moderate | Medium     |
| singularity | Direct path (overlay mount)   | Fast     | Low        |
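
The table translates naturally into a small dispatch function. This is a sketch: the strategy labels and the name `pick_extract_strategy` are hypothetical, and how the backend type is detected from an environment object is left open.

```python
def pick_extract_strategy(backend: str, persistent: bool = False) -> str:
    """Map a backend name to the extraction strategy from the table above."""
    if backend == "docker" and persistent:
        return "host_path"  # bind-mounted workspace is already on the host
    strategies = {
        "local": "copy",
        "docker": "docker_cp",
        "ssh": "scp",
        "modal": "base64",
        "singularity": "host_path",
    }
    try:
        return strategies[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend}") from None
```

Falling back to `"base64"` for unknown backends would also be reasonable, since that path works through `execute()` everywhere; raising keeps the sketch explicit.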

## Data Flow Summary

```
Agent calls send_file(file_path="/workspace/output.pdf", caption="Here's the report")
    │
    ▼
send_file_tool.py:
    1. Get environment from _active_environments[task_id]
    2. Detect backend type (docker/ssh/modal/local)
    3. Extract file to ~/.hermes/file_cache/{uuid}_{filename}
    4. Return: '{"success": true, "media_tag": "MEDIA:/home/user/.hermes/file_cache/abc123_output.pdf"}'
    │
    ▼
LLM includes MEDIA: tag in its response text
    │
    ▼
BasePlatformAdapter._process_message_background():
    1. extract_media(response) → finds MEDIA:/path
    2. Checks extension: .pdf → send_document()
    3. Calls platform-specific send_document(chat_id, file_path, caption)
    │
    ▼
TelegramAdapter.send_document() / DiscordAdapter.send_document():
    Opens file, sends via platform API as native document attachment
    User receives downloadable file in chat
```