EPIC: Prove Hermes upstream 74-commit release — deliverables for every new capability #1047

Open
opened 2026-04-23 12:11:19 +00:00 by Timmy

Source window

  • Upstream repo: NousResearch/hermes-agent
  • Release window under proof: 95d11dfd -> 722331a5
  • Total commits: 74
  • Focus: all new feature commits plus the high-impact fixes that change reliability, safety, operator UX, or deployment behavior

Why this epic exists

We just pulled a new Hermes upstream version into the Timmy Foundation fork. This epic is the proof agenda for the release: not just “it merged,” but evidence that the new capabilities actually work in our real environment.

The goal is concrete proof artifacts Alexander can inspect: transcripts, screenshots, demo logs, benchmark notes, issue comments, and passing test output.

Definition of done

  • Every track below has a PASS / FAIL verdict.
  • Every track produces a concrete deliverable.
  • Any failures become follow-up issues with exact breakpoints and reproduction steps.
  • The final review packet summarizes what is production-ready, what is partially working, and what should be ignored.

Track 1 — Discord media + command UX proof

Source commits

  • 41e2d61b feat(discord): add native send_animation for inline GIF playback
  • 4bcb2f2d feat(send_message): add native media attachment support for Discord
  • 10494b42 feat(discord): register skills under /skill command group with category subcommands
  • cfa24532 fix(discord): register native /restart slash command
  • 47e6ea84 fix: file handle bug, warning text, and tests for Discord media send

Deliverable

A proof packet with:

  1. one Discord transcript showing native GIF playback,
  2. one native file attachment send,
  3. screenshot or transcript of /skill grouped commands and /restart registration,
  4. short note confirming no file-handle regression.

Acceptance criteria

  • GIF plays inline in Discord rather than as a broken/plain upload.
  • Native media attachment succeeds through send_message.
  • /skill command group and /restart are visible and callable.
  • No file-handle or warning-text regression appears in logs.
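
For item 4, one way to back the no-file-handle-regression note with numbers is to count the gateway process's open file descriptors before and after a batch of media sends. This is a minimal sketch, not upstream code. It assumes a Linux host (it reads /proc/<pid>/fd), that the operator looks up the gateway PID themselves, and that growth of more than 2 descriptors is worth investigating; that tolerance is an arbitrary choice.

```python
import os
import sys


def count_open_fds(pid: int) -> int:
    """Count a process's open file descriptors via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def leak_suspected(before: int, after: int, tolerance: int = 2) -> bool:
    """Flag a possible leak when the fd count grows by more than `tolerance`.

    `tolerance` is an arbitrary allowance for sockets or log files the
    process may legitimately open during the test window.
    """
    return (after - before) > tolerance


if __name__ == "__main__":
    # Usage: python fd_check.py <gateway_pid> [sends]
    pid = int(sys.argv[1])
    sends = int(sys.argv[2]) if len(sys.argv) > 2 else 20
    before = count_open_fds(pid)
    input(f"fds={before}. Trigger {sends} Discord media sends, then press Enter...")
    after = count_open_fds(pid)
    print(f"before={before} after={after} leak_suspected={leak_suspected(before, after)}")
```

A delta that scales with the number of sends (for example +20 after 20 sends) is the pattern a per-send handle leak would produce; a small constant delta is more likely an idle socket or log file.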

Track 2 — Gateway resilience + restart proof

Source commits

  • 45595f48 feat(dashboard): add HTTP health probe for cross-container gateway detection
  • 82f364ff feat: add --all flag to gateway start and restart commands
  • e7475b15 feat: auto-continue interrupted agent work after gateway restart
  • 397386ca fix: gateway auto-recovers from unexpected SIGTERM via systemd
  • 2a980980 fix: hermes gateway restart waits for service to come back up
  • 6c893064 fix: break stuck session resume loops after repeated restarts
  • fa8c448f fix: notify active sessions on gateway shutdown + update health check
  • 673acf22 fix: override stale stopped state when health probe confirms gateway alive
  • 6ed682f1 fix: normalise GATEWAY_HEALTH_URL to base URL before probing

Deliverable

A restart / recovery transcript showing:

  1. health probe before restart,
  2. gateway restart --all,
  3. service returning healthy,
  4. interrupted session resuming or cleanly recovering,
  5. no stuck resume loop.

Acceptance criteria

  • Health endpoint reports correct live status before and after restart.
  • gateway restart --all works end-to-end.
  • At least one interrupted session auto-continues or cleanly resumes.
  • No repeated stuck-session loop is observed after multiple restart cycles.
  • Shutdown notification reaches active sessions.
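
The probe-and-wait part of this transcript can be scripted so the before/after timing is recorded rather than eyeballed. A minimal sketch, assuming the gateway answers HTTP 200 on a health path once it is up. The /health path is an assumption, and the base-URL reduction reflects only our reading of the 6ed682f1 commit title (normalise GATEWAY_HEALTH_URL to base URL), not the upstream implementation.

```python
import sys
import time
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Reduce a configured health URL to scheme://host[:port], dropping path, query and fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def wait_until_healthy(base_url: str, path: str = "/health",
                       timeout_s: float = 60.0, interval_s: float = 2.0) -> Optional[float]:
    """Poll <base><path> until it answers HTTP 200.

    Returns the seconds waited, or None if the timeout expires first.
    The "/health" default is an assumption; substitute the real probe path.
    """
    url = normalize_base_url(base_url) + path
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            pass  # refused, reset, or 4xx/5xx while the gateway restarts
        time.sleep(interval_s)
    return None


if __name__ == "__main__":
    # Start this, then run `hermes gateway restart --all` in another shell.
    waited = wait_until_healthy(sys.argv[1])
    if waited is None:
        print("TIMEOUT: gateway never reported healthy")
    else:
        print(f"healthy after {waited:.1f}s")
```

Running it once before the restart (expect an immediate return) and once across the restart gives the two health readings the first criterion asks for.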

Track 3 — Compressor + context stability proof

Source commits

  • 9855190f feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
  • c5688e7c fix(gateway): break compression-exhaustion infinite loop and auto-reset session
  • 92385679 fix: reset retry counters after compression and stop poisoning conversation history
  • a8b7db35 fix: interrupt agent immediately when user messages during active run
  • 93fe4ead fix: warn on invalid context_length format in config.yaml
  • 772cfb6c / 50c35dca fix: stale agent timeout, uv venv detection, empty response after tools, compression fallback

Deliverable

A before/after validation note with logs showing:

  1. a long session that triggers compression,
  2. compressor collapse behavior,
  3. interruption during active run,
  4. no infinite compression loop,
  5. final verdict on operator experience.

Acceptance criteria

  • Compression reduces context without obvious response derailment.
  • No anti-thrashing regression (no rapid compress-reset-compress loop).
  • User interrupt during active run is honored immediately.
  • Compression exhaustion leads to reset/recovery instead of endless looping.
  • Invalid context_length config produces a clear warning.
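
To turn "no infinite compression loop" into a checkable claim, compression events can be pulled from the gateway log and tested for bursts. This sketch assumes a log line format that starts with a "YYYY-MM-DD HH:MM:SS" timestamp and contains the word "compress" on compression events. Both the format and the thresholds (more than 3 compressions inside 60 seconds counts as thrashing) are assumptions to adjust against our real logs.

```python
import re
from collections import deque
from datetime import datetime
from typing import Iterable, List

# Assumed log shape: "2026-04-23 12:00:01,123 INFO ... compress ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def compression_times(lines: Iterable[str], marker: str = "compress") -> List[float]:
    """Epoch seconds for every timestamped log line that mentions the marker."""
    times = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and marker in line.lower():
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            times.append(stamp.timestamp())
    return times


def thrashing_detected(times: Iterable[float], window_s: float = 60.0,
                       max_events: int = 3) -> bool:
    """True if more than max_events compressions land inside any window_s-second window."""
    recent = deque()
    for t in sorted(times):
        recent.append(t)
        # Drop events that fell out of the sliding window ending at t.
        while t - recent[0] > window_s:
            recent.popleft()
        if len(recent) > max_events:
            return True
    return False
```

Usage: thrashing_detected(compression_times(open("gateway.log"))) where gateway.log is a placeholder for wherever our gateway writes its log. A False result across a long session that did compress at least once is the evidence item 4 needs.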

Track 4 — API server / Responses API proof

Source commits

  • d6c09ab9 feat(api-server): stream /v1/responses SSE tool events
  • 302554b1 fix(api-server): format responses tool outputs for Open WebUI
  • cf1d7188 fix: keep batch-path function_call_output.output as string per OpenAI spec
  • 5cbb45d9 fix: preserve session_id across previous_response_id chains in /v1/responses
  • a4e1842f fix: strip reasoning item IDs from Responses API input when store=False

Deliverable

A protocol proof artifact with:

  1. one SSE capture,
  2. one Open WebUI-compatible tool-output example,
  3. one previous_response_id continuation proving session continuity,
  4. one note on spec compliance.
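
For the SSE capture, a small parser makes it easy to list which event types the stream actually emitted and in what order. This is a sketch following the WHATWG server-sent events rules in simplified form; it does not assume any particular Hermes event names. The names used in the test are OpenAI Responses API streaming types, used only as sample input.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def parse_sse(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from a raw SSE capture.

    Simplified WHATWG rules: "data" lines are joined with newlines,
    lines starting with ":" are comments, a blank line dispatches the
    event, and an unterminated final event is discarded.
    """
    event = "message"
    data: List[str] = []
    for line in text.splitlines():
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch


def event_type_counts(text: str) -> Counter:
    """Count how many events of each type appear in the capture."""
    return Counter(event for event, _ in parse_sse(text))
```

One way to record the capture (host, port and any auth header are placeholders for our API server's settings):

curl -N -X POST http://<host>:<port>/v1/responses -H 'Content-Type: application/json' -d '{"model": "...", "input": "...", "stream": true}' > capture.sse

then paste the output of event_type_counts(open("capture.sse").read()) into the proof artifact.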

Acceptance criteria

  • SSE tool events stream correctly from /v1/responses.
  • Tool outputs are accepted by Open WebUI / OpenAI-compatible clients.
  • previous_response_id preserves session continuity.
  • store=False requests do not leak stale reasoning item IDs.
  • Batch-path tool output remains valid string-form output.
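
Two of the criteria above are structural and can be checked mechanically against captured JSON bodies: function_call_output.output must be a string, and reasoning items should not carry an id when store is false. The sketch below encodes those two rules using OpenAI Responses API field names (input, type, output, store, previous_response_id). It assumes we can capture the bodies, for example with a logging proxy, and makes no assumption about where inside Hermes these items are built. followup_payload is a hypothetical helper for the previous_response_id continuation test.

```python
from typing import Any, Dict, List


def check_responses_input(payload: Dict[str, Any]) -> List[str]:
    """Return spec problems found in a /v1/responses request body."""
    problems: List[str] = []
    items = payload.get("input", [])
    if isinstance(items, str):
        return problems  # plain-text input has no items to check
    store = payload.get("store", True)  # OpenAI's default is store=true
    for i, item in enumerate(items):
        kind = item.get("type")
        if kind == "function_call_output" and not isinstance(item.get("output"), str):
            problems.append(f"input[{i}]: function_call_output.output is not a string")
        if kind == "reasoning" and store is False and "id" in item:
            problems.append(f"input[{i}]: reasoning item keeps id {item['id']!r} with store=False")
    return problems


def followup_payload(previous_response_id: str, text: str, model: str) -> Dict[str, Any]:
    """Build a continuation request that chains onto an earlier response."""
    return {"model": model, "input": text, "previous_response_id": previous_response_id}
```

For the continuity proof, send a first request, then a followup_payload built from the returned response id, and record whether the second answer reflects the first turn.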

Track 5 — CLI / setup / operator ergonomics proof

Source commits

  • 9932366f feat(doctor): add Command Installation check for hermes bin symlink
  • a9c78d0e feat(setup): add recommendation badges to tool provider selection
  • df7be3d8 fix(cli): /model picker shows curated models instead of full catalog
  • 029938fb fix(cli): defensive subparser routing for argparse bpo-9338
  • 847d7cbe fix: improve CLI text padding, word-wrap for responses and verbose tool output
  • 1e5e1e82 fix: ESC cancels secret/sudo prompts, clearer skip messaging
  • 722331a5 fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text

Deliverable

A CLI UX proof packet with screenshots/transcripts for:

  1. doctor command showing installation check,
  2. setup provider recommendations,
  3. curated /model picker,
  4. ESC-cancel flow for a secret/sudo prompt,
  5. home-path display fix.

Acceptance criteria

  • Doctor surfaces command-installation status correctly.
  • Setup UI displays provider recommendations.
  • /model picker is curated and readable.
  • ESC cleanly cancels secret/sudo prompts.
  • Agent-facing text shows profile-aware Hermes home, not hardcoded ~/.hermes.
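
The home-path fix can be spot-checked by running a session under a non-default Hermes profile and scanning the transcript for the literal ~/.hermes. A minimal sketch: it only proves anything when that run's Hermes home is not ~/.hermes, and it flags text, not behavior.

```python
import re
from typing import List, Tuple

# Matches "~/.hermes" but not longer names such as "~/.hermes-old".
HARDCODED_HOME = re.compile(r"~/\.hermes(?![\w-])")


def hardcoded_home_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that mention the literal ~/.hermes."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if HARDCODED_HOME.search(line)
    ]
```

An empty result over a transcript that does mention the Hermes home (for example in a setup or doctor message) supports item 5; any hit is a candidate follow-up issue.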

Track 6 — Runtime / container / browser proof

Source commits

  • 8548893d feat: entry-level Podman support — find_docker() + rootless entrypoint
  • 7b2700c9 fix(browser): use 127.0.0.1 instead of localhost for CDP default
  • 56c34ac4 fix(browser): add termux PATH fallbacks
  • 677f1227 fix: remove @staticmethod from _context_completions crash path

Deliverable

A runtime compatibility note with:

  1. Podman smoke test,
  2. browser/CDP smoke test using 127.0.0.1,
  3. termux/browser-path validation note (real or simulated where necessary),
  4. crash-path regression check for @ context completions.

Acceptance criteria

  • Rootless Podman path is discoverable and usable for entry-level workflows.
  • Browser CDP default works via 127.0.0.1 without localhost-related failures.
  • Termux/browser PATH fallback behavior is documented with evidence.
  • @ context completions no longer crash in _context_completions (the @staticmethod crash path removed by 677f1227).
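
For the CDP smoke test, Chromium-based browsers expose a JSON description of the debugging endpoint at /json/version on the remote-debugging port, including webSocketDebuggerUrl. This sketch probes that endpoint on both 127.0.0.1 and localhost so the transcript shows the difference the 7b2700c9 fix targets. Port 9222 is the conventional --remote-debugging-port value and an assumption here; use whatever port our Hermes browser config actually points at.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def summarize_cdp_version(info: Dict[str, Any]) -> Optional[str]:
    """Return the browser string if a /json/version payload looks usable, else None."""
    if not str(info.get("webSocketDebuggerUrl", "")).startswith("ws://"):
        return None
    return info.get("Browser", "unknown")


def probe_cdp(host: str = "127.0.0.1", port: int = 9222, timeout: float = 3.0) -> Optional[str]:
    """Fetch http://<host>:<port>/json/version; None on connection or JSON errors."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize_cdp_version(json.load(resp))
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Compare the new default host (127.0.0.1) with the old one (localhost).
    for host in ("127.0.0.1", "localhost"):
        print(f"{host:>10}: {probe_cdp(host) or 'no usable CDP endpoint'}")
```

On hosts where localhost resolves to ::1 first while the browser listens only on IPv4, the two lines can differ, which is the localhost-related failure the criterion refers to.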

Track 7 — Messaging + send_message target correctness proof

Source commits

  • 03446e06 fix(send_message): accept Matrix room IDs and user MXIDs as explicit targets
  • e69526be fix(send_message): URL-encode Matrix room IDs and add Matrix schema examples
  • 33ae4038 fix(gateway): fix Matrix lingering typing indicator
  • 2546b7ac fix(gateway): suppress duplicate replies on interrupt and streaming flood control
  • da8bab77 fix(cli): restore messaging toolset for gateway platforms
  • 1c4d3216 fix(cron): include job_id in delivery and guide models on removal workflow

Deliverable

A messaging correctness packet with:

  1. Matrix explicit-target send proof,
  2. lingering-typing regression check,
  3. duplicate-reply suppression check,
  4. cron delivery example showing job_id.

Acceptance criteria

  • Matrix room IDs and MXIDs work as explicit targets.
  • Matrix room IDs are URL-encoded correctly in request paths.
  • Typing indicator clears normally.
  • Interrupt/flood-control path does not duplicate replies.
  • Cron deliveries include job_id in actionable form.
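
As a reference oracle for the Matrix criteria, the sketch below classifies an explicit target by its sigil (! room ID, @ user MXID, # room alias) and builds the client-server API send path (PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}) with each segment percent-encoded, which the ! and : in a room ID require. It is not the Hermes implementation; comparing its output with the URLs in our gateway logs is one way to show the encoding is correct. How Hermes turns a user MXID into a deliverable room (for example a DM room) is outside this sketch.

```python
from urllib.parse import quote

SIGILS = {"!": "room_id", "@": "user_id", "#": "room_alias"}


def classify_matrix_target(target: str) -> str:
    """Classify a Matrix identifier by sigil; 'unknown' if it is not sigil + localpart:server."""
    if len(target) < 4 or target[0] not in SIGILS or ":" not in target[1:]:
        return "unknown"
    return SIGILS[target[0]]


def send_path(room_id: str, txn_id: str, event_type: str = "m.room.message") -> str:
    """Client-server API send path with every path segment percent-encoded."""
    return (
        f"/_matrix/client/v3/rooms/{quote(room_id, safe='')}"
        f"/send/{quote(event_type, safe='')}/{quote(txn_id, safe='')}"
    )
```

An unencoded room ID in a logged request path would show up as a literal ! and : after /rooms/, which is the regression e69526be targets.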

Track 8 — Safety / guardrail / operator-protection proof

Source commits

  • 15256249 fix: block agent from self-destructing gateway via terminal
  • da528a82 fix: detect and strip non-ASCII characters from API keys
  • ca0ae56c fix: add 402 billing error hint to gateway error handler
  • 4b2a1a43 fix(tools): auto-discover built-in tool modules
  • cda64a59 / c10fea8d fix(mcp): resolve toolsets from live registry, make aliases explicit

Deliverable

A safety regression note with:

  1. blocked self-destruct attempt transcript,
  2. malformed API key sanitation check,
  3. billing error UX proof,
  4. built-in tool auto-discovery proof,
  5. MCP alias/toolset resolution proof.

Acceptance criteria

  • Terminal guard blocks gateway self-destruction path.
  • Non-ASCII API-key corruption is detected and stripped.
  • 402 billing errors yield actionable operator guidance.
  • Built-in tool modules auto-discover correctly after startup.
  • MCP aliases/toolsets resolve from the live registry without stale mapping drift.
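
The sanitation check needs a deliberately corrupted key: a copy of a test key with an invisible or typographic character pasted in, the kind of damage copy-paste from chat or documents produces. The sketch below reports where such characters sit (by index and Unicode name, never printing the key) and what an ASCII-only strip would leave. It describes the expected outcome for the proof; it is not the upstream code from da528a82.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_report(key: str) -> List[Tuple[int, str, str]]:
    """List (index, char, Unicode name) for every non-ASCII character in the key."""
    return [
        (i, ch, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(key)
        if ord(ch) > 127
    ]


def strip_non_ascii(key: str) -> str:
    """Drop non-ASCII characters, then trim surrounding whitespace such as a stray newline."""
    return "".join(ch for ch in key if ord(ch) < 128).strip()
```

The proof is then: configure Hermes with the corrupted copy, record whether it warns and whether the request authenticates, and include the non_ascii_report output (which contains no key material) in the note.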

Track 9 — New capability spot checks

Source commits

  • b24e5ee4 feat(google-workspace): add --from flag for custom sender display name
  • 6448e1da feat(zai): add GLM-5V-Turbo support for coding plan
  • 55ce76b3 feat: add architecture-diagram skill (Cocoon AI port)
  • 3b508215 feat(xai): add xAI/Grok to provider prefix stripping

Deliverable

A lightweight capability bundle with one proof each:

  1. Google Workspace custom sender demo,
  2. GLM-5V-Turbo routing/config proof,
  3. architecture-diagram skill output artifact,
  4. xAI/Grok model normalization check.

Acceptance criteria

  • Google Workspace --from changes the sender display name as intended.
  • GLM-5V-Turbo is visible and selectable for the zai coding plan.
  • Architecture-diagram skill produces a usable artifact.
  • xAI/Grok provider prefix stripping behaves correctly in model handling.
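
For the xAI/Grok check it helps to write the expected normalizations down as a table before testing. This sketch is a hypothetical oracle: the accepted prefixes (xai and x-ai) and the example model names are assumptions, and the authoritative list lives in the upstream prefix-stripping code touched by 3b508215.

```python
from typing import Tuple


def strip_provider_prefix(model: str, prefixes: Tuple[str, ...] = ("xai", "x-ai")) -> str:
    """Drop a leading '<prefix>/' when it names the target provider; leave other models alone."""
    head, sep, tail = model.partition("/")
    if sep and head.lower() in prefixes:
        return tail
    return model
```

Running the same inputs through Hermes's real model handling and diffing against this table gives the normalization check; a model from another provider (for example openai/gpt-4o) must come through unchanged.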

Final release deliverable

Produce a single release-proof packet comment on this epic containing:

  • PASS / FAIL for each track
  • links to every artifact
  • what is immediately deployable in Timmy Foundation
  • what needs follow-up issues
  • what should be ignored as low-value or not relevant to our stack
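
The release-proof packet can be assembled from a small record of verdicts and artifact links, which also makes the definition of done checkable: any track without a verdict or without at least one artifact shows up as a gap. A minimal sketch; the nine-track count comes from this epic, the N/A verdict follows the notes below, and the line format is hypothetical.

```python
from typing import Dict, List

ALLOWED = {"PASS", "FAIL", "N/A"}


def gaps(verdicts: Dict[int, str], artifacts: Dict[int, List[str]],
         total_tracks: int = 9) -> List[int]:
    """Tracks still missing a verdict or at least one artifact link."""
    return [t for t in range(1, total_tracks + 1)
            if t not in verdicts or not artifacts.get(t)]


def render_packet(verdicts: Dict[int, str], artifacts: Dict[int, List[str]],
                  total_tracks: int = 9) -> str:
    """One line per track: verdict plus artifact links; unknown verdicts are rejected."""
    lines = []
    for t in range(1, total_tracks + 1):
        verdict = verdicts.get(t, "MISSING")
        if verdict != "MISSING" and verdict not in ALLOWED:
            raise ValueError(f"Track {t}: unexpected verdict {verdict!r}")
        links = ", ".join(artifacts.get(t, [])) or "(no artifacts linked)"
        lines.append(f"Track {t}: {verdict} | {links}")
    return "\n".join(lines)
```

The packet is ready to post only when gaps() returns an empty list.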

Notes

  • This epic is about proof, not blind celebration.
  • Prefer real-world evidence from our stack over synthetic/unit-only validation.
  • If a feature is upstream-valid but irrelevant to our stack, say so explicitly and mark it N/A with justification.
Reference: Timmy_Foundation/hermes-agent#1047