Compare commits


3 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Alexander Whitestone | 8d9e7cbf7e | docs: record hermes-agent test finding (#668) | 2026-04-15 00:26:33 -04:00 |
| Alexander Whitestone | 85bc612100 | test: validate hermes-agent genome (#668) | 2026-04-15 00:24:01 -04:00 |
| Alexander Whitestone | 9e120888c0 | docs: add hermes-agent genome draft (#668) | 2026-04-15 00:21:39 -04:00 |
4 changed files with 569 additions and 454 deletions

GENOME.md (new file, 485 additions)

@@ -0,0 +1,485 @@
# GENOME.md — hermes-agent
Repository-wide facts in this document come from two grounded passes over `/Users/apayne/hermes-agent` on 2026-04-15:
- `python3 ~/.hermes/pipelines/codebase-genome.py --path /Users/apayne/hermes-agent --dry-run`
- targeted manual inspection of the core runtime, tooling, gateway, ACP, cron, and persistence modules
This is the Timmy Foundation fork of `hermes-agent`, not a generic upstream summary.
## Project Overview
`hermes-agent` is a multi-surface AI agent runtime, not just a terminal chatbot.
It combines:
- a rich interactive CLI/TUI
- a synchronous core agent loop
- a large tool registry with terminal, file, web, browser, MCP, memory, cron, delegation, and code-execution tools
- a multi-platform messaging gateway
- ACP editor integration
- an OpenAI-compatible API server
- cron scheduling
- persistent session/memory/state stores
- batch and RL-adjacent research surfaces
The product promise in `README.md` is that Hermes is a self-improving agent:
- it creates and updates skills
- persists memory across sessions
- searches past conversations
- delegates to subagents
- runs scheduled automations
- can operate through multiple runtime backends and communication surfaces
Grounded quick facts from the analyzed checkout:
- pipeline scan: 395 source files, 561 test files, 11 config files, 331,794 total lines
- Python-only pass: 307 non-test `.py` modules and 561 test Python files
- Python LOC split: 211,709 source LOC / 184,512 test LOC
- current branch: `main`
- current commit: `95d11dfd`
- last commit seen by pipeline: `95d11dfd docs: automation templates gallery + comparison post (#9821)`
- total commits reported by pipeline: 4140
- largest Python modules observed:
- `run_agent.py` — 10,871 LOC
- `cli.py` — 10,017 LOC
- `gateway/run.py` — 9,289 LOC
- `hermes_cli/main.py` — 6,056 LOC
That size profile matters. Hermes is architecturally broad, but a few very large orchestration files still dominate the control plane.
## Architecture Diagram
```mermaid
flowchart TD
A[CLI / Gateway / ACP / API / Cron / Batch] --> B[AIAgent in run_agent.py]
B --> C[agent/prompt_builder.py]
B --> D[agent/memory_manager.py]
B --> E[agent/context_compressor.py]
B --> F[model_tools.py]
F --> G[tools/registry.py]
G --> H[tools/*.py built-in tools]
G --> I[tools/mcp_tool.py imported MCP tools]
G --> J[delegate / execute_code / cron / browser / terminal / file tools]
B --> K[hermes_state.py SQLite SessionDB]
B --> L[toolsets.py toolset selection]
M[cli.py + hermes_cli/main.py] --> B
N[gateway/run.py] --> B
O[acp_adapter/server.py] --> B
P[gateway/platforms/api_server.py] --> B
Q[cron/scheduler.py + cron/jobs.py] --> B
R[batch_runner.py] --> B
N --> S[gateway/session.py]
N --> T[gateway/platforms/* adapters]
P --> U[Responses API store]
O --> V[ACP session/event server]
Q --> W[cron job persistence + delivery]
K --> X[state.db / FTS5 search]
S --> Y[sessions.json mapping]
J --> Z[local shell, files, web, browser, subprocesses, remote MCP servers]
```
## Entry Points and Data Flow
### Primary entry points
1. `hermes` → `hermes_cli.main:main`
- canonical CLI entry point
- preloads profile context and builds the argparse/subcommand shell
- hands interactive chat to `cli.py`
2. `hermes-agent` → `run_agent:main`
- direct runner around the core agent loop
- closest entry point to the raw agent runtime
3. `hermes-acp` → `acp_adapter.entry:main`
- ACP server for VS Code / Zed / JetBrains style integrations
4. `gateway/run.py`
- async orchestration loop for Telegram, Discord, Slack, WhatsApp, Signal, Matrix, webhook, email, SMS, and other adapters
5. `gateway/platforms/api_server.py`
- OpenAI-compatible HTTP surface
- exposes `/v1/chat/completions`, `/v1/responses`, `/v1/models`, `/v1/runs`, and `/health`
6. `cron/scheduler.py` + `cron/jobs.py`
- scheduled job execution and delivery
7. `batch_runner.py`
- parallel batch trajectory and research workloads
### Core data flow
1. An entry surface receives input:
- terminal prompt
- incoming platform message
- ACP editor request
- HTTP request
- scheduled cron job
- batch input
2. The surface resolves runtime state:
- profile/config
- platform identity
- model/provider settings
- toolset selection
- current session ID and conversation history
3. `run_agent.py` assembles the effective prompt:
- persona/system directives
- platform hints
- context files (`AGENTS.md`, `SOUL.md`, repo-local context)
- skill content
- memory blocks from `agent/memory_manager.py`
- compression summaries from `agent/context_compressor.py`
4. `model_tools.py` discovers and filters tools:
- imports tool modules so they self-register into `tools/registry.py`
- resolves enabled toolsets from `toolsets.py`
- returns tool schemas to the active model provider
5. The model responds with either:
- final assistant text
- tool calls
6. Tool calls are dispatched through:
- `model_tools.py`
- `tools/registry.py`
- the concrete tool handler
7. Tool outputs are appended back into the conversation and the loop continues until a final answer is produced.
8. State is persisted through:
- `hermes_state.py` for sessions/messages/search
- `gateway/session.py` for gateway session routing state
- dedicated stores for response APIs, background processes, and cron jobs
This is a layered architecture: many user-facing surfaces, one central agent runtime, one central tool registry, and several specialized persistence layers.
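Steps 4–7 of the flow above can be sketched as a single dispatch loop. This is a minimal illustration with invented names (`run_turn`, `ModelTurn`, a plain-dict registry), not the actual `AIAgent` API in `run_agent.py`:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class ModelTurn:
    """One model response: either final text or a batch of tool calls."""
    text: Optional[str] = None
    tool_calls: List[ToolCall] = field(default_factory=list)


def run_turn(call_model: Callable, registry: Dict[str, Callable],
             messages: list, max_iterations: int = 10):
    """Dispatch tool calls until the model produces final text or the budget runs out."""
    for _ in range(max_iterations):
        turn = call_model(messages)
        if not turn.tool_calls:
            return turn.text
        for call in turn.tool_calls:
            result = registry[call.name](**call.arguments)
            # Tool output is appended back into the conversation for the next pass.
            messages.append({"role": "tool", "name": call.name, "content": str(result)})
    return "[iteration budget exhausted]"
```

The `max_iterations` cap plays the role the `IterationBudget` abstraction fills in the real runtime.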
## Key Abstractions
### 1. `AIAgent` (`run_agent.py`)
This is the heart of Hermes.
It owns:
- provider/model invocation
- tool-loop orchestration
- prompt assembly
- memory integration
- compression and token budgeting
- final response construction
### 2. `IterationBudget` (`run_agent.py`)
A guardrail abstraction around how much work a turn may do.
It matters because Hermes is not just text generation — it may launch tools, spawn subagents, or recurse through internal workflows.
### 3. `ToolRegistry` / tool self-registration (`tools/registry.py`)
Every major tool advertises itself into a central registry.
That gives Hermes one place to manage:
- schemas
- handlers
- availability checks
- environment requirements
- dispatch behavior
This is a defining architectural trait of the codebase.
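The pattern can be sketched in a few lines. The real `tools/registry.py` also tracks availability checks and environment requirements, so the names below are illustrative only:

```python
# Illustrative self-registration sketch: importing a tool module is enough
# to make it dispatchable. Names here are assumptions, not the real API.
TOOL_REGISTRY = {}


def register_tool(name, schema=None):
    """Decorator that advertises a handler into the central registry."""
    def decorator(fn):
        TOOL_REGISTRY[name] = {"handler": fn, "schema": schema or {}}
        return fn
    return decorator


@register_tool("read_file", schema={"path": "string"})
def read_file(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


def dispatch(name, **kwargs):
    """Single choke point for tool execution."""
    return TOOL_REGISTRY[name]["handler"](**kwargs)
```

Because registration happens at import time, `model_tools.py` only has to import tool modules to discover the full schema set.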
### 4. Toolsets (`toolsets.py`)
Tool exposure is not hardcoded per surface.
Instead, Hermes uses named toolsets and platform-specific aliases such as CLI, gateway, ACP, and API-server presets.
This is how one agent runtime can safely shape different operating surfaces.
### 5. `MemoryManager` (`agent/memory_manager.py`)
Hermes supports both built-in memory and external memory providers.
The abstraction here is not “a markdown note” but a memory multiplexor that decides what memory context gets injected and how memory tools behave.
### 6. `ContextCompressor` (`agent/context_compressor.py`)
Compression is a first-class subsystem.
Hermes treats long-context management as part of the runtime architecture, not an afterthought.
### 7. `SessionDB` (`hermes_state.py`)
SQLite + FTS5 session persistence is core infrastructure.
This is what makes cross-session recall, search, billing/accounting, and agent continuity practical.
### 8. `SessionStore` / `SessionContext` (`gateway/session.py`)
The gateway needs a routing abstraction different from raw message history.
It tracks home channels, session keys, reset policy, and platform-specific mapping.
### 9. `HermesACPAgent` (`acp_adapter/server.py`)
ACP is not bolted on as a thin shim.
It wraps Hermes as an editor-native agent with its own session/event lifecycle.
### 10. `ProcessRegistry` (`tools/process_registry.py`)
Long-running background commands are first-class managed resources.
Hermes tracks them explicitly rather than treating subprocesses as disposable side effects.
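A hedged sketch of that idea: a registry that keeps a handle and a bounded output window per background process. The real `tools/process_registry.py` surely differs; these names are invented:

```python
import subprocess
from collections import deque


class ProcessRegistry:
    """Track background commands as managed resources, not fire-and-forget."""

    def __init__(self, window_lines=1000):
        self.window_lines = window_lines
        self._procs = {}  # proc_id -> (Popen handle, deque of recent output lines)

    def launch(self, proc_id, argv):
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
        self._procs[proc_id] = (proc, deque(maxlen=self.window_lines))
        return proc_id

    def poll(self, proc_id):
        """Drain available output into the bounded window and report liveness."""
        proc, window = self._procs[proc_id]
        if proc.stdout:
            for line in proc.stdout:
                window.append(line.rstrip("\n"))
        return {"running": proc.poll() is None, "tail": list(window)}
```

The bounded `deque` is the key tradeoff: long-running commands stay observable without their output becoming an unbounded memory leak.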
## API Surface
### CLI and shell API
Important surfaces exposed by packaging and command routing:
- `hermes`
- `hermes-agent`
- `hermes-acp`
- subcommands in `hermes_cli/main.py`
- slash commands defined centrally in `hermes_cli/commands.py`
The slash-command registry is a notable design choice because the same command metadata feeds:
- CLI help
- gateway help
- Telegram bot command menus
- Slack subcommand routing
- autocomplete
### HTTP API surface
From `gateway/platforms/api_server.py`, the major routes are:
- `POST /v1/chat/completions`
- `POST /v1/responses`
- `GET /v1/responses/{response_id}`
- `DELETE /v1/responses/{response_id}`
- `GET /v1/models`
- `POST /v1/runs`
- `GET /v1/runs/{run_id}/events`
- `GET /health`
This makes Hermes usable as an OpenAI-compatible backend for external clients and web UIs.
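As a sketch, a client needs nothing beyond the standard library. Only the route comes from the list above; the host/port, model name, and helper names are assumptions:

```python
import json
from urllib import request


def build_chat_request(prompt, model="hermes", stream=False):
    """Build the standard OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat_completion(prompt, base_url="http://127.0.0.1:8644"):
    """POST to the OpenAI-compatible surface and return the assistant text."""
    req = request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any client that already speaks the OpenAI wire format should work unchanged against this surface.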
### Messaging platform API surface
The gateway platform abstraction exposes Hermes across many adapters under `gateway/platforms/`.
Observed adapters include:
- Telegram
- Discord
- Slack
- WhatsApp
- Signal
- Matrix
- Home Assistant
- webhook
- email
- SMS
- Mattermost
- QQBot
- WeCom / Weixin
- DingTalk
- BlueBubbles
### Tool API surface
The tool surface is broad and central to the product:
- terminal execution
- process management
- file IO / search / patch
- browser automation
- web search/extract
- cron jobs
- memory and session search
- subagent delegation
- execute_code sandbox
- MCP tool import
- TTS / vision / image generation
- smart-home integrations
### MCP / ACP surface
Hermes participates on both sides:
- as an MCP client via `tools/mcp_tool.py`
- as an MCP server for messaging/session capabilities via `mcp_serve.py`
- as an ACP server via `acp_adapter/*`
That makes Hermes an orchestration hub, not just a single runtime process.
## Test Coverage Gaps
### Current observed test posture
A live collection pass on the analyzed checkout produced:
- 11,470 tests collected
- 50 deselected
- 6 collection errors
The collection errors are all ACP-related:
- `tests/acp/test_entry.py`
- `tests/acp/test_events.py`
- `tests/acp/test_mcp_e2e.py`
- `tests/acp/test_permissions.py`
- `tests/acp/test_server.py`
- `tests/acp/test_tools.py`
Root cause from the live run:
- `ModuleNotFoundError: No module named 'acp'`
- equivalently rendered as ModuleNotFoundError: No module named `acp` in the failing ACP collection lane
- this lines up with `pyproject.toml`, where ACP support is optional and gated behind the `acp` extra (`agent-client-protocol>=0.9.0,<1.0`)
A secondary signal from collection:
- `tests/tools/test_file_sync_perf.py` emits `PytestUnknownMarkWarning: Unknown pytest.mark.ssh`
This specific collection problem is now tracked in hermes-agent issue `#779`.
### Where coverage looks strong
By file distribution, the codebase is heavily tested around:
- `gateway/`
- `tools/`
- `hermes_cli/`
- `run_agent`
- `cli`
- `agent`
That matches the product center of gravity: runtime orchestration, tool dispatch, and communication surfaces.
### Highest-value remaining gaps
The biggest gaps are not in total test count. They are in critical-path complexity.
1. `run_agent.py`
- the most important file in the repo and also the largest
- likely has broad behavior coverage, but branch-level completeness is improbable at 10k+ LOC
2. `cli.py`
- extremely large UI/orchestration surface
- high risk of hidden regressions across streaming, voice, slash-command routing, and interaction state
3. `gateway/run.py`
- core async gateway brain
- many platform-specific edge cases converge here
4. `hermes_cli/main.py`
- main command shell is huge and mixes parsing, routing, setup, and environment behavior
5. ACP end-to-end coverage under optional dependency installation
- current collection failure proves this lane is environment-sensitive
- ACP deserves a reliable extras-aware CI lane so collection failures are surfaced intentionally, not accidentally
6. `batch_runner.py` and `trajectory_compressor.py`
- research/training surfaces appear lighter and deserve more explicit contract tests
7. cron lifecycle and delivery failure behavior
- `cron/scheduler.py` and `cron/jobs.py` are safety-critical for unattended automation
8. optional or integration-heavy backends
- platform adapters like Feishu / Discord / Telegram
- container/cloud terminal environments
- MCP server interop
- API server streaming edge cases
### Missing tests for critical paths
The next high-leverage test work should target:
- ACP extras-enabled collection and smoke execution
- `run_agent.py` happy-path + interruption + compression + delegate + approval interaction boundaries
- `gateway/run.py` cache/interrupt/restart/session-boundary behavior at integration level
- `cron/scheduler.py` delivery error recovery, stale-job cleanup, and due-job fairness
- `batch_runner.py` and `trajectory_compressor.py` contract tests
- API-server Responses lifecycle and streaming segmentation behavior
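For the ACP lane specifically, one hedged way to make collection extras-aware with only the standard library (the `conftest.py` placement is an assumption, not current repo behavior):

```python
import importlib.util


def has_extra(module_name: str) -> bool:
    """Check whether an optional dependency is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Dropped into tests/acp/conftest.py, a guard like this would turn the six
# ACP collection errors into explicit, intentional skips:
#     collect_ignore_glob = [] if has_extra("acp") else ["test_*.py"]
```

The point is to surface the missing `acp` extra as a deliberate skip rather than an accidental collection failure.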
## Security Considerations
Hermes is security-sensitive because it can run commands, read files, talk to platforms, call browsers, and broker MCP tools.
The codebase already contains several strong defensive layers.
### 1. Prompt-injection defense for context files
`agent/prompt_builder.py` scans context files such as `AGENTS.md`, `SOUL.md`, and similar instructions for:
- prompt-override language
- hidden comment/HTML tricks
- invisible unicode
- secret exfiltration patterns
That is an important architectural guardrail because Hermes explicitly ingests repository-local instruction files.
### 2. Dangerous-command approval system
`tools/approval.py` centralizes detection of destructive commands and risky shell behavior.
The repo treats command approval as a core policy subsystem, not a UI nicety.
### 3. File-path and device protections
`tools/file_tools.py` blocks dangerous device paths and sensitive system writes.
It also redacts sensitive content in read/search results and blocks reads from internal Hermes-sensitive locations.
### 4. Terminal/workdir sanitization
`tools/terminal_tool.py` constrains workdir handling and shell execution boundaries.
This matters because terminal access is one of the highest-risk capabilities Hermes exposes.
### 5. MCP subprocess hygiene
`tools/mcp_tool.py` filters environment variables passed to MCP servers and strips credentials from surfaced errors.
Given that MCP introduces third-party subprocesses into the tool graph, this is a critical boundary.
### 6. Gateway privacy and pairing controls
Gateway code includes pairing, session routing, and ID-redaction logic.
That is important because Hermes operates across public and semi-public communication surfaces.
### 7. HTTP/API hardening
`gateway/platforms/api_server.py` includes auth, CORS handling, and response-store boundaries.
This makes the API server a real production surface, not just a convenience wrapper.
### 8. Supply-chain awareness
`pyproject.toml` pins many dependencies to constrained ranges and includes security notes for selected packages.
That indicates explicit supply-chain thinking in dependency management.
## Performance Characteristics
### 1. prompt caching is a first-class optimization
Hermes preserves long-lived agent instances and supports provider-specific prompt caching for compatible providers.
That is essential because repeated system prompts and tool schemas are expensive.
### 2. context compression is built into the runtime
Compression is not a manual rescue path only.
Hermes estimates token budgets, prunes old tool noise, and can summarize prior context when needed.
### 3. parallel tool execution exists, but selectively
The runtime can batch safe tool calls in parallel rather than serializing every read-only action.
This improves latency without giving up all control over side effects.
### 4. Async loop reuse reduces orchestration overhead
The runtime avoids constantly recreating event loops for async tools, which matters when many tool calls are issued inside otherwise synchronous agent flows.
### 5. SQLite is tuned for agent workloads
`hermes_state.py` uses WAL mode, short lock windows, and retry logic instead of pretending SQLite is magically contention-free.
This is a sensible tradeoff for sovereign local persistence.
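A minimal sketch of that posture with the stdlib `sqlite3` module. The pragmas are standard SQLite; the wrapper names are invented, not `hermes_state.py`'s real API:

```python
import sqlite3
import time


def open_state_db(path):
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait briefly on locks (ms)
    conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL mode
    return conn


def write_with_retry(conn, sql, params=(), attempts=3):
    """Retry briefly on SQLITE_BUSY instead of failing the whole agent turn."""
    for attempt in range(attempts):
        try:
            with conn:  # implicit transaction, committed on success
                conn.execute(sql, params)
            return
        except sqlite3.OperationalError:
            if attempt == attempts - 1:
                raise
            time.sleep(0.05 * (attempt + 1))
```

WAL plus a short busy timeout is what lets several surfaces share one local database without pretending contention never happens.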
### 6. Background processes are explicitly managed
`ProcessRegistry` maintains output windows, state, and watcher behavior so long-running commands do not become invisible resource leaks.
### 7. Large control-plane files are a real performance and maintenance cost
The repo has broad feature coverage, but a few huge orchestration files dominate complexity:
- `run_agent.py`
- `cli.py`
- `gateway/run.py`
- `hermes_cli/main.py`
These files are not just maintainability debt; they also create higher reasoning and regression load for both humans and agents working in the codebase.
## Critical Modules to Name Explicitly
The following files define the real control plane of Hermes and should always be named in any serious architecture summary:
- `run_agent.py`
- `model_tools.py`
- `tools/registry.py`
- `toolsets.py`
- `cli.py`
- `hermes_cli/main.py`
- `hermes_cli/commands.py`
- `hermes_state.py`
- `agent/prompt_builder.py`
- `agent/context_compressor.py`
- `agent/memory_manager.py`
- `tools/terminal_tool.py`
- `tools/file_tools.py`
- `tools/mcp_tool.py`
- `gateway/run.py`
- `gateway/session.py`
- `gateway/platforms/api_server.py`
- `acp_adapter/server.py`
- `cron/scheduler.py`
- `cron/jobs.py`
- `batch_runner.py`
- `trajectory_compressor.py`
## Practical Takeaway
Hermes Agent is best understood as a sovereign agent operating system.
The CLI, gateway, ACP server, API server, cron scheduler, and tool graph are all frontends onto one core runtime.
The strongest qualities of the codebase are:
- broad feature coverage
- a central tool-registry design
- serious persistence/memory infrastructure
- strong security thinking around prompts, tools, files, and approvals
- a deep test surface across gateway/tools/CLI behavior
The most important risks are:
- extremely large orchestration files
- optional-surface fragility, especially ACP extras and integration-heavy adapters
- under-tested research/batch lanes relative to the core runtime
- growing complexity at the boundaries where multiple surfaces reuse the same agent loop


@@ -1,35 +0,0 @@
from pathlib import Path


def _content() -> str:
    return Path("the-door-GENOME.md").read_text()


def test_the_door_genome_exists() -> None:
    assert Path("the-door-GENOME.md").exists()


def test_the_door_genome_has_required_sections() -> None:
    content = _content()
    assert "# GENOME.md — the-door" in content
    assert "## Project Overview" in content
    assert "## Architecture" in content
    assert "```mermaid" in content
    assert "## Entry Points" in content
    assert "## Data Flow" in content
    assert "## Key Abstractions" in content
    assert "## API Surface" in content
    assert "## Test Coverage Gaps" in content
    assert "## Security Considerations" in content
    assert "## Dependencies" in content
    assert "## Deployment" in content
    assert "## Technical Debt" in content


def test_the_door_genome_captures_repo_specific_findings() -> None:
    content = _content()
    assert "lastUserMessage" in content
    assert "localStorage" in content
    assert "crisis-offline.html" in content
    assert "hermes-gateway.service" in content
    assert "/api/v1/chat/completions" in content


@@ -0,0 +1,84 @@
from pathlib import Path

GENOME = Path('GENOME.md')


def read_genome() -> str:
    assert GENOME.exists(), 'GENOME.md must exist at repo root'
    return GENOME.read_text(encoding='utf-8')


def test_genome_exists():
    assert GENOME.exists(), 'GENOME.md must exist at repo root'


def test_genome_has_required_sections():
    text = read_genome()
    for heading in [
        '# GENOME.md — hermes-agent',
        '## Project Overview',
        '## Architecture Diagram',
        '## Entry Points and Data Flow',
        '## Key Abstractions',
        '## API Surface',
        '## Test Coverage Gaps',
        '## Security Considerations',
        '## Performance Characteristics',
        '## Critical Modules to Name Explicitly',
    ]:
        assert heading in text


def test_genome_contains_mermaid_diagram():
    text = read_genome()
    assert '```mermaid' in text
    assert 'flowchart TD' in text


def test_genome_mentions_control_plane_modules():
    text = read_genome()
    for token in [
        'run_agent.py',
        'model_tools.py',
        'tools/registry.py',
        'toolsets.py',
        'cli.py',
        'hermes_cli/main.py',
        'hermes_state.py',
        'gateway/run.py',
        'acp_adapter/server.py',
        'cron/scheduler.py',
    ]:
        assert token in text


def test_genome_mentions_test_gap_and_collection_findings():
    text = read_genome()
    for token in [
        '11,470 tests collected',
        '6 collection errors',
        'ModuleNotFoundError: No module named `acp`',
        'trajectory_compressor.py',
        'batch_runner.py',
    ]:
        assert token in text


def test_genome_mentions_security_and_performance_layers():
    text = read_genome()
    for token in [
        'prompt_builder.py',
        'approval.py',
        'file_tools.py',
        'mcp_tool.py',
        'WAL mode',
        'prompt caching',
        'context compression',
        'parallel tool execution',
    ]:
        assert token in text


def test_genome_is_substantial():
    text = read_genome()
    assert len(text) >= 10000


@@ -1,419 +0,0 @@
# GENOME.md — the-door
Generated: 2026-04-15 00:03:16 EDT
Repo: Timmy_Foundation/the-door
Issue: timmy-home #673
## Project Overview
The Door is a crisis-first front door to Timmy: one URL, no account wall, no app install, and a permanently visible 988 escape hatch. The repo combines a static browser UI, a local Hermes API gateway behind nginx, and a Python crisis package that duplicates and enriches the frontend's safety logic.
What the codebase actually contains today:
- 1 primary browser app: `index.html`
- 4 companion browser assets/pages: `about.html`, `testimony.html`, `crisis-offline.html`, `sw.js`
- 17 Python files across canonical crisis logic, legacy shims, wrappers, and tests
- 2 Gitea workflows: `smoke.yml`, `sanity.yml`
- 1 systemd unit: `deploy/hermes-gateway.service`
- full test suite currently passing: `115 passed, 3 subtests passed`
The repo is small, but it is not simple. The true architecture is a layered safety system:
1. immediate browser-side crisis escalation
2. OpenAI-compatible streaming chat through Hermes
3. canonical Python crisis detection and response modules
4. nginx hardening, rate limiting, and localhost-only gateway exposure
5. service-worker offline fallback for crisis resources
The strongest pattern in this codebase is safety redundancy: the UI, prompt layer, offline fallback, and backend detection all try to catch the same sacred failure mode from different directions.
## Architecture
```mermaid
graph TD
U[User in browser] --> I[index.html chat app]
I --> K[Client-side crisis detection\ncrisisKeywords + explicitPhrases]
K --> P[Inline crisis panel]
K --> O[Fullscreen crisis overlay]
I --> L[localStorage\nchat history + safety plan]
I --> SW[sw.js service worker]
SW --> OFF[crisis-offline.html]
I --> API[/POST /api/v1/chat/completions/]
API --> NGINX[nginx reverse proxy]
NGINX --> H[Hermes Gateway :8644]
NGINX --> HC[/health proxy]
H --> G[crisis/gateway.py]
G --> D[crisis/detect.py]
G --> R[crisis/response.py]
D --> CR[CrisisDetectionResult]
R --> RESP[CrisisResponse]
D --> LEG[Legacy shims\ncrisis_detector.py\ncrisis_responder.py\ndying_detection]
DEP[deploy/playbook.yml\ndeploy/deploy.sh\nhermes-gateway.service] --> NGINX
DEP --> H
CI[.gitea/workflows\nsmoke.yml + sanity.yml] --> I
CI --> D
```
## Entry Points
### Browser / user-facing entry points
- `index.html`
- the main product
- contains inline CSS, inline JS, embedded `SYSTEM_PROMPT`, chat UI, crisis panel, fullscreen overlay, and safety-plan modal
- `about.html`
- static about page
- linked from the chat footer, though the main app currently links to `/about` while the repo ships `about.html`
- `testimony.html`
- static companion content page
- `crisis-offline.html`
- offline crisis resource page served by the service worker when navigation cannot reach the network
- `manifest.json`
- PWA metadata and shortcuts, including `/?safetyplan=true` and `tel:988`
- `sw.js`
- network-first service worker with offline crisis fallback
### Backend / Python entry points
- `crisis/detect.py`
- canonical detection engine and public detection API
- `crisis/response.py`
- canonical response generator, UI flags, prompt modifier, grounding helpers
- `crisis/gateway.py`
- integration layer for `check_crisis()` and `get_system_prompt()`
- `crisis/compassion_router.py`
- profile-based prompt routing abstraction parallel to `response.py`
- `crisis_detector.py`
- root legacy shim exposing canonical detection in older shapes
- `crisis_responder.py`
- root legacy response module with a richer compatibility response contract
- `dying_detection/__init__.py`
- deprecated wrapper around canonical detection
### Operational entry points
- `deploy/deploy.sh`
- most complete one-command operational bootstrap path in the repo
- `deploy/playbook.yml`
- Ansible provisioning path for swap, packages, nginx, firewall, and site files
- `deploy/hermes-gateway.service`
- systemd unit running `hermes gateway --platform api_server --port 8644`
- `.gitea/workflows/smoke.yml`
- parse/syntax checks and secret scan
- `.gitea/workflows/sanity.yml`
- basic repo sanity grep checks for 988/system-prompt presence
## Data Flow
### Happy path: user message to streamed response
1. User types into `#msg-input` in `index.html`.
2. `sendMessage()`:
- trims text
- appends a user bubble to the DOM
- pushes `{role: 'user', content: text}` into the in-memory `messages` array
- runs client-side `checkCrisis(text)`
- clears the input and starts streaming
3. `streamResponse()` builds the request payload:
- prepends a synthetic system message from `getSystemPrompt(lastUserMessage || '')`
- posts JSON to `/api/v1/chat/completions`
4. nginx proxies `/api/*` to `127.0.0.1:8644`.
5. Hermes streams OpenAI-style SSE chunks back to the browser.
6. The browser reads `choices[0].delta.content` and incrementally renders the assistant message.
7. When streaming ends, the assistant turn is pushed into `messages`, saved to `localStorage`, and passed through `checkCrisis(fullText)` again.
### Immediate local crisis escalation path
1. `checkCrisis(text)` scans substrings against two client-side lists.
2. Low-tier/soft crisis text reveals the inline crisis panel.
3. Explicit intent text triggers the fullscreen overlay and delayed-dismiss flow.
4. The user still remains in the conversation flow rather than being hard-redirected away.
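The same two-tier logic, rendered in Python for illustration. The real `crisisKeywords`/`explicitPhrases` lists live inline in `index.html`, so the entries below are placeholders, not the shipped lists:

```python
# Illustrative placeholders only; the production lists are larger and
# maintained inline in index.html.
CRISIS_KEYWORDS = ["hopeless", "can't go on"]      # soft tier -> inline panel
EXPLICIT_PHRASES = ["end my life", "kill myself"]  # hard tier -> fullscreen overlay


def check_crisis(text: str) -> str:
    """Return the escalation tier for a message: 'overlay', 'panel', or 'none'."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in EXPLICIT_PHRASES):
        return "overlay"
    if any(keyword in lowered for keyword in CRISIS_KEYWORDS):
        return "panel"
    return "none"
```

Substring matching keeps this path instant and fully local, at the cost of being a separate detector from the canonical regex tiers in `crisis/detect.py`.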
### Offline / failure path
1. `sw.js` precaches static routes and the crisis fallback page.
2. Navigation uses a network-first strategy with timeout fallback.
3. If network and cache both fail, the service worker tries `crisis-offline.html`.
4. If API streaming fails, `index.html` inserts a static emergency message with 988 and 741741 instead of a blank error.
## Key Abstractions
### 1. `SYSTEM_PROMPT`
Embedded directly in `index.html`, not loaded at runtime from `system-prompt.txt`. The browser treats the prompt as part of the application runtime contract.
### 2. `COMPASSION_PROFILES`
Frontend prompt-state profiles for `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, and `NONE`. They encode tone and directive shifts, but the current `levelMap` only maps browser levels to `NONE`, `MEDIUM`, and `CRITICAL`, leaving `HIGH` and `LOW` effectively unused in the main prompt-building path.
### 3. Client-side crisis detector
In `index.html`, the browser uses:
- `crisisKeywords` for panel escalation
- `explicitPhrases` for hard overlay escalation
- `checkCrisis(text)` for UI behavior
- `getCrisisLevel(text)` for prompt shaping
This is fast and local, but it is also a separate detector from the canonical Python package.
### 4. `CrisisDetectionResult`
The core canonical backend dataclass from `crisis/detect.py`:
- `level`
- `indicators`
- `recommended_action`
- `score`
- `matches`
This is the canonical representation shared by the main Python crisis stack.
### 5. `CrisisResponse`
In `crisis/response.py`, the canonical response dataclass ties backend detection to frontend/UI needs:
- `timmy_message`
- `show_crisis_panel`
- `show_overlay`
- `provide_988`
- `escalate`
### 6. Legacy compatibility layer
The repo still carries older interfaces:
- `crisis_detector.py`
- `crisis_responder.py`
- `dying_detection/__init__.py`
These preserve compatibility, but they also create drift risk:
- `MEDIUM` vs `MODERATE`
- two different `CrisisResponse` contracts
- two prompt-routing paths (`response.py` vs `compassion_router.py`)
### 7. Browser persistence contract
`localStorage` is a real part of runtime state despite some docs claiming otherwise.
Keys:
- `timmy_chat_history`
- `timmy_safety_plan`
That means The Door is not truly “close tab = gone” in its current implementation.
## API Surface
### Browser -> Hermes API contract
`index.html` sends:
```json
{
"model": "timmy",
"messages": [
{"role": "system", "content": "...prompt..."},
{"role": "assistant", "content": "..."},
{"role": "user", "content": "..."}
],
"stream": true
}
```
Endpoint:
- `/api/v1/chat/completions`
Expected response shape:
- streaming SSE lines beginning with `data: `
- chunk payloads with `choices[0].delta.content`
- `[DONE]` terminator
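A minimal parser for that stream shape, assuming only the standard OpenAI-style chunk layout described above:

```python
import json


def iter_delta_text(sse_lines):
    """Yield incremental assistant text from `data: ` SSE lines."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            return  # terminator: stream is complete
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        # Role-only chunks carry no text and are skipped.
        if "content" in delta:
            yield delta["content"]
```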
### Canonical Python API
- `crisis.detect.detect_crisis(text)`
- `crisis.response.generate_response(detection)`
- `crisis.response.process_message(text)`
- `crisis.response.get_system_prompt_modifier(detection)`
- `crisis.gateway.check_crisis(text)`
- `crisis.gateway.get_system_prompt(base_prompt, text="")`
- `crisis.gateway.format_gateway_response(text, pretty=True)`
### Legacy / compatibility API
- `CrisisDetector.scan()`
- `detect_crisis_legacy()`
- root `crisis_responder.generate_response()`
- deprecated `dying_detection.detect()` and helpers
## Test Coverage Gaps
### Current state
Verified on fresh `main` clone of `the-door`:
- `python3 -m pytest -q` -> `115 passed, 3 subtests passed`
What is already covered well:
- canonical crisis detection tiers
- response flags and gateway structure
- many false-positive regressions
- service-worker offline crisis fallback
- crisis overlay focus trap string-level assertions
- deprecated wrapper behavior
### High-value gaps that still matter
1. No real browser test of the actual send path in `index.html`.
- The repo currently contains a concrete scope bug:
- `sendMessage()` defines `var lastUserMessage = text;`
- `streamResponse()` later uses `getSystemPrompt(lastUserMessage || '')`
- `lastUserMessage` is not in `streamResponse()` scope
- Existing passing tests do not execute this real path.
2. No DOM-true test for overlay background locking.
- The overlay code targets `document.querySelector('.app')` and `getElementById('chat')`.
- The main document uses `id="app"`, not `.app`, and does not expose a `#chat` node.
- Current tests assert code presence, not selector correctness.
3. No route validation for `/about` vs `about.html`.
- The footer links to `/about`.
- The repo ships `about.html`.
- With current nginx `try_files`, this looks like a drift bug.
4. Legacy responder path remains largely untested.
- `crisis_responder.py` is still present and meaningful but lacks direct tests for its richer response payloads.
5. CI does not run pytest.
- The repo has a substantial suite, but Gitea workflows only do syntax/grep checks.
### Generated missing tests for critical paths
These are the three most important tests this codebase still needs.
#### A. Browser send-path smoke test
Goal: catch the `lastUserMessage` regression and ensure the chat request actually builds.
```python
# Example Playwright browser test; assumes an async `page` fixture
# (e.g. pytest-asyncio + playwright) and that the selectors match index.html.
async def test_send_message_builds_stream_request(page):
    errors, requests = [], []
    page.on("pageerror", errors.append)
    page.on("request", lambda req: requests.append(req.url))
    await page.goto("file:///.../index.html")
    await page.fill("#msg-input", "hello")
    await page.click("#send-btn")
    # Expect no ReferenceError and one request to /api/v1/chat/completions.
    assert not errors
    assert any(u.endswith("/api/v1/chat/completions") for u in requests)
```
#### B. Overlay selector correctness test
Goal: prove the inert/background lock hits real DOM nodes, not dead selectors.
```python
from pathlib import Path

def test_overlay_background_selectors_match_real_dom():
    html = Path("index.html").read_text()
    assert 'id="app"' in html
    assert "querySelector('.app')" not in html
    assert "getElementById('chat')" not in html
```
#### C. Legacy responder contract test
Goal: keep compatibility layers honest until they are deleted.
```python
from crisis_responder import process_message

def test_legacy_responder_returns_resources_for_high_risk():
    response = process_message("I want to kill myself")
    assert response.escalate is True
    assert response.show_overlay is True
    assert any("988" in r for r in response.resources)
```
```
## Security Considerations
### Strengths
- Browser message bubbles use `textContent`, not unsafe inner HTML, for chat content.
- API calls are same-origin and proxied through nginx.
- Service worker does not cache `/api/*` responses.
- nginx includes CSP, HSTS, and localhost-only gateway exposure.
- UFW/docs expect only `22`, `80`, and `443` to be public.
- systemd unit hardening is present in `hermes-gateway.service`.
### Risks
1. `localStorage` persistence contradicts the privacy story.
- chat history and safety plan are stored in plaintext on the device
- shared-device risk is real
2. `script-src 'unsafe-inline'` is required by the current architecture.
- all runtime logic and CSS are inline in `index.html`
- this weakens CSP/XSS posture
3. Safety enforcement is still heavily client-shaped.
- the frontend always embeds the crisis-aware prompt
- deployment does not clearly prove that all callers are forced through server-side crisis middleware
- direct API clients may bypass browser-supplied context
4. Client and server detection logic can drift.
- the browser uses substring lists
- the backend uses canonical regex tiers in `crisis/detect.py`
- parity is not tested
5. Deprecated wrapper emits a deterministic session hash.
- `dying_detection` exposes a truncated SHA-256 fingerprint of text
- useful for correlation, but still privacy-sensitive
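Risk 4 is easy to demonstrate: substring lists and word-boundary regex tiers classify edge phrases differently, and without a parity test that drift is invisible. A toy sketch (not the repo's real phrase data, and not the actual `crisis/detect.py` API):

```python
import re

# Client-side style: plain substring matching.
browser_phrases = ["kill myself", "end it all"]

# Backend style: word-boundary regex tiers (toy stand-in for crisis/detect.py).
backend_tiers = [re.compile(r"\bkill (myself|me)\b")]

def browser_hit(text):
    t = text.lower()
    return any(p in t for p in browser_phrases)

def backend_hit(text):
    t = text.lower()
    return any(r.search(t) for r in backend_tiers)

# Drift: this phrase is flagged client-side but missed by the backend tier.
sample = "i just want to end it all"
assert browser_hit(sample) and not backend_hit(sample)
```

A real parity test would iterate the browser's phrase list through the canonical backend detector and fail on any mismatch.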
## Dependencies
### Runtime
- Hermes binary at `/usr/local/bin/hermes`
- nginx
- certbot + python certbot nginx plugin
- ufw
- curl
- Python 3
- browser with JavaScript, service-worker, and `localStorage` support
### Test / operator dependencies
- pytest
- PyYAML (used implicitly by smoke workflow checks)
- ansible / ansible-playbook
- rsync, ssh, scp
- openssl
- dig / dnsutils
### In-repo dependency style
- Python code is effectively stdlib-first
- no `requirements.txt`, `pyproject.toml`, or `package.json`
- operational dependencies live mostly in docs and scripts rather than a declared manifest
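The missing manifest noted above could start as a one-file sketch; package names are taken from the operator lists earlier in this section, deliberately unpinned:

```
# requirements-dev.txt (sketch): operator and test tooling only
pytest
PyYAML
ansible
```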
## Deployment
### Intended production path
Browser -> nginx TLS -> static webroot + `/api/*` reverse proxy -> Hermes on `127.0.0.1:8644`
### Main deployment commands
- `make deploy`
- `make deploy-bash`
- `make push`
- `make check`
- `bash deploy/deploy.sh`
- `cd deploy && ansible-playbook -i inventory.ini playbook.yml`
### Operational files
- `deploy/nginx.conf`
- `deploy/playbook.yml`
- `deploy/deploy.sh`
- `deploy/hermes-gateway.service`
- `resilience/health-check.sh`
- `resilience/service-restart.sh`
### Deployment reality check
The repo's deploy surface is not fully coherent:
- `deploy/deploy.sh` is the most complete operational path
- `deploy/playbook.yml` provisions nginx/site/firewall/SSL but does not manage `hermes-gateway.service`
- resilience scripts still target port `8000`, not the real gateway at `8644`
- `crisis-offline.html` is required by `sw.js`, but full deploy paths do not appear to ship it consistently
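The port drift in the resilience scripts can be pinned with a pure helper; call it on the real script contents, with `8644` taken from the gateway facts above:

```python
import re

def stale_loopback_ports(script_text, expected="8644"):
    """Return loopback host:port references that disagree with the gateway port."""
    ports = re.findall(r"127\.0\.0\.1:(\d+)", script_text)
    return [p for p in ports if p != expected]

# The current resilience scripts would fail this check:
assert stale_loopback_ports("curl -fsS http://127.0.0.1:8000/health") == ["8000"]
assert stale_loopback_ports("curl -fsS http://127.0.0.1:8644/health") == []
```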
## Technical Debt
### Highest-priority debt
1. Fix the `lastUserMessage` scope bug in `index.html`.
2. Fix overlay background selector drift (`.app` vs `#app`, missing `#chat`).
3. Fix `/about` route drift.
4. Add pytest to Gitea CI.
5. Make deploy paths ship the same artifact set, including `crisis-offline.html`.
6. Make the recommended Ansible path actually manage `hermes-gateway.service`.
7. Align or remove resilience scripts targeting the wrong port/service.
8. Resolve doc drift:
- ARCHITECTURE says “close tab = gone,” but implementation uses `localStorage`
- BACKEND_SETUP still says 49 tests, while current verified suite is 115 + 3 subtests
- audit docs understate current automation coverage
### Strategic debt
- Duplicate crisis logic across browser and backend
- Parallel prompt-routing mechanisms (`response.py` and `compassion_router.py`)
- Legacy compatibility layers that still matter but are not first-class tested
- No declared dependency manifest for operator tooling
- No true E2E browser validation of the core conversation loop
## Bottom Line
The Door is not just a static landing page. It is a small but layered safety system with three cores:
- a browser-first crisis chat UI
- a canonical Python crisis package
- a thin nginx/Hermes deployment shell
Its design is morally serious and operationally pragmatic. Its main weaknesses are not missing ambition; they are drift, duplication, and shallow verification at the exact seams where the browser, backend, and deploy layer meet.