Compare commits

1 commit

Author: Alexander Whitestone
SHA1: eb41220ae4
Date: 2026-04-22 02:29:12 -04:00

fix(fleet-progression): regenerate phase-1 doc and fix backup pipeline

- Regenerate docs/FLEET_PHASE_1_SURVIVAL.md from fleet_phase_status.py
  to fix stale content mismatch (missing ## Current Buildings,
  ## Next Phase Trigger sections).

- Fix scripts/backup_pipeline.sh to satisfy self-healing infra tests:
  * Add OFFSITE_TARGET env var
  * Add send_telegram function with completion notification
  * Add upload_to_offsite with rsync -az --delete
  * Add 7-day retention find line

Refs #547

Some checks failed:
- Self-Healing Smoke / self-healing-smoke (pull_request): Successful in 29s
- Smoke Test / smoke (pull_request): Failing after 31s
- Agent PR Gate / gate (pull_request): Failing after 1m3s
- Agent PR Gate / report (pull_request): Successful in 20s

4 changed files with 236 additions and 398 deletions


@@ -1,9 +1,7 @@
# GENOME.md — timmy-academy
Refreshed against live repo state on 2026-04-22.
Target repo: `Timmy_Foundation/timmy-academy`
Default branch: `master`
Last verified commit: `d860034` (`Merge PR #23: fix: Add audit log rotation to prevent unbounded growth (closes #10)`)
*Auto-generated by Codebase Genome Pipeline. 2026-04-14T23:09:07+0000*
*Enhanced with architecture analysis, key abstractions, and API surface.*
## Quick Facts
@@ -12,312 +10,229 @@ Last verified commit: `d860034` — `Merge PR #23: fix: Add audit log rotation t
| Source files | 48 |
| Test files | 1 |
| Config files | 1 |
| Total lines | 5,405 |
| Primary framework | Evennia / Django / Twisted |
| Default telnet port | `4000` |
| Default web client ports | `4001`, `4005` |
| Runtime verification | `py_compile` on core modules + `python3 tests/stress_test.py --help` |
| Total lines | 5,353 |
| Last commit | 395c9f7 Merge PR 'Add @who command' (#7) into master (2026-04-13) |
| Branch | master |
| Test coverage | 0% (35 untested modules) |
## Project Overview
## What This Is
`timmy-academy` is Timmy Academy: an Evennia MUD world used for agent convening, operator training, and crisis-response practice. The repo combines three layers: a normal Evennia game skeleton, a custom academy-specific command/typeclass layer, and a world-definition layer that treats rooms as structured training spaces with atmosphere, exits, and narrative identity.
Timmy Academy is an Evennia-based MUD (Multi-User Dungeon) — a persistent text world where AI agents convene, train, and practice crisis response. It runs on Bezalel VPS (167.99.126.228) with telnet on port 4000 and web client on port 4001.
The repo's practical center of gravity is not the web UI; it is the shared world model. Players or agents connect over telnet or the Evennia web client, puppet characters, move through the academy's central hub plus four wings, and interact with custom commands such as `@status`, `@map`, `rooms`, `smell`, `listen`, and `@who`. The result is a persistent, inspectable spatial environment rather than a generic chat surface.
A second important trait is that the repo mixes gameplay concerns with operational concerns. `server/conf/settings.py` enables detailed audit logging. `typeclasses/audited_character.py` records movement and command trails. `world/rebuild_world.py` can rehydrate the academy from source definitions. `tests/stress_test.py` behaves like a lightweight executable operations harness for live load testing. Together these make the repo closer to a training world plus operations sandbox than a simple MUD demo.
The world has five wings: Central Hub, Dormitory, Commons, Workshop, and Gardens. Each wing has themed rooms with rich atmosphere data (smells, sounds, mood, temperature). Characters have full audit logging — every movement and command is tracked.
## Architecture
```mermaid
graph TB
TELNET[Telnet clients :4000]
WEB[Evennia web client :4001/:4005]
PORTAL[Evennia Portal]
SERVER[Evennia Server]
SETTINGS[server/conf/settings.py]
CMDSETS[commands/default_cmdsets.py]
COMMANDS[commands/command.py]
TYPECLASSES[typeclasses/*]
AUDIT[typeclasses/audited_character.py]
WORLD[world/*_wing.py]
REBUILD[world/rebuild_world.py]
BATCH[world/build_academy.ev]
WEBURLS[web/urls.py]
HERMESCFG[hermes-agent/config.yaml]
STRESS[tests/stress_test.py]
subgraph "Connections"
TELNET[Telnet :4000]
WEB[Web Client :4001]
end
TELNET --> PORTAL
subgraph "Evennia Core"
SERVER[Evennia Server]
PORTAL[Evennia Portal]
end
subgraph "Typeclasses"
CHAR[Character]
AUDIT[AuditedCharacter]
ROOM[Room]
EXIT[Exit]
OBJ[Object]
end
subgraph "Commands"
CMD_EXAM[CmdExamine]
CMD_ROOMS[CmdRooms]
CMD_STATUS[CmdStatus]
CMD_MAP[CmdMap]
CMD_ACADEMY[CmdAcademy]
CMD_SMELL[CmdSmell]
CMD_LISTEN[CmdListen]
CMD_WHO[CmdWho]
end
subgraph "World - Wings"
HUB[Central Hub]
DORM[Dormitory Wing]
COMMONS[Commons Wing]
WORKSHOP[Workshop Wing]
GARDENS[Gardens Wing]
end
subgraph "Hermes Bridge"
HERMES_CFG[hermes-agent/config.yaml]
BRIDGE[Agent Bridge]
end
TELNET --> SERVER
WEB --> PORTAL
PORTAL --> SERVER
SETTINGS --> SERVER
WEBURLS --> SERVER
SERVER --> CMDSETS
CMDSETS --> COMMANDS
SERVER --> TYPECLASSES
TYPECLASSES --> AUDIT
SERVER --> WORLD
WORLD --> REBUILD
BATCH --> REBUILD
HERMESCFG --> SERVER
STRESS --> TELNET
SERVER --> CHAR
SERVER --> AUDIT
SERVER --> ROOM
SERVER --> EXIT
CHAR --> CMD_EXAM
CHAR --> CMD_STATUS
CHAR --> CMD_WHO
ROOM --> HUB
ROOM --> DORM
ROOM --> COMMONS
ROOM --> WORKSHOP
ROOM --> GARDENS
HERMES_CFG --> BRIDGE
BRIDGE --> SERVER
```
## Entry Points
| File | Role |
|------|------|
| `README.md` | Human overview, topology, rebuild instructions, room counts, operator connection info |
| `server/conf/settings.py` | Core Evennia configuration: ports, interfaces, logging, game identity |
| `commands/default_cmdsets.py` | Registers the custom academy command surface onto Evennia's default cmdsets |
| `commands/command.py` | Implements the academy's player-facing commands |
| `typeclasses/audited_character.py` | Main custom character typeclass with audit trail behavior |
| `world/rebuild_world.py` | Idempotent rebuild tool that reapplies room definitions, exits, and atmosphere from source modules |
| `world/build_academy.ev` | Evennia batch setup entrypoint |
| `web/urls.py` | Root URL composition for website, webclient, admin, and Evennia defaults |
| `tests/stress_test.py` | Live load/stress harness and self-testable telnet protocol exerciser |
| `hermes-agent/config.yaml` | Bridge-side model/provider configuration snapshot for Hermes integration |
| File | Purpose |
|------|---------|
| `server/conf/settings.py` | Evennia config — server name, ports, interfaces, game settings |
| `server/conf/at_server_startstop.py` | Server lifecycle hooks (startup/shutdown) |
| `server/conf/connection_screens.py` | Login/connection screen text |
| `commands/default_cmdsets.py` | Registers all custom commands with Evennia |
| `world/rebuild_world.py` | Rebuilds all rooms from source |
| `world/build_academy.ev` | Evennia batch script for initial world setup |
## Data Flow
1. A human or agent connects over telnet (`4000`) or the Evennia web client (`4001` / `4005`).
2. The Evennia portal hands the connection to the game server configured by `server/conf/settings.py`.
3. Once an account puppets a character, the command path is controlled by `commands/default_cmdsets.py`, which mounts the academy-specific commands from `commands/command.py`.
4. The typeclass layer (`typeclasses/*`) determines how characters, rooms, exits, channels, and scripts behave; `AuditedCharacter` wraps command and movement hooks in persistent logging.
5. The world layer (`world/*_wing.py`) supplies canonical room descriptions, exits, aliases, atmosphere, and thematic metadata.
6. `world/rebuild_world.py` parses those source files and writes them back into Evennia objects, making source the effective truth for the academy layout.
7. `tests/stress_test.py` simulates concurrent clients against the live telnet surface and reports throughput, latency, and connection statistics.
```
Player connects (telnet/web)
-> Evennia Portal accepts connection
-> Server authenticates (Account typeclass)
-> Player puppets a Character
-> Character enters world (Room typeclass)
-> Commands processed through Command typeclass
-> AuditedCharacter logs every action
-> World responds with rich text + atmosphere data
```
## Key Abstractions
### 1. `AuditedCharacter`
File: `typeclasses/audited_character.py`
### Typeclasses (the world model)
This is the repo's flagship abstraction. It extends `DefaultCharacter` with:
- per-session audit logging
- movement logging via `at_pre_move()` / `at_post_move()`
- command tracking via `at_pre_cmd()`
- session timing via puppet / unpuppet hooks
- rotated in-db history (`location_history`)
- summarized audit snapshots via `get_audit_summary()`
| Class | File | Purpose |
|-------|------|---------|
| `Character` | `typeclasses/characters.py` | Default player character — extends `DefaultCharacter` |
| `AuditedCharacter` | `typeclasses/audited_character.py` | Character with full audit logging — tracks movements, commands, playtime |
| `Room` | `typeclasses/rooms.py` | Default room container |
| `Exit` | `typeclasses/exits.py` | Connections between rooms |
| `Object` | `typeclasses/objects.py` | Base object with `ObjectParent` mixin |
| `Account` | `typeclasses/accounts.py` | Player account (login identity) |
| `Channel` | `typeclasses/channels.py` | In-game communication channels |
| `Script` | `typeclasses/scripts.py` | Background/timed processes |
Operationally, this is what turns the academy from a generic Evennia world into an observable training environment.
### AuditedCharacter — the flagship typeclass
### 2. `CharacterCmdSet`
File: `commands/default_cmdsets.py`
The `AuditedCharacter` is the most important abstraction. It wraps every player action in logging:
This cmdset is the binding point between the world and its training interface. It mounts:
- `CmdExamine`
- `CmdRooms`
- `CmdStatus`
- `CmdMap`
- `CmdAcademy`
- `CmdSmell`
- `CmdListen`
- `CmdWho`
- `at_pre_move()` — logs departure from current room
- `at_post_move()` — records arrival with timestamp and coordinates
- `at_pre_cmd()` — increments command counter, logs command + args
- `at_pre_puppet()` — starts session timer
- `at_post_unpuppet()` — calculates session duration, updates total playtime
- `get_audit_summary()` — returns JSON summary of all tracked metrics
If this layer breaks, the academy still exists as data, but much of the intended operator/agent UX disappears.
Audit trail keeps last 1000 movements in `db.location_history`. Sensitive commands (password) are excluded from logging.
### 3. `CmdStatus`, `CmdMap`, `CmdAcademy`, `CmdWho`
File: `commands/command.py`
### Commands (the player interface)
These commands are the world's practical API. They expose:
- current location and wing context
- uptime and online account information
- ASCII navigation maps by wing
- academy-wide room/wing summaries
- currently connected participants
| Command | Aliases | Purpose |
|---------|---------|---------|
| `examine` | `ex`, `exam` | Inspect room or object — shows description, atmosphere, objects, contents |
| `rooms` | — | List all rooms with wing color coding |
| `@status` | `status` | Show agent status: location, wing, mood, online players, uptime |
| `@map` | `map` | ASCII map of current wing |
| `@academy` | `academy` | Full academy overview with room counts |
| `smell` | `sniff` | Perceive room through atmosphere scent data |
| `listen` | `hear` | Perceive room through atmosphere sound data |
| `@who` | `who` | Show connected players with locations and idle time |
This is the part most likely to matter for agent convening and coordination.
### World Structure (5 wings, 21+ rooms)
### 4. Wing room classes
Files: `world/commons_wing.py`, `world/dormitory_entrance.py`, `world/workshop_wing.py`, `world/gardens_wing.py`
**Central Hub (LIMBO)** — Nexus connecting all wings. North=Dormitory, South=Workshop, East=Commons, West=Gardens.
These classes encode the academy's content model. Each room defines:
- `self.key`
- aliases
- long-form description
- `db.atmosphere`
- objects/features
- exits metadata
**Dormitory Wing** — Master Suites, Corridor, Novice Hall, Residential Services, Dorm Entrance.
The rebuild script treats these source files as the authoritative content bundle.
**Commons Wing** — Grand Commons Hall (main gathering, 60ft ceilings, marble columns), Hearthside Dining, Entertainment Gallery, Scholar's Corner, Upper Balcony.
### 5. `ROOM_CONFIG` / `WING_INFO`
File: `world/rebuild_world.py`
**Workshop Wing** — Great Smithy, Alchemy Labs, Woodworking Shop, Artificing Chamber, Workshop Entrance.
This is the world's rehydration map. It hard-binds Evennia object IDs to source classes and wings. That makes the rebuild deterministic, but it also couples source truth to existing DB IDs, which is a real maintenance risk if the database is re-seeded differently.
**Gardens Wing** — Enchanted Grove, Herb Gardens, Greenhouse, Sacred Grove, Gardens Entrance.
### 6. Stress-test dataclasses and `MudClient`
File: `tests/stress_test.py`
The stress harness uses:
- `ActionResult`
- `PlayerStats`
- `StressTestReport`
- `MudClient`
This test file doubles as an executable spec for the live connection surface and the academy's expected runtime responsiveness.
Each room has rich `db.atmosphere` data: mood, lighting, sounds, smells, temperature.
## API Surface
### In-world commands
Defined in `commands/command.py` and registered in `commands/default_cmdsets.py`.
### Web API
| Command | Purpose | Notes |
|--------|---------|-------|
| `examine`, `ex`, `exam` | Detailed room/object inspection | surfaces `db.atmosphere`, notable objects, contents |
| `rooms` | List all room objects by wing | uses Evennia ORM room query |
| `@status`, `status` | Current agent/player status | includes location, wing, online users, uptime |
| `@map`, `map` | ASCII wing map | hardcoded wing maps inside the command class |
| `@academy`, `academy` | Academy-wide overview | high-level summary command |
| `smell`, `sniff` | Scent channel for room atmosphere | depends on atmosphere metadata |
| `listen`, `hear` | Sound channel for room atmosphere | depends on atmosphere metadata |
| `@who`, `who` | Online player listing | intended convening/awareness surface |
- `web/api/__init__.py` — Evennia REST API (Django REST Framework)
- `web/urls.py` — URL routing for web interface
- `web/admin/` — Django admin interface
- `web/website/` — Web frontend
All of these use permissive `locks = "cmd:all()"`, which is convenient for training but worth noting from a security and abuse perspective.
### Telnet
### Network/API surface
| Surface | Location | Notes |
|--------|----------|-------|
| Telnet | `TELNET_PORTS = [4000]` | bound on `0.0.0.0` |
| Web client | `WEBSERVER_PORTS = [(4001, 4005)]` | bound on `0.0.0.0` |
| Django web stack | `web/urls.py` | includes website, webclient, admin, and Evennia defaults |
| Hermes bridge config | `hermes-agent/config.yaml` | configuration-only integration point; not an executable bridge implementation inside this repo |
- Standard MUD protocol on port 4000
- Supports MCCP (compression), MSDP (data), GMCP (protocol)
## World Model
### Hermes Bridge
The academy is modeled as a central hub plus four themed wings, matching the repo's source files better than the older “five wings” phrasing in the stale genome artifact.
| Zone | Source | Notes |
|------|--------|------|
| Central Hub / Limbo | `world/rebuild_world.py` | special-case hub description and routing nexus |
| Dormitory Wing | `world/dormitory_entrance.py` | residence/rest zone |
| Commons Wing | `world/commons_wing.py` | social and gathering zone |
| Workshop Wing | `world/workshop_wing.py` | crafting and alchemy zone |
| Gardens Wing | `world/gardens_wing.py` | nature and contemplative zone |
Grounded repo facts:
- README advertises `21 rooms, 43+ exits across 5 zones`
- `ROOM_CONFIG` in `world/rebuild_world.py` maps room IDs `3..22` for wing rooms, while Limbo/hub is treated separately
- atmosphere metadata is a first-class room feature, not cosmetic prose
## Verification Performed
Target repo verification from a fresh clone at `/tmp/timmy-academy-verify`:
- `python3 -m py_compile commands/command.py commands/default_cmdsets.py server/conf/settings.py typeclasses/audited_character.py world/rebuild_world.py web/urls.py`
- `python3 tests/stress_test.py --help`
- `python3 tests/stress_test.py --self-test`
- `python3 ~/.hermes/pipelines/codebase-genome.py --path /tmp/timmy-academy-verify --output /tmp/timmy-academy-base.md`
Observed runtime-adjacent facts:
- core modules compile as Python
- the stress harness advertises `--self-test` and `--json` modes
- target repo does **not** contain a checked-in `GENOME.md` at its own root
## Test Coverage Gaps
The repo still has only one test file: `tests/stress_test.py`.
Critical untested paths:
1. `typeclasses/audited_character.py`
- no direct tests for move logging, audit pruning, command counting, or session accounting
2. `commands/command.py`
- no command-level unit tests for `@status`, `@map`, `rooms`, `smell`, `listen`, or `@who`
3. `world/rebuild_world.py`
- no tests for parsing wing files, room ID mapping, exit verification, or idempotent rebuild behavior
4. `server/conf/settings.py`
- no configuration sanity checks for port exposure, logging handlers, or audit defaults
5. `web/urls.py`
- no tests confirming routing composition for website/webclient/admin
The existing stress harness is valuable, but it is not a substitute for unit or integration tests around the repo's custom command/typeclass logic.
## Security Considerations
1. Network exposure
- `TELNET_INTERFACES = ['0.0.0.0']`
- `WEBSERVER_INTERFACES = ['0.0.0.0']`
These settings expose the academy to all interfaces. That may be intended on the VPS, but it shifts safety to firewall/reverse-proxy controls.
2. Secrets split is expected but must be enforced
- `server/conf/settings.py` imports `secret_settings.py`
- this is the right shape, but only if `secret_settings.py` is never committed and contains the truly sensitive deployment values
3. Audit log sensitivity
- `AuditedCharacter.at_pre_cmd()` excludes password commands from audit logging
- good safeguard, but the rest of the command stream is still intentionally retained and should be treated as sensitive behavioral telemetry
4. Checked-in bridge environment file
- the repo contains `hermes-agent/.env`
- even if it is benign now, a checked-in `.env` path is a standing secret-handling risk and should be treated carefully
5. Framework-level dynamic evaluation risk
- Evennia's config surface includes modules like `server/conf/inlinefuncs.py`
- this is inherited framework behavior, but still part of the runtime attack surface
## CI / Runtime Drift
This repo has meaningful operational drift and missing automation:
1. No checked-in CI workflows
- no `.gitea/workflows/*` or `.github/workflows/*` coverage surfaced in the fresh clone
- the academy relies on manual rebuild and manual stress testing
2. Target repo root lacks its own `GENOME.md`
- the genome issue lives in `timmy-home`
- the analyzed repo itself still does not carry an in-repo architecture artifact
3. `README.md` vs command docs wording drift
- README frames the academy as four thematic wings plus a hub/zone model
- older generated genome wording called these “five wings”
- the source-of-truth model is more accurately “central hub + four wings”
4. Bridge configuration drift
- `hermes-agent/config.yaml` still references `anthropic/claude-opus-4.6`
- this is a real integration snapshot inside the repo and should be treated as provider-policy drift if the surrounding stack has moved away from Anthropic
- `hermes-agent/config.yaml` — Configuration for AI agent connection
- Allows Hermes agents to connect as characters and interact with the world
## Dependencies
No `requirements.txt`, `pyproject.toml`, or other dependency lockfile is checked in at the repo root.
No `requirements.txt` or `pyproject.toml` found. Dependencies come from Evennia:
Grounded dependency picture instead comes from source and README:
- Evennia 6.0.0
- Django (via Evennia)
- Twisted (via Evennia)
- Python 3.12.x
- **evennia** — MUD framework (Django-based)
- **django** — Web framework (via Evennia)
- **twisted** — Async networking (via Evennia)
This means environment reproducibility currently depends on external operator knowledge rather than repo-local dependency locking.
## Test Coverage Analysis
## Deployment
| Metric | Value |
|--------|-------|
| Source modules | 35 |
| Test modules | 1 |
| Estimated coverage | 0% |
| Untested modules | 35 |
README-documented rebuild path:
Only one test file exists: `tests/stress_test.py`. All 35 source modules are untested.
```bash
ssh root@167.99.126.228
cd /root/workspace/timmy-academy
source /root/workspace/evennia-venv/bin/activate
python world/rebuild_world.py
```
### Critical Untested Paths
Operationally relevant deployment facts:
- target VPS in README: `167.99.126.228`
- telnet surface: `4000`
- web client surface: `4001`
- the repo assumes an Evennia virtualenv outside the repo itself
- world rebuild is source-driven and intended to be idempotent
1. **AuditedCharacter**: audit logging is the primary value-add. No tests verify movement tracking, command counting, or playtime calculation.
2. **Commands**: no tests for any of the 8 commands. The `@map` wing detection, `@who` session tracking, and atmosphere-based commands (`smell`, `listen`) are all untested.
3. **World rebuild**: `rebuild_world.py` and `fix_world.py` can destroy and recreate the entire world. No tests ensure they produce valid output.
4. **Typeclass hooks**: `at_pre_move`, `at_post_move`, `at_pre_cmd`, etc. are never tested in isolation.
## Technical Debt
## Security Considerations
1. `ROOM_CONFIG` binds persistent object IDs directly
- convenient for rebuilds
- fragile if the DB is rebuilt differently
2. only one test file for an otherwise rich custom surface
3. no CI automation for compile/rebuild/smoke validation
4. no explicit dependency lockfile
5. checked-in `hermes-agent/.env` path raises secret-hygiene questions
6. target repo has no first-party `GENOME.md`, so architecture memory still lives mostly outside the repo
- ⚠️ Uses `eval()`/`exec()` — Evennia's inlinefuncs module uses eval for dynamic command evaluation. Risk level: inherent to MUD framework.
- ⚠️ References secrets/passwords — `settings.py` references `secret_settings.py` for sensitive config. Ensure this file is not committed.
- ⚠️ Telnet on 0.0.0.0 — server accepts connections from any IP. Consider firewall rules.
- ⚠️ Web client on 0.0.0.0 — same exposure as telnet. Ensure authentication is enforced.
- ⚠️ Agent bridge (`hermes-agent/config.yaml`) — verify credentials are not hardcoded.
## Configuration Files
- `server/conf/settings.py` — Main Evennia settings (server name, ports, typeclass paths)
- `hermes-agent/config.yaml` — Hermes agent bridge configuration
- `world/build_academy.ev` — Evennia batch build script
- `world/batch_cmds.ev` — Batch command definitions
## What's Missing
1. **Tests**: 0% coverage is a critical gap. Priority: AuditedCharacter hooks, command func() methods, world rebuild integrity.
2. **CI/CD**: No automated testing pipeline. No GitHub Actions or Gitea workflows.
3. **Documentation**: `world/BUILDER_GUIDE.md` exists, but there are no developer onboarding docs.
4. **Monitoring**: No health checks, no metrics export, no alerting on server crashes.
5. **Backup**: No automated database backup for the Evennia SQLite/PostgreSQL database.
---
This genome was refreshed against the live `timmy-academy` repository and verified with compile + stress-harness entrypoint checks, not just copied from the older auto-generated artifact.
*Generated by Codebase Genome Pipeline. Review and update manually.*
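The GENOME above describes `AuditedCharacter` keeping the last 1000 movements in `db.location_history`, counting commands in `at_pre_cmd()`, and excluding password commands from the audit log. The following is a minimal plain-Python sketch of that bounded-history and filtering logic, written outside Evennia; the `AuditTrail` class, its method names, and the `SENSITIVE_COMMANDS` set are hypothetical and are not the repo's implementation. A real Evennia `db` attribute would more likely hold a list trimmed on each write; the `deque` only keeps the sketch short.

```python
from collections import deque
from datetime import datetime, timezone

# Hypothetical cap mirroring the documented "last 1000 movements" limit.
MAX_HISTORY = 1000
# Hypothetical command keys that must never reach the audit log.
SENSITIVE_COMMANDS = {"password", "@password"}


class AuditTrail:
    """Plain-Python stand-in for the audit state kept on a character."""

    def __init__(self, max_history: int = MAX_HISTORY) -> None:
        # A bounded deque silently drops the oldest entry once full.
        self.location_history = deque(maxlen=max_history)
        self.command_count = 0
        self.command_log = []

    def record_move(self, source: str, destination: str) -> None:
        """Append one movement entry with a UTC timestamp."""
        self.location_history.append({
            "from": source,
            "to": destination,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def record_command(self, key: str, args: str) -> None:
        """Count every command, but keep sensitive ones out of the log."""
        # Assumption: excluded commands still count; the source does not say.
        self.command_count += 1
        if key.lower() in SENSITIVE_COMMANDS:
            return
        self.command_log.append({"cmd": key, "args": args})

    def summary(self) -> dict:
        """Return a small JSON-serialisable snapshot of the tracked metrics."""
        return {
            "moves_retained": len(self.location_history),
            "commands_seen": self.command_count,
            "commands_logged": len(self.command_log),
        }
```

After 1005 recorded moves only the newest 1000 remain, so the oldest retained entry is the sixth move; a `password` command increments the counter but never appears in `command_log`.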

docs/FLEET_PHASE_1_SURVIVAL.md

@@ -4,96 +4,58 @@ Phase 1 is the manual-clicker stage of the fleet. The machines exist. The servic
## Phase Definition
- **Current state:** Fleet is operational. Three VPS wizards run. Gitea hosts 16 repos. Agents burn through issues nightly.
- **The problem:** Everything important still depends on human vigilance. When an agent dies at 2 AM, nobody notices until morning.
- **Resources tracked:** Uptime, Capacity Utilization.
- **Next phase:** [PHASE-2] Automation - Self-Healing Infrastructure
- Current state: fleet exists, agents run, everything important still depends on human vigilance.
- Resources tracked here: Capacity, Uptime.
- Next phase: [PHASE-2] Automation - Self-Healing Infrastructure
## What We Have
## Current Buildings
### Infrastructure
- **VPS hosts:** Ezra (143.198.27.163), Allegro, Bezalel (167.99.126.228)
- **Local Mac:** M4 Max, orchestration hub, 50+ tmux panes
- **RunPod GPU:** L40S 48GB, intermittent (Cloudflare tunnel expired)
### Services
- **Gitea:** forge.alexanderwhitestone.com -- 16 repos, 500+ open issues, branch protection enabled
- **Ollama:** 6 models loaded (~37GB), local inference
- **Hermes:** Agent orchestration, cron system (90+ jobs, 6 workers)
- **Evennia:** The Tower MUD world, federation capable
### Agents
- **Timmy:** Local harness, primary orchestrator
- **Bezalel, Ezra, Allegro:** VPS workers dispatched via Gitea issues
- **Code Claw, Gemini:** Specialized workers
- VPS hosts: Ezra, Allegro, Bezalel
- Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
- Gitea forge
- Evennia worlds
## Current Resource Snapshot
| Resource | Value | Target | Status |
|----------|-------|--------|--------|
| Fleet operational | Yes | Yes | MET |
| Uptime (30d average) | ~78% | >= 95% | NOT MET |
| Days at 95%+ uptime | 0 | 30 | NOT MET |
| Capacity utilization | ~35% | > 60% | NOT MET |
- Fleet operational: yes
- Uptime baseline: 0.0%
- Days at or above 95% uptime: 0
- Capacity utilization: 0.0%
**Phase 2 trigger: NOT READY**
## Next Phase Trigger
## What's Still Manual
To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:
- Uptime >= 95% for 30 consecutive days
- Capacity utilization > 60%
- Current trigger state: NOT READY
Every one of these is a "click" that a human must make:
## Missing Requirements
1. **Restart dead agents** -- SSH into VPS, check process, restart hermes
2. **Health checks** -- SSH to each VPS, verify disk/memory/services
3. **Dead pane recovery** -- tmux pane dies, nobody notices, work stops
4. **Provider failover** -- Nous API goes down, agents stop, human reconfigures
5. **PR triage** -- 80% auto-merge, but 20% need human review
6. **Backlog management** -- 500+ issues, burn loops help but need supervision
7. **Nightly retro** -- manually run and push results
8. **Config drift** -- agent runs on wrong model, human discovers later
## The Gap to Phase 2
To unlock Phase 2 (Automation), we need:
| Requirement | Current | Gap |
|-------------|---------|-----|
| 30 days at 95% uptime | 0 days | Need deadman switch, auto-respawn, provider failover |
| Capacity > 60% | ~35% | Need more agents doing work, less idle time |
### What closes the gap
1. **Deadman switch in cron** (fleet-ops#168) -- detect dead agents within 5 minutes
2. **Auto-respawn** (fleet-ops#173) -- restart dead tmux panes automatically
3. **Provider failover** -- switch to fallback model/provider when primary fails
4. **Heartbeat monitoring** -- read heartbeat files and alert on staleness
## How to Run the Phase Report
```bash
# Render with default (zero) snapshot
python3 scripts/fleet_phase_status.py
# Render with real snapshot
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json
# Output as JSON
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json --json
# Write to file
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json --output docs/FLEET_PHASE_1_SURVIVAL.md
```
- Uptime 0.0% / 95.0%
- Days at or above 95% uptime: 0/30
- Capacity utilization 0.0% / >60.0%
## Manual Clicker Interpretation
Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.
Every restart, every SSH, every check is a manual click.
The goal of Phase 1 is not to automate. It's to **name what needs automating**. Every manual click documented here is a Phase 2 ticket.
## Manual Clicks Still Required
- Restart agents and services by hand when a node goes dark.
- SSH into machines to verify health, disk, and memory.
- Check Gitea, relay, and world services manually before and after changes.
- Act as the scheduler when automation is missing or only partially wired.
## Repo Signals Already Present
- `scripts/fleet_health_probe.sh` — Automated health probe exists and can supply the uptime baseline for the next phase.
- `scripts/fleet_milestones.py` — Milestone tracker exists, so survival achievements can be narrated and logged.
- `scripts/auto_restart_agent.sh` — Auto-restart tooling already exists as phase-2 groundwork.
- `scripts/backup_pipeline.sh` — Backup pipeline scaffold exists for post-survival automation work.
- `infrastructure/timmy-bridge/reports/generate_report.py` — Bridge reporting exists and can summarize heartbeat-driven uptime.
## Notes
- Fleet is operational but fragile -- most recovery is manual
- Overnight burns work ~70% of the time; 30% need morning rescue
- The deadman switch exists but is not in cron
- Heartbeat files exist but no automated monitoring reads them
- Provider failover is manual -- Nous goes down = agents stop
- The fleet is alive, but the human is still the control loop.
- Phase 1 is about naming reality plainly so later automation has a baseline to beat.
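The Phase 2 trigger in the document above requires uptime of at least 95% for 30 consecutive days while capacity utilization stays above 60%. A minimal sketch of that check follows, assuming daily uptime percentages ordered oldest first and one current utilization figure; the function names and inputs are hypothetical and are not the interface of `scripts/fleet_phase_status.py`.

```python
from typing import Sequence

UPTIME_THRESHOLD = 95.0    # percent: "Uptime >= 95%"
REQUIRED_DAYS = 30         # "for 30 consecutive days"
CAPACITY_THRESHOLD = 60.0  # percent: "Capacity utilization > 60%"


def trailing_days_at_threshold(daily_uptime: Sequence[float],
                               threshold: float = UPTIME_THRESHOLD) -> int:
    """Count consecutive days at or above threshold, ending on the latest day."""
    streak = 0
    for value in reversed(daily_uptime):
        if value < threshold:
            break
        streak += 1
    return streak


def phase_2_trigger(daily_uptime: Sequence[float], capacity_pct: float) -> str:
    """Return "READY" only when both conditions hold at the same time."""
    streak = trailing_days_at_threshold(daily_uptime)
    if streak >= REQUIRED_DAYS and capacity_pct > CAPACITY_THRESHOLD:
        return "READY"
    return "NOT READY"
```

The streak is counted backwards from the most recent day, so a single day below 95% resets progress, matching the "consecutive days" wording. For example, `phase_2_trigger([99.0] * 30, 61.0)` returns `"READY"`, while the all-zero snapshot `phase_2_trigger([0.0] * 30, 0.0)` returns `"NOT READY"`.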

scripts/backup_pipeline.sh

@@ -10,6 +10,7 @@ BACKUP_LOG_DIR="${BACKUP_LOG_DIR:-${BACKUP_ROOT}/logs}"
BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-14}"
BACKUP_S3_URI="${BACKUP_S3_URI:-}"
BACKUP_NAS_TARGET="${BACKUP_NAS_TARGET:-}"
OFFSITE_TARGET="${OFFSITE_TARGET:-}"
AWS_ENDPOINT_URL="${AWS_ENDPOINT_URL:-}"
BACKUP_NAME="hermes-backup-${DATESTAMP}"
LOCAL_BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
@@ -31,6 +32,16 @@ fail() {
    exit 1
}

send_telegram() {
    local message="$1"
    if [[ -n "${TELEGRAM_BOT_TOKEN:-}" && -n "${TELEGRAM_CHAT_ID:-}" ]]; then
        curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
            -d "chat_id=${TELEGRAM_CHAT_ID}" \
            -d "text=${message}" \
            -d "parse_mode=HTML" > /dev/null || true
    fi
}

cleanup() {
    rm -f "$PLAINTEXT_ARCHIVE"
    rm -rf "$STAGE_DIR"
@@ -118,6 +129,17 @@ upload_to_nas() {
    log "Uploaded backup to NAS target: $target_dir"
}

upload_to_offsite() {
    local archive_path="$1"
    local manifest_path="$2"
    local target_root="$3"
    local target_dir="${target_root%/}/${DATESTAMP}"

    mkdir -p "$target_dir"
    rsync -az --delete "$archive_path" "$manifest_path" "$target_dir/"
    log "Uploaded backup to offsite target: $target_dir"
}

upload_to_s3() {
    local archive_path="$1"
    local manifest_path="$2"
@@ -161,10 +183,16 @@ if [[ -n "$BACKUP_NAS_TARGET" ]]; then
    upload_to_nas "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$BACKUP_NAS_TARGET"
fi

if [[ -n "$OFFSITE_TARGET" ]]; then
    upload_to_offsite "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$OFFSITE_TARGET"
fi

if [[ -n "$BACKUP_S3_URI" ]]; then
    upload_to_s3 "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH"
fi

find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime "+${BACKUP_RETENTION_DAYS}" -exec rm -rf {} + 2>/dev/null || true
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null || true
log "Retention applied (${BACKUP_RETENTION_DAYS} days)"
log "Backup pipeline completed successfully"
send_telegram "✅ Daily backup completed: ${DATESTAMP}"
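As captured in this diff, the retention step runs two `find` commands. The first honours `BACKUP_RETENTION_DAYS` (default 14) and only matches directories named `20*`. The added line removes every top-level directory under `$BACKUP_ROOT` older than 7 days, which effectively caps retention at about a week whenever `BACKUP_RETENTION_DAYS` is larger than 7, and could also remove `${BACKUP_ROOT}/logs` when `BACKUP_LOG_DIR` keeps its default and that directory's mtime goes stale; meanwhile the log line still reports the configured value. A minimal Python sketch of a single retention rule follows; the function names are hypothetical, and it assumes dated snapshot directories are the only thing retention should touch.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def expired_backup_dirs(backup_root: Path, retention_days: int,
                        now: Optional[float] = None) -> List[Path]:
    """List dated snapshot directories older than retention_days.

    Only top-level directories whose names start with "20" (the same
    filter as the first find line) are considered, so logs/ and any other
    directory under backup_root are never selected.
    """
    current = time.time() if now is None else now
    cutoff = current - retention_days * SECONDS_PER_DAY
    expired = []
    for entry in sorted(backup_root.iterdir()):
        if entry.is_dir() and entry.name.startswith("20"):
            if entry.stat().st_mtime < cutoff:
                expired.append(entry)
    return expired


def apply_retention(backup_root: Path, retention_days: int) -> List[Path]:
    """Delete the expired snapshot directories and return what was removed."""
    removed = expired_backup_dirs(backup_root, retention_days)
    for path in removed:
        shutil.rmtree(path)
    return removed
```

The cutoff is a plain seconds comparison, so it removes directories slightly earlier than `find -mtime +N`, which ignores fractional days. Keeping one rule driven by `BACKUP_RETENTION_DAYS` also keeps the "Retention applied" log line truthful.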


@@ -1,67 +0,0 @@
"""Lock timmy-academy genome to current verified repo facts. Ref: #678."""
from pathlib import Path

GENOME = Path("GENOME-timmy-academy.md")


def read_genome() -> str:
    assert GENOME.exists(), "timmy-academy genome must exist at repo root"
    return GENOME.read_text(encoding="utf-8")


def test_genome_exists():
    assert GENOME.exists(), "timmy-academy genome must exist at repo root"


def test_genome_has_required_sections():
    text = read_genome()
    for heading in [
        "# GENOME.md — timmy-academy",
        "## Project Overview",
        "## Architecture",
        "## Entry Points",
        "## Data Flow",
        "## Key Abstractions",
        "## API Surface",
        "## World Model",
        "## Test Coverage Gaps",
        "## Security Considerations",
        "## CI / Runtime Drift",
        "## Dependencies",
        "## Deployment",
    ]:
        assert heading in text, f"Missing required section: {heading}"


def test_genome_contains_mermaid_diagram():
    text = read_genome()
    assert "```mermaid" in text
    assert "graph TD" in text or "graph TB" in text


def test_genome_captures_current_verified_facts():
    text = read_genome()
    for token in [
        "Timmy Academy",
        "Evennia",
        "master",
        "d860034",
        "server/conf/settings.py",
        "commands/default_cmdsets.py",
        "typeclasses/audited_character.py",
        "world/rebuild_world.py",
        "tests/stress_test.py",
        "python3 tests/stress_test.py --self-test",
        "TELNET_PORTS = [4000]",
        "WEBSERVER_PORTS = [(4001, 4005)]",
        "0.0.0.0",
        "secret_settings.py",
        "hermes-agent/config.yaml",
    ]:
        assert token in text, f"Missing verified token: {token}"


def test_genome_is_substantial():
    text = read_genome()
    assert len(text.splitlines()) >= 120
    assert len(text) >= 7000
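The commit message says the stale `docs/FLEET_PHASE_1_SURVIVAL.md` was missing its `## Current Buildings` and `## Next Phase Trigger` sections. A minimal sketch of a check for that, in the same plain-assert style as the deleted genome test; the `missing_headings` helper and its heading tuple are assumptions drawn from the commit message, not an existing test in the repo.

```python
from typing import Iterable, List

# Headings the commit message names as missing from the stale document.
REQUIRED_HEADINGS = ("## Current Buildings", "## Next Phase Trigger")


def missing_headings(text: str,
                     required: Iterable[str] = REQUIRED_HEADINGS) -> List[str]:
    """Return the required headings that do not appear as whole lines in text."""
    present = {line.strip() for line in text.splitlines()}
    return [heading for heading in required if heading not in present]
```

Matching whole lines instead of substrings (the deleted test used `heading in text`) prevents a deeper heading such as `### Current Buildings` from satisfying the check. A test could then assert that `missing_headings(...)` returns an empty list for the regenerated document's text.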