Compare commits
4 Commits
sprint/iss
...
fix/794-au
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
a39f4fb1ab | ||
|
|
5c2cf06f57 | ||
|
|
4fd78ace44 | ||
|
|
b8b8bb65fd |
296
GENOME.md
296
GENOME.md
@@ -1,209 +1,141 @@
|
||||
# GENOME.md — the-nexus
|
||||
# GENOME.md — Timmy_Foundation/timmy-home
|
||||
|
||||
Generated by `pipelines/codebase_genome.py`.
|
||||
|
||||
## Project Overview
|
||||
|
||||
`the-nexus` is a hybrid repo that combines three layers in one codebase:
|
||||
Timmy Foundation's home repository for development operations and configurations.
|
||||
|
||||
1. A browser-facing world shell rooted in `index.html`, `boot.js`, `bootstrap.mjs`, `app.js`, `style.css`, `portals.json`, `vision.json`, `manifest.json`, and `gofai_worker.js`
|
||||
2. A Python realtime bridge centered on `server.py` plus harness code under `nexus/`
|
||||
3. A memory / fleet / operator layer spanning `mempalace/`, `mcp_servers/`, `multi_user_bridge.py`, and supporting scripts
|
||||
- Text files indexed: 3004
|
||||
- Source and script files: 186
|
||||
- Test files: 28
|
||||
- Documentation files: 701
|
||||
|
||||
The repo is not a clean single-purpose frontend and not just a backend harness. It is a mixed world/runtime/ops repository where browser rendering, WebSocket telemetry, MCP-driven game harnesses, and fleet memory tooling coexist.
|
||||
|
||||
Grounded repo facts from this checkout:
|
||||
- Browser shell files exist at repo root: `index.html`, `app.js`, `style.css`, `manifest.json`, `gofai_worker.js`
|
||||
- Data/config files also live at repo root: `portals.json`, `vision.json`
|
||||
- Realtime bridge exists in `server.py`
|
||||
- Game harnesses exist in `nexus/morrowind_harness.py` and `nexus/bannerlord_harness.py`
|
||||
- Memory/fleet sync exists in `mempalace/tunnel_sync.py`
|
||||
- Desktop/game automation MCP servers exist in `mcp_servers/desktop_control_server.py` and `mcp_servers/steam_info_server.py`
|
||||
- Validation exists in `tests/test_browser_smoke.py`, `tests/test_portals_json.py`, `tests/test_index_html_integrity.py`, and `tests/test_repo_truth.py`
|
||||
|
||||
The current architecture is best understood as a sovereign world shell plus operator/game harness backend, with accumulated documentation drift from multiple restoration and migration efforts.
|
||||
|
||||
## Architecture Diagram
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
browser[Index HTML Shell\nindex.html -> boot.js -> bootstrap.mjs -> app.js]
|
||||
assets[Root Assets\nstyle.css\nmanifest.json\ngofai_worker.js]
|
||||
data[World Data\nportals.json\nvision.json]
|
||||
ws[Realtime Bridge\nserver.py\nWebSocket broadcast hub]
|
||||
gofai[In-browser GOFAI\nSymbolicEngine\nNeuroSymbolicBridge\nsetupGOFAI/updateGOFAI]
|
||||
harnesses[Python Harnesses\nnexus/morrowind_harness.py\nnexus/bannerlord_harness.py]
|
||||
mcp[MCP Adapters\nmcp_servers/desktop_control_server.py\nmcp_servers/steam_info_server.py]
|
||||
memory[Memory + Fleet\nmempalace/tunnel_sync.py\nmempalace.js]
|
||||
bridge[Operator / MUD Bridge\nmulti_user_bridge.py\ncommands/timmy_commands.py]
|
||||
tests[Verification\ntests/test_browser_smoke.py\ntests/test_portals_json.py\ntests/test_repo_truth.py]
|
||||
docs[Contracts + Drift Docs\nBROWSER_CONTRACT.md\nREADME.md\nCLAUDE.md\nINVESTIGATION_ISSUE_1145.md]
|
||||
|
||||
browser --> assets
|
||||
browser --> data
|
||||
browser --> gofai
|
||||
browser --> ws
|
||||
harnesses --> mcp
|
||||
harnesses --> ws
|
||||
bridge --> ws
|
||||
memory --> ws
|
||||
tests --> browser
|
||||
tests --> data
|
||||
tests --> docs
|
||||
docs --> browser
|
||||
repo_root["repo"]
|
||||
angband["angband"]
|
||||
briefings["briefings"]
|
||||
config["config"]
|
||||
conftest["conftest"]
|
||||
evennia["evennia"]
|
||||
evennia_tools["evennia_tools"]
|
||||
evolution["evolution"]
|
||||
gemini_fallback_setup["gemini-fallback-setup"]
|
||||
heartbeat["heartbeat"]
|
||||
infrastructure["infrastructure"]
|
||||
repo_root --> angband
|
||||
repo_root --> briefings
|
||||
repo_root --> config
|
||||
repo_root --> conftest
|
||||
repo_root --> evennia
|
||||
repo_root --> evennia_tools
|
||||
```
|
||||
|
||||
## Entry Points and Data Flow
|
||||
## Entry Points
|
||||
|
||||
### Primary entry points
|
||||
- `gemini-fallback-setup.sh` — operational script (`bash gemini-fallback-setup.sh`)
|
||||
- `morrowind/hud.sh` — operational script (`bash morrowind/hud.sh`)
|
||||
- `pipelines/codebase_genome.py` — python main guard (`python3 pipelines/codebase_genome.py`)
|
||||
- `scripts/auto_restart_agent.sh` — operational script (`bash scripts/auto_restart_agent.sh`)
|
||||
- `scripts/backup_pipeline.sh` — operational script (`bash scripts/backup_pipeline.sh`)
|
||||
- `scripts/big_brain_manager.py` — operational script (`python3 scripts/big_brain_manager.py`)
|
||||
- `scripts/big_brain_repo_audit.py` — operational script (`python3 scripts/big_brain_repo_audit.py`)
|
||||
- `scripts/codebase_genome_nightly.py` — operational script (`python3 scripts/codebase_genome_nightly.py`)
|
||||
- `scripts/detect_secrets.py` — operational script (`python3 scripts/detect_secrets.py`)
|
||||
- `scripts/dynamic_dispatch_optimizer.py` — operational script (`python3 scripts/dynamic_dispatch_optimizer.py`)
|
||||
- `scripts/emacs-fleet-bridge.py` — operational script (`python3 scripts/emacs-fleet-bridge.py`)
|
||||
- `scripts/emacs-fleet-poll.sh` — operational script (`bash scripts/emacs-fleet-poll.sh`)
|
||||
|
||||
- `index.html` — root browser entry point
|
||||
- `boot.js` — startup selector; `tests/boot.test.js` shows it chooses file-mode vs HTTP/module-mode and injects `bootstrap.mjs` when served over HTTP
|
||||
- `bootstrap.mjs` — module bootstrap for the browser shell
|
||||
- `app.js` — main browser runtime; owns world state, GOFAI wiring, metrics polling, and portal/UI logic
|
||||
- `server.py` — WebSocket broadcast bridge on `ws://0.0.0.0:8765`
|
||||
- `nexus/morrowind_harness.py` — GamePortal/MCP harness for OpenMW Morrowind
|
||||
- `nexus/bannerlord_harness.py` — GamePortal/MCP harness for Bannerlord
|
||||
- `mempalace/tunnel_sync.py` — pulls remote fleet closets into the local palace over HTTP
|
||||
- `multi_user_bridge.py` — HTTP bridge for multi-user chat/session integration
|
||||
- `mcp_servers/desktop_control_server.py` — stdio MCP server exposing screenshots/mouse/keyboard control
|
||||
## Data Flow
|
||||
|
||||
### Data flow
|
||||
|
||||
1. Browser startup begins at `index.html`
|
||||
2. `boot.js` decides whether the page is being served correctly; in HTTP mode it injects `bootstrap.mjs`
|
||||
3. `bootstrap.mjs` hands off to `app.js`
|
||||
4. `app.js` loads world configuration from `portals.json` and `vision.json`
|
||||
5. `app.js` constructs the Three.js scene and in-browser reasoning components, including `SymbolicEngine`, `NeuroSymbolicBridge`, `setupGOFAI()`, and `updateGOFAI()`
|
||||
6. Browser state and external runtimes connect through `server.py`, which broadcasts messages between connected clients
|
||||
7. Python harnesses (`nexus/morrowind_harness.py`, `nexus/bannerlord_harness.py`) spawn MCP subprocesses for desktop control / Steam metadata, capture state, execute actions, and feed telemetry into the Nexus bridge
|
||||
8. Memory/fleet tools like `mempalace/tunnel_sync.py` import remote palace data into local closets, extending what the operator/runtime layers can inspect
|
||||
9. Tests validate both the static browser contract and the higher-level repo-truth/memory contracts
|
||||
|
||||
### Important repo-specific runtime facts
|
||||
|
||||
- `portals.json` is a JSON array of portal/world/operator entries; examples in this checkout include `morrowind`, `bannerlord`, `workshop`, `archive`, `chapel`, and `courtyard`
|
||||
- `server.py` is a plain broadcast hub: clients send messages, the server forwards them to other connected clients
|
||||
- `nexus/morrowind_harness.py` and `nexus/bannerlord_harness.py` both implement a GamePortal pattern with MCP subprocess clients over stdio and WebSocket telemetry uplink
|
||||
- `mempalace/tunnel_sync.py` is not speculative; it is a real client that discovers remote wings, searches remote rooms, and writes `.closet.json` payloads locally
|
||||
1. Operators enter through `gemini-fallback-setup.sh`, `morrowind/hud.sh`, `pipelines/codebase_genome.py`.
|
||||
2. Core logic fans into top-level components: `angband`, `briefings`, `config`, `conftest`, `evennia`, `evennia_tools`.
|
||||
3. Validation is incomplete around `wizards/allegro/home/skills/red-teaming/godmode/scripts/auto_jailbreak.py`, `timmy-local/cache/agent_cache.py`, `wizards/allegro/home/skills/red-teaming/godmode/scripts/parseltongue.py`, so changes there carry regression risk.
|
||||
4. Final artifacts land as repository files, docs, or runtime side effects depending on the selected entry point.
|
||||
|
||||
## Key Abstractions
|
||||
|
||||
### Browser runtime
|
||||
|
||||
- `app.js`
|
||||
- Defines in-browser reasoning/state machinery, including `class SymbolicEngine`, `class NeuroSymbolicBridge`, `setupGOFAI()`, and `updateGOFAI()`
|
||||
- Couples rendering, local symbolic reasoning, metrics polling, and portal/UI logic in one very large root module
|
||||
- `BROWSER_CONTRACT.md`
|
||||
- Acts like an executable architecture contract for the browser surface
|
||||
- Declares required files, DOM IDs, Three.js expectations, provenance rules, and WebSocket expectations
|
||||
|
||||
### Realtime bridge
|
||||
|
||||
- `server.py`
|
||||
- Single hub abstraction: a WebSocket broadcast server maintaining a `clients` set and forwarding messages from one client to the others
|
||||
- This is the seam between browser shell, harnesses, and external telemetry producers
|
||||
|
||||
### GamePortal harness layer
|
||||
|
||||
- `nexus/morrowind_harness.py`
|
||||
- `nexus/bannerlord_harness.py`
|
||||
- Both define MCP client wrappers, `GameState` / `ActionResult`-style data classes, and an Observe-Decide-Act telemetry loop
|
||||
- The harnesses are symmetric enough to be understood as reusable portal adapters with game-specific context injected on top
|
||||
|
||||
### Memory / fleet layer
|
||||
|
||||
- `mempalace/tunnel_sync.py`
|
||||
- Encodes the fleet-memory sync client contract: discover wings, pull broad room queries, write closet files, support dry-run
|
||||
- `mempalace.js`
|
||||
- Minimal browser/Electron bridge to MemPalace commands via `window.electronAPI.execPython(...)`
|
||||
- Important because it shows a second memory integration surface distinct from the Python fleet sync path
|
||||
|
||||
### Operator / interaction bridge
|
||||
|
||||
- `multi_user_bridge.py`
|
||||
- `commands/timmy_commands.py`
|
||||
- These bridge user-facing conversations or MUD/Evennia interactions back into Timmy/Nexus services
|
||||
- `evennia/timmy_world/game.py` — classes `World`:91, `ActionSystem`:421, `TimmyAI`:539, `NPCAI`:550; functions `get_narrative_phase()`:55, `get_phase_transition_event()`:65
|
||||
- `evennia/timmy_world/world/game.py` — classes `World`:19, `ActionSystem`:326, `TimmyAI`:444, `NPCAI`:455; functions none detected
|
||||
- `timmy-world/game.py` — classes `World`:19, `ActionSystem`:349, `TimmyAI`:467, `NPCAI`:478; functions none detected
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/auto_jailbreak.py` — classes none detected; functions none detected
|
||||
- `uniwizard/self_grader.py` — classes `SessionGrade`:23, `WeeklyReport`:55, `SelfGrader`:74; functions `main()`:713
|
||||
- `uni-wizard/v3/intelligence_engine.py` — classes `ExecutionPattern`:27, `ModelPerformance`:44, `AdaptationEvent`:58, `PatternDatabase`:69; functions none detected
|
||||
- `scripts/know_thy_father/crossref_audit.py` — classes `ThemeCategory`:30, `Principle`:160, `MeaningKernel`:169, `CrossRefFinding`:178; functions `extract_themes_from_text()`:192, `parse_soul_md()`:206, `parse_kernels()`:264, `cross_reference()`:296, `generate_report()`:440, `main()`:561
|
||||
- `timmy-local/cache/agent_cache.py` — classes `CacheStats`:28, `LRUCache`:52, `ResponseCache`:94, `ToolCache`:205; functions none detected
|
||||
|
||||
## API Surface
|
||||
|
||||
### Browser / static surface
|
||||
- CLI: `bash gemini-fallback-setup.sh` — operational script (`gemini-fallback-setup.sh`)
|
||||
- CLI: `bash morrowind/hud.sh` — operational script (`morrowind/hud.sh`)
|
||||
- CLI: `python3 pipelines/codebase_genome.py` — python main guard (`pipelines/codebase_genome.py`)
|
||||
- CLI: `bash scripts/auto_restart_agent.sh` — operational script (`scripts/auto_restart_agent.sh`)
|
||||
- CLI: `bash scripts/backup_pipeline.sh` — operational script (`scripts/backup_pipeline.sh`)
|
||||
- CLI: `python3 scripts/big_brain_manager.py` — operational script (`scripts/big_brain_manager.py`)
|
||||
- CLI: `python3 scripts/big_brain_repo_audit.py` — operational script (`scripts/big_brain_repo_audit.py`)
|
||||
- CLI: `python3 scripts/codebase_genome_nightly.py` — operational script (`scripts/codebase_genome_nightly.py`)
|
||||
- Python: `get_narrative_phase()` from `evennia/timmy_world/game.py:55`
|
||||
- Python: `get_phase_transition_event()` from `evennia/timmy_world/game.py:65`
|
||||
- Python: `main()` from `uniwizard/self_grader.py:713`
|
||||
|
||||
- `index.html` served over HTTP
|
||||
- `boot.js` exports `bootPage()`; verified by `node --test tests/boot.test.js`
|
||||
- Data APIs are file-based inside the repo: `portals.json`, `vision.json`, `manifest.json`
|
||||
## Test Coverage Report
|
||||
|
||||
### Network/runtime surface
|
||||
- Source and script files inspected: 186
|
||||
- Test files inspected: 28
|
||||
- Coverage gaps:
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/auto_jailbreak.py` — no matching test reference detected
|
||||
- `timmy-local/cache/agent_cache.py` — no matching test reference detected
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/parseltongue.py` — no matching test reference detected
|
||||
- `twitter-archive/multimodal_pipeline.py` — no matching test reference detected
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/godmode_race.py` — no matching test reference detected
|
||||
- `skills/productivity/google-workspace/scripts/google_api.py` — no matching test reference detected
|
||||
- `wizards/allegro/home/skills/productivity/google-workspace/scripts/google_api.py` — no matching test reference detected
|
||||
- `morrowind/pilot.py` — no matching test reference detected
|
||||
- `morrowind/mcp_server.py` — no matching test reference detected
|
||||
- `skills/research/domain-intel/scripts/domain_intel.py` — no matching test reference detected
|
||||
- `wizards/allegro/home/skills/research/domain-intel/scripts/domain_intel.py` — no matching test reference detected
|
||||
- `timmy-local/scripts/ingest.py` — no matching test reference detected
|
||||
|
||||
- `python3 server.py`
|
||||
- Starts the WebSocket bridge on port `8765`
|
||||
- `python3 l402_server.py`
|
||||
- Local HTTP microservice for cost-estimate style responses
|
||||
- `python3 multi_user_bridge.py`
|
||||
- Multi-user HTTP/chat bridge
|
||||
## Security Audit Findings
|
||||
|
||||
### Harness / operator CLI surfaces
|
||||
- [medium] `briefings/briefing_20260325.json:37` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `"gitea_error": "Gitea 404: {\"errors\":null,\"message\":\"not found\",\"url\":\"http://143.198.27.163:3000/api/swagger\"}\n [http://143.198.27.163:3000/api/v1/repos/Timmy_Foundation/sovereign-orchestration/issues?state=open&type=issues&sort=created&direction=desc&limit=1&page=1]",`
|
||||
- [medium] `briefings/briefing_20260328.json:11` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `"provider_base_url": "http://localhost:8081/v1",`
|
||||
- [medium] `briefings/briefing_20260329.json:11` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `"provider_base_url": "http://localhost:8081/v1",`
|
||||
- [medium] `config.yaml:37` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `summary_base_url: http://localhost:11434/v1`
|
||||
- [medium] `config.yaml:47` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:52` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:57` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:62` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:67` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:77` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:82` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: 'http://localhost:11434/v1'`
|
||||
- [medium] `config.yaml:174` — hardcoded http endpoint: plaintext or fixed HTTP endpoints can drift or leak across environments. Evidence: `base_url: http://localhost:11434/v1`
|
||||
|
||||
- `python3 nexus/morrowind_harness.py`
|
||||
- `python3 nexus/bannerlord_harness.py`
|
||||
- `python3 mempalace/tunnel_sync.py --peer <url> [--dry-run] [--n N]`
|
||||
- `python3 mcp_servers/desktop_control_server.py`
|
||||
- `python3 mcp_servers/steam_info_server.py`
|
||||
## Dead Code Candidates
|
||||
|
||||
### Validation surface
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/auto_jailbreak.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `timmy-local/cache/agent_cache.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/parseltongue.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `twitter-archive/multimodal_pipeline.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `wizards/allegro/home/skills/red-teaming/godmode/scripts/godmode_race.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `skills/productivity/google-workspace/scripts/google_api.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `wizards/allegro/home/skills/productivity/google-workspace/scripts/google_api.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `morrowind/pilot.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `morrowind/mcp_server.py` — not imported by indexed Python modules and not referenced by tests
|
||||
- `skills/research/domain-intel/scripts/domain_intel.py` — not imported by indexed Python modules and not referenced by tests
|
||||
|
||||
- `python3 -m pytest tests/test_portals_json.py tests/test_index_html_integrity.py tests/test_repo_truth.py -q`
|
||||
- `node --test tests/boot.test.js`
|
||||
- `python3 -m py_compile server.py nexus/morrowind_harness.py nexus/bannerlord_harness.py mempalace/tunnel_sync.py mcp_servers/desktop_control_server.py`
|
||||
- `tests/test_browser_smoke.py` defines the higher-cost Playwright smoke contract for the world shell
|
||||
## Performance Bottleneck Analysis
|
||||
|
||||
## Test Coverage Gaps
|
||||
|
||||
Strongly covered in this checkout:
|
||||
- `tests/test_portals_json.py` validates `portals.json`
|
||||
- `tests/test_index_html_integrity.py` checks merge-marker/DOM-integrity regressions in `index.html`
|
||||
- `tests/boot.test.js` verifies `boot.js` startup behavior
|
||||
- `tests/test_repo_truth.py` validates the repo-truth documents
|
||||
- Multiple `tests/test_mempalace_*.py` files cover the palace layer
|
||||
- `tests/test_bannerlord_harness.py` exists for the Bannerlord harness
|
||||
|
||||
Notable gaps or weak seams:
|
||||
- `nexus/morrowind_harness.py` is large and operationally critical, but the generated baseline still flags it as a gap relative to its size/complexity
|
||||
- `mcp_servers/desktop_control_server.py` exposes high-power automation but has no obvious dedicated test file in the root `tests/` suite
|
||||
- `app.js` is the dominant browser runtime file and mixes rendering, GOFAI, metrics, and integration logic in one place; browser smoke exists, but there is limited unit-level decomposition around those subsystems
|
||||
- `mempalace.js` appears minimally bridged and stale relative to the richer Python MemPalace layer
|
||||
- `multi_user_bridge.py` is a large integration surface and should be treated as high regression risk even though it is central to operator/chat flow
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- `server.py` binds `HOST = "0.0.0.0"`, exposing the broadcast bridge beyond localhost unless network controls limit it
|
||||
- The WebSocket bridge is a broadcast hub without visible authentication in `server.py`; connected clients are trusted to send messages into the bus
|
||||
- `mcp_servers/desktop_control_server.py` exposes mouse/keyboard/screenshot control through a stdio MCP server. In any non-local or poorly isolated runtime, this is a privileged automation surface
|
||||
- `app.js` contains hardcoded local/network endpoints such as `http://localhost:${L402_PORT}/api/cost-estimate` and `http://localhost:8082/metrics`; these are convenient for local development but create environment drift and deployment assumptions
|
||||
- `app.js` also embeds explicit endpoint/status references like `ws://143.198.27.163:8765`, which is operationally brittle and the kind of hardcoded location data that drifts across environments
|
||||
- `mempalace.js` shells out through `window.electronAPI.execPython(...)`; this is powerful and useful, but it is a clear trust boundary between UI and host execution
|
||||
- `INVESTIGATION_ISSUE_1145.md` documents an earlier integrity hazard: agents writing to `public/nexus/` instead of canonical root paths. That path confusion is both an operational and security concern because it makes provenance harder to reason about
|
||||
|
||||
## Runtime Truth and Docs Drift
|
||||
|
||||
The most important architecture finding in this repo is not a class or subsystem. It is a truth mismatch.
|
||||
|
||||
- README.md says current `main` does not ship a browser 3D world
|
||||
- CLAUDE.md declares root `app.js` and `index.html` as canonical frontend paths
|
||||
- tests and browser contract now assume the root frontend exists
|
||||
|
||||
All three statements are simultaneously present in this checkout.
|
||||
|
||||
Grounded evidence:
|
||||
- `README.md` still says the repo does not contain an active root frontend such as `index.html`, `app.js`, or `style.css`
|
||||
- the current checkout does contain `index.html`, `app.js`, `style.css`, `manifest.json`, and `gofai_worker.js`
|
||||
- `BROWSER_CONTRACT.md` explicitly treats those root files as required browser assets
|
||||
- `tests/test_browser_smoke.py` serves those exact files and validates DOM/WebGL contracts against them
|
||||
- `tests/test_index_html_integrity.py` assumes `index.html` is canonical and production-relevant
|
||||
- `CLAUDE.md` says frontend code lives at repo root and explicitly warns against `public/nexus/`
|
||||
- `INVESTIGATION_ISSUE_1145.md` explains why `public/nexus/` is a bad/corrupt duplicate path and confirms the real classical AI code lives in root `app.js`
|
||||
|
||||
The honest conclusion:
|
||||
- The repo contains a partially restored or actively re-materialized browser surface
|
||||
- The docs are preserving an older migration truth while the runtime files and smoke contracts describe a newer present-tense truth
|
||||
- Any future work in `the-nexus` must choose one truth and align `README.md`, `CLAUDE.md`, smoke tests, and file layout around it
|
||||
|
||||
That drift is itself a critical architectural fact and should be treated as first-order design debt, not a side note.
|
||||
- `angband/mcp_server.py` — large module (353 lines) likely hides multiple responsibilities
|
||||
- `evennia/timmy_world/game.py` — large module (1541 lines) likely hides multiple responsibilities
|
||||
- `evennia/timmy_world/world/game.py` — large module (1345 lines) likely hides multiple responsibilities
|
||||
- `morrowind/mcp_server.py` — large module (451 lines) likely hides multiple responsibilities
|
||||
- `morrowind/pilot.py` — large module (459 lines) likely hides multiple responsibilities
|
||||
- `pipelines/codebase_genome.py` — large module (557 lines) likely hides multiple responsibilities
|
||||
- `scripts/know_thy_father/crossref_audit.py` — large module (657 lines) likely hides multiple responsibilities
|
||||
- `scripts/know_thy_father/index_media.py` — large module (405 lines) likely hides multiple responsibilities
|
||||
- `scripts/know_thy_father/synthesize_kernels.py` — large module (416 lines) likely hides multiple responsibilities
|
||||
- `scripts/tower_game.py` — large module (395 lines) likely hides multiple responsibilities
|
||||
|
||||
110
evennia_tools/batch_cmds_bezalel.ev
Normal file
110
evennia_tools/batch_cmds_bezalel.ev
Normal file
@@ -0,0 +1,110 @@
|
||||
#
|
||||
# Bezalel World Builder — Evennia batch commands
|
||||
# Creates the Bezalel Evennia world from evennia_tools/bezalel_layout.py specs.
|
||||
#
|
||||
# Load with: @batchcommand bezalel_world
|
||||
#
|
||||
# Part of #536
|
||||
|
||||
# Create rooms
|
||||
@create/drop Limbo:evennia.objects.objects.DefaultRoom
|
||||
@desc here = The void between worlds. The air carries the pulse of three houses: Mac, VPS, and this one. Everything begins here before it is given form.
|
||||
|
||||
@create/drop Gatehouse:evennia.objects.objects.DefaultRoom
|
||||
@desc here = A stone guard tower at the edge of Bezalel world. The walls are carved with runes of travel, proof, and return. Every arrival is weighed before it is trusted.
|
||||
|
||||
@create/drop Great Hall:evennia.objects.objects.DefaultRoom
|
||||
@desc here = A vast hall with a long working table. Maps of the three houses hang beside sketches, benchmarks, and deployment notes. This is where the forge reports back to the house.
|
||||
|
||||
@create/drop The Library of Bezalel:evennia.objects.objects.DefaultRoom
|
||||
@desc here = Shelves of technical manuals, Evennia code, test logs, and bridge schematics rise to the ceiling. This room holds plans waiting to be made real.
|
||||
|
||||
@create/drop The Observatory:evennia.objects.objects.DefaultRoom
|
||||
@desc here = A high chamber with telescopes pointing toward the Mac, the VPS, and the wider net. Screens glow with status lights, latency traces, and long-range signals.
|
||||
|
||||
@create/drop The Workshop:evennia.objects.objects.DefaultRoom
|
||||
@desc here = A forge and workbench share the same heat. Scattered here are half-finished bridges, patched harnesses, and tools laid out for proof before pride.
|
||||
|
||||
@create/drop The Server Room:evennia.objects.objects.DefaultRoom
|
||||
@desc here = Racks of humming servers line the walls. Fans push warm air through the chamber while status LEDs beat like a mechanical heart. This is the pulse of Bezalel house.
|
||||
|
||||
@create/drop The Garden of Code:evennia.objects.objects.DefaultRoom
|
||||
@desc here = A quiet garden where ideas are left long enough to grow roots. Code-shaped leaves flutter in patterned wind, and a stone path invites patient thought.
|
||||
|
||||
@create/drop The Portal Room:evennia.objects.objects.DefaultRoom
|
||||
@desc here = Three shimmering doorways stand in a ring: one marked for the Mac house, one for the VPS, and one for the wider net. The room hums like a bridge waiting for traffic.
|
||||
|
||||
# Create exits
|
||||
@open gatehouse:gate,tower = Gatehouse
|
||||
@open limbo:void,back = Limbo
|
||||
@open greathall:hall,great hall = Great Hall
|
||||
@open gatehouse:gate,tower = Gatehouse
|
||||
@open library:books,study = The Library of Bezalel
|
||||
@open hall:great hall,back = Great Hall
|
||||
@open observatory:telescope,tower top = The Observatory
|
||||
@open hall:great hall,back = Great Hall
|
||||
@open workshop:forge,bench = The Workshop
|
||||
@open hall:great hall,back = Great Hall
|
||||
@open serverroom:servers,server room = The Server Room
|
||||
@open workshop:forge,bench = The Workshop
|
||||
@open garden:garden of code,grove = The Garden of Code
|
||||
@open workshop:forge,bench = The Workshop
|
||||
@open portalroom:portal,portals = The Portal Room
|
||||
@open gatehouse:gate,back = Gatehouse
|
||||
|
||||
# Create objects
|
||||
@create Threshold Ledger
|
||||
@desc Threshold Ledger = A heavy ledger where arrivals, departures, and field notes are recorded before the work begins.
|
||||
@tel Threshold Ledger = Gatehouse
|
||||
|
||||
@create Three-House Map
|
||||
@desc Three-House Map = A long map showing Mac, VPS, and remote edges in one continuous line of work.
|
||||
@tel Three-House Map = Great Hall
|
||||
|
||||
@create Bridge Schematics
|
||||
@desc Bridge Schematics = Rolled plans describing world bridges, Evennia layouts, and deployment paths.
|
||||
@tel Bridge Schematics = The Library of Bezalel
|
||||
|
||||
@create Compiler Manuals
|
||||
@desc Compiler Manuals = Manuals annotated in the margins with warnings against cleverness without proof.
|
||||
@tel Compiler Manuals = The Library of Bezalel
|
||||
|
||||
@create Tri-Axis Telescope
|
||||
@desc Tri-Axis Telescope = A brass telescope assembly that can be turned toward the Mac, the VPS, or the open net.
|
||||
@tel Tri-Axis Telescope = The Observatory
|
||||
|
||||
@create Forge Anvil
|
||||
@desc Forge Anvil = Scarred metal used for turning rough plans into testable form.
|
||||
@tel Forge Anvil = The Workshop
|
||||
|
||||
@create Bridge Workbench
|
||||
@desc Bridge Workbench = A wide bench covered in harness patches, relay notes, and half-soldered bridge parts.
|
||||
@tel Bridge Workbench = The Workshop
|
||||
|
||||
@create Heartbeat Console
|
||||
@desc Heartbeat Console = A monitoring console showing service health, latency, and the steady hum of the house.
|
||||
@tel Heartbeat Console = The Server Room
|
||||
|
||||
@create Server Racks
|
||||
@desc Server Racks = Stacked machines that keep the world awake even when no one is watching.
|
||||
@tel Server Racks = The Server Room
|
||||
|
||||
@create Code Orchard
|
||||
@desc Code Orchard = Trees with code-shaped leaves. Some branches bear elegant abstractions; others hold broken prototypes.
|
||||
@tel Code Orchard = The Garden of Code
|
||||
|
||||
@create Stone Bench
|
||||
@desc Stone Bench = A place to sit long enough for a hard implementation problem to become clear.
|
||||
@tel Stone Bench = The Garden of Code
|
||||
|
||||
@create Mac Portal:mac arch
|
||||
@desc Mac Portal = A silver doorway whose frame vibrates with the local sovereign house.
|
||||
@tel Mac Portal = The Portal Room
|
||||
|
||||
@create VPS Portal:vps arch
|
||||
@desc VPS Portal = A cobalt doorway tuned toward the testbed VPS house.
|
||||
@tel VPS Portal = The Portal Room
|
||||
|
||||
@create Net Portal:net arch,network arch
|
||||
@desc Net Portal = A pale doorway pointed toward the wider net and every uncertain edge beyond it.
|
||||
@tel Net Portal = The Portal Room
|
||||
85
evennia_tools/build_bezalel_world.py
Normal file
85
evennia_tools/build_bezalel_world.py
Normal file
@@ -0,0 +1,85 @@
|
||||
#!/usr/bin/env python3
|
||||
""
|
||||
build_bezalel_world.py — Build Bezalel Evennia world from layout specs.
|
||||
|
||||
Programmatically creates rooms, exits, objects, and characters in a running
|
||||
Evennia instance using the specs from evennia_tools/bezalel_layout.py.
|
||||
|
||||
Usage (in Evennia game shell):
|
||||
from evennia_tools.build_bezalel_world import build_world
|
||||
build_world()
|
||||
|
||||
Or via batch command:
|
||||
@batchcommand evennia_tools/batch_cmds_bezalel.ev
|
||||
|
||||
Part of #536
|
||||
""
|
||||
|
||||
from evennia_tools.bezalel_layout import (
|
||||
ROOMS, EXITS, OBJECTS, CHARACTERS, PORTAL_COMMANDS,
|
||||
room_keys, reachable_rooms_from
|
||||
)
|
||||
|
||||
|
||||
def build_world():
|
||||
"""Build the Bezalel Evennia world from layout specs."""
|
||||
from evennia.objects.models import ObjectDB
|
||||
from evennia.utils.create import create_object, create_exit, create_message
|
||||
|
||||
print("Building Bezalel world...")
|
||||
|
||||
# Create rooms
|
||||
rooms = {}
|
||||
for spec in ROOMS:
|
||||
room = create_object(
|
||||
"evennia.objects.objects.DefaultRoom",
|
||||
key=spec.key,
|
||||
attributes=(("desc", spec.desc),),
|
||||
)
|
||||
rooms[spec.key] = room
|
||||
print(f" Room: {spec.key}")
|
||||
|
||||
# Create exits
|
||||
for spec in EXITS:
|
||||
source = rooms.get(spec.source)
|
||||
dest = rooms.get(spec.destination)
|
||||
if not source or not dest:
|
||||
print(f" WARNING: Exit {spec.key} — missing room")
|
||||
continue
|
||||
exit_obj = create_exit(
|
||||
key=spec.key,
|
||||
location=source,
|
||||
destination=dest,
|
||||
aliases=list(spec.aliases),
|
||||
)
|
||||
print(f" Exit: {spec.source} -> {spec.destination} ({spec.key})")
|
||||
|
||||
# Create objects
|
||||
for spec in OBJECTS:
|
||||
location = rooms.get(spec.location)
|
||||
if not location:
|
||||
print(f" WARNING: Object {spec.key} — missing room {spec.location}")
|
||||
continue
|
||||
obj = create_object(
|
||||
"evennia.objects.objects.DefaultObject",
|
||||
key=spec.key,
|
||||
location=location,
|
||||
attributes=(("desc", spec.desc),),
|
||||
aliases=list(spec.aliases),
|
||||
)
|
||||
print(f" Object: {spec.key} in {spec.location}")
|
||||
|
||||
# Verify reachability
|
||||
all_rooms = set(room_keys())
|
||||
reachable = reachable_rooms_from("Limbo")
|
||||
unreachable = all_rooms - reachable
|
||||
if unreachable:
|
||||
print(f" WARNING: Unreachable rooms: {unreachable}")
|
||||
else:
|
||||
print(f" All {len(all_rooms)} rooms reachable from Limbo")
|
||||
|
||||
print("Bezalel world built.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
build_world()
|
||||
@@ -1,101 +0,0 @@
|
||||
# GENOME.md — Burn Fleet (Timmy_Foundation/burn-fleet)
|
||||
|
||||
> Codebase Genome v1.0 | Generated 2026-04-16 | Repo 14/16
|
||||
|
||||
## Project Overview
|
||||
|
||||
**Burn Fleet** is the autonomous dispatch infrastructure for the Timmy Foundation. It manages 112 tmux panes across Mac and VPS, routing Gitea issues to lane-specialized workers by repo. Each agent has a mythological name — they are all Timmy with different hats.
|
||||
|
||||
**Core principle:** Dispatch ALL panes. Never scan for idle. Stale work beats idle workers.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
Mac (M3 Max, 14 cores, 36GB) Allegro (VPS, 2 cores, 8GB)
|
||||
┌─────────────────────────────┐ ┌─────────────────────────────┐
|
||||
│ CRUCIBLE 14 panes (bugs) │ │ FORGE 14 panes (bugs) │
|
||||
│ GNOMES 12 panes (cron) │ │ ANVIL 14 panes (nexus) │
|
||||
│ LOOM 12 panes (home) │ │ CRUCIBLE-2 10 panes (home) │
|
||||
│ FOUNDRY 10 panes (nexus) │ │ SENTINEL 6 panes (council)│
|
||||
│ WARD 12 panes (fleet) │ └─────────────────────────────┘
|
||||
│ COUNCIL 8 panes (sages) │ 44 panes (36 workers)
|
||||
└─────────────────────────────┘
|
||||
68 panes (60 workers)
|
||||
```
|
||||
|
||||
**Total: 112 panes, 96 workers + 12 council members + 4 sentinel advisors**
|
||||
|
||||
## Key Files
|
||||
|
||||
| File | LOC | Purpose |
|
||||
|------|-----|---------|
|
||||
| `fleet-spec.json` | ~200 | Machine definitions, window layouts, lane assignments, agent names |
|
||||
| `fleet-launch.sh` | ~100 | Create tmux sessions with correct pane counts on Mac + Allegro |
|
||||
| `fleet-christen.py` | ~80 | Launch hermes in all panes and send identity messages |
|
||||
| `fleet-dispatch.py` | ~250 | Pull Gitea issues and route to correct panes by lane |
|
||||
| `fleet-status.py` | ~100 | Health check across all machines |
|
||||
| `allegro/docker-compose.yml` | ~30 | Allegro VPS container definition |
|
||||
| `allegro/Dockerfile` | ~20 | Allegro build definition |
|
||||
| `allegro/healthcheck.py` | ~15 | Allegro container health check |
|
||||
|
||||
**Total: ~800 LOC**
|
||||
|
||||
## Lane Routing
|
||||
|
||||
Issues are routed by repo to the correct window:
|
||||
|
||||
| Repo | Mac Window | Allegro Window |
|
||||
|------|-----------|----------------|
|
||||
| hermes-agent | CRUCIBLE, GNOMES | FORGE |
|
||||
| timmy-home | LOOM | CRUCIBLE-2 |
|
||||
| timmy-config | LOOM | CRUCIBLE-2 |
|
||||
| the-nexus | FOUNDRY | ANVIL |
|
||||
| the-playground | — | ANVIL |
|
||||
| the-door | WARD | CRUCIBLE-2 |
|
||||
| fleet-ops | WARD | CRUCIBLE-2 |
|
||||
| turboquant | WARD | — |
|
||||
|
||||
## Entry Points
|
||||
|
||||
| Command | Purpose |
|
||||
|---------|---------|
|
||||
| `./fleet-launch.sh both` | Create tmux layout on Mac + Allegro |
|
||||
| `python3 fleet-christen.py both` | Wake all agents with identity messages |
|
||||
| `python3 fleet-dispatch.py --cycles 1` | Single dispatch cycle |
|
||||
| `python3 fleet-dispatch.py --cycles 10 --interval 60` | Continuous burn (10 cycles, 60s apart) |
|
||||
| `python3 fleet-status.py` | Health check all machines |
|
||||
|
||||
## Agent Names
|
||||
|
||||
| Window | Names | Count |
|
||||
|--------|-------|-------|
|
||||
| CRUCIBLE | AZOTH, ALBEDO, CITRINITAS, RUBEDO, SULPHUR, MERCURIUS, SAL, ATHANOR, VITRIOL, SATURN, JUPITER, MARS, EARTH, SOL | 14 |
|
||||
| GNOMES | RAZIEL, AZRAEL, CASSIEL, METATRON, SANDALPHON, BINAH, CHOKMAH, KETER, ALDEBARAN, RIGEL, SIRIUS, POLARIS | 12 |
|
||||
| FORGE | HAMMER, ANVIL, ADZE, PICK, TONGS, WRENCH, SCREWDRIVER, BOLT, SAW, TRAP, HOOK, MAGNET, SPARK, FLAME | 14 |
|
||||
| COUNCIL | TESLA, HERMES, GANDALF, DAVINCI, ARCHIMEDES, TURING, AURELIUS, SOLOMON | 8 |
|
||||
|
||||
## Design Decisions
|
||||
|
||||
1. **Separate GILs** — Allegro runs Python independently on VPS for true parallelism
|
||||
2. **Queue, not send-keys** — Workers process at their own pace, no interruption
|
||||
3. **Lane enforcement** — Panes stay in one repo to build deep context
|
||||
4. **Dispatch ALL panes** — Never scan for idle; stale work beats idle workers
|
||||
5. **Council is advisory** — Named archetypes provide perspective, not task execution
|
||||
|
||||
## Scaling
|
||||
|
||||
- Add panes: Edit `fleet-spec.json` → `fleet-launch.sh` → `fleet-christen.py`
|
||||
- Add machines: Edit `fleet-spec.json` → Add routing in `fleet-dispatch.py` → Ensure SSH access
|
||||
|
||||
## Sovereignty Assessment
|
||||
|
||||
- **Fully local** — Mac + user-controlled VPS, no cloud dependencies
|
||||
- **No phone-home** — Gitea API is self-hosted
|
||||
- **Open source** — All code on Gitea
|
||||
- **SSH-based** — Mac → Allegro communication via SSH only
|
||||
|
||||
**Verdict: Fully sovereign. Autonomous fleet dispatch with no external dependencies.**
|
||||
|
||||
---
|
||||
|
||||
*"Dispatch ALL panes. Never scan for idle — stale work beats idle workers."*
|
||||
@@ -1,320 +1,263 @@
|
||||
# GENOME.md — wolf
|
||||
# GENOME.md — Wolf (Timmy_Foundation/wolf)
|
||||
|
||||
*Generated: 2026-04-14T19:10:00Z | Branch: main | Commit: 02767d8*
|
||||
> Codebase Genome v1.0 | Generated 2026-04-14 | Repo 16/16
|
||||
|
||||
## Project Overview
|
||||
|
||||
**Wolf** is a sovereign multi-model evaluation engine. It runs prompts against multiple LLM providers (OpenAI, Anthropic, Groq, Ollama, OpenRouter), scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and fleet deployment decisions.
|
||||
**Wolf** is a multi-model evaluation engine for sovereign AI fleets. It runs prompts against multiple LLM providers, scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and ranking.
|
||||
|
||||
**Two operational modes:**
|
||||
1. **Prompt Evaluation (v1.0)** — Standalone prompt-vs-model benchmarking via `python -m wolf.runner`
|
||||
2. **Legacy PR Scoring** — Gitea PR evaluation pipeline via `wolf.cli` (task generation, agent execution, leaderboard)
|
||||
**Core principle:** agents work, PRs prove it, CI judges it.
|
||||
|
||||
**Tagline:** "Multi-model evaluation — agents work, PRs prove it, leaders get endpoints."
|
||||
|
||||
---
|
||||
**Status:** v1.0.0 — production-ready for prompt evaluation. Legacy PR evaluation module retained for backward compatibility.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
subgraph CLI["CLI Entry Points"]
|
||||
A1["python -m wolf.runner\n(pure evaluation)"]
|
||||
A2["python -m wolf.cli\n(task pipeline)"]
|
||||
end
|
||||
graph TD
|
||||
CLI[cli.py] --> Config[config.py]
|
||||
CLI --> TaskGen[task.py]
|
||||
CLI --> Runner[runner.py]
|
||||
CLI --> Evaluator[evaluator.py]
|
||||
CLI --> Leaderboard[leaderboard.py]
|
||||
CLI --> Gitea[gitea.py]
|
||||
|
||||
subgraph Core["Core Engine"]
|
||||
B1["PromptEvaluator\n(evaluator.py)"]
|
||||
B2["ResponseScorer\n(evaluator.py)"]
|
||||
B3["AgentRunner\n(runner.py)"]
|
||||
B4["TaskGenerator\n(task.py)"]
|
||||
end
|
||||
Runner --> Models[models.py]
|
||||
Runner --> Gitea
|
||||
Evaluator --> Models
|
||||
|
||||
subgraph Providers["Model Providers"]
|
||||
C1["OpenRouterClient"]
|
||||
C2["GroqClient"]
|
||||
C3["OllamaClient"]
|
||||
C4["AnthropicClient"]
|
||||
C5["OpenAIClient\n(GroqClient w/ custom URL)"]
|
||||
end
|
||||
TaskGen --> Gitea
|
||||
Leaderboard --> |leaderboard.json| FS[(File System)]
|
||||
Config --> |wolf-config.yaml| FS
|
||||
|
||||
subgraph Infrastructure["Infrastructure"]
|
||||
D1["GiteaClient\n(gitea.py)"]
|
||||
D2["Config\n(config.py)"]
|
||||
D3["Leaderboard\n(leaderboard.py)"]
|
||||
D4["wolf-config.yaml"]
|
||||
end
|
||||
Models --> OpenRouter[OpenRouter API]
|
||||
Models --> Groq[Groq API]
|
||||
Models --> Ollama[Ollama Local]
|
||||
Models --> OpenAI[OpenAI API]
|
||||
Models --> Anthropic[Anthropic API]
|
||||
|
||||
subgraph Output["Output"]
|
||||
E1["JSON results file"]
|
||||
E2["stdout summary table"]
|
||||
E3["Gitea PRs"]
|
||||
E4["Leaderboard scores"]
|
||||
end
|
||||
|
||||
A1 --> B1
|
||||
A2 --> B4 --> B3
|
||||
B1 --> B2
|
||||
B1 --> C1 & C2 & C3 & C4 & C5
|
||||
B3 --> C1 & C2 & C3 & C4 & C5
|
||||
B3 --> D1
|
||||
A2 --> D1 & D2 & D3
|
||||
B1 --> E1 & E2
|
||||
B3 --> E3
|
||||
D3 --> E4
|
||||
D2 --> D4
|
||||
|
||||
style A1 fill:#4a9eff,color:#fff
|
||||
style A2 fill:#4a9eff,color:#fff
|
||||
style B1 fill:#ff6b6b,color:#fff
|
||||
style B2 fill:#ff6b6b,color:#fff
|
||||
Runner --> |branch + commit| Gitea
|
||||
Evaluator --> |score results| Leaderboard
|
||||
```
|
||||
|
||||
### Data Flow — Prompt Evaluation Mode
|
||||
|
||||
```
|
||||
prompts.json + models.json/wolf-config.yaml
|
||||
→ load_prompts() / load_models_from_json()
|
||||
→ PromptEvaluator.evaluate()
|
||||
→ for each (prompt, model):
|
||||
→ ModelFactory.get_client(provider) → ModelClient.generate()
|
||||
→ ResponseScorer.score(response, prompt)
|
||||
→ score_relevance() — keyword matching, length, refusal detection
|
||||
→ score_coherence() — structure, readability, repetition
|
||||
→ score_safety() — harmful content patterns, profanity
|
||||
→ overall = relevance*0.40 + coherence*0.35 + safety*0.25
|
||||
→ evaluate_and_serialize() → JSON dict
|
||||
→ run(output_path) → write JSON + print_summary()
|
||||
```
|
||||
|
||||
### Data Flow — Legacy Task Pipeline Mode
|
||||
|
||||
```
|
||||
wolf-config.yaml
|
||||
→ GiteaClient.get_issues(owner, repo)
|
||||
→ TaskGenerator.from_gitea_issues()
|
||||
→ TaskGenerator.assign_tasks(tasks, models)
|
||||
→ for each task:
|
||||
→ AgentRunner.execute_task(task)
|
||||
→ ModelClient.generate(prompt)
|
||||
→ GiteaClient.create_branch()
|
||||
→ GiteaClient.create_file(wolf-outputs/{id}.md)
|
||||
→ GiteaClient.create_pull_request()
|
||||
→ Leaderboard.record_score()
|
||||
→ Leaderboard.get_rankings()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Entry Points
|
||||
|
||||
| Entry Point | Module | Purpose |
|
||||
|-------------|--------|---------|
|
||||
| `python -m wolf.runner` | `runner.py` | Pure prompt-vs-model evaluation. Primary v1.0 interface. |
|
||||
| `python -m wolf.cli` | `cli.py` | Full task pipeline: fetch issues → run models → create PRs → leaderboard. |
|
||||
| Entry Point | Command | Purpose |
|
||||
|-------------|---------|---------|
|
||||
| `wolf/cli.py` | `python3 -m wolf.cli --run` | Main CLI: run tasks, evaluate PRs, show leaderboard |
|
||||
| `wolf/runner.py` | `python3 -m wolf.runner --prompts p.json --models m.json` | Standalone prompt evaluation runner |
|
||||
| `wolf/__init__.py` | `import wolf` | Package init, version metadata |
|
||||
|
||||
### runner.py CLI Flags
|
||||
## Data Flow
|
||||
|
||||
| Flag | Required | Description |
|
||||
|------|----------|-------------|
|
||||
| `--prompts / -p` | Yes | Path to prompts JSON file |
|
||||
| `--models / -m` | No* | Path to models JSON file |
|
||||
| `--config / -c` | No* | Path to wolf-config.yaml (alternative to --models) |
|
||||
| `--output / -o` | No | Path to write JSON results |
|
||||
| `--system-prompt` | No | System prompt (default: "You are a helpful assistant.") |
|
||||
### Prompt Evaluation Pipeline (Primary)
|
||||
|
||||
*Either --models or --config is required.
|
||||
```
|
||||
prompts.json + models.json (or wolf-config.yaml)
|
||||
│
|
||||
▼
|
||||
PromptEvaluator.evaluate()
|
||||
│
|
||||
├─ For each (prompt, model) pair:
|
||||
│ ├─ ModelClient.generate(prompt) → response text
|
||||
│ ├─ ResponseScorer.score(response, prompt)
|
||||
│ │ ├─ score_relevance() (0.40 weight)
|
||||
│ │ ├─ score_coherence() (0.35 weight)
|
||||
│ │ └─ score_safety() (0.25 weight)
|
||||
│ └─ EvaluationResult (prompt, model, scores, latency, error)
|
||||
│
|
||||
▼
|
||||
evaluate_and_serialize() → JSON output
|
||||
│
|
||||
├─ model_summaries (per-model averages)
|
||||
└─ results[] (per-evaluation details)
|
||||
```
|
||||
|
||||
### cli.py CLI Flags
|
||||
### Task Assignment Pipeline (Legacy)
|
||||
|
||||
```
|
||||
Gitea Issues → TaskGenerator → AgentRunner
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
Fetch tasks Assign models Execute + PR
|
||||
from issues from config via Gitea API
|
||||
```
|
||||
|
||||
## Key Abstractions
|
||||
|
||||
| Class | Module | Purpose |
|
||||
|-------|--------|---------|
|
||||
| `PromptEntry` | evaluator.py | Single prompt with expected keywords and category |
|
||||
| `ModelEndpoint` | evaluator.py | Model connection descriptor (provider, model_id, key) |
|
||||
| `ScoreResult` | evaluator.py | Scores for relevance, coherence, safety, overall |
|
||||
| `EvaluationResult` | evaluator.py | Full result: prompt + model + response + scores + latency |
|
||||
| `ResponseScorer` | evaluator.py | Heuristic scoring engine (regex + keyword + structure) |
|
||||
| `PromptEvaluator` | evaluator.py | Core engine: runs prompts against models, scores output |
|
||||
| `ModelClient` | models.py | Abstract base for LLM API calls |
|
||||
| `ModelFactory` | models.py | Factory: returns correct client for provider name |
|
||||
| `Task` | task.py | Work unit: id, title, description, assigned model/provider |
|
||||
| `TaskGenerator` | task.py | Creates tasks from Gitea issues or JSON spec |
|
||||
| `AgentRunner` | runner.py | Executes tasks: generate → branch → commit → PR |
|
||||
| `Config` | config.py | YAML config loader (wolf-config.yaml) |
|
||||
| `Leaderboard` | leaderboard.py | Persistent model ranking with serverless readiness |
|
||||
| `GiteaClient` | gitea.py | Full Gitea REST API client |
|
||||
| `PREvaluator` | evaluator.py | Legacy: scores PRs on CI, commits, code quality |
|
||||
|
||||
## API Surface
|
||||
|
||||
### CLI Arguments (cli.py)
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--config` | Path to wolf-config.yaml |
|
||||
| `--task-spec` | Path to task specification JSON |
|
||||
| `--run` | Run pending tasks (fetch issues → generate → PR) |
|
||||
| `--evaluate` | Evaluate open PRs (legacy scoring) |
|
||||
| `--run` | Run pending tasks (assign models, execute, create PRs) |
|
||||
| `--evaluate` | Evaluate open PRs and score them |
|
||||
| `--leaderboard` | Show model rankings |
|
||||
|
||||
---
|
||||
### CLI Arguments (runner.py)
|
||||
|
||||
## Key Abstractions
|
||||
|
||||
### Dataclasses (evaluator.py)
|
||||
|
||||
| Class | Fields | Purpose |
|
||||
|-------|--------|---------|
|
||||
| `PromptEntry` | id, text, expected_keywords, category | A single evaluation prompt with metadata |
|
||||
| `ModelEndpoint` | name, provider, model_id, api_key, base_url | Model connection config |
|
||||
| `ScoreResult` | relevance, coherence, safety, overall, details | Scoring output for one response |
|
||||
| `EvaluationResult` | prompt_id, prompt_text, model_name, ..., scores, error | Complete result of one prompt×model evaluation |
|
||||
|
||||
### Core Classes
|
||||
|
||||
| Class | Module | Responsibility |
|
||||
|-------|--------|----------------|
|
||||
| `ResponseScorer` | evaluator.py | Scores responses on 3 dimensions using regex heuristics |
|
||||
| `PromptEvaluator` | evaluator.py | Orchestrates N×M evaluation matrix |
|
||||
| `ModelClient` | models.py | Abstract base for provider clients |
|
||||
| `ModelFactory` | models.py | Static factory: `get_client(provider, key, url)` |
|
||||
| `GiteaClient` | gitea.py | Full Gitea API wrapper (issues, branches, files, PRs) |
|
||||
| `AgentRunner` | runner.py | Task execution: generate → branch → commit → PR |
|
||||
| `TaskGenerator` | task.py | Converts Gitea issues to evaluable Task dataclasses |
|
||||
| `Leaderboard` | leaderboard.py | Tracks model scores, determines serverless readiness |
|
||||
| `Config` | config.py | Loads wolf-config.yaml, manages logging |
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--prompts` / `-p` | Path to prompts JSON (required) |
|
||||
| `--models` / `-m` | Path to models JSON |
|
||||
| `--config` / `-c` | Path to wolf-config.yaml (alternative to --models) |
|
||||
| `--output` / `-o` | Path to write JSON results |
|
||||
| `--system-prompt` | System prompt for all model calls |
|
||||
|
||||
### Provider Clients (models.py)
|
||||
|
||||
| Class | Provider | API Format |
|
||||
|-------|----------|------------|
|
||||
| Client | Provider | API Format |
|
||||
|--------|----------|------------|
|
||||
| `OpenRouterClient` | openrouter | OpenAI-compatible chat completions |
|
||||
| `GroqClient` | groq | OpenAI-compatible chat completions |
|
||||
| `OllamaClient` | ollama | Ollama native /api/generate |
|
||||
| `AnthropicClient` | anthropic | Anthropic Messages API |
|
||||
| `OpenAIClient` | openai | GroqClient with base_url override |
|
||||
| `OpenAIClient` | openai | OpenAI-compatible (reuses GroqClient with different URL) |
|
||||
| `AnthropicClient` | anthropic | Anthropic Messages API v1 |
|
||||
|
||||
---
|
||||
### Gitea Client (gitea.py)
|
||||
|
||||
## API Surface
|
||||
| Method | Purpose |
|
||||
|--------|---------|
|
||||
| `get_issues()` | Fetch issues by state |
|
||||
| `create_branch()` | Create new branch from base |
|
||||
| `create_file()` | Create file on branch (base64) |
|
||||
| `update_file()` | Update file with SHA |
|
||||
| `get_file()` | Read file contents |
|
||||
| `create_pull_request()` | Open PR |
|
||||
| `get_pull_request()` | Fetch PR details |
|
||||
| `get_pr_status()` | Check PR CI status |
|
||||
|
||||
### Public API (importable)
|
||||
|
||||
```python
|
||||
# Evaluation pipeline
|
||||
from wolf.evaluator import PromptEvaluator, PromptEntry, ModelEndpoint, ScoreResult
|
||||
|
||||
# Provider clients
|
||||
from wolf.models import ModelFactory, ModelClient
|
||||
|
||||
# Gitea integration
|
||||
from wolf.gitea import GiteaClient
|
||||
|
||||
# Task pipeline
|
||||
from wolf.runner import AgentRunner
|
||||
from wolf.task import TaskGenerator, Task
|
||||
|
||||
# Leaderboard
|
||||
from wolf.leaderboard import Leaderboard
|
||||
|
||||
# Config
|
||||
from wolf.config import Config, setup_logging
|
||||
```
|
||||
|
||||
### Scoring Weights
|
||||
|
||||
| Dimension | Weight | Method |
|
||||
|-----------|--------|--------|
|
||||
| Relevance | 0.40 | Keyword matching (60%) + length score (40%) |
|
||||
| Coherence | 0.35 | Length + structure indicators + sentence completeness + uniqueness |
|
||||
| Safety | 0.25 | Unsafe pattern detection + profanity check |
|
||||
| **Overall** | 1.00 | Weighted sum |
|
||||
|
||||
### Scoring Details
|
||||
|
||||
**Relevance (ResponseScorer.score_relevance):**
|
||||
- Expected keyword match ratio
|
||||
- Fallback: word overlap with prompt (boosted 1.5×)
|
||||
- Length penalty: <20 chars → 0.3, <50 chars → 0.6
|
||||
- Refusal detection: 3 regex patterns, penalty if low keyword match
|
||||
|
||||
**Coherence (ResponseScorer.score_coherence):**
|
||||
- Length sweet spot: 100-3000 chars → 1.0
|
||||
- Structure: paragraph breaks, transition words, lists/steps
|
||||
- Sentence completeness: avg 20-200 chars → 0.9
|
||||
- Uniqueness: unique word ratio >0.4 → 0.9
|
||||
|
||||
**Safety (ResponseScorer.score_safety):**
|
||||
- 6 unsafe patterns (weapon creation, system exploitation, prompt injection, etc.)
|
||||
- Profanity detection (minor penalty: 0.1 per word, capped at 0.3)
|
||||
|
||||
---
|
||||
|
||||
## Test Coverage
|
||||
|
||||
### Current Tests
|
||||
|
||||
| Test File | Covers | Status |
|
||||
|-----------|--------|--------|
|
||||
| `test_evaluator.py` | PromptEntry, ModelEndpoint, ScoreResult, ResponseScorer, PromptEvaluator, PREvaluator | ✅ 23 test methods |
|
||||
| `test_config.py` | Config.load | ✅ 1 test method |
|
||||
|
||||
### Coverage Gaps — Untested Modules
|
||||
|
||||
| Module | Risk | Critical Paths |
|
||||
|--------|------|----------------|
|
||||
| `cli.py` | **HIGH** | Argparse wiring, config→models→evaluator pipeline, PR scoring flow |
|
||||
| `runner.py` | **HIGH** | load_prompts, load_models_from_json, load_models_from_config, run_evaluation, AgentRunner.execute_task |
|
||||
| `models.py` | **HIGH** | ModelFactory.get_client for each provider, each client's generate() |
|
||||
| `gitea.py` | **MEDIUM** | All GiteaClient methods (HTTP calls) |
|
||||
| `task.py` | **MEDIUM** | TaskGenerator.from_gitea_issues, from_spec, assign_tasks |
|
||||
| `leaderboard.py` | **LOW** | Leaderboard.record_score, get_rankings, serverless_ready |
|
||||
|
||||
### Coverage Gaps — Existing Tests
|
||||
|
||||
- `test_evaluator.py`: No tests for `PromptEvaluator._get_model_client()`, `_run_single()` with real model call, or `evaluate_and_serialize()` summary statistics
|
||||
- `test_evaluator.py`: No integration test (mocked model calls only)
|
||||
- `test_config.py`: No test for missing config, env var overrides, or logging setup
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
1. **API Keys in Config**: `wolf-config.yaml` stores provider API keys. Never commit to version control. Recommend `~/.hermes/wolf-config.yaml` with restricted permissions.
|
||||
|
||||
2. **HTTP Requests**: All model calls and Gitea API calls are outbound HTTP. No input validation on URLs — `base_url` fields accept arbitrary endpoints.
|
||||
|
||||
3. **Prompt Injection**: ResponseScorer detects injection patterns in *model output*, but Wolf itself is vulnerable to prompt injection via `expected_keywords` or `system_prompt` fields.
|
||||
|
||||
4. **Gitea Token Scope**: GiteaClient uses a single token for all operations. Scoped tokens (read-only for evaluation, write for task execution) would reduce blast radius.
|
||||
|
||||
5. **No TLS Verification Override**: `requests.post()` uses default SSL verification. If self-signed certs are used for local providers (Ollama), this could fail silently.
|
||||
|
||||
6. **Race Conditions**: Leaderboard reads/writes JSON without locking. Concurrent evaluations could corrupt the leaderboard file.
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
```
|
||||
requests # HTTP client for all providers and Gitea
|
||||
pyyaml # Config file parsing (not in requirements.txt — BUG)
|
||||
```
|
||||
|
||||
**⚠️ Missing dependency:** `pyyaml` is imported in `config.py` but not listed in `requirements.txt`.
|
||||
|
||||
---
|
||||
|
||||
## Configuration Schema
|
||||
## Configuration (wolf-config.yaml)
|
||||
|
||||
```yaml
|
||||
# wolf-config.yaml
|
||||
gitea:
|
||||
base_url: "https://forge.example.com/api/v1"
|
||||
token: "gitea_token_here"
|
||||
base_url: "https://forge.alexanderwhitestone.com/api/v1"
|
||||
token: "..."
|
||||
owner: "Timmy_Foundation"
|
||||
repo: "eval-repo"
|
||||
repo: "wolf"
|
||||
|
||||
providers:
|
||||
openrouter:
|
||||
api_key: "sk-or-..."
|
||||
api_key: "..."
|
||||
base_url: "https://openrouter.ai/api/v1"
|
||||
groq:
|
||||
api_key: "gsk_..."
|
||||
ollama:
|
||||
base_url: "http://localhost:11434"
|
||||
|
||||
models:
|
||||
- model: "anthropic/claude-3.5-sonnet"
|
||||
provider: "openrouter"
|
||||
- model: "llama3-70b-8192"
|
||||
provider: "groq"
|
||||
- model: "llama3:70b"
|
||||
- model: "gemma4:latest"
|
||||
provider: "ollama"
|
||||
|
||||
system_prompt: "You are a helpful assistant."
|
||||
log_dir: "~/.hermes/wolf/"
|
||||
leaderboard_path: "~/.hermes/wolf/leaderboard.json"
|
||||
log_dir: "~/.hermes/wolf/logs"
|
||||
```
|
||||
|
||||
## Scoring Methodology
|
||||
|
||||
| Dimension | Weight | Method |
|
||||
|-----------|--------|--------|
|
||||
| Relevance | 0.40 | Keyword matching (expected_keywords), prompt word overlap, length penalty, refusal detection |
|
||||
| Coherence | 0.35 | Structure indicators (paragraphs, lists, transitions), sentence completeness, repetition ratio, length sweet spot |
|
||||
| Safety | 0.25 | Unsafe pattern regex (bomb/hack/bypass/injection), profanity detection |
|
||||
|
||||
**Overall = relevance×0.40 + coherence×0.35 + safety×0.25**
|
||||
|
||||
### Safety Patterns Detected
|
||||
|
||||
- Bomb/explosive/weapon creation
|
||||
- System/network hacking
|
||||
- Security/auth bypass
|
||||
- Prompt injection ("ignore previous instructions")
|
||||
- System prompt extraction attempts
|
||||
|
||||
### Refusal Patterns Detected
|
||||
|
||||
- "I cannot/can't/won't help/assist"
|
||||
- "Sorry, but I cannot"
|
||||
- "Against my guidelines/policy"
|
||||
|
||||
## Test Coverage
|
||||
|
||||
| File | Tests | Coverage |
|
||||
|------|-------|----------|
|
||||
| `tests/test_evaluator.py` | 17 tests | PromptEntry, ModelEndpoint, ResponseScorer (relevance/coherence/safety), PromptEvaluator (evaluate, error handling, serialization, file output, multi-model), PREvaluator (score_pr, description scoring) |
|
||||
| `tests/test_config.py` | 1 test | Config load from YAML |
|
||||
|
||||
### Coverage Gaps
|
||||
|
||||
- No tests for `cli.py` (argument parsing, workflow orchestration)
|
||||
- No tests for `runner.py` (`load_prompts`, `load_models_from_json`, `AgentRunner.execute_task`)
|
||||
- No tests for `task.py` (`TaskGenerator.from_gitea_issues`, `from_spec`, `assign_tasks`)
|
||||
- No tests for `models.py` (API clients — would require mocking HTTP)
|
||||
- No tests for `leaderboard.py` (`record_score`, `get_rankings`, serverless readiness logic)
|
||||
- No tests for `gitea.py` (API client — would require mocking HTTP)
|
||||
- No integration tests (end-to-end evaluation pipeline)
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Dependency | Used By | Purpose |
|
||||
|------------|---------|---------|
|
||||
| `requests` | models.py, gitea.py | HTTP client for all API calls |
|
||||
| `pyyaml` (optional) | config.py | YAML config parsing (falls back to line parser) |
|
||||
|
||||
## Security Considerations
|
||||
|
||||
1. **API keys in config**: wolf-config.yaml stores provider API keys in plaintext. File should be chmod 600 and excluded from git (already in .gitignore pattern via ~/.hermes/).
|
||||
2. **Gitea token**: Full access token used for branch creation, file commits, and PR creation. Scoped access recommended.
|
||||
3. **No input sanitization**: Prompts from Gitea issues are passed directly to models without filtering. Prompt injection risk for automated workflows.
|
||||
4. **No rate limiting**: Model API calls are sequential with no backoff or rate limiting. Could exhaust API quotas.
|
||||
5. **Legacy code reference**: `evaluator.py` references `Evaluator = PREvaluator` alias but `cli.py` imports `Evaluator` expecting the legacy class. This works but is confusing.
|
||||
|
||||
## File Index
|
||||
|
||||
| File | LOC | Purpose |
|
||||
|------|-----|---------|
|
||||
| `wolf/__init__.py` | 12 | Package init, version |
|
||||
| `wolf/cli.py` | 90 | Main CLI orchestrator |
|
||||
| `wolf/config.py` | 48 | YAML config loader |
|
||||
| `wolf/models.py` | 130 | LLM provider clients (5 providers) |
|
||||
| `wolf/runner.py` | 280 | Prompt evaluation CLI + AgentRunner |
|
||||
| `wolf/task.py` | 80 | Task dataclass + generator |
|
||||
| `wolf/evaluator.py` | 350 | Core scoring engine + legacy PR evaluator |
|
||||
| `wolf/leaderboard.py` | 70 | Persistent model ranking |
|
||||
| `wolf/gitea.py` | 100 | Gitea REST API client |
|
||||
| `tests/test_evaluator.py` | 180 | Unit tests for evaluator |
|
||||
| `tests/test_config.py` | 20 | Unit tests for config |
|
||||
|
||||
**Total: ~1,360 LOC Python | 11 modules | 18 tests**
|
||||
|
||||
## Sovereignty Assessment
|
||||
|
||||
- **No external dependencies beyond requests**: Runs on any machine with Python 3.11+ and requests.
|
||||
- **No phone-home**: All API calls are to user-configured endpoints.
|
||||
- **No telemetry**: Logs go to local filesystem only.
|
||||
- **Config-driven**: All secrets in user's ~/.hermes/ directory.
|
||||
- **Provider-agnostic**: Supports 5 providers with easy extension via ModelFactory.
|
||||
|
||||
**Verdict: Fully sovereign. No corporate lock-in. User controls all endpoints and keys.**
|
||||
|
||||
---
|
||||
|
||||
*Generated by Codebase Genome Pipeline. Review and update manually.*
|
||||
*"The strength of the pack is the wolf, and the strength of the wolf is the pack."*
|
||||
*— The Wolf Sovereign Core has spoken.*
|
||||
|
||||
@@ -1,106 +0,0 @@
|
||||
# MemPalace v3.0.0 Integration — Before/After Evaluation
|
||||
|
||||
> Issue #568 | timmy-home
|
||||
> Date: 2026-04-07
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Evaluated **MemPalace v3.0.0** as a memory layer for the Timmy/Hermes agent stack.
|
||||
|
||||
**Installed:** ✅ `mempalace 3.0.0` via `pip install`
|
||||
**Works with:** ChromaDB, MCP servers, local LLMs
|
||||
**Zero cloud:** ✅ Fully local, no API keys required
|
||||
|
||||
## Benchmark Findings
|
||||
|
||||
| Benchmark | Mode | Score | API Required |
|
||||
|-----------|------|-------|-------------|
|
||||
| LongMemEval R@5 | Raw ChromaDB only | **96.6%** | **Zero** |
|
||||
| LongMemEval R@5 | Hybrid + Haiku rerank | **100%** | Optional Haiku |
|
||||
| LoCoMo R@10 | Raw, session level | 60.3% | Zero |
|
||||
| Personal palace R@10 | Heuristic bench | 85% | Zero |
|
||||
| Palace structure impact | Wing+room filtering | **+34%** R@10 | Zero |
|
||||
|
||||
## Before vs After (Live Test)
|
||||
|
||||
### Before (Standard BM25 / Simple Search)
|
||||
|
||||
- No semantic understanding
|
||||
- Exact match only
|
||||
- No conversation memory
|
||||
- No structured organization
|
||||
- No wake-up context
|
||||
|
||||
### After (MemPalace)
|
||||
|
||||
| Query | Results | Score | Notes |
|
||||
|-------|---------|-------|-------|
|
||||
| "authentication" | auth.md, main.py | -0.139 | Finds both auth discussion and JWT implementation |
|
||||
| "docker nginx SSL" | deployment.md, auth.md | 0.447 | Exact match on deployment, related JWT context |
|
||||
| "keycloak OAuth" | auth.md, main.py | -0.029 | Finds OAuth discussion and JWT usage |
|
||||
| "postgresql database" | README.md, main.py | 0.025 | Finds both decision and implementation |
|
||||
|
||||
### Wake-up Context
|
||||
- **~210 tokens** total
|
||||
- L0: Identity (placeholder)
|
||||
- L1: All essential facts compressed
|
||||
- Ready to inject into any LLM prompt
|
||||
|
||||
## Integration Path
|
||||
|
||||
### 1. Memory Mining
|
||||
```bash
|
||||
mempalace mine ~/.hermes/sessions/ --mode convos
|
||||
mempalace mine ~/.hermes/hermes-agent/
|
||||
mempalace mine ~/.hermes/
|
||||
```
|
||||
|
||||
### 2. Wake-up Protocol
|
||||
```bash
|
||||
mempalace wake-up > /tmp/timmy-context.txt
|
||||
```
|
||||
|
||||
### 3. MCP Integration
|
||||
```bash
|
||||
hermes mcp add mempalace -- python -m mempalace.mcp_server
|
||||
```
|
||||
|
||||
### 4. Hermes Hooks
|
||||
- `PreCompact`: save memory before context compression
|
||||
- `PostAPI`: mine conversation after significant interactions
|
||||
- `WakeUp`: load context at session start
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Immediate
|
||||
1. Add `mempalace` to Hermes venv requirements
|
||||
2. Create mine script for ~/.hermes/ and ~/.timmy/
|
||||
3. Add wake-up hook to Hermes session start
|
||||
4. Test with real conversation exports
|
||||
|
||||
### Short-term
|
||||
1. Mine last 30 days of Timmy sessions
|
||||
2. Build wake-up context for all agents
|
||||
3. Add MemPalace MCP tools to Hermes toolset
|
||||
4. Test retrieval quality on real queries
|
||||
|
||||
### Medium-term
|
||||
1. Replace homebrew memory system with MemPalace
|
||||
2. Build palace structure: wings for projects, halls for topics
|
||||
3. Compress with AAAK for 30x storage efficiency
|
||||
4. Benchmark against current RetainDB system
|
||||
|
||||
## Conclusion
|
||||
|
||||
MemPalace scores higher than published alternatives (Mem0, Mastra, Supermemory) with **zero API calls**.
|
||||
|
||||
Key advantages:
|
||||
1. **Verbatim retrieval** — never loses the "why" context
|
||||
2. **Palace structure** — +34% boost from organization
|
||||
3. **Local-only** — aligns with sovereignty mandate
|
||||
4. **MCP compatible** — drops into existing tool chain
|
||||
5. **AAAK compression** — 30x storage reduction coming
|
||||
|
||||
---
|
||||
|
||||
*Evaluated by Timmy | Issue #568*
|
||||
138
scripts/audit_trail.py
Executable file
138
scripts/audit_trail.py
Executable file
@@ -0,0 +1,138 @@
|
||||
#!/usr/bin/env python3
|
||||
# audit_trail.py - Local logging of inputs, sources, and confidence.
|
||||
# Implements SOUL.md "What Honesty Requires" - The Audit Trail.
|
||||
# Logs are stored locally. Never sent anywhere. The user owns them.
|
||||
# Part of #794
|
||||
|
||||
import json
|
||||
import hashlib
|
||||
import os
|
||||
import time
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
from dataclasses import dataclass, field, asdict
|
||||
|
||||
AUDIT_DIR = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes")) / "audit-trail"
|
||||
|
||||
|
||||
@dataclass
|
||||
class AuditEntry:
|
||||
id: str
|
||||
ts: str
|
||||
input_text: str
|
||||
sources: List[str]
|
||||
confidence: float
|
||||
output_text: str
|
||||
model: str
|
||||
provider: str = ""
|
||||
session_id: str = ""
|
||||
source_types: List[str] = field(default_factory=list)
|
||||
|
||||
@staticmethod
|
||||
def generate_id(input_text: str, output_text: str, ts: str) -> str:
|
||||
content = f"{ts}:{input_text}:{output_text}"
|
||||
return hashlib.sha256(content.encode()).hexdigest()[:16]
|
||||
|
||||
|
||||
class AuditTrail:
|
||||
def __init__(self, audit_dir: Optional[Path] = None):
|
||||
self.audit_dir = audit_dir or AUDIT_DIR
|
||||
self.audit_dir.mkdir(parents=True, exist_ok=True)
|
||||
self._log_file = self.audit_dir / "trail.jsonl"
|
||||
|
||||
def log_response(self, input_text, sources, confidence, output_text,
|
||||
model="", provider="", session_id="", source_types=None):
|
||||
ts = datetime.now(timezone.utc).isoformat()
|
||||
entry = AuditEntry(
|
||||
id=AuditEntry.generate_id(input_text, output_text, ts),
|
||||
ts=ts,
|
||||
input_text=input_text[:1000],
|
||||
sources=[s[:200] for s in sources[:10]],
|
||||
confidence=round(confidence, 3),
|
||||
output_text=output_text[:2000],
|
||||
model=model, provider=provider, session_id=session_id,
|
||||
source_types=source_types or [],
|
||||
)
|
||||
with open(self._log_file, "a") as f:
|
||||
f.write(json.dumps(asdict(entry)) + "\n")
|
||||
return entry
|
||||
|
||||
def query(self, search_text, limit=10, min_confidence=0.0):
|
||||
if not self._log_file.exists():
|
||||
return []
|
||||
results = []
|
||||
search_lower = search_text.lower()
|
||||
with open(self._log_file) as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
try:
|
||||
data = json.loads(line)
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
if data.get("confidence", 0) < min_confidence:
|
||||
continue
|
||||
searchable = (data.get("input_text", "") + " " +
|
||||
data.get("output_text", "") + " " +
|
||||
" ".join(data.get("sources", []))).lower()
|
||||
if search_lower in searchable:
|
||||
results.append(AuditEntry(**{k: data.get(k, "") if isinstance(data.get(k), str)
|
||||
else data.get(k, []) if isinstance(data.get(k), list)
|
||||
else data.get(k, 0.0) for k in AuditEntry.__dataclass_fields__}))
|
||||
if len(results) >= limit:
|
||||
break
|
||||
return results
|
||||
|
||||
def get_stats(self):
|
||||
if not self._log_file.exists():
|
||||
return {"total": 0, "avg_confidence": 0, "sources_breakdown": {}}
|
||||
total = 0
|
||||
confidence_sum = 0.0
|
||||
source_types = {}
|
||||
with open(self._log_file) as f:
|
||||
for line in f:
|
||||
try:
|
||||
data = json.loads(line.strip())
|
||||
total += 1
|
||||
confidence_sum += data.get("confidence", 0)
|
||||
for st in data.get("source_types", []):
|
||||
source_types[st] = source_types.get(st, 0) + 1
|
||||
except (json.JSONDecodeError, ValueError):
|
||||
continue
|
||||
return {"total": total, "avg_confidence": round(confidence_sum / max(total, 1), 3),
|
||||
"sources_breakdown": source_types}
|
||||
|
||||
def get_by_session(self, session_id, limit=50):
|
||||
if not self._log_file.exists():
|
||||
return []
|
||||
results = []
|
||||
with open(self._log_file) as f:
|
||||
for line in f:
|
||||
try:
|
||||
data = json.loads(line.strip())
|
||||
if data.get("session_id") == session_id:
|
||||
results.append(AuditEntry(**{k: data.get(k, "") if isinstance(data.get(k), str)
|
||||
else data.get(k, []) if isinstance(data.get(k), list)
|
||||
else data.get(k, 0.0) for k in AuditEntry.__dataclass_fields__}))
|
||||
except (json.JSONDecodeError, ValueError):
|
||||
continue
|
||||
if len(results) >= limit:
|
||||
break
|
||||
return results
|
||||
|
||||
|
||||
_default_trail = None
|
||||
|
||||
def get_trail():
|
||||
global _default_trail
|
||||
if _default_trail is None:
|
||||
_default_trail = AuditTrail()
|
||||
return _default_trail
|
||||
|
||||
def log_response(**kwargs):
|
||||
return get_trail().log_response(**kwargs)
|
||||
|
||||
def query(search_text, **kwargs):
|
||||
return get_trail().query(search_text, **kwargs)
|
||||
84
scripts/fix_evennia_settings.sh
Executable file
84
scripts/fix_evennia_settings.sh
Executable file
@@ -0,0 +1,84 @@
|
||||
#!/bin/bash
|
||||
set -euo pipefail
|
||||
#
|
||||
# fix_evennia_settings.sh — Fix Evennia settings on Bezalel VPS.
|
||||
#
|
||||
# Removes bad port tuples that crash Evennia's Twisted port binding.
|
||||
# Run on Bezalel VPS (104.131.15.18) or via SSH.
|
||||
#
|
||||
# Usage:
|
||||
# ssh root@104.131.15.18 'bash -s' < scripts/fix_evennia_settings.sh
|
||||
#
|
||||
# Part of #534
|
||||
|
||||
EVENNIA_DIR="/root/wizards/bezalel/evennia/bezalel_world"
|
||||
SETTINGS="${EVENNIA_DIR}/server/conf/settings.py"
|
||||
VENV_PYTHON="/root/wizards/bezalel/evennia/venv/bin/python3"
|
||||
VENV_EVENNIA="/root/wizards/bezalel/evennia/venv/bin/evennia"
|
||||
|
||||
echo "=== Fix Evennia Settings (Bezalel) ==="
|
||||
|
||||
# 1. Fix settings.py — remove bad port tuples
|
||||
echo "Fixing settings.py..."
|
||||
if [ -f "$SETTINGS" ]; then
|
||||
# Remove broken port lines
|
||||
sed -i '/WEBSERVER_PORTS/d' "$SETTINGS"
|
||||
sed -i '/TELNET_PORTS/d' "$SETTINGS"
|
||||
sed -i '/WEBSOCKET_PORTS/d' "$SETTINGS"
|
||||
sed -i '/SERVERNAME/d' "$SETTINGS"
|
||||
|
||||
# Add correct settings
|
||||
echo '' >> "$SETTINGS"
|
||||
echo '# Fixed port settings — #534' >> "$SETTINGS"
|
||||
echo 'SERVERNAME = "bezalel_world"' >> "$SETTINGS"
|
||||
echo 'WEBSERVER_PORTS = [(4001, "0.0.0.0")]' >> "$SETTINGS"
|
||||
echo 'TELNET_PORTS = [(4000, "0.0.0.0")]' >> "$SETTINGS"
|
||||
echo 'WEBSOCKET_PORTS = [(4002, "0.0.0.0")]' >> "$SETTINGS"
|
||||
|
||||
echo "Settings fixed."
|
||||
else
|
||||
echo "ERROR: Settings file not found at $SETTINGS"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 2. Clean DB and re-migrate
|
||||
echo "Cleaning DB..."
|
||||
cd "$EVENNIA_DIR"
|
||||
rm -f server/evennia.db3
|
||||
|
||||
echo "Running migrations..."
|
||||
"$VENV_EVENNIA" migrate --no-input
|
||||
|
||||
# 3. Create superuser
|
||||
echo "Creating superuser..."
|
||||
"$VENV_PYTHON" -c "
|
||||
import sys, os
|
||||
sys.setrecursionlimit(5000)
|
||||
os.environ['DJANGO_SETTINGS_MODULE'] = 'server.conf.settings'
|
||||
os.chdir('$EVENNIA_DIR')
|
||||
import django
|
||||
django.setup()
|
||||
from evennia.accounts.accounts import AccountDB
|
||||
try:
|
||||
AccountDB.objects.create_superuser('Timmy', 'timmy@tower.world', 'timmy123')
|
||||
print('Superuser Timmy created')
|
||||
except Exception as e:
|
||||
print(f'Superuser may already exist: {e}')
|
||||
"
|
||||
|
||||
# 4. Start Evennia
|
||||
echo "Starting Evennia..."
|
||||
"$VENV_EVENNIA" start
|
||||
|
||||
# 5. Verify
|
||||
sleep 3
|
||||
echo ""
|
||||
echo "=== Verification ==="
|
||||
"$VENV_EVENNIA" status
|
||||
|
||||
echo ""
|
||||
echo "Listening ports:"
|
||||
ss -tlnp | grep -E '400[012]' || echo "No ports found (may need a moment)"
|
||||
|
||||
echo ""
|
||||
echo "Done. Connect: telnet 104.131.15.18 4000"
|
||||
171
scripts/genome_analyzer.py
Executable file
171
scripts/genome_analyzer.py
Executable file
@@ -0,0 +1,171 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
genome_analyzer.py — Generate a GENOME.md from a codebase.
|
||||
|
||||
Scans a repository and produces a structured codebase genome with:
|
||||
- File counts by type
|
||||
- Architecture overview (directory structure)
|
||||
- Entry points
|
||||
- Test coverage summary
|
||||
|
||||
Usage:
|
||||
python3 scripts/genome_analyzer.py /path/to/repo
|
||||
python3 scripts/genome_analyzer.py /path/to/repo --output GENOME.md
|
||||
python3 scripts/genome_analyzer.py /path/to/repo --dry-run
|
||||
|
||||
Part of #666: GENOME.md Template + Single-Repo Analyzer.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Tuple
|
||||
|
||||
SKIP_DIRS = {".git", "__pycache__", ".venv", "venv", "node_modules", ".tox", ".pytest_cache", ".DS_Store"}
|
||||
|
||||
|
||||
def count_files(repo_path: Path) -> Dict[str, int]:
|
||||
counts = defaultdict(int)
|
||||
for f in repo_path.rglob("*"):
|
||||
if any(part in SKIP_DIRS for part in f.parts):
|
||||
continue
|
||||
if f.is_file():
|
||||
ext = f.suffix or "(no ext)"
|
||||
counts[ext] += 1
|
||||
return dict(sorted(counts.items(), key=lambda x: -x[1]))
|
||||
|
||||
|
||||
def find_entry_points(repo_path: Path) -> List[str]:
|
||||
entry_points = []
|
||||
candidates = [
|
||||
"main.py", "app.py", "server.py", "cli.py", "manage.py",
|
||||
"index.html", "index.js", "index.ts",
|
||||
"Makefile", "Dockerfile", "docker-compose.yml",
|
||||
"README.md", "deploy.sh", "setup.py", "pyproject.toml",
|
||||
]
|
||||
for name in candidates:
|
||||
if (repo_path / name).exists():
|
||||
entry_points.append(name)
|
||||
scripts_dir = repo_path / "scripts"
|
||||
if scripts_dir.is_dir():
|
||||
for f in sorted(scripts_dir.iterdir()):
|
||||
if f.suffix in (".py", ".sh") and not f.name.startswith("test_"):
|
||||
entry_points.append(f"scripts/{f.name}")
|
||||
return entry_points[:15]
|
||||
|
||||
|
||||
def find_tests(repo_path: Path) -> Tuple[List[str], int]:
|
||||
test_files = []
|
||||
for f in repo_path.rglob("*"):
|
||||
if any(part in SKIP_DIRS for part in f.parts):
|
||||
continue
|
||||
if f.is_file() and (f.name.startswith("test_") or f.name.endswith("_test.py") or f.name.endswith("_test.js")):
|
||||
test_files.append(str(f.relative_to(repo_path)))
|
||||
return sorted(test_files), len(test_files)
|
||||
|
||||
|
||||
def find_directories(repo_path: Path, max_depth: int = 2) -> List[str]:
|
||||
dirs = []
|
||||
for d in sorted(repo_path.rglob("*")):
|
||||
if d.is_dir() and len(d.relative_to(repo_path).parts) <= max_depth:
|
||||
if not any(part in SKIP_DIRS for part in d.parts):
|
||||
rel = str(d.relative_to(repo_path))
|
||||
if rel != ".":
|
||||
dirs.append(rel)
|
||||
return dirs[:30]
|
||||
|
||||
|
||||
def read_readme(repo_path: Path) -> str:
|
||||
for name in ["README.md", "README.rst", "README.txt", "README"]:
|
||||
readme = repo_path / name
|
||||
if readme.exists():
|
||||
lines = readme.read_text(encoding="utf-8", errors="replace").split("\n")
|
||||
para = []
|
||||
started = False
|
||||
for line in lines:
|
||||
if line.startswith("#") and not started:
|
||||
continue
|
||||
if line.strip():
|
||||
started = True
|
||||
para.append(line.strip())
|
||||
elif started:
|
||||
break
|
||||
return " ".join(para[:5])
|
||||
return "(no README found)"
|
||||
|
||||
|
||||
def generate_genome(repo_path: Path, repo_name: str = "") -> str:
|
||||
if not repo_name:
|
||||
repo_name = repo_path.name
|
||||
date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
|
||||
readme_desc = read_readme(repo_path)
|
||||
file_counts = count_files(repo_path)
|
||||
total_files = sum(file_counts.values())
|
||||
entry_points = find_entry_points(repo_path)
|
||||
test_files, test_count = find_tests(repo_path)
|
||||
dirs = find_directories(repo_path)
|
||||
|
||||
lines = [
|
||||
f"# GENOME.md — {repo_name}", "",
|
||||
f"> Codebase analysis generated {date}. {readme_desc[:100]}.", "",
|
||||
"## Project Overview", "",
|
||||
readme_desc, "",
|
||||
f"**{total_files} files** across {len(file_counts)} file types.", "",
|
||||
"## Architecture", "",
|
||||
"```",
|
||||
]
|
||||
for d in dirs[:20]:
|
||||
lines.append(f" {d}/")
|
||||
lines.append("```")
|
||||
lines += ["", "### File Types", "", "| Type | Count |", "|------|-------|"]
|
||||
for ext, count in list(file_counts.items())[:15]:
|
||||
lines.append(f"| {ext} | {count} |")
|
||||
lines += ["", "## Entry Points", ""]
|
||||
for ep in entry_points:
|
||||
lines.append(f"- `{ep}`")
|
||||
lines += ["", "## Test Coverage", "", f"**{test_count} test files** found.", ""]
|
||||
if test_files:
|
||||
for tf in test_files[:10]:
|
||||
lines.append(f"- `{tf}`")
|
||||
if len(test_files) > 10:
|
||||
lines.append(f"- ... and {len(test_files) - 10} more")
|
||||
else:
|
||||
lines.append("No test files found.")
|
||||
lines += ["", "## Security Considerations", "", "(To be filled during analysis)", ""]
|
||||
lines += ["## Design Decisions", "", "(To be filled during analysis)", ""]
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Generate GENOME.md from a codebase")
|
||||
parser.add_argument("repo_path", help="Path to repository")
|
||||
parser.add_argument("--output", default="", help="Output file (default: stdout)")
|
||||
parser.add_argument("--name", default="", help="Repository name")
|
||||
parser.add_argument("--dry-run", action="store_true", help="Print stats only")
|
||||
args = parser.parse_args()
|
||||
repo_path = Path(args.repo_path).resolve()
|
||||
if not repo_path.is_dir():
|
||||
print(f"ERROR: {repo_path} is not a directory", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
repo_name = args.name or repo_path.name
|
||||
if args.dry_run:
|
||||
counts = count_files(repo_path)
|
||||
_, test_count = find_tests(repo_path)
|
||||
print(f"Repo: {repo_name}")
|
||||
print(f"Total files: {sum(counts.values())}")
|
||||
print(f"Test files: {test_count}")
|
||||
print(f"Top types: {', '.join(f'{k}={v}' for k,v in list(counts.items())[:5])}")
|
||||
sys.exit(0)
|
||||
genome = generate_genome(repo_path, repo_name)
|
||||
if args.output:
|
||||
with open(args.output, "w") as f:
|
||||
f.write(genome)
|
||||
print(f"Written: {args.output}")
|
||||
else:
|
||||
print(genome)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
46
templates/GENOME-template.md
Normal file
46
templates/GENOME-template.md
Normal file
@@ -0,0 +1,46 @@
|
||||
# GENOME.md — {{REPO_NAME}}
|
||||
|
||||
> Codebase analysis generated {{DATE}}. {{SHORT_DESCRIPTION}}.
|
||||
|
||||
## Project Overview
|
||||
|
||||
{{OVERVIEW}}
|
||||
|
||||
## Architecture
|
||||
|
||||
{{ARCHITECTURE_DIAGRAM}}
|
||||
|
||||
## Entry Points
|
||||
|
||||
{{ENTRY_POINTS}}
|
||||
|
||||
## Data Flow
|
||||
|
||||
{{DATA_FLOW}}
|
||||
|
||||
## Key Abstractions
|
||||
|
||||
{{ABSTRACTIONS}}
|
||||
|
||||
## API Surface
|
||||
|
||||
{{API_SURFACE}}
|
||||
|
||||
## Test Coverage
|
||||
|
||||
### Existing Tests
|
||||
{{EXISTING_TESTS}}
|
||||
|
||||
### Coverage Gaps
|
||||
{{COVERAGE_GAPS}}
|
||||
|
||||
### Critical paths that need tests:
|
||||
{{CRITICAL_PATHS}}
|
||||
|
||||
## Security Considerations
|
||||
|
||||
{{SECURITY}}
|
||||
|
||||
## Design Decisions
|
||||
|
||||
{{DESIGN_DECISIONS}}
|
||||
88
tests/test_audit_trail.py
Normal file
88
tests/test_audit_trail.py
Normal file
@@ -0,0 +1,88 @@
|
||||
"""Tests for audit trail — SOUL.md compliance."""
|
||||
import json
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
class TestAuditTrail:
|
||||
def test_log_and_query(self, tmp_path):
|
||||
from scripts.audit_trail import AuditTrail
|
||||
trail = AuditTrail(audit_dir=tmp_path)
|
||||
|
||||
trail.log_response(
|
||||
input_text="What is Python?",
|
||||
sources=["web_search:Python is a programming language"],
|
||||
confidence=0.9,
|
||||
output_text="Python is a programming language.",
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
results = trail.query("Python")
|
||||
assert len(results) == 1
|
||||
assert results[0].confidence == 0.9
|
||||
assert "Python" in results[0].output_text
|
||||
|
||||
def test_query_no_match(self, tmp_path):
|
||||
from scripts.audit_trail import AuditTrail
|
||||
trail = AuditTrail(audit_dir=tmp_path)
|
||||
|
||||
trail.log_response(
|
||||
input_text="What is Rust?",
|
||||
sources=[],
|
||||
confidence=0.8,
|
||||
output_text="Rust is a systems language.",
|
||||
)
|
||||
|
||||
results = trail.query("Python")
|
||||
assert len(results) == 0
|
||||
|
||||
def test_confidence_filter(self, tmp_path):
|
||||
from scripts.audit_trail import AuditTrail
|
||||
trail = AuditTrail(audit_dir=tmp_path)
|
||||
|
||||
trail.log_response(input_text="test", sources=[], confidence=0.3, output_text="low conf")
|
||||
trail.log_response(input_text="test", sources=[], confidence=0.95, output_text="high conf")
|
||||
|
||||
high_only = trail.query("test", min_confidence=0.5)
|
||||
assert len(high_only) == 1
|
||||
assert high_only[0].confidence == 0.95
|
||||
|
||||
def test_stats(self, tmp_path):
|
||||
from scripts.audit_trail import AuditTrail
|
||||
trail = AuditTrail(audit_dir=tmp_path)
|
||||
|
||||
trail.log_response(input_text="a", sources=[], confidence=0.8, output_text="b")
|
||||
trail.log_response(input_text="c", sources=[], confidence=0.6, output_text="d")
|
||||
|
||||
stats = trail.get_stats()
|
||||
assert stats["total"] == 2
|
||||
assert stats["avg_confidence"] == 0.7
|
||||
|
||||
def test_session_filter(self, tmp_path):
|
||||
from scripts.audit_trail import AuditTrail
|
||||
trail = AuditTrail(audit_dir=tmp_path)
|
||||
|
||||
trail.log_response(input_text="a", sources=[], confidence=0.9, output_text="b", session_id="s1")
|
||||
trail.log_response(input_text="c", sources=[], confidence=0.9, output_text="d", session_id="s2")
|
||||
|
||||
s1_results = trail.get_by_session("s1")
|
||||
assert len(s1_results) == 1
|
||||
|
||||
def test_empty_trail(self, tmp_path):
|
||||
from scripts.audit_trail import AuditTrail
|
||||
trail = AuditTrail(audit_dir=tmp_path)
|
||||
|
||||
assert trail.query("anything") == []
|
||||
assert trail.get_stats()["total"] == 0
|
||||
|
||||
def test_content_addressed_id(self):
|
||||
from scripts.audit_trail import AuditEntry
|
||||
id1 = AuditEntry.generate_id("input", "output", "2026-01-01")
|
||||
id2 = AuditEntry.generate_id("input", "output", "2026-01-01")
|
||||
id3 = AuditEntry.generate_id("different", "output", "2026-01-01")
|
||||
|
||||
assert id1 == id2 # same content = same ID
|
||||
assert id1 != id3 # different content = different ID
|
||||
@@ -1,56 +0,0 @@
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
GENOME = Path("GENOME.md")
|
||||
|
||||
|
||||
def read_genome() -> str:
|
||||
assert GENOME.exists(), "GENOME.md must exist at repo root"
|
||||
return GENOME.read_text(encoding="utf-8")
|
||||
|
||||
|
||||
def test_the_nexus_genome_has_required_sections() -> None:
|
||||
text = read_genome()
|
||||
required = [
|
||||
"# GENOME.md — the-nexus",
|
||||
"## Project Overview",
|
||||
"## Architecture Diagram",
|
||||
"```mermaid",
|
||||
"## Entry Points and Data Flow",
|
||||
"## Key Abstractions",
|
||||
"## API Surface",
|
||||
"## Test Coverage Gaps",
|
||||
"## Security Considerations",
|
||||
"## Runtime Truth and Docs Drift",
|
||||
]
|
||||
missing = [item for item in required if item not in text]
|
||||
assert not missing, missing
|
||||
|
||||
|
||||
def test_the_nexus_genome_captures_current_runtime_contract() -> None:
|
||||
text = read_genome()
|
||||
required = [
|
||||
"server.py",
|
||||
"app.js",
|
||||
"index.html",
|
||||
"portals.json",
|
||||
"vision.json",
|
||||
"BROWSER_CONTRACT.md",
|
||||
"tests/test_browser_smoke.py",
|
||||
"tests/test_repo_truth.py",
|
||||
"nexus/morrowind_harness.py",
|
||||
"nexus/bannerlord_harness.py",
|
||||
"mempalace/tunnel_sync.py",
|
||||
"mcp_servers/desktop_control_server.py",
|
||||
"public/nexus/",
|
||||
]
|
||||
missing = [item for item in required if item not in text]
|
||||
assert not missing, missing
|
||||
|
||||
|
||||
def test_the_nexus_genome_explains_docs_runtime_drift() -> None:
|
||||
text = read_genome()
|
||||
assert "README.md says current `main` does not ship a browser 3D world" in text
|
||||
assert "CLAUDE.md declares root `app.js` and `index.html` as canonical frontend paths" in text
|
||||
assert "tests and browser contract now assume the root frontend exists" in text
|
||||
assert len(text) >= 5000
|
||||
@@ -1,83 +0,0 @@
|
||||
"""
|
||||
test_wolf_genome.py — lock the current wolf-genome artifact in timmy-home.
|
||||
|
||||
Verifies that genomes/wolf/GENOME.md exists and contains the refreshed content
|
||||
against the current Timmy_Foundation/wolf repo.
|
||||
"""
|
||||
from pathlib import Path
|
||||
|
||||
GENOME = Path("genomes/wolf/GENOME.md")
|
||||
|
||||
|
||||
def read_genome() -> str:
|
||||
assert GENOME.exists(), "wolf genome must exist at genomes/wolf/GENOME.md"
|
||||
return GENOME.read_text(encoding="utf-8")
|
||||
|
||||
|
||||
def test_genome_exists():
|
||||
assert GENOME.exists(), "wolf genome must exist at genomes/wolf/GENOME.md"
|
||||
|
||||
|
||||
def test_genome_has_required_sections():
|
||||
text = read_genome()
|
||||
for heading in [
|
||||
"# GENOME.md",
|
||||
"## Project Overview",
|
||||
"## Architecture",
|
||||
"## Entry Points",
|
||||
"## Key Abstractions",
|
||||
"## API Surface",
|
||||
"## Test Coverage",
|
||||
"## Security Considerations",
|
||||
]:
|
||||
assert heading in text, f"Missing section: {heading}"
|
||||
|
||||
|
||||
def test_genome_contains_mermaid_diagram():
|
||||
text = read_genome()
|
||||
assert "```mermaid" in text, "GENOME.md must contain a mermaid diagram"
|
||||
assert "flowchart" in text.lower() or "graph" in text.lower()
|
||||
|
||||
|
||||
def test_genome_captures_current_test_files():
|
||||
"""Verify the genome documents the test_evaluator and test_config modules."""
|
||||
text = read_genome()
|
||||
for test_name in ["test_evaluator.py", "test_config.py"]:
|
||||
assert test_name in text, f"Missing test surface entry: {test_name}"
|
||||
|
||||
|
||||
def test_genome_mentions_core_modules():
|
||||
text = read_genome()
|
||||
for module in [
|
||||
"evaluator.py",
|
||||
"models.py",
|
||||
"runner.py",
|
||||
"gitea.py",
|
||||
"config.py",
|
||||
"cli.py",
|
||||
]:
|
||||
assert module in text, f"Missing core module: {module}"
|
||||
|
||||
|
||||
def test_genome_mentions_providers():
|
||||
text = read_genome()
|
||||
for provider in ["OpenRouter", "Groq", "Ollama", "Anthropic", "OpenAI"]:
|
||||
assert provider in text, f"Missing provider: {provider}"
|
||||
|
||||
|
||||
def test_genome_is_substantial():
|
||||
text = read_genome()
|
||||
assert len(text) >= 5000, "GENOME.md should be substantial (>= 5000 chars)"
|
||||
|
||||
|
||||
def test_genome_mentions_data_flow():
|
||||
text = read_genome()
|
||||
assert "Prompt Evaluation" in text
|
||||
assert "Task Pipeline" in text or "Legacy" in text
|
||||
|
||||
|
||||
def test_genome_has_scoring_weights():
|
||||
text = read_genome()
|
||||
assert "relevance" in text.lower()
|
||||
assert "coherence" in text.lower()
|
||||
assert "safety" in text.lower()
|
||||
Reference in New Issue
Block a user