feat: Codebase Genome — the-nexus full analysis (#672 )

Complete GENOME.md for the-nexus repository: - Project overview (3D world, Three.js, portal architecture) - Architecture diagram (Mermaid) - Entry points and data flow - Key abstractions (NEXUS, SpatialMemory, Portal System, GOFAI) - API surface (internal + external) - Dependencies (Three.js, Python WebSocket) - Test coverage gaps (7 critical paths untested) - Security considerations (WebSocket auth, localStorage) - Technical debt (4082-line app.js, no TypeScript) - Migration notes from CLAUDE.md Closes #672
2026-04-14 22:33:42 -04:00
2 changed files with 319 additions and 184 deletions
--- a/GENOME.md
+++ b/GENOME.md
@@ -1,184 +0,0 @@
-# GENOME.md — compounding-intelligence
-
-> Auto-generated codebase genome. Repo 9/16 in the Codebase Genome series.
-
-## Project Overview
-
-**compounding-intelligence** turns 1B+ daily tokens into durable, compounding fleet intelligence. It solves the core problem of AI agent amnesia: every session starts at zero, rediscovering the same facts, pitfalls, and patterns that previous sessions already learned.
-
-The project implements three pipelines forming a compounding loop:
-
-```
-SESSION ENDS --> HARVESTER --> KNOWLEDGE STORE --> BOOTSTRAPPER --> NEW SESSION STARTS SMARTER
-                                      |
-                                 MEASURER --> Prove it's working
-```
-
-**Key insight**: Intelligence from a million tokens of work evaporates when the session ends. This project captures it, stores it, and injects it into future sessions so they start smarter.
-
-## Architecture
-
-```mermaid
-graph LR
-    A[Session Transcripts] -->|Harvester| B[Knowledge Store]
-    B -->|Bootstrapper| C[New Session Context]
-    C --> D[Agent Work]
-    D --> A
-    B -->|Measurer| E[Dashboard]
-    E -->|Metrics| F[Proof of Compounding]
-
-    subgraph Knowledge Store
-        B1[index.json]
-        B2[global/]
-        B3[repos/{repo}.md]
-        B4[agents/{agent}.md]
-    end
-```
-
-### Pipeline 1: Harvester
- **Input**: Finished session transcripts (JSONL format)
- **Process**: LLM extracts durable knowledge using structured prompt
- **Output**: Facts stored in `knowledge/` directory
- **Categories**: fact, pitfall, pattern, tool-quirk, question
- **Deduplication**: Content-hash based, existing knowledge has priority
-
-### Pipeline 2: Bootstrapper
- **Input**: `knowledge/` store
- **Process**: Queries for relevant facts, assembles compact 2k-token context
- **Output**: Injected context at session start
- **Goal**: New sessions start with full situational awareness
-
-### Pipeline 3: Measurer
- **Input**: Knowledge store + session metrics
- **Process**: Tracks knowledge velocity, error reduction, hit rate
- **Output**: Dashboard.md + daily reports
- **Goal**: Prove the compounding loop works
-
-## Directory Structure
-
-```
-compounding-intelligence/
-|-- README.md                          # Project overview and roadmap
-|-- knowledge/
-|   |-- index.json                     # Machine-readable fact index (versioned)
-|   |-- global/                        # Cross-repo knowledge
-|   |-- repos/{repo}.md                # Per-repo knowledge files
-|   |-- agents/{agent}.md              # Agent-type notes
-|-- scripts/
-|   |-- test_harvest_prompt.py         # Validation for harvest prompt output
-|   |-- test_harvest_prompt_comprehensive.py  # Extended test suite
-|-- templates/
-|   |-- harvest-prompt.md             # LLM prompt for knowledge extraction
-|-- metrics/
-|   |-- .gitkeep                      # Placeholder for dashboard
-|-- test_sessions/
-|   |-- session_failure.jsonl         # Test data: failed session
-|   |-- session_partial.jsonl         # Test data: partial session
-|   |-- session_patterns.jsonl        # Test data: pattern extraction
-|   |-- session_questions.jsonl       # Test data: question identification
-|   |-- session_success.jsonl         # Test data: successful session
-```
-
-## Entry Points
-
-| File | Purpose | Entry |
-|------|---------|-------|
-| `templates/harvest-prompt.md` | Extraction prompt | LLM input template |
-| `scripts/test_harvest_prompt.py` | Validation | `python3 test_harvest_prompt.py` |
-| `knowledge/index.json` | Data store | Read/write by all pipelines |
-
-## Data Flow
-
-```
-1. Agent completes session -> session transcript (JSONL)
-2. Harvester reads transcript
-3. LLM processes via harvest-prompt.md template
-4. Extracted knowledge validated against schema
-5. Deduplicated against existing index.json
-6. New facts appended with source attribution
-7. Bootstrapper queries index.json for relevant facts
-8. Context injected into next session
-9. Measurer tracks velocity and quality metrics
-```
-
-## Knowledge Schema
-
-Each knowledge item in `index.json`:
-
-```json
-{
-  "fact": "One sentence description",
-  "category": "fact|pitfall|pattern|tool-quirk|question",
-  "repo": "Repository name or 'global'",
-  "confidence": 0.0-1.0,
-  "source": "mempalace|fact_store|skill|harvester",
-  "source_file": "Origin file if applicable",
-  "migrated_at": "ISO 8601 timestamp"
-}
-```
-
-### Confidence Scoring
- **0.9-1.0**: Explicitly stated with verification
- **0.7-0.8**: Clearly implied by multiple data points
- **0.5-0.6**: Suggested but not fully verified
- **0.3-0.4**: Inferred from limited data
- **0.1-0.2**: Speculative or uncertain
-
-## Key Abstractions
-
-1. **Knowledge Item**: Atomic unit of extracted intelligence. One fact, one category, one confidence score.
-2. **Knowledge Store**: Directory-based persistent storage with JSON index.
-3. **Harvest Prompt**: Structured LLM prompt that converts session transcripts to knowledge items.
-4. **Bootstrap Context**: Compact 2k-token summary injected at session start.
-5. **Compounding Loop**: The cycle of extract -> store -> inject -> work -> extract.
-
-## API Surface
-
-### Knowledge Store (file-based)
- **Read**: `knowledge/index.json` — all facts
- **Write**: Append to `index.json` after deduplication
- **Query**: Filter by category, repo, confidence threshold
-
-### Templates
- **harvest-prompt.md**: Input template for LLM extraction
- **bootstrap-context.md**: Output template for session injection
-
-## Test Coverage
-
-| Test File | Covers | Status |
-|-----------|--------|--------|
-| `test_harvest_prompt.py` | Schema validation, required fields | Present |
-| `test_harvest_prompt_comprehensive.py` | Extended validation, edge cases | Present |
-| `test_sessions/session_failure.jsonl` | Failure extraction | Test data |
-| `test_sessions/session_partial.jsonl` | Partial session handling | Test data |
-| `test_sessions/session_patterns.jsonl` | Pattern extraction | Test data |
-| `test_sessions/session_questions.jsonl` | Question identification | Test data |
-| `test_sessions/session_success.jsonl` | Full extraction | Test data |
-
-### Gaps
- No integration tests for full harvester pipeline
- No tests for bootstrapper context assembly
- No tests for measurer metrics computation
- No tests for deduplication logic
- No CI pipeline configured
-
-## Security Considerations
-
-1. **Knowledge injection**: Bootstrapper injects context from knowledge store. Malicious facts in the store could influence agent behavior. Trust scoring partially mitigates this.
-2. **Session transcripts**: May contain sensitive data (tokens, API keys). Harvester must filter sensitive patterns before storage.
-3. **LLM extraction**: Harvest prompt instructs "no hallucination" but LLMs can still confabulate. Confidence scoring and source attribution provide auditability.
-4. **File-based storage**: No access control on knowledge files. Anyone with filesystem access can read/modify.
-
-## Dependencies
-
- Python 3.10+
- No external packages (stdlib only)
- LLM access for harvester pipeline (Ollama or cloud provider)
- Hermes agent framework for session management
-
-## Status
-
- **Phase**: Early development
- **Epics**: 4 (Harvester, Knowledge Store, Bootstrap, Measurement)
- **Milestone**: 4 (Retroactive Harvest)
- **Open Issues**: Active development across harvester and knowledge store pipelines
--- a/the-nexus-GENOME.md
+++ b/the-nexus-GENOME.md
@@ -0,0 +1,319 @@
+# GENOME.md — the-nexus
+
+**Generated:** 2026-04-14  
+**Repo:** Timmy_Foundation/the-nexus  
+**Analysis:** Codebase Genome #672
+
+---
+
+## Project Overview
+
+The Nexus is Timmy's canonical 3D home-world — a browser-based Three.js application that serves as:
+1. **Local-first training ground** for Timmy (the sovereign AI)
+2. **Wizardly visualization surface** for the fleet system
+3. **Portal architecture** connecting to other worlds and services
+
+The app is a real-time 3D environment with spatial memory, GOFAI reasoning, agent presence, and portal-based navigation.
+
+---
+
+## Architecture
+
+```mermaid
+graph TB
+    subgraph Browser["BROWSER LAYER"]
+        HTML[index.html]
+        APP[app.js - 4082 lines]
+        CSS[style.css]
+        Worker[gofai_worker.js]
+    end
+    
+    subgraph ThreeJS["THREE.JS RENDERING"]
+        Scene[Scene Management]
+        Camera[Camera System]
+        Renderer[WebGL Renderer]
+        Post[Post-processing<br/>Bloom, SMAA]
+        Physics[Physics/Player]
+    end
+    
+    subgraph Nexus["NEXUS COMPONENTS"]
+        SM[SpatialMemory]
+        SA[SpatialAudio]
+        MB[MemoryBirth]
+        MO[MemoryOptimizer]
+        MI[MemoryInspect]
+        MP[MemoryPulse]
+        RT[ReasoningTrace]
+        RV[ResonanceVisualizer]
+    end
+    
+    subgraph GOFAI["GOFAI REASONING"]
+        Worker2[Web Worker]
+        Rules[Rule Engine]
+        Facts[Fact Store]
+        Inference[Inference Loop]
+    end
+    
+    subgraph Backend["BACKEND SERVICES"]
+        Server[server.py<br/>WebSocket Bridge]
+        L402[L402 Cost API]
+        Portal[Portal Registry]
+    end
+    
+    subgraph Data["DATA/PERSISTENCE"]
+        Local[localStorage]
+        IDB[IndexedDB]
+        JSON[portals.json]
+        Vision[vision.json]
+    end
+    
+    HTML --> APP
+    APP --> ThreeJS
+    APP --> Nexus
+    APP --> GOFAI
+    APP --> Backend
+    APP --> Data
+    
+    Worker2 --> APP
+    Server --> APP
+```
+
+---
+
+## Entry Points
+
+### Primary Entry
+- **`index.html`** — Main HTML shell, loads app.js
+- **`app.js`** — Main application (4082 lines), Three.js scene setup
+
+### Secondary Entry Points
+- **`boot.js`** — Bootstrap sequence
+- **`bootstrap.mjs`** — ES module bootstrap
+- **`server.py`** — WebSocket bridge server
+
+### Configuration Entry Points
+- **`portals.json`** — Portal definitions and destinations
+- **`vision.json`** — Vision/agent configuration
+- **`config/fleet_agents.json`** — Fleet agent definitions
+
+---
+
+## Data Flow
+
+```
+User Input
+    ↓
+app.js (Event Loop)
+    ↓
+┌─────────────────────────────────────┐
+│ Three.js Scene                      │
+│ - Player movement                   │
+│ - Camera controls                   │
+│ - Physics simulation                │
+│ - Portal detection                  │
+└─────────────────────────────────────┘
+    ↓
+┌─────────────────────────────────────┐
+│ Nexus Components                    │
+│ - SpatialMemory (room/context)      │
+│ - MemoryBirth (new memories)        │
+│ - MemoryPulse (heartbeat)           │
+│ - ReasoningTrace (GOFAI output)     │
+└─────────────────────────────────────┘
+    ↓
+┌─────────────────────────────────────┐
+│ GOFAI Worker (off-thread)           │
+│ - Rule evaluation                   │
+│ - Fact inference                    │
+│ - Decision making                   │
+└─────────────────────────────────────┘
+    ↓
+┌─────────────────────────────────────┐
+│ Backend Services                    │
+│ - WebSocket (server.py)             │
+│ - L402 cost API                     │
+│ - Portal registry                   │
+└─────────────────────────────────────┘
+    ↓
+Persistence (localStorage/IndexedDB)
+```
+
+---
+
+## Key Abstractions
+
+### 1. Nexus Object (`NEXUS`)
+Central configuration and state object containing:
+- Color palette
+- Room definitions
+- Portal configurations
+- Agent settings
+
+### 2. SpatialMemory
+Manages room-based context for the AI agent:
+- Room transitions trigger context switches
+- Facts are stored per-room
+- NPCs have location awareness
+
+### 3. Portal System
+Connects the 3D world to external services:
+- Portals defined in `portals.json`
+- Each portal links to a service/endpoint
+- Visual indicators in 3D space
+
+### 4. GOFAI Worker
+Off-thread reasoning engine:
+- Rule-based inference
+- Fact store with persistence
+- Decision making for agent behavior
+
+### 5. Memory Components
+- **MemoryBirth**: Creates new memories from interactions
+- **MemoryOptimizer**: Compresses and deduplicates memories
+- **MemoryPulse**: Heartbeat system for memory health
+- **MemoryInspect**: Debug/inspection interface
+
+---
+
+## API Surface
+
+### Internal APIs (JavaScript)
+
+| Module | Export | Purpose |
+|--------|--------|---------|
+| `app.js` | `NEXUS` | Main config/state object |
+| `SpatialMemory` | class | Room-based context management |
+| `SpatialAudio` | class | 3D positional audio |
+| `MemoryBirth` | class | Memory creation |
+| `MemoryOptimizer` | class | Memory compression |
+| `ReasoningTrace` | class | GOFAI reasoning visualization |
+
+### External APIs (HTTP/WebSocket)
+
+| Endpoint | Protocol | Purpose |
+|----------|----------|---------|
+| `ws://localhost:PORT` | WebSocket | Real-time bridge to backend |
+| `http://localhost:8080/api/cost-estimate` | HTTP | L402 cost estimation |
+| Portal endpoints | Various | External service connections |
+
+---
+
+## Dependencies
+
+### Runtime Dependencies
+- **Three.js** — 3D rendering engine
+- **Three.js Addons** — Post-processing (Bloom, SMAA)
+
+### Build Dependencies
+- **ES Modules** — Native browser modules
+- **No bundler** — Direct script loading
+
+### Backend Dependencies
+- **Python 3.x** — server.py
+- **WebSocket** — Real-time communication
+
+---
+
+## Test Coverage
+
+### Existing Tests
+- `tests/boot.test.js` — Bootstrap sequence tests
+
+### Test Gaps
+1. **Three.js scene initialization** — No tests
+2. **Portal system** — No tests
+3. **Memory components** — No tests
+4. **GOFAI worker** — No tests
+5. **WebSocket communication** — No tests
+6. **Spatial memory transitions** — No tests
+7. **Physics/player movement** — No tests
+
+### Recommended Test Priorities
+1. Portal detection and activation
+2. Spatial memory room transitions
+3. GOFAI worker message passing
+4. WebSocket connection handling
+5. Memory persistence (localStorage/IndexedDB)
+
+---
+
+## Security Considerations
+
+### Current Risks
+1. **WebSocket without auth** — server.py has no authentication
+2. **localStorage sensitive data** — Memories stored unencrypted
+3. **CORS open** — No origin restrictions on WebSocket
+4. **L402 endpoint** — Cost API may expose internal state
+
+### Mitigations
+1. Add WebSocket authentication
+2. Encrypt sensitive memories
+3. Restrict CORS origins
+4. Rate limit L402 endpoint
+
+---
+
+## File Structure
+
+```
+the-nexus/
+├── app.js                    # Main app (4082 lines)
+├── index.html                # HTML shell
+├── style.css                 # Styles
+├── server.py                 # WebSocket bridge
+├── boot.js                   # Bootstrap
+├── bootstrap.mjs             # ES module bootstrap
+├── gofai_worker.js           # GOFAI web worker
+├── portals.json              # Portal definitions
+├── vision.json               # Vision config
+├── nexus/                    # Nexus components
+│   └── components/
+│       ├── spatial-memory.js
+│       ├── spatial-audio.js
+│       ├── memory-birth.js
+│       ├── memory-optimizer.js
+│       ├── memory-inspect.js
+│       ├── memory-pulse.js
+│       ├── reasoning-trace.js
+│       └── resonance-visualizer.js
+├── config/                   # Configuration
+├── docs/                     # Documentation
+├── tests/                    # Tests
+├── agent/                    # Agent components
+├── bin/                      # Scripts
+└── assets/                   # Static assets
+```
+
+---
+
+## Technical Debt
+
+1. **Large app.js** (4082 lines) — Should be split into modules
+2. **No TypeScript** — Pure JavaScript, no type safety
+3. **Manual DOM manipulation** — Could use a framework
+4. **No build system** — Direct ES modules, no optimization
+5. **Limited error handling** — Minimal try/catch coverage
+
+---
+
+## Migration Notes
+
+From CLAUDE.md:
+- Current `main` does NOT ship the old root frontend files
+- A clean checkout serves a directory listing
+- The live browser shell exists in legacy form at `/Users/apayne/the-matrix`
+- Migration priorities: #684 (docs), #685 (legacy audit), #686 (smoke tests), #687 (restore shell)
+
+---
+
+## Next Steps
+
+1. **Restore browser shell** — Bring frontend back to main
+2. **Add tests** — Cover critical paths (portals, memory, GOFAI)
+3. **Split app.js** — Modularize the 4082-line file
+4. **Add authentication** — Secure WebSocket and APIs
+5. **TypeScript migration** — Add type safety
+
+---
+
+*Generated by Codebase Genome pipeline — Issue #672*