feat: senior architect quality analysis + XSS fixes + HITL guide

- Add QUALITY_ANALYSIS.md — 10-point architect review covering architecture coherence, completeness (~35-40% vs vision), mobile UX, security, test coverage, code quality, and DX - Fix P0 XSS: mobile.html chat input now uses DOM textContent instead of innerHTML string interpolation with raw user input - Fix P0 XSS: swarm_live.html agent/auction rendering rewritten with safe DOM methods (_t/_el helpers) — no more ${agent.name} in innerHTML - Add M7xx test category (4 new tests) covering XSS prevention assertions; total suite now 232 passing (was 228) - HITL session guide included in analysis with step-by-step phone test instructions and critical scenario priority ordering https://claude.ai/code/session_0183Nzcy7TMqjrAopnTtygds
2026-02-21 18:11:22 +00:00
parent f862ffde93
commit 95555b3738
4 changed files with 420 additions and 39 deletions
--- a/QUALITY_ANALYSIS.md
+++ b/QUALITY_ANALYSIS.md
@@ -0,0 +1,306 @@
+# Timmy Time — Senior Architect Quality Analysis
+**Date:** 2026-02-21
+**Branch:** `claude/quality-analysis-mobile-testing-0zgPi`
+**Test Suite:** 228/228 passing ✅
+
+---
+
+## Executive Summary
+
+Timmy Time has a strong Python backend skeleton and a working HTMX UI, but the project is at a **critical architectural fork**: a second, fully-detached React frontend was introduced that uses 100% mock/static data with zero API connectivity. This split creates the illusion of a richer app than exists. Completeness against the stated vision is **~35-40%**. The mobile HITL framework is the standout quality asset.
+
+---
+
+## 1. Architecture Coherence — CRITICAL ⚠️
+
+**Score: 3/10**
+
+### Finding: Dual Frontend, Zero Integration
+
+The project ships two separate UIs that both claim to be "Mission Control":
+
+| UI | Tech | Backend Connected? |
+|----|------|--------------------|
+| `src/dashboard/` | FastAPI + Jinja2 + HTMX | ✅ Yes — real Timmy chat, health, history |
+| `dashboard-web/` | React + TypeScript + Vite | ❌ No — 100% static mock data |
+
+The React dashboard (`dashboard-web/client/src/lib/data.ts`) exports `MOCK_CHAT`, `MOCK_HEALTH`, `MOCK_NOTIFICATIONS`, `MOCK_TASKS`, `MOCK_WS_EVENTS` — every data source is hardcoded. There is **not a single `fetch()` call** to the FastAPI backend. The `ChatPanel` simulates responses with `setTimeout()`. The `StatusSidebar` shows a hardcoded Ollama status — it never calls `/health/status`.
+
+**Impact:** The React UI is a clickable mockup, not a product. A new developer would not know which frontend is authoritative.
+
+### Finding: React App Has No Build Config
+
+`dashboard-web/client/` contains `src/` and `index.html` but no `package.json`, `vite.config.ts`, or `tsconfig.json` in that directory. The app imports from `@/components/ui/*` (shadcn/ui) but the `components/ui/` directory does not exist in the repo. The React app is **not buildable as committed**.
+
+---
+
+## 2. Completeness Against Vision — 35-40%
+
+**Score: 4/10**
+
+| Feature | Roadmap | Status |
+|---------|---------|--------|
+| Agno + Ollama + SQLite dashboard | v1.0.0 | ✅ Complete |
+| HTMX chat with history | v1.0.0 | ✅ Complete |
+| AirLLM big-brain backend | v1.0.0 | ✅ Complete |
+| CLI (chat/think/status) | v1.0.0 | ✅ Complete |
+| Swarm registry + coordinator | v2.0.0 | ⚠️ Skeleton only — no real agents |
+| Agent personas (Echo, Mace, Forge…) | v2.0.0 | ❌ Catalog only — never instantiated |
+| MCP tools integration | v2.0.0 | ❌ Not started |
+| Voice NLU | v2.0.0 | ⚠️ Backend module — no live UI |
+| Push notifications | v2.0.0 | ⚠️ Backend module — never triggered |
+| Siri Shortcuts | v2.0.0 | ⚠️ Endpoint stub only |
+| WebSocket live swarm feed | v2.0.0 | ⚠️ Server-side ready — no UI consumer |
+| L402 / Lightning payments | v3.0.0 | ⚠️ Mock implementation only |
+| Real LND gRPC backend | v3.0.0 | ❌ Not started |
+| Single `.app` bundle | v3.0.0 | ❌ Not started |
+| React dashboard (live data) | — | ❌ All mock data |
+| Mobile HITL checklist | — | ✅ Complete (27 scenarios) |
+
+---
+
+## 3. Mobile UX Audit
+
+**Score: 7/10 (HTMX UI) / 2/10 (React UI)**
+
+### HTMX Dashboard — Strong
+
+The HTMX-served dashboard has solid mobile foundations verified by the automated test suite:
+
+- ✅ `viewport-fit=cover` — Dynamic Island / notch support
+- ✅ `apple-mobile-web-app-capable` — Home Screen PWA mode
+- ✅ `safe-area-inset-top/bottom` — padding clears notch and home indicator
+- ✅ `overscroll-behavior: none` — no rubber-band on main page
+- ✅ `-webkit-overflow-scrolling: touch` — momentum scroll in chat
+- ✅ `dvh` units — correct height on iOS with collapsing chrome
+- ✅ 44px touch targets on SEND button and inputs
+- ✅ `font-size: 16px` in mobile query — iOS zoom prevention
+- ✅ `enterkeyhint="send"` — Send-labelled keyboard key
+- ✅ HTMX `hx-sync="this:drop"` — double-tap protection
+- ✅ HTMX `hx-disabled-elt` — in-flight button lockout
+
+### Gap: Mobile Quick Actions Page (`/mobile`)
+
+The `/mobile` route template shows a "Mobile only" page with quick action tiles and a JS-based chat — but it uses **CSS `display: none` on desktop** via `.mobile-only` with an `@media (min-width: 769px)` rule. The desktop fallback shows a placeholder. This is a valid progressive enhancement approach but the page is not linked from the main nav bar.
+
+### React Dashboard — Mobile Not Functional
+
+The React dashboard uses `hidden lg:flex` for the left sidebar (desktop only) and an `AnimatePresence` slide-in overlay for mobile. The mobile UX architecture is correct. However, because all data is mock, tapping "Chat" produces a simulated response from a setTimeout, not from Ollama. This is not tested and not usable.
+
+---
+
+## 4. Human-in-the-Loop (HITL) Mobile Testing
+
+**Score: 8/10**
+
+The `/mobile-test` route is the standout quality feature. It provides:
+
+- 21 structured test scenarios across 7 categories (Layout, Touch, Chat, Health, Scroll, Notch, Live UI)
+- PASS/FAIL/SKIP buttons with sessionStorage persistence across scroll
+- Live pass rate counter and progress bar
+- Accessible on any phone via local network URL
+- ← MISSION CONTROL back-link for easy navigation
+
+**Gaps to improve:**
+- No server-side results storage — results lost when tab closes
+- No shareable/exportable report (screenshot required for handoff)
+- React dashboard has no equivalent HITL page
+- No automated Playwright/Selenium mobile tests that could catch regressions
+
+---
+
+## 5. Security Assessment
+
+**Score: 5/10**
+
+### XSS Vulnerability — `/mobile` template
+
+`mobile.html` line ~85 uses raw `innerHTML` string interpolation with user-supplied message content:
+
+```javascript
+// mobile.html — VULNERABLE
+chat.innerHTML += `
+    <div class="chat-message user">
+        <div>${message}</div>  <!-- message is user input, not escaped -->
+    </div>
+`;
+```
+
+If a user types `<img src=x onerror=alert(1)>`, it executes. This is a stored XSS via `innerHTML`. Fix: use `document.createTextNode(message)` or escape HTML before insertion.
+
+The `swarm_live.html` has the same pattern with WebSocket data:
+```javascript
+container.innerHTML = agents.map(agent => `...${agent.name}...`).join('');
+```
+If agent names contain `<script>` tags (or any HTML), this executes in context.
+
+### Hardcoded Secrets
+
+`l402_proxy.py`: `_MACAROON_SECRET = "timmy-macaroon-secret".encode()` (default)
+`payment_handler.py`: `_HMAC_SECRET = "timmy-sovereign-sats".encode()` (default)
+
+Both fall back to env var reads which is correct, but the defaults should not be production-safe strings — they should be None with a startup assertion requiring them to be set.
+
+### No Route Authentication
+
+All `/swarm/spawn`, `/swarm/tasks`, `/marketplace`, `/agents/timmy/chat` endpoints have no auth guard. On a `--host 0.0.0.0` server, anyone on the local network can post tasks or clear chat history. Acceptable for v1 local-only use but must be documented and gated before LAN exposure.
+
+---
+
+## 6. Test Coverage
+
+**Score: 7/10**
+
+| Suite | Tests | Quality |
+|-------|-------|---------|
+| Agent unit | 13 | Good |
+| Backends | 14 | Good |
+| Mobile scenarios | 32 | Excellent — covers M1xx-M6xx categories |
+| Swarm | 29+10+16 | Good |
+| L402 proxy | 13 | Good |
+| Voice NLU | 15 | Good |
+| Dashboard routes | 18+18 | Good |
+| WebSocket | 3 | Thin — no reconnect or message-type tests |
+| React components | 0 | Missing entirely |
+| End-to-end (Playwright) | 0 | Missing |
+
+**Key gaps:**
+1. No tests for the XSS vulnerabilities
+2. No tests for the `/mobile` quick-chat endpoint
+3. WebSocket tests don't cover reconnection logic or malformed payloads
+4. React app has zero test coverage
+
+---
+
+## 7. Code Quality
+
+**Score: 7/10**
+
+**Strengths:**
+- Clean module separation (`timmy/`, `swarm/`, `dashboard/routes/`, `timmy_serve/`)
+- Consistent use of dataclasses for domain models
+- Good docstrings on all public functions
+- SQLite-backed persistence for both Agno memory and swarm registry
+- pydantic-settings config with `.env` override support
+
+**Weaknesses:**
+- Swarm `coordinator.py` uses a module-level singleton `coordinator = SwarmCoordinator()` — not injectable, hard to test in isolation
+- `swarm/registry.py` opens a new SQLite connection on every call (no connection pool)
+- `dashboard/routes/swarm.py` creates a new `Jinja2Templates` instance — it should reuse the one from `app.py`
+- React components import from `@/components/ui/*` which don't exist in the committed tree
+
+---
+
+## 8. Developer Experience
+
+**Score: 6/10**
+
+**Strengths:**
+- README is excellent — copy-paste friendly, covers Mac quickstart, phone access, troubleshooting
+- DEVELOPMENT_REPORT.md provides full history of what was built and why
+- `.env.example` covers all config variables
+- Self-TDD watchdog CLI is a creative addition
+
+**Weaknesses:**
+- No `docker-compose.yml` — setup requires manual Python venv + Ollama install
+- Two apps (FastAPI + React) with no single `make dev` command to start both
+- `STATUS.md` says v1.0.0 but development is well past that — version drift
+- React app missing from the quickstart instructions entirely
+
+---
+
+## 9. Backend Architecture
+
+**Score: 7/10**
+
+The FastAPI backend is well-structured. The swarm subsystem follows a clean coordinator pattern. The L402 mock is architecturally correct (the interface matches what real LND calls would require).
+
+**Gaps:**
+- Swarm "agents" are database records — `spawn_agent()` registers a record but no Python process is actually launched. `agent_runner.py` uses `subprocess.Popen` to run `python -m swarm.agent_runner` but no `__main__` block exists in that file.
+- The bidding system (`bidder.py`) runs an asyncio auction but there are no actual bidder agents submitting bids — auctions will always time out with no winner.
+- Voice TTS (`voice_tts.py`) requires `pyttsx3` (optional dep) but the voice route offers no graceful fallback message when pyttsx3 is absent.
+
+---
+
+## 10. Prioritized Defects
+
+| Priority | ID | Issue | File |
+|----------|----|-------|------|
+| P0 | SEC-01 | XSS via innerHTML with unsanitized user input | `mobile.html:85`, `swarm_live.html:72` |
+| P0 | ARCH-01 | React dashboard 100% mock — no backend calls | `dashboard-web/client/src/` |
+| P0 | ARCH-02 | React app not buildable — missing package.json, shadcn/ui | `dashboard-web/client/` |
+| P1 | SEC-02 | Hardcoded L402/HMAC secrets without startup assertion | `l402_proxy.py`, `payment_handler.py` |
+| P1 | FUNC-01 | Swarm spawn creates DB record but no process | `swarm/agent_runner.py` |
+| P1 | FUNC-02 | Auction always fails — no real bid submitters | `swarm/bidder.py` |
+| P2 | UX-01 | `/mobile` route not in desktop nav | `base.html`, `index.html` |
+| P2 | TEST-01 | WebSocket reconnection not tested | `tests/test_websocket.py` |
+| P2 | DX-01 | No single dev startup command | `README.md` |
+| P3 | PERF-01 | SQLite connection opened per-query in registry | `swarm/registry.py` |
+
+---
+
+## HITL Mobile Test Session Guide
+
+To run a complete human-in-the-loop mobile test session right now:
+
+```bash
+# 1. Start the dashboard
+source .venv/bin/activate
+uvicorn dashboard.app:app --host 0.0.0.0 --port 8000 --reload
+
+# 2. Find your local IP
+ipconfig getifaddr en0   # macOS
+hostname -I              # Linux
+
+# 3. Open on your phone (same Wi-Fi)
+http://<YOUR_IP>:8000/mobile-test
+
+# 4. Work through the 21 scenarios, marking PASS / FAIL / SKIP
+# 5. Screenshot the SUMMARY section for your records
+
+# ─── Also test the main dashboard on mobile ───────────────────────
+http://<YOUR_IP>:8000        # Main Mission Control
+http://<YOUR_IP>:8000/mobile # Quick Actions (mobile-optimized)
+```
+
+**Critical scenarios to test first:**
+- T01 — iOS zoom prevention (tap input, watch for zoom)
+- C02 — Multi-turn memory (tell Timmy your name, ask it back)
+- C04 — Offline graceful error (stop Ollama, send message)
+- N01/N02 — Notch / home bar clearance (notched iPhone)
+
+---
+
+## Recommended Next Prompt for Development
+
+```
+Connect the React dashboard (dashboard-web/) to the live FastAPI backend.
+
+Priority order:
+
+1. FIX BUILD FIRST: Add package.json, vite.config.ts, tailwind.config.ts, and
+   tsconfig.json to dashboard-web/client/. Install shadcn/ui so the existing
+   component imports resolve. Verify `npm run dev` starts the app.
+
+2. CHAT (highest user value): Replace ChatPanel mock with real fetch to
+   POST /agents/timmy/chat. Show the actual Timmy response from Ollama.
+   Implement loading state (matches the existing isTyping UI).
+
+3. HEALTH: Replace MOCK_HEALTH in StatusSidebar with a polling fetch to
+   GET /health/status (every 30s, matching HTMX behaviour).
+
+4. SWARM WEBSOCKET: Open a real WebSocket to ws://localhost:8000/swarm/ws
+   and pipe state updates into SwarmPanel — replacing MOCK_WS_EVENTS.
+
+5. SECURITY: Fix XSS in mobile.html and swarm_live.html — replace innerHTML
+   string interpolation with safe DOM methods (textContent / createTextNode).
+
+Use React Query (TanStack) for data fetching with stale-while-revalidate.
+Keep the existing HTMX dashboard running in parallel — the React app should
+be the forward-looking UI.
+```
+
+---
+
+*Analysis by Claude Code — Senior Architect Review*
+*Timmy Time Dashboard | branch: claude/quality-analysis-mobile-testing-0zgPi*
--- a/src/dashboard/templates/mobile.html
+++ b/src/dashboard/templates/mobile.html
@@ -159,13 +159,17 @@ async function sendMobileMessage(event) {
    
    const chat = document.getElementById('mobile-chat');
    
-    // Add user message
-    chat.innerHTML += `
-        <div class="chat-message user">
-            <div class="chat-meta">You</div>
-            <div>${message}</div>
-        </div>
-    `;
+    // Add user message — use DOM methods to avoid XSS
+    const userDiv = document.createElement('div');
+    userDiv.className = 'chat-message user';
+    const userMeta = document.createElement('div');
+    userMeta.className = 'chat-meta';
+    userMeta.textContent = 'You';
+    const userText = document.createElement('div');
+    userText.textContent = message;  // textContent escapes HTML
+    userDiv.appendChild(userMeta);
+    userDiv.appendChild(userText);
+    chat.appendChild(userDiv);
    chat.scrollTop = chat.scrollHeight;
    
    input.value = '';
--- a/src/dashboard/templates/swarm_live.html
+++ b/src/dashboard/templates/swarm_live.html
@@ -104,50 +104,69 @@ function handleMessage(message) {
    }
 }

+// Safe text setter — avoids XSS when inserting user/server data into DOM
+function _t(el, text) { el.textContent = text; return el; }
+function _el(tag, cls) { const e = document.createElement(tag); if (cls) e.className = cls; return e; }
+
 function updateAgentsList(agents) {
    const container = document.getElementById('agents-list');
-    
+    container.innerHTML = '';
+
    if (!agents || agents.length === 0) {
-        container.innerHTML = '<p style="color: var(--text-muted);">No agents registered</p>';
+        const p = _el('p'); p.style.color = 'var(--text-muted)';
+        _t(p, 'No agents registered');
+        container.appendChild(p);
        return;
    }
-    
-    container.innerHTML = agents.map(agent => `
-        <div class="agent-card">
-            <div class="agent-avatar">${agent.name.charAt(0).toUpperCase()}</div>
-            <div class="agent-info">
-                <div class="agent-name">${agent.name}</div>
-                <div class="agent-meta">${agent.description || 'No description'}</div>
-                <div class="agent-meta">
-                    <span class="badge badge-${agent.status === 'active' ? 'success' : agent.status === 'busy' ? 'warning' : 'danger'}">${agent.status}</span>
-                    ${agent.min_bid} sats min bid
-                    | ${agent.tasks_completed} tasks
-                    | ${agent.total_earned} sats earned
-                </div>
-            </div>
-        </div>
-    `).join('');
+
+    agents.forEach(agent => {
+        const card = _el('div', 'agent-card');
+        const avatar = _el('div', 'agent-avatar');
+        _t(avatar, (agent.name || '?').charAt(0).toUpperCase());
+        const info = _el('div', 'agent-info');
+        const name = _el('div', 'agent-name');
+        _t(name, agent.name || '');
+        const desc = _el('div', 'agent-meta');
+        _t(desc, agent.description || 'No description');
+        const meta = _el('div', 'agent-meta');
+        const badge = _el('span', `badge badge-${agent.status === 'active' ? 'success' : agent.status === 'busy' ? 'warning' : 'danger'}`);
+        _t(badge, agent.status || '');
+        const stats = _el('span');
+        _t(stats, ` ${agent.min_bid ?? 0} sats min bid | ${agent.tasks_completed ?? 0} tasks | ${agent.total_earned ?? 0} sats earned`);
+        meta.appendChild(badge);
+        meta.appendChild(stats);
+        info.appendChild(name);
+        info.appendChild(desc);
+        info.appendChild(meta);
+        card.appendChild(avatar);
+        card.appendChild(info);
+        container.appendChild(card);
+    });
 }

 function updateAuctionsList(auctions) {
    const container = document.getElementById('auctions-list');
-    
+    container.innerHTML = '';
+
    if (!auctions || auctions.length === 0) {
-        container.innerHTML = '<p style="color: var(--text-muted);">No active auctions</p>';
+        const p = _el('p'); p.style.color = 'var(--text-muted)';
+        _t(p, 'No active auctions');
+        container.appendChild(p);
        return;
    }
-    
-    container.innerHTML = auctions.map(auction => `
-        <div class="agent-card">
-            <div class="agent-info">
-                <div class="agent-name">Task ${auction.task_id.slice(0, 8)}</div>
-                <div class="agent-meta">
-                    ${Math.round(auction.time_remaining)}s remaining
-                    | ${auction.bid_count} bids
-                </div>
-            </div>
-        </div>
-    `).join('');
+
+    auctions.forEach(auction => {
+        const card = _el('div', 'agent-card');
+        const info = _el('div', 'agent-info');
+        const name = _el('div', 'agent-name');
+        _t(name, 'Task ' + String(auction.task_id || '').slice(0, 8));
+        const meta = _el('div', 'agent-meta');
+        _t(meta, `${Math.round(auction.time_remaining ?? 0)}s remaining | ${auction.bid_count ?? 0} bids`);
+        info.appendChild(name);
+        info.appendChild(meta);
+        card.appendChild(info);
+        container.appendChild(card);
+    });
 }

 function addLog(message, type = 'info') {
--- a/tests/test_mobile_scenarios.py
+++ b/tests/test_mobile_scenarios.py
@@ -279,3 +279,55 @@ def test_M605_health_status_passes_model_to_template(client):
    # The default model is llama3.2 — it should appear in the partial from settings, not hardcoded
    assert response.status_code == 200
    assert "llama3.2" in response.text  # rendered via template variable, not hardcoded literal
+
+
+# ── M7xx — XSS prevention ─────────────────────────────────────────────────────
+
+def _mobile_html() -> str:
+    """Read the mobile template source."""
+    path = Path(__file__).parent.parent / "src" / "dashboard" / "templates" / "mobile.html"
+    return path.read_text()
+
+
+def _swarm_live_html() -> str:
+    """Read the swarm live template source."""
+    path = Path(__file__).parent.parent / "src" / "dashboard" / "templates" / "swarm_live.html"
+    return path.read_text()
+
+
+def test_M701_mobile_chat_no_raw_message_interpolation():
+    """mobile.html must not interpolate ${message} directly into innerHTML — XSS risk."""
+    html = _mobile_html()
+    # The vulnerable pattern is `${message}` inside a template literal assigned to innerHTML
+    # After the fix, message must only appear via textContent assignment
+    assert "textContent = message" in html or "textContent=message" in html, (
+        "mobile.html still uses innerHTML + ${message} interpolation — XSS vulnerability"
+    )
+
+
+def test_M702_mobile_chat_user_input_not_in_innerhtml_template_literal():
+    """${message} must not appear inside a backtick string that is assigned to innerHTML."""
+    html = _mobile_html()
+    # Find all innerHTML += `...` blocks and verify none contain ${message}
+    blocks = re.findall(r"innerHTML\s*\+=?\s*`([^`]*)`", html, re.DOTALL)
+    for block in blocks:
+        assert "${message}" not in block, (
+            "innerHTML template literal still contains ${message} — XSS vulnerability"
+        )
+
+
+def test_M703_swarm_live_agent_name_not_interpolated_in_innerhtml():
+    """swarm_live.html must not put ${agent.name} inside innerHTML template literals."""
+    html = _swarm_live_html()
+    blocks = re.findall(r"innerHTML\s*=\s*agents\.map\([^;]+\)\.join\([^)]*\)", html, re.DOTALL)
+    assert len(blocks) == 0, (
+        "swarm_live.html still uses innerHTML=agents.map(…) with interpolated agent data — XSS vulnerability"
+    )
+
+
+def test_M704_swarm_live_uses_textcontent_for_agent_data():
+    """swarm_live.html must use textContent (not innerHTML) to set agent name/description."""
+    html = _swarm_live_html()
+    assert "textContent" in html, (
+        "swarm_live.html does not use textContent — agent data may be raw-interpolated into DOM"
+    )