feat: senior architect quality analysis + XSS fixes + HITL guide

- Add QUALITY_ANALYSIS.md — 10-point architect review covering
  architecture coherence, completeness (~35-40% vs vision), mobile UX,
  security, test coverage, code quality, and DX
- Fix P0 XSS: mobile.html chat input now uses DOM textContent instead
  of innerHTML string interpolation with raw user input
- Fix P0 XSS: swarm_live.html agent/auction rendering rewritten with
  safe DOM methods (_t/_el helpers) — no more ${agent.name} in innerHTML
- Add M7xx test category (4 new tests) covering XSS prevention assertions;
  total suite now 232 passing (was 228)
- HITL session guide included in analysis with step-by-step phone test
  instructions and critical scenario priority ordering

https://claude.ai/code/session_0183Nzcy7TMqjrAopnTtygds
This commit is contained in:
Claude
2026-02-21 18:11:22 +00:00
parent f862ffde93
commit 95555b3738
4 changed files with 420 additions and 39 deletions

306
QUALITY_ANALYSIS.md Normal file
View File

@@ -0,0 +1,306 @@
# Timmy Time — Senior Architect Quality Analysis
**Date:** 2026-02-21
**Branch:** `claude/quality-analysis-mobile-testing-0zgPi`
**Test Suite:** 228/228 passing ✅
---
## Executive Summary
Timmy Time has a strong Python backend skeleton and a working HTMX UI, but the project is at a **critical architectural fork**: a second, fully-detached React frontend was introduced that uses 100% mock/static data with zero API connectivity. This split creates the illusion of a richer app than exists. Completeness against the stated vision is **~35-40%**. The mobile HITL framework is the standout quality asset.
---
## 1. Architecture Coherence — CRITICAL ⚠️
**Score: 3/10**
### Finding: Dual Frontend, Zero Integration
The project ships two separate UIs that both claim to be "Mission Control":
| UI | Tech | Backend Connected? |
|----|------|--------------------|
| `src/dashboard/` | FastAPI + Jinja2 + HTMX | ✅ Yes — real Timmy chat, health, history |
| `dashboard-web/` | React + TypeScript + Vite | ❌ No — 100% static mock data |
The React dashboard (`dashboard-web/client/src/lib/data.ts`) exports `MOCK_CHAT`, `MOCK_HEALTH`, `MOCK_NOTIFICATIONS`, `MOCK_TASKS`, `MOCK_WS_EVENTS` — every data source is hardcoded. There is **not a single `fetch()` call** to the FastAPI backend. The `ChatPanel` simulates responses with `setTimeout()`. The `StatusSidebar` shows a hardcoded Ollama status — it never calls `/health/status`.
**Impact:** The React UI is a clickable mockup, not a product. A new developer would not know which frontend is authoritative.
### Finding: React App Has No Build Config
`dashboard-web/client/` contains `src/` and `index.html` but no `package.json`, `vite.config.ts`, or `tsconfig.json` in that directory. The app imports from `@/components/ui/*` (shadcn/ui) but the `components/ui/` directory does not exist in the repo. The React app is **not buildable as committed**.
---
## 2. Completeness Against Vision — 35-40%
**Score: 4/10**
| Feature | Roadmap | Status |
|---------|---------|--------|
| Agno + Ollama + SQLite dashboard | v1.0.0 | ✅ Complete |
| HTMX chat with history | v1.0.0 | ✅ Complete |
| AirLLM big-brain backend | v1.0.0 | ✅ Complete |
| CLI (chat/think/status) | v1.0.0 | ✅ Complete |
| Swarm registry + coordinator | v2.0.0 | ⚠️ Skeleton only — no real agents |
| Agent personas (Echo, Mace, Forge…) | v2.0.0 | ❌ Catalog only — never instantiated |
| MCP tools integration | v2.0.0 | ❌ Not started |
| Voice NLU | v2.0.0 | ⚠️ Backend module — no live UI |
| Push notifications | v2.0.0 | ⚠️ Backend module — never triggered |
| Siri Shortcuts | v2.0.0 | ⚠️ Endpoint stub only |
| WebSocket live swarm feed | v2.0.0 | ⚠️ Server-side ready — no UI consumer |
| L402 / Lightning payments | v3.0.0 | ⚠️ Mock implementation only |
| Real LND gRPC backend | v3.0.0 | ❌ Not started |
| Single `.app` bundle | v3.0.0 | ❌ Not started |
| React dashboard (live data) | — | ❌ All mock data |
| Mobile HITL checklist | — | ✅ Complete (27 scenarios) |
---
## 3. Mobile UX Audit
**Score: 7/10 (HTMX UI) / 2/10 (React UI)**
### HTMX Dashboard — Strong
The HTMX-served dashboard has solid mobile foundations verified by the automated test suite:
-`viewport-fit=cover` — Dynamic Island / notch support
-`apple-mobile-web-app-capable` — Home Screen PWA mode
-`safe-area-inset-top/bottom` — padding clears notch and home indicator
-`overscroll-behavior: none` — no rubber-band on main page
-`-webkit-overflow-scrolling: touch` — momentum scroll in chat
-`dvh` units — correct height on iOS with collapsing chrome
- ✅ 44px touch targets on SEND button and inputs
-`font-size: 16px` in mobile query — iOS zoom prevention
-`enterkeyhint="send"` — Send-labelled keyboard key
- ✅ HTMX `hx-sync="this:drop"` — double-tap protection
- ✅ HTMX `hx-disabled-elt` — in-flight button lockout
### Gap: Mobile Quick Actions Page (`/mobile`)
The `/mobile` route template shows a "Mobile only" page with quick action tiles and a JS-based chat — but it uses **CSS `display: none` on desktop** via `.mobile-only` with an `@media (min-width: 769px)` rule. The desktop fallback shows a placeholder. This is a valid progressive enhancement approach but the page is not linked from the main nav bar.
### React Dashboard — Mobile Not Functional
The React dashboard uses `hidden lg:flex` for the left sidebar (desktop only) and an `AnimatePresence` slide-in overlay for mobile. The mobile UX architecture is correct. However, because all data is mock, tapping "Chat" produces a simulated response from a setTimeout, not from Ollama. This is not tested and not usable.
---
## 4. Human-in-the-Loop (HITL) Mobile Testing
**Score: 8/10**
The `/mobile-test` route is the standout quality feature. It provides:
- 21 structured test scenarios across 7 categories (Layout, Touch, Chat, Health, Scroll, Notch, Live UI)
- PASS/FAIL/SKIP buttons with sessionStorage persistence across scroll
- Live pass rate counter and progress bar
- Accessible on any phone via local network URL
- ← MISSION CONTROL back-link for easy navigation
**Gaps to improve:**
- No server-side results storage — results lost when tab closes
- No shareable/exportable report (screenshot required for handoff)
- React dashboard has no equivalent HITL page
- No automated Playwright/Selenium mobile tests that could catch regressions
---
## 5. Security Assessment
**Score: 5/10**
### XSS Vulnerability — `/mobile` template
`mobile.html` line ~85 uses raw `innerHTML` string interpolation with user-supplied message content:
```javascript
// mobile.html — VULNERABLE
chat.innerHTML += `
<div class="chat-message user">
<div>${message}</div> <!-- message is user input, not escaped -->
</div>
`;
```
If a user types `<img src=x onerror=alert(1)>`, it executes. This is a stored XSS via `innerHTML`. Fix: use `document.createTextNode(message)` or escape HTML before insertion.
The `swarm_live.html` has the same pattern with WebSocket data:
```javascript
container.innerHTML = agents.map(agent => `...${agent.name}...`).join('');
```
If agent names contain `<script>` tags (or any HTML), this executes in context.
### Hardcoded Secrets
`l402_proxy.py`: `_MACAROON_SECRET = "timmy-macaroon-secret".encode()` (default)
`payment_handler.py`: `_HMAC_SECRET = "timmy-sovereign-sats".encode()` (default)
Both fall back to env var reads which is correct, but the defaults should not be production-safe strings — they should be None with a startup assertion requiring them to be set.
### No Route Authentication
All `/swarm/spawn`, `/swarm/tasks`, `/marketplace`, `/agents/timmy/chat` endpoints have no auth guard. On a `--host 0.0.0.0` server, anyone on the local network can post tasks or clear chat history. Acceptable for v1 local-only use but must be documented and gated before LAN exposure.
---
## 6. Test Coverage
**Score: 7/10**
| Suite | Tests | Quality |
|-------|-------|---------|
| Agent unit | 13 | Good |
| Backends | 14 | Good |
| Mobile scenarios | 32 | Excellent — covers M1xx-M6xx categories |
| Swarm | 29+10+16 | Good |
| L402 proxy | 13 | Good |
| Voice NLU | 15 | Good |
| Dashboard routes | 18+18 | Good |
| WebSocket | 3 | Thin — no reconnect or message-type tests |
| React components | 0 | Missing entirely |
| End-to-end (Playwright) | 0 | Missing |
**Key gaps:**
1. No tests for the XSS vulnerabilities
2. No tests for the `/mobile` quick-chat endpoint
3. WebSocket tests don't cover reconnection logic or malformed payloads
4. React app has zero test coverage
---
## 7. Code Quality
**Score: 7/10**
**Strengths:**
- Clean module separation (`timmy/`, `swarm/`, `dashboard/routes/`, `timmy_serve/`)
- Consistent use of dataclasses for domain models
- Good docstrings on all public functions
- SQLite-backed persistence for both Agno memory and swarm registry
- pydantic-settings config with `.env` override support
**Weaknesses:**
- Swarm `coordinator.py` uses a module-level singleton `coordinator = SwarmCoordinator()` — not injectable, hard to test in isolation
- `swarm/registry.py` opens a new SQLite connection on every call (no connection pool)
- `dashboard/routes/swarm.py` creates a new `Jinja2Templates` instance — it should reuse the one from `app.py`
- React components import from `@/components/ui/*` which don't exist in the committed tree
---
## 8. Developer Experience
**Score: 6/10**
**Strengths:**
- README is excellent — copy-paste friendly, covers Mac quickstart, phone access, troubleshooting
- DEVELOPMENT_REPORT.md provides full history of what was built and why
- `.env.example` covers all config variables
- Self-TDD watchdog CLI is a creative addition
**Weaknesses:**
- No `docker-compose.yml` — setup requires manual Python venv + Ollama install
- Two apps (FastAPI + React) with no single `make dev` command to start both
- `STATUS.md` says v1.0.0 but development is well past that — version drift
- React app missing from the quickstart instructions entirely
---
## 9. Backend Architecture
**Score: 7/10**
The FastAPI backend is well-structured. The swarm subsystem follows a clean coordinator pattern. The L402 mock is architecturally correct (the interface matches what real LND calls would require).
**Gaps:**
- Swarm "agents" are database records — `spawn_agent()` registers a record but no Python process is actually launched. `agent_runner.py` uses `subprocess.Popen` to run `python -m swarm.agent_runner` but no `__main__` block exists in that file.
- The bidding system (`bidder.py`) runs an asyncio auction but there are no actual bidder agents submitting bids — auctions will always time out with no winner.
- Voice TTS (`voice_tts.py`) requires `pyttsx3` (optional dep) but the voice route offers no graceful fallback message when pyttsx3 is absent.
---
## 10. Prioritized Defects
| Priority | ID | Issue | File |
|----------|----|-------|------|
| P0 | SEC-01 | XSS via innerHTML with unsanitized user input | `mobile.html:85`, `swarm_live.html:72` |
| P0 | ARCH-01 | React dashboard 100% mock — no backend calls | `dashboard-web/client/src/` |
| P0 | ARCH-02 | React app not buildable — missing package.json, shadcn/ui | `dashboard-web/client/` |
| P1 | SEC-02 | Hardcoded L402/HMAC secrets without startup assertion | `l402_proxy.py`, `payment_handler.py` |
| P1 | FUNC-01 | Swarm spawn creates DB record but no process | `swarm/agent_runner.py` |
| P1 | FUNC-02 | Auction always fails — no real bid submitters | `swarm/bidder.py` |
| P2 | UX-01 | `/mobile` route not in desktop nav | `base.html`, `index.html` |
| P2 | TEST-01 | WebSocket reconnection not tested | `tests/test_websocket.py` |
| P2 | DX-01 | No single dev startup command | `README.md` |
| P3 | PERF-01 | SQLite connection opened per-query in registry | `swarm/registry.py` |
---
## HITL Mobile Test Session Guide
To run a complete human-in-the-loop mobile test session right now:
```bash
# 1. Start the dashboard
source .venv/bin/activate
uvicorn dashboard.app:app --host 0.0.0.0 --port 8000 --reload
# 2. Find your local IP
ipconfig getifaddr en0 # macOS
hostname -I # Linux
# 3. Open on your phone (same Wi-Fi)
http://<YOUR_IP>:8000/mobile-test
# 4. Work through the 21 scenarios, marking PASS / FAIL / SKIP
# 5. Screenshot the SUMMARY section for your records
# ─── Also test the main dashboard on mobile ───────────────────────
http://<YOUR_IP>:8000 # Main Mission Control
http://<YOUR_IP>:8000/mobile # Quick Actions (mobile-optimized)
```
**Critical scenarios to test first:**
- T01 — iOS zoom prevention (tap input, watch for zoom)
- C02 — Multi-turn memory (tell Timmy your name, ask it back)
- C04 — Offline graceful error (stop Ollama, send message)
- N01/N02 — Notch / home bar clearance (notched iPhone)
---
## Recommended Next Prompt for Development
```
Connect the React dashboard (dashboard-web/) to the live FastAPI backend.
Priority order:
1. FIX BUILD FIRST: Add package.json, vite.config.ts, tailwind.config.ts, and
tsconfig.json to dashboard-web/client/. Install shadcn/ui so the existing
component imports resolve. Verify `npm run dev` starts the app.
2. CHAT (highest user value): Replace ChatPanel mock with real fetch to
POST /agents/timmy/chat. Show the actual Timmy response from Ollama.
Implement loading state (matches the existing isTyping UI).
3. HEALTH: Replace MOCK_HEALTH in StatusSidebar with a polling fetch to
GET /health/status (every 30s, matching HTMX behaviour).
4. SWARM WEBSOCKET: Open a real WebSocket to ws://localhost:8000/swarm/ws
and pipe state updates into SwarmPanel — replacing MOCK_WS_EVENTS.
5. SECURITY: Fix XSS in mobile.html and swarm_live.html — replace innerHTML
string interpolation with safe DOM methods (textContent / createTextNode).
Use React Query (TanStack) for data fetching with stale-while-revalidate.
Keep the existing HTMX dashboard running in parallel — the React app should
be the forward-looking UI.
```
---
*Analysis by Claude Code — Senior Architect Review*
*Timmy Time Dashboard | branch: claude/quality-analysis-mobile-testing-0zgPi*

View File

@@ -159,13 +159,17 @@ async function sendMobileMessage(event) {
const chat = document.getElementById('mobile-chat');
// Add user message
chat.innerHTML += `
<div class="chat-message user">
<div class="chat-meta">You</div>
<div>${message}</div>
</div>
`;
// Add user message — use DOM methods to avoid XSS
const userDiv = document.createElement('div');
userDiv.className = 'chat-message user';
const userMeta = document.createElement('div');
userMeta.className = 'chat-meta';
userMeta.textContent = 'You';
const userText = document.createElement('div');
userText.textContent = message; // textContent escapes HTML
userDiv.appendChild(userMeta);
userDiv.appendChild(userText);
chat.appendChild(userDiv);
chat.scrollTop = chat.scrollHeight;
input.value = '';

View File

@@ -104,50 +104,69 @@ function handleMessage(message) {
}
}
// Safe text setter — avoids XSS when inserting user/server data into DOM
function _t(el, text) { el.textContent = text; return el; }
function _el(tag, cls) { const e = document.createElement(tag); if (cls) e.className = cls; return e; }
function updateAgentsList(agents) {
const container = document.getElementById('agents-list');
container.innerHTML = '';
if (!agents || agents.length === 0) {
container.innerHTML = '<p style="color: var(--text-muted);">No agents registered</p>';
const p = _el('p'); p.style.color = 'var(--text-muted)';
_t(p, 'No agents registered');
container.appendChild(p);
return;
}
container.innerHTML = agents.map(agent => `
<div class="agent-card">
<div class="agent-avatar">${agent.name.charAt(0).toUpperCase()}</div>
<div class="agent-info">
<div class="agent-name">${agent.name}</div>
<div class="agent-meta">${agent.description || 'No description'}</div>
<div class="agent-meta">
<span class="badge badge-${agent.status === 'active' ? 'success' : agent.status === 'busy' ? 'warning' : 'danger'}">${agent.status}</span>
${agent.min_bid} sats min bid
| ${agent.tasks_completed} tasks
| ${agent.total_earned} sats earned
</div>
</div>
</div>
`).join('');
agents.forEach(agent => {
const card = _el('div', 'agent-card');
const avatar = _el('div', 'agent-avatar');
_t(avatar, (agent.name || '?').charAt(0).toUpperCase());
const info = _el('div', 'agent-info');
const name = _el('div', 'agent-name');
_t(name, agent.name || '');
const desc = _el('div', 'agent-meta');
_t(desc, agent.description || 'No description');
const meta = _el('div', 'agent-meta');
const badge = _el('span', `badge badge-${agent.status === 'active' ? 'success' : agent.status === 'busy' ? 'warning' : 'danger'}`);
_t(badge, agent.status || '');
const stats = _el('span');
_t(stats, ` ${agent.min_bid ?? 0} sats min bid | ${agent.tasks_completed ?? 0} tasks | ${agent.total_earned ?? 0} sats earned`);
meta.appendChild(badge);
meta.appendChild(stats);
info.appendChild(name);
info.appendChild(desc);
info.appendChild(meta);
card.appendChild(avatar);
card.appendChild(info);
container.appendChild(card);
});
}
function updateAuctionsList(auctions) {
const container = document.getElementById('auctions-list');
container.innerHTML = '';
if (!auctions || auctions.length === 0) {
container.innerHTML = '<p style="color: var(--text-muted);">No active auctions</p>';
const p = _el('p'); p.style.color = 'var(--text-muted)';
_t(p, 'No active auctions');
container.appendChild(p);
return;
}
container.innerHTML = auctions.map(auction => `
<div class="agent-card">
<div class="agent-info">
<div class="agent-name">Task ${auction.task_id.slice(0, 8)}</div>
<div class="agent-meta">
${Math.round(auction.time_remaining)}s remaining
| ${auction.bid_count} bids
</div>
</div>
</div>
`).join('');
auctions.forEach(auction => {
const card = _el('div', 'agent-card');
const info = _el('div', 'agent-info');
const name = _el('div', 'agent-name');
_t(name, 'Task ' + String(auction.task_id || '').slice(0, 8));
const meta = _el('div', 'agent-meta');
_t(meta, `${Math.round(auction.time_remaining ?? 0)}s remaining | ${auction.bid_count ?? 0} bids`);
info.appendChild(name);
info.appendChild(meta);
card.appendChild(info);
container.appendChild(card);
});
}
function addLog(message, type = 'info') {

View File

@@ -279,3 +279,55 @@ def test_M605_health_status_passes_model_to_template(client):
# The default model is llama3.2 — it should appear in the partial from settings, not hardcoded
assert response.status_code == 200
assert "llama3.2" in response.text # rendered via template variable, not hardcoded literal
# ── M7xx — XSS prevention ─────────────────────────────────────────────────────
def _mobile_html() -> str:
"""Read the mobile template source."""
path = Path(__file__).parent.parent / "src" / "dashboard" / "templates" / "mobile.html"
return path.read_text()
def _swarm_live_html() -> str:
"""Read the swarm live template source."""
path = Path(__file__).parent.parent / "src" / "dashboard" / "templates" / "swarm_live.html"
return path.read_text()
def test_M701_mobile_chat_no_raw_message_interpolation():
"""mobile.html must not interpolate ${message} directly into innerHTML — XSS risk."""
html = _mobile_html()
# The vulnerable pattern is `${message}` inside a template literal assigned to innerHTML
# After the fix, message must only appear via textContent assignment
assert "textContent = message" in html or "textContent=message" in html, (
"mobile.html still uses innerHTML + ${message} interpolation — XSS vulnerability"
)
def test_M702_mobile_chat_user_input_not_in_innerhtml_template_literal():
"""${message} must not appear inside a backtick string that is assigned to innerHTML."""
html = _mobile_html()
# Find all innerHTML += `...` blocks and verify none contain ${message}
blocks = re.findall(r"innerHTML\s*\+=?\s*`([^`]*)`", html, re.DOTALL)
for block in blocks:
assert "${message}" not in block, (
"innerHTML template literal still contains ${message} — XSS vulnerability"
)
def test_M703_swarm_live_agent_name_not_interpolated_in_innerhtml():
"""swarm_live.html must not put ${agent.name} inside innerHTML template literals."""
html = _swarm_live_html()
blocks = re.findall(r"innerHTML\s*=\s*agents\.map\([^;]+\)\.join\([^)]*\)", html, re.DOTALL)
assert len(blocks) == 0, (
"swarm_live.html still uses innerHTML=agents.map(…) with interpolated agent data — XSS vulnerability"
)
def test_M704_swarm_live_uses_textcontent_for_agent_data():
"""swarm_live.html must use textContent (not innerHTML) to set agent name/description."""
html = _swarm_live_html()
assert "textContent" in html, (
"swarm_live.html does not use textContent — agent data may be raw-interpolated into DOM"
)