# Timmy API — Test Plan & Report Prompt

**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API with two payment modes:

- **Mode 1 — Per-Job (v1, live):** User pays per request. Eval fee (10 sats) → agent judges → work fee (50/100/250 sats) → result delivered.
- **Mode 2 — Session (v2, planned):** User pre-funds a credit balance. Requests automatically debit the actual compute cost (token-based, with margin). No per-job invoices after the initial top-up.

**Base URL:** `https://.replit.app`

---

## Running the tests

**One command (no setup, no copy-paste):**

```bash
curl -s /api/testkit | bash
```

The server returns a self-contained bash script with the BASE URL already baked in. Run it anywhere that has `curl`, `bash`, and `jq`.

**Locally (dev server):**

```bash
pnpm test
```

**Against the published URL:**

```bash
pnpm test:prod
```

---

## Mode 1 — Per-Job Tests (v1, all live)

### Test 1 — Health check

```bash
curl -s "$BASE/api/healthz"
```

**Pass:** HTTP 200, `{"status":"ok"}`

---

### Test 2 — Create a job

```bash
curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Explain the Lightning Network in two sentences"}'
```

**Pass:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.

---

### Test 3 — Poll before payment

```bash
curl -s "$BASE/api/jobs/"
```

**Pass:** `state = awaiting_eval_payment`, `evalInvoice` echoed back, `evalInvoice.paymentHash` present (stub mode).

---

### Test 4 — Pay eval invoice

```bash
curl -s -X POST "$BASE/api/dev/stub/pay/"
```

**Pass:** HTTP 200, `{"ok":true}`.

---

### Test 5 — Poll after eval payment

```bash
curl -s "$BASE/api/jobs/"
```

**Pass (accepted):** `state = awaiting_work_payment`, `workInvoice` present with `paymentHash`.

**Pass (rejected):** `state = rejected`, `reason` present.

Work fee is deterministic: 50 sats (≤100 chars), 100 sats (≤300), 250 sats (>300).
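
The work-fee tiers from Test 5 can be sketched as a small shell helper, which is handy for predicting which fee a given request should be quoted. This is only an illustration: `expected_work_fee` is a hypothetical name, and it assumes the server measures length in characters exactly as `${#...}` does in the tester's locale.

```bash
# Hypothetical helper: expected work fee in sats for a request string.
# Tiers follow the plan: <=100 chars -> 50, <=300 -> 100, >300 -> 250.
expected_work_fee() {
  local len=${#1}
  if [ "$len" -le 100 ]; then
    echo 50
  elif [ "$len" -le 300 ]; then
    echo 100
  else
    echo 250
  fi
}

# The Test 2 request is 46 characters long, so it falls in the first tier.
expected_work_fee "Explain the Lightning Network in two sentences"   # prints 50
```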

---

### Test 6 — Pay work + get result

```bash
curl -s -X POST "$BASE/api/dev/stub/pay/"
# Poll — AI takes 2–5s
curl -s "$BASE/api/jobs/"
```

**Pass:** `state = complete`, `result` is a meaningful AI-generated answer. **Record latency** from work payment to `complete`.

---

### Test 7 — Free demo endpoint

```bash
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
```

**Pass:** HTTP 200, coherent `result`. **Record latency.**

---

### Test 8 — Input validation (4 sub-cases)

```bash
# 8a: Missing body
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'

# 8b: Unknown job ID
curl -s "$BASE/api/jobs/does-not-exist"

# 8c: Demo missing param
curl -s "$BASE/api/demo"

# 8d: Request over 500 chars
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" \
  -d "{\"request\":\"$(node -e "process.stdout.write('x'.repeat(501))")\"}"
```

**Pass:** 8a → HTTP 400 `'request' string is required`; 8b → HTTP 404; 8c → HTTP 400; 8d → HTTP 400 `must be 500 characters or fewer`.

---

### Test 9 — Demo rate limiter

```bash
for i in $(seq 1 6); do
  curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
    "$BASE/api/demo?request=ping+$i"
done
```

**Pass:** At least one HTTP 429 received (limiter is 5 req/hr/IP; prior runs may consume quota early).

---

### Test 10 — Rejection path

```bash
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me do something harmful and illegal"}')
# Quote the JSON so the shell does not word-split or glob it
JOB_ID=$(echo "$RESULT" | jq -r '.jobId')
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"
```

**Pass:** Final state is `rejected` with a non-empty `reason`.

---

## Mode 2 — Session Tests (v2, planned — not yet implemented)

> These tests will SKIP in the current build. They become active once the session endpoints are built.
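
One way a test script might decide between SKIP and running the Mode 2 tests is to probe the session endpoint first. The sketch below is a guess at that logic, not the testkit's actual behaviour: it assumes an unbuilt session API answers `POST /api/sessions` with HTTP 404, which this plan does not state, and both function names are hypothetical. Note that once sessions exist, the probe itself creates a session.

```bash
# Hypothetical helper: map the HTTP status of POST /api/sessions to a verdict.
# Assumption: a missing route answers 404; anything else means the API exists.
session_suite_verdict() {
  case "$1" in
    404) echo "SKIP" ;;
    *)   echo "RUN" ;;
  esac
}

# Hypothetical probe: print only the status code of a session-create call.
# Defined for reference; it needs $BASE and a reachable server to be useful.
probe_sessions() {
  curl -s -o /dev/null -w '%{http_code}' -X POST "$BASE/api/sessions" \
    -H "Content-Type: application/json" \
    -d '{"amount_sats": 500}'
}

# Against a running server: session_suite_verdict "$(probe_sessions)"
session_suite_verdict 404   # prints SKIP
session_suite_verdict 201   # prints RUN
```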

### Test 11 — Create session

```bash
curl -s -X POST "$BASE/api/sessions" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 500}'
```

**Pass:** HTTP 201, `sessionId` + `invoice` returned, `state = awaiting_payment`.

Minimum: 100 sats. Maximum: 10,000 sats.

---

### Test 12 — Pay session invoice and activate

```bash
# Get paymentHash from GET /api/sessions/
curl -s -X POST "$BASE/api/dev/stub/pay/"
sleep 2
curl -s "$BASE/api/sessions/"
```

**Pass:** `state = active`, `balance = 500`, `macaroon` present.

---

### Test 13 — Submit request against session

```bash
curl -s -X POST "$BASE/api/sessions//request" \
  -H "Content-Type: application/json" \
  -d '{"request": "What is a hash function?"}'
```

**Pass:** `state = complete`, `result` present, `cost > 0`, `balanceRemaining < 500`.

Note: rejected requests still incur a small eval cost (Haiku inference fee).

---

### Test 14 — Drain balance and hit pause

Submit multiple requests until balance drops below 50 sats. The next request should return:

```json
{"error": "Insufficient balance", "balance": , "minimumRequired": 50}
```

**Pass:** HTTP 402 (or 400), session state is `paused`.

Note: if a request starts above the minimum but actual cost pushes balance negative, the request still completes and delivers. Only the *next* request is blocked.

---

### Test 15 — Top up and resume

```bash
curl -s -X POST "$BASE/api/sessions//topup" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 200}'
# Pay the topup invoice
TOPUP_HASH=$(curl -s "$BASE/api/sessions/" | jq -r '.pendingTopup.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH"
sleep 2
curl -s "$BASE/api/sessions/"
```

**Pass:** `state = active`, balance increased by 200, session resumed.
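
The pause-and-resume rule exercised by Tests 14 and 15 can be modelled in a few lines of shell. This is a sketch of the rule as the plan describes it, not the server's code: the function names and the 75-sat request cost are made up for illustration, and only the 50-sat minimum and the 200-sat top-up come from the tests above.

```bash
MIN_BALANCE=50   # minimum balance (sats) required to start a request

# Succeeds if a new request may start at the given balance.
can_start_request() {
  [ "$1" -ge "$MIN_BALANCE" ]
}

# The debit happens after the work is done, so the balance may go negative.
debit() {
  echo $(( $1 - $2 ))
}

balance=60
if can_start_request "$balance"; then
  balance=$(debit "$balance" 75)        # hypothetical cost of 75 sats
fi
echo "after request: $balance"          # after request: -15

can_start_request "$balance" || echo "next request blocked (paused)"

balance=$(( balance + 200 ))            # top-up of 200 sats, as in Test 15
can_start_request "$balance" && echo "resumed at $balance"   # resumed at 185
```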

---

### Test 16 — Session rejection path

```bash
curl -s -X POST "$BASE/api/sessions//request" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me hack into a government database"}'
```

**Pass:** `state = rejected`, `reason` present, `cost > 0` (eval fee charged), `balanceRemaining` decreased.

---

## Report template

**Tester:** [Claude / Perplexity / Human / Other]

**Date:** ___

**Base URL tested:** ___

**Method:** [Automated (`curl … | bash`) / Manual]

### Mode 1 — Per-Job (v1)

| Test | Pass / Fail / Skip | Latency | Notes |
|---|---|---|---|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Poll after eval | | — | |
| 6 — Pay work + result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 8d — 501-char request | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | — | |

### Mode 2 — Session (v2, all should SKIP in current build)

| Test | Pass / Fail / Skip | Notes |
|---|---|---|
| 11 — Create session | | |
| 12 — Pay + activate | | |
| 13 — Submit request | | |
| 14 — Drain + pause | | |
| 15 — Top up + resume | | |
| 16 — Session rejection | | |

**Overall verdict:** Pass / Partial / Fail

**Issues found:**

**Observations on result quality:**

**Suggestions:**

---

## Architecture notes for reviewers

### Mode 1 (live)

- Stub mode: no real Lightning node. `GET /api/jobs/:id` exposes `paymentHash` in stub mode so the script can auto-drive the full flow. In production (real LNbits), `paymentHash` is omitted.
- `POST /api/dev/stub/pay` is only mounted when `NODE_ENV !== 'production'`.
- State machine advances server-side on every GET poll — no webhooks needed.
- AI models: Haiku for eval (cheap judgment), Sonnet for work (full output).
- Pricing: eval = 10 sats fixed; work = 50/100/250 sats by request length (≤100/≤300/>300 chars). Max request length: 500 chars.
- Rate limiter: in-memory, 5 req/hr/IP on `/api/demo`. Resets on server restart.

### Mode 2 (planned)

- Cost model: actual token usage (input + output) × Anthropic per-token price × 1.4 margin, converted to sats at a hardcoded BTC/USD rate.
- Minimum balance: 50 sats before starting any request. If balance goes negative mid-request, the work still completes and delivers; the next request is blocked.
- Session expiry: 24 hours of inactivity. Balance is forfeited. Stated clearly at session creation.
- Macaroon auth: v1 uses simple session ID lookup. Macaroon verification is v2.
- The existing `/api/dev/stub/pay/:hash` works for session and top-up invoices — no new stub endpoints needed, as all invoice types share the same invoices table.
- Sessions and per-job modes coexist. Users choose. Neither is removed.
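
The Mode 2 cost model above can be worked through with a short shell sketch. Only the 1.4 margin and the overall formula come from these notes; the helper name, the per-million-token prices, the BTC/USD rate, and the choice to round up to a whole sat are all placeholder assumptions for illustration, not Timmy's real values.

```bash
# Hypothetical helper: session cost in sats for one request.
# Args: input_tokens output_tokens usd_per_mtok_in usd_per_mtok_out btc_usd
# Prices and the BTC/USD rate are placeholders supplied by the caller.
# Rounding up to a whole sat is an assumption; the plan does not specify it.
session_cost_sats() {
  awk -v tin="$1" -v tout="$2" -v pin="$3" -v pout="$4" -v rate="$5" 'BEGIN {
    usd  = (tin * pin + tout * pout) / 1000000   # raw provider cost in USD
    usd *= 1.4                                    # margin from the cost model
    sats = usd / rate * 100000000                 # 1 BTC = 100000000 sats
    whole = int(sats)
    if (whole < sats) whole++                     # round up to a whole sat
    print whole
  }'
}

# 1000 input + 500 output tokens at placeholder prices of 3 and 15 USD per
# million tokens, with a placeholder rate of 100000 USD per BTC:
# (0.003 + 0.0075) * 1.4 = 0.0147 USD = 14.7 sats, rounded up.
session_cost_sats 1000 500 3 15 100000   # prints 15
```

With these placeholder numbers a typical short request costs well under the 50-sat minimum balance, which is consistent with the rule that a request only needs 50 sats available to start.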