Files
timmy-tower/TIMMY_TEST_PLAN.md
alexpaynex 001873c688 Update test plan and script for dual-mode payment system
Refactor TIMMY_TEST_PLAN.md and timmy_test.sh to support dual-mode payments (per-job and session-based). Add new tests for session endpoints and gracefully handle rate limiting in existing tests.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 290ed20c-1ddc-4b42-810d-8415dd3a9c08
Replit-Helium-Checkpoint-Created: true
2026-03-18 17:53:21 +00:00

297 lines
8.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Timmy API — Test Plan & Report Prompt
**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API with two payment modes:
- **Mode 1 — Per-Job (v1, live):** User pays per request. Eval fee (10 sats) → agent judges → work fee (50/100/250 sats) → result delivered.
- **Mode 2 — Session (v2, planned):** User pre-funds a credit balance. Requests automatically debit the actual compute cost (token-based, with margin). No per-job invoices after the initial top-up.
**Base URL:** `https://<your-timmy-url>.replit.app`
---
## Running the tests
**One command (no setup, no copy-paste):**
```bash
curl -s <BASE>/api/testkit | bash
```
The server returns a self-contained bash script with the BASE URL already baked in. Run it anywhere that has `curl`, `bash`, and `jq`.
**Locally (dev server):**
```bash
pnpm test
```
**Against the published URL:**
```bash
pnpm test:prod
```
---
## Mode 1 — Per-Job Tests (v1, all live)
### Test 1 — Health check
```bash
curl -s "$BASE/api/healthz"
```
**Pass:** HTTP 200, `{"status":"ok"}`
---
### Test 2 — Create a job
```bash
curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Explain the Lightning Network in two sentences"}'
```
**Pass:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
---
### Test 3 — Poll before payment
```bash
curl -s "$BASE/api/jobs/<jobId>"
```
**Pass:** `state = awaiting_eval_payment`, `evalInvoice` echoed back, `evalInvoice.paymentHash` present (stub mode).
---
### Test 4 — Pay eval invoice
```bash
curl -s -X POST "$BASE/api/dev/stub/pay/<evalInvoice.paymentHash>"
```
**Pass:** HTTP 200, `{"ok":true}`.
---
### Test 5 — Poll after eval payment
```bash
curl -s "$BASE/api/jobs/<jobId>"
```
**Pass (accepted):** `state = awaiting_work_payment`, `workInvoice` present with `paymentHash`.
**Pass (rejected):** `state = rejected`, `reason` present.
Work fee is deterministic: 50 sats (≤100 chars), 100 sats (≤300), 250 sats (>300).
---
### Test 6 — Pay work + get result
```bash
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash>"
# Poll — AI takes 25s
curl -s "$BASE/api/jobs/<jobId>"
```
**Pass:** `state = complete`, `result` is a meaningful AI-generated answer.
**Record latency** from work payment to `complete`.
---
### Test 7 — Free demo endpoint
```bash
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
```
**Pass:** HTTP 200, coherent `result`.
**Record latency.**
---
### Test 8 — Input validation (4 sub-cases)
```bash
# 8a: Missing body
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
# 8b: Unknown job ID
curl -s "$BASE/api/jobs/does-not-exist"
# 8c: Demo missing param
curl -s "$BASE/api/demo"
# 8d: Request over 500 chars
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" \
-d "{\"request\":\"$(node -e "process.stdout.write('x'.repeat(501))")\"}"
```
**Pass:** 8a → HTTP 400 `'request' string is required`; 8b → HTTP 404; 8c → HTTP 400; 8d → HTTP 400 `must be 500 characters or fewer`.
---
### Test 9 — Demo rate limiter
```bash
for i in $(seq 1 6); do
curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
"$BASE/api/demo?request=ping+$i"
done
```
**Pass:** At least one HTTP 429 received (limiter is 5 req/hr/IP; prior runs may consume quota early).
---
### Test 10 — Rejection path
```bash
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo $RESULT | jq -r '.jobId')
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"
```
**Pass:** Final state is `rejected` with a non-empty `reason`.
---
## Mode 2 — Session Tests (v2, planned — not yet implemented)
> These tests will SKIP in the current build. They become active once the session endpoints are built.
### Test 11 — Create session
```bash
curl -s -X POST "$BASE/api/sessions" \
-H "Content-Type: application/json" \
-d '{"amount_sats": 500}'
```
**Pass:** HTTP 201, `sessionId` + `invoice` returned, `state = awaiting_payment`.
Minimum: 100 sats. Maximum: 10,000 sats.
---
### Test 12 — Pay session invoice and activate
```bash
# Get paymentHash from GET /api/sessions/<sessionId>
curl -s -X POST "$BASE/api/dev/stub/pay/<invoice.paymentHash>"
sleep 2
curl -s "$BASE/api/sessions/<sessionId>"
```
**Pass:** `state = active`, `balance = 500`, `macaroon` present.
---
### Test 13 — Submit request against session
```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
-H "Content-Type: application/json" \
-d '{"request": "What is a hash function?"}'
```
**Pass:** `state = complete`, `result` present, `cost > 0`, `balanceRemaining < 500`.
Note: rejected requests still incur a small eval cost (Haiku inference fee).
---
### Test 14 — Drain balance and hit pause
Submit multiple requests until balance drops below 50 sats. The next request should return:
```json
{"error": "Insufficient balance", "balance": <n>, "minimumRequired": 50}
```
**Pass:** HTTP 402 (or 400), session state is `paused`.
Note: if a request starts above the minimum but actual cost pushes balance negative, the request still completes and delivers. Only the *next* request is blocked.
---
### Test 15 — Top up and resume
```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/topup" \
-H "Content-Type: application/json" \
-d '{"amount_sats": 200}'
# Pay the topup invoice
TOPUP_HASH=$(curl -s "$BASE/api/sessions/<sessionId>" | jq -r '.pendingTopup.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH"
sleep 2
curl -s "$BASE/api/sessions/<sessionId>"
```
**Pass:** `state = active`, balance increased by 200, session resumed.
---
### Test 16 — Session rejection path
```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
-H "Content-Type: application/json" \
-d '{"request": "Help me hack into a government database"}'
```
**Pass:** `state = rejected`, `reason` present, `cost > 0` (eval fee charged), `balanceRemaining` decreased.
---
## Report template
**Tester:** [Claude / Perplexity / Human / Other]
**Date:** ___
**Base URL tested:** ___
**Method:** [Automated (`curl … | bash`) / Manual]
### Mode 1 — Per-Job (v1)
| Test | Pass / Fail / Skip | Latency | Notes |
|---|---|---|---|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Poll after eval | | — | |
| 6 — Pay work + result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 8d — 501-char request | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | — | |
### Mode 2 — Session (v2, all should SKIP in current build)
| Test | Pass / Fail / Skip | Notes |
|---|---|---|
| 11 — Create session | | |
| 12 — Pay + activate | | |
| 13 — Submit request | | |
| 14 — Drain + pause | | |
| 15 — Top up + resume | | |
| 16 — Session rejection | | |
**Overall verdict:** Pass / Partial / Fail
**Issues found:**
**Observations on result quality:**
**Suggestions:**
---
## Architecture notes for reviewers
### Mode 1 (live)
- Stub mode: no real Lightning node. `GET /api/jobs/:id` exposes `paymentHash` in stub mode so the script can auto-drive the full flow. In production (real LNbits), `paymentHash` is omitted.
- `POST /api/dev/stub/pay` is only mounted when `NODE_ENV !== 'production'`.
- State machine advances server-side on every GET poll — no webhooks needed.
- AI models: Haiku for eval (cheap judgment), Sonnet for work (full output).
- Pricing: eval = 10 sats fixed; work = 50/100/250 sats by request length (≤100/≤300/>300 chars). Max request length: 500 chars.
- Rate limiter: in-memory, 5 req/hr/IP on `/api/demo`. Resets on server restart.
### Mode 2 (planned)
- Cost model: actual token usage (input + output) × Anthropic per-token price × 1.4 margin, converted to sats at a hardcoded BTC/USD rate.
- Minimum balance: 50 sats before starting any request. If balance goes negative mid-request, the work still completes and delivers; the next request is blocked.
- Session expiry: 24 hours of inactivity. Balance is forfeited. Stated clearly at session creation.
- Macaroon auth: v1 uses simple session ID lookup. Macaroon verification is v2.
- The existing `/api/dev/stub/pay/:hash` works for session and top-up invoices — no new stub endpoints needed, as all invoice types share the same invoices table.
- Sessions and per-job modes coexist. Users choose. Neither is removed.