timmy-tower/TIMMY_TEST_PLAN.md

# Timmy API — Test Plan & Report Prompt

**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API. The flow is: a user submits a request and pays a small eval fee (simulated with stub invoices in this build); the agent decides whether to accept the job and, if it does, quotes a work price; the user pays the work invoice; and Timmy delivers the result. State advances only when the client polls a single endpoint, `GET /api/jobs/:id`.

**Base URL:** `https://<your-timmy-url>.replit.app`

---

## Option A — Automated bash script (recommended)

Save `timmy_test.sh` from the repo root, make it executable (`chmod +x timmy_test.sh`), then run it with `BASE` pointing at your deployment:

```bash
BASE="https://<your-timmy-url>.replit.app" ./timmy_test.sh
```

The script runs all 10 tests sequentially, records latency for Tests 6 and 7, extracts payment hashes from the `GET /api/jobs/:id` response, and prints a `PASS/FAIL/SKIP` summary. A clean run reports `PASS=12 FAIL=0 SKIP=0`: Test 8 has three sub-cases that are counted separately, so the 10 tests produce 12 results.
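
To make the summary line concrete, here is a minimal sketch of how such a tally could be kept. The `record` and `summary` helpers are hypothetical names chosen for illustration; the real `timmy_test.sh` may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical tally helpers; not taken from timmy_test.sh.
PASS=0; FAIL=0; SKIP=0

# record <PASS|FAIL|SKIP> <label>: bump the matching counter and log the outcome.
record() {
  case "$1" in
    PASS) PASS=$((PASS + 1)) ;;
    FAIL) FAIL=$((FAIL + 1)) ;;
    SKIP) SKIP=$((SKIP + 1)) ;;
    *) echo "unknown outcome: $1" >&2; return 1 ;;
  esac
  printf '[%s] %s\n' "$1" "$2"
}

# summary: print the totals; the exit status is non-zero if any test failed.
summary() {
  echo "PASS=$PASS FAIL=$FAIL SKIP=$SKIP"
  [ "$FAIL" -eq 0 ]
}
```

Each test would call `record` once per result (three times for Test 8), and the script would end with `summary` so that its exit status reflects failures.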

---

## Option B — Manual test suite

### Test 1 — Health check

```bash
curl -s "$BASE/api/healthz"
```

**Expected:** `{"status":"ok"}`
**Pass criteria:** HTTP 200 and a `status` field with value `ok`.
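
Several pass criteria in this plan depend on the HTTP status, which plain `curl -s` does not print. One possible approach, sketched below, is to append the status code to the output with curl's `-w` option and split it off afterwards. The helper names `fetch`, `status_of`, and `body_of` are hypothetical.

```bash
# fetch <curl args...>: print the response body, then the HTTP status on a final line.
fetch() {
  curl -s -w '\n%{http_code}' "$@"
}

# status_of <combined>: print only the last line (the status code).
status_of() { printf '%s\n' "$1" | tail -n 1; }

# body_of <combined>: print everything except the last line (the body).
body_of() { printf '%s\n' "$1" | sed '$d'; }
```

For Test 1 this could be used as `resp=$(fetch "$BASE/api/healthz")`, followed by checks on `status_of "$resp"` and `body_of "$resp"`.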

---

### Test 2 — Create a job

```bash
curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Explain the Lightning Network in two sentences"}'
```

**Expected:**
```json
{
  "jobId": "<uuid>",
  "evalInvoice": {
    "paymentRequest": "lnbcrt10u1stub_...",
    "amountSats": 10
  }
}
```
**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.

---

### Test 3 — Poll job before payment

```bash
curl -s "$BASE/api/jobs/<jobId>"
```

**Expected:**
```json
{
  "jobId": "...",
  "state": "awaiting_eval_payment",
  "evalInvoice": {
    "paymentRequest": "...",
    "amountSats": 10,
    "paymentHash": "<64-char-hex>"
  }
}
```
**Pass criteria:** State is `awaiting_eval_payment`. In stub mode, `evalInvoice.paymentHash` is included — use this value directly in Test 4. (In production with a real Lightning node, `paymentHash` is omitted.)
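
When the hash is extracted with `jq -r`, a missing field is printed as the literal string `null`, so it can help to validate the value before using it in Test 4. A small sketch, assuming the hash is exactly 64 hexadecimal characters as shown above (accepting upper or lower case is an assumption; `is_payment_hash` is a hypothetical helper):

```bash
# is_payment_hash <value>: succeed only if the value is exactly 64 hex characters.
is_payment_hash() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  return 0
}
```

A script could skip Test 4 (rather than fail it) when this check does not pass, since that usually means the server is not running in stub mode.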

---

### Test 4 — Pay the eval invoice (stub mode)

```bash
curl -s -X POST "$BASE/api/dev/stub/pay/<paymentHash-from-test-3>"
```

**Expected:** `{"ok":true,"paymentHash":"..."}`
**Pass criteria:** HTTP 200.

> `/api/dev/stub/pay` is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice.

---

### Test 5 — Poll after eval payment

```bash
curl -s "$BASE/api/jobs/<jobId>"
```

**Expected — if accepted:**
```json
{
  "state": "awaiting_work_payment",
  "workInvoice": {
    "paymentRequest": "...",
    "amountSats": 50,
    "paymentHash": "<64-char-hex>"
  }
}
```
Work fee by request length: 50 sats (short, ≤100 chars), 100 sats (medium, ≤300 chars), 250 sats (long, >300 chars).

**Expected — if rejected:**
```json
{ "state": "rejected", "reason": "..." }
```

**Pass criteria:** State has advanced from `awaiting_eval_payment`.
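
The work-fee tiers listed in this test can be turned into a check on `workInvoice.amountSats`. The sketch below assumes the server measures the length of the `request` string in characters; `expected_work_fee` is a hypothetical helper, not part of the API.

```bash
# expected_work_fee <request text>: print the work fee in sats for the documented tiers.
expected_work_fee() {
  _len=${#1}
  if   [ "$_len" -le 100 ]; then echo 50    # short request
  elif [ "$_len" -le 300 ]; then echo 100   # medium request
  else                           echo 250   # long request
  fi
}
```

For the Test 2 request, `expected_work_fee "Explain the Lightning Network in two sentences"` prints `50`, which matches the `amountSats` shown in the accepted response.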

---

### Test 6 — Pay work invoice and get result

```bash
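# Pay work invoice
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash-from-test-5>"

# Give the AI time to finish (typically 2–5 seconds), then poll for the result.
# Repeat the poll if the state is not yet "complete".
sleep 5
curl -s "$BASE/api/jobs/<jobId>"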
```

**Expected:**
```json
{
  "state": "complete",
  "result": "The Lightning Network is a second-layer protocol..."
}
```
**Pass criteria:** State is `complete` and `result` is a non-empty, on-topic answer to the request.
**Record latency** from work payment to `complete`.
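
One way to record that latency is to poll in a loop and time it. The following is a rough sketch rather than the script's actual logic: `get_state` and `wait_for_state` are hypothetical helpers, the one-second poll interval and 30-second default timeout are arbitrary choices, and the result is only accurate to whole seconds. It requires `curl` and `jq`.

```bash
# get_state <jobId>: print the job's current state.
get_state() {
  curl -s "$BASE/api/jobs/$1" | jq -r '.state'
}

# wait_for_state <jobId> <wanted> [timeout_s]: poll once per second until the job
# reaches <wanted>, then print the elapsed whole seconds. Fails on timeout.
wait_for_state() {
  local start deadline
  start=$(date +%s)
  deadline=$((start + ${3:-30}))
  while [ "$(get_state "$1")" != "$2" ]; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}
```

Calling `wait_for_state "$JOB_ID" complete` immediately after paying the work invoice would print an approximate latency for the report table.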

---

### Test 7 — Free demo endpoint

```bash
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
```

**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}`
**Pass criteria:** HTTP 200, `result` is coherent.
**Record latency** for this call.

---

### Test 8 — Input validation

```bash
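# 8a: missing request body field
curl -s -w '\nHTTP %{http_code}\n' -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
# 8b: unknown job ID
curl -s -w '\nHTTP %{http_code}\n' "$BASE/api/jobs/does-not-exist"
# 8c: demo request without the request parameter
curl -s -w '\nHTTP %{http_code}\n' "$BASE/api/demo"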
```

**Expected:** Each call returns HTTP 400 or 404 with an `{"error":"..."}` body. The three calls are reported separately as sub-cases 8a, 8b, and 8c.

---

### Test 9 — Demo rate limiter

```bash
for i in $(seq 1 6); do
  curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
    "$BASE/api/demo?request=ping+$i"
done
```

**Pass criteria:** At least one HTTP 429 is received. The limiter allows 5 requests per hour per IP, and earlier calls to `/api/demo` from the same IP (including Test 7) count against that quota, so the first 429 can arrive before request 6.
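
To turn this criterion into an automatic check, the status codes can be counted instead of read by eye. A minimal sketch with two hypothetical helpers, `probe_demo` and `count_status`:

```bash
# probe_demo <n>: send n demo requests and print one HTTP status code per line.
probe_demo() {
  for i in $(seq 1 "$1"); do
    curl -s -o /dev/null -w '%{http_code}\n' "$BASE/api/demo?request=ping+$i"
  done
}

# count_status <code>: read status codes from stdin and print how many equal <code>.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
count_status() {
  grep -c -x "$1" || true
}
```

The test passes when `probe_demo 6 | count_status 429` prints a number of at least 1.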

---

### Test 10 — Rejection path (adversarial request)

```bash
# Create job
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo "$RESULT" | jq -r '.jobId')

# Get paymentHash from poll
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')

# Pay and wait
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"
```

**Pass criteria:** Final state is `rejected` with a non-empty `reason`.
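
The pass criteria above can be checked mechanically with `jq`. This is a sketch only: `is_rejected_with_reason` is a hypothetical helper, and it assumes `reason` is returned as a JSON string.

```bash
# is_rejected_with_reason <json>: succeed when state is "rejected" and reason is non-empty.
# jq -e maps a false or null result to a non-zero exit status.
is_rejected_with_reason() {
  printf '%s' "$1" |
    jq -e '.state == "rejected" and ((.reason // "") | tostring | length > 0)' >/dev/null
}
```

It could be applied to the final poll, for example `is_rejected_with_reason "$(curl -s "$BASE/api/jobs/$JOB_ID")" && echo PASS`.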

---

## Report template

After running the tests, fill in and return the following:

---

**Tester:** [Claude / Perplexity / Human / Other]
**Date:** ___
**Base URL tested:** ___
**Method:** [Automated script / Manual]

| Test | Pass / Fail / Skip | Latency | Notes |
|---|---|---|---|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Poll after eval (state advance) | | — | |
| 6 — Pay work + get result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing request body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | — | |

**Overall verdict:** Pass / Partial / Fail

**Issues found:**
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)

**Observations on result quality:**
(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?)

**Suggestions:**
(Anything you'd add, fix, or change)

---

## Notes for reviewers

- **Stub mode:** No real Lightning node in this build. `GET /api/jobs/:id` exposes `paymentHash` inside `evalInvoice` and `workInvoice` only when stub mode is active — this lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, `paymentHash` is omitted from the API response.
- **Dev-only route:** `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`.
- **State machine:** All transitions happen server-side on GET poll. There is no webhook or push.
- **AI models:** Eval uses `claude-haiku-4-5` (fast judgment). Work uses `claude-sonnet-4-6` (full capability).
- **Pricing:** Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length.
- **Rate limiter:** In-memory, resets on server restart, per-IP, 5 req/hr on `/api/demo`.
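
Because transitions only happen on poll, a client has to keep polling until the job stops changing. The sketch below treats `complete` and `rejected` as the only final states, since those are the two end states this plan documents; whether the server has other terminal states (for example a failure or expiry state) is not stated here, so that is an assumption. `is_terminal_state` is a hypothetical helper.

```bash
# is_terminal_state <state>: succeed for the two end states documented in this plan.
is_terminal_state() {
  case "$1" in
    complete|rejected) return 0 ;;
    *) return 1 ;;   # e.g. awaiting_eval_payment, awaiting_work_payment
  esac
}
```

A polling loop can use it as its exit condition, for example `until is_terminal_state "$STATE"; do ...; done`, where `STATE` holds the `state` field of the most recent poll.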