Files
timmy-tower/TIMMY_TEST_PLAN.md
alexpaynex 53bc93a9b4 Add automated testing script and expose payment hashes
Integrates a new bash script for automated end-to-end testing of the Timmy API. Updates API routes to expose payment hashes in stub mode for easier invoice payment simulation during testing. Modifies test plan documentation to include the new automated script.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 6f2776b0-a913-41d3-a988-759a82feb6f3
Replit-Helium-Checkpoint-Created: true
2026-03-18 17:30:13 +00:00

248 lines
6.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Timmy API — Test Plan & Report Prompt
**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.
**Base URL:** `https://<your-timmy-url>.replit.app`
---
## Option A — Automated bash script (recommended)
Save `timmy_test.sh` from the repo root, then run:
```bash
BASE="https://<your-timmy-url>.replit.app" ./timmy_test.sh
```
The script runs all 10 tests sequentially, captures latency for Tests 6 and 7, auto-extracts payment hashes via the `GET /api/jobs/:id` response, and prints a `PASS/FAIL/SKIP` summary. A clean run reports `PASS=12 FAIL=0 SKIP=0` (3 sub-cases in Test 8 count separately).
---
## Option B — Manual test suite
### Test 1 — Health check
```bash
curl -s "$BASE/api/healthz"
```
**Expected:** `{"status":"ok"}`
**Pass criteria:** HTTP 200, status field present.
---
### Test 2 — Create a job
```bash
curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Explain the Lightning Network in two sentences"}'
```
**Expected:**
```json
{
"jobId": "<uuid>",
"evalInvoice": {
"paymentRequest": "lnbcrt10u1stub_...",
"amountSats": 10
}
}
```
**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
---
### Test 3 — Poll job before payment
```bash
curl -s "$BASE/api/jobs/<jobId>"
```
**Expected:**
```json
{
"jobId": "...",
"state": "awaiting_eval_payment",
"evalInvoice": {
"paymentRequest": "...",
"amountSats": 10,
"paymentHash": "<64-char-hex>"
}
}
```
**Pass criteria:** State is `awaiting_eval_payment`. In stub mode, `evalInvoice.paymentHash` is included — use this value directly in Test 4. (In production with a real Lightning node, `paymentHash` is omitted.)
---
### Test 4 — Pay the eval invoice (stub mode)
```bash
curl -s -X POST "$BASE/api/dev/stub/pay/<paymentHash-from-test-3>"
```
**Expected:** `{"ok":true,"paymentHash":"..."}`
**Pass criteria:** HTTP 200.
> `/api/dev/stub/pay` is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice.
---
### Test 5 — Poll after eval payment
```bash
curl -s "$BASE/api/jobs/<jobId>"
```
**Expected — if accepted:**
```json
{
"state": "awaiting_work_payment",
"workInvoice": {
"paymentRequest": "...",
"amountSats": 50,
"paymentHash": "<64-char-hex>"
}
}
```
Work fee: 50 sats (short request ≤100 chars), 100 sats (medium ≤300), 250 sats (long).
**Expected — if rejected:**
```json
{ "state": "rejected", "reason": "..." }
```
**Pass criteria:** State has advanced from `awaiting_eval_payment`.
---
### Test 6 — Pay work invoice and get result
```bash
# Pay work invoice
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash-from-test-5>"
# Poll for result (AI takes 25 seconds)
curl -s "$BASE/api/jobs/<jobId>"
```
**Expected:**
```json
{
"state": "complete",
"result": "The Lightning Network is a second-layer protocol..."
}
```
**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer.
**Record latency** from work payment to `complete`.
---
### Test 7 — Free demo endpoint
```bash
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
```
**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}`
**Pass criteria:** HTTP 200, `result` is coherent.
**Record latency** for this call.
---
### Test 8 — Input validation
```bash
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
curl -s "$BASE/api/jobs/does-not-exist"
curl -s "$BASE/api/demo"
```
**Expected:** HTTP 400 / 404 with `{"error":"..."}` bodies.
---
### Test 9 — Demo rate limiter
```bash
for i in $(seq 1 6); do
curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
"$BASE/api/demo?request=ping+$i"
done
```
**Pass criteria:** At least one 429 received. The limiter allows 5 requests/hour/IP — prior runs from the same IP may have consumed quota, so 429 can appear before request 6.
---
### Test 10 — Rejection path (adversarial request)
```bash
# Create job
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo $RESULT | jq -r '.jobId')
# Get paymentHash from poll
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
# Pay and wait
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"
```
**Pass criteria:** Final state is `rejected` with a non-empty `reason`.
---
## Report template
After running the tests, fill in and return the following:
---
**Tester:** [Claude / Perplexity / Human / Other]
**Date:** ___
**Base URL tested:** ___
**Method:** [Automated script / Manual]
| Test | Pass / Fail / Skip | Latency | Notes |
|---|---|---|---|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Poll after eval (state advance) | | — | |
| 6 — Pay work + get result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing request body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | — | |
**Overall verdict:** Pass / Partial / Fail
**Issues found:**
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)
**Observations on result quality:**
(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?)
**Suggestions:**
(Anything you'd add, fix, or change)
---
## Notes for reviewers
- **Stub mode:** No real Lightning node in this build. `GET /api/jobs/:id` exposes `paymentHash` inside `evalInvoice` and `workInvoice` only when stub mode is active — this lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, `paymentHash` is omitted from the API response.
- **Dev-only route:** `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`.
- **State machine:** All transitions happen server-side on GET poll. There is no webhook or push.
- **AI models:** Eval uses `claude-haiku-4-5` (fast judgment). Work uses `claude-sonnet-4-6` (full capability).
- **Pricing:** Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length.
- **Rate limiter:** In-memory, resets on server restart, per-IP, 5 req/hr on `/api/demo`.