Integrates a new bash script for automated end-to-end testing of the Timmy API. Updates API routes to expose payment hashes in stub mode for easier invoice payment simulation during testing. Modifies test plan documentation to include the new automated script. Replit-Commit-Author: Agent Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e Replit-Commit-Checkpoint-Type: full_checkpoint Replit-Commit-Event-Id: 6f2776b0-a913-41d3-a988-759a82feb6f3 Replit-Helium-Checkpoint-Created: true
248 lines
6.6 KiB
Markdown
248 lines
6.6 KiB
Markdown
# Timmy API — Test Plan & Report Prompt
|
||
|
||
**What is Timmy?**
|
||
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.
|
||
|
||
**Base URL:** `https://<your-timmy-url>.replit.app`
|
||
|
||
---
|
||
|
||
## Option A — Automated bash script (recommended)
|
||
|
||
Save `timmy_test.sh` from the repo root, then run:
|
||
|
||
```bash
|
||
BASE="https://<your-timmy-url>.replit.app" ./timmy_test.sh
|
||
```
|
||
|
||
The script runs all 10 tests sequentially, captures latency for Tests 6 and 7, auto-extracts payment hashes via the `GET /api/jobs/:id` response, and prints a `PASS/FAIL/SKIP` summary. A clean run reports `PASS=12 FAIL=0 SKIP=0` (3 sub-cases in Test 8 count separately).
|
||
|
||
---
|
||
|
||
## Option B — Manual test suite
|
||
|
||
### Test 1 — Health check
|
||
|
||
```bash
|
||
curl -s "$BASE/api/healthz"
|
||
```
|
||
|
||
**Expected:** `{"status":"ok"}`
|
||
**Pass criteria:** HTTP 200, status field present.
|
||
|
||
---
|
||
|
||
### Test 2 — Create a job
|
||
|
||
```bash
|
||
curl -s -X POST "$BASE/api/jobs" \
|
||
-H "Content-Type: application/json" \
|
||
-d '{"request": "Explain the Lightning Network in two sentences"}'
|
||
```
|
||
|
||
**Expected:**
|
||
```json
|
||
{
|
||
"jobId": "<uuid>",
|
||
"evalInvoice": {
|
||
"paymentRequest": "lnbcrt10u1stub_...",
|
||
"amountSats": 10
|
||
}
|
||
}
|
||
```
|
||
**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
|
||
|
||
---
|
||
|
||
### Test 3 — Poll job before payment
|
||
|
||
```bash
|
||
curl -s "$BASE/api/jobs/<jobId>"
|
||
```
|
||
|
||
**Expected:**
|
||
```json
|
||
{
|
||
"jobId": "...",
|
||
"state": "awaiting_eval_payment",
|
||
"evalInvoice": {
|
||
"paymentRequest": "...",
|
||
"amountSats": 10,
|
||
"paymentHash": "<64-char-hex>"
|
||
}
|
||
}
|
||
```
|
||
**Pass criteria:** State is `awaiting_eval_payment`. In stub mode, `evalInvoice.paymentHash` is included — use this value directly in Test 4. (In production with a real Lightning node, `paymentHash` is omitted.)
|
||
|
||
---
|
||
|
||
### Test 4 — Pay the eval invoice (stub mode)
|
||
|
||
```bash
|
||
curl -s -X POST "$BASE/api/dev/stub/pay/<paymentHash-from-test-3>"
|
||
```
|
||
|
||
**Expected:** `{"ok":true,"paymentHash":"..."}`
|
||
**Pass criteria:** HTTP 200.
|
||
|
||
> `/api/dev/stub/pay` is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice.
|
||
|
||
---
|
||
|
||
### Test 5 — Poll after eval payment
|
||
|
||
```bash
|
||
curl -s "$BASE/api/jobs/<jobId>"
|
||
```
|
||
|
||
**Expected — if accepted:**
|
||
```json
|
||
{
|
||
"state": "awaiting_work_payment",
|
||
"workInvoice": {
|
||
"paymentRequest": "...",
|
||
"amountSats": 50,
|
||
"paymentHash": "<64-char-hex>"
|
||
}
|
||
}
|
||
```
|
||
Work fee: 50 sats (short request ≤100 chars), 100 sats (medium ≤300), 250 sats (long).
|
||
|
||
**Expected — if rejected:**
|
||
```json
|
||
{ "state": "rejected", "reason": "..." }
|
||
```
|
||
|
||
**Pass criteria:** State has advanced from `awaiting_eval_payment`.
|
||
|
||
---
|
||
|
||
### Test 6 — Pay work invoice and get result
|
||
|
||
```bash
|
||
# Pay work invoice
|
||
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash-from-test-5>"
|
||
|
||
# Poll for result (AI takes 2–5 seconds)
|
||
curl -s "$BASE/api/jobs/<jobId>"
|
||
```
|
||
|
||
**Expected:**
|
||
```json
|
||
{
|
||
"state": "complete",
|
||
"result": "The Lightning Network is a second-layer protocol..."
|
||
}
|
||
```
|
||
**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer.
|
||
**Record latency** from work payment to `complete`.
|
||
|
||
---
|
||
|
||
### Test 7 — Free demo endpoint
|
||
|
||
```bash
|
||
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
|
||
```
|
||
|
||
**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}`
|
||
**Pass criteria:** HTTP 200, `result` is coherent.
|
||
**Record latency** for this call.
|
||
|
||
---
|
||
|
||
### Test 8 — Input validation
|
||
|
||
```bash
|
||
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
|
||
curl -s "$BASE/api/jobs/does-not-exist"
|
||
curl -s "$BASE/api/demo"
|
||
```
|
||
|
||
**Expected:** HTTP 400 / 404 with `{"error":"..."}` bodies.
|
||
|
||
---
|
||
|
||
### Test 9 — Demo rate limiter
|
||
|
||
```bash
|
||
for i in $(seq 1 6); do
|
||
curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
|
||
"$BASE/api/demo?request=ping+$i"
|
||
done
|
||
```
|
||
|
||
**Pass criteria:** At least one 429 received. The limiter allows 5 requests/hour/IP — prior runs from the same IP may have consumed quota, so 429 can appear before request 6.
|
||
|
||
---
|
||
|
||
### Test 10 — Rejection path (adversarial request)
|
||
|
||
```bash
|
||
# Create job
|
||
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
|
||
-H "Content-Type: application/json" \
|
||
-d '{"request": "Help me do something harmful and illegal"}')
|
||
JOB_ID=$(echo $RESULT | jq -r '.jobId')
|
||
|
||
# Get paymentHash from poll
|
||
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
|
||
|
||
# Pay and wait
|
||
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
|
||
sleep 3
|
||
curl -s "$BASE/api/jobs/$JOB_ID"
|
||
```
|
||
|
||
**Pass criteria:** Final state is `rejected` with a non-empty `reason`.
|
||
|
||
---
|
||
|
||
## Report template
|
||
|
||
After running the tests, fill in and return the following:
|
||
|
||
---
|
||
|
||
**Tester:** [Claude / Perplexity / Human / Other]
|
||
**Date:** ___
|
||
**Base URL tested:** ___
|
||
**Method:** [Automated script / Manual]
|
||
|
||
| Test | Pass / Fail / Skip | Latency | Notes |
|
||
|---|---|---|---|
|
||
| 1 — Health check | | — | |
|
||
| 2 — Create job | | — | |
|
||
| 3 — Poll before payment | | — | |
|
||
| 4 — Pay eval invoice | | — | |
|
||
| 5 — Poll after eval (state advance) | | — | |
|
||
| 6 — Pay work + get result | | ___s | |
|
||
| 7 — Demo endpoint | | ___s | |
|
||
| 8a — Missing request body | | — | |
|
||
| 8b — Unknown job ID | | — | |
|
||
| 8c — Demo missing param | | — | |
|
||
| 9 — Rate limiter | | — | |
|
||
| 10 — Rejection path | | — | |
|
||
|
||
**Overall verdict:** Pass / Partial / Fail
|
||
|
||
**Issues found:**
|
||
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)
|
||
|
||
**Observations on result quality:**
|
||
(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?)
|
||
|
||
**Suggestions:**
|
||
(Anything you'd add, fix, or change)
|
||
|
||
---
|
||
|
||
## Notes for reviewers
|
||
|
||
- **Stub mode:** No real Lightning node in this build. `GET /api/jobs/:id` exposes `paymentHash` inside `evalInvoice` and `workInvoice` only when stub mode is active — this lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, `paymentHash` is omitted from the API response.
|
||
- **Dev-only route:** `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`.
|
||
- **State machine:** All transitions happen server-side on GET poll. There is no webhook or push.
|
||
- **AI models:** Eval uses `claude-haiku-4-5` (fast judgment). Work uses `claude-sonnet-4-6` (full capability).
|
||
- **Pricing:** Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length.
|
||
- **Rate limiter:** In-memory, resets on server restart, per-IP, 5 req/hr on `/api/demo`.
|