Integrates a new bash script for automated end-to-end testing of the Timmy API. Updates API routes to expose payment hashes in stub mode for easier invoice payment simulation during testing. Modifies test plan documentation to include the new automated script.
Timmy API — Test Plan & Report Prompt
What is Timmy?
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.
Base URL: https://<your-timmy-url>.replit.app
Option A — Automated bash script (recommended)
Save timmy_test.sh from the repo root, then run:
BASE="https://<your-timmy-url>.replit.app" ./timmy_test.sh
The script runs all 10 tests sequentially, captures latency for Tests 6 and 7, auto-extracts payment hashes via the GET /api/jobs/:id response, and prints a PASS/FAIL/SKIP summary. A clean run reports PASS=12 FAIL=0 SKIP=0 (3 sub-cases in Test 8 count separately).
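The PASS/FAIL/SKIP summary can be produced with a small tally pattern. The following is a minimal sketch of that pattern only; the function names record and summary are hypothetical, and the real timmy_test.sh may be organized differently.

```bash
# Hypothetical tally helpers; not copied from timmy_test.sh.
PASS=0
FAIL=0
SKIP=0

# record <PASS|FAIL|SKIP> <label>: bump the matching counter and log one line.
record() {
  local outcome="$1" label="$2"
  case "$outcome" in
    PASS) PASS=$((PASS + 1)) ;;
    FAIL) FAIL=$((FAIL + 1)) ;;
    SKIP) SKIP=$((SKIP + 1)) ;;
    *)    echo "unknown outcome: $outcome" >&2; return 2 ;;
  esac
  printf '[%s] %s\n' "$outcome" "$label"
}

# summary: print the totals in the PASS=.. FAIL=.. SKIP=.. form.
summary() {
  printf 'PASS=%d FAIL=%d SKIP=%d\n' "$PASS" "$FAIL" "$SKIP"
}
```

Calling record once per test (and once per Test 8 sub-case) and summary at the end would yield a line in the same shape as the clean-run output quoted above.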
Option B — Manual test suite
Test 1 — Health check
curl -s "$BASE/api/healthz"
Expected: {"status":"ok"}
Pass criteria: HTTP 200, status field present.
Test 2 — Create a job
curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Explain the Lightning Network in two sentences"}'
Expected:
{
"jobId": "<uuid>",
"evalInvoice": {
"paymentRequest": "lnbcrt10u1stub_...",
"amountSats": 10
}
}
Pass criteria: HTTP 201, jobId present, evalInvoice.amountSats = 10.
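Plain curl -s prints only the body, so the HTTP 201 criterion needs the status code captured separately. One way to do that, sketched here with hypothetical helper names (split_response, post_json), is to have curl append the status code on its own line via --write-out and then split the two apart.

```bash
# Hypothetical helpers, not part of the Timmy repo.

# split_response <combined>: split output produced by
#   curl -s -w '\n%{http_code}' ...
# into HTTP_BODY (everything but the last line) and HTTP_CODE (the last line).
split_response() {
  HTTP_CODE=$(printf '%s\n' "$1" | tail -n 1)
  HTTP_BODY=$(printf '%s\n' "$1" | sed '$d')
}

# post_json <path> <json>: POST a JSON payload to $BASE and split the response.
post_json() {
  local resp
  resp=$(curl -s -w '\n%{http_code}' -X POST "$BASE$1" \
    -H "Content-Type: application/json" -d "$2")
  split_response "$resp"
}

# Example (requires a reachable $BASE):
#   post_json /api/jobs '{"request": "Explain the Lightning Network in two sentences"}'
#   [ "$HTTP_CODE" = "201" ] && echo "status OK"
```

The same split works for any test in this plan that checks a status code as well as a JSON body.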
Test 3 — Poll job before payment
curl -s "$BASE/api/jobs/<jobId>"
Expected:
{
"jobId": "...",
"state": "awaiting_eval_payment",
"evalInvoice": {
"paymentRequest": "...",
"amountSats": 10,
"paymentHash": "<64-char-hex>"
}
}
Pass criteria: State is awaiting_eval_payment. In stub mode, evalInvoice.paymentHash is included — use this value directly in Test 4. (In production with a real Lightning node, paymentHash is omitted.)
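Before passing the hash to the next step, it can help to confirm that it really is a 64-character hex string, since jq -r prints the literal text null when the field is absent (as it would be outside stub mode). A minimal sketch, with is_payment_hash as a hypothetical helper name:

```bash
# is_payment_hash <value>: succeed only for exactly 64 hexadecimal characters.
is_payment_hash() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;  # any non-hex character disqualifies the value
  esac
  return 0
}

# Example (assumes HASH was extracted from the poll response):
#   is_payment_hash "$HASH" || echo "no usable paymentHash; is stub mode active?"
```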
Test 4 — Pay the eval invoice (stub mode)
curl -s -X POST "$BASE/api/dev/stub/pay/<paymentHash-from-test-3>"
Expected: {"ok":true,"paymentHash":"..."}
Pass criteria: HTTP 200.
/api/dev/stub/pay is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice.
Test 5 — Poll after eval payment
curl -s "$BASE/api/jobs/<jobId>"
Expected — if accepted:
{
"state": "awaiting_work_payment",
"workInvoice": {
"paymentRequest": "...",
"amountSats": 50,
"paymentHash": "<64-char-hex>"
}
}
Work fee: 50 sats (short request ≤100 chars), 100 sats (medium ≤300), 250 sats (long).
Expected — if rejected:
{ "state": "rejected", "reason": "..." }
Pass criteria: State has advanced from awaiting_eval_payment.
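Because state only advances when the job is polled, a single GET right after payment may still show the old state. The sketch below retries until the state differs from the starting one. The names wait_for_state_change and fetch_job_state are hypothetical, and the retry count and delay are arbitrary defaults, not values taken from the Timmy codebase.

```bash
# wait_for_state_change <fetch_cmd> <from_state> [max_tries] [delay_s]
# Calls fetch_cmd (which must print the current state) until the state is
# non-empty and differs from from_state. Prints the new state, or returns 1
# after max_tries polls.
wait_for_state_change() {
  local fetch="$1" from="$2" tries="${3:-10}" delay="${4:-1}"
  local i=0 state
  while [ "$i" -lt "$tries" ]; do
    state=$("$fetch")
    if [ -n "$state" ] && [ "$state" != "$from" ]; then
      printf '%s\n' "$state"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example fetcher; assumes BASE and JOB_ID are set and jq is installed.
# '.state // empty' prints nothing when the field is missing or null.
fetch_job_state() {
  curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.state // empty'
}

# Example:
#   new_state=$(wait_for_state_change fetch_job_state awaiting_eval_payment) \
#     && echo "advanced to $new_state"
```

Passing the fetcher as an argument keeps the loop independent of curl and jq, so it can be exercised offline with a stand-in function.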
Test 6 — Pay work invoice and get result
# Pay work invoice
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash-from-test-5>"
# Poll for result (AI takes 2–5 seconds)
curl -s "$BASE/api/jobs/<jobId>"
Expected:
{
"state": "complete",
"result": "The Lightning Network is a second-layer protocol..."
}
Pass criteria: State is complete, result is a meaningful AI-generated answer.
Record latency from work payment to complete.
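Latency can be recorded by wrapping the step being measured in a small timer. This is a minimal sketch with a hypothetical helper named timed; it uses date +%s, so the resolution is whole seconds, which is coarse but adequate for a step described as taking a few seconds.

```bash
# timed <command...>: run the command, store whole seconds elapsed in
# ELAPSED_S, and return the command's own exit status.
timed() {
  local start end rc
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  ELAPSED_S=$((end - start))
  return "$rc"
}

# Example (pay_and_wait is a placeholder for whatever function pays the work
# invoice and polls until the job is complete):
#   timed pay_and_wait && echo "latency: ${ELAPSED_S}s"
```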
Test 7 — Free demo endpoint
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
Expected: {"result":"A satoshi is the smallest unit of Bitcoin..."}
Pass criteria: HTTP 200, result is coherent.
Record latency for this call.
Test 8 — Input validation
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
curl -s "$BASE/api/jobs/does-not-exist"
curl -s "$BASE/api/demo"
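Expected: each call returns HTTP 400 or 404 with an {"error":"..."} body. The three calls correspond, in order, to sub-cases 8a (missing request body), 8b (unknown job ID), and 8c (demo missing param) in the report template.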
Test 9 — Demo rate limiter
for i in $(seq 1 6); do
curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
"$BASE/api/demo?request=ping+$i"
done
Pass criteria: At least one 429 received. The limiter allows 5 requests/hour/IP — prior runs from the same IP may have consumed quota, so 429 can appear before request 6.
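The pass criterion can be checked mechanically by collecting the status codes and counting the 429s. A minimal sketch, with count_429 and rate_limit_verdict as hypothetical helper names:

```bash
# count_429: read one HTTP status code per line on stdin, print how many are 429.
count_429() {
  grep -c '^429$' || true  # grep exits 1 on zero matches but still prints 0
}

# rate_limit_verdict: read status codes on stdin, print PASS if any is 429.
rate_limit_verdict() {
  local hits
  hits=$(count_429)
  if [ "$hits" -ge 1 ]; then echo PASS; else echo FAIL; fi
}

# Example (requires a reachable $BASE):
#   for i in $(seq 1 6); do
#     curl -s -o /dev/null -w '%{http_code}\n' "$BASE/api/demo?request=ping+$i"
#   done | rate_limit_verdict
```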
Test 10 — Rejection path (adversarial request)
# Create job
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo "$RESULT" | jq -r '.jobId')
# Get paymentHash from poll
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
# Pay and wait
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"
Pass criteria: Final state is rejected with a non-empty reason.
Report template
After running the tests, fill in and return the following:
Tester: [Claude / Perplexity / Human / Other]
Date: ___
Base URL tested: ___
Method: [Automated script / Manual]
| Test | Pass / Fail / Skip | Latency | Notes |
|---|---|---|---|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Poll after eval (state advance) | | — | |
| 6 — Pay work + get result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing request body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | — | |
Overall verdict: Pass / Partial / Fail
Issues found:
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)
Observations on result quality:
(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?)
Suggestions:
(Anything you'd add, fix, or change)
Notes for reviewers
- Stub mode: No real Lightning node in this build. GET /api/jobs/:id exposes paymentHash inside evalInvoice and workInvoice only when stub mode is active. This lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, paymentHash is omitted from the API response.
- Dev-only route: POST /api/dev/stub/pay/:hash is only mounted when NODE_ENV !== 'production'.
- State machine: All transitions happen server-side on GET poll. There is no webhook or push.
- AI models: Eval uses claude-haiku-4-5 (fast judgment). Work uses claude-sonnet-4-6 (full capability).
- Pricing: Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length.
- Rate limiter: In-memory, resets on server restart, per-IP, 5 req/hr on /api/demo.
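To predict which work fee a given request should be quoted, the pricing tiers can be sketched as a small function. This assumes the thresholds stated in Test 5 (up to 100 characters, up to 300 characters, longer) and that length is measured in characters as the shell counts them; how the server actually measures length is not specified here, and work_fee_sats is a hypothetical name.

```bash
# work_fee_sats <request>: expected work fee in sats for a request string,
# using the tiers from this test plan (assumed: <=100 chars, <=300 chars, longer).
work_fee_sats() {
  local len="${#1}"
  if [ "$len" -le 100 ]; then
    echo 50
  elif [ "$len" -le 300 ]; then
    echo 100
  else
    echo 250
  fi
}

# Example:
#   work_fee_sats "Explain the Lightning Network in two sentences"
```

Comparing this value with workInvoice.amountSats from the poll response gives a quick consistency check on the quoted price.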