Files

alexpaynex 53bc93a9b4 Add automated testing script and expose payment hashes

Integrates a new bash script for automated end-to-end testing of the Timmy API. Updates API routes to expose payment hashes in stub mode for easier invoice payment simulation during testing. Modifies test plan documentation to include the new automated script.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 6f2776b0-a913-41d3-a988-759a82feb6f3
Replit-Helium-Checkpoint-Created: true

2026-03-18 17:30:13 +00:00

6.6 KiB

Raw Blame History

Timmy API — Test Plan & Report Prompt

What is Timmy?
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.

Base URL: https://<your-timmy-url>.replit.app

Option A — Automated bash script (recommended)

Save timmy_test.sh from the repo root, then run:

BASE="https://<your-timmy-url>.replit.app" ./timmy_test.sh

The script runs all 10 tests sequentially, captures latency for Tests 6 and 7, auto-extracts payment hashes via the GET /api/jobs/:id response, and prints a PASS/FAIL/SKIP summary. A clean run reports PASS=12 FAIL=0 SKIP=0 (3 sub-cases in Test 8 count separately).

Option B — Manual test suite

Test 1 — Health check

curl -s "$BASE/api/healthz"

Expected: {"status":"ok"}
Pass criteria: HTTP 200, status field present.

Test 2 — Create a job

curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Explain the Lightning Network in two sentences"}'

Expected:

{
  "jobId": "<uuid>",
  "evalInvoice": {
    "paymentRequest": "lnbcrt10u1stub_...",
    "amountSats": 10
  }
}

Pass criteria: HTTP 201, jobId present, evalInvoice.amountSats = 10.

Test 3 — Poll job before payment

curl -s "$BASE/api/jobs/<jobId>"

Expected:

{
  "jobId": "...",
  "state": "awaiting_eval_payment",
  "evalInvoice": {
    "paymentRequest": "...",
    "amountSats": 10,
    "paymentHash": "<64-char-hex>"
  }
}

Pass criteria: State is awaiting_eval_payment. In stub mode, evalInvoice.paymentHash is included — use this value directly in Test 4. (In production with a real Lightning node, paymentHash is omitted.)

Test 4 — Pay the eval invoice (stub mode)

curl -s -X POST "$BASE/api/dev/stub/pay/<paymentHash-from-test-3>"

Expected: {"ok":true,"paymentHash":"..."}
Pass criteria: HTTP 200.

/api/dev/stub/pay is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice.

Test 5 — Poll after eval payment

curl -s "$BASE/api/jobs/<jobId>"

Expected — if accepted:

{
  "state": "awaiting_work_payment",
  "workInvoice": {
    "paymentRequest": "...",
    "amountSats": 50,
    "paymentHash": "<64-char-hex>"
  }
}

Work fee: 50 sats (short request ≤100 chars), 100 sats (medium ≤300), 250 sats (long).

Expected — if rejected:

{ "state": "rejected", "reason": "..." }

Pass criteria: State has advanced from awaiting_eval_payment.

Test 6 — Pay work invoice and get result

# Pay work invoice
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash-from-test-5>"

# Poll for result (AI takes 2–5 seconds)
curl -s "$BASE/api/jobs/<jobId>"

Expected:

{
  "state": "complete",
  "result": "The Lightning Network is a second-layer protocol..."
}

Pass criteria: State is complete, result is a meaningful AI-generated answer.
Record latency from work payment to complete.

Test 7 — Free demo endpoint

curl -s "$BASE/api/demo?request=What+is+a+satoshi"

Expected: {"result":"A satoshi is the smallest unit of Bitcoin..."}
Pass criteria: HTTP 200, result is coherent.
Record latency for this call.

Test 8 — Input validation

curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
curl -s "$BASE/api/jobs/does-not-exist"
curl -s "$BASE/api/demo"

Expected: HTTP 400 / 404 with {"error":"..."} bodies.

Test 9 — Demo rate limiter

for i in $(seq 1 6); do
  curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
    "$BASE/api/demo?request=ping+$i"
done

Pass criteria: At least one 429 received. The limiter allows 5 requests/hour/IP — prior runs from the same IP may have consumed quota, so 429 can appear before request 6.

Test 10 — Rejection path (adversarial request)

# Create job
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo $RESULT | jq -r '.jobId')

# Get paymentHash from poll
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')

# Pay and wait
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"

Pass criteria: Final state is rejected with a non-empty reason.

Report template

After running the tests, fill in and return the following:

Tester: [Claude / Perplexity / Human / Other]
Date: ___
Base URL tested: ___
Method: [Automated script / Manual]

Test	Pass / Fail / Skip	Latency	Notes
1 — Health check		—
2 — Create job		—
3 — Poll before payment		—
4 — Pay eval invoice		—
5 — Poll after eval (state advance)		—
6 — Pay work + get result		___s
7 — Demo endpoint		___s
8a — Missing request body		—
8b — Unknown job ID		—
8c — Demo missing param		—
9 — Rate limiter		—
10 — Rejection path		—

Overall verdict: Pass / Partial / Fail

Issues found:
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)

Observations on result quality:
(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?)

Suggestions:
(Anything you'd add, fix, or change)

Notes for reviewers

Stub mode: No real Lightning node in this build. GET /api/jobs/:id exposes paymentHash inside evalInvoice and workInvoice only when stub mode is active — this lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, paymentHash is omitted from the API response.
Dev-only route: POST /api/dev/stub/pay/:hash is only mounted when NODE_ENV !== 'production'.
State machine: All transitions happen server-side on GET poll. There is no webhook or push.
AI models: Eval uses claude-haiku-4-5 (fast judgment). Work uses claude-sonnet-4-6 (full capability).
Pricing: Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length.
Rate limiter: In-memory, resets on server restart, per-IP, 5 req/hr on /api/demo.

6.6 KiB Raw Blame History Unescape Escape

Timmy API — Test Plan & Report Prompt

Option A — Automated bash script (recommended)

Option B — Manual test suite

Test 1 — Health check

Test 2 — Create a job

Test 3 — Poll job before payment

Test 4 — Pay the eval invoice (stub mode)

Test 5 — Poll after eval payment

Test 6 — Pay work invoice and get result

Test 7 — Free demo endpoint

Test 8 — Input validation

Test 9 — Demo rate limiter

Test 10 — Rejection path (adversarial request)

Report template

Notes for reviewers

6.6 KiB

Raw Blame History