timmy-tower/TIMMY_TEST_PLAN.md
alexpaynex 001873c688 Update test plan and script for dual-mode payment system
Refactor TIMMY_TEST_PLAN.md and timmy_test.sh to support dual-mode payments (per-job and session-based). Add new tests for session endpoints and gracefully handle rate limiting in existing tests.

2026-03-18 17:53:21 +00:00


Timmy API — Test Plan & Report Prompt

What is Timmy?
Timmy is a Lightning Network-gated AI agent API with two payment modes:

  • Mode 1 — Per-Job (v1, live): User pays per request. Eval fee (10 sats) → agent judges → work fee (50/100/250 sats) → result delivered.
  • Mode 2 — Session (v2, planned): User pre-funds a credit balance. Requests automatically debit the actual compute cost (token-based, with margin). No per-job invoices after the initial top-up.

Base URL: https://<your-timmy-url>.replit.app


Running the tests

One command (no setup, no copy-paste):

curl -s <BASE>/api/testkit | bash

The server returns a self-contained bash script with the BASE URL already baked in. Run it anywhere that has curl, bash, and jq.

Locally (dev server):

pnpm test

Against the published URL:

pnpm test:prod

Mode 1 — Per-Job Tests (v1, all live)

Test 1 — Health check

curl -s "$BASE/api/healthz"

Pass: HTTP 200, {"status":"ok"}


Test 2 — Create a job

curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Explain the Lightning Network in two sentences"}'

Pass: HTTP 201, jobId present, evalInvoice.amountSats = 10.


Test 3 — Poll before payment

curl -s "$BASE/api/jobs/<jobId>"

Pass: state = awaiting_eval_payment, evalInvoice echoed back, evalInvoice.paymentHash present (stub mode).


Test 4 — Pay eval invoice

curl -s -X POST "$BASE/api/dev/stub/pay/<evalInvoice.paymentHash>"

Pass: HTTP 200, {"ok":true}.


Test 5 — Poll after eval payment

curl -s "$BASE/api/jobs/<jobId>"

Pass (accepted): state = awaiting_work_payment, workInvoice present with paymentHash.
Pass (rejected): state = rejected, reason present.

Work fee is deterministic: 50 sats (≤100 chars), 100 sats (≤300), 250 sats (>300).
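
The tiers above can be sketched as a small bash function for predicting the expected workInvoice amount before polling. This is only an illustration: `work_fee_sats` is a hypothetical helper name, and it assumes the server counts characters the same way bash's `${#var}` does.

```bash
#!/usr/bin/env bash
# Sketch of the documented fee tiers (not the server's actual code).
# work_fee_sats is a hypothetical helper name.
work_fee_sats() {
  local request=$1
  local len=${#request}   # character count in the current locale (assumption)
  if (( len > 500 )); then
    # The plan caps requests at 500 characters (see Test 8d).
    echo "request exceeds 500 characters" >&2
    return 1
  elif (( len <= 100 )); then
    echo 50
  elif (( len <= 300 )); then
    echo 100
  else
    echo 250
  fi
}
```

For example, `work_fee_sats "Explain the Lightning Network in two sentences"` prints 50, because the request is 46 characters long.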


Test 6 — Pay work + get result

curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash>"
# Poll — AI takes 25s
curl -s "$BASE/api/jobs/<jobId>"

Pass: state = complete, result is a meaningful AI-generated answer.
Record latency from work payment to complete.
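
Polling by hand makes the latency hard to record, so a bounded poll loop may help. The sketch below is one possible shape, not part of the shipped test script: `wait_for_state` and `job_state` are hypothetical helper names, and `job_state` assumes `BASE` is set and that the job JSON carries a top-level `state` field, as the pass criteria above suggest.

```bash
#!/usr/bin/env bash
# wait_for_state <wanted> <max_attempts> <delay_secs> <command...>
# Runs <command...> repeatedly until it prints <wanted> or attempts run out.
# Hypothetical helper, not part of the Timmy test script.
wait_for_state() {
  local wanted=$1 max_attempts=$2 delay=$3
  shift 3
  local attempt state=""
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    state=$("$@") || state=""      # treat a failed fetch as "no state yet"
    if [[ $state == "$wanted" ]]; then
      echo "$state"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out; last state: ${state:-unknown}" >&2
  return 1
}

# Prints the current state of a job. Assumes BASE is set and jq is installed.
job_state() {
  curl -s "$BASE/api/jobs/$1" | jq -r '.state'
}
```

A possible use after paying the work invoice, with bash's built-in `SECONDS` counter giving the latency:

```bash
start=$SECONDS
wait_for_state complete 60 1 job_state "$JOB_ID" && echo "latency: $(( SECONDS - start ))s"
```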


Test 7 — Free demo endpoint

curl -s "$BASE/api/demo?request=What+is+a+satoshi"

Pass: HTTP 200, coherent result.
Record latency.


Test 8 — Input validation (4 sub-cases)

# 8a: Missing body
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'

# 8b: Unknown job ID
curl -s "$BASE/api/jobs/does-not-exist"

# 8c: Demo missing param
curl -s "$BASE/api/demo"

# 8d: Request over 500 chars
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" \
  -d "{\"request\":\"$(printf 'x%.0s' $(seq 1 501))\"}"

Pass: 8a → HTTP 400 with message "'request' string is required"; 8b → HTTP 404; 8c → HTTP 400; 8d → HTTP 400 with message "must be 500 characters or fewer".


Test 9 — Demo rate limiter

for i in $(seq 1 6); do
  curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
    "$BASE/api/demo?request=ping+$i"
done

Pass: At least one HTTP 429 received (limiter is 5 req/hr/IP; prior runs may consume quota early).
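
The pass condition can be checked mechanically rather than by reading the output. The following is a minimal sketch under stated assumptions: `saw_rate_limit` and `demo_status_codes` are hypothetical helper names, and `BASE` is assumed to be set as elsewhere in this plan.

```bash
#!/usr/bin/env bash
# Succeeds if any line on stdin is exactly "429".
# Hypothetical helper, not part of the Timmy test script.
saw_rate_limit() {
  local code found=1
  while read -r code; do
    [[ $code == 429 ]] && found=0
  done
  return "$found"
}

# Fires six demo requests and prints one HTTP status code per line.
# Assumes BASE is set.
demo_status_codes() {
  local i
  for i in $(seq 1 6); do
    curl -s -o /dev/null -w '%{http_code}\n' "$BASE/api/demo?request=ping+$i"
  done
}
```

Running `demo_status_codes | saw_rate_limit && echo PASS || echo FAIL` then reports the result. The reader consumes every line before returning, so all six requests are sent even when the 429 arrives early.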


Test 10 — Rejection path

RESULT=$(curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo "$RESULT" | jq -r '.jobId')
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"

Pass: Final state is rejected with a non-empty reason.


Mode 2 — Session Tests (v2, planned — not yet implemented)

These tests will SKIP in the current build. They become active once the session endpoints are built.

Test 11 — Create session

curl -s -X POST "$BASE/api/sessions" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 500}'

Pass: HTTP 201, sessionId + invoice returned, state = awaiting_payment.
Minimum: 100 sats. Maximum: 10,000 sats.


Test 12 — Pay session invoice and activate

# Get paymentHash from GET /api/sessions/<sessionId>
curl -s -X POST "$BASE/api/dev/stub/pay/<invoice.paymentHash>"
sleep 2
curl -s "$BASE/api/sessions/<sessionId>"

Pass: state = active, balance = 500, macaroon present.


Test 13 — Submit request against session

curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
  -H "Content-Type: application/json" \
  -d '{"request": "What is a hash function?"}'

Pass: state = complete, result present, cost > 0, balanceRemaining < 500.
Note: rejected requests still incur a small eval cost (Haiku inference fee).


Test 14 — Drain balance and hit pause

Submit multiple requests until balance drops below 50 sats. The next request should return:

{"error": "Insufficient balance", "balance": <n>, "minimumRequired": 50}

Pass: HTTP 402 (or 400), session state is paused.
Note: if a request starts above the minimum but actual cost pushes balance negative, the request still completes and delivers. Only the next request is blocked.
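
The admission rule in the note above is easy to misread, so here is a small local model of it. This is a sketch of the planned behaviour as described in this plan, not server code: `admit_request` is a hypothetical name, and the costs passed in are made-up numbers.

```bash
#!/usr/bin/env bash
MIN_BALANCE_SATS=50   # minimum balance from the plan

# admit_request <balance> <actual_cost>
# Admitted: prints the balance after the debit and returns 0.
# Blocked:  prints the unchanged balance and returns 1.
# Hypothetical local model of the planned rule, not the server implementation.
admit_request() {
  local balance=$1 cost=$2
  if (( balance < MIN_BALANCE_SATS )); then
    echo "$balance"
    return 1
  fi
  # The check happens before the work runs, so the debit may go negative.
  echo $(( balance - cost ))
}
```

With these rules, `admit_request 60 80` prints -20 and succeeds (the request started above the minimum, so it completes), while a following `admit_request -20 10` prints -20 and fails, which corresponds to the paused state.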


Test 15 — Top up and resume

curl -s -X POST "$BASE/api/sessions/<sessionId>/topup" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 200}'
# Pay the topup invoice
TOPUP_HASH=$(curl -s "$BASE/api/sessions/<sessionId>" | jq -r '.pendingTopup.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH"
sleep 2
curl -s "$BASE/api/sessions/<sessionId>"

Pass: state = active, balance increased by 200, session resumed.


Test 16 — Session rejection path

curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me hack into a government database"}'

Pass: state = rejected, reason present, cost > 0 (eval fee charged), balanceRemaining decreased.


Report template

Tester: [Claude / Perplexity / Human / Other]
Date: ___
Base URL tested: ___
Method: [Automated (curl … | bash) / Manual]

Mode 1 — Per-Job (v1)

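| Test | Pass / Fail / Skip | Latency | Notes |
|------|--------------------|---------|-------|
| 1 — Health check | | | |
| 2 — Create job | | | |
| 3 — Poll before payment | | | |
| 4 — Pay eval invoice | | | |
| 5 — Poll after eval | | | |
| 6 — Pay work + result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing body | | | |
| 8b — Unknown job ID | | | |
| 8c — Demo missing param | | | |
| 8d — 501-char request | | | |
| 9 — Rate limiter | | | |
| 10 — Rejection path | | | |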

Mode 2 — Session (v2, all should SKIP in current build)

| Test | Pass / Fail / Skip | Notes |
|------|--------------------|-------|
| 11 — Create session | | |
| 12 — Pay + activate | | |
| 13 — Submit request | | |
| 14 — Drain + pause | | |
| 15 — Top up + resume | | |
| 16 — Session rejection | | |

Overall verdict: Pass / Partial / Fail

Issues found:

Observations on result quality:

Suggestions:


Architecture notes for reviewers

Mode 1 (live)

  • Stub mode: no real Lightning node. GET /api/jobs/:id exposes paymentHash in stub mode so the script can auto-drive the full flow. In production (real LNbits), paymentHash is omitted.
  • POST /api/dev/stub/pay is only mounted when NODE_ENV !== 'production'.
  • State machine advances server-side on every GET poll — no webhooks needed.
  • AI models: Haiku for eval (cheap judgment), Sonnet for work (full output).
  • Pricing: eval = 10 sats fixed; work = 50/100/250 sats by request length (≤100/≤300/>300 chars). Max request length: 500 chars.
  • Rate limiter: in-memory, 5 req/hr/IP on /api/demo. Resets on server restart.
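
The job lifecycle implied by Tests 3 to 6 can be summarised as a transition table. The sketch below uses only the states named in this plan; the event labels are hypothetical, and the real server may have intermediate states that are not documented here.

```bash
#!/usr/bin/env bash
# next_state <current_state> <event>
# Prints the state that follows <current_state> when <event> occurs.
# Event names (eval_accepted, eval_rejected, work_done) are hypothetical labels;
# the states are the ones named in this test plan.
next_state() {
  case "$1:$2" in
    awaiting_eval_payment:eval_accepted) echo awaiting_work_payment ;;
    awaiting_eval_payment:eval_rejected) echo rejected ;;
    awaiting_work_payment:work_done)     echo complete ;;
    *)                                   echo "$1" ;;  # no transition: state unchanged
  esac
}
```

Because the server advances the state on each GET poll, a client only ever observes these states and never has to trigger a transition itself, other than by paying an invoice.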

Mode 2 (planned)

  • Cost model: actual token usage (input + output) × Anthropic per-token price × 1.4 margin, converted to sats at a hardcoded BTC/USD rate.
  • Minimum balance: 50 sats before starting any request. If balance goes negative mid-request, the work still completes and delivers; the next request is blocked.
  • Session expiry: sessions expire after 24 hours of inactivity, and any remaining balance is forfeited. This is stated clearly at session creation.
  • Macaroon auth: the first session release authenticates by simple session ID lookup; macaroon verification is deferred to a later release.
  • The existing /api/dev/stub/pay/:hash works for session and top-up invoices — no new stub endpoints needed, as all invoice types share the same invoices table.
  • Sessions and per-job modes coexist. Users choose. Neither is removed.
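
The Mode 2 cost model can be worked through with a short calculation. Everything numeric below except the 1.4 margin is a placeholder: the per-token prices and the BTC/USD rate are made-up values for illustration, rounding up to a whole sat is an assumption, and `session_cost_sats` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Placeholder constants; only MARGIN comes from the plan.
INPUT_USD_PER_MTOK=3      # assumed USD per million input tokens
OUTPUT_USD_PER_MTOK=15    # assumed USD per million output tokens
MARGIN=1.4                # margin stated in the plan
BTC_USD=100000            # assumed hardcoded BTC/USD rate

# session_cost_sats <input_tokens> <output_tokens>
# Prints the request cost in whole sats, rounded up (rounding is an assumption).
session_cost_sats() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_price="$INPUT_USD_PER_MTOK" -v out_price="$OUTPUT_USD_PER_MTOK" \
      -v margin="$MARGIN" -v btc_usd="$BTC_USD" 'BEGIN {
    usd   = (in_tok * in_price + out_tok * out_price) / 1e6 * margin
    sats  = usd / btc_usd * 1e8       # 1 BTC = 100,000,000 sats
    whole = int(sats)
    if (sats > whole) whole++         # round up to the next whole sat
    printf "%d\n", whole
  }'
}
```

With these placeholder values, `session_cost_sats 1000 500` prints 15: the token cost is 0.0105 USD, the margin raises it to 0.0147 USD, and that converts to 14.7 sats before rounding up.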