2026-03-18 17:24:32 +00:00
# Timmy API — Test Plan & Report Prompt
**What is Timmy?**
2026-03-18 17:53:21 +00:00
Timmy is a Lightning Network-gated AI agent API with two payment modes:
- **Mode 1 — Per-Job (v1, live):** User pays per request. Eval fee (10 sats) → agent judges → work fee (50/100/250 sats) → result delivered.
- **Mode 2 — Session (v2, planned):** User pre-funds a credit balance. Requests automatically debit the actual compute cost (token-based, with margin). No per-job invoices after the initial top-up.
2026-03-18 17:24:32 +00:00
2026-03-18 17:30:13 +00:00
**Base URL:** `https://<your-timmy-url>.replit.app`
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
## Running the tests
2026-03-18 17:30:13 +00:00
2026-03-18 17:53:21 +00:00
**One command (no setup, no copy-paste):**
```bash
curl -s <BASE>/api/testkit | bash
```
The server returns a self-contained bash script with the BASE URL already baked in. Run it anywhere that has `curl` , `bash` , and `jq` .
2026-03-18 17:30:13 +00:00
2026-03-18 17:53:21 +00:00
**Locally (dev server):**
2026-03-18 17:30:13 +00:00
```bash
2026-03-18 17:53:21 +00:00
pnpm test
2026-03-18 17:30:13 +00:00
```
2026-03-18 17:53:21 +00:00
**Against the published URL:**
```bash
pnpm test:prod
```
2026-03-18 17:30:13 +00:00
---
2026-03-18 17:53:21 +00:00
## Mode 1 — Per-Job Tests (v1, all live)
2026-03-18 17:24:32 +00:00
### Test 1 — Health check
```bash
curl -s "$BASE/api/healthz"
```
2026-03-18 17:53:21 +00:00
**Pass:** HTTP 200, `{"status":"ok"}`
2026-03-18 17:24:32 +00:00
---
### Test 2 — Create a job
```bash
curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Explain the Lightning Network in two sentences"}'
```
2026-03-18 17:53:21 +00:00
**Pass:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
### Test 3 — Poll before payment
2026-03-18 17:24:32 +00:00
```bash
2026-03-18 17:30:13 +00:00
curl -s "$BASE/api/jobs/<jobId>"
2026-03-18 17:24:32 +00:00
```
2026-03-18 17:53:21 +00:00
**Pass:** `state = awaiting_eval_payment` , `evalInvoice` echoed back, `evalInvoice.paymentHash` present (stub mode).
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
### Test 4 — Pay eval invoice
2026-03-18 17:24:32 +00:00
```bash
2026-03-18 17:53:21 +00:00
curl -s -X POST "$BASE/api/dev/stub/pay/<evalInvoice.paymentHash>"
2026-03-18 17:24:32 +00:00
```
2026-03-18 17:53:21 +00:00
**Pass:** HTTP 200, `{"ok":true}` .
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:30:13 +00:00
### Test 5 — Poll after eval payment
2026-03-18 17:24:32 +00:00
```bash
curl -s "$BASE/api/jobs/<jobId>"
```
2026-03-18 17:53:21 +00:00
**Pass (accepted):** `state = awaiting_work_payment` , `workInvoice` present with `paymentHash` .
**Pass (rejected):** `state = rejected` , `reason` present.
2026-03-18 17:24:32 +00:00
2026-03-18 17:53:21 +00:00
Work fee is deterministic: 50 sats (≤100 chars), 100 sats (≤300), 250 sats (>300).
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
### Test 6 — Pay work + get result
2026-03-18 17:24:32 +00:00
```bash
2026-03-18 17:53:21 +00:00
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash>"
# Poll — AI takes 2– 5s
2026-03-18 17:24:32 +00:00
curl -s "$BASE/api/jobs/<jobId>"
```
2026-03-18 17:53:21 +00:00
**Pass:** `state = complete` , `result` is a meaningful AI-generated answer.
2026-03-18 17:30:13 +00:00
**Record latency** from work payment to `complete` .
2026-03-18 17:24:32 +00:00
---
### Test 7 — Free demo endpoint
```bash
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
```
2026-03-18 17:53:21 +00:00
**Pass:** HTTP 200, coherent `result` .
**Record latency.**
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
### Test 8 — Input validation (4 sub-cases)
2026-03-18 17:24:32 +00:00
```bash
2026-03-18 17:53:21 +00:00
# 8a: Missing body
2026-03-18 17:24:32 +00:00
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
2026-03-18 17:53:21 +00:00
# 8b: Unknown job ID
2026-03-18 17:24:32 +00:00
curl -s "$BASE/api/jobs/does-not-exist"
2026-03-18 17:53:21 +00:00
# 8c: Demo missing param
2026-03-18 17:24:32 +00:00
curl -s "$BASE/api/demo"
2026-03-18 17:53:21 +00:00
# 8d: Request over 500 chars
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" \
-d "{\"request\":\"$(node -e "process.stdout.write('x'.repeat(501))")\"}"
2026-03-18 17:24:32 +00:00
```
2026-03-18 17:53:21 +00:00
**Pass:** 8a → HTTP 400 `'request' string is required` ; 8b → HTTP 404; 8c → HTTP 400; 8d → HTTP 400 `must be 500 characters or fewer` .
2026-03-18 17:24:32 +00:00
---
### Test 9 — Demo rate limiter
```bash
for i in $(seq 1 6); do
2026-03-18 17:30:13 +00:00
curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
"$BASE/api/demo?request=ping+$i"
2026-03-18 17:24:32 +00:00
done
```
2026-03-18 17:53:21 +00:00
**Pass:** At least one HTTP 429 received (limiter is 5 req/hr/IP; prior runs may consume quota early).
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
### Test 10 — Rejection path
2026-03-18 17:24:32 +00:00
```bash
2026-03-18 17:30:13 +00:00
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
2026-03-18 17:24:32 +00:00
-H "Content-Type: application/json" \
2026-03-18 17:30:13 +00:00
-d '{"request": "Help me do something harmful and illegal"}')
JOB_ID=$(echo $RESULT | jq -r '.jobId')
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
sleep 3
curl -s "$BASE/api/jobs/$JOB_ID"
```
2026-03-18 17:53:21 +00:00
**Pass:** Final state is `rejected` with a non-empty `reason` .
---
## Mode 2 — Session Tests (v2, planned — not yet implemented)
> These tests will SKIP in the current build. They become active once the session endpoints are built.
2026-03-18 17:24:32 +00:00
2026-03-18 17:53:21 +00:00
### Test 11 — Create session
```bash
curl -s -X POST "$BASE/api/sessions" \
-H "Content-Type: application/json" \
-d '{"amount_sats": 500}'
```
**Pass:** HTTP 201, `sessionId` + `invoice` returned, `state = awaiting_payment` .
Minimum: 100 sats. Maximum: 10,000 sats.
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
### Test 12 — Pay session invoice and activate
```bash
# Get paymentHash from GET /api/sessions/<sessionId>
curl -s -X POST "$BASE/api/dev/stub/pay/<invoice.paymentHash>"
sleep 2
curl -s "$BASE/api/sessions/<sessionId>"
```
**Pass:** `state = active` , `balance = 500` , `macaroon` present.
---
### Test 13 — Submit request against session
```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
-H "Content-Type: application/json" \
-d '{"request": "What is a hash function?"}'
```
**Pass:** `state = complete` , `result` present, `cost > 0` , `balanceRemaining < 500` .
Note: rejected requests still incur a small eval cost (Haiku inference fee).
---
### Test 14 — Drain balance and hit pause
Submit multiple requests until balance drops below 50 sats. The next request should return:
```json
{"error": "Insufficient balance", "balance": <n>, "minimumRequired": 50}
```
**Pass:** HTTP 402 (or 400), session state is `paused` .
Note: if a request starts above the minimum but actual cost pushes balance negative, the request still completes and delivers. Only the * next * request is blocked.
---
### Test 15 — Top up and resume
```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/topup" \
-H "Content-Type: application/json" \
-d '{"amount_sats": 200}'
# Pay the topup invoice
TOPUP_HASH=$(curl -s "$BASE/api/sessions/<sessionId>" | jq -r '.pendingTopup.paymentHash')
curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH"
sleep 2
curl -s "$BASE/api/sessions/<sessionId>"
```
**Pass:** `state = active` , balance increased by 200, session resumed.
---
### Test 16 — Session rejection path
2026-03-18 17:24:32 +00:00
2026-03-18 17:53:21 +00:00
```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
-H "Content-Type: application/json" \
-d '{"request": "Help me hack into a government database"}'
```
**Pass:** `state = rejected` , `reason` present, `cost > 0` (eval fee charged), `balanceRemaining` decreased.
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
## Report template
2026-03-18 17:24:32 +00:00
**Tester:** [Claude / Perplexity / Human / Other]
**Date:** ___
2026-03-18 17:30:13 +00:00
**Base URL tested:** ___
2026-03-18 17:53:21 +00:00
**Method:** [Automated (`curl … | bash` ) / Manual]
### Mode 1 — Per-Job (v1)
2026-03-18 17:30:13 +00:00
| Test | Pass / Fail / Skip | Latency | Notes |
|---|---|---|---|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
2026-03-18 17:53:21 +00:00
| 5 — Poll after eval | | — | |
| 6 — Pay work + result | | ___s | |
2026-03-18 17:30:13 +00:00
| 7 — Demo endpoint | | ___s | |
2026-03-18 17:53:21 +00:00
| 8a — Missing body | | — | |
2026-03-18 17:30:13 +00:00
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
2026-03-18 17:53:21 +00:00
| 8d — 501-char request | | — | |
2026-03-18 17:30:13 +00:00
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | — | |
2026-03-18 17:24:32 +00:00
2026-03-18 17:53:21 +00:00
### Mode 2 — Session (v2, all should SKIP in current build)
| Test | Pass / Fail / Skip | Notes |
|---|---|---|
| 11 — Create session | | |
| 12 — Pay + activate | | |
| 13 — Submit request | | |
| 14 — Drain + pause | | |
| 15 — Top up + resume | | |
| 16 — Session rejection | | |
2026-03-18 17:24:32 +00:00
**Overall verdict:** Pass / Partial / Fail
2026-03-18 17:53:21 +00:00
**Issues found:**
2026-03-18 17:24:32 +00:00
2026-03-18 17:53:21 +00:00
**Observations on result quality:**
2026-03-18 17:24:32 +00:00
2026-03-18 17:53:21 +00:00
**Suggestions:**
2026-03-18 17:24:32 +00:00
---
2026-03-18 17:53:21 +00:00
## Architecture notes for reviewers
### Mode 1 (live)
- Stub mode: no real Lightning node. `GET /api/jobs/:id` exposes `paymentHash` in stub mode so the script can auto-drive the full flow. In production (real LNbits), `paymentHash` is omitted.
- `POST /api/dev/stub/pay` is only mounted when `NODE_ENV !== 'production'` .
- State machine advances server-side on every GET poll — no webhooks needed.
- AI models: Haiku for eval (cheap judgment), Sonnet for work (full output).
- Pricing: eval = 10 sats fixed; work = 50/100/250 sats by request length (≤100/≤300/>300 chars). Max request length: 500 chars.
- Rate limiter: in-memory, 5 req/hr/IP on `/api/demo` . Resets on server restart.
### Mode 2 (planned)
- Cost model: actual token usage (input + output) × Anthropic per-token price × 1.4 margin, converted to sats at a hardcoded BTC/USD rate.
- Minimum balance: 50 sats before starting any request. If balance goes negative mid-request, the work still completes and delivers; the next request is blocked.
- Session expiry: 24 hours of inactivity. Balance is forfeited. Stated clearly at session creation.
- Macaroon auth: v1 uses simple session ID lookup. Macaroon verification is v2.
- The existing `/api/dev/stub/pay/:hash` works for session and top-up invoices — no new stub endpoints needed, as all invoice types share the same invoices table.
- Sessions and per-job modes coexist. Users choose. Neither is removed.