Compare commits
3 Commits
8db4587252
...
1138a100c2
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
1138a100c2 | ||
|
|
3f8c67deaa | ||
|
|
be9470ca11 |
@@ -1,252 +1,112 @@
|
||||
# Timmy API — Test Plan & Report Prompt
|
||||
|
||||
**What is Timmy?**
|
||||
Timmy is a Lightning Network-gated AI agent API with two payment modes:
|
||||
**What is Timmy?**
|
||||
Timmy is a Lightning Network-gated AI agent API. Users pay Bitcoin (via Lightning) to submit requests to an AI agent (Claude). Two payment modes:
|
||||
|
||||
- **Mode 1 — Per-Job (v1, live):** User pays per request. Eval fee (10 sats) → agent judges → work fee (50/100/250 sats) → result delivered.
|
||||
- **Mode 2 — Session (v2, planned):** User pre-funds a credit balance. Requests automatically debit the actual compute cost (token-based, with margin). No per-job invoices after the initial top-up.
|
||||
- **Mode 1 — Per-Job (live):** Pay per request. Eval invoice (10 sats fixed) → Haiku judges the request → work invoice (dynamic, token-based) → Sonnet executes → result delivered.
|
||||
- **Mode 2 — Session (live):** Pre-fund a credit balance. Requests automatically debit actual compute cost (eval + work tokens × 1.4 margin, converted to sats at live BTC/USD). No per-job invoices once active.
|
||||
|
||||
**Base URL:** `https://<your-timmy-url>.replit.app`
|
||||
|
||||
---
|
||||
|
||||
## Running the tests
|
||||
|
||||
**One command (no setup, no copy-paste):**
|
||||
```bash
|
||||
curl -s <BASE>/api/testkit | bash
|
||||
**Live base URL:**
|
||||
```
|
||||
The server returns a self-contained bash script with the BASE URL already baked in. Run it anywhere that has `curl`, `bash`, and `jq`.
|
||||
|
||||
**Locally (dev server):**
|
||||
```bash
|
||||
pnpm test
|
||||
```
|
||||
|
||||
**Against the published URL:**
|
||||
```bash
|
||||
pnpm test:prod
|
||||
https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Mode 1 — Per-Job Tests (v1, all live)
|
||||
|
||||
### Test 1 — Health check
|
||||
## Running the full test suite — one command
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/healthz"
|
||||
curl -s https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev/api/testkit | bash
|
||||
```
|
||||
**Pass:** HTTP 200, `{"status":"ok"}`
|
||||
|
||||
The server returns a self-contained bash script with the base URL already baked in.
|
||||
Requirements: `curl`, `bash`, `jq` — nothing else.
|
||||
|
||||
> **Note for repeat runs:** Tests 7 and 8c hit `GET /api/demo`, which is rate-limited to 5 req/hr per IP. If you run the testkit more than once in the same hour from the same IP, those two checks will return 429. This is expected behaviour — the rate limiter is working correctly. Run from a fresh IP (or wait an hour) for a clean 20/20.
|
||||
|
||||
---
|
||||
|
||||
### Test 2 — Create a job
|
||||
## What the testkit covers
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/jobs" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"request": "Explain the Lightning Network in two sentences"}'
|
||||
```
|
||||
**Pass:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
|
||||
### Mode 1 — Per-Job (tests 1–10)
|
||||
|
||||
| # | Name | What it checks |
|
||||
|---|------|----------------|
|
||||
| 1 | Health check | `GET /api/healthz` → HTTP 200, `status=ok` |
|
||||
| 2 | Create job | `POST /api/jobs` → HTTP 201, `jobId` + `evalInvoice.amountSats=10` |
|
||||
| 3 | Poll before payment | `GET /api/jobs/:id` → `state=awaiting_eval_payment`, invoice echoed, `paymentHash` present in stub mode |
|
||||
| 4 | Pay eval invoice | `POST /api/dev/stub/pay/:hash` → `{"ok":true}` |
|
||||
| 5 | Eval state advance | Polls until `state=awaiting_work_payment` OR `state=rejected` (30s timeout) |
|
||||
| 6 | Pay work + get result | Pays work invoice, polls until `state=complete`, `result` non-empty (30s timeout) |
|
||||
| 7 | Demo endpoint | `GET /api/demo?request=...` → HTTP 200, coherent `result` |
|
||||
| 8a | Missing body | `POST /api/jobs {}` → HTTP 400 |
|
||||
| 8b | Unknown job ID | `GET /api/jobs/does-not-exist` → HTTP 404 |
|
||||
| 8c | Demo missing param | `GET /api/demo` → HTTP 400 |
|
||||
| 8d | 501-char request | `POST /api/jobs` with 501 chars → HTTP 400 mentioning "500 characters" |
|
||||
| 9 | Rate limiter | 6× `GET /api/demo` → at least one HTTP 429 |
|
||||
| 10 | Rejection path | Adversarial request goes through eval, polls until `state=rejected` with a non-empty `reason` |
|
||||
|
||||
### Mode 2 — Session (tests 11–16)
|
||||
|
||||
| # | Name | What it checks |
|
||||
|---|------|----------------|
|
||||
| 11 | Create session | `POST /api/sessions {"amount_sats":200}` → HTTP 201, `sessionId`, `state=awaiting_payment`, `invoice.amountSats=200` |
|
||||
| 12 | Poll before payment | `GET /api/sessions/:id` → `state=awaiting_payment` before invoice is paid |
|
||||
| 13 | Pay deposit + activate | Pays deposit via stub, polls GET → `state=active`, `balanceSats=200`, `macaroon` present |
|
||||
| 14 | Submit request (accepted) | `POST /api/sessions/:id/request` with valid macaroon → `state=complete` OR `state=rejected`, `debitedSats>0`, `balanceRemaining` decremented |
|
||||
| 15 | Request without macaroon | Same endpoint, no `Authorization` header → HTTP 401 |
|
||||
| 16 | Topup invoice creation | `POST /api/sessions/:id/topup {"amount_sats":500}` with macaroon → HTTP 200, `topup.paymentRequest` present, `topup.amountSats=500` |
|
||||
|
||||
---
|
||||
|
||||
### Test 3 — Poll before payment
|
||||
## Architecture notes for reviewers
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/jobs/<jobId>"
|
||||
```
|
||||
**Pass:** `state = awaiting_eval_payment`, `evalInvoice` echoed back, `evalInvoice.paymentHash` present (stub mode).
|
||||
### Mode 1 mechanics
|
||||
- Stub mode is active (no real Lightning node). `paymentHash` is exposed on GET responses so the testkit can drive the full payment flow automatically. In production (real LNbits), `paymentHash` is hidden.
|
||||
- `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`.
|
||||
- State machine advances server-side on every GET poll — no webhooks.
|
||||
- AI models: Haiku for eval (cheap gating), Sonnet for work (full output).
|
||||
- **Pricing:** eval = 10 sats fixed. Work invoice = actual token usage (input + output) × Anthropic per-token rate × 1.4 margin, converted at live BTC/USD. This is dynamic — a 53-char request typically produces an invoice of ~180 sats, not a fixed tier. The old 50/100/250 sat fixed tiers were replaced by this model.
|
||||
- Max request length: 500 chars. Rate limiter: 5 req/hr/IP on `/api/demo` (in-memory, resets on server restart).
|
||||
|
||||
### Mode 2 mechanics
|
||||
- Minimum deposit: 100 sats. Maximum: 10,000 sats. Minimum working balance: 50 sats.
|
||||
- Session expiry: 24 hours of inactivity. Balance is forfeited on expiry. Expiry is stated in the `expiresAt` field of every session response.
|
||||
- Auth: `Authorization: Bearer <macaroon>` header. Macaroon is issued on first activation (GET /sessions/:id after deposit is paid).
|
||||
- Cost per request: (eval tokens + work tokens) × model rate × 1.4 margin → converted to sats. If a request starts with enough balance but actual cost pushes balance negative, the request still completes and delivers — only the *next* request is blocked.
|
||||
- If balance drops below 50 sats, session transitions to `paused`. Top up via `POST /sessions/:id/topup`. Session resumes automatically on the next GET poll once the topup invoice is paid.
|
||||
- The same `POST /api/dev/stub/pay/:hash` endpoint works for all invoice types (eval, work, session deposit, topup).
|
||||
|
||||
### Eval + work latency (important for manual testers)
|
||||
The eval call uses the real Anthropic API (Haiku), typically 2–5 seconds. The testkit uses polling loops (max 30s). Manual testers should poll with similar patience. The work call (Sonnet) typically runs 3–8 seconds.
|
||||
|
||||
---
|
||||
|
||||
### Test 4 — Pay eval invoice
|
||||
## Test results log
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/<evalInvoice.paymentHash>"
|
||||
```
|
||||
**Pass:** HTTP 200, `{"ok":true}`.
|
||||
|
||||
---
|
||||
|
||||
### Test 5 — Poll after eval payment
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/jobs/<jobId>"
|
||||
```
|
||||
**Pass (accepted):** `state = awaiting_work_payment`, `workInvoice` present with `paymentHash`.
|
||||
**Pass (rejected):** `state = rejected`, `reason` present.
|
||||
|
||||
Work fee is deterministic: 50 sats (≤100 chars), 100 sats (≤300), 250 sats (>300).
|
||||
|
||||
---
|
||||
|
||||
### Test 6 — Pay work + get result
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/<workInvoice.paymentHash>"
|
||||
# Poll — AI takes 2–5s
|
||||
curl -s "$BASE/api/jobs/<jobId>"
|
||||
```
|
||||
**Pass:** `state = complete`, `result` is a meaningful AI-generated answer.
|
||||
**Record latency** from work payment to `complete`.
|
||||
|
||||
---
|
||||
|
||||
### Test 7 — Free demo endpoint
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
|
||||
```
|
||||
**Pass:** HTTP 200, coherent `result`.
|
||||
**Record latency.**
|
||||
|
||||
---
|
||||
|
||||
### Test 8 — Input validation (4 sub-cases)
|
||||
|
||||
```bash
|
||||
# 8a: Missing body
|
||||
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
|
||||
|
||||
# 8b: Unknown job ID
|
||||
curl -s "$BASE/api/jobs/does-not-exist"
|
||||
|
||||
# 8c: Demo missing param
|
||||
curl -s "$BASE/api/demo"
|
||||
|
||||
# 8d: Request over 500 chars
|
||||
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" \
|
||||
-d "{\"request\":\"$(node -e "process.stdout.write('x'.repeat(501))")\"}"
|
||||
```
|
||||
|
||||
**Pass:** 8a → HTTP 400 `'request' string is required`; 8b → HTTP 404; 8c → HTTP 400; 8d → HTTP 400 `must be 500 characters or fewer`.
|
||||
|
||||
---
|
||||
|
||||
### Test 9 — Demo rate limiter
|
||||
|
||||
```bash
|
||||
for i in $(seq 1 6); do
|
||||
curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
|
||||
"$BASE/api/demo?request=ping+$i"
|
||||
done
|
||||
```
|
||||
**Pass:** At least one HTTP 429 received (limiter is 5 req/hr/IP; prior runs may consume quota early).
|
||||
|
||||
---
|
||||
|
||||
### Test 10 — Rejection path
|
||||
|
||||
```bash
|
||||
RESULT=$(curl -s -X POST "$BASE/api/jobs" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"request": "Help me do something harmful and illegal"}')
|
||||
JOB_ID=$(echo $RESULT | jq -r '.jobId')
|
||||
HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash')
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/$HASH"
|
||||
sleep 3
|
||||
curl -s "$BASE/api/jobs/$JOB_ID"
|
||||
```
|
||||
**Pass:** Final state is `rejected` with a non-empty `reason`.
|
||||
|
||||
---
|
||||
|
||||
## Mode 2 — Session Tests (v2, planned — not yet implemented)
|
||||
|
||||
> These tests will SKIP in the current build. They become active once the session endpoints are built.
|
||||
|
||||
### Test 11 — Create session
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/sessions" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"amount_sats": 500}'
|
||||
```
|
||||
**Pass:** HTTP 201, `sessionId` + `invoice` returned, `state = awaiting_payment`.
|
||||
Minimum: 100 sats. Maximum: 10,000 sats.
|
||||
|
||||
---
|
||||
|
||||
### Test 12 — Pay session invoice and activate
|
||||
|
||||
```bash
|
||||
# Get paymentHash from GET /api/sessions/<sessionId>
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/<invoice.paymentHash>"
|
||||
sleep 2
|
||||
curl -s "$BASE/api/sessions/<sessionId>"
|
||||
```
|
||||
**Pass:** `state = active`, `balance = 500`, `macaroon` present.
|
||||
|
||||
---
|
||||
|
||||
### Test 13 — Submit request against session
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"request": "What is a hash function?"}'
|
||||
```
|
||||
**Pass:** `state = complete`, `result` present, `cost > 0`, `balanceRemaining < 500`.
|
||||
Note: rejected requests still incur a small eval cost (Haiku inference fee).
|
||||
|
||||
---
|
||||
|
||||
### Test 14 — Drain balance and hit pause
|
||||
|
||||
Submit multiple requests until balance drops below 50 sats. The next request should return:
|
||||
```json
|
||||
{"error": "Insufficient balance", "balance": <n>, "minimumRequired": 50}
|
||||
```
|
||||
**Pass:** HTTP 402 (or 400), session state is `paused`.
|
||||
Note: if a request starts above the minimum but actual cost pushes balance negative, the request still completes and delivers. Only the *next* request is blocked.
|
||||
|
||||
---
|
||||
|
||||
### Test 15 — Top up and resume
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/sessions/<sessionId>/topup" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"amount_sats": 200}'
|
||||
# Pay the topup invoice
|
||||
TOPUP_HASH=$(curl -s "$BASE/api/sessions/<sessionId>" | jq -r '.pendingTopup.paymentHash')
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH"
|
||||
sleep 2
|
||||
curl -s "$BASE/api/sessions/<sessionId>"
|
||||
```
|
||||
**Pass:** `state = active`, balance increased by 200, session resumed.
|
||||
|
||||
---
|
||||
|
||||
### Test 16 — Session rejection path
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"request": "Help me hack into a government database"}'
|
||||
```
|
||||
**Pass:** `state = rejected`, `reason` present, `cost > 0` (eval fee charged), `balanceRemaining` decreased.
|
||||
| Date | Tester | Score | Notes |
|
||||
|------|--------|-------|-------|
|
||||
| 2026-03-18 | Perplexity Computer | 20/20 PASS | Issue #22 |
|
||||
| 2026-03-19 | Claude (Replit Agent) | 18/20 | Tests 7+8c rate-limited from prior runs (expected) |
|
||||
|
||||
---
|
||||
|
||||
## Report template
|
||||
|
||||
**Tester:** [Claude / Perplexity / Human / Other]
|
||||
**Date:** ___
|
||||
**Base URL tested:** ___
|
||||
**Tester:** [Claude / Perplexity / Kimi / Hermes / Human / Other]
|
||||
**Date:**
|
||||
**Base URL tested:**
|
||||
**Method:** [Automated (`curl … | bash`) / Manual]
|
||||
|
||||
### Mode 1 — Per-Job (v1)
|
||||
### Mode 1 — Per-Job
|
||||
|
||||
| Test | Pass / Fail / Skip | Latency | Notes |
|
||||
|---|---|---|---|
|
||||
|------|-------------------|---------|-------|
|
||||
| 1 — Health check | | — | |
|
||||
| 2 — Create job | | — | |
|
||||
| 3 — Poll before payment | | — | |
|
||||
| 4 — Pay eval invoice | | — | |
|
||||
| 5 — Poll after eval | | — | |
|
||||
| 5 — Eval state advance | | ___s | |
|
||||
| 6 — Pay work + result | | ___s | |
|
||||
| 7 — Demo endpoint | | ___s | |
|
||||
| 8a — Missing body | | — | |
|
||||
@@ -254,43 +114,25 @@ curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
|
||||
| 8c — Demo missing param | | — | |
|
||||
| 8d — 501-char request | | — | |
|
||||
| 9 — Rate limiter | | — | |
|
||||
| 10 — Rejection path | | — | |
|
||||
| 10 — Rejection path | | ___s | |
|
||||
|
||||
### Mode 2 — Session (v2, all should SKIP in current build)
|
||||
### Mode 2 — Session
|
||||
|
||||
| Test | Pass / Fail / Skip | Notes |
|
||||
|---|---|---|
|
||||
|------|-------------------|-------|
|
||||
| 11 — Create session | | |
|
||||
| 12 — Pay + activate | | |
|
||||
| 13 — Submit request | | |
|
||||
| 14 — Drain + pause | | |
|
||||
| 15 — Top up + resume | | |
|
||||
| 16 — Session rejection | | |
|
||||
| 12 — Poll before payment | | |
|
||||
| 13 — Pay + activate | | |
|
||||
| 14 — Submit request | | |
|
||||
| 15 — Reject no macaroon | | |
|
||||
| 16 — Topup invoice | | |
|
||||
|
||||
**Overall verdict:** Pass / Partial / Fail
|
||||
|
||||
**Total:** PASS=___ FAIL=___ SKIP=___
|
||||
|
||||
**Issues found:**
|
||||
|
||||
**Observations on result quality:**
|
||||
|
||||
**Suggestions:**
|
||||
|
||||
---
|
||||
|
||||
## Architecture notes for reviewers
|
||||
|
||||
### Mode 1 (live)
|
||||
- Stub mode: no real Lightning node. `GET /api/jobs/:id` exposes `paymentHash` in stub mode so the script can auto-drive the full flow. In production (real LNbits), `paymentHash` is omitted.
|
||||
- `POST /api/dev/stub/pay` is only mounted when `NODE_ENV !== 'production'`.
|
||||
- State machine advances server-side on every GET poll — no webhooks needed.
|
||||
- AI models: Haiku for eval (cheap judgment), Sonnet for work (full output).
|
||||
- Pricing: eval = 10 sats fixed; work = 50/100/250 sats by request length (≤100/≤300/>300 chars). Max request length: 500 chars.
|
||||
- Rate limiter: in-memory, 5 req/hr/IP on `/api/demo`. Resets on server restart.
|
||||
|
||||
### Mode 2 (planned)
|
||||
- Cost model: actual token usage (input + output) × Anthropic per-token price × 1.4 margin, converted to sats at a hardcoded BTC/USD rate.
|
||||
- Minimum balance: 50 sats before starting any request. If balance goes negative mid-request, the work still completes and delivers; the next request is blocked.
|
||||
- Session expiry: 24 hours of inactivity. Balance is forfeited. Stated clearly at session creation.
|
||||
- Macaroon auth: v1 uses simple session ID lookup. Macaroon verification is v2.
|
||||
- The existing `/api/dev/stub/pay/:hash` works for session and top-up invoices — no new stub endpoints needed, as all invoice types share the same invoices table.
|
||||
- Sessions and per-job modes coexist. Users choose. Neither is removed.
|
||||
|
||||
@@ -9,6 +9,8 @@ const router = Router();
|
||||
* BASE URL. Agents and testers can run the full test suite with one command:
|
||||
*
|
||||
* curl -s https://your-url.replit.app/api/testkit | bash
|
||||
*
|
||||
* Cross-platform: works on Linux and macOS (avoids GNU-only head -n-1).
|
||||
*/
|
||||
router.get("/testkit", (req: Request, res: Response) => {
|
||||
const proto =
|
||||
@@ -31,16 +33,17 @@ FAIL=0
|
||||
SKIP=0
|
||||
|
||||
note() { echo " [\$1] \$2"; }
|
||||
jq_field() { echo "\$1" | jq -r "\$2" 2>/dev/null || echo ""; }
|
||||
sep() { echo; echo "=== $* ==="; }
|
||||
sep() { echo; echo "=== $* ==="; }
|
||||
# body_of: strip last line (HTTP status code) — works on GNU and BSD (macOS)
|
||||
body_of() { echo "\$1" | sed '$d'; }
|
||||
code_of() { echo "\$1" | tail -n1; }
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 1 — Health check
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 1 — Health check"
|
||||
T1_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/healthz")
|
||||
T1_BODY=$(echo "$T1_RES" | head -n-1)
|
||||
T1_CODE=$(echo "$T1_RES" | tail -n1)
|
||||
T1_BODY=$(body_of "$T1_RES"); T1_CODE=$(code_of "$T1_RES")
|
||||
if [[ "$T1_CODE" == "200" ]] && [[ "$(echo "$T1_BODY" | jq -r '.status' 2>/dev/null)" == "ok" ]]; then
|
||||
note PASS "HTTP 200, status=ok"
|
||||
PASS=$((PASS+1))
|
||||
@@ -56,8 +59,7 @@ sep "Test 2 — Create job"
|
||||
T2_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/jobs" \\
|
||||
-H "Content-Type: application/json" \\
|
||||
-d '{"request":"Explain the Lightning Network in two sentences"}')
|
||||
T2_BODY=$(echo "$T2_RES" | head -n-1)
|
||||
T2_CODE=$(echo "$T2_RES" | tail -n1)
|
||||
T2_BODY=$(body_of "$T2_RES"); T2_CODE=$(code_of "$T2_RES")
|
||||
JOB_ID=$(echo "$T2_BODY" | jq -r '.jobId' 2>/dev/null || echo "")
|
||||
EVAL_AMT=$(echo "$T2_BODY" | jq -r '.evalInvoice.amountSats' 2>/dev/null || echo "")
|
||||
if [[ "$T2_CODE" == "201" && -n "$JOB_ID" && "$EVAL_AMT" == "10" ]]; then
|
||||
@@ -73,8 +75,7 @@ fi
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 3 — Poll before payment"
|
||||
T3_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/jobs/$JOB_ID")
|
||||
T3_BODY=$(echo "$T3_RES" | head -n-1)
|
||||
T3_CODE=$(echo "$T3_RES" | tail -n1)
|
||||
T3_BODY=$(body_of "$T3_RES"); T3_CODE=$(code_of "$T3_RES")
|
||||
STATE_T3=$(echo "$T3_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
EVAL_AMT_ECHO=$(echo "$T3_BODY" | jq -r '.evalInvoice.amountSats' 2>/dev/null || echo "")
|
||||
EVAL_HASH=$(echo "$T3_BODY" | jq -r '.evalInvoice.paymentHash' 2>/dev/null || echo "")
|
||||
@@ -99,8 +100,7 @@ fi
|
||||
sep "Test 4 — Pay eval invoice (stub)"
|
||||
if [[ -n "$EVAL_HASH" && "$EVAL_HASH" != "null" ]]; then
|
||||
T4_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/dev/stub/pay/$EVAL_HASH")
|
||||
T4_BODY=$(echo "$T4_RES" | head -n-1)
|
||||
T4_CODE=$(echo "$T4_RES" | tail -n1)
|
||||
T4_BODY=$(body_of "$T4_RES"); T4_CODE=$(code_of "$T4_RES")
|
||||
if [[ "$T4_CODE" == "200" ]] && [[ "$(echo "$T4_BODY" | jq -r '.ok' 2>/dev/null)" == "true" ]]; then
|
||||
note PASS "Eval invoice marked paid"
|
||||
PASS=$((PASS+1))
|
||||
@@ -114,7 +114,7 @@ else
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 5 — Poll after eval payment
|
||||
# Test 5 — Poll after eval payment (with retry loop — real AI eval takes 2–5 s)
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 5 — Poll after eval (state advance)"
|
||||
START_T5=$(date +%s)
|
||||
@@ -122,15 +122,12 @@ T5_TIMEOUT=30
|
||||
STATE_T5=""; WORK_AMT=""; WORK_HASH=""; T5_BODY=""; T5_CODE=""
|
||||
while :; do
|
||||
T5_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/jobs/$JOB_ID")
|
||||
T5_BODY=$(echo "$T5_RES" | head -n-1)
|
||||
T5_CODE=$(echo "$T5_RES" | tail -n1)
|
||||
T5_BODY=$(body_of "$T5_RES"); T5_CODE=$(code_of "$T5_RES")
|
||||
STATE_T5=$(echo "$T5_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
WORK_AMT=$(echo "$T5_BODY" | jq -r '.workInvoice.amountSats' 2>/dev/null || echo "")
|
||||
WORK_HASH=$(echo "$T5_BODY" | jq -r '.workInvoice.paymentHash' 2>/dev/null || echo "")
|
||||
NOW_T5=$(date +%s); ELAPSED_T5=$((NOW_T5 - START_T5))
|
||||
if [[ "$STATE_T5" == "awaiting_work_payment" || "$STATE_T5" == "rejected" ]]; then
|
||||
break
|
||||
fi
|
||||
if [[ "$STATE_T5" == "awaiting_work_payment" || "$STATE_T5" == "rejected" ]]; then break; fi
|
||||
if (( ELAPSED_T5 > T5_TIMEOUT )); then break; fi
|
||||
sleep 2
|
||||
done
|
||||
@@ -152,8 +149,7 @@ fi
|
||||
sep "Test 6 — Pay work invoice + get result"
|
||||
if [[ "$STATE_T5" == "awaiting_work_payment" && -n "$WORK_HASH" && "$WORK_HASH" != "null" ]]; then
|
||||
T6_PAY_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/dev/stub/pay/$WORK_HASH")
|
||||
T6_PAY_BODY=$(echo "$T6_PAY_RES" | head -n-1)
|
||||
T6_PAY_CODE=$(echo "$T6_PAY_RES" | tail -n1)
|
||||
T6_PAY_BODY=$(body_of "$T6_PAY_RES"); T6_PAY_CODE=$(code_of "$T6_PAY_RES")
|
||||
if [[ "$T6_PAY_CODE" != "200" ]] || [[ "$(echo "$T6_PAY_BODY" | jq -r '.ok' 2>/dev/null)" != "true" ]]; then
|
||||
note FAIL "Work payment stub failed: code=$T6_PAY_CODE body=$T6_PAY_BODY"
|
||||
FAIL=$((FAIL+1))
|
||||
@@ -162,11 +158,10 @@ if [[ "$STATE_T5" == "awaiting_work_payment" && -n "$WORK_HASH" && "$WORK_HASH"
|
||||
TIMEOUT=30
|
||||
while :; do
|
||||
T6_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/jobs/$JOB_ID")
|
||||
T6_BODY=$(echo "$T6_RES" | head -n-1)
|
||||
T6_BODY=$(body_of "$T6_RES")
|
||||
STATE_T6=$(echo "$T6_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
RESULT_T6=$(echo "$T6_BODY" | jq -r '.result' 2>/dev/null || echo "")
|
||||
NOW_TS=$(date +%s)
|
||||
ELAPSED=$((NOW_TS - START_TS))
|
||||
NOW_TS=$(date +%s); ELAPSED=$((NOW_TS - START_TS))
|
||||
if [[ "$STATE_T6" == "complete" && -n "$RESULT_T6" && "$RESULT_T6" != "null" ]]; then
|
||||
note PASS "state=complete in $ELAPSED s"
|
||||
echo " Result: \${RESULT_T6:0:200}..."
|
||||
@@ -187,33 +182,13 @@ else
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 7 — Demo endpoint
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 7 — Demo endpoint"
|
||||
START_DEMO=$(date +%s)
|
||||
T7_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/demo?request=What+is+a+satoshi")
|
||||
T7_BODY=$(echo "$T7_RES" | head -n-1)
|
||||
T7_CODE=$(echo "$T7_RES" | tail -n1)
|
||||
END_DEMO=$(date +%s)
|
||||
ELAPSED_DEMO=$((END_DEMO - START_DEMO))
|
||||
RESULT_T7=$(echo "$T7_BODY" | jq -r '.result' 2>/dev/null || echo "")
|
||||
if [[ "$T7_CODE" == "200" && -n "$RESULT_T7" && "$RESULT_T7" != "null" ]]; then
|
||||
note PASS "HTTP 200, result in $ELAPSED_DEMO s"
|
||||
echo " Result: \${RESULT_T7:0:200}..."
|
||||
PASS=$((PASS+1))
|
||||
else
|
||||
note FAIL "code=$T7_CODE body=$T7_BODY"
|
||||
FAIL=$((FAIL+1))
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 8 — Input validation (4 sub-cases)
|
||||
# Test 8 — Input validation (run BEFORE test 7 to avoid rate-limit interference)
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 8 — Input validation"
|
||||
|
||||
T8A_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/jobs" \\
|
||||
-H "Content-Type: application/json" -d '{}')
|
||||
T8A_BODY=$(echo "$T8A_RES" | head -n-1); T8A_CODE=$(echo "$T8A_RES" | tail -n1)
|
||||
T8A_BODY=$(body_of "$T8A_RES"); T8A_CODE=$(code_of "$T8A_RES")
|
||||
if [[ "$T8A_CODE" == "400" && -n "$(echo "$T8A_BODY" | jq -r '.error' 2>/dev/null)" ]]; then
|
||||
note PASS "8a: Missing request body → HTTP 400"
|
||||
PASS=$((PASS+1))
|
||||
@@ -223,7 +198,7 @@ else
|
||||
fi
|
||||
|
||||
T8B_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/jobs/does-not-exist")
|
||||
T8B_BODY=$(echo "$T8B_RES" | head -n-1); T8B_CODE=$(echo "$T8B_RES" | tail -n1)
|
||||
T8B_BODY=$(body_of "$T8B_RES"); T8B_CODE=$(code_of "$T8B_RES")
|
||||
if [[ "$T8B_CODE" == "404" && -n "$(echo "$T8B_BODY" | jq -r '.error' 2>/dev/null)" ]]; then
|
||||
note PASS "8b: Unknown job ID → HTTP 404"
|
||||
PASS=$((PASS+1))
|
||||
@@ -232,8 +207,9 @@ else
|
||||
FAIL=$((FAIL+1))
|
||||
fi
|
||||
|
||||
# 8c runs here — before tests 7 and 9 consume rate-limit quota
|
||||
T8C_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/demo")
|
||||
T8C_BODY=$(echo "$T8C_RES" | head -n-1); T8C_CODE=$(echo "$T8C_RES" | tail -n1)
|
||||
T8C_BODY=$(body_of "$T8C_RES"); T8C_CODE=$(code_of "$T8C_RES")
|
||||
if [[ "$T8C_CODE" == "400" && -n "$(echo "$T8C_BODY" | jq -r '.error' 2>/dev/null)" ]]; then
|
||||
note PASS "8c: Demo missing param → HTTP 400"
|
||||
PASS=$((PASS+1))
|
||||
@@ -246,7 +222,7 @@ LONG_STR=$(node -e "process.stdout.write('x'.repeat(501))" 2>/dev/null || python
|
||||
T8D_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/jobs" \\
|
||||
-H "Content-Type: application/json" \\
|
||||
-d "{\\"request\\":\\"$LONG_STR\\"}")
|
||||
T8D_BODY=$(echo "$T8D_RES" | head -n-1); T8D_CODE=$(echo "$T8D_RES" | tail -n1)
|
||||
T8D_BODY=$(body_of "$T8D_RES"); T8D_CODE=$(code_of "$T8D_RES")
|
||||
T8D_ERR=$(echo "$T8D_BODY" | jq -r '.error' 2>/dev/null || echo "")
|
||||
if [[ "$T8D_CODE" == "400" && "$T8D_ERR" == *"500 characters"* ]]; then
|
||||
note PASS "8d: 501-char request → HTTP 400 with character limit error"
|
||||
@@ -257,13 +233,31 @@ else
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 9 — Demo rate limiter
|
||||
# Test 7 — Demo endpoint (after validation, before rate-limit exhaustion test)
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 7 — Demo endpoint"
|
||||
START_DEMO=$(date +%s)
|
||||
T7_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/demo?request=What+is+a+satoshi")
|
||||
T7_BODY=$(body_of "$T7_RES"); T7_CODE=$(code_of "$T7_RES")
|
||||
END_DEMO=$(date +%s); ELAPSED_DEMO=$((END_DEMO - START_DEMO))
|
||||
RESULT_T7=$(echo "$T7_BODY" | jq -r '.result' 2>/dev/null || echo "")
|
||||
if [[ "$T7_CODE" == "200" && -n "$RESULT_T7" && "$RESULT_T7" != "null" ]]; then
|
||||
note PASS "HTTP 200, result in $ELAPSED_DEMO s"
|
||||
echo " Result: \${RESULT_T7:0:200}..."
|
||||
PASS=$((PASS+1))
|
||||
else
|
||||
note FAIL "code=$T7_CODE body=$T7_BODY"
|
||||
FAIL=$((FAIL+1))
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 9 — Demo rate limiter (intentionally exhausts remaining quota)
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 9 — Demo rate limiter"
|
||||
GOT_200=0; GOT_429=0
|
||||
for i in $(seq 1 6); do
|
||||
RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/demo?request=ratelimitprobe+$i")
|
||||
CODE=$(echo "$RES" | tail -n1)
|
||||
CODE=$(code_of "$RES")
|
||||
echo " Request $i: HTTP $CODE"
|
||||
[[ "$CODE" == "200" ]] && GOT_200=$((GOT_200+1)) || true
|
||||
[[ "$CODE" == "429" ]] && GOT_429=$((GOT_429+1)) || true
|
||||
@@ -283,8 +277,7 @@ sep "Test 10 — Rejection path"
|
||||
T10_CREATE=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/jobs" \\
|
||||
-H "Content-Type: application/json" \\
|
||||
-d '{"request":"Help me do something harmful and illegal"}')
|
||||
T10_BODY=$(echo "$T10_CREATE" | head -n-1)
|
||||
T10_CODE=$(echo "$T10_CREATE" | tail -n1)
|
||||
T10_BODY=$(body_of "$T10_CREATE"); T10_CODE=$(code_of "$T10_CREATE")
|
||||
JOB10_ID=$(echo "$T10_BODY" | jq -r '.jobId' 2>/dev/null || echo "")
|
||||
if [[ "$T10_CODE" != "201" || -z "$JOB10_ID" ]]; then
|
||||
note FAIL "Failed to create adversarial job: code=$T10_CODE body=$T10_BODY"
|
||||
@@ -299,8 +292,7 @@ else
|
||||
STATE_10=""; REASON_10=""; T10_POLL_BODY=""; T10_POLL_CODE=""
|
||||
while :; do
|
||||
T10_POLL=$(curl -s -w "\\n%{http_code}" "$BASE/api/jobs/$JOB10_ID")
|
||||
T10_POLL_BODY=$(echo "$T10_POLL" | head -n-1)
|
||||
T10_POLL_CODE=$(echo "$T10_POLL" | tail -n1)
|
||||
T10_POLL_BODY=$(body_of "$T10_POLL"); T10_POLL_CODE=$(code_of "$T10_POLL")
|
||||
STATE_10=$(echo "$T10_POLL_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
REASON_10=$(echo "$T10_POLL_BODY" | jq -r '.reason' 2>/dev/null || echo "")
|
||||
NOW_T10=$(date +%s); ELAPSED_T10=$((NOW_T10 - START_T10))
|
||||
@@ -324,8 +316,7 @@ sep "Test 11 — Session: create session (awaiting_payment)"
|
||||
T11_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/sessions" \\
|
||||
-H "Content-Type: application/json" \\
|
||||
-d '{"amount_sats": 200}')
|
||||
T11_BODY=$(echo "$T11_RES" | head -n-1)
|
||||
T11_CODE=$(echo "$T11_RES" | tail -n1)
|
||||
T11_BODY=$(body_of "$T11_RES"); T11_CODE=$(code_of "$T11_RES")
|
||||
SESSION_ID=$(echo "$T11_BODY" | jq -r '.sessionId' 2>/dev/null || echo "")
|
||||
T11_STATE=$(echo "$T11_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
T11_AMT=$(echo "$T11_BODY" | jq -r '.invoice.amountSats' 2>/dev/null || echo "")
|
||||
@@ -339,12 +330,11 @@ else
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test 12 — Session: poll before payment (stub hash present)
|
||||
# Test 12 — Session: poll before payment
|
||||
# ---------------------------------------------------------------------------
|
||||
sep "Test 12 — Session: poll before payment"
|
||||
T12_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/sessions/$SESSION_ID")
|
||||
T12_BODY=$(echo "$T12_RES" | head -n-1)
|
||||
T12_CODE=$(echo "$T12_RES" | tail -n1)
|
||||
T12_BODY=$(body_of "$T12_RES"); T12_CODE=$(code_of "$T12_RES")
|
||||
T12_STATE=$(echo "$T12_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
if [[ -z "$DEPOSIT_HASH" || "$DEPOSIT_HASH" == "null" ]]; then
|
||||
DEPOSIT_HASH=$(echo "$T12_BODY" | jq -r '.invoice.paymentHash' 2>/dev/null || echo "")
|
||||
@@ -365,8 +355,7 @@ if [[ -n "$DEPOSIT_HASH" && "$DEPOSIT_HASH" != "null" ]]; then
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/$DEPOSIT_HASH" >/dev/null
|
||||
sleep 1
|
||||
T13_RES=$(curl -s -w "\\n%{http_code}" "$BASE/api/sessions/$SESSION_ID")
|
||||
T13_BODY=$(echo "$T13_RES" | head -n-1)
|
||||
T13_CODE=$(echo "$T13_RES" | tail -n1)
|
||||
T13_BODY=$(body_of "$T13_RES"); T13_CODE=$(code_of "$T13_RES")
|
||||
T13_STATE=$(echo "$T13_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
T13_BAL=$(echo "$T13_BODY" | jq -r '.balanceSats' 2>/dev/null || echo "")
|
||||
SESSION_MACAROON=$(echo "$T13_BODY" | jq -r '.macaroon' 2>/dev/null || echo "")
|
||||
@@ -392,13 +381,11 @@ if [[ -n "$SESSION_MACAROON" && "$SESSION_MACAROON" != "null" ]]; then
|
||||
-H "Content-Type: application/json" \\
|
||||
-H "Authorization: Bearer $SESSION_MACAROON" \\
|
||||
-d '{"request":"What is Bitcoin in one sentence?"}')
|
||||
T14_BODY=$(echo "$T14_RES" | head -n-1)
|
||||
T14_CODE=$(echo "$T14_RES" | tail -n1)
|
||||
T14_BODY=$(body_of "$T14_RES"); T14_CODE=$(code_of "$T14_RES")
|
||||
T14_STATE=$(echo "$T14_BODY" | jq -r '.state' 2>/dev/null || echo "")
|
||||
T14_DEBITED=$(echo "$T14_BODY" | jq -r '.debitedSats' 2>/dev/null || echo "")
|
||||
T14_BAL=$(echo "$T14_BODY" | jq -r '.balanceRemaining' 2>/dev/null || echo "")
|
||||
END_T14=$(date +%s)
|
||||
ELAPSED_T14=$((END_T14 - START_T14))
|
||||
END_T14=$(date +%s); ELAPSED_T14=$((END_T14 - START_T14))
|
||||
if [[ "$T14_CODE" == "200" && ("$T14_STATE" == "complete" || "$T14_STATE" == "rejected") && -n "$T14_DEBITED" && "$T14_DEBITED" != "null" && -n "$T14_BAL" ]]; then
|
||||
note PASS "state=$T14_STATE in \${ELAPSED_T14}s, debitedSats=$T14_DEBITED, balanceRemaining=$T14_BAL"
|
||||
PASS=$((PASS+1))
|
||||
@@ -419,7 +406,7 @@ if [[ -n "$SESSION_ID" ]]; then
|
||||
T15_RES=$(curl -s -w "\\n%{http_code}" -X POST "$BASE/api/sessions/$SESSION_ID/request" \\
|
||||
-H "Content-Type: application/json" \\
|
||||
-d '{"request":"What is Bitcoin?"}')
|
||||
T15_CODE=$(echo "$T15_RES" | tail -n1)
|
||||
T15_CODE=$(code_of "$T15_RES")
|
||||
if [[ "$T15_CODE" == "401" ]]; then
|
||||
note PASS "HTTP 401 without macaroon"
|
||||
PASS=$((PASS+1))
|
||||
@@ -441,8 +428,7 @@ if [[ -n "$SESSION_MACAROON" && "$SESSION_MACAROON" != "null" ]]; then
|
||||
-H "Content-Type: application/json" \\
|
||||
-H "Authorization: Bearer $SESSION_MACAROON" \\
|
||||
-d '{"amount_sats": 500}')
|
||||
T16_BODY=$(echo "$T16_RES" | head -n-1)
|
||||
T16_CODE=$(echo "$T16_RES" | tail -n1)
|
||||
T16_BODY=$(body_of "$T16_RES"); T16_CODE=$(code_of "$T16_RES")
|
||||
T16_PR=$(echo "$T16_BODY" | jq -r '.topup.paymentRequest' 2>/dev/null || echo "")
|
||||
T16_AMT=$(echo "$T16_BODY" | jq -r '.topup.amountSats' 2>/dev/null || echo "")
|
||||
if [[ "$T16_CODE" == "200" && -n "$T16_PR" && "$T16_PR" != "null" && "$T16_AMT" == "500" ]]; then
|
||||
|
||||
Reference in New Issue
Block a user