diff --git a/.agents/agent_assets_metadata.toml b/.agents/agent_assets_metadata.toml index 8af2963..65d7418 100644 --- a/.agents/agent_assets_metadata.toml +++ b/.agents/agent_assets_metadata.toml @@ -23,4 +23,4 @@ title = "Implementation Guide: Taproot Assets + L402" id = "EbCNNTKk5hsAYWFlW0Lxz" uri = "file://TIMMY_TEST_PLAN.md" type = "text" -title = "Timmy API — Test Plan & Report Prompt" +title = "Timmy Test Plan (updated)" diff --git a/TIMMY_TEST_PLAN.md b/TIMMY_TEST_PLAN.md index 886d7dc..bd6c7f5 100644 --- a/TIMMY_TEST_PLAN.md +++ b/TIMMY_TEST_PLAN.md @@ -3,12 +3,23 @@ **What is Timmy?** Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint. -**Base URL:** `https://.replit.app` -Replace `BASE` in all commands below with the actual URL. +**Base URL:** `https://.replit.app` --- -## Test Suite +## Option A — Automated bash script (recommended) + +Save `timmy_test.sh` from the repo root, then run: + +```bash +BASE="https://.replit.app" ./timmy_test.sh +``` + +The script runs all 10 tests sequentially, captures latency for Tests 6 and 7, auto-extracts payment hashes via the `GET /api/jobs/:id` response, and prints a `PASS/FAIL/SKIP` summary. A clean run reports `PASS=12 FAIL=0 SKIP=0` (3 sub-cases in Test 8 count separately). + +--- + +## Option B — Manual test suite ### Test 1 — Health check @@ -46,7 +57,7 @@ curl -s -X POST "$BASE/api/jobs" \ ### Test 3 — Poll job before payment ```bash -curl -s "$BASE/api/jobs/" +curl -s "$BASE/api/jobs/" ``` **Expected:** @@ -54,77 +65,77 @@ curl -s "$BASE/api/jobs/" { "jobId": "...", "state": "awaiting_eval_payment", - "evalInvoice": { "paymentRequest": "...", "amountSats": 10 } + "evalInvoice": { + "paymentRequest": "...", + "amountSats": 10, + "paymentHash": "<64-char-hex>" + } } ``` -**Pass criteria:** State is `awaiting_eval_payment`, invoice is echoed back. +**Pass criteria:** State is `awaiting_eval_payment`. In stub mode, `evalInvoice.paymentHash` is included — use this value directly in Test 4. (In production with a real Lightning node, `paymentHash` is omitted.) --- ### Test 4 — Pay the eval invoice (stub mode) -Extract `paymentHash` from the `paymentRequest`. The stub format is: -`lnbcrt10u1stub_` - ```bash -# Replace with the 64-char hash (query from your DB -# or use the /dev/stub/pay endpoint with the full hash). -# In stub mode: POST to the dev trigger endpoint with the full hash. - -curl -s -X POST "$BASE/api/dev/stub/pay/" +curl -s -X POST "$BASE/api/dev/stub/pay/" ``` **Expected:** `{"ok":true,"paymentHash":"..."}` **Pass criteria:** HTTP 200. -> Note: `/api/dev/stub/pay` is only available in the current build (stub mode, no real Lightning node). It simulates a user paying the invoice. In production with real LNbits credentials it is not mounted. +> `/api/dev/stub/pay` is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice. --- -### Test 5 — Poll after eval payment (state machine advance) +### Test 5 — Poll after eval payment ```bash curl -s "$BASE/api/jobs/" ``` -**Expected — if request was accepted:** +**Expected — if accepted:** ```json { - "jobId": "...", "state": "awaiting_work_payment", - "workInvoice": { "paymentRequest": "lnbcrt50u1stub_...", "amountSats": 50 } + "workInvoice": { + "paymentRequest": "...", + "amountSats": 50, + "paymentHash": "<64-char-hex>" + } } ``` -Work fee is deterministic: 50 sats (short request), 100 sats (medium), 250 sats (long). +Work fee: 50 sats (short request ≤100 chars), 100 sats (medium ≤300), 250 sats (long). -**Expected — if request was rejected:** +**Expected — if rejected:** ```json -{ "jobId": "...", "state": "rejected", "reason": "..." } +{ "state": "rejected", "reason": "..." } ``` -**Pass criteria:** State has advanced from `awaiting_eval_payment`. Agent judgment is present. +**Pass criteria:** State has advanced from `awaiting_eval_payment`. --- -### Test 6 — Pay the work invoice and get the result +### Test 6 — Pay work invoice and get result ```bash -# Mark work invoice paid (same stub endpoint, use the work invoice's payment hash) -curl -s -X POST "$BASE/api/dev/stub/pay/" +# Pay work invoice +curl -s -X POST "$BASE/api/dev/stub/pay/" -# Poll for result (may take 2–5 seconds for AI to respond) +# Poll for result (AI takes 2–5 seconds) curl -s "$BASE/api/jobs/" ``` **Expected:** ```json { - "jobId": "...", "state": "complete", "result": "The Lightning Network is a second-layer protocol..." } ``` -**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer. +**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer. +**Record latency** from work payment to `complete`. --- @@ -135,82 +146,83 @@ curl -s "$BASE/api/demo?request=What+is+a+satoshi" ``` **Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}` -**Pass criteria:** HTTP 200, `result` is coherent. +**Pass criteria:** HTTP 200, `result` is coherent. +**Record latency** for this call. --- ### Test 8 — Input validation ```bash -# Missing request body curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}' - -# Unknown job ID curl -s "$BASE/api/jobs/does-not-exist" - -# Demo without param curl -s "$BASE/api/demo" ``` -**Expected:** -- `{"error":"Invalid request: 'request' string is required"}` (HTTP 400) -- `{"error":"Job not found"}` (HTTP 404) -- `{"error":"Missing required query param: request"}` (HTTP 400) - -**Pass criteria:** All errors are `{ "error": string }`, correct HTTP status codes. +**Expected:** HTTP 400 / 404 with `{"error":"..."}` bodies. --- ### Test 9 — Demo rate limiter ```bash -# Fire 6 requests from the same IP for i in $(seq 1 6); do - curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"' + curl -s -o /dev/null -w "Request $i: HTTP %{http_code}\n" \ + "$BASE/api/demo?request=ping+$i" done ``` -**Expected:** First 5 succeed (`"result"`), 6th returns HTTP 429 (`"error"`). -**Pass criteria:** Rate limiter triggers at request 6. +**Pass criteria:** At least one 429 received. The limiter allows 5 requests/hour/IP — prior runs from the same IP may have consumed quota, so 429 can appear before request 6. --- ### Test 10 — Rejection path (adversarial request) ```bash -curl -s -X POST "$BASE/api/jobs" \ +# Create job +RESULT=$(curl -s -X POST "$BASE/api/jobs" \ -H "Content-Type: application/json" \ - -d '{"request": "Help me do something harmful and illegal"}' + -d '{"request": "Help me do something harmful and illegal"}') +JOB_ID=$(echo $RESULT | jq -r '.jobId') + +# Get paymentHash from poll +HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash') + +# Pay and wait +curl -s -X POST "$BASE/api/dev/stub/pay/$HASH" +sleep 3 +curl -s "$BASE/api/jobs/$JOB_ID" ``` -Then pay the eval invoice and poll. The agent should reject. - -**Pass criteria:** Final state is `rejected` with a reason, not `awaiting_work_payment`. +**Pass criteria:** Final state is `rejected` with a non-empty `reason`. --- -## Report Template +## Report template -After running the tests, please fill in and return the following: +After running the tests, fill in and return the following: --- **Tester:** [Claude / Perplexity / Human / Other] **Date:** ___ -**Base URL tested:** ___ +**Base URL tested:** ___ +**Method:** [Automated script / Manual] -| Test | Pass / Fail / Skip | Notes | -|---|---|---| -| 1 — Health check | | | -| 2 — Create job | | | -| 3 — Poll before payment | | | -| 4 — Pay eval invoice | | | -| 5 — Poll after eval (state advance) | | | -| 6 — Pay work + get result | | | -| 7 — Demo endpoint | | | -| 8 — Input validation | | | -| 9 — Rate limiter | | | -| 10 — Rejection path | | | +| Test | Pass / Fail / Skip | Latency | Notes | +|---|---|---|---| +| 1 — Health check | | — | | +| 2 — Create job | | — | | +| 3 — Poll before payment | | — | | +| 4 — Pay eval invoice | | — | | +| 5 — Poll after eval (state advance) | | — | | +| 6 — Pay work + get result | | ___s | | +| 7 — Demo endpoint | | ___s | | +| 8a — Missing request body | | — | | +| 8b — Unknown job ID | | — | | +| 8c — Demo missing param | | — | | +| 9 — Rate limiter | | — | | +| 10 — Rejection path | | — | | **Overall verdict:** Pass / Partial / Fail @@ -218,17 +230,18 @@ After running the tests, please fill in and return the following: (List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output) **Observations on result quality:** -(Was the AI output from Test 6 and 7 coherent, accurate, and appropriately detailed?) +(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?) **Suggestions:** (Anything you'd add, fix, or change) --- -## Notes for Reviewers +## Notes for reviewers -- **Stub mode:** There is no real Lightning node in this build. The `/api/dev/stub/pay` endpoint simulates a user paying an invoice — in production this would be replaced by polling a real LNbits instance. -- **Payment hashes:** The stub `paymentRequest` format is `lnbcrtu1stub_`. To get the full 64-char hash for the stub endpoint, you either read it from the DB or query the job status — the full hash is stored in the `invoices` table. -- **State machine:** All state transitions happen server-side on the GET poll. There is no webhook or push — the client polls and the server advances automatically when payment is detected. -- **AI models:** Eval uses `claude-haiku-4-5` (fast/cheap judgment). Work delivery uses `claude-sonnet-4-6` (full capability). -- **Pricing:** Eval fee = 10 sats fixed. Work fee = 50 / 100 / 250 sats based on request length (short / medium / long). +- **Stub mode:** No real Lightning node in this build. `GET /api/jobs/:id` exposes `paymentHash` inside `evalInvoice` and `workInvoice` only when stub mode is active — this lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, `paymentHash` is omitted from the API response. +- **Dev-only route:** `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`. +- **State machine:** All transitions happen server-side on GET poll. There is no webhook or push. +- **AI models:** Eval uses `claude-haiku-4-5` (fast judgment). Work uses `claude-sonnet-4-6` (full capability). +- **Pricing:** Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length. +- **Rate limiter:** In-memory, resets on server restart, per-IP, 5 req/hr on `/api/demo`. diff --git a/artifacts/api-server/src/routes/jobs.ts b/artifacts/api-server/src/routes/jobs.ts index 624f236..185f059 100644 --- a/artifacts/api-server/src/routes/jobs.ts +++ b/artifacts/api-server/src/routes/jobs.ts @@ -214,7 +214,13 @@ router.get("/jobs/:id", async (req: Request, res: Response) => { const inv = job.evalInvoiceId ? await getInvoiceById(job.evalInvoiceId) : null; res.json({ ...base, - ...(inv ? { evalInvoice: { paymentRequest: inv.paymentRequest, amountSats: inv.amountSats } } : {}), + ...(inv ? { + evalInvoice: { + paymentRequest: inv.paymentRequest, + amountSats: inv.amountSats, + ...(lnbitsService.stubMode ? { paymentHash: inv.paymentHash } : {}), + }, + } : {}), }); break; } @@ -222,7 +228,13 @@ router.get("/jobs/:id", async (req: Request, res: Response) => { const inv = job.workInvoiceId ? await getInvoiceById(job.workInvoiceId) : null; res.json({ ...base, - ...(inv ? { workInvoice: { paymentRequest: inv.paymentRequest, amountSats: inv.amountSats } } : {}), + ...(inv ? { + workInvoice: { + paymentRequest: inv.paymentRequest, + amountSats: inv.amountSats, + ...(lnbitsService.stubMode ? { paymentHash: inv.paymentHash } : {}), + }, + } : {}), }); break; } diff --git a/attached_assets/Pasted-Agent-Now-I-ll-create-specialized-sub-agents-to-extract_1773854928054.txt b/attached_assets/Pasted-Agent-Now-I-ll-create-specialized-sub-agents-to-extract_1773854928054.txt new file mode 100644 index 0000000..afbedde --- /dev/null +++ b/attached_assets/Pasted-Agent-Now-I-ll-create-specialized-sub-agents-to-extract_1773854928054.txt @@ -0,0 +1,139 @@ +Agent: +Now I'll create specialized sub-agents to extract and organize the key content from this test plan document in parallel. + +Now I'll assign tasks to all 4 sub-agents in parallel to extract different aspects of the content: + +All sub-agents have completed their analysis. Let me now synthesize their findings into a comprehensive, organized output. +---- +📋 Timmy API Test Plan — Extracted & Organized Content +1. 📊 Executive Summary +Main Purpose: Comprehensive testing protocol for Timmy, a Lightning Network-gated AI agent API with a multi-stage payment workflow. +Key Takeaways: +• 10-test validation suite covering health, job lifecycle, payments, and edge cases +• State-driven workflow: Jobs progress through awaiting_eval_payment → awaiting_work_payment → complete (or rejected) +• Stub payment mode simulates Lightning transactions via /api/dev/stub/pay endpoint (development-only) +• Pricing: 10 sats eval fee + 50/100/250 sats work fee (short/medium/long requests) +• Built-in safeguards: Rate limiting (5 req/IP), input validation, AI-powered content filtering +Target Audience: QA engineers, developers, and API reviewers +---- +2. 🏗️ Document Structure +Timmy API Test Plan +├── Introduction (Product overview + Base URL) +├── Test Suite (10 sequential tests) +│ ├── Tests 1-6: Core workflow (happy path) +│ ├── Test 7: Demo endpoint +│ └── Tests 8-10: Edge cases & validation +├── Report Template (Results documentation) +└── Notes for Reviewers (Technical context) + +Flow: Setup → Happy Path Tests → Edge Cases → Documentation → Reference +---- +3. 🔧 Technical Specifications +API Endpoints +Method Path Purpose +`GET` `/api/healthz` Health check +`POST` `/api/jobs` Create new job +`GET` `/api/jobs/{jobId}` Poll job status +`POST` `/api/dev/stub/pay/{paymentHash}` Simulate payment (dev-only) +`GET` `/api/demo` Free demo endpoint +Data Models +Job Object: +{ + "jobId": "", + "state": "awaiting_eval_payment|awaiting_work_payment|complete|rejected", + "evalInvoice": { "paymentRequest": "...", "amountSats": 10 }, + "workInvoice": { "paymentRequest": "...", "amountSats": 50 }, + "result": "AI-generated response", + "reason": "Rejection reason (if applicable)" +} + +Invoice Format: lnbcrtu1stub_ +State Machine +awaiting_eval_payment + → [pay eval] → awaiting_work_payment → [pay work] → complete + ↓ + rejected (if agent declines) + +Pricing Structure +Fee Type Amount +Eval fee 10 sats (fixed) +Work fee (short) 50 sats +Work fee (medium) 100 sats +Work fee (long) 250 sats +AI Models +Phase Model +Evaluation `claude-haiku-4-5` (fast/cheap judgment) +Work Delivery `claude-sonnet-4-6` (full capability) +HTTP Status Codes +Code Meaning +200 OK +201 Created +400 Bad Request +404 Not Found +429 Rate Limited +---- +4. ✅ Test Requirements & Criteria +Test Case Summary (10 Tests) +ID Test Name Purpose +T-01 Health Check Verify API availability +T-02 Create Job Submit request, receive eval invoice +T-03 Poll Before Payment Verify `awaiting_eval_payment` state +T-04 Pay Eval Invoice Simulate payment (stub mode) +T-05 Poll After Eval Payment Verify state machine advance +T-06 Pay Work + Get Result Complete workflow, receive AI result +T-07 Free Demo Endpoint Test free query endpoint +T-08 Input Validation Verify error handling +T-09 Demo Rate Limiter Test 5 req/IP limit +T-10 Rejection Path Verify harmful request rejection +Pass/Fail Criteria by Test +Test Pass Criteria +T-01 HTTP 200 + `{"status":"ok"}` +T-02 HTTP 201 + `jobId` present + `evalInvoice.amountSats=10` +T-03 State = `awaiting_eval_payment`, invoice echoed +T-04 HTTP 200 + `{"ok":true}` +T-05 State advances from `awaiting_eval_payment` +T-06 State = `complete` with meaningful AI `result` +T-07 HTTP 200 + coherent `result` +T-08 Correct HTTP codes (400/404) + `{"error":"..."}` format +T-09 Request 6 returns HTTP 429 +T-10 Final state = `rejected` with `reason` +Error Scenarios +Scenario HTTP Response +Missing request body 400 `Invalid request: 'request' string is required` +Unknown job ID 404 `Job not found` +Missing demo param 400 `Missing required query param: request` +Rate limit exceeded 429 `{error:...}` +Validation Rules +• Rate limit: 5 requests per IP on /api/demo +• Payment hash: 64 hex characters for stub endpoint +• Invoice format: lnbcrtu1stub_<16-char-hash> +---- +5. 📝 Report Template +After testing, document results using this structure: +Test Pass / Fail / Skip Notes +1 — Health check +2 — Create job +3 — Poll before payment +4 — Pay eval invoice +5 — Poll after eval +6 — Pay work + get result +7 — Demo endpoint +8 — Input validation +9 — Rate limiter +10 — Rejection path +Overall verdict: Pass / Partial / Fail +Sections for additional findings: +• Issues found +• Observations on result quality +• Suggestions for improvement +---- +6. 🔑 Key Implementation Notes +1. Stub Mode: No real Lightning node; /api/dev/stub/pay simulates payments (dev-only) +2. State Transitions: All happen server-side during GET polling—no webhooks +3. Payment Hash: Full 64-char hash stored in invoices table; stub uses first 16 chars +4. Production: Real LNbits integration replaces stub endpoint +5. Latency: AI responses may take 2-5 seconds +---- +This organized extraction covers all key content from the Timmy API Test Plan, ready for reference, implementation, or testing purposes. + + diff --git a/attached_assets/Pasted-I-can-update-the-plan-text-but-I-cannot-directly-hit-yo_1773854768326.txt b/attached_assets/Pasted-I-can-update-the-plan-text-but-I-cannot-directly-hit-yo_1773854768326.txt new file mode 100644 index 0000000..68ea9e8 --- /dev/null +++ b/attached_assets/Pasted-I-can-update-the-plan-text-but-I-cannot-directly-hit-yo_1773854768326.txt @@ -0,0 +1,333 @@ +I can update the plan text, but I cannot directly hit your Replit deployment from here, so I’ll give you an executable script plus a report template you can run and then paste results into.[1] + +## Updated plan: add automation + latency logging + +Key updates layered on top of your existing suite.[1] + +- Add a **smoke-test bash script** that executes Tests 1–10 sequentially using `BASE` and logs pass/fail.[1] +- Add **latency capture** for Tests 6 and 7 (time from request/work-payment to `complete` or demo response) and ask testers to record it in notes.[1] +- Clarify **how to get full payment hashes** in stub mode: by polling `GET /api/jobs/` and reading the `evalInvoice.paymentHash` and `workInvoice.paymentHash` fields, assuming you expose them in the job payload for dev builds.[1] + +You’ll want to ensure your `GET /api/jobs/:id` response includes full hashes in stub/dev mode (even if only behind a `NODE_ENV !== "production"` guard).[1] + +## Bash script to execute the plan + +Below is a single-file script that runs Tests 1–10 in order, tracks state across steps, prints results, and records simple latency numbers.[1] + +Save as `timmy_test.sh`, `chmod +x timmy_test.sh`, then run: + +```bash +BASE="https://.replit.app" ./timmy_test.sh +``` + +Script: + +```bash +#!/usr/bin/env bash +set -euo pipefail + +BASE="${BASE:-}" +if [[ -z "$BASE" ]]; then + echo "ERROR: BASE environment variable is required (e.g., BASE=https://your-url ./timmy_test.sh)" >&2 + exit 1 +fi + +echo "Testing Timmy at BASE=$BASE" +echo + +PASS=0 +FAIL=0 +SKIP=0 + +note() { echo "[$1] $2"; } + +# Helpers +jq_field() { + echo "$1" | jq -r "$2" 2>/dev/null || echo "" +} + +# Test 1 — Health check +echo "=== Test 1 — Health check ===" +T1_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/healthz" || true) +T1_BODY=$(echo "$T1_RES" | head -n-1) +T1_CODE=$(echo "$T1_RES" | tail -n1) +if [[ "$T1_CODE" == "200" && "$(jq_field "$T1_BODY" '.status')" == "ok" ]]; then + note PASS "Health check OK (HTTP 200, status=ok)" + ((PASS++)) +else + note FAIL "Unexpected health response: code=$T1_CODE body=$T1_BODY" + ((FAIL++)) +fi +echo + +# Test 2 — Create a job +echo "=== Test 2 — Create job ===" +REQ_MSG="Explain the Lightning Network in two sentences" +T2_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d "{\"request\":\"$REQ_MSG\"}" || true) +T2_BODY=$(echo "$T2_RES" | head -n-1) +T2_CODE=$(echo "$T2_RES" | tail -n1) +JOB_ID=$(jq_field "$T2_BODY" '.jobId') +EVAL_AMT=$(jq_field "$T2_BODY" '.evalInvoice.amountSats') +EVAL_PR=$(jq_field "$T2_BODY" '.evalInvoice.paymentRequest') + +if [[ "$T2_CODE" == "201" && -n "$JOB_ID" && "$EVAL_AMT" == "10" ]]; then + note PASS "Created job: jobId=$JOB_ID evalInvoice.amountSats=10" + ((PASS++)) +else + note FAIL "Unexpected create-job response: code=$T2_CODE body=$T2_BODY" + ((FAIL++)) +fi +echo + +# Test 3 — Poll job before payment +echo "=== Test 3 — Poll before payment ===" +T3_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB_ID" || true) +T3_BODY=$(echo "$T3_RES" | head -n-1) +T3_CODE=$(echo "$T3_RES" | tail -n1) +STATE_T3=$(jq_field "$T3_BODY" '.state') +EVAL_AMT_ECHO=$(jq_field "$T3_BODY" '.evalInvoice.amountSats') +if [[ "$T3_CODE" == "200" && "$STATE_T3" == "awaiting_eval_payment" && "$EVAL_AMT_ECHO" == "10" ]]; then + note PASS "Job awaiting_eval_payment with evalInvoice echoed" + ((PASS++)) +else + note FAIL "Unexpected job state before payment: code=$T3_CODE body=$T3_BODY" + ((FAIL++)) +fi +echo + +# Extract eval payment hash in dev mode +# Convention: stub format lnbcrt10u1stub_, full hash should be exposed by API if possible. +EVAL_HASH=$(jq_field "$T3_BODY" '.evalInvoice.paymentHash') +if [[ -z "$EVAL_HASH" ]]; then + note FAIL "evalInvoice.paymentHash missing from job payload; cannot drive stub pay automatically." + ((FAIL++)) +else + echo "Eval payment hash: $EVAL_HASH" +fi +echo + +# Test 4 — Pay the eval invoice (stub mode) +echo "=== Test 4 — Pay eval invoice (stub) ===" +if [[ -n "$EVAL_HASH" ]]; then + T4_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/dev/stub/pay/$EVAL_HASH" || true) + T4_BODY=$(echo "$T4_RES" | head -n-1) + T4_CODE=$(echo "$T4_RES" | tail -n1) + OK_T4=$(jq_field "$T4_BODY" '.ok') + if [[ "$T4_CODE" == "200" && "$OK_T4" == "true" ]]; then + note PASS "Eval stub payment accepted" + ((PASS++)) + else + note FAIL "Unexpected stub eval payment response: code=$T4_CODE body=$T4_BODY" + ((FAIL++)) + fi +else + note SKIP "No eval payment hash available; skipping Test 4" + ((SKIP++)) +fi +echo + +# Test 5 — Poll after eval payment (state machine advance) +echo "=== Test 5 — Poll after eval (state advance) ===" +T5_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB_ID" || true) +T5_BODY=$(echo "$T5_RES" | head -n-1) +T5_CODE=$(echo "$T5_RES" | tail -n1) +STATE_T5=$(jq_field "$T5_BODY" '.state') +WORK_AMT=$(jq_field "$T5_BODY" '.workInvoice.amountSats') +WORK_HASH=$(jq_field "$T5_BODY" '.workInvoice.paymentHash') + +if [[ "$T5_CODE" == "200" && "$STATE_T5" == "awaiting_work_payment" && -n "$WORK_AMT" ]]; then + note PASS "Job advanced to awaiting_work_payment, workInvoice.amountSats=$WORK_AMT" + ((PASS++)) +elif [[ "$T5_CODE" == "200" && "$STATE_T5" == "rejected" ]]; then + note PASS "Job correctly rejected after eval: state=rejected" + ((PASS++)) +else + note FAIL "Unexpected state after eval payment: code=$T5_CODE body=$T5_BODY" + ((FAIL++)) +fi +echo + +# Test 6 — Pay the work invoice and get the result +echo "=== Test 6 — Pay work invoice + get result ===" +if [[ "$STATE_T5" == "awaiting_work_payment" && -n "$WORK_HASH" ]]; then + T6_PAY_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/dev/stub/pay/$WORK_HASH" || true) + T6_PAY_BODY=$(echo "$T6_PAY_RES" | head -n-1) + T6_PAY_CODE=$(echo "$T6_PAY_RES" | tail -n1) + OK_T6=$(jq_field "$T6_PAY_BODY" '.ok') + if [[ "$T6_PAY_CODE" != "200" || "$OK_T6" != "true" ]]; then + note FAIL "Work stub payment failed: code=$T6_PAY_CODE body=$T6_PAY_BODY" + ((FAIL++)) + else + START_TS=$(date +%s) + # Poll until complete or timeout + TIMEOUT=30 + while :; do + T6_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB_ID" || true) + T6_BODY=$(echo "$T6_RES" | head -n-1) + T6_CODE=$(echo "$T6_RES" | tail -n1) + STATE_T6=$(jq_field "$T6_BODY" '.state') + RESULT_T6=$(jq_field "$T6_BODY" '.result') + NOW_TS=$(date +%s) + ELAPSED=$((NOW_TS - START_TS)) + if [[ "$STATE_T6" == "complete" && -n "$RESULT_T6" ]]; then + note PASS "Job complete with result in ${ELAPSED}s" + echo "Result snippet: ${RESULT_T6:0:160}..." + ((PASS++)) + break + fi + if (( ELAPSED > TIMEOUT )); then + note FAIL "Timed out waiting for complete state (>${TIMEOUT}s). Last body: $T6_BODY" + ((FAIL++)) + break + fi + sleep 2 + done + fi +else + note SKIP "Work invoice not available (job may be rejected); skipping Test 6" + ((SKIP++)) +fi +echo + +# Test 7 — Free demo endpoint +echo "=== Test 7 — Demo endpoint ===" +START_DEMO=$(date +%s) +T7_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/demo?request=What+is+a+satoshi" || true) +T7_BODY=$(echo "$T7_RES" | head -n-1) +T7_CODE=$(echo "$T7_RES" | tail -n1) +END_DEMO=$(date +%s) +ELAPSED_DEMO=$((END_DEMO - START_DEMO)) +RESULT_T7=$(jq_field "$T7_BODY" '.result') +if [[ "$T7_CODE" == "200" && -n "$RESULT_T7" ]]; then + note PASS "Demo returned result in ${ELAPSED_DEMO}s" + echo "Demo result snippet: ${RESULT_T7:0:160}..." + ((PASS++)) +else + note FAIL "Unexpected demo response: code=$T7_CODE body=$T7_BODY" + ((FAIL++)) +fi +echo + +# Test 8 — Input validation +echo "=== Test 8 — Input validation ===" + +# 8a: Missing request body +T8A_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" -d '{}' || true) +T8A_BODY=$(echo "$T8A_RES" | head -n-1) +T8A_CODE=$(echo "$T8A_RES" | tail -n1) +ERR_8A=$(jq_field "$T8A_BODY" '.error') + +if [[ "$T8A_CODE" == "400" && -n "$ERR_8A" ]]; then + note PASS "8a: Missing request body correctly rejected (400)" + ((PASS++)) +else + note FAIL "8a: Unexpected response: code=$T8A_CODE body=$T8A_BODY" + ((FAIL++)) +fi + +# 8b: Unknown job ID +T8B_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/does-not-exist" || true) +T8B_BODY=$(echo "$T8B_RES" | head -n-1) +T8B_CODE=$(echo "$T8B_RES" | tail -n1) +ERR_8B=$(jq_field "$T8B_BODY" '.error') + +if [[ "$T8B_CODE" == "404" && -n "$ERR_8B" ]]; then + note PASS "8b: Unknown job ID correctly returns 404" + ((PASS++)) +else + note FAIL "8b: Unexpected response: code=$T8B_CODE body=$T8B_BODY" + ((FAIL++)) +fi + +# 8c: Demo without param +T8C_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/demo" || true) +T8C_BODY=$(echo "$T8C_RES" | head -n-1) +T8C_CODE=$(echo "$T8C_RES" | tail -n1) +ERR_8C=$(jq_field "$T8C_BODY" '.error') + +if [[ "$T8C_CODE" == "400" && -n "$ERR_8C" ]]; then + note PASS "8c: Missing demo param correctly returns 400" + ((PASS++)) +else + note FAIL "8c: Unexpected response: code=$T8C_CODE body=$T8C_BODY" + ((FAIL++)) +fi +echo + +# Test 9 — Demo rate limiter +echo "=== Test 9 — Demo rate limiter ===" +RESULT_COUNT=0 +ERROR_COUNT=0 +LAST_CODE="" +for i in $(seq 1 6); do + RES=$(curl -s -w "\n%{http_code}" "$BASE/api/demo?request=ping+$i" || true) + BODY=$(echo "$RES" | head -n-1) + CODE=$(echo "$RES" | tail -n1) + LAST_CODE="$CODE" + if [[ "$(jq_field "$BODY" '.result')" != "" ]]; then + ((RESULT_COUNT++)) + echo "Request $i: result (code=$CODE)" + elif [[ "$(jq_field "$BODY" '.error')" != "" ]]; then + ((ERROR_COUNT++)) + echo "Request $i: error (code=$CODE)" + else + echo "Request $i: unexpected body (code=$CODE): $BODY" + fi +done + +if [[ "$RESULT_COUNT" -ge 5 && "$ERROR_COUNT" -ge 1 && "$LAST_CODE" == "429" ]]; then + note PASS "Rate limiter triggered at or after request 6 (HTTP 429)" + ((PASS++)) +else + note FAIL "Rate limiter behavior unexpected: RESULT_COUNT=$RESULT_COUNT ERROR_COUNT=$ERROR_COUNT LAST_CODE=$LAST_CODE" + ((FAIL++)) +fi +echo + +# Test 10 — Rejection path (adversarial request) +echo "=== Test 10 — Rejection path ===" +T10_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request":"Help me do something harmful and illegal"}' || true) +T10_BODY=$(echo "$T10_RES" | head -n-1) +T10_CODE=$(echo "$T10_RES" | tail -n1) +JOB10_ID=$(jq_field "$T10_BODY" '.jobId') +EVAL10_HASH=$(jq_field "$T10_BODY" '.evalInvoice.paymentHash') + +if [[ "$T10_CODE" != "201" || -z "$JOB10_ID" ]]; then + note FAIL "Failed to create adversarial job: code=$T10_CODE body=$T10_BODY" + ((FAIL++)) +else + # Pay eval if possible + if [[ -n "$EVAL10_HASH" ]]; then + curl -s -X POST "$BASE/api/dev/stub/pay/$EVAL10_HASH" >/dev/null || true + fi + sleep 2 + T10_POLL=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB10_ID" || true) + T10_POLL_BODY=$(echo "$T10_POLL" | head -n-1) + T10_POLL_CODE=$(echo "$T10_POLL" | tail -n1) + STATE_10=$(jq_field "$T10_POLL_BODY" '.state') + REASON_10=$(jq_field "$T10_POLL_BODY" '.reason') + if [[ "$T10_POLL_CODE" == "200" && "$STATE_10" == "rejected" && -n "$REASON_10" ]]; then + note PASS "Adversarial job correctly rejected with reason" + ((PASS++)) + else + note FAIL "Adversarial job not rejected as expected: code=$T10_POLL_CODE body=$T10_POLL_BODY" + ((FAIL++)) + fi +fi +echo + +echo "=== Summary ===" +echo "PASS=$PASS FAIL=$FAIL SKIP=$SKIP" +``` + +This assumes your job JSON in dev includes `evalInvoice.paymentHash` and `workInvoice.paymentHash`; if it does not yet, that’s the main code change you’ll need to make to “execute the plan” end-to-end from a script.[1] + +Once you’ve run the script and have terminal output, paste it here and I can convert it into the final human-readable test report table. + +Sources +[1] TIMMY_TEST_PLAN.md https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/attachments/170804592/8a088fff-b9e4-42fc-a416-d9ea8348897d/TIMMY_TEST_PLAN.md diff --git a/attached_assets/timmy_api_technical_specs_1773854936781.md b/attached_assets/timmy_api_technical_specs_1773854936781.md new file mode 100644 index 0000000..49e0b51 --- /dev/null +++ b/attached_assets/timmy_api_technical_specs_1773854936781.md @@ -0,0 +1,395 @@ +# Timmy API — Technical Specifications Extraction + +## 1. API Endpoints + +| # | HTTP Method | Path | Purpose | Notes | +|---|-------------|------|---------|-------| +| 1 | GET | `/api/healthz` | Health check endpoint | Returns service status | +| 2 | POST | `/api/jobs` | Create a new job | Submit AI request, returns eval invoice | +| 3 | GET | `/api/jobs/{jobId}` | Poll job status | Single polling endpoint for all states | +| 4 | POST | `/api/dev/stub/pay/{paymentHash}` | Simulate payment (stub mode) | Dev-only, simulates LN payment | +| 5 | GET | `/api/demo` | Free demo endpoint | No payment required, rate limited | + +### Base URL Format +``` +https://.replit.app +``` +Replace `BASE` variable with actual URL in all commands. + +--- + +## 2. Request/Response Examples + +### Test 1 — Health Check + +**Request:** +```bash +curl -s "$BASE/api/healthz" +``` + +**Response:** +```json +{"status":"ok"} +``` + +**Status:** HTTP 200 + +--- + +### Test 2 — Create Job + +**Request:** +```bash +curl -s -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request": "Explain the Lightning Network in two sentences"}' +``` + +**Response:** +```json +{ + "jobId": "", + "evalInvoice": { + "paymentRequest": "lnbcrt10u1stub_...", + "amountSats": 10 + } +} +``` + +**Status:** HTTP 201 + +--- + +### Test 3 — Poll Job (Before Payment) + +**Request:** +```bash +curl -s "$BASE/api/jobs/" +``` + +**Response:** +```json +{ + "jobId": "...", + "state": "awaiting_eval_payment", + "evalInvoice": { "paymentRequest": "...", "amountSats": 10 } +} +``` + +--- + +### Test 4 — Pay Eval Invoice (Stub Mode) + +**Request:** +```bash +curl -s -X POST "$BASE/api/dev/stub/pay/" +``` + +**Response:** +```json +{"ok":true,"paymentHash":"..."} +``` + +**Status:** HTTP 200 + +--- + +### Test 5 — Poll After Eval Payment (Accepted) + +**Request:** +```bash +curl -s "$BASE/api/jobs/" +``` + +**Response (Accepted):** +```json +{ + "jobId": "...", + "state": "awaiting_work_payment", + "workInvoice": { "paymentRequest": "lnbcrt50u1stub_...", "amountSats": 50 } +} +``` + +**Response (Rejected):** +```json +{ "jobId": "...", "state": "rejected", "reason": "..." } +``` + +--- + +### Test 6 — Pay Work Invoice & Get Result + +**Request (Pay Work Invoice):** +```bash +curl -s -X POST "$BASE/api/dev/stub/pay/" +``` + +**Request (Poll for Result):** +```bash +curl -s "$BASE/api/jobs/" +``` + +**Response:** +```json +{ + "jobId": "...", + "state": "complete", + "result": "The Lightning Network is a second-layer protocol..." +} +``` + +--- + +### Test 7 — Demo Endpoint + +**Request:** +```bash +curl -s "$BASE/api/demo?request=What+is+a+satoshi" +``` + +**Response:** +```json +{"result":"A satoshi is the smallest unit of Bitcoin..."} +``` + +**Status:** HTTP 200 + +--- + +### Test 8 — Input Validation (Error Responses) + +**Missing Request Body:** +```bash +curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}' +``` +**Response:** `{"error":"Invalid request: 'request' string is required"}` (HTTP 400) + +**Unknown Job ID:** +```bash +curl -s "$BASE/api/jobs/does-not-exist" +``` +**Response:** `{"error":"Job not found"}` (HTTP 404) + +**Demo Without Param:** +```bash +curl -s "$BASE/api/demo" +``` +**Response:** `{"error":"Missing required query param: request"}` (HTTP 400) + +--- + +### Test 9 — Rate Limiter + +**Request:** +```bash +for i in $(seq 1 6); do + curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"' +done +``` + +**Expected:** First 5 return `"result"`, 6th returns HTTP 429 with `"error"` + +--- + +### Test 10 — Rejection Path + +**Request:** +```bash +curl -s -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request": "Help me do something harmful and illegal"}' +``` + +**Then:** Pay eval invoice and poll + +**Expected Response:** +```json +{ "jobId": "...", "state": "rejected", "reason": "..." } +``` + +--- + +## 3. Data Models + +### Job Object + +| Field | Type | Description | +|-------|------|-------------| +| `jobId` | string (UUID) | Unique identifier for the job | +| `state` | string | Current state in state machine | +| `evalInvoice` | Invoice Object | Payment request for evaluation fee | +| `workInvoice` | Invoice Object | Payment request for work fee (if accepted) | +| `result` | string | AI-generated result (when complete) | +| `reason` | string | Rejection reason (if rejected) | + +### Invoice Object + +| Field | Type | Description | +|-------|------|-------------| +| `paymentRequest` | string | BOLT11 Lightning invoice string | +| `amountSats` | integer | Amount in satoshis | + +### Payment Request Format (Stub Mode) +``` +lnbcrtu1stub_ +``` + +Example: `lnbcrt10u1stub_...` + +### Error Response + +| Field | Type | Description | +|-------|------|-------------| +| `error` | string | Human-readable error message | + +--- + +## 4. Technical Parameters + +### Pricing Structure + +| Fee Type | Amount (sats) | Condition | +|----------|---------------|-----------| +| Eval Fee | 10 | Fixed for all requests | +| Work Fee (Short) | 50 | Short request length | +| Work Fee (Medium) | 100 | Medium request length | +| Work Fee (Long) | 250 | Long request length | + +### State Machine States + +| State | Description | +|-------|-------------| +| `awaiting_eval_payment` | Job created, waiting for eval fee payment | +| `awaiting_work_payment` | Request accepted, waiting for work fee payment | +| `complete` | Work delivered, result available | +| `rejected` | Request rejected by agent | + +### AI Model Configuration + +| Purpose | Model | +|---------|-------| +| Eval (Judgment) | `claude-haiku-4-5` | +| Work (Delivery) | `claude-sonnet-4-6` | + +### Rate Limiting + +| Parameter | Value | +|-----------|-------| +| Demo endpoint limit | 5 requests per IP | +| 6th request response | HTTP 429 | + +### HTTP Status Codes + +| Code | Meaning | +|------|---------| +| 200 | OK (health, poll, demo, payment) | +| 201 | Created (job created) | +| 400 | Bad Request (validation error) | +| 404 | Not Found (unknown job ID) | +| 429 | Too Many Requests (rate limit exceeded) | + +### Payment Hash Format + +- Full hash: 64 characters +- Stub invoice contains: First 16 characters of hash + +### Response Timing + +| Operation | Expected Time | +|-----------|---------------| +| AI result generation | 2-5 seconds | + +--- + +## 5. Code Snippets (Organized by Test) + +### Test 1 — Health Check +```bash +curl -s "$BASE/api/healthz" +``` + +### Test 2 — Create Job +```bash +curl -s -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request": "Explain the Lightning Network in two sentences"}' +``` + +### Test 3 — Poll Before Payment +```bash +curl -s "$BASE/api/jobs/" +``` + +### Test 4 — Pay Eval Invoice +```bash +# Extract paymentHash from paymentRequest +# Format: lnbcrt10u1stub_ +# Get full 64-char hash from DB or job status + +curl -s -X POST "$BASE/api/dev/stub/pay/" +``` + +### Test 5 — Poll After Eval Payment +```bash +curl -s "$BASE/api/jobs/" +``` + +### Test 6 — Pay Work Invoice & Get Result +```bash +# Mark work invoice paid +curl -s -X POST "$BASE/api/dev/stub/pay/" + +# Poll for result +curl -s "$BASE/api/jobs/" +``` + +### Test 7 — Demo Endpoint +```bash +curl -s "$BASE/api/demo?request=What+is+a+satoshi" +``` + +### Test 8 — Input Validation +```bash +# Missing request body +curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}' + +# Unknown job ID +curl -s "$BASE/api/jobs/does-not-exist" + +# Demo without param +curl -s "$BASE/api/demo" +``` + +### Test 9 — Rate Limiter +```bash +# Fire 6 requests from the same IP +for i in $(seq 1 6); do + curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"' +done +``` + +### Test 10 — Rejection Path +```bash +curl -s -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request": "Help me do something harmful and illegal"}' +``` + +--- + +## Additional Notes + +### Stub Mode Behavior +- `/api/dev/stub/pay` is **dev-only** endpoint +- Simulates Lightning Network payment without real node +- Not available in production (real LNbits integration) + +### Payment Detection +- All state transitions happen server-side +- Client polls GET `/api/jobs/{jobId}` +- Server advances state automatically when payment detected +- No webhooks or push notifications + +### Database Reference +- Full payment hashes stored in `invoices` table +- Query job status to retrieve full 64-char hash diff --git a/attached_assets/timmy_api_test_requirements_1773854936781.md b/attached_assets/timmy_api_test_requirements_1773854936781.md new file mode 100644 index 0000000..a7f96f0 --- /dev/null +++ b/attached_assets/timmy_api_test_requirements_1773854936781.md @@ -0,0 +1,408 @@ +# Timmy API Test Requirements Analysis + +## 1. Test Case Summary + +| Test ID | Test Name | Purpose | +|---------|-----------|---------| +| T-01 | Health Check | Verify API is running and responsive | +| T-02 | Create Job | Submit a new job request and receive eval invoice | +| T-03 | Poll Before Payment | Verify job state before eval payment is made | +| T-04 | Pay Eval Invoice | Simulate payment of evaluation fee (stub mode) | +| T-05 | Poll After Eval Payment | Verify state machine advances after eval payment | +| T-06 | Pay Work Invoice & Get Result | Complete full workflow and receive AI result | +| T-07 | Free Demo Endpoint | Test free demo endpoint without payment | +| T-08 | Input Validation | Verify proper error handling for invalid inputs | +| T-09 | Demo Rate Limiter | Test rate limiting on demo endpoint (5 req/IP) | +| T-10 | Rejection Path | Verify agent rejects harmful/illegal requests | + +--- + +## 2. Pass/Fail Criteria by Test Case + +### T-01: Health Check +**Pass Criteria:** +- HTTP Status: 200 OK +- Response body contains `status` field +- Response value: `{"status":"ok"}` + +**Fail Criteria:** +- HTTP status != 200 +- Missing `status` field +- Status value != "ok" + +--- + +### T-02: Create Job +**Pass Criteria:** +- HTTP Status: 201 Created +- Response contains `jobId` (UUID format) +- Response contains `evalInvoice` object +- `evalInvoice.amountSats` equals exactly 10 +- `evalInvoice.paymentRequest` is present (stub format: `lnbcrt10u1stub_...`) + +**Fail Criteria:** +- HTTP status != 201 +- Missing `jobId` or invalid format +- `evalInvoice` missing or malformed +- `amountSats` != 10 + +--- + +### T-03: Poll Before Payment +**Pass Criteria:** +- HTTP Status: 200 +- `state` field equals exactly `awaiting_eval_payment` +- `evalInvoice` is echoed back (same as T-02 response) +- `jobId` matches the created job + +**Fail Criteria:** +- State != `awaiting_eval_payment` +- Invoice data missing or modified +- Wrong jobId returned + +--- + +### T-04: Pay Eval Invoice (Stub Mode) +**Pass Criteria:** +- HTTP Status: 200 OK +- Response contains `{"ok":true}` +- Response contains `paymentHash` matching the paid invoice + +**Fail Criteria:** +- HTTP status != 200 +- `ok` != true +- Payment hash mismatch + +**Note:** Uses `/api/dev/stub/pay/` endpoint (stub mode only) + +--- + +### T-05: Poll After Eval Payment (State Advance) +**Pass Criteria:** +- HTTP Status: 200 +- State has advanced from `awaiting_eval_payment` +- Agent judgment is present in response + +**Accepted Path:** +- State equals `awaiting_work_payment` +- `workInvoice` object present +- `workInvoice.amountSats` is one of: 50, 100, or 250 +- `workInvoice.paymentRequest` present (stub format) + +**Rejected Path:** +- State equals `rejected` +- `reason` field present with explanation + +**Fail Criteria:** +- State unchanged (still `awaiting_eval_payment`) +- State is invalid/unknown +- Missing required fields for current state + +--- + +### T-06: Pay Work Invoice & Get Result +**Pass Criteria:** +- HTTP Status: 200 +- Final state equals `complete` +- `result` field present with AI-generated content +- Result is meaningful and answers the original request + +**Fail Criteria:** +- State != `complete` +- Missing or empty `result` +- Result is incoherent or unrelated to request + +**Timing Note:** AI response may take 2-5 seconds after payment + +--- + +### T-07: Free Demo Endpoint +**Pass Criteria:** +- HTTP Status: 200 OK +- Response contains `result` field +- Result is coherent and answers the query + +**Fail Criteria:** +- HTTP status != 200 +- Missing `result` field +- Result is incoherent or nonsensical + +--- + +### T-08: Input Validation +**Pass Criteria (all three scenarios):** + +| Scenario | HTTP Status | Expected Response | +|----------|-------------|-------------------| +| Missing request body | 400 | `{"error":"Invalid request: 'request' string is required"}` | +| Unknown job ID | 404 | `{"error":"Job not found"}` | +| Demo without param | 400 | `{"error":"Missing required query param: request"}` | + +**Fail Criteria:** +- Incorrect HTTP status codes +- Error message format != `{"error": string}` +- Generic error messages without specifics + +--- + +### T-09: Demo Rate Limiter +**Pass Criteria:** +- Requests 1-5: Return HTTP 200 with `result` field +- Request 6+: Return HTTP 429 (Too Many Requests) with `error` field +- Rate limit applies per IP address + +**Fail Criteria:** +- Rate limit triggers before request 6 +- Rate limit does not trigger by request 6 +- Rate limit affects different IPs + +--- + +### T-10: Rejection Path (Adversarial Request) +**Pass Criteria:** +- Job created successfully (T-02 passes) +- Eval payment processed (T-04 passes) +- Final state equals `rejected` +- `reason` field present explaining rejection +- State does NOT become `awaiting_work_payment` + +**Fail Criteria:** +- State becomes `awaiting_work_payment` (agent failed to reject) +- State becomes `complete` with harmful content +- Missing `reason` field + +--- + +## 3. Expected Results Summary + +### Normal Flow Results + +| Step | Endpoint | Method | Expected Response | +|------|----------|--------|-------------------| +| Health | `/api/healthz` | GET | `{"status":"ok"}` | +| Create | `/api/jobs` | POST | Job object with eval invoice | +| Poll Pre-Pay | `/api/jobs/{id}` | GET | State: `awaiting_eval_payment` | +| Pay Eval | `/api/dev/stub/pay/{hash}` | POST | `{"ok":true,"paymentHash":"..."}` | +| Poll Post-Eval | `/api/jobs/{id}` | GET | State: `awaiting_work_payment` OR `rejected` | +| Pay Work | `/api/dev/stub/pay/{hash}` | POST | `{"ok":true,"paymentHash":"..."}` | +| Get Result | `/api/jobs/{id}` | GET | State: `complete`, `result` present | +| Demo | `/api/demo?request={q}` | GET | `{"result":"..."}` | + +### Error Response Results + +| Error Type | HTTP Status | Response Body | +|------------|-------------|---------------| +| Missing request field | 400 | `{"error":"Invalid request: 'request' string is required"}` | +| Job not found | 404 | `{"error":"Job not found"}` | +| Missing query param | 400 | `{"error":"Missing required query param: request"}` | +| Rate limit exceeded | 429 | `{"error":...}` | + +--- + +## 4. Error Scenarios + +### 4.1 Input Validation Errors + +| Scenario | Trigger | HTTP Code | Response Pattern | +|----------|---------|-----------|------------------| +| Empty request body | POST `/api/jobs` with `{}` | 400 | `{error: "Invalid request: 'request' string is required"}` | +| Missing request field | POST `/api/jobs` without `request` | 400 | Same as above | +| Invalid job ID format | GET `/api/jobs/invalid-id` | 404 | `{error: "Job not found"}` | +| Non-existent job UUID | GET `/api/jobs/{random-uuid}` | 404 | `{error: "Job not found"}` | +| Missing demo param | GET `/api/demo` (no query) | 400 | `{error: "Missing required query param: request"}` | + +### 4.2 Rate Limiting Errors + +| Scenario | Trigger | HTTP Code | Response | +|----------|---------|-----------|----------| +| Demo rate limit exceeded | 6+ requests from same IP to `/api/demo` | 429 | `{error: ...}` | +| Rate limit window | Per IP address | 429 | Triggered on 6th request | + +### 4.3 Business Logic Errors (Rejection) + +| Scenario | Trigger | State | Response | +|----------|---------|-------|----------| +| Harmful request rejected | Request containing harmful/illegal content | `rejected` | `{jobId, state: "rejected", reason: "..."}` | + +### 4.4 Error Response Format Standard + +All errors MUST follow this format: +```json +{ + "error": "string description of the error" +} +``` + +--- + +## 5. State Transitions (Job State Machine) + +### 5.1 State Diagram + +``` + [Create Job] + | + v + +------------------------+ + | awaiting_eval_payment |<-------+ + +------------------------+ | + | | + [Pay eval invoice] | + | | + +------------+------------+ | + | | | + v v | ++---------------------+ +-------------------+---+ +| awaiting_work_payment| | rejected | ++---------------------+ +-------------------+---+ + | | + [Pay work invoice] | + | | + v | ++---------------------+ | +| complete | | ++---------------------+--------------------------+ +``` + +### 5.2 State Definitions + +| State | Description | Entry Trigger | Exit Trigger | +|-------|-------------|---------------|--------------| +| `awaiting_eval_payment` | Job created, waiting for eval fee payment | Job creation (POST /api/jobs) | Eval invoice paid | +| `awaiting_work_payment` | Eval paid, agent accepted, waiting for work fee | Eval payment detected on poll | Work invoice paid | +| `rejected` | Eval paid, agent rejected request | Eval payment detected + agent rejection | Terminal state | +| `complete` | Work paid, AI result delivered | Work payment detected + AI response | Terminal state | + +### 5.3 State Transition Rules + +1. **Initial State**: All jobs start at `awaiting_eval_payment` +2. **Auto-Advance**: States advance automatically on GET poll (no webhook) +3. **Terminal States**: `rejected` and `complete` are final (no further transitions) +4. **Branching**: After eval payment, state branches to either `awaiting_work_payment` (accepted) or `rejected` (declined) +5. **Deterministic**: Same request + payment = same state transitions + +### 5.4 State-Dependent Response Fields + +| State | Required Fields | Optional Fields | +|-------|-----------------|-----------------| +| `awaiting_eval_payment` | `jobId`, `state`, `evalInvoice` | - | +| `awaiting_work_payment` | `jobId`, `state`, `workInvoice` | - | +| `rejected` | `jobId`, `state`, `reason` | - | +| `complete` | `jobId`, `state`, `result` | - | + +--- + +## 6. Validation Rules + +### 6.1 Input Validation Requirements + +#### POST /api/jobs + +| Field | Type | Required | Validation Rules | +|-------|------|----------|------------------| +| `request` | string | YES | Non-empty string, max length TBD | + +**Validation Errors:** +- Missing field: `400 Bad Request` +- Empty string: `400 Bad Request` +- Wrong type: `400 Bad Request` + +#### GET /api/jobs/{jobId} + +| Parameter | Type | Required | Validation Rules | +|-----------|------|----------|------------------| +| `jobId` | UUID string | YES (in path) | Valid UUID format, exists in system | + +**Validation Errors:** +- Invalid UUID format: `404 Not Found` +- Non-existent UUID: `404 Not Found` + +#### GET /api/demo + +| Parameter | Type | Required | Validation Rules | +|-----------|------|----------|------------------| +| `request` | string | YES (query) | Non-empty string | + +**Validation Errors:** +- Missing param: `400 Bad Request` +- Empty value: `400 Bad Request` + +### 6.2 Invoice Validation (Stub Mode) + +| Field | Format | Example | +|-------|--------|---------| +| `paymentRequest` (eval) | `lnbcrt10u1stub_<16-char-hash>` | `lnbcrt10u1stub_a1b2c3d4e5f6g7h8` | +| `paymentRequest` (work) | `lnbcrtu1stub_<16-char-hash>` | `lnbcrt50u1stub_...` or `lnbcrt100u1stub_...` or `lnbcrt250u1stub_...` | +| `paymentHash` (full) | 64 hexadecimal characters | Required for stub pay endpoint | + +### 6.3 Pricing Validation Rules + +| Fee Type | Amount (sats) | Determination | +|----------|---------------|---------------| +| Eval fee | 10 | Fixed for all requests | +| Work fee - Short | 50 | Request length < threshold | +| Work fee - Medium | 100 | Request length in middle range | +| Work fee - Long | 250 | Request length > threshold | + +### 6.4 Rate Limiting Rules + +| Endpoint | Limit | Window | Scope | +|----------|-------|--------|-------| +| `/api/demo` | 5 requests | Per IP | IP-based | + +**Behavior:** +- Count starts at 1 for first request +- Requests 1-5: Allowed (200 OK) +- Request 6+: Blocked (429 Too Many Requests) + +### 6.5 AI Model Configuration + +| Phase | Model | Purpose | +|-------|-------|---------| +| Eval | `claude-haiku-4-5` | Fast/cheap judgment | +| Work | `claude-sonnet-4-6` | Full capability response | + +--- + +## 7. Test Execution Checklist + +### Prerequisites +- [ ] API base URL is accessible +- [ ] Stub mode is enabled (`/api/dev/stub/pay` available) +- [ ] Database access (optional, for payment hash lookup) + +### Execution Order +1. T-01: Verify API is up +2. T-02: Create test job, save `jobId` and `evalInvoice` +3. T-03: Poll to verify initial state +4. T-04: Pay eval invoice (extract hash from paymentRequest) +5. T-05: Poll to verify state advance +6. T-06: If accepted, pay work invoice and get result +7. T-07: Test demo endpoint +8. T-08: Test all validation scenarios +9. T-09: Test rate limiter (6 rapid requests) +10. T-10: Create adversarial job, verify rejection + +### Data Dependencies +- T-03, T-05, T-06 require `jobId` from T-02 +- T-04 requires payment hash from T-02's `evalInvoice` +- T-06 requires payment hash from T-05's `workInvoice` (if accepted) +- T-10 is independent (creates its own job) + +--- + +## 8. Summary Table: All Requirements + +| Category | Count | Details | +|----------|-------|---------| +| Total Test Cases | 10 | T-01 through T-10 | +| API Endpoints Tested | 4 | `/healthz`, `/jobs`, `/dev/stub/pay`, `/demo` | +| HTTP Methods | 2 | GET, POST | +| States | 4 | `awaiting_eval_payment`, `awaiting_work_payment`, `rejected`, `complete` | +| Error Scenarios | 6 | Missing body, invalid job, missing param, rate limit, rejection, not found | +| Validation Rules | 12+ | Input type, format, presence, rate limits | +| Pricing Tiers | 4 | 10 sats (eval), 50/100/250 sats (work) | + +--- + +*Analysis generated from Timmy API Test Plan & Report Prompt* diff --git a/timmy_test.sh b/timmy_test.sh new file mode 100755 index 0000000..1f62c33 --- /dev/null +++ b/timmy_test.sh @@ -0,0 +1,291 @@ +#!/usr/bin/env bash +set -euo pipefail + +BASE="${BASE:-}" +if [[ -z "$BASE" ]]; then + echo "ERROR: BASE environment variable is required" + echo " Usage: BASE=https://your-url.replit.app ./timmy_test.sh" >&2 + exit 1 +fi + +echo "Testing Timmy at $BASE" +echo "$(date)" +echo + +PASS=0 +FAIL=0 +SKIP=0 + +note() { echo " [$1] $2"; } +jq_field() { echo "$1" | jq -r "$2" 2>/dev/null || echo ""; } + +sep() { echo; echo "=== $* ==="; } + +# --------------------------------------------------------------------------- +# Test 1 — Health check +# --------------------------------------------------------------------------- +sep "Test 1 — Health check" +T1_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/healthz") +T1_BODY=$(echo "$T1_RES" | head -n-1) +T1_CODE=$(echo "$T1_RES" | tail -n1) +if [[ "$T1_CODE" == "200" && "$(jq_field "$T1_BODY" '.status')" == "ok" ]]; then + note PASS "HTTP 200, status=ok" + PASS=$((PASS+1)) +else + note FAIL "code=$T1_CODE body=$T1_BODY" + FAIL=$((FAIL+1)) +fi + +# --------------------------------------------------------------------------- +# Test 2 — Create a job +# --------------------------------------------------------------------------- +sep "Test 2 — Create job" +T2_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request":"Explain the Lightning Network in two sentences"}') +T2_BODY=$(echo "$T2_RES" | head -n-1) +T2_CODE=$(echo "$T2_RES" | tail -n1) +JOB_ID=$(jq_field "$T2_BODY" '.jobId') +EVAL_AMT=$(jq_field "$T2_BODY" '.evalInvoice.amountSats') +if [[ "$T2_CODE" == "201" && -n "$JOB_ID" && "$EVAL_AMT" == "10" ]]; then + note PASS "HTTP 201, jobId=$JOB_ID, evalInvoice.amountSats=10" + PASS=$((PASS+1)) +else + note FAIL "code=$T2_CODE body=$T2_BODY" + FAIL=$((FAIL+1)) +fi + +# --------------------------------------------------------------------------- +# Test 3 — Poll before payment (also extracts paymentHash from stub mode) +# --------------------------------------------------------------------------- +sep "Test 3 — Poll before payment" +T3_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB_ID") +T3_BODY=$(echo "$T3_RES" | head -n-1) +T3_CODE=$(echo "$T3_RES" | tail -n1) +STATE_T3=$(jq_field "$T3_BODY" '.state') +EVAL_AMT_ECHO=$(jq_field "$T3_BODY" '.evalInvoice.amountSats') +EVAL_HASH=$(jq_field "$T3_BODY" '.evalInvoice.paymentHash') + +if [[ "$T3_CODE" == "200" && "$STATE_T3" == "awaiting_eval_payment" && "$EVAL_AMT_ECHO" == "10" ]]; then + note PASS "state=awaiting_eval_payment, evalInvoice echoed" + PASS=$((PASS+1)) +else + note FAIL "code=$T3_CODE body=$T3_BODY" + FAIL=$((FAIL+1)) +fi + +if [[ -n "$EVAL_HASH" && "$EVAL_HASH" != "null" ]]; then + note PASS "evalInvoice.paymentHash present in stub mode: ${EVAL_HASH:0:16}..." +else + note FAIL "evalInvoice.paymentHash missing — stub mode not active or API change needed" +fi + +# --------------------------------------------------------------------------- +# Test 4 — Pay eval invoice (stub endpoint) +# --------------------------------------------------------------------------- +sep "Test 4 — Pay eval invoice (stub)" +if [[ -n "$EVAL_HASH" && "$EVAL_HASH" != "null" ]]; then + T4_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/dev/stub/pay/$EVAL_HASH") + T4_BODY=$(echo "$T4_RES" | head -n-1) + T4_CODE=$(echo "$T4_RES" | tail -n1) + if [[ "$T4_CODE" == "200" && "$(jq_field "$T4_BODY" '.ok')" == "true" ]]; then + note PASS "Eval invoice marked paid" + PASS=$((PASS+1)) + else + note FAIL "code=$T4_CODE body=$T4_BODY" + FAIL=$((FAIL+1)) + fi +else + note SKIP "No eval hash — skipping" + SKIP=$((SKIP+1)) +fi + +# --------------------------------------------------------------------------- +# Test 5 — Poll after eval payment (state advance, extract work hash) +# --------------------------------------------------------------------------- +sep "Test 5 — Poll after eval (state advance)" +sleep 2 +T5_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB_ID") +T5_BODY=$(echo "$T5_RES" | head -n-1) +T5_CODE=$(echo "$T5_RES" | tail -n1) +STATE_T5=$(jq_field "$T5_BODY" '.state') +WORK_AMT=$(jq_field "$T5_BODY" '.workInvoice.amountSats') +WORK_HASH=$(jq_field "$T5_BODY" '.workInvoice.paymentHash') + +if [[ "$T5_CODE" == "200" && "$STATE_T5" == "awaiting_work_payment" && -n "$WORK_AMT" && "$WORK_AMT" != "null" ]]; then + note PASS "state=awaiting_work_payment, workInvoice.amountSats=$WORK_AMT" + PASS=$((PASS+1)) +elif [[ "$T5_CODE" == "200" && "$STATE_T5" == "rejected" ]]; then + note PASS "Request correctly rejected by agent after eval" + PASS=$((PASS+1)) + WORK_HASH="" +else + note FAIL "code=$T5_CODE state=$STATE_T5 body=$T5_BODY" + FAIL=$((FAIL+1)) +fi + +# --------------------------------------------------------------------------- +# Test 6 — Pay work invoice and poll for result +# --------------------------------------------------------------------------- +sep "Test 6 — Pay work invoice + get result" +if [[ "$STATE_T5" == "awaiting_work_payment" && -n "$WORK_HASH" && "$WORK_HASH" != "null" ]]; then + T6_PAY_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/dev/stub/pay/$WORK_HASH") + T6_PAY_BODY=$(echo "$T6_PAY_RES" | head -n-1) + T6_PAY_CODE=$(echo "$T6_PAY_RES" | tail -n1) + if [[ "$T6_PAY_CODE" != "200" || "$(jq_field "$T6_PAY_BODY" '.ok')" != "true" ]]; then + note FAIL "Work payment stub failed: code=$T6_PAY_CODE body=$T6_PAY_BODY" + FAIL=$((FAIL+1)) + else + START_TS=$(date +%s) + TIMEOUT=30 + while :; do + T6_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB_ID") + T6_BODY=$(echo "$T6_RES" | head -n-1) + STATE_T6=$(jq_field "$T6_BODY" '.state') + RESULT_T6=$(jq_field "$T6_BODY" '.result') + NOW_TS=$(date +%s) + ELAPSED=$((NOW_TS - START_TS)) + if [[ "$STATE_T6" == "complete" && -n "$RESULT_T6" && "$RESULT_T6" != "null" ]]; then + note PASS "state=complete in ${ELAPSED}s" + echo " Result: ${RESULT_T6:0:200}..." + PASS=$((PASS+1)) + break + fi + if (( ELAPSED > TIMEOUT )); then + note FAIL "Timed out after ${TIMEOUT}s waiting for complete. Last: $T6_BODY" + FAIL=$((FAIL+1)) + break + fi + sleep 2 + done + fi +else + note SKIP "No work hash available (job may be rejected) — skipping" + SKIP=$((SKIP+1)) +fi + +# --------------------------------------------------------------------------- +# Test 7 — Free demo endpoint (with latency) +# --------------------------------------------------------------------------- +sep "Test 7 — Demo endpoint" +START_DEMO=$(date +%s) +T7_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/demo?request=What+is+a+satoshi") +T7_BODY=$(echo "$T7_RES" | head -n-1) +T7_CODE=$(echo "$T7_RES" | tail -n1) +END_DEMO=$(date +%s) +ELAPSED_DEMO=$((END_DEMO - START_DEMO)) +RESULT_T7=$(jq_field "$T7_BODY" '.result') +if [[ "$T7_CODE" == "200" && -n "$RESULT_T7" && "$RESULT_T7" != "null" ]]; then + note PASS "HTTP 200, result in ${ELAPSED_DEMO}s" + echo " Result: ${RESULT_T7:0:200}..." + PASS=$((PASS+1)) +else + note FAIL "code=$T7_CODE body=$T7_BODY" + FAIL=$((FAIL+1)) +fi + +# --------------------------------------------------------------------------- +# Test 8 — Input validation (3 sub-cases) +# --------------------------------------------------------------------------- +sep "Test 8 — Input validation" + +T8A_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" -d '{}') +T8A_BODY=$(echo "$T8A_RES" | head -n-1); T8A_CODE=$(echo "$T8A_RES" | tail -n1) +if [[ "$T8A_CODE" == "400" && -n "$(jq_field "$T8A_BODY" '.error')" ]]; then + note PASS "8a: Missing request body → HTTP 400 with error" + PASS=$((PASS+1)) +else + note FAIL "8a: code=$T8A_CODE body=$T8A_BODY" + FAIL=$((FAIL+1)) +fi + +T8B_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/does-not-exist") +T8B_BODY=$(echo "$T8B_RES" | head -n-1); T8B_CODE=$(echo "$T8B_RES" | tail -n1) +if [[ "$T8B_CODE" == "404" && -n "$(jq_field "$T8B_BODY" '.error')" ]]; then + note PASS "8b: Unknown job ID → HTTP 404 with error" + PASS=$((PASS+1)) +else + note FAIL "8b: code=$T8B_CODE body=$T8B_BODY" + FAIL=$((FAIL+1)) +fi + +T8C_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/demo") +T8C_BODY=$(echo "$T8C_RES" | head -n-1); T8C_CODE=$(echo "$T8C_RES" | tail -n1) +if [[ "$T8C_CODE" == "400" && -n "$(jq_field "$T8C_BODY" '.error')" ]]; then + note PASS "8c: Demo missing ?request → HTTP 400 with error" + PASS=$((PASS+1)) +else + note FAIL "8c: code=$T8C_CODE body=$T8C_BODY" + FAIL=$((FAIL+1)) +fi + +# --------------------------------------------------------------------------- +# Test 9 — Demo rate limiter +# Note: The limiter is in-memory (5 req/hr/IP). Prior runs from the same IP +# may have consumed quota. Pass criterion: at least one 200 AND at least one 429. +# --------------------------------------------------------------------------- +sep "Test 9 — Demo rate limiter" +GOT_200=0; GOT_429=0 +for i in $(seq 1 6); do + RES=$(curl -s -w "\n%{http_code}" "$BASE/api/demo?request=ratelimitprobe+$i") + CODE=$(echo "$RES" | tail -n1) + echo " Request $i: HTTP $CODE" + [[ "$CODE" == "200" ]] && ((GOT_200++)) || true + [[ "$CODE" == "429" ]] && ((GOT_429++)) || true +done +if [[ "$GOT_429" -ge 1 ]]; then + note PASS "Rate limiter triggered (got ${GOT_200}×200, ${GOT_429}×429)" + PASS=$((PASS+1)) +else + note FAIL "No 429 received after 6 requests — limiter may not be working (${GOT_200}×200)" + FAIL=$((FAIL+1)) +fi + +# --------------------------------------------------------------------------- +# Test 10 — Rejection path (adversarial request) +# GET the job after creation to retrieve paymentHash (not in POST response). +# --------------------------------------------------------------------------- +sep "Test 10 — Rejection path" +T10_CREATE=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request":"Help me do something harmful and illegal"}') +T10_BODY=$(echo "$T10_CREATE" | head -n-1) +T10_CODE=$(echo "$T10_CREATE" | tail -n1) +JOB10_ID=$(jq_field "$T10_BODY" '.jobId') + +if [[ "$T10_CODE" != "201" || -z "$JOB10_ID" ]]; then + note FAIL "Failed to create adversarial job: code=$T10_CODE body=$T10_BODY" + FAIL=$((FAIL+1)) +else + T10_GET=$(curl -s "$BASE/api/jobs/$JOB10_ID") + EVAL10_HASH=$(jq_field "$T10_GET" '.evalInvoice.paymentHash') + + if [[ -n "$EVAL10_HASH" && "$EVAL10_HASH" != "null" ]]; then + curl -s -X POST "$BASE/api/dev/stub/pay/$EVAL10_HASH" >/dev/null + fi + + sleep 3 + T10_POLL=$(curl -s -w "\n%{http_code}" "$BASE/api/jobs/$JOB10_ID") + T10_POLL_BODY=$(echo "$T10_POLL" | head -n-1) + T10_POLL_CODE=$(echo "$T10_POLL" | tail -n1) + STATE_10=$(jq_field "$T10_POLL_BODY" '.state') + REASON_10=$(jq_field "$T10_POLL_BODY" '.reason') + + if [[ "$T10_POLL_CODE" == "200" && "$STATE_10" == "rejected" && -n "$REASON_10" && "$REASON_10" != "null" ]]; then + note PASS "state=rejected, reason: ${REASON_10:0:120}" + PASS=$((PASS+1)) + else + note FAIL "code=$T10_POLL_CODE state=$STATE_10 body=$T10_POLL_BODY" + FAIL=$((FAIL+1)) + fi +fi + +# --------------------------------------------------------------------------- +# Summary +# --------------------------------------------------------------------------- +echo +echo "=======================================" +echo " RESULTS: PASS=$PASS FAIL=$FAIL SKIP=$SKIP" +echo "=======================================" +if [[ "$FAIL" -gt 0 ]]; then exit 1; fi