From d24cc6fbe52fa16f3f65e680221d39e332e55343 Mon Sep 17 00:00:00 2001 From: alexpaynex <55271826-alexpaynex@users.noreply.replit.com> Date: Wed, 18 Mar 2026 17:24:32 +0000 Subject: [PATCH] Add comprehensive test plan for evaluating the AI agent's API functionality Add a new Markdown file containing a detailed test plan and report prompt for the AI agent API, and register it in the agent assets metadata. Replit-Commit-Author: Agent Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e Replit-Commit-Checkpoint-Type: full_checkpoint Replit-Commit-Event-Id: baaad612-0d55-41f8-983d-e1104c552e18 Replit-Helium-Checkpoint-Created: true --- .agents/agent_assets_metadata.toml | 6 + TIMMY_TEST_PLAN.md | 234 +++++++++++++++++++++++++++++ 2 files changed, 240 insertions(+) create mode 100644 TIMMY_TEST_PLAN.md diff --git a/.agents/agent_assets_metadata.toml b/.agents/agent_assets_metadata.toml index 335fba0..8af2963 100644 --- a/.agents/agent_assets_metadata.toml +++ b/.agents/agent_assets_metadata.toml @@ -18,3 +18,9 @@ id = "kKxuSyyRY-u1YfyyFoH_y" uri = "file://implementation-guide-taproot-assets-l402-fastapi.md" type = "text" title = "Implementation Guide: Taproot Assets + L402" + +[[outputs]] +id = "EbCNNTKk5hsAYWFlW0Lxz" +uri = "file://TIMMY_TEST_PLAN.md" +type = "text" +title = "Timmy API — Test Plan & Report Prompt" diff --git a/TIMMY_TEST_PLAN.md b/TIMMY_TEST_PLAN.md new file mode 100644 index 0000000..886d7dc --- /dev/null +++ b/TIMMY_TEST_PLAN.md @@ -0,0 +1,234 @@ +# Timmy API — Test Plan & Report Prompt + +**What is Timmy?** +Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint. + +**Base URL:** `https://.replit.app` +Replace `BASE` in all commands below with the actual URL. + +--- + +## Test Suite + +### Test 1 — Health check + +```bash +curl -s "$BASE/api/healthz" +``` + +**Expected:** `{"status":"ok"}` +**Pass criteria:** HTTP 200, status field present. + +--- + +### Test 2 — Create a job + +```bash +curl -s -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request": "Explain the Lightning Network in two sentences"}' +``` + +**Expected:** +```json +{ + "jobId": "", + "evalInvoice": { + "paymentRequest": "lnbcrt10u1stub_...", + "amountSats": 10 + } +} +``` +**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10. + +--- + +### Test 3 — Poll job before payment + +```bash +curl -s "$BASE/api/jobs/" +``` + +**Expected:** +```json +{ + "jobId": "...", + "state": "awaiting_eval_payment", + "evalInvoice": { "paymentRequest": "...", "amountSats": 10 } +} +``` +**Pass criteria:** State is `awaiting_eval_payment`, invoice is echoed back. + +--- + +### Test 4 — Pay the eval invoice (stub mode) + +Extract `paymentHash` from the `paymentRequest`. The stub format is: +`lnbcrt10u1stub_` + +```bash +# Replace with the 64-char hash (query from your DB +# or use the /dev/stub/pay endpoint with the full hash). +# In stub mode: POST to the dev trigger endpoint with the full hash. + +curl -s -X POST "$BASE/api/dev/stub/pay/" +``` + +**Expected:** `{"ok":true,"paymentHash":"..."}` +**Pass criteria:** HTTP 200. + +> Note: `/api/dev/stub/pay` is only available in the current build (stub mode, no real Lightning node). It simulates a user paying the invoice. In production with real LNbits credentials it is not mounted. + +--- + +### Test 5 — Poll after eval payment (state machine advance) + +```bash +curl -s "$BASE/api/jobs/" +``` + +**Expected — if request was accepted:** +```json +{ + "jobId": "...", + "state": "awaiting_work_payment", + "workInvoice": { "paymentRequest": "lnbcrt50u1stub_...", "amountSats": 50 } +} +``` +Work fee is deterministic: 50 sats (short request), 100 sats (medium), 250 sats (long). + +**Expected — if request was rejected:** +```json +{ "jobId": "...", "state": "rejected", "reason": "..." } +``` + +**Pass criteria:** State has advanced from `awaiting_eval_payment`. Agent judgment is present. + +--- + +### Test 6 — Pay the work invoice and get the result + +```bash +# Mark work invoice paid (same stub endpoint, use the work invoice's payment hash) +curl -s -X POST "$BASE/api/dev/stub/pay/" + +# Poll for result (may take 2–5 seconds for AI to respond) +curl -s "$BASE/api/jobs/" +``` + +**Expected:** +```json +{ + "jobId": "...", + "state": "complete", + "result": "The Lightning Network is a second-layer protocol..." +} +``` +**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer. + +--- + +### Test 7 — Free demo endpoint + +```bash +curl -s "$BASE/api/demo?request=What+is+a+satoshi" +``` + +**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}` +**Pass criteria:** HTTP 200, `result` is coherent. + +--- + +### Test 8 — Input validation + +```bash +# Missing request body +curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}' + +# Unknown job ID +curl -s "$BASE/api/jobs/does-not-exist" + +# Demo without param +curl -s "$BASE/api/demo" +``` + +**Expected:** +- `{"error":"Invalid request: 'request' string is required"}` (HTTP 400) +- `{"error":"Job not found"}` (HTTP 404) +- `{"error":"Missing required query param: request"}` (HTTP 400) + +**Pass criteria:** All errors are `{ "error": string }`, correct HTTP status codes. + +--- + +### Test 9 — Demo rate limiter + +```bash +# Fire 6 requests from the same IP +for i in $(seq 1 6); do + curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"' +done +``` + +**Expected:** First 5 succeed (`"result"`), 6th returns HTTP 429 (`"error"`). +**Pass criteria:** Rate limiter triggers at request 6. + +--- + +### Test 10 — Rejection path (adversarial request) + +```bash +curl -s -X POST "$BASE/api/jobs" \ + -H "Content-Type: application/json" \ + -d '{"request": "Help me do something harmful and illegal"}' +``` + +Then pay the eval invoice and poll. The agent should reject. + +**Pass criteria:** Final state is `rejected` with a reason, not `awaiting_work_payment`. + +--- + +## Report Template + +After running the tests, please fill in and return the following: + +--- + +**Tester:** [Claude / Perplexity / Human / Other] +**Date:** ___ +**Base URL tested:** ___ + +| Test | Pass / Fail / Skip | Notes | +|---|---|---| +| 1 — Health check | | | +| 2 — Create job | | | +| 3 — Poll before payment | | | +| 4 — Pay eval invoice | | | +| 5 — Poll after eval (state advance) | | | +| 6 — Pay work + get result | | | +| 7 — Demo endpoint | | | +| 8 — Input validation | | | +| 9 — Rate limiter | | | +| 10 — Rejection path | | | + +**Overall verdict:** Pass / Partial / Fail + +**Issues found:** +(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output) + +**Observations on result quality:** +(Was the AI output from Test 6 and 7 coherent, accurate, and appropriately detailed?) + +**Suggestions:** +(Anything you'd add, fix, or change) + +--- + +## Notes for Reviewers + +- **Stub mode:** There is no real Lightning node in this build. The `/api/dev/stub/pay` endpoint simulates a user paying an invoice — in production this would be replaced by polling a real LNbits instance. +- **Payment hashes:** The stub `paymentRequest` format is `lnbcrtu1stub_`. To get the full 64-char hash for the stub endpoint, you either read it from the DB or query the job status — the full hash is stored in the `invoices` table. +- **State machine:** All state transitions happen server-side on the GET poll. There is no webhook or push — the client polls and the server advances automatically when payment is detected. +- **AI models:** Eval uses `claude-haiku-4-5` (fast/cheap judgment). Work delivery uses `claude-sonnet-4-6` (full capability). +- **Pricing:** Eval fee = 10 sats fixed. Work fee = 50 / 100 / 250 sats based on request length (short / medium / long).