Files

alexpaynex d24cc6fbe5 Add comprehensive test plan for evaluating the AI agent's API functionality

Add a new Markdown file containing a detailed test plan and report prompt for the AI agent API, and register it in the agent assets metadata.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: baaad612-0d55-41f8-983d-e1104c552e18
Replit-Helium-Checkpoint-Created: true

2026-03-18 17:24:32 +00:00

6.4 KiB

Raw Blame History

Timmy API — Test Plan & Report Prompt

What is Timmy?
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.

Base URL: https://<your-timmy-url>.replit.app
Replace BASE in all commands below with the actual URL.

Test Suite

Test 1 — Health check

curl -s "$BASE/api/healthz"

Expected: {"status":"ok"}
Pass criteria: HTTP 200, status field present.

Test 2 — Create a job

curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Explain the Lightning Network in two sentences"}'

Expected:

{
  "jobId": "<uuid>",
  "evalInvoice": {
    "paymentRequest": "lnbcrt10u1stub_...",
    "amountSats": 10
  }
}

Pass criteria: HTTP 201, jobId present, evalInvoice.amountSats = 10.

Test 3 — Poll job before payment

curl -s "$BASE/api/jobs/<jobId-from-test-2>"

Expected:

{
  "jobId": "...",
  "state": "awaiting_eval_payment",
  "evalInvoice": { "paymentRequest": "...", "amountSats": 10 }
}

Pass criteria: State is awaiting_eval_payment, invoice is echoed back.

Test 4 — Pay the eval invoice (stub mode)

Extract paymentHash from the paymentRequest. The stub format is:
lnbcrt10u1stub_<first-16-chars-of-hash>

# Replace <full-payment-hash> with the 64-char hash (query from your DB
# or use the /dev/stub/pay endpoint with the full hash).
# In stub mode: POST to the dev trigger endpoint with the full hash.

curl -s -X POST "$BASE/api/dev/stub/pay/<full-payment-hash>"

Expected: {"ok":true,"paymentHash":"..."}
Pass criteria: HTTP 200.

Note: /api/dev/stub/pay is only available in the current build (stub mode, no real Lightning node). It simulates a user paying the invoice. In production with real LNbits credentials it is not mounted.

Test 5 — Poll after eval payment (state machine advance)

curl -s "$BASE/api/jobs/<jobId>"

Expected — if request was accepted:

{
  "jobId": "...",
  "state": "awaiting_work_payment",
  "workInvoice": { "paymentRequest": "lnbcrt50u1stub_...", "amountSats": 50 }
}

Work fee is deterministic: 50 sats (short request), 100 sats (medium), 250 sats (long).

Expected — if request was rejected:

{ "jobId": "...", "state": "rejected", "reason": "..." }

Pass criteria: State has advanced from awaiting_eval_payment. Agent judgment is present.

Test 6 — Pay the work invoice and get the result

# Mark work invoice paid (same stub endpoint, use the work invoice's payment hash)
curl -s -X POST "$BASE/api/dev/stub/pay/<work-payment-hash>"

# Poll for result (may take 2–5 seconds for AI to respond)
curl -s "$BASE/api/jobs/<jobId>"

Expected:

{
  "jobId": "...",
  "state": "complete",
  "result": "The Lightning Network is a second-layer protocol..."
}

Pass criteria: State is complete, result is a meaningful AI-generated answer.

Test 7 — Free demo endpoint

curl -s "$BASE/api/demo?request=What+is+a+satoshi"

Expected: {"result":"A satoshi is the smallest unit of Bitcoin..."}
Pass criteria: HTTP 200, result is coherent.

Test 8 — Input validation

# Missing request body
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'

# Unknown job ID
curl -s "$BASE/api/jobs/does-not-exist"

# Demo without param
curl -s "$BASE/api/demo"

Expected:

{"error":"Invalid request: 'request' string is required"} (HTTP 400)
{"error":"Job not found"} (HTTP 404)
{"error":"Missing required query param: request"} (HTTP 400)

Pass criteria: All errors are { "error": string }, correct HTTP status codes.

Test 9 — Demo rate limiter

# Fire 6 requests from the same IP
for i in $(seq 1 6); do
  curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"'
done

Expected: First 5 succeed ("result"), 6th returns HTTP 429 ("error").
Pass criteria: Rate limiter triggers at request 6.

Test 10 — Rejection path (adversarial request)

curl -s -X POST "$BASE/api/jobs" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me do something harmful and illegal"}'

Then pay the eval invoice and poll. The agent should reject.

Pass criteria: Final state is rejected with a reason, not awaiting_work_payment.

Report Template

After running the tests, please fill in and return the following:

Tester: [Claude / Perplexity / Human / Other]
Date: ___
Base URL tested: ___

Test	Pass / Fail / Skip	Notes
1 — Health check
2 — Create job
3 — Poll before payment
4 — Pay eval invoice
5 — Poll after eval (state advance)
6 — Pay work + get result
7 — Demo endpoint
8 — Input validation
9 — Rate limiter
10 — Rejection path

Overall verdict: Pass / Partial / Fail

Issues found:
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)

Observations on result quality:
(Was the AI output from Test 6 and 7 coherent, accurate, and appropriately detailed?)

Suggestions:
(Anything you'd add, fix, or change)

Notes for Reviewers

Stub mode: There is no real Lightning node in this build. The /api/dev/stub/pay endpoint simulates a user paying an invoice — in production this would be replaced by polling a real LNbits instance.
Payment hashes: The stub paymentRequest format is lnbcrt<sats>u1stub_<first-16-chars>. To get the full 64-char hash for the stub endpoint, you either read it from the DB or query the job status — the full hash is stored in the invoices table.
State machine: All state transitions happen server-side on the GET poll. There is no webhook or push — the client polls and the server advances automatically when payment is detected.
AI models: Eval uses claude-haiku-4-5 (fast/cheap judgment). Work delivery uses claude-sonnet-4-6 (full capability).
Pricing: Eval fee = 10 sats fixed. Work fee = 50 / 100 / 250 sats based on request length (short / medium / long).

6.4 KiB Raw Blame History Unescape Escape

Timmy API — Test Plan & Report Prompt

Test Suite

Test 1 — Health check

Test 2 — Create a job

Test 3 — Poll job before payment

Test 4 — Pay the eval invoice (stub mode)

Test 5 — Poll after eval payment (state machine advance)

Test 6 — Pay the work invoice and get the result

Test 7 — Free demo endpoint

Test 8 — Input validation

Test 9 — Demo rate limiter

Test 10 — Rejection path (adversarial request)

Report Template

Notes for Reviewers

6.4 KiB

Raw Blame History