Add comprehensive test plan for evaluating the AI agent's API functionality
Add a new Markdown file containing a detailed test plan and report prompt for the AI agent API, and register it in the agent assets metadata. Replit-Commit-Author: Agent Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e Replit-Commit-Checkpoint-Type: full_checkpoint Replit-Commit-Event-Id: baaad612-0d55-41f8-983d-e1104c552e18 Replit-Helium-Checkpoint-Created: true
This commit is contained in:
@@ -18,3 +18,9 @@ id = "kKxuSyyRY-u1YfyyFoH_y"
|
||||
uri = "file://implementation-guide-taproot-assets-l402-fastapi.md"
|
||||
type = "text"
|
||||
title = "Implementation Guide: Taproot Assets + L402"
|
||||
|
||||
[[outputs]]
|
||||
id = "EbCNNTKk5hsAYWFlW0Lxz"
|
||||
uri = "file://TIMMY_TEST_PLAN.md"
|
||||
type = "text"
|
||||
title = "Timmy API — Test Plan & Report Prompt"
|
||||
|
||||
234
TIMMY_TEST_PLAN.md
Normal file
234
TIMMY_TEST_PLAN.md
Normal file
@@ -0,0 +1,234 @@
|
||||
# Timmy API — Test Plan & Report Prompt
|
||||
|
||||
**What is Timmy?**
|
||||
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.
|
||||
|
||||
**Base URL:** `https://<your-timmy-url>.replit.app`
|
||||
Replace `BASE` in all commands below with the actual URL.
|
||||
|
||||
---
|
||||
|
||||
## Test Suite
|
||||
|
||||
### Test 1 — Health check
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/healthz"
|
||||
```
|
||||
|
||||
**Expected:** `{"status":"ok"}`
|
||||
**Pass criteria:** HTTP 200, status field present.
|
||||
|
||||
---
|
||||
|
||||
### Test 2 — Create a job
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/jobs" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"request": "Explain the Lightning Network in two sentences"}'
|
||||
```
|
||||
|
||||
**Expected:**
|
||||
```json
|
||||
{
|
||||
"jobId": "<uuid>",
|
||||
"evalInvoice": {
|
||||
"paymentRequest": "lnbcrt10u1stub_...",
|
||||
"amountSats": 10
|
||||
}
|
||||
}
|
||||
```
|
||||
**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
|
||||
|
||||
---
|
||||
|
||||
### Test 3 — Poll job before payment
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/jobs/<jobId-from-test-2>"
|
||||
```
|
||||
|
||||
**Expected:**
|
||||
```json
|
||||
{
|
||||
"jobId": "...",
|
||||
"state": "awaiting_eval_payment",
|
||||
"evalInvoice": { "paymentRequest": "...", "amountSats": 10 }
|
||||
}
|
||||
```
|
||||
**Pass criteria:** State is `awaiting_eval_payment`, invoice is echoed back.
|
||||
|
||||
---
|
||||
|
||||
### Test 4 — Pay the eval invoice (stub mode)
|
||||
|
||||
Extract `paymentHash` from the `paymentRequest`. The stub format is:
|
||||
`lnbcrt10u1stub_<first-16-chars-of-hash>`
|
||||
|
||||
```bash
|
||||
# Replace <full-payment-hash> with the 64-char hash (query from your DB
|
||||
# or use the /dev/stub/pay endpoint with the full hash).
|
||||
# In stub mode: POST to the dev trigger endpoint with the full hash.
|
||||
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/<full-payment-hash>"
|
||||
```
|
||||
|
||||
**Expected:** `{"ok":true,"paymentHash":"..."}`
|
||||
**Pass criteria:** HTTP 200.
|
||||
|
||||
> Note: `/api/dev/stub/pay` is only available in the current build (stub mode, no real Lightning node). It simulates a user paying the invoice. In production with real LNbits credentials it is not mounted.
|
||||
|
||||
---
|
||||
|
||||
### Test 5 — Poll after eval payment (state machine advance)
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/jobs/<jobId>"
|
||||
```
|
||||
|
||||
**Expected — if request was accepted:**
|
||||
```json
|
||||
{
|
||||
"jobId": "...",
|
||||
"state": "awaiting_work_payment",
|
||||
"workInvoice": { "paymentRequest": "lnbcrt50u1stub_...", "amountSats": 50 }
|
||||
}
|
||||
```
|
||||
Work fee is deterministic: 50 sats (short request), 100 sats (medium), 250 sats (long).
|
||||
|
||||
**Expected — if request was rejected:**
|
||||
```json
|
||||
{ "jobId": "...", "state": "rejected", "reason": "..." }
|
||||
```
|
||||
|
||||
**Pass criteria:** State has advanced from `awaiting_eval_payment`. Agent judgment is present.
|
||||
|
||||
---
|
||||
|
||||
### Test 6 — Pay the work invoice and get the result
|
||||
|
||||
```bash
|
||||
# Mark work invoice paid (same stub endpoint, use the work invoice's payment hash)
|
||||
curl -s -X POST "$BASE/api/dev/stub/pay/<work-payment-hash>"
|
||||
|
||||
# Poll for result (may take 2–5 seconds for AI to respond)
|
||||
curl -s "$BASE/api/jobs/<jobId>"
|
||||
```
|
||||
|
||||
**Expected:**
|
||||
```json
|
||||
{
|
||||
"jobId": "...",
|
||||
"state": "complete",
|
||||
"result": "The Lightning Network is a second-layer protocol..."
|
||||
}
|
||||
```
|
||||
**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer.
|
||||
|
||||
---
|
||||
|
||||
### Test 7 — Free demo endpoint
|
||||
|
||||
```bash
|
||||
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
|
||||
```
|
||||
|
||||
**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}`
|
||||
**Pass criteria:** HTTP 200, `result` is coherent.
|
||||
|
||||
---
|
||||
|
||||
### Test 8 — Input validation
|
||||
|
||||
```bash
|
||||
# Missing request body
|
||||
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
|
||||
|
||||
# Unknown job ID
|
||||
curl -s "$BASE/api/jobs/does-not-exist"
|
||||
|
||||
# Demo without param
|
||||
curl -s "$BASE/api/demo"
|
||||
```
|
||||
|
||||
**Expected:**
|
||||
- `{"error":"Invalid request: 'request' string is required"}` (HTTP 400)
|
||||
- `{"error":"Job not found"}` (HTTP 404)
|
||||
- `{"error":"Missing required query param: request"}` (HTTP 400)
|
||||
|
||||
**Pass criteria:** All errors are `{ "error": string }`, correct HTTP status codes.
|
||||
|
||||
---
|
||||
|
||||
### Test 9 — Demo rate limiter
|
||||
|
||||
```bash
|
||||
# Fire 6 requests from the same IP
|
||||
for i in $(seq 1 6); do
|
||||
curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"'
|
||||
done
|
||||
```
|
||||
|
||||
**Expected:** First 5 succeed (`"result"`), 6th returns HTTP 429 (`"error"`).
|
||||
**Pass criteria:** Rate limiter triggers at request 6.
|
||||
|
||||
---
|
||||
|
||||
### Test 10 — Rejection path (adversarial request)
|
||||
|
||||
```bash
|
||||
curl -s -X POST "$BASE/api/jobs" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"request": "Help me do something harmful and illegal"}'
|
||||
```
|
||||
|
||||
Then pay the eval invoice and poll. The agent should reject.
|
||||
|
||||
**Pass criteria:** Final state is `rejected` with a reason, not `awaiting_work_payment`.
|
||||
|
||||
---
|
||||
|
||||
## Report Template
|
||||
|
||||
After running the tests, please fill in and return the following:
|
||||
|
||||
---
|
||||
|
||||
**Tester:** [Claude / Perplexity / Human / Other]
|
||||
**Date:** ___
|
||||
**Base URL tested:** ___
|
||||
|
||||
| Test | Pass / Fail / Skip | Notes |
|
||||
|---|---|---|
|
||||
| 1 — Health check | | |
|
||||
| 2 — Create job | | |
|
||||
| 3 — Poll before payment | | |
|
||||
| 4 — Pay eval invoice | | |
|
||||
| 5 — Poll after eval (state advance) | | |
|
||||
| 6 — Pay work + get result | | |
|
||||
| 7 — Demo endpoint | | |
|
||||
| 8 — Input validation | | |
|
||||
| 9 — Rate limiter | | |
|
||||
| 10 — Rejection path | | |
|
||||
|
||||
**Overall verdict:** Pass / Partial / Fail
|
||||
|
||||
**Issues found:**
|
||||
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)
|
||||
|
||||
**Observations on result quality:**
|
||||
(Was the AI output from Test 6 and 7 coherent, accurate, and appropriately detailed?)
|
||||
|
||||
**Suggestions:**
|
||||
(Anything you'd add, fix, or change)
|
||||
|
||||
---
|
||||
|
||||
## Notes for Reviewers
|
||||
|
||||
- **Stub mode:** There is no real Lightning node in this build. The `/api/dev/stub/pay` endpoint simulates a user paying an invoice — in production this would be replaced by polling a real LNbits instance.
|
||||
- **Payment hashes:** The stub `paymentRequest` format is `lnbcrt<sats>u1stub_<first-16-chars>`. To get the full 64-char hash for the stub endpoint, you either read it from the DB or query the job status — the full hash is stored in the `invoices` table.
|
||||
- **State machine:** All state transitions happen server-side on the GET poll. There is no webhook or push — the client polls and the server advances automatically when payment is detected.
|
||||
- **AI models:** Eval uses `claude-haiku-4-5` (fast/cheap judgment). Work delivery uses `claude-sonnet-4-6` (full capability).
|
||||
- **Pricing:** Eval fee = 10 sats fixed. Work fee = 50 / 100 / 250 sats based on request length (short / medium / long).
|
||||
Reference in New Issue
Block a user