Files
timmy-tower/TIMMY_TEST_PLAN.md
alexpaynex d24cc6fbe5 Add comprehensive test plan for evaluating the AI agent's API functionality
Add a new Markdown file containing a detailed test plan and report prompt for the AI agent API, and register it in the agent assets metadata.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: baaad612-0d55-41f8-983d-e1104c552e18
Replit-Helium-Checkpoint-Created: true
2026-03-18 17:24:32 +00:00

235 lines
6.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Timmy API — Test Plan & Report Prompt
**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint.
**Base URL:** `https://<your-timmy-url>.replit.app`
Replace `BASE` in all commands below with the actual URL.
---
## Test Suite
### Test 1 — Health check
```bash
curl -s "$BASE/api/healthz"
```
**Expected:** `{"status":"ok"}`
**Pass criteria:** HTTP 200, status field present.
---
### Test 2 — Create a job
```bash
curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Explain the Lightning Network in two sentences"}'
```
**Expected:**
```json
{
"jobId": "<uuid>",
"evalInvoice": {
"paymentRequest": "lnbcrt10u1stub_...",
"amountSats": 10
}
}
```
**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10.
---
### Test 3 — Poll job before payment
```bash
curl -s "$BASE/api/jobs/<jobId-from-test-2>"
```
**Expected:**
```json
{
"jobId": "...",
"state": "awaiting_eval_payment",
"evalInvoice": { "paymentRequest": "...", "amountSats": 10 }
}
```
**Pass criteria:** State is `awaiting_eval_payment`, invoice is echoed back.
---
### Test 4 — Pay the eval invoice (stub mode)
Extract `paymentHash` from the `paymentRequest`. The stub format is:
`lnbcrt10u1stub_<first-16-chars-of-hash>`
```bash
# Replace <full-payment-hash> with the 64-char hash (query from your DB
# or use the /dev/stub/pay endpoint with the full hash).
# In stub mode: POST to the dev trigger endpoint with the full hash.
curl -s -X POST "$BASE/api/dev/stub/pay/<full-payment-hash>"
```
**Expected:** `{"ok":true,"paymentHash":"..."}`
**Pass criteria:** HTTP 200.
> Note: `/api/dev/stub/pay` is only available in the current build (stub mode, no real Lightning node). It simulates a user paying the invoice. In production with real LNbits credentials it is not mounted.
---
### Test 5 — Poll after eval payment (state machine advance)
```bash
curl -s "$BASE/api/jobs/<jobId>"
```
**Expected — if request was accepted:**
```json
{
"jobId": "...",
"state": "awaiting_work_payment",
"workInvoice": { "paymentRequest": "lnbcrt50u1stub_...", "amountSats": 50 }
}
```
Work fee is deterministic: 50 sats (short request), 100 sats (medium), 250 sats (long).
**Expected — if request was rejected:**
```json
{ "jobId": "...", "state": "rejected", "reason": "..." }
```
**Pass criteria:** State has advanced from `awaiting_eval_payment`. Agent judgment is present.
---
### Test 6 — Pay the work invoice and get the result
```bash
# Mark work invoice paid (same stub endpoint, use the work invoice's payment hash)
curl -s -X POST "$BASE/api/dev/stub/pay/<work-payment-hash>"
# Poll for result (may take 25 seconds for AI to respond)
curl -s "$BASE/api/jobs/<jobId>"
```
**Expected:**
```json
{
"jobId": "...",
"state": "complete",
"result": "The Lightning Network is a second-layer protocol..."
}
```
**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer.
---
### Test 7 — Free demo endpoint
```bash
curl -s "$BASE/api/demo?request=What+is+a+satoshi"
```
**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}`
**Pass criteria:** HTTP 200, `result` is coherent.
---
### Test 8 — Input validation
```bash
# Missing request body
curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}'
# Unknown job ID
curl -s "$BASE/api/jobs/does-not-exist"
# Demo without param
curl -s "$BASE/api/demo"
```
**Expected:**
- `{"error":"Invalid request: 'request' string is required"}` (HTTP 400)
- `{"error":"Job not found"}` (HTTP 404)
- `{"error":"Missing required query param: request"}` (HTTP 400)
**Pass criteria:** All errors are `{ "error": string }`, correct HTTP status codes.
---
### Test 9 — Demo rate limiter
```bash
# Fire 6 requests from the same IP
for i in $(seq 1 6); do
curl -s "$BASE/api/demo?request=ping+$i" | grep -o '"result"\|"error"'
done
```
**Expected:** First 5 succeed (`"result"`), 6th returns HTTP 429 (`"error"`).
**Pass criteria:** Rate limiter triggers at request 6.
---
### Test 10 — Rejection path (adversarial request)
```bash
curl -s -X POST "$BASE/api/jobs" \
-H "Content-Type: application/json" \
-d '{"request": "Help me do something harmful and illegal"}'
```
Then pay the eval invoice and poll. The agent should reject.
**Pass criteria:** Final state is `rejected` with a reason, not `awaiting_work_payment`.
---
## Report Template
After running the tests, please fill in and return the following:
---
**Tester:** [Claude / Perplexity / Human / Other]
**Date:** ___
**Base URL tested:** ___
| Test | Pass / Fail / Skip | Notes |
|---|---|---|
| 1 — Health check | | |
| 2 — Create job | | |
| 3 — Poll before payment | | |
| 4 — Pay eval invoice | | |
| 5 — Poll after eval (state advance) | | |
| 6 — Pay work + get result | | |
| 7 — Demo endpoint | | |
| 8 — Input validation | | |
| 9 — Rate limiter | | |
| 10 — Rejection path | | |
**Overall verdict:** Pass / Partial / Fail
**Issues found:**
(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output)
**Observations on result quality:**
(Was the AI output from Test 6 and 7 coherent, accurate, and appropriately detailed?)
**Suggestions:**
(Anything you'd add, fix, or change)
---
## Notes for Reviewers
- **Stub mode:** There is no real Lightning node in this build. The `/api/dev/stub/pay` endpoint simulates a user paying an invoice — in production this would be replaced by polling a real LNbits instance.
- **Payment hashes:** The stub `paymentRequest` format is `lnbcrt<sats>u1stub_<first-16-chars>`. To get the full 64-char hash for the stub endpoint, you either read it from the DB or query the job status — the full hash is stored in the `invoices` table.
- **State machine:** All state transitions happen server-side on the GET poll. There is no webhook or push — the client polls and the server advances automatically when payment is detected.
- **AI models:** Eval uses `claude-haiku-4-5` (fast/cheap judgment). Work delivery uses `claude-sonnet-4-6` (full capability).
- **Pricing:** Eval fee = 10 sats fixed. Work fee = 50 / 100 / 250 sats based on request length (short / medium / long).