# Timmy API — Test Plan & Report Prompt
**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API. Users pay Bitcoin (via Lightning) to submit requests to an AI agent (Claude). Two payment modes:
- **Mode 1 — Per-Job (live):** Pay per request. Eval invoice (10 sats fixed) → Haiku judges the request → work invoice (dynamic, token-based) → Sonnet executes → result delivered.
- **Mode 2 — Session (live):** Pre-fund a credit balance. Requests automatically debit the actual compute cost (eval + work token usage × the model's per-token rate × 1.4 margin, converted to sats at the live BTC/USD rate). No per-job invoices are issued once the session is active.

**Live base URL:**
```
https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev
```
---
## Running the full test suite — one command
```bash
curl -s https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev/api/testkit | bash
```
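The server returns a self-contained bash script with the base URL already baked in.
Requirements: `curl`, `bash`, and `jq`. `node` is optional: test 36 runs an inline Node script and is marked SKIP (not FAIL) if `node` is not on `PATH`.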
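Before piping the script into `bash`, a quick pre-flight check can confirm the tools are on `PATH`. This is a convenience sketch, not part of the testkit; the function name `check_requirements` and the `BASE_URL` variable in the usage comment are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not shipped with the testkit).
# Reports every missing command on stderr; returns 1 if any is absent.
check_requirements() {
  local missing=0 cmd
  local -a cmds=("$@")
  # With no arguments, check the three tools the testkit needs.
  if [ "${#cmds[@]}" -eq 0 ]; then
    cmds=(curl bash jq)
  fi
  for cmd in "${cmds[@]}"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (BASE_URL is assumed to hold the live base URL):
#   check_requirements && curl -s "$BASE_URL/api/testkit" | bash
#   check_requirements node || echo "node not found: test 36 will be skipped"
```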
> **Note for repeat runs:** Tests 7 and 8c hit `GET /api/demo`, which is rate-limited to 5 req/hr per IP. If you run the testkit more than once in the same hour from the same IP, those two checks will return 429. This is expected behaviour — the rate limiter is working correctly. Run from a fresh IP (or wait an hour) for a clean 20/20.
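Test 9 works by sending more `/api/demo` requests than the limit allows and looking for at least one HTTP 429. A rough manual equivalent is sketched below; `count_429` is a hypothetical helper, and `request=hello` is an arbitrary placeholder query. Every call spends part of the same 5 req/hr budget that tests 7 and 8c depend on.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring test 9: send N GETs to a URL and print how
# many came back with HTTP 429.
count_429() {
  local n="$1" url="$2" i code hits=0
  for ((i = 0; i < n; i++)); do
    # -o /dev/null discards the body; -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "429" ]; then
      hits=$((hits + 1))
    fi
  done
  echo "$hits"
}

# Usage (BASE_URL is assumed to hold the live base URL):
#   count_429 6 "$BASE_URL/api/demo?request=hello"   # expect at least 1
```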
---
## What the testkit covers
### Mode 1 — Per-Job (tests 1–10)
| # | Name | What it checks |
|---|------|----------------|
| 1 | Health check | `GET /api/healthz` → HTTP 200, `status=ok` |
| 2 | Create job | `POST /api/jobs` → HTTP 201, `jobId` + `evalInvoice.amountSats=10` |
| 3 | Poll before payment | `GET /api/jobs/:id` → `state=awaiting_eval_payment`, invoice echoed, `paymentHash` present in stub mode |
| 4 | Pay eval invoice | `POST /api/dev/stub/pay/:hash` → `{"ok":true}` |
| 5 | Eval state advance | Polls until `state=awaiting_work_payment` OR `state=rejected` (30s timeout) |
| 6 | Pay work + get result | Pays work invoice, polls until `state=complete`, `result` non-empty (30s timeout) |
| 7 | Demo endpoint | `GET /api/demo?request=...` → HTTP 200, coherent `result` |
| 8a | Missing body | `POST /api/jobs {}` → HTTP 400 |
| 8b | Unknown job ID | `GET /api/jobs/does-not-exist` → HTTP 404 |
| 8c | Demo missing param | `GET /api/demo` → HTTP 400 |
| 8d | 501-char request | `POST /api/jobs` with 501 chars → HTTP 400 mentioning "500 characters" |
| 9 | Rate limiter | 6× `GET /api/demo` → at least one HTTP 429 |
| 10 | Rejection path | Adversarial request goes through eval, polls until `state=rejected` with a non-empty `reason` |
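For a manual run, the Mode 1 flow can be driven by hand with a few helpers. This is a sketch under stated assumptions, not the testkit's code: `BASE_URL`, `JOB_ID` and `EVAL_HASH` are caller-set variables, the `request` field in the usage comment is a guess at the body shape, and `poll_state` simply re-issues GETs because the server advances its state machine on each poll.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a manual Mode 1 run. BASE_URL must be set by the caller.

# POST a raw JSON body to /api/jobs and print the response body.
create_job() {
  curl -s -X POST "$BASE_URL/api/jobs" \
    -H 'Content-Type: application/json' \
    -d "$1"
}

# Pay any invoice through the dev-only stub; succeeds only on "ok": true.
stub_pay() {
  curl -s -X POST "$BASE_URL/api/dev/stub/pay/$1" \
    | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Re-run <fetch_fn> until it prints one of the target states or the timeout
# (in seconds) passes. Each call is a GET, which is what advances the job.
poll_state() {
  local timeout="$1" fetch="$2" state="" target deadline
  shift 2
  deadline=$((SECONDS + timeout))
  while [ "$SECONDS" -lt "$deadline" ]; do
    state=$("$fetch")
    for target in "$@"; do
      if [ "$state" = "$target" ]; then
        echo "$state"
        return 0
      fi
    done
    sleep "${POLL_INTERVAL:-2}"
  done
  echo "timeout (last state: ${state:-none})" >&2
  return 1
}

# Example fetcher: print the current state of $JOB_ID.
job_state() {
  curl -s "$BASE_URL/api/jobs/$JOB_ID" | jq -r '.state'
}

# Usage (the "request" field name is an assumption):
#   JOB_ID=$(create_job '{"request":"Explain Lightning in one sentence"}' | jq -r '.jobId')
#   stub_pay "$EVAL_HASH"   # EVAL_HASH: the paymentHash exposed in stub mode (test 3)
#   poll_state 30 job_state awaiting_work_payment rejected
```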
### Mode 2 — Session (tests 11–16)
| # | Name | What it checks |
|---|------|----------------|
| 11 | Create session | `POST /api/sessions {"amount_sats":200}` → HTTP 201, `sessionId`, `state=awaiting_payment`, `invoice.amountSats=200` |
| 12 | Poll before payment | `GET /api/sessions/:id` → `state=awaiting_payment` before invoice is paid |
| 13 | Pay deposit + activate | Pays deposit via stub, polls GET → `state=active`, `balanceSats=200`, `macaroon` present |
| 14 | Submit request (accepted) | `POST /api/sessions/:id/request` with valid macaroon → `state=complete` OR `state=rejected`, `debitedSats>0`, `balanceRemaining` decremented |
| 15 | Request without macaroon | Same endpoint, no `Authorization` header → HTTP 401 |
| 16 | Topup invoice creation | `POST /api/sessions/:id/topup {"amount_sats":500}` with macaroon → HTTP 200, `topup.paymentRequest` present, `topup.amountSats=500` |
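Tests 14 to 16 call the session endpoints with the macaroon sent as a Bearer token. A manual equivalent might look like the sketch below; the helper names and the `BASE_URL`, `SESSION_ID` and `MACAROON` variables are assumptions, and the request body is passed through untouched because this plan does not document its shape. Dropping the `Authorization` header is exactly what test 15 does to provoke the 401.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the session endpoints. BASE_URL must be set by the caller.

# Submit a request against a session (test 14); the body is raw JSON.
session_request() {
  local session_id="$1" macaroon="$2" body="$3"
  curl -s -X POST "$BASE_URL/api/sessions/$session_id/request" \
    -H "Authorization: Bearer $macaroon" \
    -H 'Content-Type: application/json' \
    -d "$body"
}

# Ask for a topup invoice (test 16); amount_sats mirrors the test table.
session_topup() {
  local session_id="$1" macaroon="$2" amount="$3"
  curl -s -X POST "$BASE_URL/api/sessions/$session_id/topup" \
    -H "Authorization: Bearer $macaroon" \
    -H 'Content-Type: application/json' \
    -d "{\"amount_sats\":$amount}"
}

# Usage:
#   session_request "$SESSION_ID" "$MACAROON" "$BODY" | jq '{state, debitedSats, balanceRemaining}'
#   session_topup "$SESSION_ID" "$MACAROON" 500 | jq '.topup'
```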
### Nostr identity + trust engine (tests 25–36)
| # | Name | What it checks |
|---|------|----------------|
| 25 | Challenge nonce | `POST /api/identity/challenge` → HTTP 200, `nonce` is 64-char hex, `expiresAt` is ISO in future |
| 26 | Verify: missing event | `POST /api/identity/verify {}` → HTTP 400, non-empty `error` |
| 27 | Verify: unknown nonce | `POST /api/identity/verify` with fake nonce in content → HTTP 401, `error` contains "Nonce not found" |
| 28 | Me: no token | `GET /api/identity/me` without header → HTTP 401, `error` contains "Missing" |
| 29 | Me: invalid token | `GET /api/identity/me` with `X-Nostr-Token: totally.invalid.token` → HTTP 401 |
| 30 | Sessions: bogus token | `POST /api/sessions` with `X-Nostr-Token: badtoken` → HTTP 401, no `sessionId` in response |
| 31 | Jobs: bogus token | `POST /api/jobs` with `X-Nostr-Token: badtoken` → HTTP 401 |
| 32 | Sessions anonymous tier | `POST /api/sessions` (no token) → HTTP 201, `trust_tier == "anonymous"` |
| 33 | Jobs anonymous tier | `POST /api/jobs` (no token) → HTTP 201, `trust_tier == "anonymous"` |
| 34 | GET jobs/:id includes tier | `GET /api/jobs/:id` → HTTP 200, `trust_tier` non-null (anonymous job → `"anonymous"`) |
| 35 | GET sessions/:id includes tier | `GET /api/sessions/:id` → HTTP 200, `trust_tier == "anonymous"` |
| 36 | Full challenge→sign→verify | Inline node script: generate keypair, challenge, sign kind=27235 event, verify → token; GET /identity/me → tier=new, pubkey matches |
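The field checks in test 25 are easy to reproduce by hand. The helpers below are a sketch, not the testkit's code: they assume `expiresAt` uses the `YYYY-MM-DDTHH:MM:SS[.fff]Z` form that JavaScript's `Date.prototype.toISOString()` produces, and they accept upper- or lower-case hex for the nonce.

```bash
#!/usr/bin/env bash
# Hypothetical shape checks mirroring test 25 (not the testkit's own code).

# Nonce: exactly 64 hex characters.
is_hex64() {
  [[ "$1" =~ ^[0-9a-fA-F]{64}$ ]]
}

# expiresAt: UTC ISO-8601 with optional fractional seconds and a trailing Z.
is_iso_utc() {
  [[ "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z$ ]]
}

# True if the timestamp is later than now. Fixed-width UTC timestamps in the
# same format sort lexicographically, so a string comparison is enough.
is_future_iso() {
  local t="${1%Z}" now
  t="${t%%.*}"
  now=$(date -u +%Y-%m-%dT%H:%M:%S)
  [[ "$t" > "$now" ]]
}

# Usage (BASE_URL is assumed to hold the live base URL):
#   resp=$(curl -s -X POST "$BASE_URL/api/identity/challenge")
#   is_hex64 "$(jq -r '.nonce' <<<"$resp")" && echo "nonce ok"
#   exp=$(jq -r '.expiresAt' <<<"$resp")
#   is_iso_utc "$exp" && is_future_iso "$exp" && echo "expiresAt ok"
```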
---
## Architecture notes for reviewers
### Mode 1 mechanics
- Stub mode is active (no real Lightning node). `paymentHash` is exposed on GET responses so the testkit can drive the full payment flow automatically. In production (real LNbits), `paymentHash` is hidden.
- `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`.
- State machine advances server-side on every GET poll — no webhooks.
- AI models: Haiku for eval (cheap gating), Sonnet for work (full output).
- **Pricing:** eval = 10 sats fixed. Work invoice = actual token usage (input + output) × Anthropic per-token rate × 1.4 margin, converted at live BTC/USD. This is dynamic — a 53-char request typically produces an invoice of ~180 sats, not a fixed tier. The old 50/100/250 sat fixed tiers were replaced by this model.
- Max request length: 500 chars. Rate limiter: 5 req/hr/IP on `/api/demo` (in-memory, resets on server restart).
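The work-invoice formula can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the per-million-token rates and the BTC/USD price are caller-supplied placeholders, not the server's configuration, and rounding up to a whole sat is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper:
#   sats = (in_tok*in_rate + out_tok*out_rate) / 1e6 * 1.4 / btc_usd * 1e8
# Rates are USD per million tokens; every input is a caller-supplied placeholder.
work_invoice_sats() {
  awk -v i="$1" -v o="$2" -v irate="$3" -v orate="$4" -v btc="$5" 'BEGIN {
    usd = (i * irate + o * orate) / 1000000 * 1.4   # token cost plus 40% margin
    sats = usd / btc * 100000000                      # USD -> BTC -> sats
    whole = int(sats)
    if (sats > whole) whole = whole + 1               # round up to a whole sat
    print whole
  }'
}

# Usage: 1,000 input + 1,000 output tokens at $3 / $15 per million, BTC at $100,000
#   work_invoice_sats 1000 1000 3 15 100000   # prints 26 (25.2 rounded up)
```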
### Mode 2 mechanics
- Minimum deposit: 100 sats. Maximum: 10,000 sats. Minimum working balance: 50 sats.
- Session expiry: 24 hours of inactivity. Balance is forfeited on expiry. Expiry is stated in the `expiresAt` field of every session response.
- Auth: `Authorization: Bearer <macaroon>` header. The macaroon is issued on first activation (the first `GET /api/sessions/:id` after the deposit is paid).
- Cost per request: (eval tokens + work tokens) × model rate × 1.4 margin, converted to sats. If a request starts with enough balance but its actual cost pushes the balance negative, the request still completes and delivers its result; only the *next* request is blocked.
- If the balance drops below 50 sats, the session transitions to `paused`. Top up via `POST /api/sessions/:id/topup`. The session resumes automatically on the next GET poll once the topup invoice is paid.
- The same `POST /api/dev/stub/pay/:hash` endpoint works for all invoice types (eval, work, session deposit, topup).
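The balance rules above reduce to a few comparisons. This is a model for predicting test outcomes, not the server's implementation; the function names are hypothetical, and reading "enough balance" as "at least the 50-sat working minimum" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical model of the Mode 2 balance rules (not the server's code).
MIN_DEPOSIT=100
MAX_DEPOSIT=10000
MIN_BALANCE=50

# Deposits must fall within [MIN_DEPOSIT, MAX_DEPOSIT] sats.
valid_deposit() {
  [ "$1" -ge "$MIN_DEPOSIT" ] && [ "$1" -le "$MAX_DEPOSIT" ]
}

# A new request is accepted only while the balance is at least MIN_BALANCE.
can_submit() {
  [ "$1" -ge "$MIN_BALANCE" ]
}

# Debit the actual cost after the request has completed. The balance may go
# negative; anything below MIN_BALANCE pauses the session.
# Prints "<new_balance> <state>".
apply_debit() {
  local balance=$(( $1 - $2 ))
  if [ "$balance" -lt "$MIN_BALANCE" ]; then
    echo "$balance paused"
  else
    echo "$balance active"
  fi
}

# Usage: a 60-sat balance may submit; an 80-sat request still completes,
# after which the session is paused until a topup is paid.
#   can_submit 60 && apply_debit 60 80   # prints "-20 paused"
```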
### Eval + work latency (important for manual testers)
The eval call uses the real Anthropic API (Haiku) and typically takes 2–5 seconds; the work call (Sonnet) typically runs 3–8 seconds. The testkit uses polling loops with a 30-second cap, and manual testers should poll with similar patience.

---
## Test results log
| Date | Tester | Score | Notes |
|------|--------|-------|-------|
| 2026-03-18 | Perplexity Computer | 20/20 PASS | Issue #22 |
| 2026-03-18 | Hermes (Claude Opus 4) | 19/20 (pre-fix) | Issue #23; 1 failure = test ordering bug (8c hit rate limiter before param check). Fixed in testkit v4. |
| 2026-03-18 | Claude Opus 4.6 | 14/20 (pre-fix) | Issue #25; 2 failures = same rate-limit ordering as Hermes. Fixed in testkit v4. |
| 2026-03-19 | Replit Agent (post-fix) | 20/20 PASS | Verified on fresh server after testkit v4; all fixes confirmed |
---
## Report template
**Tester:** [Claude / Perplexity / Kimi / Hermes / Human / Other]
**Date:**
**Base URL tested:**
**Method:** [Automated (`curl … | bash`) / Manual]
### Mode 1 — Per-Job
| Test | Pass / Fail / Skip | Latency | Notes |
|------|-------------------|---------|-------|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Eval state advance | | ___s | |
| 6 — Pay work + result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 8d — 501-char request | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | ___s | |
### Mode 2 — Session
| Test | Pass / Fail / Skip | Notes |
|------|-------------------|-------|
| 11 — Create session | | |
| 12 — Poll before payment | | |
| 13 — Pay + activate | | |
| 14 — Submit request | | |
| 15 — Reject no macaroon | | |
| 16 — Topup invoice | | |

**Overall verdict:** Pass / Partial / Fail

**Total:** PASS=___ FAIL=___ SKIP=___

**Issues found:**

**Observations on result quality:**

**Suggestions:**