
alexpaynex c7bb5de5e6 task/35: Testkit T25–T36 — Nostr identity + trust engine coverage

## 12 new tests added to artifacts/api-server/src/routes/testkit.ts
Inserted before the Summary block, after T24 (cost ledger).

T25 — POST /identity/challenge: HTTP 200, nonce=64-char hex, expiresAt=ISO
T26 — POST /identity/verify {}: HTTP 400, non-empty error
T27 — POST /identity/verify fake nonce: HTTP 401, error contains "Nonce not found"
       (uses a plausible-looking event structure to hit the nonce check, not the
       signature check — tests the right layer)
T28 — GET /identity/me no header: HTTP 401, error contains "Missing"
T29 — GET /identity/me invalid token: HTTP 401 (Invalid/expired wording)
T30 — POST /sessions bad X-Nostr-Token: HTTP 401, "Invalid or expired", no sessionId
T31 — POST /jobs bad X-Nostr-Token: HTTP 401, "Invalid or expired"
T32 — POST /sessions anonymous: HTTP 201, trust_tier="anonymous"; captures T32_SESSION_ID
T33 — POST /jobs anonymous: HTTP 201, trust_tier="anonymous"; captures T33_JOB_ID
T34 — GET /jobs/:id (using T33_JOB_ID): HTTP 200, trust_tier non-null and "anonymous"
T35 — GET /sessions/:id (using T32_SESSION_ID): HTTP 200, trust_tier="anonymous"
T36 — Full challenge→sign→verify E2E: inline node CJS script generates ephemeral secp256k1
      keypair via nostr-tools CJS bundle, POSTs challenge, signs kind=27235 event with
      finalizeEvent(), verifies → nostr_token, GETs /identity/me, asserts tier=new,
      interactionCount=0, pubkey matches. Guard: SKIP if node not in PATH or script fails.
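
The T25 assertions (64-char hex nonce, ISO `expiresAt`) can be sketched as a small validator. This is a minimal illustration, not the testkit's actual bash/jq check: the `ChallengeResponse` interface and the function names are hypothetical, and accepting upper-case hex is an assumption.

```typescript
// Hypothetical shape of the POST /identity/challenge response body,
// inferred from the two fields T25 asserts on.
interface ChallengeResponse {
  nonce: string;
  expiresAt: string;
}

// True when the nonce is exactly 64 hexadecimal characters.
// Assumption: upper-case hex is tolerated as well as lower-case.
function isHexNonce(nonce: string): boolean {
  return /^[0-9a-fA-F]{64}$/.test(nonce);
}

// True when expiresAt parses as a date that lies strictly after `now`.
// Note: Date.parse is lenient; a stricter ISO 8601 regex could be added.
function expiresInFuture(expiresAt: string, now: Date): boolean {
  const parsedMs = Date.parse(expiresAt);
  return !isNaN(parsedMs) && parsedMs > now.getTime();
}

// Combined check mirroring what T25 looks for in the response body.
function checkChallenge(body: ChallengeResponse, now: Date): boolean {
  return isHexNonce(body.nonce) && expiresInFuture(body.expiresAt, now);
}
```

Passing `now` in explicitly keeps the check deterministic when it is exercised in a unit test.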

## nostr-tools import strategy
nostr-tools v2 is ESM-only. CJS workaround: the package ships a CJS bundle at
lib/cjs/index.js. T36 uses require() with the absolute path to that bundle.
Falls back to bare require('nostr-tools') for portability, exits with code 1 if
neither works — bash guard catches this and marks T36 SKIP (not FAIL).
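
The "try the bundle path, then the bare specifier, then give up" order described above can be sketched as a generic helper. This is an illustration under assumptions, not the inline script T36 actually runs: `loadFirstAvailable` is a hypothetical name, and the loader is injected so the sketch does not depend on the module system (in a CJS script it would simply be `require`).

```typescript
// Try each candidate module specifier in order and return the first one that
// loads. Returns undefined when every candidate throws.
function loadFirstAvailable<T>(
  candidates: string[],
  load: (specifier: string) => T
): T | undefined {
  for (const specifier of candidates) {
    try {
      return load(specifier);
    } catch (_err) {
      // This candidate is unavailable; fall through to the next one.
    }
  }
  return undefined;
}

// Usage sketch inside a CJS script (the absolute path prefix is a placeholder):
//   const nostr = loadFirstAvailable(
//     ["<absolute path>/lib/cjs/index.js", "nostr-tools"],
//     require
//   );
//   if (nostr === undefined) process.exit(1); // bash guard then marks T36 SKIP
```

Returning `undefined` instead of throwing leaves the exit-code decision to the caller, which matches the SKIP (not FAIL) behaviour described above.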

## Stubs T37–T40 added as bash block comments after T36
Format: `# FUTURE T3N: <description>` so they are easy to grep.
Covers: GET /api/estimate (cost preview), anonymous Lightning gate, trusted free tier,
Timmy-initiates-zap. Does not affect PASS/FAIL totals.

## TIMMY_TEST_PLAN.md updated
New "Nostr identity + trust engine (tests 25–36)" section added to the test table.

## TypeScript: 0 errors. All 12 tests smoke-tested individually against localhost:8080.
T25–T35: all correct HTTP status codes and JSON fields verified via curl.
T36: full E2E verified: tier=new, interactionCount=0, pubkey matches /identity/me response.

2026-03-19 21:09:50 +00:00


Timmy API — Test Plan & Report Prompt

What is Timmy?

Timmy is a Lightning Network-gated AI agent API. Users pay Bitcoin (via Lightning) to submit requests to an AI agent (Claude). Two payment modes:

Mode 1 — Per-Job (live): Pay per request. Eval invoice (10 sats fixed) → Haiku judges the request → work invoice (dynamic, token-based) → Sonnet executes → result delivered.
Mode 2 — Session (live): Pre-fund a credit balance. Requests automatically debit actual compute cost (eval + work tokens × 1.4 margin, converted to sats at live BTC/USD). No per-job invoices once active.

Live base URL:

https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev

Running the full test suite — one command

curl -s https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev/api/testkit | bash

The server returns a self-contained bash script with the base URL already baked in. Requirements: curl, bash, jq. Test 36 additionally uses node and is marked SKIP when node is not in PATH.

Note for repeat runs: Tests 7 and 8c hit GET /api/demo, which is rate-limited to 5 req/hr per IP. If you run the testkit more than once in the same hour from the same IP, those two checks will return 429. This is expected behaviour — the rate limiter is working correctly. Run from a fresh IP (or wait an hour) for a clean 20/20.
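
To make the 429 behaviour concrete, here is a minimal in-memory limiter sketch. It assumes a fixed one-hour window per IP; the server's actual algorithm is not stated in this document (it could equally be a sliding window), and the class and method names are hypothetical.

```typescript
// Fixed-window limiter: at most `limit` allowed hits per `windowMs` per key.
// State lives in memory only, so it resets when the process restarts.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed, false when it should get a 429.
  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // First hit for this key, or the previous window has elapsed.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```

With `new FixedWindowLimiter(5, 60 * 60 * 1000)` keyed by client IP, the sixth call inside one hour is refused, which is the condition a second testkit run from the same IP would trip.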

What the testkit covers

Mode 1 — Per-Job (tests 1–10)

#	Name	What it checks
1	Health check	`GET /api/healthz` → HTTP 200, `status=ok`
2	Create job	`POST /api/jobs` → HTTP 201, `jobId` + `evalInvoice.amountSats=10`
3	Poll before payment	`GET /api/jobs/:id` → `state=awaiting_eval_payment`, invoice echoed, `paymentHash` present in stub mode
4	Pay eval invoice	`POST /api/dev/stub/pay/:hash` → `{"ok":true}`
5	Eval state advance	Polls until `state=awaiting_work_payment` OR `state=rejected` (30s timeout)
6	Pay work + get result	Pays work invoice, polls until `state=complete`, `result` non-empty (30s timeout)
7	Demo endpoint	`GET /api/demo?request=...` → HTTP 200, coherent `result`
8a	Missing body	`POST /api/jobs {}` → HTTP 400
8b	Unknown job ID	`GET /api/jobs/does-not-exist` → HTTP 404
8c	Demo missing param	`GET /api/demo` → HTTP 400
8d	501-char request	`POST /api/jobs` with 501 chars → HTTP 400 mentioning "500 characters"
9	Rate limiter	6× `GET /api/demo` → at least one HTTP 429
10	Rejection path	Adversarial request goes through eval, polls until `state=rejected` with a non-empty `reason`
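
The job states that tests 3, 5, 6 and 10 poll for can be summarised as a small transition function. This is a sketch built only from the four state names that appear in the table; the event names are hypothetical, and the real server may have intermediate states that are not visible in this plan.

```typescript
// The four job states named in the Mode 1 table.
type JobState =
  | "awaiting_eval_payment"
  | "awaiting_work_payment"
  | "rejected"
  | "complete";

// Hypothetical events; the server's internal triggers are not documented here.
type JobEvent = "eval_accepted" | "eval_rejected" | "work_delivered";

// Pure transition function: returns the next state for a given event.
function nextJobState(state: JobState, event: JobEvent): JobState {
  if (state === "awaiting_eval_payment") {
    if (event === "eval_accepted") return "awaiting_work_payment";
    if (event === "eval_rejected") return "rejected";
  }
  if (state === "awaiting_work_payment" && event === "work_delivered") {
    return "complete";
  }
  // Terminal states and out-of-order events leave the job unchanged.
  return state;
}
```

Test 5 corresponds to the two exits from `awaiting_eval_payment`, test 6 to the step into `complete`, and test 10 to the `rejected` branch.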

Mode 2 — Session (tests 11–16)

#	Name	What it checks
11	Create session	`POST /api/sessions {"amount_sats":200}` → HTTP 201, `sessionId`, `state=awaiting_payment`, `invoice.amountSats=200`
12	Poll before payment	`GET /api/sessions/:id` → `state=awaiting_payment` before invoice is paid
13	Pay deposit + activate	Pays deposit via stub, polls GET → `state=active`, `balanceSats=200`, `macaroon` present
14	Submit request (accepted)	`POST /api/sessions/:id/request` with valid macaroon → `state=complete` OR `state=rejected`, `debitedSats>0`, `balanceRemaining` decremented
15	Request without macaroon	Same endpoint, no `Authorization` header → HTTP 401
16	Topup invoice creation	`POST /api/sessions/:id/topup {"amount_sats":500}` with macaroon → HTTP 200, `topup.paymentRequest` present, `topup.amountSats=500`

Nostr identity + trust engine (tests 25–36)

#	Name	What it checks
25	Challenge nonce	`POST /api/identity/challenge` → HTTP 200, `nonce` is 64-char hex, `expiresAt` is ISO in future
26	Verify: missing event	`POST /api/identity/verify {}` → HTTP 400, non-empty `error`
27	Verify: unknown nonce	`POST /api/identity/verify` with fake nonce in content → HTTP 401, `error` contains "Nonce not found"
28	Me: no token	`GET /api/identity/me` without header → HTTP 401, `error` contains "Missing"
29	Me: invalid token	`GET /api/identity/me` with `X-Nostr-Token: totally.invalid.token` → HTTP 401
30	Sessions: bogus token	`POST /api/sessions` with `X-Nostr-Token: badtoken` → HTTP 401, no `sessionId` in response
31	Jobs: bogus token	`POST /api/jobs` with `X-Nostr-Token: badtoken` → HTTP 401
32	Sessions anonymous tier	`POST /api/sessions` (no token) → HTTP 201, `trust_tier == "anonymous"`
33	Jobs anonymous tier	`POST /api/jobs` (no token) → HTTP 201, `trust_tier == "anonymous"`
34	GET jobs/:id includes tier	`GET /api/jobs/:id` → HTTP 200, `trust_tier` non-null (anonymous job → `"anonymous"`)
35	GET sessions/:id includes tier	`GET /api/sessions/:id` → HTTP 200, `trust_tier == "anonymous"`
36	Full challenge→sign→verify	Inline node script: generate keypair, challenge, sign kind=27235 event, verify → token; GET /identity/me → tier=new, pubkey matches
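
For test 36, the unsigned event that gets signed might look like the template below. This is a sketch under assumptions: placing the nonce in `content` follows the wording of test 27, the empty `tags` array assumes the server requires no tags, and `buildAuthEventTemplate` is a hypothetical helper name.

```typescript
// Unsigned Nostr event fields; signing adds pubkey, id and sig.
interface EventTemplate {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
}

// Build the kind=27235 template that carries the challenge nonce.
function buildAuthEventTemplate(nonce: string, nowMs: number): EventTemplate {
  return {
    kind: 27235,
    created_at: Math.floor(nowMs / 1000), // Nostr timestamps are Unix seconds
    tags: [], // assumption: no tags are required by the verify endpoint
    content: nonce,
  };
}

// In the T36 script the template would then be signed with nostr-tools, e.g.:
//   const secretKey = generateSecretKey();
//   const signed = finalizeEvent(buildAuthEventTemplate(nonce, Date.now()), secretKey);
// and `signed` would be sent to POST /identity/verify.
```

Keeping the template builder separate from signing means the nonce handling can be checked without the nostr-tools dependency.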

Architecture notes for reviewers

Mode 1 mechanics

Stub mode is active (no real Lightning node). paymentHash is exposed on GET responses so the testkit can drive the full payment flow automatically. In production (real LNbits), paymentHash is hidden.
POST /api/dev/stub/pay/:hash is only mounted when NODE_ENV !== 'production'.
State machine advances server-side on every GET poll — no webhooks.
AI models: Haiku for eval (cheap gating), Sonnet for work (full output).
Pricing: eval = 10 sats fixed. Work invoice = actual token usage (input + output) × Anthropic per-token rate × 1.4 margin, converted at live BTC/USD. This is dynamic — a 53-char request typically produces an invoice of ~180 sats, not a fixed tier. The old 50/100/250 sat fixed tiers were replaced by this model.
Max request length: 500 chars. Rate limiter: 5 req/hr/IP on /api/demo (in-memory, resets on server restart).
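
The work-invoice formula above can be written out as a short function. This is a sketch, not the server's code: the per-token rates are passed in rather than hard-coded because this document does not list them, rounding up to a whole sat is an assumption, and all names are hypothetical.

```typescript
const SATS_PER_BTC = 100000000;
const MARGIN = 1.4; // margin stated in the pricing note

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// USD price per single token; actual values are not given in this document.
interface ModelRate {
  usdPerInputToken: number;
  usdPerOutputToken: number;
}

// Work invoice = token cost in USD x margin, converted to sats at btcUsd.
function workInvoiceSats(usage: TokenUsage, rate: ModelRate, btcUsd: number): number {
  const costUsd =
    usage.inputTokens * rate.usdPerInputToken +
    usage.outputTokens * rate.usdPerOutputToken;
  const chargedUsd = costUsd * MARGIN;
  // Assumption: round up so the invoice never undercharges by a fraction of a sat.
  return Math.ceil((chargedUsd / btcUsd) * SATS_PER_BTC);
}
```

Because the BTC/USD price is an input, the same token usage yields a different invoice as the exchange rate moves, which is why the amount is described as dynamic rather than tiered.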

Mode 2 mechanics

Minimum deposit: 100 sats. Maximum: 10,000 sats. Minimum working balance: 50 sats.
Session expiry: 24 hours of inactivity. Balance is forfeited on expiry. Expiry is stated in the expiresAt field of every session response.
Auth: Authorization: Bearer <macaroon> header. Macaroon is issued on first activation (GET /sessions/:id after deposit is paid).
Cost per request: (eval tokens + work tokens) × model rate × 1.4 margin → converted to sats. If a request starts with enough balance but actual cost pushes balance negative, the request still completes and delivers — only the next request is blocked.
If balance drops below 50 sats, session transitions to paused. Top up via POST /sessions/:id/topup. Session resumes automatically on the next GET poll once the topup invoice is paid.
The same POST /api/dev/stub/pay/:hash endpoint works for all invoice types (eval, work, session deposit, topup).
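
The balance rules above (deposit bounds, 50-sat working minimum, overdraft allowed on the request in flight) can be sketched as follows. The names are hypothetical and the model is simplified to the `active` and `paused` states; expiry and topup handling are left out.

```typescript
const MIN_DEPOSIT_SATS = 100;
const MAX_DEPOSIT_SATS = 10000;
const MIN_WORKING_BALANCE_SATS = 50;

// Simplified session model covering only the two states relevant to debits.
interface SessionBalance {
  state: "active" | "paused";
  balanceSats: number;
}

// Deposit must be a whole number of sats within the documented bounds.
function isValidDeposit(amountSats: number): boolean {
  return (
    Number.isInteger(amountSats) &&
    amountSats >= MIN_DEPOSIT_SATS &&
    amountSats <= MAX_DEPOSIT_SATS
  );
}

// A new request is only accepted while the session is active and funded.
function canSubmit(session: SessionBalance): boolean {
  return session.state === "active" && session.balanceSats >= MIN_WORKING_BALANCE_SATS;
}

// Apply the actual cost of a finished request. The balance may go negative:
// the finished request is still delivered and only the next one is blocked.
function debit(session: SessionBalance, costSats: number): SessionBalance {
  const balanceSats = session.balanceSats - costSats;
  return {
    balanceSats,
    state: balanceSats < MIN_WORKING_BALANCE_SATS ? "paused" : "active",
  };
}
```

Checking `canSubmit` before the request and calling `debit` after it mirrors the rule that cost is only known once the work has run.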

Eval + work latency (important for manual testers)

The eval call uses the real Anthropic API (Haiku), typically 2–5 seconds. The testkit uses polling loops (max 30s). Manual testers should poll with similar patience. The work call (Sonnet) typically runs 3–8 seconds.
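
For manual testers writing their own client, a polling loop with a 30-second budget might look like the sketch below. It is not the testkit's bash loop: `pollUntil` and its parameters are hypothetical, and the clock and sleep function are injected so the helper can be tested without real delays.

```typescript
// Repeatedly call fetchOnce until isDone(value) is true or the time budget
// would be exceeded. Resolves to the final value, or undefined on timeout.
async function pollUntil<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  timeoutMs: number,
  intervalMs: number,
  sleep: (ms: number) => Promise<void>,
  now: () => number
): Promise<T | undefined> {
  const deadline = now() + timeoutMs;
  while (true) {
    const value = await fetchOnce();
    if (isDone(value)) {
      return value;
    }
    if (now() + intervalMs > deadline) {
      return undefined; // the next attempt would land past the deadline
    }
    await sleep(intervalMs);
  }
}

// Usage sketch (getJobState is a hypothetical helper wrapping GET /api/jobs/:id):
//   const state = await pollUntil(
//     () => getJobState(jobId),
//     (s) => s === "complete" || s === "rejected",
//     30000,
//     1000,
//     (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
//     () => Date.now()
//   );
```

Since the state machine only advances on GET polls, each call to `fetchOnce` is also what moves the job forward on the server side.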

Test results log

Date	Tester	Score	Notes
2026-03-18	Perplexity Computer	20/20 PASS	Issue #22
2026-03-18	Hermes (Claude Opus 4)	19/20 (pre-fix)	Issue #23; 1 failure = test ordering bug (8c hit rate limiter before param check). Fixed in testkit v4.
2026-03-19	Replit Agent (post-fix)	20/20 PASS	Verified on fresh server after testkit v4 — all fixes confirmed
2026-03-18	Claude Opus 4.6	14/20 (pre-fix)	Issue #25; 2 failures = same rate-limit ordering as Hermes. Fixed in testkit v4.

Report template

Tester: [Claude / Perplexity / Kimi / Hermes / Human / Other]
Date:
Base URL tested:
Method: [Automated (curl … | bash) / Manual]

Mode 1 — Per-Job

Test	Pass / Fail / Skip	Latency	Notes
1 — Health check		—
2 — Create job		—
3 — Poll before payment		—
4 — Pay eval invoice		—
5 — Eval state advance		___s
6 — Pay work + result		___s
7 — Demo endpoint		___s
8a — Missing body		—
8b — Unknown job ID		—
8c — Demo missing param		—
8d — 501-char request		—
9 — Rate limiter		—
10 — Rejection path		___s

Mode 2 — Session

Test	Pass / Fail / Skip	Notes
11 — Create session
12 — Poll before payment
13 — Pay + activate
14 — Submit request
15 — Reject no macaroon
16 — Topup invoice

Overall verdict: Pass / Partial / Fail

Total: PASS=___ FAIL=___ SKIP=___

Issues found:

Observations on result quality:

Suggestions:
