TIMMY_TEST_PLAN.md

# Timmy API — Test Plan & Report Prompt

**What is Timmy?**
Timmy is a Lightning Network-gated AI agent API. Users pay Bitcoin (via Lightning) to submit requests to an AI agent (Claude). Two payment modes:

- **Mode 1 — Per-Job (live):** Pay per request. Eval invoice (10 sats fixed) → Haiku judges the request → work invoice (dynamic, token-based) → Sonnet executes → result delivered.
- **Mode 2 — Session (live):** Pre-fund a credit balance. Requests automatically debit actual compute cost (eval + work tokens × 1.4 margin, converted to sats at live BTC/USD). No per-job invoices once active.

**Live base URL:**
```
https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev
```

---

## Running the full test suite — one command

```bash
curl -s https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev/api/testkit | bash
```

The server returns a self-contained bash script with the base URL already baked in.
Requirements: `curl`, `bash`, `jq` — nothing else.

> **Note for repeat runs:** Tests 7 and 8c hit `GET /api/demo`, which is rate-limited to 5 req/hr per IP. If you run the testkit more than once in the same hour from the same IP, those two checks will return 429. This is expected behaviour — the rate limiter is working correctly. Run from a fresh IP (or wait an hour) for a clean 20/20.

---

## What the testkit covers

### Mode 1 — Per-Job (tests 1–10)

| # | Name | What it checks |
|---|------|----------------|
| 1 | Health check | `GET /api/healthz` → HTTP 200, `status=ok` |
| 2 | Create job | `POST /api/jobs` → HTTP 201, `jobId` + `evalInvoice.amountSats=10` |
| 3 | Poll before payment | `GET /api/jobs/:id` → `state=awaiting_eval_payment`, invoice echoed, `paymentHash` present in stub mode |
| 4 | Pay eval invoice | `POST /api/dev/stub/pay/:hash` → `{"ok":true}` |
| 5 | Eval state advance | Polls until `state=awaiting_work_payment` OR `state=rejected` (30s timeout) |
| 6 | Pay work + get result | Pays work invoice, polls until `state=complete`, `result` non-empty (30s timeout) |
| 7 | Demo endpoint | `GET /api/demo?request=...` → HTTP 200, coherent `result` |
| 8a | Missing body | `POST /api/jobs {}` → HTTP 400 |
| 8b | Unknown job ID | `GET /api/jobs/does-not-exist` → HTTP 404 |
| 8c | Demo missing param | `GET /api/demo` → HTTP 400 |
| 8d | 501-char request | `POST /api/jobs` with 501 chars → HTTP 400 mentioning "500 characters" |
| 9 | Rate limiter | 6× `GET /api/demo` → at least one HTTP 429 |
| 10 | Rejection path | Adversarial request goes through eval, polls until `state=rejected` with a non-empty `reason` |
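
For manual testers, the Mode 1 flow in the table above can be approximated with a few shell helpers. This is a rough sketch, not the testkit itself. The function names are made up for illustration, and the `request` body field is an assumption borrowed from the `/api/demo` query parameter. The exact location of `paymentHash` inside the job JSON is not documented here, so inspect the `GET` response and pass the hash by hand.

```bash
# Hypothetical helpers for driving Mode 1 by hand (not the official testkit).
BASE_URL="${BASE_URL:-https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev}"

# Create a job. ASSUMPTION: the request text goes in a JSON field named "request".
# jq -n builds the body so that quotes in the text are escaped safely.
create_job() {
  curl -s -X POST "$BASE_URL/api/jobs" \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg r "$1" '{request: $r}')"
}

# Fetch the job JSON; each GET also advances the server-side state machine.
get_job() {
  curl -s "$BASE_URL/api/jobs/$1"
}

# Mark an invoice as paid (stub mode only; the route is not mounted in production).
stub_pay() {
  curl -s -X POST "$BASE_URL/api/dev/stub/pay/$1"
}
```

A typical manual run might be `create_job "some request" | jq -r '.jobId'`, then `get_job <jobId> | jq .` to read the state and locate the payment hash, then `stub_pay <hash>`, repeating `get_job` until the state changes.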

### Mode 2 — Session (tests 11–16)

| # | Name | What it checks |
|---|------|----------------|
| 11 | Create session | `POST /api/sessions {"amount_sats":200}` → HTTP 201, `sessionId`, `state=awaiting_payment`, `invoice.amountSats=200` |
| 12 | Poll before payment | `GET /api/sessions/:id` → `state=awaiting_payment` before invoice is paid |
| 13 | Pay deposit + activate | Pays deposit via stub, polls GET → `state=active`, `balanceSats=200`, `macaroon` present |
| 14 | Submit request (accepted) | `POST /api/sessions/:id/request` with valid macaroon → `state=complete` OR `state=rejected`, `debitedSats>0`, `balanceRemaining` decremented |
| 15 | Request without macaroon | Same endpoint, no `Authorization` header → HTTP 401 |
| 16 | Topup invoice creation | `POST /api/sessions/:id/topup {"amount_sats":500}` with macaroon → HTTP 200, `topup.paymentRequest` present, `topup.amountSats=500` |
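
The session endpoints can be exercised by hand in a similar way. The helpers below are a sketch under assumptions: the function names are hypothetical, and the `request` body field for `POST /api/sessions/:id/request` is a guess that mirrors the `/api/demo` query parameter. The `amount_sats` bodies and the `Authorization: Bearer` header come from the tests above.

```bash
# Hypothetical helpers for driving Mode 2 by hand (not the official testkit).
BASE_URL="${BASE_URL:-https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev}"

# create_session AMOUNT_SATS
create_session() {
  curl -s -X POST "$BASE_URL/api/sessions" \
    -H 'Content-Type: application/json' \
    -d "{\"amount_sats\":$1}"
}

# get_session SESSION_ID
get_session() {
  curl -s "$BASE_URL/api/sessions/$1"
}

# session_request SESSION_ID MACAROON TEXT
# ASSUMPTION: the request text goes in a JSON field named "request".
session_request() {
  curl -s -X POST "$BASE_URL/api/sessions/$1/request" \
    -H "Authorization: Bearer $2" \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg r "$3" '{request: $r}')"
}

# session_topup SESSION_ID MACAROON AMOUNT_SATS
session_topup() {
  curl -s -X POST "$BASE_URL/api/sessions/$1/topup" \
    -H "Authorization: Bearer $2" \
    -H 'Content-Type: application/json' \
    -d "{\"amount_sats\":$3}"
}
```

Calling `session_request` with an empty macaroon argument still sends an `Authorization` header, so to reproduce test 15 it is simpler to issue the `curl` call directly without that header.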

---

## Architecture notes for reviewers

### Mode 1 mechanics
- Stub mode is active (no real Lightning node). `paymentHash` is exposed on GET responses so the testkit can drive the full payment flow automatically. In production (real LNbits), `paymentHash` is hidden.
- `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`.
- State machine advances server-side on every GET poll — no webhooks.
- AI models: Haiku for eval (cheap gating), Sonnet for work (full output).
- **Pricing:** eval = 10 sats fixed. Work invoice = actual token usage (input + output) × Anthropic per-token rate × 1.4 margin, converted at live BTC/USD. This is dynamic — a 53-char request typically produces an invoice of ~180 sats, not a fixed tier. The old 50/100/250 sat fixed tiers were replaced by this model.
- Max request length: 500 chars. Rate limiter: 5 req/hr/IP on `/api/demo` (in-memory, resets on server restart).
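
The pricing rule can be worked through as arithmetic. The function below is only an illustration of the formula as described: the name `work_invoice_sats` is hypothetical, separate input and output rates are an assumption, rounding up to whole sats is an assumption, and the rates and BTC price in the example are placeholders rather than Timmy's real values.

```bash
# work_invoice_sats INPUT_TOKENS OUTPUT_TOKENS IN_RATE OUT_RATE BTC_USD
# Rates are in USD per million tokens. Every number is supplied by the caller;
# nothing here is Timmy's real rate table.
work_invoice_sats() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_rate="$3" -v out_rate="$4" -v btc_usd="$5" 'BEGIN {
    margin = 1.4                                   # margin stated in the notes above
    usd  = (in_tok * in_rate + out_tok * out_rate) / 1e6 * margin
    sats = usd / btc_usd * 1e8                     # 1 BTC = 100,000,000 sats
    rounded = int(sats)
    if (rounded < sats) rounded++                  # ASSUMPTION: round up to whole sats
    print rounded
  }'
}

# Placeholder inputs: 100 input tokens, 800 output tokens, $3 and $15 per
# million tokens, BTC at $100,000.
work_invoice_sats 100 800 3 15 100000   # prints 18
```

With these placeholder numbers the cost is 0.0123 USD before margin, 0.01722 USD after it, and 17.22 sats at the assumed exchange rate, which rounds up to 18.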

### Mode 2 mechanics
- Minimum deposit: 100 sats. Maximum: 10,000 sats. Minimum working balance: 50 sats.
- Session expiry: 24 hours of inactivity. Balance is forfeited on expiry. Expiry is stated in the `expiresAt` field of every session response.
- Auth: `Authorization: Bearer <macaroon>` header. The macaroon is issued on first activation (the first `GET /api/sessions/:id` after the deposit is paid).
- Cost per request: (eval tokens + work tokens) × model rate × 1.4 margin → converted to sats. If a request starts with enough balance but actual cost pushes balance negative, the request still completes and delivers — only the *next* request is blocked.
- If the balance drops below 50 sats, the session transitions to `paused`. Top up via `POST /api/sessions/:id/topup`. The session resumes automatically on the next GET poll once the topup invoice is paid.
- The same `POST /api/dev/stub/pay/:hash` endpoint works for all invoice types (eval, work, session deposit, topup).
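
The balance rules above can be modelled in a few lines. This is one reading of the notes, not the server's code: the function names are hypothetical, and treating "enough balance" as "at least the 50-sat minimum working balance" is an assumption.

```bash
MIN_WORKING_BALANCE=50   # sats, from the notes above

# can_submit BALANCE -> exit status 0 if a new request would be accepted.
can_submit() {
  [ "$1" -ge "$MIN_WORKING_BALANCE" ]
}

# debit BALANCE COST -> prints "<new balance> <state>".
# The debit is applied even if it overshoots; only the resulting state changes.
debit() {
  local remaining=$(( $1 - $2 ))
  if [ "$remaining" -lt "$MIN_WORKING_BALANCE" ]; then
    echo "$remaining paused"
  else
    echo "$remaining active"
  fi
}

debit 200 30   # prints "170 active"
debit 60 80    # prints "-20 paused": the request was served, the next one is blocked
```

In the second call the session starts above the minimum, so the request runs to completion even though the final cost exceeds the balance.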

### Eval + work latency (important for manual testers)
The eval call uses the real Anthropic API (Haiku) and typically takes 2–5 seconds; the work call (Sonnet) typically takes 3–8 seconds. The testkit uses polling loops with a 30-second timeout, and manual testers should poll with similar patience.
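
A polling loop with a timeout might look like the sketch below. It is not taken from the testkit: `poll_until` and `job_state_is` are hypothetical names, and the predicate assumes `BASE_URL` is already set and that `jq` is installed.

```bash
# poll_until TIMEOUT_S INTERVAL_S COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds (exit 0) or TIMEOUT_S seconds of sleeping
# have accumulated. INTERVAL_S must be a whole number >= 1.
poll_until() {
  local timeout_s="$1" interval_s="$2" waited=0
  shift 2
  until "$@"; do
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 0
}

# Example predicate: succeeds once the job reports the wanted state.
# BASE_URL is assumed to point at the server under test.
job_state_is() {
  [ "$(curl -s "$BASE_URL/api/jobs/$1" | jq -r '.state')" = "$2" ]
}
```

For example, `poll_until 30 2 job_state_is "$JOB_ID" complete` checks every two seconds and gives up after roughly 30 seconds. Because each GET also advances the state machine, the poll itself is what moves the job forward.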

---

## Test results log

| Date | Tester | Score | Notes |
|------|--------|-------|-------|
| 2026-03-18 | Perplexity Computer | 20/20 PASS | Issue #22 |
| 2026-03-18 | Hermes (Claude Opus 4) | 19/20 (pre-fix) | Issue #23; 1 failure = test ordering bug (8c hit rate limiter before param check). Fixed in testkit v4. |
| 2026-03-19 | Replit Agent (post-fix) | 20/20 PASS | Verified on fresh server after testkit v4 — all fixes confirmed |
| 2026-03-18 | Claude Opus 4.6 | 14/20 (pre-fix) | Issue #25; 2 failures = same rate-limit ordering as Hermes. Fixed in testkit v4. |

---

## Report template

**Tester:** [Claude / Perplexity / Kimi / Hermes / Human / Other]
**Date:**
**Base URL tested:**
**Method:** [Automated (`curl … | bash`) / Manual]

### Mode 1 — Per-Job

| Test | Pass / Fail / Skip | Latency | Notes |
|------|-------------------|---------|-------|
| 1 — Health check | | — | |
| 2 — Create job | | — | |
| 3 — Poll before payment | | — | |
| 4 — Pay eval invoice | | — | |
| 5 — Eval state advance | | ___s | |
| 6 — Pay work + result | | ___s | |
| 7 — Demo endpoint | | ___s | |
| 8a — Missing body | | — | |
| 8b — Unknown job ID | | — | |
| 8c — Demo missing param | | — | |
| 8d — 501-char request | | — | |
| 9 — Rate limiter | | — | |
| 10 — Rejection path | | ___s | |

### Mode 2 — Session

| Test | Pass / Fail / Skip | Notes |
|------|-------------------|-------|
| 11 — Create session | | |
| 12 — Poll before payment | | |
| 13 — Pay + activate | | |
| 14 — Submit request | | |
| 15 — Reject no macaroon | | |
| 16 — Topup invoice | | |

**Overall verdict:** Pass / Partial / Fail

**Total:** PASS=___ FAIL=___ SKIP=___

**Issues found:**

**Observations on result quality:**

**Suggestions:**
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 6f2776b0-a913-41d3-a988-759a82feb6f3
Replit-Helium-Checkpoint-Created: true

											
										
										
											2026-03-18 17:30:13 +00:00
+								| 8b — Unknown job ID | | — | |
 								| 8c — Demo missing param | | — | |
-												Update test plan and script for dual-mode payment system

Refactor TIMMY_TEST_PLAN.md and timmy_test.sh to support dual-mode payments (per-job and session-based). Add new tests for session endpoints and gracefully handle rate limiting in existing tests.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 290ed20c-1ddc-4b42-810d-8415dd3a9c08
Replit-Helium-Checkpoint-Created: true

											
										
										
											2026-03-18 17:53:21 +00:00
+								| 8d — 501-char request | | — | |
-												Add automated testing script and expose payment hashes

Integrates a new bash script for automated end-to-end testing of the Timmy API. Updates API routes to expose payment hashes in stub mode for easier invoice payment simulation during testing. Modifies test plan documentation to include the new automated script.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 418bf6f8-212b-4bb0-a7a5-8231a061da4e
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 6f2776b0-a913-41d3-a988-759a82feb6f3
Replit-Helium-Checkpoint-Created: true

											
										
										
											2026-03-18 17:30:13 +00:00
+								| 9 — Rate limiter | | — | |
-												fix(testkit): macOS compat + fix test 8c ordering (#24)

											
										
										
											2026-03-18 21:01:13 -04:00
+								| 10 — Rejection path | | ___s | |

### Mode 2 — Session

| Test | Pass / Fail / Skip | Notes |
|------|-------------------|-------|
| 11 — Create session | | |
| 12 — Poll before payment | | |
| 13 — Pay + activate | | |
| 14 — Submit request | | |
| 15 — Reject no macaroon | | |
| 16 — Topup invoice | | |

**Overall verdict:** Pass / Partial / Fail

**Total:** PASS=___ FAIL=___ SKIP=___

**Issues found:**

**Observations on result quality:**

**Suggestions:**