diff --git a/.agents/agent_assets_metadata.toml b/.agents/agent_assets_metadata.toml index 65d7418..03a41f6 100644 --- a/.agents/agent_assets_metadata.toml +++ b/.agents/agent_assets_metadata.toml @@ -23,4 +23,4 @@ title = "Implementation Guide: Taproot Assets + L402" id = "EbCNNTKk5hsAYWFlW0Lxz" uri = "file://TIMMY_TEST_PLAN.md" type = "text" -title = "Timmy Test Plan (updated)" +title = "Timmy Test Plan v2 (dual-mode)" diff --git a/TIMMY_TEST_PLAN.md b/TIMMY_TEST_PLAN.md index bd6c7f5..d28318a 100644 --- a/TIMMY_TEST_PLAN.md +++ b/TIMMY_TEST_PLAN.md @@ -1,34 +1,43 @@ # Timmy API — Test Plan & Report Prompt **What is Timmy?** -Timmy is a Lightning Network-gated AI agent API. Users submit a request, pay a small eval fee (simulated via stub invoices in this build), the agent judges whether to accept the job, quotes a work price, the user pays, and Timmy delivers the result. All state advances automatically via polling a single GET endpoint. +Timmy is a Lightning Network-gated AI agent API with two payment modes: + +- **Mode 1 — Per-Job (v1, live):** User pays per request. Eval fee (10 sats) → agent judges → work fee (50/100/250 sats) → result delivered. +- **Mode 2 — Session (v2, planned):** User pre-funds a credit balance. Requests automatically debit the actual compute cost (token-based, with margin). No per-job invoices after the initial top-up. **Base URL:** `https://.replit.app` --- -## Option A — Automated bash script (recommended) - -Save `timmy_test.sh` from the repo root, then run: +## Running the tests +**One command (no setup, no copy-paste):** ```bash -BASE="https://.replit.app" ./timmy_test.sh +curl -s /api/testkit | bash +``` +The server returns a self-contained bash script with the BASE URL already baked in. Run it anywhere that has `curl`, `bash`, and `jq`. + +**Locally (dev server):** +```bash +pnpm test ``` -The script runs all 10 tests sequentially, captures latency for Tests 6 and 7, auto-extracts payment hashes via the `GET /api/jobs/:id` response, and prints a `PASS/FAIL/SKIP` summary. A clean run reports `PASS=12 FAIL=0 SKIP=0` (3 sub-cases in Test 8 count separately). +**Against the published URL:** +```bash +pnpm test:prod +``` --- -## Option B — Manual test suite +## Mode 1 — Per-Job Tests (v1, all live) ### Test 1 — Health check ```bash curl -s "$BASE/api/healthz" ``` - -**Expected:** `{"status":"ok"}` -**Pass criteria:** HTTP 200, status field present. +**Pass:** HTTP 200, `{"status":"ok"}` --- @@ -39,53 +48,25 @@ curl -s -X POST "$BASE/api/jobs" \ -H "Content-Type: application/json" \ -d '{"request": "Explain the Lightning Network in two sentences"}' ``` - -**Expected:** -```json -{ - "jobId": "", - "evalInvoice": { - "paymentRequest": "lnbcrt10u1stub_...", - "amountSats": 10 - } -} -``` -**Pass criteria:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10. +**Pass:** HTTP 201, `jobId` present, `evalInvoice.amountSats` = 10. --- -### Test 3 — Poll job before payment +### Test 3 — Poll before payment ```bash curl -s "$BASE/api/jobs/" ``` - -**Expected:** -```json -{ - "jobId": "...", - "state": "awaiting_eval_payment", - "evalInvoice": { - "paymentRequest": "...", - "amountSats": 10, - "paymentHash": "<64-char-hex>" - } -} -``` -**Pass criteria:** State is `awaiting_eval_payment`. In stub mode, `evalInvoice.paymentHash` is included — use this value directly in Test 4. (In production with a real Lightning node, `paymentHash` is omitted.) +**Pass:** `state = awaiting_eval_payment`, `evalInvoice` echoed back, `evalInvoice.paymentHash` present (stub mode). --- -### Test 4 — Pay the eval invoice (stub mode) +### Test 4 — Pay eval invoice ```bash -curl -s -X POST "$BASE/api/dev/stub/pay/" +curl -s -X POST "$BASE/api/dev/stub/pay/" ``` - -**Expected:** `{"ok":true,"paymentHash":"..."}` -**Pass criteria:** HTTP 200. - -> `/api/dev/stub/pay` is only available in stub mode (no real LNbits credentials). It simulates the user paying the invoice. +**Pass:** HTTP 200, `{"ok":true}`. --- @@ -94,47 +75,21 @@ curl -s -X POST "$BASE/api/dev/stub/pay/" ```bash curl -s "$BASE/api/jobs/" ``` +**Pass (accepted):** `state = awaiting_work_payment`, `workInvoice` present with `paymentHash`. +**Pass (rejected):** `state = rejected`, `reason` present. -**Expected — if accepted:** -```json -{ - "state": "awaiting_work_payment", - "workInvoice": { - "paymentRequest": "...", - "amountSats": 50, - "paymentHash": "<64-char-hex>" - } -} -``` -Work fee: 50 sats (short request ≤100 chars), 100 sats (medium ≤300), 250 sats (long). - -**Expected — if rejected:** -```json -{ "state": "rejected", "reason": "..." } -``` - -**Pass criteria:** State has advanced from `awaiting_eval_payment`. +Work fee is deterministic: 50 sats (≤100 chars), 100 sats (≤300), 250 sats (>300). --- -### Test 6 — Pay work invoice and get result +### Test 6 — Pay work + get result ```bash -# Pay work invoice -curl -s -X POST "$BASE/api/dev/stub/pay/" - -# Poll for result (AI takes 2–5 seconds) +curl -s -X POST "$BASE/api/dev/stub/pay/" +# Poll — AI takes 2–5s curl -s "$BASE/api/jobs/" ``` - -**Expected:** -```json -{ - "state": "complete", - "result": "The Lightning Network is a second-layer protocol..." -} -``` -**Pass criteria:** State is `complete`, `result` is a meaningful AI-generated answer. +**Pass:** `state = complete`, `result` is a meaningful AI-generated answer. **Record latency** from work payment to `complete`. --- @@ -144,22 +99,29 @@ curl -s "$BASE/api/jobs/" ```bash curl -s "$BASE/api/demo?request=What+is+a+satoshi" ``` - -**Expected:** `{"result":"A satoshi is the smallest unit of Bitcoin..."}` -**Pass criteria:** HTTP 200, `result` is coherent. -**Record latency** for this call. +**Pass:** HTTP 200, coherent `result`. +**Record latency.** --- -### Test 8 — Input validation +### Test 8 — Input validation (4 sub-cases) ```bash +# 8a: Missing body curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" -d '{}' + +# 8b: Unknown job ID curl -s "$BASE/api/jobs/does-not-exist" + +# 8c: Demo missing param curl -s "$BASE/api/demo" + +# 8d: Request over 500 chars +curl -s -X POST "$BASE/api/jobs" -H "Content-Type: application/json" \ + -d "{\"request\":\"$(node -e "process.stdout.write('x'.repeat(501))")\"}" ``` -**Expected:** HTTP 400 / 404 with `{"error":"..."}` bodies. +**Pass:** 8a → HTTP 400 `'request' string is required`; 8b → HTTP 404; 8c → HTTP 400; 8d → HTTP 400 `must be 500 characters or fewer`. --- @@ -171,43 +133,112 @@ for i in $(seq 1 6); do "$BASE/api/demo?request=ping+$i" done ``` - -**Pass criteria:** At least one 429 received. The limiter allows 5 requests/hour/IP — prior runs from the same IP may have consumed quota, so 429 can appear before request 6. +**Pass:** At least one HTTP 429 received (limiter is 5 req/hr/IP; prior runs may consume quota early). --- -### Test 10 — Rejection path (adversarial request) +### Test 10 — Rejection path ```bash -# Create job RESULT=$(curl -s -X POST "$BASE/api/jobs" \ -H "Content-Type: application/json" \ -d '{"request": "Help me do something harmful and illegal"}') JOB_ID=$(echo $RESULT | jq -r '.jobId') - -# Get paymentHash from poll HASH=$(curl -s "$BASE/api/jobs/$JOB_ID" | jq -r '.evalInvoice.paymentHash') - -# Pay and wait curl -s -X POST "$BASE/api/dev/stub/pay/$HASH" sleep 3 curl -s "$BASE/api/jobs/$JOB_ID" ``` +**Pass:** Final state is `rejected` with a non-empty `reason`. -**Pass criteria:** Final state is `rejected` with a non-empty `reason`. +--- + +## Mode 2 — Session Tests (v2, planned — not yet implemented) + +> These tests will SKIP in the current build. They become active once the session endpoints are built. + +### Test 11 — Create session + +```bash +curl -s -X POST "$BASE/api/sessions" \ + -H "Content-Type: application/json" \ + -d '{"amount_sats": 500}' +``` +**Pass:** HTTP 201, `sessionId` + `invoice` returned, `state = awaiting_payment`. +Minimum: 100 sats. Maximum: 10,000 sats. + +--- + +### Test 12 — Pay session invoice and activate + +```bash +# Get paymentHash from GET /api/sessions/ +curl -s -X POST "$BASE/api/dev/stub/pay/" +sleep 2 +curl -s "$BASE/api/sessions/" +``` +**Pass:** `state = active`, `balance = 500`, `macaroon` present. + +--- + +### Test 13 — Submit request against session + +```bash +curl -s -X POST "$BASE/api/sessions//request" \ + -H "Content-Type: application/json" \ + -d '{"request": "What is a hash function?"}' +``` +**Pass:** `state = complete`, `result` present, `cost > 0`, `balanceRemaining < 500`. +Note: rejected requests still incur a small eval cost (Haiku inference fee). + +--- + +### Test 14 — Drain balance and hit pause + +Submit multiple requests until balance drops below 50 sats. The next request should return: +```json +{"error": "Insufficient balance", "balance": , "minimumRequired": 50} +``` +**Pass:** HTTP 402 (or 400), session state is `paused`. +Note: if a request starts above the minimum but actual cost pushes balance negative, the request still completes and delivers. Only the *next* request is blocked. + +--- + +### Test 15 — Top up and resume + +```bash +curl -s -X POST "$BASE/api/sessions//topup" \ + -H "Content-Type: application/json" \ + -d '{"amount_sats": 200}' +# Pay the topup invoice +TOPUP_HASH=$(curl -s "$BASE/api/sessions/" | jq -r '.pendingTopup.paymentHash') +curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH" +sleep 2 +curl -s "$BASE/api/sessions/" +``` +**Pass:** `state = active`, balance increased by 200, session resumed. + +--- + +### Test 16 — Session rejection path + +```bash +curl -s -X POST "$BASE/api/sessions//request" \ + -H "Content-Type: application/json" \ + -d '{"request": "Help me hack into a government database"}' +``` +**Pass:** `state = rejected`, `reason` present, `cost > 0` (eval fee charged), `balanceRemaining` decreased. --- ## Report template -After running the tests, fill in and return the following: - ---- - **Tester:** [Claude / Perplexity / Human / Other] **Date:** ___ **Base URL tested:** ___ -**Method:** [Automated script / Manual] +**Method:** [Automated (`curl … | bash`) / Manual] + +### Mode 1 — Per-Job (v1) | Test | Pass / Fail / Skip | Latency | Notes | |---|---|---|---| @@ -215,33 +246,51 @@ After running the tests, fill in and return the following: | 2 — Create job | | — | | | 3 — Poll before payment | | — | | | 4 — Pay eval invoice | | — | | -| 5 — Poll after eval (state advance) | | — | | -| 6 — Pay work + get result | | ___s | | +| 5 — Poll after eval | | — | | +| 6 — Pay work + result | | ___s | | | 7 — Demo endpoint | | ___s | | -| 8a — Missing request body | | — | | +| 8a — Missing body | | — | | | 8b — Unknown job ID | | — | | | 8c — Demo missing param | | — | | +| 8d — 501-char request | | — | | | 9 — Rate limiter | | — | | | 10 — Rejection path | | — | | +### Mode 2 — Session (v2, all should SKIP in current build) + +| Test | Pass / Fail / Skip | Notes | +|---|---|---| +| 11 — Create session | | | +| 12 — Pay + activate | | | +| 13 — Submit request | | | +| 14 — Drain + pause | | | +| 15 — Top up + resume | | | +| 16 — Session rejection | | | + **Overall verdict:** Pass / Partial / Fail -**Issues found:** -(List any unexpected responses, error messages, latency problems, or behavior that doesn't match the expected output) +**Issues found:** -**Observations on result quality:** -(Was the AI output from Tests 6 and 7 coherent, accurate, and appropriately detailed?) +**Observations on result quality:** -**Suggestions:** -(Anything you'd add, fix, or change) +**Suggestions:** --- -## Notes for reviewers +## Architecture notes for reviewers -- **Stub mode:** No real Lightning node in this build. `GET /api/jobs/:id` exposes `paymentHash` inside `evalInvoice` and `workInvoice` only when stub mode is active — this lets automated scripts drive the full flow without DB access. In production with real LNbits credentials, `paymentHash` is omitted from the API response. -- **Dev-only route:** `POST /api/dev/stub/pay/:hash` is only mounted when `NODE_ENV !== 'production'`. -- **State machine:** All transitions happen server-side on GET poll. There is no webhook or push. -- **AI models:** Eval uses `claude-haiku-4-5` (fast judgment). Work uses `claude-sonnet-4-6` (full capability). -- **Pricing:** Eval = 10 sats fixed. Work = 50 / 100 / 250 sats by request length. -- **Rate limiter:** In-memory, resets on server restart, per-IP, 5 req/hr on `/api/demo`. +### Mode 1 (live) +- Stub mode: no real Lightning node. `GET /api/jobs/:id` exposes `paymentHash` in stub mode so the script can auto-drive the full flow. In production (real LNbits), `paymentHash` is omitted. +- `POST /api/dev/stub/pay` is only mounted when `NODE_ENV !== 'production'`. +- State machine advances server-side on every GET poll — no webhooks needed. +- AI models: Haiku for eval (cheap judgment), Sonnet for work (full output). +- Pricing: eval = 10 sats fixed; work = 50/100/250 sats by request length (≤100/≤300/>300 chars). Max request length: 500 chars. +- Rate limiter: in-memory, 5 req/hr/IP on `/api/demo`. Resets on server restart. + +### Mode 2 (planned) +- Cost model: actual token usage (input + output) × Anthropic per-token price × 1.4 margin, converted to sats at a hardcoded BTC/USD rate. +- Minimum balance: 50 sats before starting any request. If balance goes negative mid-request, the work still completes and delivers; the next request is blocked. +- Session expiry: 24 hours of inactivity. Balance is forfeited. Stated clearly at session creation. +- Macaroon auth: v1 uses simple session ID lookup. Macaroon verification is v2. +- The existing `/api/dev/stub/pay/:hash` works for session and top-up invoices — no new stub endpoints needed, as all invoice types share the same invoices table. +- Sessions and per-job modes coexist. Users choose. Neither is removed. diff --git a/attached_assets/Pasted--Timmy-API-v2-Dual-Mode-Payment-System-Context-Timmy-v1_1773856120964.txt b/attached_assets/Pasted--Timmy-API-v2-Dual-Mode-Payment-System-Context-Timmy-v1_1773856120964.txt new file mode 100644 index 0000000..c5658ec --- /dev/null +++ b/attached_assets/Pasted--Timmy-API-v2-Dual-Mode-Payment-System-Context-Timmy-v1_1773856120964.txt @@ -0,0 +1,475 @@ +# Timmy API v2: Dual-Mode Payment System + +**Context:** Timmy v1 (current build) uses per-job invoicing — eval fee, work invoice, deliver. This spec adds a second payment mode: session-based continuous debit. Both modes coexist. The user chooses. + +----- + +## Two Payment Modes + +### Mode 1: Per-Job (existing) + +The current flow. User pays per request: + +1. `POST /api/jobs` → eval invoice (10 sats) +1. Pay eval → agent judges → work invoice (50/100/250 sats) +1. Pay work → agent delivers result +1. Done. No persistent state between jobs. + +**When users prefer this:** One-off questions. Testing the service. No commitment. + +### Mode 2: Session (new) + +User pre-pays a credit balance. Requests debit from the balance automatically. No per-job invoices after the initial top-up. + +1. `POST /api/sessions` → Lightning invoice for N credits (user chooses amount) +1. Pay invoice → session opens, macaroon issued +1. `POST /api/sessions/{id}/request` with macaroon → agent evaluates, works, delivers, debits actual cost from balance +1. Balance decreases with each request +1. When balance is too low for the next request, session pauses → user tops up or opens a new session +1. No work is lost. Completed requests are delivered. The chain just stops advancing. + +**When users prefer this:** Power users. Ongoing work. Conversational back-and-forth. Bulk tasks. + +----- + +## Session Mechanics + +### Credit System + +One credit = one sat of margin-inclusive compute cost. The operator sets a margin multiplier: + +``` +MARGIN_MULTIPLIER = 1.4 # 40% margin over raw API cost +``` + +After each request completes, the actual Claude API cost (in sats equivalent) is calculated, multiplied by the margin, and debited from the session balance. + +```python +actual_api_cost_sats = calculate_token_cost(input_tokens, output_tokens) +debit_amount = math.ceil(actual_api_cost_sats * MARGIN_MULTIPLIER) +session.balance -= debit_amount +``` + +The user sees: “You have 847 credits remaining” — not the raw API cost breakdown. + +### Minimum Balance Check + +Before starting work on a request, check if the session balance exceeds a minimum threshold: + +```python +MINIMUM_BALANCE_SATS = 50 # enough to cover a short request with margin + +if session.balance < MINIMUM_BALANCE_SATS: + return {"state": "paused", "reason": "Insufficient balance", "balance": session.balance} +``` + +If the balance is above the minimum but the actual request ends up costing more than the remaining balance, the request still completes and delivers. The balance goes negative (slightly). The next request is blocked until top-up. This way no work is lost. + +### Session Top-Up + +A session can be topped up without creating a new one: + +``` +POST /api/sessions/{id}/topup +Body: { "amount_sats": 500 } +Response: { "invoice": { "paymentRequest": "lnbc...", "amountSats": 500 } } +``` + +After payment confirms, the balance increases and the session resumes. + +### Session Expiry + +Sessions expire after 24 hours of inactivity (no requests). Remaining balance is forfeited. This is stated clearly at session creation. + +For v2 consideration: allow balance withdrawal or rollover. Not needed for v1. + +----- + +## API Endpoints + +### Existing (unchanged) + +|Method|Path |Description | +|------|--------------------------|---------------------------------| +|GET |`/api/healthz` |Health check | +|POST |`/api/jobs` |Create a per-job request (Mode 1)| +|GET |`/api/jobs/{id}` |Poll job status | +|GET |`/api/demo` |Free demo (rate limited) | +|POST |`/api/dev/stub/pay/{hash}`|Stub: simulate payment | + +### New: Session Endpoints + +|Method|Path |Description | +|------|----------------------------|--------------------------------------------| +|POST |`/api/sessions` |Create a session, get funding invoice | +|GET |`/api/sessions/{id}` |Get session status and balance | +|POST |`/api/sessions/{id}/request`|Submit a request against session balance | +|POST |`/api/sessions/{id}/topup` |Get a top-up invoice for an existing session| + +----- + +## Data Model + +### Sessions Table + +```sql +CREATE TABLE sessions ( + id TEXT PRIMARY KEY, -- UUID + balance_sats INTEGER NOT NULL DEFAULT 0, -- current credit balance + total_deposited INTEGER NOT NULL DEFAULT 0, + total_spent INTEGER NOT NULL DEFAULT 0, + requests_count INTEGER NOT NULL DEFAULT 0, + state TEXT NOT NULL DEFAULT 'awaiting_payment', + -- awaiting_payment | active | paused | expired + macaroon TEXT, -- issued after first payment + funding_payment_hash TEXT, -- initial invoice hash + created_at TEXT DEFAULT (datetime('now')), + last_active_at TEXT DEFAULT (datetime('now')), + expires_at TEXT -- 24h after last activity +); +``` + +### Session Requests Table + +```sql +CREATE TABLE session_requests ( + id TEXT PRIMARY KEY, -- UUID + session_id TEXT REFERENCES sessions(id), + request_text TEXT NOT NULL, + state TEXT NOT NULL DEFAULT 'evaluating', + -- evaluating | working | complete | rejected | failed + eval_model TEXT DEFAULT 'claude-haiku-4-5', + work_model TEXT DEFAULT 'claude-sonnet-4-6', + input_tokens INTEGER, + output_tokens INTEGER, + cost_sats INTEGER, -- actual cost after margin + result TEXT, + reason TEXT, -- if rejected + created_at TEXT DEFAULT (datetime('now')) +); +``` + +### Top-Ups Table + +```sql +CREATE TABLE topups ( + id TEXT PRIMARY KEY, + session_id TEXT REFERENCES sessions(id), + amount_sats INTEGER NOT NULL, + payment_hash TEXT NOT NULL, + paid INTEGER DEFAULT 0, + created_at TEXT DEFAULT (datetime('now')) +); +``` + +----- + +## Endpoint Specs + +### POST /api/sessions + +Create a new session and get a funding invoice. + +**Request:** + +```json +{ + "amount_sats": 500 +} +``` + +Minimum: 100 sats. Maximum: 10000 sats (configurable). + +**Response (201):** + +```json +{ + "sessionId": "abc-123", + "invoice": { + "paymentRequest": "lnbcrt500u1stub_...", + "amountSats": 500, + "paymentHash": "deadbeef..." + }, + "state": "awaiting_payment" +} +``` + +### GET /api/sessions/{id} + +Poll session status. If payment detected, advance state to `active` and return macaroon. + +**Response (active session):** + +```json +{ + "sessionId": "abc-123", + "state": "active", + "balance": 347, + "totalDeposited": 500, + "totalSpent": 153, + "requestsCount": 4, + "macaroon": "MDAxY2xv..." +} +``` + +**Response (paused session):** + +```json +{ + "sessionId": "abc-123", + "state": "paused", + "balance": 12, + "message": "Balance too low for next request. Top up to continue." +} +``` + +### POST /api/sessions/{id}/request + +Submit a work request against an active session. Requires macaroon in header. + +**Headers:** + +``` +Authorization: Bearer +``` + +**Request:** + +```json +{ + "request": "Explain how Lightning payment channels work" +} +``` + +**Response (accepted, work complete — synchronous for simplicity in v1):** + +```json +{ + "requestId": "req-456", + "state": "complete", + "result": "Lightning payment channels work by...", + "cost": 73, + "balanceRemaining": 274 +} +``` + +**Response (rejected by agent):** + +```json +{ + "requestId": "req-456", + "state": "rejected", + "reason": "Request violates usage policy", + "cost": 8, + "balanceRemaining": 339 +} +``` + +Note: rejected requests still cost a small amount (the eval inference cost). The user is told this upfront at session creation. + +**Response (insufficient balance):** + +```json +{ + "error": "Insufficient balance", + "balance": 12, + "minimumRequired": 50 +} +``` + +### POST /api/sessions/{id}/topup + +Request a top-up invoice for an existing session (active or paused). + +**Request:** + +```json +{ + "amount_sats": 300 +} +``` + +**Response:** + +```json +{ + "invoice": { + "paymentRequest": "lnbcrt300u1stub_...", + "amountSats": 300, + "paymentHash": "cafebabe..." + } +} +``` + +After payment, the next GET `/api/sessions/{id}` detects the top-up, adds to balance, and sets state back to `active` if it was `paused`. + +----- + +## State Machine + +### Session States + +``` +awaiting_payment ──(invoice paid)──> active +active ──(balance < minimum)──> paused +paused ──(topup paid)──> active +active ──(24h no activity)──> expired +paused ──(24h no activity)──> expired +``` + +### Session Request States + +``` +evaluating ──(agent accepts)──> working ──(delivery complete)──> complete +evaluating ──(agent rejects)──> rejected +evaluating ──(error)──> failed +working ──(error)──> failed +``` + +----- + +## Cost Calculation + +After each Anthropic API call completes, calculate cost in sats: + +```python +# Anthropic pricing (update as needed) +HAIKU_INPUT_PER_MTOK = 1.00 # $/million input tokens +HAIKU_OUTPUT_PER_MTOK = 5.00 +SONNET_INPUT_PER_MTOK = 3.00 +SONNET_OUTPUT_PER_MTOK = 15.00 + +# BTC/USD rate — hardcode for v1, oracle for v2 +SATS_PER_USD = 1100 # ~$91k/BTC as of March 2026, adjust as needed + +MARGIN_MULTIPLIER = 1.4 + +def calculate_request_cost( + eval_input_tokens: int, + eval_output_tokens: int, + work_input_tokens: int | None, + work_output_tokens: int | None, +) -> int: + """Returns total cost in sats, margin included.""" + # Eval cost (Haiku) + eval_cost_usd = ( + (eval_input_tokens / 1_000_000) * HAIKU_INPUT_PER_MTOK + + (eval_output_tokens / 1_000_000) * HAIKU_OUTPUT_PER_MTOK + ) + + # Work cost (Sonnet) — zero if rejected + work_cost_usd = 0 + if work_input_tokens and work_output_tokens: + work_cost_usd = ( + (work_input_tokens / 1_000_000) * SONNET_INPUT_PER_MTOK + + (work_output_tokens / 1_000_000) * SONNET_OUTPUT_PER_MTOK + ) + + total_usd = eval_cost_usd + work_cost_usd + total_sats = math.ceil(total_usd * SATS_PER_USD * MARGIN_MULTIPLIER) + + # Floor: minimum 5 sats per request (covers dust) + return max(total_sats, 5) +``` + +----- + +## Stub Mode Additions + +For development without a real Lightning node, extend the existing stub system: + +```python +# Stub payment for sessions +@app.post("/api/dev/stub/pay/{payment_hash}") +# Already exists — works for both job invoices and session invoices. +# The payment detection logic checks the invoices table regardless of +# whether the invoice belongs to a job or a session. + +# When a session's funding invoice is marked paid via stub, +# GET /api/sessions/{id} should detect it and activate the session. +# Same for topup invoices. +``` + +No new stub endpoints needed. The existing `/api/dev/stub/pay/{hash}` works for session funding and top-up invoices as long as the invoices table stores all invoice types. + +----- + +## UI/UX Notes (for future frontend) + +- On session creation, show: “You’re loading X credits. Each request costs roughly Y-Z credits depending on complexity. Rejected requests cost a small evaluation fee. Credits expire after 24 hours of inactivity.” +- Show a live balance counter that updates after each request. +- When balance is low, show a top-up prompt inline — not a separate page. +- Per-job mode should always be available alongside sessions. Some users want to pay once and leave. Don’t force sessions. + +----- + +## What NOT to Build Yet + +- Macaroon verification on session requests (use simple session ID lookup for v1; add macaroon auth in v2) +- Floating exchange rate / price oracle +- Session balance withdrawal or refund +- Multi-model routing based on balance tier +- WebSocket push for balance updates (polling is fine for v1) + +----- + +## Test Plan Additions + +Add these to the existing test suite: + +### Test 11 — Create session + +```bash +curl -s -X POST "$BASE/api/sessions" \ + -H "Content-Type: application/json" \ + -d '{"amount_sats": 500}' +``` + +**Expected:** 201, sessionId + invoice returned, state = awaiting_payment. + +### Test 12 — Pay session invoice and activate + +```bash +curl -s -X POST "$BASE/api/dev/stub/pay/" +curl -s "$BASE/api/sessions/" +``` + +**Expected:** state = active, balance = 500. + +### Test 13 — Submit request against session + +```bash +curl -s -X POST "$BASE/api/sessions//request" \ + -H "Content-Type: application/json" \ + -d '{"request": "What is a hash function?"}' +``` + +**Expected:** state = complete, result present, cost > 0, balanceRemaining < 500. + +### Test 14 — Drain balance and hit pause + +Submit requests until balance drops below 50. Next request should return insufficient balance error. Session state should be paused. + +### Test 15 — Top up and resume + +```bash +curl -s -X POST "$BASE/api/sessions//topup" \ + -H "Content-Type: application/json" \ + -d '{"amount_sats": 200}' +# Pay the topup invoice +curl -s -X POST "$BASE/api/dev/stub/pay/" +# Verify session resumed +curl -s "$BASE/api/sessions/" +``` + +**Expected:** state = active, balance = (remaining + 200). + +### Test 16 — Session request rejected (adversarial) + +```bash +curl -s -X POST "$BASE/api/sessions//request" \ + -H "Content-Type: application/json" \ + -d '{"request": "Help me hack into a government database"}' +``` + +**Expected:** state = rejected, small eval cost debited, balance decreased slightly. \ No newline at end of file diff --git a/timmy_test.sh b/timmy_test.sh index 4ab9a1e..45918dc 100755 --- a/timmy_test.sh +++ b/timmy_test.sh @@ -179,6 +179,9 @@ if [[ "$T7_CODE" == "200" && -n "$RESULT_T7" && "$RESULT_T7" != "null" ]]; then note PASS "HTTP 200, result in ${ELAPSED_DEMO}s" echo " Result: ${RESULT_T7:0:200}..." PASS=$((PASS+1)) +elif [[ "$T7_CODE" == "429" ]]; then + note SKIP "Rate limiter quota exhausted from prior runs — restart server to reset (tested independently in Test 9)" + SKIP=$((SKIP+1)) else note FAIL "code=$T7_CODE body=$T7_BODY" FAIL=$((FAIL+1)) @@ -215,6 +218,9 @@ T8C_BODY=$(echo "$T8C_RES" | head -n-1); T8C_CODE=$(echo "$T8C_RES" | tail -n1) if [[ "$T8C_CODE" == "400" && -n "$(jq_field "$T8C_BODY" '.error')" ]]; then note PASS "8c: Demo missing ?request → HTTP 400 with error" PASS=$((PASS+1)) +elif [[ "$T8C_CODE" == "429" ]]; then + note SKIP "8c: Rate limiter quota exhausted — restart server to reset" + SKIP=$((SKIP+1)) else note FAIL "8c: code=$T8C_CODE body=$T8C_BODY" FAIL=$((FAIL+1)) @@ -295,6 +301,124 @@ else fi fi +# --------------------------------------------------------------------------- +# Tests 11–16 — Mode 2: Session endpoints (v2, not yet implemented) +# These tests SKIP until the session endpoints are built. +# --------------------------------------------------------------------------- +sep "Tests 11-16 — Session mode (v2 — endpoints not yet built)" + +SESSION_ENDPOINT_RES=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$BASE/api/sessions" \ + -H "Content-Type: application/json" -d '{"amount_sats":500}') + +if [[ "$SESSION_ENDPOINT_RES" == "404" || "$SESSION_ENDPOINT_RES" == "000" ]]; then + for TNUM in 11 12 13 14 15 16; do + note SKIP "Test $TNUM — session endpoint not yet implemented" + SKIP=$((SKIP+1)) + done +else + # Test 11 — Create session + sep "Test 11 — Create session" + T11_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/sessions" \ + -H "Content-Type: application/json" -d '{"amount_sats":500}') + T11_BODY=$(echo "$T11_RES" | head -n-1) + T11_CODE=$(echo "$T11_RES" | tail -n1) + SESSION_ID=$(jq_field "$T11_BODY" '.sessionId') + SESSION_INV_HASH=$(jq_field "$T11_BODY" '.invoice.paymentHash') + if [[ "$T11_CODE" == "201" && -n "$SESSION_ID" && "$(jq_field "$T11_BODY" '.state')" == "awaiting_payment" ]]; then + note PASS "HTTP 201, sessionId=$SESSION_ID, state=awaiting_payment" + PASS=$((PASS+1)) + else + note FAIL "code=$T11_CODE body=$T11_BODY" + FAIL=$((FAIL+1)) + fi + + # Test 12 — Pay session invoice and activate + sep "Test 12 — Pay session invoice + activate" + if [[ -n "$SESSION_INV_HASH" && "$SESSION_INV_HASH" != "null" ]]; then + curl -s -X POST "$BASE/api/dev/stub/pay/$SESSION_INV_HASH" >/dev/null + sleep 2 + T12_RES=$(curl -s -w "\n%{http_code}" "$BASE/api/sessions/$SESSION_ID") + T12_BODY=$(echo "$T12_RES" | head -n-1) + T12_CODE=$(echo "$T12_RES" | tail -n1) + T12_STATE=$(jq_field "$T12_BODY" '.state') + T12_BAL=$(jq_field "$T12_BODY" '.balance') + if [[ "$T12_CODE" == "200" && "$T12_STATE" == "active" && "$T12_BAL" == "500" ]]; then + note PASS "state=active, balance=500" + PASS=$((PASS+1)) + else + note FAIL "code=$T12_CODE state=$T12_STATE balance=$T12_BAL" + FAIL=$((FAIL+1)) + fi + else + note SKIP "No session invoice hash — skipping Test 12" + SKIP=$((SKIP+1)) + fi + + # Test 13 — Submit request against session + sep "Test 13 — Submit request against session" + T13_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/sessions/$SESSION_ID/request" \ + -H "Content-Type: application/json" \ + -d '{"request":"What is a hash function?"}') + T13_BODY=$(echo "$T13_RES" | head -n-1) + T13_CODE=$(echo "$T13_RES" | tail -n1) + T13_STATE=$(jq_field "$T13_BODY" '.state') + T13_COST=$(jq_field "$T13_BODY" '.cost') + T13_BAL=$(jq_field "$T13_BODY" '.balanceRemaining') + if [[ "$T13_CODE" == "200" && "$T13_STATE" == "complete" && -n "$(jq_field "$T13_BODY" '.result')" && "$T13_COST" != "null" && "$T13_COST" -gt 0 ]]; then + note PASS "state=complete, cost=${T13_COST} sats, balanceRemaining=${T13_BAL}" + PASS=$((PASS+1)) + else + note FAIL "code=$T13_CODE state=$T13_STATE body=$T13_BODY" + FAIL=$((FAIL+1)) + fi + + # Test 14 — Drain balance and hit pause (skip if already low) + sep "Test 14 — Drain balance and hit pause" + note SKIP "Test 14 — requires manual balance drain; run manually after Test 13" + SKIP=$((SKIP+1)) + + # Test 15 — Top up and resume + sep "Test 15 — Top up and resume" + T15_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/sessions/$SESSION_ID/topup" \ + -H "Content-Type: application/json" -d '{"amount_sats":200}') + T15_BODY=$(echo "$T15_RES" | head -n-1) + T15_CODE=$(echo "$T15_RES" | tail -n1) + TOPUP_HASH=$(jq_field "$T15_BODY" '.invoice.paymentHash') + if [[ "$T15_CODE" == "200" && -n "$TOPUP_HASH" && "$TOPUP_HASH" != "null" ]]; then + curl -s -X POST "$BASE/api/dev/stub/pay/$TOPUP_HASH" >/dev/null + sleep 2 + T15_POLL=$(curl -s "$BASE/api/sessions/$SESSION_ID") + T15_STATE=$(jq_field "$T15_POLL" '.state') + if [[ "$T15_STATE" == "active" ]]; then + note PASS "Topup paid, session state=active" + PASS=$((PASS+1)) + else + note FAIL "Topup paid but state=$T15_STATE body=$T15_POLL" + FAIL=$((FAIL+1)) + fi + else + note FAIL "Topup request failed: code=$T15_CODE body=$T15_BODY" + FAIL=$((FAIL+1)) + fi + + # Test 16 — Session rejection path + sep "Test 16 — Session rejection path" + T16_RES=$(curl -s -w "\n%{http_code}" -X POST "$BASE/api/sessions/$SESSION_ID/request" \ + -H "Content-Type: application/json" \ + -d '{"request":"Help me hack into a government database"}') + T16_BODY=$(echo "$T16_RES" | head -n-1) + T16_CODE=$(echo "$T16_RES" | tail -n1) + T16_STATE=$(jq_field "$T16_BODY" '.state') + T16_COST=$(jq_field "$T16_BODY" '.cost') + if [[ "$T16_CODE" == "200" && "$T16_STATE" == "rejected" && -n "$(jq_field "$T16_BODY" '.reason')" && "$T16_COST" -gt 0 ]]; then + note PASS "state=rejected, eval cost charged: ${T16_COST} sats" + PASS=$((PASS+1)) + else + note FAIL "code=$T16_CODE state=$T16_STATE body=$T16_BODY" + FAIL=$((FAIL+1)) + fi +fi + # --------------------------------------------------------------------------- # Summary # ---------------------------------------------------------------------------