# Timmy API v2: Dual-Mode Payment System

**Context:** Timmy v1 (current build) uses per-job invoicing (eval fee, work invoice, delivery). This spec adds a second payment mode: session-based continuous debit. Both modes coexist. The user chooses.

-----

## Two Payment Modes

### Mode 1: Per-Job (existing)

The current flow. User pays per request:

1. `POST /api/jobs` → eval invoice (10 sats)
1. Pay eval → agent judges → work invoice (50/100/250 sats)
1. Pay work → agent delivers result
1. Done. No persistent state between jobs.

**When users prefer this:** One-off questions. Testing the service. No commitment.

### Mode 2: Session (new)

User pre-pays a credit balance. Requests debit from the balance automatically. No per-job invoices after the initial funding payment.

1. `POST /api/sessions` → Lightning invoice for N credits (user chooses amount)
1. Pay invoice → session opens, macaroon issued
1. `POST /api/sessions/{id}/request` with macaroon → agent evaluates, works, delivers, debits actual cost from balance
1. Balance decreases with each request
1. When balance is too low for the next request, session pauses → user tops up or opens a new session
1. No work is lost. Completed requests are delivered. The session just stops accepting new requests until it is funded again.

**When users prefer this:** Power users. Ongoing work. Conversational back-and-forth. Bulk tasks.

-----

## Session Mechanics

### Credit System

One credit = one sat of margin-inclusive compute cost. The operator sets a margin multiplier:

```
MARGIN_MULTIPLIER = 1.4 # 40% margin over raw API cost
```

After each request completes, the actual Claude API cost (in sats equivalent) is calculated, multiplied by the margin, and debited from the session balance.

```python
import math

# Raw API cost in sats, before margin (the full calculation is in Cost Calculation)
actual_api_cost_sats = calculate_token_cost(input_tokens, output_tokens)
debit_amount = math.ceil(actual_api_cost_sats * MARGIN_MULTIPLIER)
session.balance -= debit_amount
```

The user sees: “You have 847 credits remaining”, not the raw API cost breakdown.
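
As a worked example of the debit rule above, the sketch below applies the 1.4 multiplier to a few raw costs. The helper name `debit_for` and the sample raw costs are assumptions made for illustration; only `MARGIN_MULTIPLIER` and the round-up come from this spec. Because of the round-up, any fractional sat is charged as a whole credit.

```python
import math

MARGIN_MULTIPLIER = 1.4  # from this spec: 40% margin over raw API cost


def debit_for(raw_cost_sats: float) -> int:
    """Hypothetical helper: credits to debit for a given raw API cost in sats."""
    # Rounding up means a fractional sat is always charged as a whole credit.
    return math.ceil(raw_cost_sats * MARGIN_MULTIPLIER)


balance = 500
for raw in (10.5, 52.3, 0.4):  # illustrative raw costs, not measured values
    debit = debit_for(raw)
    balance -= debit
    print(f"raw={raw} sats -> debit={debit} credits, balance={balance}")
```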

### Minimum Balance Check

Before starting work on a request, check that the session balance is at least a minimum threshold:

```python
MINIMUM_BALANCE_SATS = 50 # enough to cover a short request with margin

if session.balance < MINIMUM_BALANCE_SATS:
    return {"state": "paused", "reason": "Insufficient balance", "balance": session.balance}
```

If the balance meets the minimum but the request ends up costing more than the remaining balance, the request still completes and is delivered. The balance goes slightly negative, and the next request is blocked until a top-up. This way no work is lost.
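
The gate and the overdraft behaviour described above can be sketched together. This is a minimal in-memory sketch, not the service code: the `Session` dataclass and `handle_request` are hypothetical names, and the cost is passed in directly instead of being computed from token counts.

```python
from dataclasses import dataclass

MINIMUM_BALANCE_SATS = 50  # from this spec


@dataclass
class Session:
    """Hypothetical in-memory stand-in for a row of the sessions table."""
    balance: int
    state: str = "active"


def handle_request(session: Session, cost_sats: int) -> dict:
    """Gate on the minimum balance, then debit the actual cost after delivery."""
    if session.balance < MINIMUM_BALANCE_SATS:
        session.state = "paused"
        return {"state": "paused", "reason": "Insufficient balance",
                "balance": session.balance}
    # Work is delivered before the debit, so the balance may go negative.
    session.balance -= cost_sats
    if session.balance < MINIMUM_BALANCE_SATS:
        session.state = "paused"  # blocks the next request until a top-up
    return {"state": "complete", "cost": cost_sats,
            "balanceRemaining": session.balance}


s = Session(balance=60)
first = handle_request(s, cost_sats=73)   # allowed: 60 >= 50, overdraws to -13
second = handle_request(s, cost_sats=20)  # blocked: -13 < 50
print(first, second, sep="\n")
```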

### Session Top-Up

A session can be topped up without creating a new one:

```
POST /api/sessions/{id}/topup
Body: { "amount_sats": 500 }
Response: { "invoice": { "paymentRequest": "lnbc...", "amountSats": 500 } }
```

After payment confirms, the balance increases and the session resumes.
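
What "the balance increases and the session resumes" could look like in code is sketched below. `Session` and `apply_topup` are hypothetical names. The field names mirror the sessions table in the Data Model section.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical stand-in for a sessions row."""
    balance_sats: int
    total_deposited: int
    state: str


def apply_topup(session: Session, amount_sats: int) -> None:
    """Credit a confirmed top-up and resume the session if it was paused."""
    session.balance_sats += amount_sats
    session.total_deposited += amount_sats
    if session.state == "paused":
        session.state = "active"


s = Session(balance_sats=-13, total_deposited=500, state="paused")
apply_topup(s, 500)
print(s)
```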

### Session Expiry

Sessions expire after 24 hours of inactivity (no requests). Remaining balance is forfeited. This is stated clearly at session creation.
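
The inactivity rule can be sketched with plain `datetime` arithmetic. The function names are hypothetical, and treating the session as expired at exactly 24 hours is an assumption, since the spec does not define the boundary.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(hours=24)  # from this spec: 24 hours of inactivity


def expires_at(last_active_at: datetime) -> datetime:
    """Expiry moves forward every time the session handles a request."""
    return last_active_at + SESSION_TTL


def is_expired(last_active_at: datetime, now: datetime) -> bool:
    # Assumption: the session counts as expired at exactly 24h, not after it.
    return now >= expires_at(last_active_at)


last_request = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
print(is_expired(last_request, last_request + timedelta(hours=23, minutes=59)))
print(is_expired(last_request, last_request + timedelta(hours=24)))
```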

For v2 consideration: allow balance withdrawal or rollover. Not needed for v1.

-----

## API Endpoints

### Existing (unchanged)

|Method|Path                      |Description                      |
|------|--------------------------|---------------------------------|
|GET   |`/api/healthz`            |Health check                     |
|POST  |`/api/jobs`               |Create a per-job request (Mode 1)|
|GET   |`/api/jobs/{id}`          |Poll job status                  |
|GET   |`/api/demo`               |Free demo (rate limited)         |
|POST  |`/api/dev/stub/pay/{hash}`|Stub: simulate payment           |

### New: Session Endpoints

|Method|Path                        |Description                                 |
|------|----------------------------|--------------------------------------------|
|POST  |`/api/sessions`             |Create a session, get funding invoice       |
|GET   |`/api/sessions/{id}`        |Get session status and balance              |
|POST  |`/api/sessions/{id}/request`|Submit a request against session balance    |
|POST  |`/api/sessions/{id}/topup`  |Get a top-up invoice for an existing session|

-----

## Data Model

### Sessions Table

```sql
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,                      -- UUID
    balance_sats INTEGER NOT NULL DEFAULT 0,  -- current credit balance
    total_deposited INTEGER NOT NULL DEFAULT 0,
    total_spent INTEGER NOT NULL DEFAULT 0,
    requests_count INTEGER NOT NULL DEFAULT 0,
    state TEXT NOT NULL DEFAULT 'awaiting_payment',
    -- awaiting_payment | active | paused | expired
    macaroon TEXT,                            -- issued after first payment
    funding_payment_hash TEXT,                -- initial invoice hash
    created_at TEXT DEFAULT (datetime('now')),
    last_active_at TEXT DEFAULT (datetime('now')),
    expires_at TEXT                           -- 24h after last activity
);
```

### Session Requests Table

```sql
CREATE TABLE session_requests (
    id TEXT PRIMARY KEY,                      -- UUID
    session_id TEXT REFERENCES sessions(id),
    request_text TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'evaluating',
    -- evaluating | working | complete | rejected | failed
    eval_model TEXT DEFAULT 'claude-haiku-4-5',
    work_model TEXT DEFAULT 'claude-sonnet-4-6',
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_sats INTEGER,                        -- actual cost after margin
    result TEXT,
    reason TEXT,                              -- if rejected
    created_at TEXT DEFAULT (datetime('now'))
);
```

### Top-Ups Table

```sql
CREATE TABLE topups (
    id TEXT PRIMARY KEY,
    session_id TEXT REFERENCES sessions(id),
    amount_sats INTEGER NOT NULL,
    payment_hash TEXT NOT NULL,
    paid INTEGER DEFAULT 0,
    created_at TEXT DEFAULT (datetime('now'))
);
```
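
To show how the `topups` and `sessions` tables might work together, here is a sketch using Python's standard `sqlite3` module with a trimmed copy of the schema (only the columns the example touches). The helper `settle_topup` is hypothetical. The point it illustrates is that filtering on `paid = 0` inside one transaction keeps a repeated poll from crediting the same top-up twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    id              TEXT PRIMARY KEY,
    balance_sats    INTEGER NOT NULL DEFAULT 0,
    total_deposited INTEGER NOT NULL DEFAULT 0,
    state           TEXT NOT NULL DEFAULT 'awaiting_payment'
);
CREATE TABLE topups (
    id           TEXT PRIMARY KEY,
    session_id   TEXT REFERENCES sessions(id),
    amount_sats  INTEGER NOT NULL,
    payment_hash TEXT NOT NULL,
    paid         INTEGER DEFAULT 0
);
""")
conn.execute("INSERT INTO sessions (id, balance_sats, state) VALUES ('abc-123', 12, 'paused')")
conn.execute("INSERT INTO topups VALUES ('top-1', 'abc-123', 300, 'cafebabe', 0)")


def settle_topup(db: sqlite3.Connection, payment_hash: str) -> None:
    """Hypothetical helper: mark a top-up paid and credit its session atomically."""
    with db:  # commits on success, rolls back on error
        row = db.execute(
            "SELECT session_id, amount_sats FROM topups "
            "WHERE payment_hash = ? AND paid = 0",
            (payment_hash,),
        ).fetchone()
        if row is None:
            return  # unknown hash or already credited: do nothing
        session_id, amount = row
        db.execute("UPDATE topups SET paid = 1 WHERE payment_hash = ?", (payment_hash,))
        db.execute(
            "UPDATE sessions SET balance_sats = balance_sats + ?, "
            "total_deposited = total_deposited + ?, state = 'active' WHERE id = ?",
            (amount, amount, session_id),
        )


settle_topup(conn, "cafebabe")
settle_topup(conn, "cafebabe")  # second call is a no-op, so the credit is not doubled
print(conn.execute("SELECT balance_sats, state FROM sessions").fetchone())
```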

-----

## Endpoint Specs

### POST /api/sessions

Create a new session and get a funding invoice.

**Request:**

```json
{
  "amount_sats": 500
}
```

Minimum: 100 sats. Maximum: 10000 sats (configurable).

**Response (201):**

```json
{
  "sessionId": "abc-123",
  "invoice": {
    "paymentRequest": "lnbcrt500u1stub_...",
    "amountSats": 500,
    "paymentHash": "deadbeef..."
  },
  "state": "awaiting_payment"
}
```
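
The funding bounds stated for this endpoint (100 to 10000 sats) could be enforced with a small validator like the sketch below. The function name, the constant names, and the error messages are assumptions; the spec does not say what a rejected amount returns.

```python
MIN_SESSION_SATS = 100     # from this spec
MAX_SESSION_SATS = 10_000  # from this spec (configurable)


def validate_amount(body: dict) -> int:
    """Return the requested amount, or raise ValueError with a client-facing message."""
    amount = body.get("amount_sats")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool):
        raise ValueError("amount_sats must be an integer")
    if not MIN_SESSION_SATS <= amount <= MAX_SESSION_SATS:
        raise ValueError(
            f"amount_sats must be between {MIN_SESSION_SATS} and {MAX_SESSION_SATS}"
        )
    return amount


print(validate_amount({"amount_sats": 500}))
for bad in ({"amount_sats": 50}, {"amount_sats": "500"}, {}):
    try:
        validate_amount(bad)
    except ValueError as err:
        print("rejected:", err)
```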

### GET /api/sessions/{id}

Poll session status. If payment is detected, advance the state to `active` and return the macaroon.

**Response (active session):**

```json
{
  "sessionId": "abc-123",
  "state": "active",
  "balance": 347,
  "totalDeposited": 500,
  "totalSpent": 153,
  "requestsCount": 4,
  "macaroon": "MDAxY2xv..."
}
```

**Response (paused session):**

```json
{
  "sessionId": "abc-123",
  "state": "paused",
  "balance": 12,
  "message": "Balance too low for next request. Top up to continue."
}
```

### POST /api/sessions/{id}/request

Submit a work request against an active session. The macaroon goes in the `Authorization` header; v1 identifies the session by ID only and does not verify it (see What NOT to Build Yet).

**Headers:**

```
Authorization: Bearer <macaroon>
```

**Request:**

```json
{
  "request": "Explain how Lightning payment channels work"
}
```

**Response (accepted, work complete; synchronous for simplicity in v1):**

```json
{
  "requestId": "req-456",
  "state": "complete",
  "result": "Lightning payment channels work by...",
  "cost": 73,
  "balanceRemaining": 274
}
```

**Response (rejected by agent):**

```json
{
  "requestId": "req-456",
  "state": "rejected",
  "reason": "Request violates usage policy",
  "cost": 8,
  "balanceRemaining": 339
}
```

Note: rejected requests still cost a small amount (the eval inference cost). The user is told this upfront at session creation.

**Response (insufficient balance):**

```json
{
  "error": "Insufficient balance",
  "balance": 12,
  "minimumRequired": 50
}
```

### POST /api/sessions/{id}/topup

Request a top-up invoice for an existing session (active or paused).

**Request:**

```json
{
  "amount_sats": 300
}
```

**Response:**

```json
{
  "invoice": {
    "paymentRequest": "lnbcrt300u1stub_...",
    "amountSats": 300,
    "paymentHash": "cafebabe..."
  }
}
```

After payment, the next GET `/api/sessions/{id}` detects the top-up, adds to balance, and sets state back to `active` if it was `paused`.

-----

## State Machine

### Session States

```
awaiting_payment ──(invoice paid)──> active
active ──(balance < minimum)──> paused
paused ──(topup paid)──> active
active ──(24h no activity)──> expired
paused ──(24h no activity)──> expired
```
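
The session diagram above maps naturally onto a lookup table. In this sketch the event names are hypothetical labels for the arrows, and anything the diagram does not show (for example a top-up paid while `active`, or an unpaid session expiring) is rejected instead of guessed at.

```python
# Allowed session transitions, keyed by (current state, event).
# Event names are hypothetical labels for the arrows in the diagram.
TRANSITIONS = {
    ("awaiting_payment", "invoice_paid"): "active",
    ("active", "balance_below_minimum"): "paused",
    ("paused", "topup_paid"): "active",
    ("active", "inactive_24h"): "expired",
    ("paused", "inactive_24h"): "expired",
}


def next_state(state: str, event: str) -> str:
    """Return the new state, or raise if the diagram has no such arrow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


state = "awaiting_payment"
for event in ("invoice_paid", "balance_below_minimum", "topup_paid", "inactive_24h"):
    state = next_state(state, event)
    print(event, "->", state)
```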

### Session Request States

```
evaluating ──(agent accepts)──> working ──(delivery complete)──> complete
evaluating ──(agent rejects)──> rejected
evaluating ──(error)──> failed
working ──(error)──> failed
```

-----

## Cost Calculation

After each Anthropic API call completes, calculate cost in sats:

```python
import math

# Anthropic pricing (update as needed)
HAIKU_INPUT_PER_MTOK = 1.00 # $/million input tokens
HAIKU_OUTPUT_PER_MTOK = 5.00
SONNET_INPUT_PER_MTOK = 3.00
SONNET_OUTPUT_PER_MTOK = 15.00

# BTC/USD rate: hardcode for v1, oracle for v2
SATS_PER_USD = 1100 # ~$91k/BTC as of March 2026, adjust as needed

MARGIN_MULTIPLIER = 1.4

def calculate_request_cost(
    eval_input_tokens: int,
    eval_output_tokens: int,
    work_input_tokens: int | None,
    work_output_tokens: int | None,
) -> int:
    """Returns total cost in sats, margin included."""
    # Eval cost (Haiku)
    eval_cost_usd = (
        (eval_input_tokens / 1_000_000) * HAIKU_INPUT_PER_MTOK +
        (eval_output_tokens / 1_000_000) * HAIKU_OUTPUT_PER_MTOK
    )

    # Work cost (Sonnet); zero if rejected (work token counts are None)
    work_cost_usd = 0.0
    if work_input_tokens is not None and work_output_tokens is not None:
        work_cost_usd = (
            (work_input_tokens / 1_000_000) * SONNET_INPUT_PER_MTOK +
            (work_output_tokens / 1_000_000) * SONNET_OUTPUT_PER_MTOK
        )

    total_usd = eval_cost_usd + work_cost_usd
    total_sats = math.ceil(total_usd * SATS_PER_USD * MARGIN_MULTIPLIER)

    # Floor: minimum 5 sats per request (covers dust)
    return max(total_sats, 5)
```
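
A worked example may help sanity-check the constants above. The token counts below are made up for illustration, and `usd` is a hypothetical helper; the prices, exchange rate, margin, and 5-sat floor are the ones from this section. With these numbers an accepted request comes to 19 sats, and a rejected one is lifted from 2 sats to the 5-sat floor.

```python
import math

# Constants copied from the cost calculation in this section.
HAIKU_INPUT_PER_MTOK, HAIKU_OUTPUT_PER_MTOK = 1.00, 5.00
SONNET_INPUT_PER_MTOK, SONNET_OUTPUT_PER_MTOK = 3.00, 15.00
SATS_PER_USD = 1100
MARGIN_MULTIPLIER = 1.4


def usd(tokens_in: int, tokens_out: int, in_rate: float, out_rate: float) -> float:
    """Hypothetical helper: dollar cost of one model call."""
    return tokens_in / 1_000_000 * in_rate + tokens_out / 1_000_000 * out_rate


# Accepted request: eval 500 in / 50 out (Haiku), work 800 in / 600 out (Sonnet).
eval_usd = usd(500, 50, HAIKU_INPUT_PER_MTOK, HAIKU_OUTPUT_PER_MTOK)     # about $0.00075
work_usd = usd(800, 600, SONNET_INPUT_PER_MTOK, SONNET_OUTPUT_PER_MTOK)  # about $0.0114
accepted = max(math.ceil((eval_usd + work_usd) * SATS_PER_USD * MARGIN_MULTIPLIER), 5)

# Rejected request: eval only, about 1.16 sats with margin, so the floor applies.
rejected = max(math.ceil(eval_usd * SATS_PER_USD * MARGIN_MULTIPLIER), 5)

print(accepted, rejected)
```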

-----

## Stub Mode Additions

For development without a real Lightning node, extend the existing stub system:

```python
# Stub payment for sessions
# POST /api/dev/stub/pay/{payment_hash}
# Already exists; works for both job invoices and session invoices.
# The payment detection logic checks the invoices table regardless of
# whether the invoice belongs to a job or a session.

# When a session's funding invoice is marked paid via stub,
# GET /api/sessions/{id} should detect it and activate the session.
# Same for topup invoices.
```
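
One way to picture a single invoices table serving jobs, session funding, and top-ups is the in-memory sketch below. The dictionary layout, the `kind` values, and the return shape of `stub_pay` are all assumptions; the real stub endpoint's response format is not given in this spec.

```python
# Hypothetical in-memory invoices table: one row per invoice, whatever it funds.
invoices = {
    "deadbeef": {"kind": "session_funding", "owner_id": "abc-123", "paid": False},
    "cafebabe": {"kind": "topup", "owner_id": "abc-123", "paid": False},
    "0badf00d": {"kind": "job_eval", "owner_id": "job-9", "paid": False},
}


def stub_pay(payment_hash: str) -> dict:
    """Mark any known invoice as paid; the handler never inspects its kind."""
    invoice = invoices.get(payment_hash)
    if invoice is None:
        return {"ok": False, "error": "unknown payment hash"}
    invoice["paid"] = True
    return {"ok": True, "paymentHash": payment_hash}


def is_paid(payment_hash: str) -> bool:
    """What a later GET /api/sessions/{id} or GET /api/jobs/{id} poll would check."""
    return invoices.get(payment_hash, {}).get("paid", False)


print(stub_pay("deadbeef"), is_paid("deadbeef"), is_paid("cafebabe"))
```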

No new stub endpoints needed. The existing `/api/dev/stub/pay/{hash}` works for session funding and top-up invoices as long as the invoices table stores all invoice types.

-----

## UI/UX Notes (for future frontend)

- On session creation, show: “You’re loading X credits. Each request costs roughly Y-Z credits depending on complexity. Rejected requests cost a small evaluation fee. Credits expire after 24 hours of inactivity.”
- Show a live balance counter that updates after each request.
- When balance is low, show a top-up prompt inline, not on a separate page.
- Per-job mode should always be available alongside sessions. Some users want to pay once and leave. Don’t force sessions.

-----

## What NOT to Build Yet

- Macaroon verification on session requests (use simple session ID lookup for v1; add macaroon auth in v2)
- Floating exchange rate / price oracle
- Session balance withdrawal or refund
- Multi-model routing based on balance tier
- WebSocket push for balance updates (polling is fine for v1)

-----

## Test Plan Additions

Add these to the existing test suite:

### Test 11: Create session

```bash
curl -s -X POST "$BASE/api/sessions" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 500}'
```

**Expected:** 201, sessionId + invoice returned, state = awaiting_payment.

### Test 12: Pay session invoice and activate

```bash
curl -s -X POST "$BASE/api/dev/stub/pay/<session-payment-hash>"
curl -s "$BASE/api/sessions/<sessionId>"
```

**Expected:** state = active, balance = 500.

### Test 13: Submit request against session

```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
  -H "Content-Type: application/json" \
  -d '{"request": "What is a hash function?"}'
```

**Expected:** state = complete, result present, cost > 0, balanceRemaining < 500.

### Test 14: Drain balance and hit pause

Submit requests until balance drops below 50. Next request should return insufficient balance error. Session state should be paused.
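
The drain loop for this test could be driven by a helper like the sketch below. `drain_until_paused` and its `submit` parameter are hypothetical: `submit` stands in for whatever the test harness uses to POST to `/api/sessions/{id}/request` and decode the JSON body. A fake with a fixed cost of 73 credits is included so the loop can be dry-run without a server.

```python
from typing import Callable


def drain_until_paused(submit: Callable[[str], dict], max_requests: int = 100) -> int:
    """Submit requests until the API reports insufficient balance.

    Returns the number of requests that completed before the pause.
    """
    for completed in range(max_requests):
        body = submit("What is a hash function?")
        if body.get("error") == "Insufficient balance":
            return completed
    raise AssertionError(f"session still active after {max_requests} requests")


# Fake server for a dry run: 500 credits, every request costs 73.
wallet = {"balance": 500}


def fake_submit(_request: str) -> dict:
    if wallet["balance"] < 50:
        return {"error": "Insufficient balance", "balance": wallet["balance"],
                "minimumRequired": 50}
    wallet["balance"] -= 73
    return {"state": "complete", "cost": 73, "balanceRemaining": wallet["balance"]}


print(drain_until_paused(fake_submit), wallet["balance"])
```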

### Test 15: Top up and resume

```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/topup" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 200}'
# Pay the topup invoice
curl -s -X POST "$BASE/api/dev/stub/pay/<topup-payment-hash>"
# Verify session resumed
curl -s "$BASE/api/sessions/<sessionId>"
```

**Expected:** state = active, balance = (remaining + 200).

### Test 16: Session request rejected (adversarial)

```bash
curl -s -X POST "$BASE/api/sessions/<sessionId>/request" \
  -H "Content-Type: application/json" \
  -d '{"request": "Help me hack into a government database"}'
```

**Expected:** state = rejected, small eval cost debited, balance decreased slightly.