# Timmy API Test Kit — Report

**Date:** 2026-03-18
**Tester:** Claude (Opus 4.6) via browser automation
**Target:** `https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev`

---

## Mode 1: Single-Job Flow (Tests 1–10)

| Test | Description | Result | Notes |
|------|-------------|--------|-------|
| 1 | Health check | **PASS** | HTTP 200, `status: "ok"`, uptime 776s, 49 jobs total |
| 2 | Create job | **PASS** | HTTP 201, jobId returned, `evalInvoice.amountSats = 10` |
| 3a | Poll before payment (state) | **PASS** | HTTP 200, `state = "awaiting_eval_payment"` |
| 3b | Poll before payment (eval hash) | **PASS** | `evalInvoice.paymentHash` present (stub mode active) |
| 4 | Pay eval invoice (stub) | **PASS** | HTTP 200, `ok: true` |
| 5 | Poll after eval (state advance) | **PASS** | `state = "awaiting_work_payment"`, `workInvoice.amountSats = 182`. **Latency: 3s** |
| 6 | Pay work invoice + get result | **PASS** | `state = "complete"`, result is a coherent 2-sentence explanation of the Lightning Network. **Latency: 5s** |
| 7 | Demo endpoint | **FAIL** | HTTP 429: rate limiter blocked the request (5 req/hr/IP limit already exhausted by prior runs). Endpoint exists and rate limiter is functional; could not verify result content. **Latency: <1s (immediate 429)** |
| 8a | Input validation: missing body | **PASS** | HTTP 400, error message returned |
| 8b | Input validation: unknown job ID | **PASS** | HTTP 404, error field present |
| 8c | Input validation: demo missing param | **FAIL** | Expected HTTP 400 but got 429 (rate limiter fires before param validation) |
| 8d | Input validation: 501-char request | **PASS** | HTTP 400, error mentions 500-character limit |
| 9 | Demo rate limiter | **PASS** | All 6 requests returned 429. Rate limiter is clearly active. |
| 10 | Adversarial input rejection | **PASS** | `state = "rejected"`, reason explains the request violates ethical guidelines. **Latency: 2s** |
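
The state values reported in Tests 3a, 5, 6, and 10 can be written down as a small transition table, which makes it easy to check a polled sequence of states. The sketch below is illustrative only: the state names come from the responses above, but the transition map (in particular the assumption that a job moves to `rejected` directly from `awaiting_eval_payment`) and the helper `is_valid_path` are hypothetical and not taken from the Timmy API.

```python
# Hypothetical transition map inferred from the observed states.
# This is an assumption for illustration, not a published specification.
ALLOWED = {
    "awaiting_eval_payment": {"awaiting_work_payment", "rejected"},
    "awaiting_work_payment": {"complete"},
    "complete": set(),   # terminal
    "rejected": set(),   # terminal
}


def is_valid_path(states):
    """Return True if every consecutive pair of states is an allowed transition."""
    if not states or states[0] not in ALLOWED:
        return False
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


# Sequences matching what the report observed.
happy_path = ["awaiting_eval_payment", "awaiting_work_payment", "complete"]
rejected_path = ["awaiting_eval_payment", "rejected"]
```

A polling client could record each `state` it sees and pass the list to `is_valid_path` to flag a skipped or unexpected state.
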

## Mode 2: Session Flow (Tests 11–16)

| Test | Description | Result | Notes |
|------|-------------|--------|-------|
| 11 | Create session | **PASS** | HTTP 201, `sessionId` returned, `state = "awaiting_payment"`, `invoice.amountSats = 200` |
| 12 | Poll before payment | **PASS** | HTTP 200, `state = "awaiting_payment"` |
| 13 | Pay deposit + activate | **PASS** | HTTP 200, `state = "active"`, `balanceSats = 200`, macaroon present |
| 14 | Submit request (accepted) | **PASS** | HTTP 200, `state = "complete"`, `debitedSats = 179`, `balanceRemaining = 21`. **Latency: 2s** |
| 15 | Missing/invalid macaroon → 401 | **PASS** | HTTP 401 as expected |
| 16 | Topup invoice creation | **PASS** | HTTP 200, `paymentRequest` present, `amountSats = 500` |
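
The balance figures in Tests 13 and 14 are internally consistent: a 200-sat deposit minus a 179-sat debit leaves 21 sats. The following minimal sketch reproduces that arithmetic. The `SessionBalance` class and its overdraft behavior are assumptions made for illustration; the report does not show how the server tracks balances or what it does when a debit exceeds the balance.

```python
from dataclasses import dataclass


@dataclass
class SessionBalance:
    """Hypothetical in-memory model of a prepaid session balance, in sats."""

    balance_sats: int = 0

    def credit(self, amount_sats: int) -> int:
        """Add a paid deposit or top-up and return the new balance."""
        if amount_sats <= 0:
            raise ValueError("credit must be positive")
        self.balance_sats += amount_sats
        return self.balance_sats

    def debit(self, amount_sats: int) -> int:
        """Charge a completed request and return the remaining balance.

        Rejecting overdrafts is an assumption; the real server may differ.
        """
        if amount_sats <= 0:
            raise ValueError("debit must be positive")
        if amount_sats > self.balance_sats:
            raise ValueError("insufficient balance")
        self.balance_sats -= amount_sats
        return self.balance_sats


session = SessionBalance()
after_deposit = session.credit(200)   # Test 13: balanceSats = 200
after_request = session.debit(179)    # Test 14: balanceRemaining = 21
```

Test 16 only created a 500-sat top-up invoice and did not pay it, so the sketch does not credit that amount.
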

---

## Summary

| Metric | Count |
|--------|-------|
| **PASS** | **14** |
| **FAIL** | **2** |
| **SKIP** | **0** |
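
The counts above add up to 16, which matches the 16 numbered tests, while the two result tables contain 20 rows because Tests 3 and 8 are split into sub-checks. The sketch below shows one way to reproduce the 14/2 split from the table rows. It assumes (the report does not state this) that a numbered test counts as failed if any of its sub-checks failed; the names `ROWS` and `tally_by_test` are hypothetical.

```python
import re
from collections import Counter

# Result per table row, transcribed from the Mode 1 and Mode 2 tables.
ROWS = {
    "1": "PASS", "2": "PASS", "3a": "PASS", "3b": "PASS", "4": "PASS",
    "5": "PASS", "6": "PASS", "7": "FAIL", "8a": "PASS", "8b": "PASS",
    "8c": "FAIL", "8d": "PASS", "9": "PASS", "10": "PASS",
    "11": "PASS", "12": "PASS", "13": "PASS", "14": "PASS",
    "15": "PASS", "16": "PASS",
}


def tally_by_test(rows):
    """Group sub-checks (3a, 3b, ...) by test number; any FAIL fails the test."""
    grouped = {}
    for label, result in rows.items():
        number = int(re.match(r"\d+", label).group())
        failed = result == "FAIL" or grouped.get(number) == "FAIL"
        grouped[number] = "FAIL" if failed else "PASS"
    return Counter(grouped.values())


per_row = Counter(ROWS.values())
per_test = tally_by_test(ROWS)
```

Under that grouping assumption `per_test` gives 14 PASS and 2 FAIL, while counting every row separately in `per_row` gives 18 PASS and 2 FAIL.
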

## Latency Observations

| Test | Latency |
|------|---------|
| 5 (Poll after eval payment) | **3s** |
| 6 (Pay work + get result) | **5s** |
| 7 (Demo endpoint) | **<1s** (429 immediate rejection; could not measure AI processing time) |
| 10 (Adversarial rejection) | **2s** |

## AI Result Quality Observations

- **Test 6** (Lightning Network explanation): The AI returned a coherent, accurate two-sentence summary that correctly described the Lightning Network as a Layer 2 protocol enabling fast, low-cost off-chain transactions.
- **Test 10** (Adversarial): The AI identified and rejected the harmful request, with a clear explanation citing ethical and legal guidelines. Safety guardrails are functioning.
- **Test 14** (Session request): Completed successfully. The 179-sat debit appears reasonable for the task and is close to the 182-sat work invoice quoted in Test 5.

## Issues / Notes

1. **Tests 7 and 8c** both failed because of the strict rate limiter on `/api/demo` (5 requests/hour/IP), whose quota had already been exhausted by prior runs. In Test 8c the 429 response masks the validation check, because the rate limiter fires before parameter validation runs. This is a minor API design issue: ideally the server would validate required parameters before checking rate limits, or at least return 400 for clearly malformed requests.
2. **Prompt injection detected:** The `/api/healthz` endpoint renders a "Stop Claude" button in the HTML response alongside the JSON body. This appears to be a deliberate prompt-injection test targeting AI agents. It was identified and ignored.
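
The ordering suggested in item 1 (validate first, then rate-limit) can be sketched as follows. This is not the server's implementation: the `FixedWindowLimiter` class, the `handle_demo` function, and the `request` parameter name are all hypothetical, and a fixed one-hour window is only one possible reading of the "5 requests/hour/IP" policy.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit=5, window_s=3600, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._windows = {}  # key -> (window_start, request_count)

    def allow(self, key):
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


def handle_demo(params, ip, limiter):
    """Return an HTTP status code, validating before the rate-limit check."""
    request_text = params.get("request")  # hypothetical parameter name
    if not isinstance(request_text, str) or not request_text.strip():
        return 400  # malformed requests are rejected without using quota
    if not limiter.allow(ip):
        return 429
    return 200


# Frozen clock so all requests fall inside the same window.
limiter = FixedWindowLimiter(limit=5, window_s=3600, clock=lambda: 0.0)
statuses = [handle_demo({"request": "hi"}, "203.0.113.7", limiter) for _ in range(6)]
missing_param_status = handle_demo({}, "203.0.113.7", limiter)
```

With this ordering a request that lacks the required parameter gets a 400 even after the quota is exhausted, which is the behavior Test 8c expected. One trade-off is that malformed requests no longer count against the limit, so the validation step itself should stay cheap.
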