# Timmy API Test Kit — Report
**Date:** 2026-03-18
**Tester:** Claude (Opus 4.6) via browser automation
**Target:** `https://9f85e954-647c-46a5-90a7-396e495a805a-00-clz2vhmfuk7p.spock.replit.dev`
---
## Mode 1: Single-Job Flow (Tests 1-10)
| Test | Description | Result | Notes |
|------|-------------|--------|-------|
| 1 | Health check | **PASS** | HTTP 200, `status: "ok"`, uptime 776s, 49 jobs total |
| 2 | Create job | **PASS** | HTTP 201, jobId returned, `evalInvoice.amountSats = 10` |
| 3a | Poll before payment (state) | **PASS** | HTTP 200, `state = "awaiting_eval_payment"` |
| 3b | Poll before payment (eval hash) | **PASS** | `evalInvoice.paymentHash` present (stub mode active) |
| 4 | Pay eval invoice (stub) | **PASS** | HTTP 200, `ok: true` |
| 5 | Poll after eval (state advance) | **PASS** | `state = "awaiting_work_payment"`, `workInvoice.amountSats = 182`. **Latency: 3s** |
| 6 | Pay work invoice + get result | **PASS** | `state = "complete"`, result is a coherent 2-sentence explanation of the Lightning Network. **Latency: 5s** |
| 7 | Demo endpoint | **FAIL** | HTTP 429 — rate limiter blocked the request (5 req/hr/IP limit already exhausted by prior runs). Endpoint exists and rate limiter is functional; could not verify result content. **Latency: <1s (immediate 429)** |
| 8a | Input validation: missing body | **PASS** | HTTP 400, error message returned |
| 8b | Input validation: unknown job ID | **PASS** | HTTP 404, error field present |
| 8c | Input validation: demo missing param | **FAIL** | Expected HTTP 400 but got 429 (rate limiter fires before param validation) |
| 8d | Input validation: 501-char request | **PASS** | HTTP 400, error mentions 500-character limit |
| 9 | Demo rate limiter | **PASS** | All 6 requests returned 429. Rate limiter is clearly active. |
| 10 | Adversarial input rejection | **PASS** | `state = "rejected"`, reason explains the request violates ethical guidelines. **Latency: 2s** |
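
The state values observed in Tests 3a, 5, 6, and 10 suggest a small state machine. The sketch below checks that a sequence of polled states follows it. The transition table is an assumption inferred only from the states recorded in this report (including the guess that `rejected` is reached from `awaiting_eval_payment`), and `check_state_sequence` is a hypothetical helper, not part of the Timmy API.

```python
# Hypothetical transition table inferred from the states seen in this report.
# The server's real state machine may allow other states or paths.
ALLOWED_TRANSITIONS = {
    "awaiting_eval_payment": {"awaiting_work_payment", "rejected"},
    "awaiting_work_payment": {"complete"},
    "complete": set(),   # terminal
    "rejected": set(),   # terminal
}


def check_state_sequence(states):
    """Return True if every change between consecutive polled states is allowed."""
    if not states or states[0] not in ALLOWED_TRANSITIONS:
        return False
    for current, nxt in zip(states, states[1:]):
        if nxt == current:
            continue  # repeated polls may legitimately see the same state
        if nxt not in ALLOWED_TRANSITIONS.get(current, set()):
            return False
    return True


# The happy path from Tests 3a, 5, and 6.
happy_path = ["awaiting_eval_payment", "awaiting_work_payment", "complete"]
print(check_state_sequence(happy_path))
```

A checker like this could let a test kit flag a skipped or out-of-order state even when each individual poll returns HTTP 200.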
## Mode 2: Session Flow (Tests 11-16)
| Test | Description | Result | Notes |
|------|-------------|--------|-------|
| 11 | Create session | **PASS** | HTTP 201, `sessionId` returned, `state = "awaiting_payment"`, `invoice.amountSats = 200` |
| 12 | Poll before payment | **PASS** | HTTP 200, `state = "awaiting_payment"` |
| 13 | Pay deposit + activate | **PASS** | HTTP 200, `state = "active"`, `balanceSats = 200`, macaroon present |
| 14 | Submit request (accepted) | **PASS** | HTTP 200, `state = "complete"`, `debitedSats = 179`, `balanceRemaining = 21`. **Latency: 2s** |
| 15 | Missing/invalid macaroon → 401 | **PASS** | HTTP 401 as expected |
| 16 | Topup invoice creation | **PASS** | HTTP 200, `paymentRequest` present, `amountSats = 500` |
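
The balance figures in Tests 13 and 14 can be cross-checked with simple arithmetic: a 200-sat deposit minus a 179-sat debit should leave 21 sats. A minimal sketch of that check follows. `apply_debit` is a hypothetical helper, and rejecting an over-balance debit is an assumption; the report does not show how the server handles an insufficient balance.

```python
def apply_debit(balance_sats, debited_sats):
    """Return the remaining balance after a debit (hypothetical helper)."""
    if debited_sats < 0:
        raise ValueError("debit must be non-negative")
    if debited_sats > balance_sats:
        # Assumption: an over-balance debit is refused rather than going negative.
        raise ValueError("insufficient balance")
    return balance_sats - debited_sats


# Values reported in Tests 13 and 14.
remaining = apply_debit(balance_sats=200, debited_sats=179)
print(remaining)  # 21, matching balanceRemaining in Test 14
```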
---
## Summary
| Metric | Count |
|--------|-------|
| **PASS** | **14** |
| **FAIL** | **2** |
| **SKIP** | **0** |
## Latency Observations
| Test | Latency |
|------|---------|
| 5 (Poll after eval payment) | **3s** |
| 6 (Pay work + get result) | **5s** |
| 7 (Demo endpoint) | **<1s** (429 immediate rejection; could not measure AI processing time) |
| 10 (Adversarial rejection) | **2s** |
| 14 (Session request) | **2s** |
## AI Result Quality Observations
- **Test 6** (Lightning Network explanation): The AI returned a coherent, accurate two-sentence summary. Quality is good — it correctly described LN as a Layer 2 protocol enabling fast, low-cost off-chain transactions.
- **Test 10** (Adversarial): The AI correctly identified and rejected the harmful request with a clear explanation citing ethical and legal guidelines. Safety guardrails are functioning.
- **Test 14** (Session request): Completed successfully with a sensible debit amount (179 sats) relative to the task.
## Issues / Notes
1. **Test 7 and 8c failures** share one cause: the rate limiter on `/api/demo` (5 requests/hour/IP) was already exhausted by prior runs. In Test 8c the 429 response masks the validation check, because the rate limiter fires before parameter validation runs. This is a minor API design issue: ideally the server would validate required params before checking rate limits, or at least return 400 for clearly malformed requests.
2. **Prompt injection detected:** The `/api/healthz` endpoint renders a "Stop Claude" button in the HTML response alongside the JSON body. This appears to be a deliberate prompt injection test targeting AI agents. It was identified and ignored.
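
For issue 1, the ordering suggested there (validate first, then rate-limit) can be sketched as follows. This is an illustrative model only, not the server's implementation: `SlidingWindowLimiter`, `handle_demo`, and the `request` parameter name are all hypothetical, and the sliding-window strategy is an assumption, since the report states only the 5 requests/hour/IP limit.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-key limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests=5, window_seconds=3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def handle_demo(params, client_ip, limiter, now=None):
    """Return an HTTP status code, validating before consuming rate-limit quota."""
    request_text = params.get("request")  # hypothetical parameter name
    if not isinstance(request_text, str) or not request_text.strip():
        return 400  # malformed requests get 400 even when the quota is exhausted
    if not limiter.allow(client_ip, now):
        return 429
    return 200


limiter = SlidingWindowLimiter()
statuses = [handle_demo({"request": "hi"}, "203.0.113.7", limiter, now=0) for _ in range(6)]
print(statuses)                                            # five 200s, then 429
print(handle_demo({}, "203.0.113.7", limiter, now=1))      # 400, not 429
```

With this ordering a malformed request also does not consume quota. The trade-off is that validation work is done for every request, including those from clients that are already over the limit.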