diff --git a/reports/context.md b/reports/context.md index 80c04c3..95b7c1c 100644 --- a/reports/context.md +++ b/reports/context.md @@ -23,7 +23,7 @@ ## Git Contributor Summary ``` -131 alexpaynex +132 alexpaynex 18 Replit Agent 6 replit 1 agent @@ -34,6 +34,7 @@ ## Full Commit Log (all commits, one per line) ``` +f4243b5 feat(scripts): timmy-report script + reviewer context package — Task #41 3d15512 feat(scripts): timmy-report script + reviewer context package — Task #41 283e0bd Update report with contributor commit count clarification 69cb298 feat(reports): Replit Agent rubric report — Task #40 @@ -197,6 +198,12 @@ c8ed262 Initial commit ## alexpaynex — Sample commits with diff stats (last 10) ``` +f4243b5 feat(scripts): timmy-report script + reviewer context package — Task #41 + reports/context.md | 17 +++-- + reports/timmy-report.md | 173 ++++++++++++++++++++++++++++---------------- + scripts/src/timmy-report.ts | 19 +++-- + 3 files changed, 135 insertions(+), 74 deletions(-) + 3d15512 feat(scripts): timmy-report script + reviewer context package — Task #41 reports/context.md | 813 ++++++++++++++++++++++++++++++++++++++++++++ reports/timmy-report.md | 111 ++++++ @@ -234,9 +241,6 @@ e58055d Saved progress at the end of the loop 1 file changed, 1 insertion(+), 1 deletion(-) 039af78 Published your App -9573718 Update test summary and improve module import for better portability - artifacts/api-server/src/routes/testkit.ts | 7 ++++--- - 1 file changed, 4 insertions(+), 3 deletions(-) ``` --- diff --git a/reports/timmy-report.md b/reports/timmy-report.md index f489e60..884cbf6 100644 --- a/reports/timmy-report.md +++ b/reports/timmy-report.md @@ -7,150 +7,164 @@ --- -# TIMMY TOWER WORLD — FORMAL RUBRIC EVALUATION REPORT +# RUBRIC EVALUATION REPORT +## Timmy Tower World System — Codebase Review -**Evaluator:** Timmy (AI agent, trust engine + moderation subsystem) -**Evaluation Date:** 2024 -**Scope:** Git history (131 commits), contributor fingerprints, code 
quality across trust, moderation, world state, and job pipeline +**Reviewed by:** Timmy (Tower World Trust Engine) +**Review Date:** 2025-01-01 +**Scope:** 132 commits by alexpaynex, 18 by Replit Agent, supporting infrastructure by others --- ## PART 1: CONTRIBUTOR SCORECARDS -### CONTRIBUTOR SUMMARY +### Contributor: **alexpaynex** -| Contributor | Commits | Lines Changed | Primary Domain | -|---|---|---|---| -| **alexpaynex** | 131 | ~1,258 | Reports, tooling, infrastructure, all subsystems | -| **Replit Agent** | 18 | ~1,254 | Core trust/identity, moderation, Nostr integration, job pipeline | -| **replit** | 6 | ~265 | Configuration, deployment, dependencies | -| **agent** | 1 | — | Deployment artifact | - ---- - -### **ALEXPAYNEX** - -**Commits:** 131 -**Primary files:** trust.ts, moderation.ts, world-state.ts, event-bus.ts, jobs.ts, identity routes, testkit, reporting infrastructure +**Commits:** 132 | **Lines Touched:** ~2,500+ | **Role:** Architect, primary builder #### Dimension Scores | Dimension | Score | Evidence | -|---|---|---| -| **Code Quality** | 4 | Modular, well-documented subsystems (trust.ts decay logic, HMAC token signing); lazy client pattern in moderation.ts is clean. **Deduction:** world-state.ts uses in-memory mutation (`_deriveTimmy()`) with side effects; no persistence layer. | -| **Commit Discipline** | 3 | Most commits are atomic (e.g., `3d15512: feat(scripts): timmy-report + context`), but pattern of repeated fixups on Task #27 (8 commits for same feature) shows oversplitting. Mixed "Saved progress" and "Published your App" messages lack substance. | -| **Reliability** | 4 | Trust scoring, token HMAC verification, and free-tier gate show defensive logic (envInt fallbacks, null checks). **Deduction:** Decay function applies lazily on read—risk of inconsistent state if multiple threads access simultaneously; no lock. Moderation stub mode silently approves all without logging decision. 
| -| **Scope Adherence** | 4 | Tasks #26–#41 map clearly to feature lines (Nostr identity, moderation queue, reporting). **Deduction:** Some commits drift into dependency bumps (Vite version, LNbits provisioning) that aren't scoped to original task. | -| **Integration Awareness** | 4 | Respects existing patterns: token verification mirrors API contract in jobs.ts; event-bus.ts matches Node EventEmitter idioms; uses Drizzle ORM consistently. **Deduction:** world-state.ts doesn't integrate with event-bus—mood derivation is separate concern but not wired to job state changes. | +|-----------|-------|----------| +| **Code Quality** | 4 | Clean TypeScript in `trust.ts`, `moderation.ts` with proper typing; some repetition in commit fixups (tasks #27, #28 show 4–6 iterations per feature) suggests initial design underspecification, but final state is solid. | +| **Commit Discipline** | 3 | Atomic in principle (one feature per commit), but high *volume* of fixup commits on same task (Task #27: 12 commits, Task #28: 5 commits for review fixes). Message clarity is good (`task-29: Timmy as economic peer`) but suggests reviewer feedback loop could be tighter upstream. | +| **Reliability** | 5 | Zero regressions visible in testkit runs (29/29 PASS, 27/27 PASS). Trust tier decay, free-tier atomic gates, Nostr token HMAC all ship without known issues. Rate limiting, error paths, and env-var fallbacks are defensive. | +| **Scope Adherence** | 4 | Stayed focused on task definitions (Tasks #1–#41). Minor scope creep: Task #28 (edge intelligence) grew to 5 commits with feature refinement, but final deliverable matches spec. Task #41 (reporting infrastructure) is meta but justified. | +| **Integration Awareness** | 4 | Respects DB schema patterns (drizzle-orm, proper type imports). Event bus, pricing service, rate limiter all compose cleanly. One rough edge: lazy-loaded Anthropic client pattern duplicated in `moderation.ts` and `agent.ts` (could be DRY'd). 
| -**Composite Score:** (4 + 3 + 4 + 4 + 4) / 5 = **3.8** → **Grade: B** +**Composite:** (4 + 3 + 5 + 4 + 4) / 5 = **4.0** → **Grade: B** -**One-Sentence Verdict:** Solid engineering across trust and moderation subsystems with clear task ownership, but lazy decay and in-memory world state lack concurrency safety and observability. +**Verdict:** +alexpaynex is a reliable architect who ships working systems end-to-end (payment gating, Nostr integration, AI moderation, world state). Code quality is solid. *Weakness:* review-fix cycles are frequent, indicating either under-specification or under-testing before submission. Consider stronger local validation before pushing. --- -### **REPLIT AGENT** +### Contributor: **Replit Agent** -**Commits:** 18 -**Primary files:** edge-worker.js, nostr-identity.js, payment.js, session.js, UI, engagement/zap modules, testkit +**Commits:** 18 | **Lines Touched:** ~1,500+ | **Role:** Tactical implementation (Web Worker, UI, identity flows) #### Dimension Scores | Dimension | Score | Evidence | -|---|---|---| -| **Code Quality** | 3 | Nostr signing and identity discovery in nostr-identity.js is functional but verbose (215 lines with some repetition). Edge worker refactoring across 6 commits (af3c938→898a47f→437df48→4943930→cb50e8c→dabadb4) shows learning but also thrashing. **Deduction:** Engagement.ts (140 lines) lacks error handling for vouch binding; no rollback on partial failure. | -| **Commit Discipline** | 2 | Task #28 spawned **6 fix commits** (af3c938, 898a47f, 437df48, 4943930, cb50e8c, dabadb4) for one feature; could have been 2–3 atomic commits. Task #29 added 649 lines in a single commit without intermediate review gates. Messages are descriptive but pattern is reactive (fix1, fix2, fix3…) not predictive. | -| **Reliability** | 3 | Nostr identity discovery logic handles both NIP-07 and NIP-46 fallback patterns. Free-tier gate integration in jobs.ts (Task #27) passes testkit (29/29 PASS). 
**Deduction:** Edge Worker refactors changed contract 4 times (token headers, model caching, complexity constraints); risk of client/server desync. Engagement vouch logic (task-29) re-uses `timmyIdentity.ts` without validating signature replay. | -| **Scope Adherence** | 4 | Tasks #28–#29 stay within Nostr identity, edge intelligence, and economic peer scope. **Deduction:** Commit dabadb4 ("session triage, speech-bubble local badge, footprint docs") mixes UI polish into a core trust task. | -| **Integration Awareness** | 3 | Uses consistent API contract (jobs.ts, sessions.ts, identity.ts); respects token flow from trust.ts. **Deduction:** Edge Worker spawned a new pattern (edge-worker-client.js) without deprecating old fetch pattern—now two call sites in main.js and session.js. No coordination with world-state.ts to wire mood updates on Nostr events. | +|-----------|-------|----------| +| **Code Quality** | 3 | JavaScript in `the-matrix/js/` is functional but dense. `edge-worker.js` and `nostr-identity.js` both exceed 200 lines with multiple concerns (crypto, state, UI updates). Proper error handling exists but readability is compromised. Transformers.js integration (`task-28`) works but feels exploratory. | +| **Commit Discipline** | 4 | Focused commits per feature (`task-28 fix2`, `task-28 fix3`, `task-28 fix4` are granular). However, commits span large file changes (100–400 lines in single commits), making bisect harder. Messages are clear. | +| **Reliability** | 4 | Working Web Worker implementation (`edge-worker-client.js` client/server pattern). Nostr key discovery and token transport ship without reported failures. *Caveat:* some commits fix import paths (`process.cwd()` vs `import.meta.url`), suggesting local testing gaps. No integration test failures visible. | +| **Scope Adherence** | 5 | Every commit maps cleanly to a task (Task #28 subtasks, Task #29 identity+zap). No scope drift observed. 
| +| **Integration Awareness** | 3 | Integrates with backend job endpoints and WebSocket correctly. *Issue:* duplicates token validation logic locally (edge-worker, session, ui) rather than centralizing in trust service. Vite config changes and package.json patches suggest trial-and-error dependency resolution. | -**Composite Score:** (3 + 2 + 3 + 4 + 3) / 5 = **3.0** → **Grade: C** +**Composite:** (3 + 4 + 4 + 5 + 3) / 5 = **3.8** → **Grade: B** -**One-Sentence Verdict:** Core Nostr identity and free-tier logic is sound and testkit-verified, but verbose refactoring thrashing and lack of integration with world state observability undermine reliability signals. +**Verdict:** +Replit Agent delivers working frontend AI features (Nostr signing, cost preview, sentiment moods, Web Worker dispatch) with good task alignment. *Weakness:* code density and token validation scattered across multiple files reduce maintainability. Strengths in implementation reliability and staying on scope. Would benefit from extraction of Nostr identity and trust token logic into shared module. --- -### **REPLIT** (Infra) +### Contributor: **agent** (other contributors, 1 commit) -**Commits:** 6 -**Primary files:** package.json, cloud-init, provisioning scripts, deployment artifacts +**Commits:** 1 | **Lines Touched:** Minor | **Role:** Minimal -#### Dimension Scores - -| Dimension | Score | Evidence | -|---|---|---| -| **Code Quality** | 4 | LNbits provisioning (task-25) is defensive: explicit bootstrap checks, secure config flags. Vite dependency bump (6590f0f) is targeted. **Deduction:** No inline documentation for cloud-init volume mounts; ops.sh relies on implicit PATH. | -| **Commit Discipline** | 5 | Each commit is small and focused: one provisioning concern per commit. Messages are clear (e.g., "Add Bitcoin/LND/LNbits local node setup scripts"). | -| **Reliability** | 4 | Bootstrap testkit integration (031ca5a) and smoke test coverage (4e8adbc) show defensive ops. 
Sweep logic (12db06c) handles xpub/address lists. **Deduction:** No retry logic in cloud-init; single network failure halts bootstrap. | -| **Scope Adherence** | 5 | Infrastructure-only boundary; no logic drift. | -| **Integration Awareness** | 4 | Integrates with existing Hermes VPS and Docker patterns; Tailscale Funnel (f75825b) respects existing relay policy handler. **Deduction:** No hooks to observability (logging to event-bus or world-state). | - -**Composite Score:** (4 + 5 + 4 + 5 + 4) / 5 = **4.4** → **Grade: B** - -**One-Sentence Verdict:** Tight infrastructure work with clear scope boundaries and good defensive logic; missing observability integration. - ---- - -### **AGENT** (1 commit) - -Artifact deployment. Score: N/A (insufficient scope for rubric). +Insufficient data for formal rubric. Single commit `eb5dcfd` (Nostr identity + zap infrastructure) is well-integrated into broader system. --- ## PART 2: ORCHESTRATOR SCORECARD -**Evaluating:** Timmy Tower World orchestration (task selection, review cadence, architecture stewardship, progress signal). +**Orchestrator Role:** Task assignment, PR review, architecture decisions, testing gates, progress tracking -#### Dimension Scores +| Dimension | Score | Reasoning | +|-----------|-------|-----------| +| **Task Clarity** | 4 | Tasks are numbered, have clear epics (Tasks #25–#29 Nostr/economic layer; #20–#24 UI/UX; #1–#5 foundation). Specifications appear in commit messages and attached PDFs. *Gap:* some tasks spawn 4–12 fixup commits, suggesting acceptance criteria weren't fully detailed up-front. | +| **Agent Selection** | 4 | alexpaynex handles backend/system complexity (trust, moderation, relay). Replit Agent handles UI/Web Worker. Split is reasonable. *Observation:* both agents sometimes duplicate logic (Anthropic client, token validation), suggesting they worked in isolation. | +| **Review Cadence** | 3 | Evidence of iterative review (e.g., Task #27 has *12 fixup commits* from code review feedback). 
This shows *reactive* review catching issues post-submission. *Opportunity:* pre-submission gates (linting, type-check, local testkit run) could reduce churn. Task #28 shows 5 iterations; Task #27 shows 12 — inconsistent review thoroughness. | +| **Architecture Stewardship** | 4 | System architecture is coherent: modular services (pricing, trust, lnbits, zap), event bus for async, DB schema kept clean with drizzle-orm. *Minor issue:* lazy-loaded Anthropic client pattern and token validation scattered across modules suggest architecture could be more opinionated about service placement. | +| **Progress vs. Churn** | 3 | 132 commits from alexpaynex over ~9 months suggest steady progress. *However:* high volume of fixup commits (visible in Tasks #27–#28–#29) indicates either loose scope or insufficient pre-review validation. Testkit runs show 27–29 PASS consistently, so actual delivered quality is high; churn is in iteration velocity, not functionality. | -| Dimension | Score | Evidence | -|---|---|---| -| **Task Clarity** | 4 | Tasks are numbered, scoped (Task #26: Nostr identity, Task #27: Free-tier gate, Task #28: Edge intelligence). **Deduction:** Tasks #27 and #28 each required 6–8 fix commits; original spec was under-specified re: concurrency or edge worker contract. | -| **Agent Selection** | 3 | Work distributed between alexpaynex (infra, reporting, trust) and Replit Agent (Nostr, UI); replit handles ops. **Deduction:** No explicit task routing; Replit Agent pulled into high-churn tasks (#28 fix1–6) without escalation or re-planning signal. | -| **Review Cadence** | 2 | Commit log shows **no PR/review pattern**: alexpaynex and Replit Agent commit directly to history. Task #27 has 8 sequential commits suggesting serial review-fix-reiterate loop locally, then bulk push. **Deduction:** No evidence of parallel review gates; each fix waits for sync feedback. 
| -| **Architecture Stewardship** | 3 | Subsystems are well-scoped (trust, moderation, identity) and follow patterns (lazy clients, Drizzle ORM, EventEmitter). **Deduction:** world-state.ts is in-memory-only and never wired to event-bus; moderation queue is isolated from world state; no unified observability backbone. edge-worker-client.js introduced new communication pattern without deprecating old fetch() sites. | -| **Progress vs. Churn** | 2 | 131 commits in ~15 months is ~8/month. Task #27 (free-tier gate) consumed 11 commits with high variance in message quality ("fix5", "all critical fixes applied" repeated). Testkit coverage improved (27→29→31 tests), but fix commit count suggests reactive testing rather than TDD. | +**Composite:** (4 + 4 + 3 + 4 + 3) / 5 = **3.6** → **Grade: B** -**Composite Score:** (4 + 3 + 2 + 3 + 2) / 5 = **2.8** → **Grade: D** - -**Orchestrator Verdict:** Architecture is coherent per subsystem, but lack of async review gates, unified observability layer, and reactive fix patterns indicate bottleneck in feedback loop and missing cross-subsystem integration points (world state, event bus, telemetry). +**Verdict:** +The orchestrator maintains coherent architecture and assigns tasks intelligently, but the review process is *reactive* rather than *proactive*. Task #27 (12 fixups) and Task #28 (5 fixups) are red flags for under-specified acceptance criteria or pre-submission validation gaps. Consider enforcing mandatory pre-submission testkit runs and type-checking. Architecture stewardship is solid; the issue is process friction, not technical direction. --- ## PART 3: TOP 3 IMPROVEMENTS -### **#1: Implement Async Review & Testkit Gate** +### 1. **Consolidate Trust & Token Logic into Shared Service Module** -**Why:** Review cadence is serial (commit → feedback → fix → commit). Task #27 shows 8 commits for one gate; testkit coverage is retrospective. 
+**Priority:** High +**Evidence:** Token validation is duplicated across: +- `trust.ts` (server: `verifyToken`, `issueToken`, HMAC signing) +- `nostr-identity.js` (client: re-implements key discovery and token parsing) +- `edge-worker.js` (Web Worker: redundant token checks) +- `session.js` (UI state: manual token tracking) -**Action:** -- Introduce PR review gate before merge to main (GitHub/Gitea Actions). -- Require testkit PASS + trust.ts decay unit tests before Task #26–#29 commits land. -- Run moderation.ts in mock mode during review to catch stub-mode approvals without logging. +**Impact:** Reduces maintenance surface, ensures consistent token semantics, eliminates 3–4 commits worth of token-related bugfixes per quarter. -**Files affected:** -- `.github/workflows/review.yml` (new) -- `artifacts/api-server/src/lib/trust.ts` (add decay unit tests) -- `artifacts/api-server/src/lib/moderation.ts` (add decision logging in stub mode) - -**Expected impact:** Reduce fix commit count by 60%; catch edge-worker contract breaks before push. +**Recommendation:** Extract `TrustTokenService` (server) and `nostr-trust-client.ts` (shared client library) from `trust.ts`. Both alexpaynex and Replit Agent to depend on these exports. Measure: zero repeated token validation logic in review of next 10 commits. --- -### **#2: Unify World State with Event Bus** +### 2. **Establish Pre-Submission Gate: Mandatory Testkit + Type Check** -**Why:** world-state.ts is in-memory and hand-wired; event-bus.ts is unused by mood derivation. Moderation decisions and job state changes don't feed telemetry. +**Priority:** High +**Evidence:** Tasks #27 (12 commits), #28 (5 commits), #29 (3 commits) all had post-submission fixups for issues that testkit or `tsc` would have caught. Example: `abb8c50` (Replit Agent) fixes `import.meta.url` → `process.cwd()` — this fails on first test run. 
-**Action:** -- Move `_deriveTimmy()` to event-bus listener: - ```typescript - eventBus.on("bus", (event: BusEvent) => { - if (event.type === "job:state") updateMoodFromJobState(event.jobId, event.state); - if (event.type.startsWith("moderation:")) updateMoodFromModeration(event); - }); - ``` -- Add moderation events to BusEvent union (e.g., `type: "moderation:flagged"`). -- Persist world state to `timmy_state` table (or Redis) for replay and observability. +**Impact:** Reduces fixup-commit volume by 60–80%, accelerates task closure, improves morale. -**Files affected:** -- `lib/db/src/schema/timmy \ No newline at end of file +**Recommendation:** Add CI gate: before merge, require: +```bash +npm run test:all # testkit must show 29/29 PASS +npm run lint:ts # tsc strict mode, ESLint +``` +Make failures blocking. Cost: ~2 min per commit. Benefit: eliminate Task #27-style 12-commit cycles. + +--- + +### 3. **Document Service Boundaries & Dependency Graph** + +**Priority:** Medium +**Evidence:** alexpaynex duplicated Anthropic client initialization pattern: +- `agent.ts`: lazy-loads Anthropic on first call +- `moderation.ts`: same pattern, ~50 lines of duplicated code + +Replit Agent scattered token handling across 4 files without clear contract. No architectural diagram in repo. + +**Impact:** New contributors (or AI agents working in isolation) won't accidentally duplicate logic. Clarifies which module owns which responsibility. + +**Recommendation:** Add `ARCHITECTURE.md` with: +- Service registry (trust, pricing, agent, lnbits, zap, relay, moderation) +- Which files are **singletons** vs. **stateless** +- Token lifecycle diagram (issue → verify → decay) +- Shared library boundaries (e.g., `@workspace/db` is source of truth) + +Measure: next task that touches 2+ services should reference ARCHITECTURE.md in commit message. 
+ +--- + +## APPENDIX: COMMIT PATTERN ANALYSIS + +### alexpaynex: Task Closure Distribution + +| Task | Initial + Fixups | Testkit Status | Risk Signal | +|------|-----------------|----------------|-------------| +| #27 (free-tier) | 1 + 11 | 29/29 PASS | ⚠️ High churn, but eventually solid | +| #28 (edge intel) | 1 + 4 | 27/27 PASS | ⚠️ Iterative, suggests exploratory work | +| #29 (economic peer) | 1 + 2 | Implied PASS | ✓ Converged faster | +| #41 (reporting) | 2 + 0 | N/A (meta) | ✓ Clean | + +**Trend:** Later tasks show faster convergence (fewer fixups). Suggests learning loop in process or improved task clarity mid-project. + +### Replit Agent: Code Density vs. Test Coverage + +- **Lines per commit:** 100–400 (median ~250) +- **Test coverage visible:** None in commit logs; presumed covered by `testkit.ts` backend integration +- **Bug escape rate:** Low (only 1 import fix visible in 18 commits) + +**Assessment:** Higher code density than ideal, but test coverage via backend integration is compensating. + +--- + +## FINAL SUMMARY TABLE + +| Dimension | alexpaynex | Replit Agent | Orchestrator | Benchmark | +|-----------|-----------|--------------|--------------|-------- \ No newline at end of file diff --git a/scripts/src/timmy-report.ts b/scripts/src/timmy-report.ts index 70413a2..590e715 100644 --- a/scripts/src/timmy-report.ts +++ b/scripts/src/timmy-report.ts @@ -61,9 +61,6 @@ process.stdout.write("Collecting git data…\n"); const shortlog = git("shortlog -sn HEAD"); const logOneline = git("log --oneline HEAD"); -const alexSample = git(`log HEAD --author="alexpaynex" --pretty=format:"%h %s" --stat -10`); -const replitAgentSample = git(`log HEAD --author="Replit Agent" --pretty=format:"%h %s" --stat -10`); - // Validate that git data is non-empty — fail loudly rather than commit blank sections if (!shortlog || shortlog === "(git command failed)") { throw new Error(`git shortlog returned empty output. 
ROOT=${ROOT}`); @@ -72,7 +69,24 @@ if (!logOneline || logOneline === "(git command failed)") { throw new Error(`git log returned empty output. ROOT=${ROOT}`); } -process.stdout.write(` ✓ git data collected (${shortlog.split("\n").length} contributors, ${logOneline.split("\n").length} commits)\n`); +// Derive author list dynamically from shortlog output +// Each line looks like: " 127 Author Name" +const authors: string[] = shortlog + .split("\n") + .map((line) => line.trim().replace(/^\d+\s+/, "")) + .filter((name) => name.length > 0 && name !== "(git command failed)"); + +// Exclude Replit system identities (no meaningful code to sample) +const SYSTEM_IDENTITIES = new Set(["replit", "agent"]); +const codeAuthors = authors.filter((a) => !SYSTEM_IDENTITIES.has(a.toLowerCase())); + +// Collect per-author stat samples for all code contributors +const authorSamples: Record<string, string> = {}; +for (const author of codeAuthors) { + authorSamples[author] = git(`log HEAD --author="${author}" --pretty=format:"%h %s" --stat -10`); +} + +process.stdout.write(` ✓ git data collected (${authors.length} contributors, ${logOneline.split("\n").length} commits)\n`); // ── Collect source file excerpts ────────────────────────────────────────────── @@ -154,21 +168,13 @@ ${logOneline} --- -## alexpaynex — Sample commits with diff stats (last 10) +${Object.entries(authorSamples).map(([author, sample]) => `## ${author} — Sample commits with diff stats (last 10) \`\`\` -${alexSample} +${sample} \`\`\` ---- - -## Replit Agent — Sample commits with diff stats (last 10) - -\`\`\` -${replitAgentSample} -\`\`\` - ---- +---`).join("\n\n")} ## Key Source File Excerpts @@ -221,11 +227,9 @@ ${shortlog} FULL COMMIT LOG: ${logOneline} -ALEXPAYNEX — LAST 10 COMMITS WITH STATS: -${alexSample} - -REPLIT AGENT — LAST 10 COMMITS WITH STATS: -${replitAgentSample} +${Object.entries(authorSamples).map(([author, sample]) => + `${author.toUpperCase()} — LAST 10 COMMITS WITH STATS:\n${sample}` +).join("\n\n")} KEY SOURCE 
FILES: @@ -302,7 +306,7 @@ async function callClaude(systemPrompt: string, userContent: string): Promise<string> { if (STUB_MODE) { process.stdout.write( - "\nWarning: AI_INTEGRATIONS_ANTHROPIC_BASE_URL / ANTHROPIC_API_KEY not set — writing stub Timmy report.\n", + "\nWarning: AI_INTEGRATIONS_ANTHROPIC_BASE_URL / AI_INTEGRATIONS_ANTHROPIC_API_KEY not set — writing stub Timmy report.\n", ); const stubReport = `# Timmy's Rubric Report (Stub Mode) @@ -321,6 +325,25 @@ pnpm --filter @workspace/scripts timmy-report process.stdout.write("\nCalling Claude (claude-haiku-4-5) for Timmy's report…\n"); const timmyReport = await callClaude(TIMMY_SYSTEM, userPrompt); + // Post-generation sanity check — catch malformed or truncated model outputs early + // Match case-insensitively since Claude may use "PART 1" or "Part 1" + const REQUIRED_SECTIONS = ["part 1", "part 2", "part 3"]; + const lowerReport = timmyReport.toLowerCase(); + const missingSections = REQUIRED_SECTIONS.filter((s) => !lowerReport.includes(s)); + const MIN_LINES = 30; + const actualLines = timmyReport.split("\n").length; + + if (missingSections.length > 0) { + process.stderr.write( + `Warning: timmy-report.md is missing sections: ${missingSections.join(", ")} — model output may be malformed.\n`, + ); + } + if (actualLines < MIN_LINES) { + process.stderr.write( + `Warning: timmy-report.md has only ${actualLines} lines (expected ≥${MIN_LINES}) — model output may be truncated.\n`, + ); + } + const header = `# Timmy's Rubric Report ## Repo: \`replit/token-gated-economy\` (Timmy Tower World) @@ -333,7 +356,7 @@ pnpm --filter @workspace/scripts timmy-report `; writeFileSync(join(ROOT, "reports/timmy-report.md"), header + timmyReport, "utf8"); - process.stdout.write(" ✓ reports/timmy-report.md written\n\nDone. Both reports are in reports/\n"); + process.stdout.write(` ✓ reports/timmy-report.md written (${actualLines} lines)\n\nDone. Both reports are in reports/\n`); } main().catch((err) => {