[claude] Agent debate on borderline eval requests (#21) #72

Merged
claude merged 1 commit from claude/issue-21 into main 2026-03-23 01:07:53 +00:00
Collaborator

Fixes #21

Summary

  • Borderline detection: Updated eval prompt to return confidence: "high"/"low"; low-confidence evals route to debate path
  • Debate execution: Two sequential Haiku calls (Beta-A argues initial position, Beta-B argues opposing), then a third synthesis call renders final verdict
  • Debate broadcast: agent_debate WebSocket events emitted for each argument and the final verdict, with agent names and argument text
  • Debate UI: Styled dialogue in Workshop event log — Beta-A (blue), Beta-B (pink), verdict (green/red) — visually distinct from regular chat
  • Storage: job_debates table stores debate transcripts (argFor, argAgainst, verdict) for review
  • Fast path unchanged: High-confidence evals skip debate entirely, so they make no additional model calls
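
The routing and debate flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in agent.ts or jobs.ts: the type shapes, the `resolveEval` helper, the prompt wording, and the injected `callModel` stand-in for a Haiku call are all assumptions. Only the `confidence` values, the `runDebate` name, and the `argFor`/`argAgainst`/`verdict` fields come from the description.

```typescript
// Hypothetical shapes: the PR names these fields but does not show the types.
type Confidence = "high" | "low";
type Verdict = "accept" | "reject";

interface EvalResult {
  verdict: Verdict;
  confidence: Confidence;
}

interface DebateTranscript {
  argFor: string; // Beta-A, arguing the initial position (assumed mapping)
  argAgainst: string; // Beta-B, arguing the opposing position (assumed mapping)
  verdict: Verdict; // outcome of the synthesis call
}

// Stand-in for a model call, injected so the sketch runs without network access.
type ModelCall = (prompt: string) => Promise<string>;

const opposite = (v: Verdict): Verdict => (v === "accept" ? "reject" : "accept");

// Three sequential calls: Beta-A, then Beta-B, then synthesis.
async function runDebate(
  request: string,
  initial: EvalResult,
  callModel: ModelCall
): Promise<DebateTranscript> {
  const argFor = await callModel(
    `Argue that this request should get "${initial.verdict}": ${request}`
  );
  const argAgainst = await callModel(
    `Argue that this request should get "${opposite(initial.verdict)}": ${request}`
  );
  const raw = await callModel(
    `Reply with "accept" or "reject".\nRequest: ${request}\nBeta-A: ${argFor}\nBeta-B: ${argAgainst}`
  );
  // Assumption: anything that is not a clear "accept" counts as a rejection.
  const verdict: Verdict = raw.trim().toLowerCase().startsWith("accept")
    ? "accept"
    : "reject";
  return { argFor, argAgainst, verdict };
}

// Hypothetical router: high confidence returns immediately, low confidence debates.
async function resolveEval(
  request: string,
  initial: EvalResult,
  callModel: ModelCall
): Promise<{ verdict: Verdict; debate?: DebateTranscript }> {
  if (initial.confidence === "high") {
    return { verdict: initial.verdict }; // fast path: no extra model calls
  }
  const debate = await runDebate(request, initial, callModel);
  return { verdict: debate.verdict, debate };
}
```

Injecting the model call keeps the routing logic testable with a fake; the returned transcript is the object that would be written to job_debates.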

Files changed

  • lib/db/src/schema/job-debates.ts — new table
  • lib/db/src/schema/index.ts — export new table
  • artifacts/api-server/src/lib/event-bus.ts — DebateEvent type
  • artifacts/api-server/src/lib/agent.ts — confidence field + runDebate() method
  • artifacts/api-server/src/routes/jobs.ts — debate routing in runEvalInBackground
  • artifacts/api-server/src/routes/events.ts — translate debate events to WS messages
  • the-matrix/js/websocket.js — handle agent_debate messages
  • the-matrix/js/ui.js — appendDebateMessage() function
  • the-matrix/index.html — debate CSS styles
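
As a rough illustration of how a debate event might be translated into an `agent_debate` WebSocket message and then styled on the client, consider the sketch below. The `DebateEvent` fields, the `AgentDebateMessage` shape, and the `toAgentDebateMessage` and `debateColor` helpers are hypothetical; the PR only states that messages carry agent names and argument text, and that Beta-A renders blue, Beta-B pink, and the verdict green or red. Mapping green to an accepted verdict is an assumption.

```typescript
// Hypothetical event shape; the real DebateEvent in event-bus.ts is not shown in the PR.
type DebateEvent =
  | { kind: "argument"; jobId: string; agent: "Beta-A" | "Beta-B"; text: string }
  | { kind: "verdict"; jobId: string; accepted: boolean; text: string };

// Hypothetical wire format for the agent_debate message.
interface AgentDebateMessage {
  type: "agent_debate";
  jobId: string;
  agent: "Beta-A" | "Beta-B" | "verdict";
  text: string;
  accepted?: boolean; // only set on the final verdict
}

// Server side: translate an internal debate event into a WS message object.
function toAgentDebateMessage(event: DebateEvent): AgentDebateMessage {
  if (event.kind === "argument") {
    return { type: "agent_debate", jobId: event.jobId, agent: event.agent, text: event.text };
  }
  return {
    type: "agent_debate",
    jobId: event.jobId,
    agent: "verdict",
    text: event.text,
    accepted: event.accepted,
  };
}

// Client side: choose the color described in the summary.
// Assumption: green means the request was accepted, red means rejected.
function debateColor(msg: AgentDebateMessage): "blue" | "pink" | "green" | "red" {
  if (msg.agent === "Beta-A") return "blue";
  if (msg.agent === "Beta-B") return "pink";
  return msg.accepted ? "green" : "red";
}

// Usage: serialize for the socket, then pick a style when rendering.
const example = toAgentDebateMessage({
  kind: "argument",
  jobId: "job-1",
  agent: "Beta-A",
  text: "The request is within scope.",
});
const wire = JSON.stringify(example);
console.log(wire, debateColor(example));
```

Keeping the color decision on the client matches the PR's split between the event translation in events.ts and the styling in index.html.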
claude added 1 commit 2026-03-23 01:07:20 +00:00
feat: agent debate on borderline eval requests (#21)
Some checks failed
CI / Typecheck & Lint (pull_request) Failing after 1s
06c152d296
When the eval model returns confidence: "low", a mini debate is triggered:
- Beta-A argues the initial position, Beta-B argues the opposing view
- A third synthesis call renders the final verdict
- Debate arguments broadcast as agent_debate WebSocket events
- Frontend renders debate as styled dialogue (Beta-A/Beta-B) in event log
- Debate transcript stored in job_debates table for review
- Fast path unchanged for high-confidence evals

Fixes #21

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claude merged commit 609acc8f66 into main 2026-03-23 01:07:53 +00:00
Reference: replit/timmy-tower#72