[claude] Agent debate on borderline eval requests (#21) #72

claude · 2026-03-23T01:07:19Z

claude commented


Fixes #21

Summary

Borderline detection: Updated eval prompt to return confidence: "high"/"low"; low-confidence evals route to debate path
Debate execution: Two sequential Haiku calls (Beta-A argues the initial position, Beta-B argues the opposing view), then a third synthesis call renders the final verdict
Debate broadcast: agent_debate WebSocket events emitted for each argument and the final verdict, with agent names and argument text
Debate UI: Styled dialogue in Workshop event log — Beta-A (blue), Beta-B (pink), verdict (green/red) — visually distinct from regular chat
Storage: job_debates table stores debate transcripts (argFor, argAgainst, verdict) for review
Fast path unchanged: High-confidence evals skip the debate entirely, so they incur no extra model calls or added latency
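A minimal TypeScript sketch of the borderline detection and fast-path routing summarised above. It assumes the eval model replies with JSON carrying a boolean `accepted`, a string `reason`, and the new `confidence` field. Only `confidence` comes from this PR; `accepted`, `reason`, `parseEvalResult`, and `chooseEvalPath` are hypothetical names, and the real parsing in `agent.ts` may differ.

```typescript
// Only `confidence` is from the PR; the other field names are assumptions.
type Confidence = "high" | "low";
type EvalPath = "fast" | "debate";

interface EvalResult {
  accepted: boolean;
  reason: string;
  confidence: Confidence;
}

// Parse the eval model's raw JSON reply into a typed result.
// Anything other than an explicit "high" is treated as "low", so an
// ambiguous or malformed confidence value sends the request to debate.
function parseEvalResult(raw: string): EvalResult {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const accepted = data.accepted;
  if (typeof accepted !== "boolean") {
    throw new Error("eval reply is missing a boolean `accepted` field");
  }
  const reason = data.reason;
  return {
    accepted,
    reason: typeof reason === "string" ? reason : "",
    confidence: data.confidence === "high" ? "high" : "low",
  };
}

// High-confidence results keep the existing single-call fast path;
// low-confidence (borderline) results are routed to the debate.
function chooseEvalPath(result: EvalResult): EvalPath {
  return result.confidence === "high" ? "fast" : "debate";
}
```

Defaulting an unrecognised confidence to `"low"` is a choice made in this sketch: a malformed reply costs three extra model calls (two arguments plus the synthesis) rather than risking a wrong fast-path verdict. Defaulting to `"high"` would favour latency instead.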

Files changed

lib/db/src/schema/job-debates.ts — new table
lib/db/src/schema/index.ts — export new table
artifacts/api-server/src/lib/event-bus.ts — DebateEvent type
artifacts/api-server/src/lib/agent.ts — confidence field + runDebate() method
artifacts/api-server/src/routes/jobs.ts — debate routing in runEvalInBackground
artifacts/api-server/src/routes/events.ts — translate debate events to WS messages
the-matrix/js/websocket.js — handle agent_debate messages
the-matrix/js/ui.js — appendDebateMessage() function
the-matrix/index.html — debate CSS styles
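The event plumbing listed above (`DebateEvent` in `event-bus.ts`, the translation in `events.ts`, and the `agent_debate` handling on the client) might fit together roughly as follows. The event variants, field names, wire format, and CSS class names are all assumptions for illustration; only the `agent_debate` message type, the Beta-A/Beta-B speaker names, and the colour scheme come from the PR description.

```typescript
type DebateAgent = "Beta-A" | "Beta-B";

// Hypothetical internal event shape; the real DebateEvent may differ.
type DebateEvent =
  | { type: "debate:argument"; jobId: string; agent: DebateAgent; argument: string }
  | { type: "debate:verdict"; jobId: string; accepted: boolean; reason: string };

// Assumed wire format for the agent_debate WebSocket message.
interface AgentDebateMessage {
  type: "agent_debate";
  jobId: string;
  agent: DebateAgent | "verdict";
  text: string;
  accepted?: boolean;
}

// Server side: translate a bus event into the WebSocket payload.
function toWsMessage(event: DebateEvent): AgentDebateMessage {
  if (event.type === "debate:argument") {
    return { type: "agent_debate", jobId: event.jobId, agent: event.agent, text: event.argument };
  }
  // Remaining variant is the verdict; carry the outcome for colouring.
  return {
    type: "agent_debate",
    jobId: event.jobId,
    agent: "verdict",
    text: event.reason,
    accepted: event.accepted,
  };
}

// Client side: pick the CSS class that gives each speaker its colour
// (Beta-A blue, Beta-B pink, verdict green or red). Class names are hypothetical.
function debateCssClass(msg: AgentDebateMessage): string {
  if (msg.agent === "Beta-A") return "debate-beta-a";
  if (msg.agent === "Beta-B") return "debate-beta-b";
  return msg.accepted ? "debate-verdict-accept" : "debate-verdict-reject";
}
```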


claude added 1 commit 2026-03-23 01:07:20 +00:00

feat: agent debate on borderline eval requests (#21)

CI / Typecheck & Lint (pull_request) Failing after 1s


06c152d296

When the eval model returns confidence: "low", a mini debate is triggered:
- Beta-A argues the initial position, Beta-B argues the opposing view
- A third synthesis call renders the final verdict
- Debate arguments broadcast as agent_debate WebSocket events
- Frontend renders debate as styled dialogue (Beta-A/Beta-B) in event log
- Debate transcript stored in job_debates table for review
- Fast path unchanged for high-confidence evals

Fixes #21

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
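The three-step debate described in the commit message can be sketched as below. `CallModel` is a hypothetical stand-in for the Haiku call in `agent.ts`, the prompts are invented for illustration, and letting Beta-B read Beta-A's argument before replying is an assumption about how the sequential calls are chained. The returned object mirrors the `argFor`, `argAgainst`, and `verdict` fields stored in `job_debates`.

```typescript
// Hypothetical model-call signature: system prompt + user prompt -> reply text.
type CallModel = (system: string, user: string) => Promise<string>;

type Speaker = "Beta-A" | "Beta-B" | "verdict";

// Mirrors the transcript fields described for job_debates.
interface DebateTranscript {
  argFor: string;
  argAgainst: string;
  verdict: string;
}

// Runs the three calls strictly in sequence: each step consumes the
// previous step's output, so they cannot run in parallel.
async function runDebate(
  request: string,
  initialAccepted: boolean,
  callModel: CallModel,
  onArgument: (speaker: Speaker, text: string) => void = () => {},
): Promise<DebateTranscript> {
  const position = initialAccepted ? "accept" : "reject";
  const opposite = initialAccepted ? "reject" : "accept";

  // 1. Beta-A argues for the eval model's initial position.
  const argFor = await callModel(
    `You are Beta-A. Argue that this request should be ${position}ed.`,
    request,
  );
  onArgument("Beta-A", argFor);

  // 2. Beta-B argues the opposing view, with Beta-A's argument in context.
  const argAgainst = await callModel(
    `You are Beta-B. Argue that this request should be ${opposite}ed.`,
    `Request:\n${request}\n\nBeta-A argued:\n${argFor}`,
  );
  onArgument("Beta-B", argAgainst);

  // 3. A neutral synthesis call weighs both arguments and rules.
  const verdict = await callModel(
    "Weigh both arguments and give a final verdict, ACCEPT or REJECT, with a one-line reason.",
    `Request:\n${request}\n\nBeta-A:\n${argFor}\n\nBeta-B:\n${argAgainst}`,
  );
  onArgument("verdict", verdict);

  return { argFor, argAgainst, verdict };
}
```

Injecting `callModel` and `onArgument` keeps the sketch independent of any particular SDK or event bus; in practice `onArgument` is where each `agent_debate` event would be emitted, and the returned transcript is what would be written to `job_debates`.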

claude referenced this pull request

2026-03-23 01:07:30 +00:00

Agent Debate on Borderline Eval Requests #21

claude merged commit 609acc8f66 into main

2026-03-23 01:07:53 +00:00

claude referenced this issue from a commit

2026-03-23 01:07:55 +00:00

[claude] Agent debate on borderline eval requests (#21) (#72)


Reference: replit/timmy-tower#72