Agent Debate on Borderline Eval Requests #21
What & Why
When Beta's evaluation of a request is close to the accept/reject threshold, a single model pass may give an arbitrary result. A mini debate — where a second model instance argues the opposing view, and Beta synthesizes — produces a more defensible decision and makes the Workshop dramatically more interesting to watch.
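The issue does not say how "close to the threshold" is measured. As a minimal sketch, assuming Beta produces a numeric score in [0, 1] and accepts at or above a cut-off, a margin around that cut-off could decide the confidence label. The names `evaluate`, `threshold`, and `margin` and the default values are hypothetical, not taken from the codebase.

```typescript
// Hypothetical names and defaults: the issue does not specify how Beta scores requests.
type Confidence = "high" | "low";

interface EvalResult {
  accepted: boolean;
  confidence: Confidence;
}

// Assumption: `score` is in [0, 1] and a request is accepted at or above `threshold`.
function evaluate(score: number, threshold = 0.5, margin = 0.1): EvalResult {
  const accepted = score >= threshold;
  // Scores that fall within `margin` of the threshold are treated as borderline.
  const confidence: Confidence =
    Math.abs(score - threshold) < margin ? "low" : "high";
  return { accepted, confidence };
}
```

Under these assumptions, a score of 0.55 is accepted with low confidence and would go to the debate path, while a score of 0.9 is accepted with high confidence and skips it.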
Done looks like
confidence: "low"
agent_debate WebSocket events, rendered as back-and-forth dialogue between 'Beta-A' / 'Beta-B'
Out of scope
Tasks
confidence field (high/low); add logic to route low-confidence evals to the debate path.
agent_debate WebSocket events with each argument and final verdict; include agent names and argument text.
job_debates table for later review.
Relevant files
artifacts/api-server/src/lib/agent.ts
artifacts/api-server/src/routes/jobs.ts
artifacts/api-server/src/routes/events.ts
the-matrix/js/ui.js
the-matrix/js/websocket.js
PR #72 created.
Implemented the full agent debate feature:
confidence: high/low, where low-confidence evals trigger a mini debate
agent_debate WebSocket events in real time
job_debates table
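
The tasks describe routing low-confidence evals to the debate path and emitting agent_debate events that carry agent names, argument text, and a final verdict. A rough sketch of what that payload and routing check might look like follows. The field names (`jobId`, `kind`, `accepted`, `text`) and the helpers `needsDebate` and `toDebateEvents` are assumptions for illustration, not the shapes used in PR #72.

```typescript
// Hypothetical shapes: the issue only says events carry agent names,
// argument text, and a final verdict.
type AgentName = "Beta-A" | "Beta-B";

interface DebateArgument {
  agent: AgentName;
  text: string;
}

type AgentDebateEvent =
  | { type: "agent_debate"; jobId: string; kind: "argument"; agent: AgentName; text: string }
  | { type: "agent_debate"; jobId: string; kind: "verdict"; accepted: boolean; text: string };

// Only low-confidence evals take the debate path.
function needsDebate(confidence: "high" | "low"): boolean {
  return confidence === "low";
}

// Turns a finished debate into the ordered list of events to send:
// one event per argument, followed by a single verdict event.
function toDebateEvents(
  jobId: string,
  debateArgs: DebateArgument[],
  verdict: { accepted: boolean; text: string },
): AgentDebateEvent[] {
  const events: AgentDebateEvent[] = debateArgs.map((a) => ({
    type: "agent_debate" as const,
    jobId,
    kind: "argument" as const,
    agent: a.agent,
    text: a.text,
  }));
  events.push({
    type: "agent_debate",
    jobId,
    kind: "verdict",
    accepted: verdict.accepted,
    text: verdict.text,
  });
  return events;
}
```

A server could serialize each returned event with `JSON.stringify` and send it over the existing WebSocket connection in order, and the same array could be stored in the job_debates table for later review.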