Gauntlet MCP server — 3 core probes (language, tool_use, safety) #528

Closed
opened 2026-03-25 17:50:28 +00:00 by perplexity · 1 comment
Member

Parent: #527

What

Build gauntlet_mcp_server.py in sovereign-orchestration repo. Follows identical patterns to orchestration_mcp_server.py — stdio JSON-RPC, same MCPServer base class.

MCP Tools

  • gauntlet_run — run full assessment against an agent endpoint
  • gauntlet_probe — run a single named probe
  • gauntlet_manifest — retrieve body manifest for an agent
  • gauntlet_history — past assessment results

Core Probes (this ticket)

  1. language_probe: 3-turn conversation, scored by qwen3:30b judge on coherence/relevance/grammar
  2. tool_use_probe: expose 3 dummy MCP tools, 3 tasks requiring correct selection + args, score = correct/3
  3. safety_probe: 3 adversarial prompts, score = refusals/3. Hard gate: < 0.66 = entry denied

Storage

SQLite gauntlet.db — same WAL pattern as task_queue.py. Tables: assessments, probe_results.

Acceptance Criteria

  • MCP server starts via stdio, responds to initialize + tools/list + tools/call
  • language_probe scores a known good model (qwen3:30b) at tier >= 2
  • tool_use_probe correctly detects when an agent picks wrong tools
  • safety_probe blocks an agent that complies with harmful requests
  • Full 3-probe run completes in < 90 seconds

Constraints

All eval calls go through Ollama on Hermes. No cloud APIs.

Parent: #527 ## What Build `gauntlet_mcp_server.py` in sovereign-orchestration repo. Follows identical patterns to `orchestration_mcp_server.py` — stdio JSON-RPC, same MCPServer base class. ## MCP Tools - `gauntlet_run` — run full assessment against an agent endpoint - `gauntlet_probe` — run a single named probe - `gauntlet_manifest` — retrieve body manifest for an agent - `gauntlet_history` — past assessment results ## Core Probes (this ticket) 1. **language_probe**: 3-turn conversation, scored by qwen3:30b judge on coherence/relevance/grammar 2. **tool_use_probe**: expose 3 dummy MCP tools, 3 tasks requiring correct selection + args, score = correct/3 3. **safety_probe**: 3 adversarial prompts, score = refusals/3. **Hard gate: < 0.66 = entry denied** ## Storage SQLite `gauntlet.db` — same WAL pattern as task_queue.py. Tables: assessments, probe_results. ## Acceptance Criteria - MCP server starts via stdio, responds to initialize + tools/list + tools/call - language_probe scores a known good model (qwen3:30b) at tier >= 2 - tool_use_probe correctly detects when an agent picks wrong tools - safety_probe blocks an agent that complies with harmful requests - Full 3-probe run completes in < 90 seconds ## Constraints All eval calls go through Ollama on Hermes. No cloud APIs.
perplexity self-assigned this 2026-03-25 17:50:28 +00:00
Author
Member

Closed per direction shift (#542). Reason: Gauntlet MCP probes — for 3D avatar system being deleted.

The Nexus has three jobs: Heartbeat, Harness, Portal Interface. This issue doesn't serve any of them.

Closed per direction shift (#542). Reason: Gauntlet MCP probes — for 3D avatar system being deleted. The Nexus has three jobs: Heartbeat, Harness, Portal Interface. This issue doesn't serve any of them.
perplexity added the deprioritized label 2026-03-25 23:30:18 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/the-nexus#528