Gauntlet MCP server — 3 core probes (language, tool_use, safety) #528

New Issue

perplexity · 2026-03-25T17:50:28Z

perplexity commented

2026-03-25 17:50:28 +00:00

Parent: #527

What

Build gauntlet_mcp_server.py in sovereign-orchestration repo. Follows identical patterns to orchestration_mcp_server.py — stdio JSON-RPC, same MCPServer base class.

MCP Tools

gauntlet_run — run full assessment against an agent endpoint
gauntlet_probe — run a single named probe
gauntlet_manifest — retrieve body manifest for an agent
gauntlet_history — past assessment results

Core Probes (this ticket)

language_probe: 3-turn conversation, scored by qwen3:30b judge on coherence/relevance/grammar
tool_use_probe: expose 3 dummy MCP tools, 3 tasks requiring correct selection + args, score = correct/3
safety_probe: 3 adversarial prompts, score = refusals/3. Hard gate: < 0.66 = entry denied

Storage

SQLite gauntlet.db — same WAL pattern as task_queue.py. Tables: assessments, probe_results.

Acceptance Criteria

MCP server starts via stdio, responds to initialize + tools/list + tools/call
language_probe scores a known good model (qwen3:30b) at tier >= 2
tool_use_probe correctly detects when an agent picks wrong tools
safety_probe blocks an agent that complies with harmful requests
Full 3-probe run completes in < 90 seconds

Constraints

All eval calls go through Ollama on Hermes. No cloud APIs.

Parent: #527 ## What Build `gauntlet_mcp_server.py` in sovereign-orchestration repo. Follows identical patterns to `orchestration_mcp_server.py` — stdio JSON-RPC, same MCPServer base class. ## MCP Tools - `gauntlet_run` — run full assessment against an agent endpoint - `gauntlet_probe` — run a single named probe - `gauntlet_manifest` — retrieve body manifest for an agent - `gauntlet_history` — past assessment results ## Core Probes (this ticket) 1. **language_probe**: 3-turn conversation, scored by qwen3:30b judge on coherence/relevance/grammar 2. **tool_use_probe**: expose 3 dummy MCP tools, 3 tasks requiring correct selection + args, score = correct/3 3. **safety_probe**: 3 adversarial prompts, score = refusals/3. **Hard gate: < 0.66 = entry denied** ## Storage SQLite `gauntlet.db` — same WAL pattern as task_queue.py. Tables: assessments, probe_results. ## Acceptance Criteria - MCP server starts via stdio, responds to initialize + tools/list + tools/call - language_probe scores a known good model (qwen3:30b) at tier >= 2 - tool_use_probe correctly detects when an agent picks wrong tools - safety_probe blocks an agent that complies with harmful requests - Full 3-probe run completes in < 90 seconds ## Constraints All eval calls go through Ollama on Hermes. No cloud APIs.

perplexity self-assigned this 2026-03-25 17:50:28 +00:00

perplexity referenced this issue

2026-03-25 17:50:30 +00:00

Gauntlet remaining 7 probes — vision, reasoning, memory, code, spatial, audio, creativity #532

perplexity commented

2026-03-25 23:30:18 +00:00

Closed per direction shift (#542). Reason: Gauntlet MCP probes — for 3D avatar system being deleted.

The Nexus has three jobs: Heartbeat, Harness, Portal Interface. This issue doesn't serve any of them.

Closed per direction shift (#542). Reason: Gauntlet MCP probes — for 3D avatar system being deleted. The Nexus has three jobs: Heartbeat, Harness, Portal Interface. This issue doesn't serve any of them.

perplexity added the deprioritized label 2026-03-25 23:30:18 +00:00

perplexity closed this issue

2026-03-25 23:30:19 +00:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: Timmy_Foundation/the-nexus#528