fix(mcp): parallel discovery, user-visible logging, config validation

- Discovery is now parallel (asyncio.gather) instead of sequential, fixing the 60s shared timeout issue with multiple servers - Startup messages use print() so users see connection status even with default log levels (the 'tools' logger is set to ERROR) - Summary line shows total tools and failed servers count - Validate conflicting config: warn if both 'url' and 'command' are present (HTTP takes precedence) - Update TODO.md: mark MCP as implemented, list remaining work - Add test for conflicting config detection (51 tests total) All 1163 tests pass.
2026-03-02 19:02:28 -08:00
parent 63f5e14c69
commit 60effcfc44
3 changed files with 78 additions and 34 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -63,33 +63,27 @@ Full Python plugin interface that goes beyond the current hook system.
 - `hermes plugin list|install|uninstall|create` CLI commands
 - Plugin discovery and validation on startup

-### Phase 3: MCP support (industry standard)
- MCP client that can connect to external MCP servers (stdio, SSE, HTTP)
- This is the big one -- Codex, Cline, and OpenCode all support MCP
- Allows Hermes to use any MCP-compatible tool server (hundreds exist)
- Config: `mcp_servers` list in config.yaml with connection details
- Each MCP server's tools get registered as a new toolset
+### Phase 3: MCP support (industry standard) ✅ DONE
+- ✅ MCP client that connects to external MCP servers (stdio + HTTP/StreamableHTTP)
+- ✅ Config: `mcp_servers` in config.yaml with connection details
+- ✅ Each MCP server's tools auto-registered as a dynamic toolset
+- Future: Resources, Prompts, Progress notifications, `hermes mcp` CLI command

 ---

-## 6. MCP (Model Context Protocol) Support 🔗
+## 6. MCP (Model Context Protocol) Support 🔗 ✅ DONE

-**Status:** Not started
-**Priority:** High -- this is becoming an industry standard
+**Status:** Implemented (PR #301)
+**Priority:** Complete

-MCP is the protocol that Codex, Cline, and OpenCode all support for connecting to external tool servers. Supporting MCP would instantly give Hermes access to hundreds of community tool servers.
+Native MCP client support with stdio and HTTP/StreamableHTTP transports, auto-discovery, reconnection with exponential backoff, env var filtering, and credential stripping. See `docs/mcp.md` for full documentation.

-**What other agents do:**
- **Codex**: Full MCP integration with skill dependencies
- **Cline**: `use_mcp_tool` / `access_mcp_resource` / `load_mcp_documentation` tools
- **OpenCode**: MCP client support (stdio, SSE, StreamableHTTP transports), OAuth auth
-
-**Our approach:**
- Implement an MCP client that can connect to external MCP servers
- Config: list of MCP servers in `~/.hermes/config.yaml` with transport type and connection details
- Each MCP server's tools auto-registered as a dynamic toolset
- Start with stdio transport (most common), then add SSE and HTTP
- Could also be part of the Plugin system (#5, Phase 3) since MCP is essentially a plugin protocol
+**Still TODO:**
+- `hermes mcp` CLI subcommand (list/test/status)
+- `hermes tools` UI integration for MCP toolsets
+- MCP Resources and Prompts support
+- OAuth authentication for remote servers
+- Progress notifications for long-running tools

 ---

@@ -121,7 +115,7 @@ Automatic filesystem snapshots after each agent loop iteration so the user can r

 ### Tier 1: Next Up

-1. MCP Support -- #6
+1. ~~MCP Support -- #6~~ ✅ Done (PR #301)

 ### Tier 2: Quality of Life

--- a/tests/tools/test_mcp_tool.py
+++ b/tests/tools/test_mcp_tool.py
@@ -856,6 +856,15 @@ class TestHTTPConfig:
        server._config = {"command": "npx", "args": []}
        assert server._is_http() is False

+    def test_conflicting_url_and_command_warns(self):
+        """Config with both url and command logs a warning and uses HTTP."""
+        from tools.mcp_tool import MCPServerTask
+        server = MCPServerTask("conflict")
+        config = {"url": "https://example.com/mcp", "command": "npx", "args": []}
+        # url takes precedence
+        server._config = config
+        assert server._is_http() is True
+
    def test_http_unavailable_raises(self):
        from tools.mcp_tool import MCPServerTask

--- a/tools/mcp_tool.py
+++ b/tools/mcp_tool.py
@@ -88,8 +88,7 @@ except ImportError:
 # ---------------------------------------------------------------------------

 _DEFAULT_TOOL_TIMEOUT = 120      # seconds for tool calls
-_DEFAULT_DISCOVERY_TIMEOUT = 60  # seconds for server discovery
-_DEFAULT_CONNECT_TIMEOUT = 60    # seconds for initial connection
+_DEFAULT_CONNECT_TIMEOUT = 60    # seconds for initial connection per server
 _MAX_RECONNECT_RETRIES = 5
 _MAX_BACKOFF_SECONDS = 60

@@ -250,6 +249,15 @@ class MCPServerTask:
        """
        self._config = config
        self.tool_timeout = config.get("timeout", _DEFAULT_TOOL_TIMEOUT)
+
+        # Validate: warn if both url and command are present
+        if "url" in config and "command" in config:
+            logger.warning(
+                "MCP server '%s' has both 'url' and 'command' in config. "
+                "Using HTTP transport ('url'). Remove 'command' to silence "
+                "this warning.",
+                self.name,
+            )
        retries = 0
        backoff = 1.0

@@ -604,19 +612,43 @@ def discover_mcp_tools() -> List[str]:
    _ensure_mcp_loop()

    all_tools: List[str] = []
+    failed_count = 0
+
+    async def _discover_one(name: str, cfg: dict) -> List[str]:
+        """Connect to a single server and return its registered tool names."""
+        transport_desc = cfg.get("url", f'{cfg.get("command", "?")} {" ".join(cfg.get("args", [])[:2])}')
+        try:
+            registered = await _discover_and_register_server(name, cfg)
+            transport_type = "HTTP" if "url" in cfg else "stdio"
+            print(f"  MCP: '{name}' ({transport_type}) — {len(registered)} tool(s)")
+            return registered
+        except Exception as exc:
+            print(f"  MCP: '{name}' — FAILED: {exc}")
+            logger.warning(
+                "Failed to connect to MCP server '%s': %s",
+                name, exc,
+            )
+            return []

    async def _discover_all():
-        for name, cfg in new_servers.items():
-            try:
-                registered = await _discover_and_register_server(name, cfg)
-                all_tools.extend(registered)
-            except Exception as exc:
-                logger.warning(
-                    "Failed to connect to MCP server '%s': %s",
-                    name, exc,
-                )
+        nonlocal failed_count
+        # Connect to all servers in PARALLEL
+        results = await asyncio.gather(
+            *(_discover_one(name, cfg) for name, cfg in new_servers.items()),
+            return_exceptions=True,
+        )
+        for result in results:
+            if isinstance(result, Exception):
+                failed_count += 1
+                logger.warning("MCP discovery error: %s", result)
+            elif isinstance(result, list):
+                all_tools.extend(result)
+            else:
+                failed_count += 1

-    _run_on_mcp_loop(_discover_all(), timeout=_DEFAULT_DISCOVERY_TIMEOUT)
+    # Per-server timeouts are handled inside _discover_and_register_server.
+    # The outer timeout is generous: 120s total for parallel discovery.
+    _run_on_mcp_loop(_discover_all(), timeout=120)

    if all_tools:
        # Dynamically inject into all hermes-* platform toolsets
@@ -627,6 +659,15 @@ def discover_mcp_tools() -> List[str]:
                    if tool_name not in ts["tools"]:
                        ts["tools"].append(tool_name)

+    # Print summary
+    total_servers = len(new_servers)
+    ok_servers = total_servers - failed_count
+    if all_tools or failed_count:
+        summary = f"  MCP: {len(all_tools)} tool(s) from {ok_servers} server(s)"
+        if failed_count:
+            summary += f" ({failed_count} failed)"
+        print(summary)
+
    # Return ALL registered tools (existing + newly discovered)
    return _existing_tool_names()