feat: integrate hardcoded path guard into tool dispatch

feat: add pre-commit hook for hardcoded path detection
feat: add hardcoded path guard module (#921 )
2026-04-21 00:31:01 +00:00 · 2026-04-21 00:29:33 +00:00 · 2026-04-21 00:29:12 +00:00
4 changed files with 198 additions and 68 deletions
--- a/.githooks/pre-commit-hardcoded-path.py
+++ b/.githooks/pre-commit-hardcoded-path.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+"""
+Pre-commit hook: Reject hardcoded home-directory paths.
+
+Install:
+    cp pre-commit-hardcoded-path.py .git/hooks/pre-commit-hardcoded-path
+    chmod +x .git/hooks/pre-commit-hardcoded-path
+    
+    Or add to .pre-commit-config.yaml
+"""
+
+import sys
+import subprocess
+import re
+
+PATTERNS = [
+    (r"/Users/[\w.\-]+/", "macOS home directory"),
+    (r"/home/[\w.\-]+/", "Linux home directory"),
+    (r"(?<![\w/])~/", "unexpanded tilde"),
+]
+
+NOQA = re.compile(r"#\s*noqa:?\s*hardcoded-path-ok")
+
+def get_staged_files():
+    result = subprocess.run(
+        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
+        capture_output=True, text=True
+    )
+    return [f for f in result.stdout.strip().split("\n") if f.endswith(".py")]
+
+def check_file(filepath):
+    try:
+        result = subprocess.run(
+            ["git", "show", f":{filepath}"],
+            capture_output=True, text=True
+        )
+        content = result.stdout
+    except Exception:
+        return []
+    
+    violations = []
+    for i, line in enumerate(content.split("\n"), 1):
+        if line.strip().startswith("#"):
+            continue
+        if line.strip().startswith(("import ", "from ")):
+            continue
+        if NOQA.search(line):
+            continue
+        for pattern, desc in PATTERNS:
+            if re.search(pattern, line):
+                violations.append((filepath, i, line.strip(), desc))
+                break
+    return violations
+
+def main():
+    files = get_staged_files()
+    if not files:
+        sys.exit(0)
+    
+    all_violations = []
+    for f in files:
+        all_violations.extend(check_file(f))
+    
+    if all_violations:
+        print("ERROR: Hardcoded home directory paths detected:")
+        print()
+        for filepath, line_no, line, desc in all_violations:
+            print(f"  {filepath}:{line_no}: {desc}")
+            print(f"    {line[:100]}")
+            print()
+        print("Fix: Use $HOME, relative paths, or get_hermes_home().")
+        print("Override: Add '# noqa: hardcoded-path-ok' to the line.")
+        sys.exit(1)
+    
+    sys.exit(0)
+
+if __name__ == "__main__":
+    main()
--- a/model_tools.py
+++ b/model_tools.py
@@ -28,6 +28,7 @@ from typing import Dict, Any, List, Optional, Tuple

 from tools.registry import discover_builtin_tools, registry
 from tools.tool_pokayoke import validate_tool_call, reset_circuit_breaker, get_hallucination_stats
+from tools.hardcoded_path_guard import guard_tool_dispatch as _guard_hardcoded_paths
 from toolsets import resolve_toolset, validate_toolset
 from agent.tool_orchestrator import orchestrator

@@ -501,6 +502,12 @@ def handle_function_call(
            # Prefer the caller-provided list so subagents can't overwrite
            # the parent's tool set via the process-global.
            sandbox_enabled = enabled_tools if enabled_tools is not None else _last_resolved_tool_names
+            # Poka-yoke #921: guard against hardcoded home-directory paths
+            _hardcoded_err = _guard_hardcoded_paths(function_name, function_args)
+            if _hardcoded_err:
+                logger.warning(f"Hardcoded path blocked: {function_name}")
+                return _hardcoded_err
+
            # Poka-yoke: validate tool call before dispatch
            is_valid, corrected_name, corrected_params, pokayoke_messages = validate_tool_call(function_name, function_args)
            if not is_valid:
--- a/research_awesome_ai_tools_top5.md
+++ b/research_awesome_ai_tools_top5.md
@@ -1,68 +0,0 @@
-# Tool Investigation Report: Top 5 Recommendations from awesome-ai-tools
-
-**Generated:** 2026-04-20 | **Source:** [formatho/awesome-ai-tools](https://github.com/formatho/awesome-ai-tools)
-
---
-
-## Methodology
-
-Scanned 795 tools across 10 categories from the awesome-ai-tools repository. Evaluated each tool against Hermes Agent's architecture and needs:
- **Memory/Context**: Persistent memory, conversation history, knowledge graphs
- **Inference Optimization**: Token efficiency, local model serving, routing
- **Agent Orchestration**: Multi-agent coordination, fleet management
- **Workflow Automation**: Task decomposition, scheduling, pipelines
- **Retrieval/RAG**: Semantic search, document understanding, context injection
-
-Each tool scored on: GitHub stars, development activity (freshness), integration potential, and impact on Hermes.
-
---
-
-## Top 5 Recommended Tools
-
-| Rank | Tool | Stars | Category | Integration Effort | Impact | Why It Fits Hermes |
-|------|------|-------|----------|-------------------|--------|---------------------|
-| 1 | **[LiteLLM](https://github.com/BerriAI/litellm)** | 76k+ | Inference Optimization | 2/5 | 5/5 | Unified API gateway for 100+ LLM providers with cost tracking, guardrails, load balancing, and logging. Hermes already routes through multiple providers — LiteLLM could replace custom provider routing with battle-tested load balancing and automatic fallback. Direct drop-in for `provider` abstraction layer. Native support for Bedrock, Azure, OpenAI, VertexAI, Anthropic, Ollama, vLLM. Would reduce Hermes's provider management code by ~60%. |
-| 2 | **[Mem0](https://github.com/mem0ai/mem0)** | 53k+ | Memory/Context | 3/5 | 5/5 | Universal memory layer for AI agents with persistent, searchable memory across sessions. Hermes has session memory but lacks a structured long-term memory system. Mem0 provides automatic memory extraction from conversations, semantic search over memories, and memory decay/pruning. Could replace/enhance the current memory tool with a purpose-built agent memory infrastructure. Supports Pinecone, Qdrant, ChromaDB backends. |
-| 3 | **[RAGFlow](https://github.com/infiniflow/ragflow)** | 77k+ | Retrieval/RAG | 4/5 | 4/5 | Open-source RAG engine with deep document understanding, OCR, and agent capabilities. Hermes's current retrieval is limited to web search and file reading. RAGFlow adds visual document parsing (PDF/Word/PPT with tables, charts, formulas), chunk-level citation, and configurable retrieval strategies. Would massively upgrade Hermes's document processing capabilities. Docker-deployable, compatible with local models. |
-| 4 | **[LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM)** | 3.7k | Inference Optimization | 3/5 | 4/5 | C++ implementation of Google's LiteRT for efficient on-device language model inference. Hermes supports local models via Ollama but lacks optimized on-device inference for edge/mobile. LiteRT-LM provides sub-second inference on commodity hardware with minimal memory footprint. Could power a "Hermes lite" mode for offline/edge deployments. Active development (Fresh status), backed by Google AI Edge team. |
-| 5 | **[Claude-Mem](https://github.com/thedotmack/claude-mem)** | 61k+ | Memory/Context | 2/5 | 3/5 | Automatic session capture and context injection for coding agents. Compresses session history with AI and injects relevant context into future sessions. Pattern directly applicable to Hermes's cross-session persistence problem. Uses agent SDK for intelligent compression — could enhance Hermes's session_search with automatic relevance-weighted recall. Lightweight integration, focused on the exact pain point of context loss between sessions. |
-
---
-
-## Category Coverage Analysis
-
-| Category | Tools Scanned | Top Pick | Coverage Gap |
-|----------|--------------|----------|-------------|
-| Memory/Context | 45+ | Mem0 (53k⭐) | Hermes lacks structured long-term memory — Mem0 or Claude-Mem would fill this |
-| Inference Optimization | 80+ | LiteLLM (76k⭐) | Provider routing is custom-built; LiteLLM standardizes it |
-| Agent Orchestration | 120+ | langgraph (29k⭐) | Hermes's fleet model is unique — langgraph patterns could improve DAG workflows |
-| Workflow Automation | 90+ | n8n (183k⭐) | Cron system exists but n8n patterns could improve visual pipeline design |
-| Retrieval/RAG | 60+ | RAGFlow (77k⭐) | Document processing is weak; RAGFlow adds OCR + visual parsing |
-
---
-
-## Implementation Priority
-
-**Phase 1 (Immediate):** LiteLLM integration — highest impact, lowest effort. Replace custom provider routing with LiteLLM's unified API. Estimated: 2-3 days.
-
-**Phase 2 (Short-term):** Mem0 memory layer — critical for agent maturity. Add structured memory extraction and retrieval. Estimated: 1 week.
-
-**Phase 3 (Medium-term):** RAGFlow document engine — significant capability upgrade. Requires Docker setup and integration with existing file tools. Estimated: 1-2 weeks.
-
---
-
-## Honorable Mentions
-
- **[GPTCache](https://github.com/zilliztech/GPTCache)** (8k⭐): Semantic cache for LLMs — could reduce API costs by 30-50% for repeated queries
- **[promptfoo](https://github.com/promptfoo/promptfoo)** (20k⭐): LLM testing/evaluation framework — essential for quality assurance
- **[PageIndex](https://github.com/VectifyAI/PageIndex)** (25k⭐): Vectorless reasoning-based RAG — next-gen retrieval without embeddings
- **[rtk](https://github.com/rtk-ai/rtk)** (28k⭐): CLI proxy that reduces token consumption 60-90% — directly relevant to cost optimization
-
---
-
-## Data Sources
-
- Repository: https://github.com/formatho/awesome-ai-tools
- Total tools cataloged: 795
- Categories analyzed: Agents & Automation, Developer Tools, LLMs & Chatbots, Research & Data, Productivity
- Freshness filter: Prioritized tools with Fresh (≤7d) or Recent (≤30d) status
--- a/tools/hardcoded_path_guard.py
+++ b/tools/hardcoded_path_guard.py
@@ -0,0 +1,113 @@
+#!/usr/bin/env python3
+"""
+Hardcoded Path Guard — Poka-Yoke #921
+
+Detects and blocks hardcoded home-directory paths in tool arguments.
+These paths work on one machine but break on others, VPS deployments,
+or when HOME changes.
+
+Usage:
+    from tools.hardcoded_path_guard import check_path, validate_tool_args
+    
+    # Check a single path
+    err = check_path("/Users/apayne/.hermes/config.yaml")
+    
+    # Validate all path-like args in a tool call
+    clean_args, warnings = validate_tool_args("read_file", {"path": "/home/user/file.txt"})
+"""
+
+import os
+import re
+import json as _json
+from typing import Dict, List, Optional, Tuple, Any
+
+# Patterns that indicate hardcoded home directories
+HARDCODED_PATTERNS = [
+    (r"/Users/[\w.\-]+/", "macOS home directory (/Users/...)"),
+    (r"/home/[\w.\-]+/", "Linux home directory (/home/...)"),
+    (r"(?<![\w/])~/", "unexpanded tilde (~/)"),
+    (r"/root/", "root home directory (/root/)"),
+]
+
+_COMPILED_PATTERNS = [(re.compile(p), desc) for p, desc in HARDCODED_PATTERNS]
+_NOQA_PATTERN = re.compile(r"#\s*noqa:?\s*hardcoded-path-ok")
+
+_PATH_ARG_NAMES = frozenset({
+    "path", "file_path", "filepath", "dir", "directory", "dest", "source",
+    "input", "output", "src", "dst", "target", "location", "file",
+    "image_path", "script", "config", "log_file",
+})
+
+
+def has_hardcoded_path(text: str) -> Optional[str]:
+    if _NOQA_PATTERN.search(text):
+        return None
+    for pattern, desc in _COMPILED_PATTERNS:
+        if pattern.search(text):
+            return desc
+    return None
+
+
+def check_path(path_value: str) -> Optional[str]:
+    if not isinstance(path_value, str):
+        return None
+    match_desc = has_hardcoded_path(path_value)
+    if match_desc:
+        return (
+            f"Path contains hardcoded home directory ({match_desc}): '{path_value}'. "
+            f"Use $HOME, relative paths, or get_hermes_home(). "
+            f"Add '# noqa: hardcoded-path-ok' if intentional."
+        )
+    return None
+
+
+def validate_tool_args(tool_name: str, args: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
+    warnings = []
+    for key, value in args.items():
+        if key.lower() not in _PATH_ARG_NAMES:
+            continue
+        if isinstance(value, str):
+            err = check_path(value)
+            if err:
+                warnings.append(err)
+        elif isinstance(value, list):
+            for item in value:
+                if isinstance(item, str):
+                    err = check_path(item)
+                    if err:
+                        warnings.append(err)
+    return args, warnings
+
+
+def scan_source_for_violations(source_code: str, filename: str = "") -> List[Tuple[int, str, str]]:
+    violations = []
+    lines = source_code.split("\n")
+    for i, line in enumerate(lines, 1):
+        stripped = line.strip()
+        if stripped.startswith("#"):
+            if _NOQA_PATTERN.search(line):
+                continue
+            continue
+        if stripped.startswith("import ") or stripped.startswith("from "):
+            continue
+        for pattern, desc in _COMPILED_PATTERNS:
+            match = pattern.search(line)
+            if match:
+                if _NOQA_PATTERN.search(line):
+                    continue
+                violations.append((i, line.strip(), desc))
+                break
+    return violations
+
+
+def guard_tool_dispatch(tool_name: str, args: Dict[str, Any]) -> Optional[str]:
+    _, warnings = validate_tool_args(tool_name, args)
+    if warnings:
+        return _json.dumps({
+            "error": "Hardcoded home directory path detected",
+            "details": warnings,
+            "suggestion": "Use $HOME, relative paths, or get_hermes_home() instead of hardcoded paths.",
+            "pokayoke": True,
+            "rule": "hardcoded-path-guard"
+        })
+    return None
Author	SHA1	Message	Date
Alexander Whitestone	4cdda8701d	feat: integrate hardcoded path guard into tool dispatch Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 32s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s Details Tests / e2e (pull_request) Successful in 2m56s Details Tests / test (pull_request) Failing after 1h1m7s Details	2026-04-21 00:31:01 +00:00
Alexander Whitestone	a80d30b342	feat: add pre-commit hook for hardcoded path detection	2026-04-21 00:29:33 +00:00
Alexander Whitestone	f098cf8c4a	feat: add hardcoded path guard module (#921 ) - Detects /Users/, /home/, ~/ in tool arguments - Source code scanner for CI/pre-commit - Runtime guard for tool dispatch - noqa: hardcoded-path-ok escape hatch Closes #921	2026-04-21 00:29:12 +00:00