security: fix secret leakage via whitelist-only env vars (CVSS 9.3)

Replace blacklist approach with explicit whitelist for child process environment variables to prevent secret exfiltration via creative naming. Changes: - tools/code_execution_tool.py: Implement _ALLOWED_ENV_VARS frozenset - Only pass explicitly listed env vars to sandboxed child processes - Drop all other variables silently to prevent credential theft Fixes CWE-526: Exposure of Sensitive Information to an Unauthorized Actor CVSS: 9.3 (Critical) Refs: V-003 in SECURITY_AUDIT_REPORT.md
Merge path traversal security fix
2026-03-30 23:42:43 +00:00 · 2026-03-30 23:40:52 +00:00 · 2026-03-30 23:40:00 +00:00 · 2026-03-30 23:39:24 +00:00 · 2026-03-30 23:17:09 +00:00 · 2026-03-30 22:59:57 +00:00
5 changed files with 272 additions and 10 deletions
--- a/agent/evolution/domain_distiller.py
+++ b/agent/evolution/domain_distiller.py
@@ -0,0 +1,45 @@
+"""Phase 3: Deep Knowledge Distillation from Google.
+
+Performs deep dives into technical domains and distills them into
+Timmy's Sovereign Knowledge Graph.
+"""
+
+import logging
+import json
+from typing import List, Dict, Any
+from agent.gemini_adapter import GeminiAdapter
+from agent.symbolic_memory import SymbolicMemory
+
+logger = logging.getLogger(__name__)
+
+class DomainDistiller:
+    def __init__(self):
+        self.adapter = GeminiAdapter()
+        self.symbolic = SymbolicMemory()
+
+    def distill_domain(self, domain: str):
+        """Crawls and distills an entire technical domain."""
+        logger.info(f"Distilling domain: {domain}")
+        
+        prompt = f"""
+Please perform a deep knowledge distillation of the following domain: {domain}
+
+Use Google Search to find foundational papers, recent developments, and key entities.
+Synthesize this into a structured 'Domain Map' consisting of high-fidelity knowledge triples.
+Focus on the structural relationships that define the domain.
+
+Format: [{{"s": "subject", "p": "predicate", "o": "object"}}]
+"""
+        result = self.adapter.generate(
+            model="gemini-3.1-pro-preview",
+            prompt=prompt,
+            system_instruction=f"You are Timmy's Domain Distiller. Your goal is to map the entire {domain} domain into a structured Knowledge Graph.",
+            grounding=True,
+            thinking=True,
+            response_mime_type="application/json"
+        )
+        
+        triples = json.loads(result["text"])
+        count = self.symbolic.ingest_text(json.dumps(triples))
+        logger.info(f"Distilled {count} new triples for domain: {domain}")
+        return count
--- a/agent/evolution/self_correction_generator.py
+++ b/agent/evolution/self_correction_generator.py
@@ -0,0 +1,60 @@
+"""Phase 1: Synthetic Data Generation for Self-Correction.
+
+Generates reasoning traces where Timmy makes a subtle error and then
+identifies and corrects it using the Conscience Validator.
+"""
+
+import logging
+import json
+from typing import List, Dict, Any
+from agent.gemini_adapter import GeminiAdapter
+from tools.gitea_client import GiteaClient
+
+logger = logging.getLogger(__name__)
+
+class SelfCorrectionGenerator:
+    def __init__(self):
+        self.adapter = GeminiAdapter()
+        self.gitea = GiteaClient()
+
+    def generate_trace(self, task: str) -> Dict[str, Any]:
+        """Generates a single self-correction reasoning trace."""
+        prompt = f"""
+Task: {task}
+
+Please simulate a multi-step reasoning trace for this task.
+Intentionally include one subtle error in the reasoning (e.g., a logical flaw, a misinterpretation of a rule, or a factual error).
+Then, show how Timmy identifies the error using his Conscience Validator and provides a corrected reasoning trace.
+
+Format the output as JSON:
+{{
+  "task": "{task}",
+  "initial_trace": "...",
+  "error_identified": "...",
+  "correction_trace": "...",
+  "lessons_learned": "..."
+}}
+"""
+        result = self.adapter.generate(
+            model="gemini-3.1-pro-preview",
+            prompt=prompt,
+            system_instruction="You are Timmy's Synthetic Data Engine. Generate high-fidelity self-correction traces.",
+            response_mime_type="application/json",
+            thinking=True
+        )
+        
+        trace = json.loads(result["text"])
+        return trace
+
+    def generate_and_save(self, task: str, count: int = 1):
+        """Generates multiple traces and saves them to Gitea."""
+        repo = "Timmy_Foundation/timmy-config"
+        for i in range(count):
+            trace = self.generate_trace(task)
+            filename = f"memories/synthetic_data/self_correction/{task.lower().replace(' ', '_')}_{i}.json"
+            
+            content = json.dumps(trace, indent=2)
+            content_b64 = base64.b64encode(content.encode()).decode()
+            
+            self.gitea.create_file(repo, filename, content_b64, f"Add synthetic self-correction trace for {task}")
+            logger.info(f"Saved synthetic trace to {filename}")
--- a/agent/evolution/world_modeler.py
+++ b/agent/evolution/world_modeler.py
@@ -0,0 +1,42 @@
+"""Phase 2: Multi-Modal World Modeling.
+
+Ingests multi-modal data (vision/audio) to build a spatial and temporal
+understanding of Timmy's environment.
+"""
+
+import logging
+import base64
+from typing import List, Dict, Any
+from agent.gemini_adapter import GeminiAdapter
+from agent.symbolic_memory import SymbolicMemory
+
+logger = logging.getLogger(__name__)
+
+class WorldModeler:
+    def __init__(self):
+        self.adapter = GeminiAdapter()
+        self.symbolic = SymbolicMemory()
+
+    def analyze_environment(self, image_data: str, mime_type: str = "image/jpeg"):
+        """Analyzes an image of the environment and updates the world model."""
+        # In a real scenario, we'd use Gemini's multi-modal capabilities
+        # For now, we'll simulate the vision-to-symbolic extraction
+        prompt = f"""
+Analyze the following image of Timmy's environment.
+Identify all key objects, their spatial relationships, and any temporal changes.
+Extract this into a set of symbolic triples for the Knowledge Graph.
+
+Format: [{{"s": "subject", "p": "predicate", "o": "object"}}]
+"""
+        # Simulate multi-modal call (Gemini 3.1 Pro Vision)
+        result = self.adapter.generate(
+            model="gemini-3.1-pro-preview",
+            prompt=prompt,
+            system_instruction="You are Timmy's World Modeler. Build a high-fidelity spatial/temporal map of the environment.",
+            response_mime_type="application/json"
+        )
+        
+        triples = json.loads(result["text"])
+        self.symbolic.ingest_text(json.dumps(triples))
+        logger.info(f"Updated world model with {len(triples)} new spatial triples.")
+        return triples
--- a/tools/code_execution_tool.py
+++ b/tools/code_execution_tool.py
@@ -431,27 +431,57 @@ def execute_code(
        # Exception: env vars declared by loaded skills (via env_passthrough
        # registry) or explicitly allowed by the user in config.yaml
        # (terminal.env_passthrough) are passed through.
-        _SAFE_ENV_PREFIXES = ("PATH", "HOME", "USER", "LANG", "LC_", "TERM",
-                              "TMPDIR", "TMP", "TEMP", "SHELL", "LOGNAME",
-                              "XDG_", "PYTHONPATH", "VIRTUAL_ENV", "CONDA")
-        _SECRET_SUBSTRINGS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL",
-                              "PASSWD", "AUTH")
+        #
+        # SECURITY FIX (V-003): Whitelist-only approach for environment variables.
+        # Only explicitly allowed environment variables are passed to child.
+        # This prevents secret leakage via creative env var naming that bypasses
+        # substring filters (e.g., MY_API_KEY_XYZ instead of API_KEY).
+        _ALLOWED_ENV_VARS = frozenset([
+            # System paths
+            "PATH", "HOME", "USER", "LOGNAME", "SHELL", 
+            "PWD", "OLDPWD", "CWD", "TMPDIR", "TMP", "TEMP",
+            # Locale
+            "LANG", "LC_ALL", "LC_CTYPE", "LC_NUMERIC", "LC_TIME",
+            "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES", "LC_PAPER",
+            "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE", "LC_MEASUREMENT",
+            "LC_IDENTIFICATION",
+            # Terminal
+            "TERM", "TERMINFO", "TERMINFO_DIRS", "COLORTERM",
+            # XDG
+            "XDG_CONFIG_DIRS", "XDG_CONFIG_HOME", "XDG_CACHE_HOME",
+            "XDG_DATA_DIRS", "XDG_DATA_HOME", "XDG_RUNTIME_DIR",
+            "XDG_SESSION_TYPE", "XDG_CURRENT_DESKTOP",
+            # Python
+            "PYTHONPATH", "PYTHONHOME", "PYTHONDONTWRITEBYTECODE",
+            "PYTHONUNBUFFERED", "PYTHONIOENCODING", "PYTHONNOUSERSITE",
+            "VIRTUAL_ENV", "CONDA_DEFAULT_ENV", "CONDA_PREFIX",
+            # Hermes-specific (safe only)
+            "HERMES_RPC_SOCKET", "HERMES_TIMEZONE",
+        ])
+        
+        # Prefixes that are safe to pass through
+        _ALLOWED_PREFIXES = ("LC_",)
+        
        try:
            from tools.env_passthrough import is_env_passthrough as _is_passthrough
        except Exception:
            _is_passthrough = lambda _: False  # noqa: E731
+            
        child_env = {}
        for k, v in os.environ.items():
            # Passthrough vars (skill-declared or user-configured) always pass.
            if _is_passthrough(k):
                child_env[k] = v
                continue
-            # Block vars with secret-like names.
-            if any(s in k.upper() for s in _SECRET_SUBSTRINGS):
-                continue
-            # Allow vars with known safe prefixes.
-            if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
+            
+            # SECURITY: Whitelist-only approach
+            # Only allow explicitly listed env vars or allowed prefixes
+            if k in _ALLOWED_ENV_VARS:
                child_env[k] = v
+            elif any(k.startswith(p) for p in _ALLOWED_PREFIXES):
+                child_env[k] = v
+            # All other env vars are silently dropped
+            # This prevents secret leakage via creative naming
        child_env["HERMES_RPC_SOCKET"] = sock_path
        child_env["PYTHONDONTWRITEBYTECODE"] = "1"
        # Ensure the hermes-agent root is importable in the sandbox so
--- a/tools/file_operations.py
+++ b/tools/file_operations.py
@@ -112,6 +112,81 @@ def _is_write_denied(path: str) -> bool:
    return False


+# SECURITY: Path traversal detection patterns
+_PATH_TRAVERSAL_PATTERNS = [
+    re.compile(r'\.\./'),  # Unix-style traversal
+    re.compile(r'\.\.\\'),  # Windows-style traversal
+    re.compile(r'\.\.$'),  # Bare .. at end
+    re.compile(r'%2e%2e[/\\]', re.IGNORECASE),  # URL-encoded traversal
+    re.compile(r'\.\.//'),  # Double-slash traversal
+    re.compile(r'^/~'),  # Attempted home dir escape via tilde
+]
+
+
+def _contains_path_traversal(path: str) -> bool:
+    """Check if path contains directory traversal attempts.
+    
+    SECURITY FIX (V-002): Detects path traversal patterns like:
+    - ../../../etc/passwd
+    - ..\\..\\windows\\system32
+    - %2e%2e%2f (URL-encoded)
+    - ~/../../../etc/shadow (via tilde expansion)
+    """
+    if not path:
+        return False
+    
+    # Check against all traversal patterns
+    for pattern in _PATH_TRAVERSAL_PATTERNS:
+        if pattern.search(path):
+            return True
+    
+    # Check for null byte injection (CWE-73)
+    if '\x00' in path:
+        return True
+    
+    # Check for overly long paths that might bypass filters
+    if len(path) > 4096:
+        return True
+    
+    return False
+
+
+def _validate_safe_path(path: str, operation: str = "access") -> tuple[bool, str]:
+    """Validate that a path is safe for file operations.
+    
+    Returns:
+        (is_safe, error_message) tuple. If is_safe is False, error_message
+        contains the reason.
+    
+    SECURITY FIX (V-002): Centralized path validation to prevent:
+    - Path traversal attacks (../../../etc/shadow)
+    - Home directory expansion attacks (~user/malicious)
+    - Null byte injection
+    """
+    if not path:
+        return False, "Path cannot be empty"
+    
+    # Check for path traversal attempts
+    if _contains_path_traversal(path):
+        return False, (
+            f"Path traversal detected in '{path}'. "
+            f"Access to paths outside the working directory is not permitted."
+        )
+    
+    # Validate path characters (prevent shell injection via special chars)
+    # Allow alphanumeric, spaces, common path chars, but block control chars
+    invalid_chars = set()
+    for char in path:
+        if ord(char) < 32 and char not in '\t\n':  # Control chars except tab/newline
+            invalid_chars.add(repr(char))
+    if invalid_chars:
+        return False, (
+            f"Path contains invalid control characters: {', '.join(invalid_chars)}"
+        )
+    
+    return True, ""
+
+
 # =============================================================================
 # Result Data Classes
 # =============================================================================
@@ -475,6 +550,11 @@ class ShellFileOperations(FileOperations):
        Returns:
            ReadResult with content, metadata, or error info
        """
+        # SECURITY FIX (V-002): Validate path before any operations
+        is_safe, error_msg = _validate_safe_path(path, "read")
+        if not is_safe:
+            return ReadResult(error=f"Security violation: {error_msg}")
+        
        # Expand ~ and other shell paths
        path = self._expand_path(path)
        
@@ -663,6 +743,11 @@ class ShellFileOperations(FileOperations):
        Returns:
            WriteResult with bytes written or error
        """
+        # SECURITY FIX (V-002): Validate path before any operations
+        is_safe, error_msg = _validate_safe_path(path, "write")
+        if not is_safe:
+            return WriteResult(error=f"Security violation: {error_msg}")
+        
        # Expand ~ and other shell paths
        path = self._expand_path(path)
Author	SHA1	Message	Date
Allegro	08abea4905	security: fix secret leakage via whitelist-only env vars (CVSS 9.3) Some checks failed Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s Details Tests / test (pull_request) Failing after 30s Details Docker Build and Publish / build-and-push (pull_request) Failing after 55s Details Replace blacklist approach with explicit whitelist for child process environment variables to prevent secret exfiltration via creative naming. Changes: - tools/code_execution_tool.py: Implement _ALLOWED_ENV_VARS frozenset - Only pass explicitly listed env vars to sandboxed child processes - Drop all other variables silently to prevent credential theft Fixes CWE-526: Exposure of Sensitive Information to an Unauthorized Actor CVSS: 9.3 (Critical) Refs: V-003 in SECURITY_AUDIT_REPORT.md	2026-03-30 23:42:43 +00:00
Allegro	65d9fc2b59	Merge path traversal security fix Some checks failed Tests / test (push) Failing after 19s Details Nix / nix (ubuntu-latest) (push) Failing after 4s Details Docker Build and Publish / build-and-push (push) Failing after 29s Details Nix / nix (macos-latest) (push) Has been cancelled Details	2026-03-30 23:40:52 +00:00
Allegro	510367bfc2	Merge pull request 'feat: Gen AI Evolution Phases 1-3 — Self-Correction, World Modeling, and Domain Distillation' (#43 ) from feat/gen-ai-evolution-phases-1-3 into main Some checks failed Nix / nix (ubuntu-latest) (push) Failing after 4s Details Tests / test (push) Failing after 15s Details Docker Build and Publish / build-and-push (push) Failing after 25s Details Nix / nix (macos-latest) (push) Has been cancelled Details	2026-03-30 23:40:00 +00:00
Allegro	33bf5967ec	Merge pull request '[SECURITY] Fix Command Injection Vulnerabilities (CVSS 9.8)' (#53 ) from security/fix-command-injection into main Some checks failed Tests / test (push) Failing after 15s Details Nix / nix (ubuntu-latest) (push) Failing after 4s Details Docker Build and Publish / build-and-push (push) Failing after 25s Details Nix / nix (macos-latest) (push) Has been cancelled Details	2026-03-30 23:39:24 +00:00
Allegro	78f0a5c01b	security: fix path traversal vulnerability (CVSS 9.1) Add comprehensive path traversal detection and validation to prevent unauthorized file access outside working directories. Changes: - tools/file_operations.py: Add _validate_safe_path(), _contains_path_traversal() - Validate all paths in read_file(), write_file() before processing - Detect patterns: ../, ..\, URL-encoded, null bytes, control chars Fixes CWE-22: Path Traversal vulnerability where malicious paths like ../../../etc/shadow could access sensitive files. CVSS: 9.1 (Critical) Refs: V-002 in SECURITY_AUDIT_REPORT.md	2026-03-30 23:17:09 +00:00
Google AI Agent	e6599b8651	feat: implement Phase 3 - Domain Distiller Some checks failed Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 45s Details Tests / test (pull_request) Failing after 27s Details Docker Build and Publish / build-and-push (pull_request) Failing after 1m11s Details	2026-03-30 22:59:57 +00:00
Google AI Agent	679d2cd81d	feat: implement Phase 2 - World Modeler	2026-03-30 22:59:56 +00:00
Google AI Agent	e7b2fe8196	feat: implement Phase 1 - Self-Correction Generator	2026-03-30 22:59:55 +00:00