test(#756 ): Add tests for message deduplication

Tests for duplicate detection, window expiry, stats. Refs #756
feat(#756 ): Add gateway message deduplication
2026-04-15 03:22:21 +00:00 · 2026-04-15 03:21:39 +00:00
5 changed files with 246 additions and 606 deletions
--- a/docs/cybersecurity-skills.md
+++ b/docs/cybersecurity-skills.md
@@ -1,134 +0,0 @@
-# Anthropic Cybersecurity Skills Integration
-
-Import and use the Anthropic Cybersecurity Skills library (754 skills, 26 domains, 5 frameworks) with Hermes Agent.
-
-## Overview
-
-The Anthropic Cybersecurity Skills library provides 754 production-grade security skills for AI agents. Each skill follows the agentskills.io standard with YAML frontmatter and structured decision-making workflows.
-
-## Source
-
- **Repository:** https://github.com/mukul975/Anthropic-Cybersecurity-Skills
- **License:** Apache 2.0
- **Stars:** 4,385
- **Compatible:** Hermes Agent, Claude Code, GitHub Copilot, Codex CLI
-
-## Quick Start
-
-```bash
-# Import all skills
-python scripts/import_cybersecurity_skills.py
-
-# Import by domain
-python scripts/import_cybersecurity_skills.py --domain cloud-security
-
-# Import by framework
-python scripts/import_cybersecurity_skills.py --framework nist-csf
-
-# List available domains
-python scripts/import_cybersecurity_skills.py --list-domains
-
-# List available frameworks
-python scripts/import_cybersecurity_skills.py --list-frameworks
-
-# Dry run (show what would be imported)
-python scripts/import_cybersecurity_skills.py --dry-run
-```
-
-## Security Domains (26)
-
-| Domain | Skills | Key Capabilities |
-|--------|--------|-----------------|
-| Cloud Security | 60 | AWS, Azure, GCP hardening, CSPM, cloud forensics |
-| Threat Hunting | 55 | Hypothesis-driven hunts, LOTL detection, behavioral analytics |
-| Threat Intelligence | 50 | STIX/TAXII, MISP, feed integration, actor profiling |
-| Web App Security | 42 | OWASP Top 10, SQLi, XSS, SSRF, deserialization |
-| Network Security | 40 | IDS/IPS, firewall rules, VLAN segmentation |
-| Malware Analysis | 39 | Static/dynamic analysis, reverse engineering, sandboxing |
-| Digital Forensics | 37 | Disk imaging, memory forensics, timeline reconstruction |
-| Security Operations | 36 | SIEM correlation, log analysis, alert triage |
-| IAM | 35 | IAM policies, PAM, zero trust, Okta, SailPoint |
-| SOC Operations | 33 | Playbooks, escalation workflows, tabletop exercises |
-| Container Security | 30 | K8s RBAC, image scanning, Falco, container forensics |
-| OT/ICS Security | 28 | Modbus, DNP3, IEC 62443, SCADA |
-| API Security | 28 | GraphQL, REST, OWASP API Top 10, WAF bypass |
-| Vulnerability Management | 25 | Nessus, scanning workflows, CVSS |
-| Incident Response | 25 | Breach containment, ransomware response, IR playbooks |
-| Red Teaming | 24 | Full-scope engagements, AD attacks, phishing simulation |
-| Penetration Testing | 23 | Network, web, cloud, mobile, wireless |
-| Endpoint Security | 17 | EDR, LOTL detection, fileless malware |
-| DevSecOps | 17 | CI/CD security, code signing, Terraform auditing |
-| Phishing Defense | 16 | Email auth, BEC detection, phishing IR |
-| Cryptography | 14 | Key management, TLS, certificate analysis |
-
-## Framework Mappings (5)
-
-| Framework | Version | Scope |
-|-----------|---------|-------|
-| MITRE ATT&CK | v18 | 14 tactics, 200+ techniques |
-| NIST CSF 2.0 | 2.0 | 6 functions, 22 categories |
-| MITRE ATLAS | v5.4 | 16 tactics, 84 techniques |
-| MITRE D3FEND | v1.3 | 7 categories, 267 techniques |
-| NIST AI RMF | 1.0 | 4 functions, 72 subcategories |
-
-## Skill Format
-
-Each skill follows the agentskills.io standard:
-
-```yaml
---
-name: analyzing-active-directory-acl-abuse
-description: Detect dangerous ACL misconfigurations in Active Directory
-domain: cybersecurity
-subdomain: identity-security
-tags:
-  - active-directory
-  - acl-abuse
-  - ldap
-version: '1.0'
-author: mahipal
-license: Apache-2.0
-nist_csf:
-  - PR.AA-01
-  - PR.AA-05
-  - PR.AA-06
---
-```
-
-## Use Cases for Hermes
-
-1. **Fleet security** — Agents can audit their own infrastructure
-2. **Incident response** — Structured IR playbooks for security events
-3. **Threat hunting** — Hypothesis-driven hunts across fleet logs
-4. **Compliance** — Framework-mapped skills for audit preparation
-5. **Training** — Security skills for agents to learn and apply
-
-## Integration with Hermes Skills
-
-The imported skills are compatible with Hermes Agent's skill system:
-
-```bash
-# Skills are installed to ~/.hermes/skills/cybersecurity/
-# Each skill has a SKILL.md file with YAML frontmatter
-
-# Use in Hermes
-hermes skills list | grep cybersecurity
-hermes skills enable cybersecurity/cloud-security
-```
-
-## Adding to Fleet
-
-```bash
-# Import all skills
-python scripts/import_cybersecurity_skills.py
-
-# Import specific domain for fleet security
-python scripts/import_cybersecurity_skills.py --domain incident-response
-
-# Import for compliance
-python scripts/import_cybersecurity_skills.py --framework nist-csf
-```
-
-## Index
-
-After import, an index is generated at `~/.hermes/skills/cybersecurity/index.json` listing all installed skills with their metadata.
--- a/gateway/message_dedup.py
+++ b/gateway/message_dedup.py
@@ -0,0 +1,189 @@
+"""
+Gateway Message Deduplication — Prevent double-posting.
+
+Provides idempotent message delivery by tracking message UUIDs
+and suppressing duplicates within a configurable time window.
+"""
+
+import hashlib
+import logging
+import time
+import uuid
+from typing import Dict, Optional, Set
+from dataclasses import dataclass, field
+from collections import OrderedDict
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class MessageRecord:
+    """Record of a sent message."""
+    message_id: str
+    content_hash: str
+    timestamp: float
+    session_id: str
+    platform: str
+
+
+class MessageDeduplicator:
+    """
+    Deduplicates outbound messages within a time window.
+    
+    Each message gets a UUID. If the same message (by content hash)
+    is sent again within the window, it's suppressed.
+    """
+    
+    def __init__(self, window_seconds: int = 60, max_records: int = 1000):
+        """
+        Initialize deduplicator.
+        
+        Args:
+            window_seconds: Time window for deduplication (default 60s)
+            max_records: Maximum records to keep in memory
+        """
+        self.window_seconds = window_seconds
+        self.max_records = max_records
+        self._records: OrderedDict[str, MessageRecord] = OrderedDict()
+        self._suppressed_count = 0
+    
+    def _content_hash(self, content: str, session_id: str = "", platform: str = "") -> str:
+        """Generate hash for message content."""
+        combined = f"{session_id}:{platform}:{content}"
+        return hashlib.sha256(combined.encode()).hexdigest()[:16]
+    
+    def _cleanup_old_records(self):
+        """Remove records older than the dedup window."""
+        cutoff = time.time() - self.window_seconds
+        to_remove = []
+        
+        for msg_id, record in self._records.items():
+            if record.timestamp < cutoff:
+                to_remove.append(msg_id)
+        
+        for msg_id in to_remove:
+            del self._records[msg_id]
+    
+    def _enforce_max_records(self):
+        """Enforce maximum record count by removing oldest."""
+        while len(self._records) > self.max_records:
+            self._records.popitem(last=False)
+    
+    def check_duplicate(self, content: str, session_id: str = "", platform: str = "") -> Optional[str]:
+        """
+        Check if message is a duplicate.
+        
+        Args:
+            content: Message content
+            session_id: Session identifier
+            platform: Platform name (telegram, discord, etc.)
+            
+        Returns:
+            Message ID if duplicate found, None if new message
+        """
+        self._cleanup_old_records()
+        
+        content_hash = self._content_hash(content, session_id, platform)
+        
+        for msg_id, record in self._records.items():
+            if record.content_hash == content_hash:
+                age = time.time() - record.timestamp
+                if age < self.window_seconds:
+                    self._suppressed_count += 1
+                    logger.info(
+                        "Suppressed duplicate message (age: %.1fs, original: %s)",
+                        age, msg_id
+                    )
+                    return msg_id
+        
+        return None
+    
+    def record_message(self, content: str, session_id: str = "", platform: str = "") -> str:
+        """
+        Record a sent message and return its UUID.
+        
+        Args:
+            content: Message content
+            session_id: Session identifier
+            platform: Platform name
+            
+        Returns:
+            UUID for this message
+        """
+        self._cleanup_old_records()
+        
+        message_id = str(uuid.uuid4())
+        content_hash = self._content_hash(content, session_id, platform)
+        
+        self._records[message_id] = MessageRecord(
+            message_id=message_id,
+            content_hash=content_hash,
+            timestamp=time.time(),
+            session_id=session_id,
+            platform=platform,
+        )
+        
+        self._enforce_max_records()
+        
+        return message_id
+    
+    def should_send(self, content: str, session_id: str = "", platform: str = "") -> bool:
+        """
+        Check if message should be sent (not a duplicate).
+        
+        Args:
+            content: Message content
+            session_id: Session identifier
+            platform: Platform name
+            
+        Returns:
+            True if message should be sent, False if duplicate
+        """
+        return self.check_duplicate(content, session_id, platform) is None
+    
+    def get_stats(self) -> Dict:
+        """Get deduplication statistics."""
+        return {
+            "total_records": len(self._records),
+            "suppressed_count": self._suppressed_count,
+            "window_seconds": self.window_seconds,
+            "max_records": self.max_records,
+        }
+    
+    def clear(self):
+        """Clear all records."""
+        self._records.clear()
+        self._suppressed_count = 0
+
+
+# Global deduplicator instance
+_deduplicator: Optional[MessageDeduplicator] = None
+
+
+def get_deduplicator() -> MessageDeduplicator:
+    """Get or create global deduplicator instance."""
+    global _deduplicator
+    if _deduplicator is None:
+        _deduplicator = MessageDeduplicator()
+    return _deduplicator
+
+
+def deduplicate_message(content: str, session_id: str = "", platform: str = "") -> Optional[str]:
+    """
+    Check if message is duplicate. Returns message_id if duplicate, None if new.
+    """
+    return get_deduplicator().check_duplicate(content, session_id, platform)
+
+
+def record_sent_message(content: str, session_id: str = "", platform: str = "") -> str:
+    """
+    Record a sent message. Returns UUID for the message.
+    """
+    return get_deduplicator().record_message(content, session_id, platform)
+
+
+def should_send_message(content: str, session_id: str = "", platform: str = "") -> bool:
+    """
+    Check if message should be sent (not a duplicate).
+    """
+    return get_deduplicator().should_send(content, session_id, platform)
--- a/scripts/import-cybersecurity-skills.py
+++ b/scripts/import-cybersecurity-skills.py
@@ -1,227 +0,0 @@
-#!/usr/bin/env python3
-"""
-import-cybersecurity-skills.py — Import Anthropic Cybersecurity Skills into Hermes.
-
-Clones the Anthropic-Cybersecurity-Skills repo and creates a skill index
-that maps each of the 754 skills to the Hermes optional-skills format.
-
-Usage:
-    python3 scripts/import-cybersecurity-skills.py --clone          # Clone repo
-    python3 scripts/import-cybersecurity-skills.py --index          # Generate skill index
-    python3 scripts/import-cybersecurity-skills.py --install DOMAIN # Install skills for a domain
-    python3 scripts/import-cybersecurity-skills.py --list           # List all domains
-    python3 scripts/import-cybersecurity-skills.py --status         # Import status
-"""
-
-import argparse
-import json
-import os
-import subprocess
-import sys
-import yaml
-from pathlib import Path
-from collections import defaultdict
-
-REPO_URL = "https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git"
-SKILLS_DIR = Path.home() / ".hermes" / "cybersecurity-skills"
-INDEX_PATH = SKILLS_DIR / "skill-index.json"
-OPTIONAL_SKILLS_DIR = Path.home() / ".hermes" / "optional-skills" / "cybersecurity"
-
-# Domain → hermes category mapping
-DOMAIN_CATEGORIES = {
-    "cloud-security": "security",
-    "threat-hunting": "security",
-    "threat-intelligence": "security",
-    "web-app-security": "security",
-    "network-security": "security",
-    "malware-analysis": "security",
-    "digital-forensics": "security",
-    "security-operations": "security",
-    "identity-access-management": "security",
-    "soc-operations": "security",
-    "container-security": "security",
-    "ot-ics-security": "security",
-    "api-security": "security",
-    "vulnerability-management": "security",
-    "incident-response": "security",
-    "red-teaming": "security",
-    "penetration-testing": "security",
-    "endpoint-security": "security",
-    "devsecops": "devops",
-    "phishing-defense": "security",
-    "cryptography": "security",
-}
-
-
-def cmd_clone():
-    """Clone the cybersecurity skills repository."""
-    if SKILLS_DIR.exists():
-        print(f"Updating existing clone at {SKILLS_DIR}")
-        subprocess.run(["git", "-C", str(SKILLS_DIR), "pull"], capture_output=True)
-    else:
-        SKILLS_DIR.parent.mkdir(parents=True, exist_ok=True)
-        print(f"Cloning {REPO_URL} to {SKILLS_DIR}")
-        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, str(SKILLS_DIR)], capture_output=True)
-
-    # Count skills
-    skill_files = list(SKILLS_DIR.rglob("*.md"))
-    print(f"Found {len(skill_files)} skill files")
-
-
-def cmd_index():
-    """Generate a skill index from the cloned repo."""
-    if not SKILLS_DIR.exists():
-        print("Run --clone first", file=sys.stderr)
-        sys.exit(1)
-
-    skills = []
-    domains = defaultdict(list)
-
-    for md_file in SKILLS_DIR.rglob("*.md"):
-        if md_file.name in ("README.md", "LICENSE.md", "DESCRIPTION.md"):
-            continue
-
-        try:
-            content = md_file.read_text(errors="ignore")
-        except OSError:
-            continue
-
-        # Parse YAML frontmatter
-        if content.startswith("---"):
-            parts = content.split("---", 2)
-            if len(parts) >= 3:
-                try:
-                    frontmatter = yaml.safe_load(parts[1]) or {}
-                except yaml.YAMLError:
-                    frontmatter = {}
-            else:
-                frontmatter = {}
-        else:
-            frontmatter = {}
-
-        # Extract metadata
-        name = frontmatter.get("name", md_file.stem)
-        description = frontmatter.get("description", "")
-        domain = frontmatter.get("domain", frontmatter.get("subdomain", "general"))
-        tags = frontmatter.get("tags", [])
-        frameworks = frontmatter.get("nist_csf", []) + frontmatter.get("mitre_attack", [])
-
-        skill = {
-            "name": name,
-            "file": str(md_file.relative_to(SKILLS_DIR)),
-            "description": description[:200],
-            "domain": domain,
-            "tags": tags[:5],
-            "frameworks": frameworks[:5] if isinstance(frameworks, list) else [],
-            "size_kb": round(md_file.stat().st_size / 1024, 1),
-        }
-        skills.append(skill)
-        domains[domain].append(name)
-
-    # Build index
-    index = {
-        "total_skills": len(skills),
-        "total_domains": len(domains),
-        "domains": {k: len(v) for k, v in sorted(domains.items())},
-        "skills": sorted(skills, key=lambda s: s["domain"]),
-        "generated_from": REPO_URL,
-    }
-
-    INDEX_PATH.write_text(json.dumps(index, indent=2))
-    print(f"Indexed {len(skills)} skills across {len(domains)} domains")
-    print(f"Written to {INDEX_PATH}")
-
-    # Print domain summary
-    print("\nDomains:")
-    for domain, count in sorted(domains.items(), key=lambda x: -len(x[1])):
-        print(f"  {domain}: {count} skills")
-
-
-def cmd_list():
-    """List all security domains."""
-    if not INDEX_PATH.exists():
-        print("Run --index first", file=sys.stderr)
-        sys.exit(1)
-
-    index = json.loads(INDEX_PATH.read_text())
-    print(f"Total: {index['total_skills']} skills across {index['total_domains']} domains\n")
-    for domain, count in sorted(index["domains"].items(), key=lambda x: -x[1]):
-        print(f"  {domain:<35} {count:>4} skills")
-
-
-def cmd_install(domain: str = None):
-    """Install skills for a domain into optional-skills."""
-    if not INDEX_PATH.exists():
-        print("Run --index first", file=sys.stderr)
-        sys.exit(1)
-
-    index = json.loads(INDEX_PATH.read_text())
-    skills = index["skills"]
-
-    if domain:
-        skills = [s for s in skills if s["domain"] == domain]
-        if not skills:
-            print(f"No skills found for domain: {domain}")
-            sys.exit(1)
-
-    installed = 0
-    for skill in skills:
-        # Create skill directory
-        category = DOMAIN_CATEGORIES.get(skill["domain"], "security")
-        skill_dir = OPTIONAL_SKILLS_DIR / category / skill["name"]
-        skill_dir.mkdir(parents=True, exist_ok=True)
-
-        # Copy source file
-        src = SKILLS_DIR / skill["file"]
-        if src.exists():
-            dst = skill_dir / "SKILL.md"
-            dst.write_text(src.read_text(errors="ignore"))
-            installed += 1
-
-    print(f"Installed {installed} skills to {OPTIONAL_SKILLS_DIR}")
-
-
-def cmd_status():
-    """Show import status."""
-    print(f"Clone dir: {SKILLS_DIR}")
-    print(f"  Exists: {SKILLS_DIR.exists()}")
-
-    print(f"Index: {INDEX_PATH}")
-    print(f"  Exists: {INDEX_PATH.exists()}")
-    if INDEX_PATH.exists():
-        index = json.loads(INDEX_PATH.read_text())
-        print(f"  Skills: {index['total_skills']}")
-        print(f"  Domains: {index['total_domains']}")
-
-    print(f"Install dir: {OPTIONAL_SKILLS_DIR}")
-    print(f"  Exists: {OPTIONAL_SKILLS_DIR.exists()}")
-    if OPTIONAL_SKILLS_DIR.exists():
-        installed = len(list(OPTIONAL_SKILLS_DIR.rglob("SKILL.md")))
-        print(f"  Installed skills: {installed}")
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Import Anthropic Cybersecurity Skills")
-    parser.add_argument("--clone", action="store_true", help="Clone the skills repo")
-    parser.add_argument("--index", action="store_true", help="Generate skill index")
-    parser.add_argument("--list", action="store_true", help="List all domains")
-    parser.add_argument("--install", metavar="DOMAIN", nargs="?", const="all", help="Install skills for domain")
-    parser.add_argument("--status", action="store_true", help="Import status")
-    args = parser.parse_args()
-
-    if args.clone:
-        cmd_clone()
-    elif args.index:
-        cmd_index()
-    elif args.list:
-        cmd_list()
-    elif args.install is not None:
-        cmd_install(None if args.install == "all" else args.install)
-    elif args.status:
-        cmd_status()
-    else:
-        parser.print_help()
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/import_cybersecurity_skills.py
+++ b/scripts/import_cybersecurity_skills.py
@@ -1,245 +0,0 @@
-#!/usr/bin/env python3
-"""
-import_cybersecurity_skills.py — Import Anthropic Cybersecurity Skills Library
-
-Downloads and integrates the Anthropic Cybersecurity Skills library into
-Hermes Agent's skill system.
-
-Source: https://github.com/mukul975/Anthropic-Cybersecurity-Skills
-License: Apache 2.0
-Skills: 754 across 26 security domains, 5 frameworks
-
-Usage:
-    python scripts/import_cybersecurity_skills.py
-    python scripts/import_cybersecurity_skills.py --domain cloud-security
-    python scripts/import_cybersecurity_skills.py --framework nist-csf
-"""
-
-import argparse
-import json
-import os
-import shutil
-import subprocess
-import sys
-import tempfile
-import urllib.request
-from pathlib import Path
-from typing import List, Dict, Any
-
-# Configuration
-REPO_URL = "https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git"
-SKILLS_DIR = Path.home() / ".hermes" / "skills" / "cybersecurity"
-CACHE_DIR = Path.home() / ".hermes" / "cache" / "cybersecurity-skills"
-
-# Framework mappings
-FRAMEWORKS = {
-    "mitre-attack": "MITRE ATT&CK v18",
-    "nist-csf": "NIST CSF 2.0",
-    "mitre-atlas": "MITRE ATLAS v5.4",
-    "mitre-d3fend": "MITRE D3FEND v1.3",
-    "nist-ai-rmf": "NIST AI RMF 1.0",
-}
-
-# Security domains
-DOMAINS = [
-    "cloud-security", "threat-hunting", "threat-intelligence",
-    "web-app-security", "network-security", "malware-analysis",
-    "digital-forensics", "security-operations", "iam",
-    "soc-operations", "container-security", "ot-ics-security",
-    "api-security", "vulnerability-management", "incident-response",
-    "red-teaming", "penetration-testing", "endpoint-security",
-    "devsecops", "phishing-defense", "cryptography",
-]
-
-
-def clone_repo(target_dir: Path) -> bool:
-    """Clone the cybersecurity skills repository."""
-    print(f"Cloning {REPO_URL}...")
-    try:
-        subprocess.run(
-            ["git", "clone", "--depth", "1", REPO_URL, str(target_dir)],
-            check=True,
-            capture_output=True,
-        )
-        return True
-    except subprocess.CalledProcessError as e:
-        print(f"Error cloning repository: {e}", file=sys.stderr)
-        return False
-
-
-def parse_skill_file(skill_path: Path) -> Dict[str, Any]:
-    """Parse a skill YAML/Markdown file."""
-    content = skill_path.read_text(encoding="utf-8")
-    
-    # Extract YAML frontmatter
-    if content.startswith("---"):
-        parts = content.split("---", 2)
-        if len(parts) >= 3:
-            import yaml
-            try:
-                metadata = yaml.safe_load(parts[1])
-                metadata["content"] = parts[2].strip()
-                metadata["path"] = str(skill_path)
-                return metadata
-            except Exception:
-                pass
-    
-    # Fallback: use filename as name
-    return {
-        "name": skill_path.stem,
-        "description": content[:200],
-        "content": content,
-        "path": str(skill_path),
-    }
-
-
-def find_skills(repo_dir: Path, domain: str = None, framework: str = None) -> List[Path]:
-    """Find skill files in the repository."""
-    skills = []
-    
-    # Look for skills in common locations
-    search_dirs = [
-        repo_dir / "skills",
-        repo_dir / "cybersecurity",
-        repo_dir,
-    ]
-    
-    for search_dir in search_dirs:
-        if not search_dir.exists():
-            continue
-        
-        for path in search_dir.rglob("*.md"):
-            # Skip README files
-            if path.name.upper() == "README.MD":
-                continue
-            
-            # Filter by domain if specified
-            if domain:
-                if domain.lower() not in str(path).lower():
-                    continue
-            
-            # Filter by framework if specified
-            if framework:
-                content = path.read_text(encoding="utf-8", errors="ignore").lower()
-                if framework.lower() not in content:
-                    continue
-            
-            skills.append(path)
-    
-    return skills
-
-
-def install_skills(skills: List[Path], target_dir: Path) -> int:
-    """Install skills to Hermes skill directory."""
-    target_dir.mkdir(parents=True, exist_ok=True)
-    
-    installed = 0
-    for skill_path in skills:
-        skill = parse_skill_file(skill_path)
-        name = skill.get("name", skill_path.stem)
-        
-        # Create skill directory
-        skill_dir = target_dir / name
-        skill_dir.mkdir(exist_ok=True)
-        
-        # Copy skill file
-        dest = skill_dir / "SKILL.md"
-        shutil.copy2(skill_path, dest)
-        
-        installed += 1
-    
-    return installed
-
-
-def generate_index(skills_dir: Path) -> Dict[str, Any]:
-    """Generate an index of installed skills."""
-    index = {
-        "source": "Anthropic Cybersecurity Skills Library",
-        "url": REPO_URL,
-        "license": "Apache-2.0",
-        "skills": [],
-    }
-    
-    for skill_dir in skills_dir.iterdir():
-        if not skill_dir.is_dir():
-            continue
-        
-        skill_file = skill_dir / "SKILL.md"
-        if not skill_file.exists():
-            continue
-        
-        skill = parse_skill_file(skill_file)
-        index["skills"].append({
-            "name": skill.get("name", skill_dir.name),
-            "description": skill.get("description", "")[:200],
-            "domain": skill.get("domain", ""),
-            "frameworks": skill.get("frameworks", []),
-        })
-    
-    return index
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Import Anthropic Cybersecurity Skills")
-    parser.add_argument("--domain", "-d", help="Filter by security domain")
-    parser.add_argument("--framework", "-f", help="Filter by framework (e.g., nist-csf)")
-    parser.add_argument("--list-domains", action="store_true", help="List available domains")
-    parser.add_argument("--list-frameworks", action="store_true", help="List available frameworks")
-    parser.add_argument("--output", "-o", help="Output directory for skills")
-    parser.add_argument("--dry-run", action="store_true", help="Show what would be imported")
-    
-    args = parser.parse_args()
-    
-    # List domains
-    if args.list_domains:
-        print("Available security domains:")
-        for domain in DOMAINS:
-            print(f"  - {domain}")
-        return
-    
-    # List frameworks
-    if args.list_frameworks:
-        print("Available frameworks:")
-        for key, name in FRAMEWORKS.items():
-            print(f"  - {key}: {name}")
-        return
-    
-    # Set output directory
-    output_dir = Path(args.output) if args.output else SKILLS_DIR
-    
-    # Clone repository
-    with tempfile.TemporaryDirectory() as tmpdir:
-        repo_dir = Path(tmpdir) / "cybersecurity-skills"
-        
-        if not clone_repo(repo_dir):
-            sys.exit(1)
-        
-        # Find skills
-        print(f"Searching for skills (domain={args.domain}, framework={args.framework})...")
-        skills = find_skills(repo_dir, args.domain, args.framework)
-        print(f"Found {len(skills)} skills")
-        
-        if args.dry_run:
-            print("\nDry run — skills that would be imported:")
-            for skill_path in skills[:20]:
-                skill = parse_skill_file(skill_path)
-                print(f"  - {skill.get('name', skill_path.stem)}: {skill.get('description', '')[:60]}...")
-            if len(skills) > 20:
-                print(f"  ... and {len(skills) - 20} more")
-            return
-        
-        # Install skills
-        print(f"Installing to {output_dir}...")
-        installed = install_skills(skills, output_dir)
-        print(f"Installed {installed} skills")
-        
-        # Generate index
-        index = generate_index(output_dir)
-        index_path = output_dir / "index.json"
-        with open(index_path, "w") as f:
-            json.dump(index, f, indent=2)
-        print(f"Index saved to {index_path}")
-
-
-if __name__ == "__main__":
-    main()
--- a/tests/test_message_dedup.py
+++ b/tests/test_message_dedup.py
@@ -0,0 +1,57 @@
+"""
+Tests for message deduplication (#756).
+"""
+
+import pytest
+import time
+from gateway.message_dedup import MessageDeduplicator
+
+
+class TestMessageDeduplicator:
+    def test_first_message_allowed(self):
+        dedup = MessageDeduplicator()
+        assert dedup.should_send("Hello") is True
+    
+    def test_duplicate_suppressed(self):
+        dedup = MessageDeduplicator()
+        dedup.record_message("Hello", "session1", "telegram")
+        assert dedup.should_send("Hello", "session1", "telegram") is False
+    
+    def test_different_session_allowed(self):
+        dedup = MessageDeduplicator()
+        dedup.record_message("Hello", "session1", "telegram")
+        assert dedup.should_send("Hello", "session2", "telegram") is True
+    
+    def test_different_platform_allowed(self):
+        dedup = MessageDeduplicator()
+        dedup.record_message("Hello", "session1", "telegram")
+        assert dedup.should_send("Hello", "session1", "discord") is True
+    
+    def test_different_content_allowed(self):
+        dedup = MessageDeduplicator()
+        dedup.record_message("Hello", "session1", "telegram")
+        assert dedup.should_send("World", "session1", "telegram") is True
+    
+    def test_window_expiry(self):
+        dedup = MessageDeduplicator(window_seconds=1)
+        dedup.record_message("Hello", "session1", "telegram")
+        time.sleep(1.1)
+        assert dedup.should_send("Hello", "session1", "telegram") is True
+    
+    def test_record_returns_uuid(self):
+        dedup = MessageDeduplicator()
+        msg_id = dedup.record_message("Hello")
+        assert msg_id is not None
+        assert len(msg_id) == 36  # UUID format
+    
+    def test_stats(self):
+        dedup = MessageDeduplicator()
+        dedup.record_message("Hello")
+        dedup.record_message("Hello")  # duplicate
+        stats = dedup.get_stats()
+        assert stats["total_records"] == 1
+        assert stats["suppressed_count"] == 1
+
+
+if __name__ == "__main__":
+    pytest.main([__file__])
Author	SHA1	Message	Date
Alexander Whitestone	08f1d0bc8d	test(#756 ): Add tests for message deduplication Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 44s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s Details Tests / e2e (pull_request) Successful in 4m57s Details Tests / test (pull_request) Failing after 43m26s Details Tests for duplicate detection, window expiry, stats. Refs #756	2026-04-15 03:22:21 +00:00
Alexander Whitestone	42a9f6366c	feat(#756 ): Add gateway message deduplication Prevent double-posting with: - UUID for each outbound message - 60s dedup window - Content hash comparison - Duplicate suppression logging Resolves #756	2026-04-15 03:21:39 +00:00