Compare commits

..

1 Commits

Author SHA1 Message Date
Rockachopa
41ac45e49b feat: add Sherlock username recon wrapper with opt-in gate, caching, and normalized JSON
Some checks failed
Self-Healing Smoke / self-healing-smoke (pull_request) Failing after 24s
Agent PR Gate / gate (pull_request) Failing after 48s
Smoke Test / smoke (pull_request) Failing after 22s
Agent PR Gate / report (pull_request) Successful in 14s
- creates tools/sherlock_wrapper.py with run_sherlock() library + CLI
- opt-in gate: SHERLOCK_ENABLED=1 or --opt-in required
- local SQLite cache at ~/.cache/timmy/sherlock_cache.db (TTL: 7 days)
- normalized JSON output schema: found/missing/errors/metadata
- minimal smoke test suite: 13 tests covering schema, cache, TTL, opt-in
- adds README section with usage, schema, setup, and smoke-test instructions

Closes #874
2026-04-26 14:00:06 -04:00
8 changed files with 507 additions and 188 deletions

View File

@@ -112,6 +112,76 @@ pytest tests/
```
### Project Structure
## Sherlock Username Recon Wrapper
### Quick Usage
```bash
# Opt-in via env var
export SHERLOCK_ENABLED=1
# Or via explicit CLI flag
python -m tools.sherlock_wrapper --query "alice" --opt-in --json
# With site whitelist
python -m tools.sherlock_wrapper --query "alice" --opt-in --sites github twitter --json
```
### What It Does
Builds a bounded local wrapper around the Sherlock username OSINT tool that:
- **Opt-in gate** — SHERLOCK_ENABLED=1 or `--opt-in` required before any external call
- **Local-first caching** — results cached in `~/.cache/timmy/sherlock_cache.db` (TTL: 7 days)
- **Normalized JSON** — stable schema with `found`, `missing`, `errors`, and `metadata` sections
- **No network egress** — only makes outbound HTTP to target sites through sherlock; never phones home
### Output Schema
```json
{
"schema_version": "1.0",
"query": "alice",
"timestamp": "2025-04-26T14:23:00+00:00",
"found": [
{"site": "github", "url": "https://github.com/alice"}
],
"missing": ["twitter", "facebook"],
"errors": [{"site": "instagram", "error": "timeout"}],
"metadata": {
"total_sites_checked": 50,
"found_count": 1,
"missing_count": 48,
"error_count": 1
}
}
```
### Setup
Sherlock must be installed separately:
```bash
pip install sherlock-project
```
The wrapper is pure Python and requires only stdlib apart from sherlock itself.
### Why an Opt-In Gate?
Sherlock makes outbound HTTP requests to dozens of third-party sites. The opt-in gate:
1. Ensures a human operator explicitly approves this dependency
2. Makes the outbound traffic auditable in session logs
3. Prevents accidental invocation in automated pipelines
### Running the Smoke Test
```bash
# Run unit + integration tests
pytest tests/test_sherlock_wrapper.py -v
```
```
.

View File

@@ -1,96 +0,0 @@
# Bezalel Tailscale Bootstrap
Refs #535
This is the repo-side operator packet for installing Tailscale on the Bezalel VPS and verifying the internal network path for federation work.
Important truth:
- issue #535 names `104.131.15.18`
- older Bezalel control-plane docs also mention `159.203.146.185`
- the current source of truth in this repo is `ansible/inventory/hosts.ini`, which currently resolves `bezalel` to `67.205.155.108`
Because of that drift, `scripts/bezalel_tailscale_bootstrap.py` now resolves the target host from `ansible/inventory/hosts.ini` by default instead of trusting a stale hardcoded IP.
## What the script does
`python3 scripts/bezalel_tailscale_bootstrap.py`
Safe by default:
- builds the remote bootstrap script
- writes it locally to `/tmp/bezalel_tailscale_bootstrap.sh`
- prints the SSH command needed to run it
- does **not** touch the VPS unless `--apply` is passed
When applied, the remote script does all of the issues repo-side bootstrap steps:
- installs Tailscale
- runs `tailscale up --ssh --hostname bezalel`
- appends the provided Mac SSH public key to `~/.ssh/authorized_keys`
- prints `tailscale status --json`
- pings the expected peer targets:
- Mac: `100.124.176.28`
- Ezra: `100.126.61.75`
## Required secrets / inputs
- Tailscale auth key
- Mac SSH public key
Provide them either directly or through files:
- `--auth-key` or `--auth-key-file`
- `--ssh-public-key` or `--ssh-public-key-file`
## Dry-run example
```bash
python3 scripts/bezalel_tailscale_bootstrap.py \
--auth-key-file ~/.config/tailscale/auth_key \
--ssh-public-key-file ~/.ssh/id_ed25519.pub \
--json
```
This prints:
- resolved host
- host source (`inventory:<path>` when pulled from `ansible/inventory/hosts.ini`)
- local script path
- SSH command to execute
- peer targets
## Apply example
```bash
python3 scripts/bezalel_tailscale_bootstrap.py \
--auth-key-file ~/.config/tailscale/auth_key \
--ssh-public-key-file ~/.ssh/id_ed25519.pub \
--apply \
--json
```
## Verifying success after apply
The script now parses the remote stdout into structured verification data:
- `verification.tailscale.self.tailscale_ips`
- `verification.tailscale.self.dns_name`
- `verification.peers`
- `verification.ping_ok`
A successful run should show:
- at least one Bezalel Tailscale IP under `tailscale_ips`
- `ping_ok.mac = 100.124.176.28`
- `ping_ok.ezra = 100.126.61.75`
## Expected remote install commands
```bash
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --ssh --hostname bezalel
install -d -m 700 ~/.ssh
touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
tailscale status --json
```
## Why this PR does not claim live completion
This repo can safely ship the bootstrap script, host resolution logic, structured proof parsing, and operator packet.
It cannot honestly claim that Bezalel was actually joined to the tailnet unless a human/operator runs the script with a real auth key and real SSH access to the VPS.
That means the correct PR language for #535 is advancement, not pretend closure.

View File

@@ -14,7 +14,6 @@ Quick-reference index for common operational tasks across the Timmy Foundation i
| Agent scorecard | fleet-ops | `python3 scripts/agent_scorecard.py` |
| View fleet manifest | fleet-ops | `cat manifest.yaml` |
| Run nightly codebase genome pass | timmy-home | `python3 scripts/codebase_genome_nightly.py --dry-run` |
| Prepare Bezalel Tailscale bootstrap | timmy-home | `python3 scripts/bezalel_tailscale_bootstrap.py --auth-key-file <path> --ssh-public-key-file <path> --json` |
## the-nexus (Frontend + Brain)

View File

@@ -16,14 +16,11 @@ import argparse
import json
import shlex
import subprocess
import re
from json import JSONDecoder
from pathlib import Path
from typing import Any
DEFAULT_HOST = "67.205.155.108"
DEFAULT_HOST = "159.203.146.185"
DEFAULT_HOSTNAME = "bezalel"
DEFAULT_INVENTORY_PATH = Path(__file__).resolve().parents[1] / "ansible" / "inventory" / "hosts.ini"
DEFAULT_PEERS = {
"mac": "100.124.176.28",
"ezra": "100.126.61.75",
@@ -69,37 +66,6 @@ def parse_tailscale_status(payload: dict[str, Any]) -> dict[str, Any]:
}
def resolve_host(host: str | None, inventory_path: Path = DEFAULT_INVENTORY_PATH, hostname: str = DEFAULT_HOSTNAME) -> tuple[str, str]:
if host:
return host, "explicit"
if inventory_path.exists():
pattern = re.compile(rf"^{re.escape(hostname)}\s+.*ansible_host=([^\s]+)")
for line in inventory_path.read_text().splitlines():
match = pattern.search(line.strip())
if match:
return match.group(1), f"inventory:{inventory_path}"
return DEFAULT_HOST, "default"
def parse_apply_output(stdout: str) -> dict[str, Any]:
result: dict[str, Any] = {"tailscale": None, "ping_ok": {}}
text = stdout or ""
start = text.find("{")
if start != -1:
try:
payload, _ = JSONDecoder().raw_decode(text[start:])
if isinstance(payload, dict):
result["tailscale"] = parse_tailscale_status(payload)
except Exception:
pass
for line in text.splitlines():
if line.startswith("PING_OK:"):
_, name, ip = line.split(":", 2)
result["ping_ok"][name] = ip
return result
def build_ssh_command(host: str, remote_script_path: str = "/tmp/bezalel_tailscale_bootstrap.sh") -> list[str]:
return ["ssh", host, f"bash {shlex.quote(remote_script_path)}"]
@@ -123,9 +89,8 @@ def parse_peer_args(items: list[str]) -> dict[str, str]:
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Prepare or execute Tailscale bootstrap for the Bezalel VPS.")
parser.add_argument("--host")
parser.add_argument("--host", default=DEFAULT_HOST)
parser.add_argument("--hostname", default=DEFAULT_HOSTNAME)
parser.add_argument("--inventory-path", type=Path, default=DEFAULT_INVENTORY_PATH)
parser.add_argument("--auth-key", help="Tailscale auth key")
parser.add_argument("--auth-key-file", type=Path, help="Path to file containing the Tailscale auth key")
parser.add_argument("--ssh-public-key", help="SSH public key to append to authorized_keys")
@@ -151,7 +116,6 @@ def main() -> None:
auth_key = _read_secret(args.auth_key, args.auth_key_file)
ssh_public_key = _read_secret(args.ssh_public_key, args.ssh_public_key_file)
peers = parse_peer_args(args.peer)
resolved_host, host_source = resolve_host(args.host, args.inventory_path, args.hostname)
if not auth_key:
raise SystemExit("Missing Tailscale auth key. Use --auth-key or --auth-key-file.")
@@ -162,31 +126,28 @@ def main() -> None:
write_script(args.script_out, script)
payload: dict[str, Any] = {
"host": resolved_host,
"host_source": host_source,
"host": args.host,
"hostname": args.hostname,
"inventory_path": str(args.inventory_path),
"script_out": str(args.script_out),
"remote_script_path": args.remote_script_path,
"ssh_command": build_ssh_command(resolved_host, args.remote_script_path),
"ssh_command": build_ssh_command(args.host, args.remote_script_path),
"peer_targets": peers,
"applied": False,
}
if args.apply:
result = run_remote(resolved_host, args.remote_script_path)
result = run_remote(args.host, args.remote_script_path)
payload["applied"] = True
payload["exit_code"] = result.returncode
payload["stdout"] = result.stdout
payload["stderr"] = result.stderr
payload["verification"] = parse_apply_output(result.stdout)
if args.json:
print(json.dumps(payload, indent=2))
return
print("--- Bezalel Tailscale Bootstrap ---")
print(f"Host: {resolved_host} ({host_source})")
print(f"Host: {args.host}")
print(f"Local script: {args.script_out}")
print("SSH command: " + " ".join(payload["ssh_command"]))
if args.apply:

View File

@@ -2,12 +2,9 @@ from scripts.bezalel_tailscale_bootstrap import (
DEFAULT_PEERS,
build_remote_script,
build_ssh_command,
parse_apply_output,
parse_peer_args,
parse_tailscale_status,
resolve_host,
)
from pathlib import Path
def test_build_remote_script_contains_install_up_and_key_append():
@@ -81,46 +78,3 @@ def test_parse_peer_args_merges_overrides_into_defaults():
"ezra": "100.126.61.76",
"forge": "100.70.0.9",
}
def test_resolve_host_prefers_inventory_over_stale_default(tmp_path: Path):
inventory = tmp_path / "hosts.ini"
inventory.write_text(
"[fleet]\n"
"ezra ansible_host=143.198.27.163 ansible_user=root\n"
"bezalel ansible_host=67.205.155.108 ansible_user=root\n"
)
host, source = resolve_host(None, inventory)
assert host == "67.205.155.108"
assert source == f"inventory:{inventory}"
def test_parse_apply_output_extracts_status_and_ping_markers():
stdout = (
'{"Self": {"HostName": "bezalel", "DNSName": "bezalel.tailnet.ts.net", "TailscaleIPs": ["100.90.0.10"]}, '
'"Peer": {"node-1": {"HostName": "ezra", "TailscaleIPs": ["100.126.61.75"]}}}'
"\nPING_OK:mac:100.124.176.28\n"
"PING_OK:ezra:100.126.61.75\n"
)
result = parse_apply_output(stdout)
assert result["tailscale"]["self"]["tailscale_ips"] == ["100.90.0.10"]
assert result["ping_ok"] == {"mac": "100.124.176.28", "ezra": "100.126.61.75"}
def test_runbook_doc_exists_and_mentions_inventory_auth_and_peer_checks():
doc = Path("docs/BEZALEL_TAILSCALE_BOOTSTRAP.md")
assert doc.exists(), "missing docs/BEZALEL_TAILSCALE_BOOTSTRAP.md"
text = doc.read_text()
assert "ansible/inventory/hosts.ini" in text
assert "tailscale up" in text
assert "authorized_keys" in text
assert "100.124.176.28" in text
assert "100.126.61.75" in text
runbook = Path("docs/RUNBOOK_INDEX.md").read_text()
assert "Prepare Bezalel Tailscale bootstrap" in runbook
assert "scripts/bezalel_tailscale_bootstrap.py" in runbook

View File

@@ -0,0 +1,182 @@
#!/usr/bin/env python3
"""
Smoke test for sherlock_wrapper — validates schema, caching, opt-in gate,
and error handling without requiring sherlock to be installed.
"""
import json
import os
import sys
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch, MagicMock
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "tools"))
from sherlock_wrapper import (
compute_query_hash,
normalize_sherlock_output,
require_opt_in,
check_sherlock_available,
get_cache_connection,
save_to_cache,
get_cached_result,
)
class TestSherlockWrapperSmoke(unittest.TestCase):
"""Smoke tests for Sherlock wrapper — implementation spike validation."""
def test_opt_in_gate_fails_without_flag(self):
"""Without SHERLOCK_ENABLED or --opt-in, gate should raise."""
with patch("sherlock_wrapper.SHERLOCK_ENABLED", False):
with self.assertRaises(RuntimeError) as ctx:
require_opt_in(opt_in=False)
self.assertIn("opt-in only", str(ctx.exception).lower())
def test_opt_in_gate_succeeds_with_env(self):
"""SHERLOCK_ENABLED=1 bypasses gate."""
with patch("sherlock_wrapper.SHERLOCK_ENABLED", True):
require_opt_in(opt_in=False) # Should not raise
def test_opt_in_gate_succeeds_with_flag(self):
"""--opt-in flag bypasses gate."""
with patch("sherlock_wrapper.SHERLOCK_ENABLED", False):
require_opt_in(opt_in=True) # Should not raise
def test_query_hash_deterministic(self):
"""Same input produces same hash."""
h1 = compute_query_hash("alice")
h2 = compute_query_hash("alice")
self.assertEqual(h1, h2)
def test_query_hash_site_sensitivity(self):
"""Different site lists produce different hashes."""
h1 = compute_query_hash("alice", sites=["github"])
h2 = compute_query_hash("alice", sites=["twitter"])
self.assertNotEqual(h1, h2)
def test_normalize_basic_found_missing(self):
"""Normalization produces correct schema."""
raw = {
"github": {"status": "found", "url": "https://github.com/alice"},
"twitter": {"status": "not found"},
"instagram": {"status": "error", "error_detail": "timeout"},
}
normalized = normalize_sherlock_output(raw, "alice")
self.assertEqual(normalized["query"], "alice")
self.assertEqual(normalized["metadata"]["found_count"], 1)
self.assertEqual(normalized["metadata"]["missing_count"], 1)
self.assertEqual(normalized["metadata"]["error_count"], 1)
self.assertEqual(len(normalized["found"]), 1)
self.assertEqual(normalized["found"][0]["site"], "github")
self.assertIn("twitter", normalized["missing"])
self.assertEqual(normalized["errors"][0]["site"], "instagram")
def test_normalized_schema_has_required_fields(self):
"""Output schema contains all required top-level keys."""
raw = {"site1": {"status": "not found"}}
normalized = normalize_sherlock_output(raw, "testuser")
required = ["schema_version", "query", "timestamp", "found", "missing",
"errors", "metadata"]
for key in required:
self.assertIn(key, normalized)
self.assertIsInstance(normalized["timestamp"], str)
self.assertIsInstance(normalized["found"], list)
self.assertIsInstance(normalized["missing"], list)
self.assertIsInstance(normalized["errors"], list)
self.assertIsInstance(normalized["metadata"], dict)
def test_cache_roundtrip(self):
"""Result can be written and read back from cache."""
with tempfile.TemporaryDirectory() as tmp:
with patch("sherlock_wrapper.CACHE_DB", Path(tmp) / "cache.db"):
test_result = {
"schema_version": "1.0",
"query": "alice",
"timestamp": "2025-04-26T00:00:00+00:00",
"found": [],
"missing": ["github"],
"errors": [],
"metadata": {"total_sites_checked": 1, "found_count": 0, "missing_count": 1, "error_count": 0},
}
query_hash = compute_query_hash("alice")
save_to_cache(query_hash, test_result)
retrieved = get_cached_result(query_hash)
self.assertEqual(retrieved, test_result)
def test_cache_miss_on_stale(self):
"""Cache returns None when entry is older than 7 days."""
with tempfile.TemporaryDirectory() as tmp:
db_path = Path(tmp) / "cache.db"
with patch("sherlock_wrapper.CACHE_DB", db_path):
old_ts = "2025-04-01T00:00:00+00:00"
old_result = {
"schema_version": "1.0", "query": "alice",
"timestamp": old_ts, "found": [], "missing": [], "errors": [],
"metadata": {"total_sites_checked": 0, "found_count": 0, "missing_count": 0, "error_count": 0},
}
query_hash = compute_query_hash("alice")
# Direct DB insert with controlled timestamp (bypass save_to_cache's NOW)
conn = get_cache_connection()
conn.execute(
"INSERT INTO cache (query_hash, result_json, timestamp) VALUES (?, ?, ?)",
(query_hash, json.dumps(old_result), old_ts)
)
conn.commit()
retrieved = get_cached_result(query_hash)
self.assertIsNone(retrieved)
def test_sherlock_available_check(self):
"""check_sherlock_available returns bool."""
available = check_sherlock_available()
self.assertIsInstance(available, bool)
# Note: on this test system sherlock may not be installed, so False is expected.
# The important thing is the function returns a bool.
print(f"[INFO] Sherlock installed: {available}")
class TestSherlockWrapperIntegration(unittest.TestCase):
"""Integration tests with mocked sherlock module."""
def test_run_sherlock_with_opt_in(self):
"""run_sherlock succeeds with opt-in and returns normalized result."""
fake_sherlock = MagicMock()
fake_sherlock.sherlock = MagicMock(return_value={
"github": {"status": "found", "url": "https://github.com/alice"},
"twitter": {"status": "not found"},
})
with patch.dict("sys.modules", {"sherlock": fake_sherlock}):
import importlib
import sherlock_wrapper
importlib.reload(sherlock_wrapper)
with patch.dict(os.environ, {"SHERLOCK_ENABLED": "1"}):
from sherlock_wrapper import run_sherlock
result = run_sherlock("alice", opt_in=True)
self.assertEqual(result["query"], "alice")
self.assertEqual(result["metadata"]["found_count"], 1)
def test_run_sherlock_fails_without_opt_in(self):
"""run_sherlock raises RuntimeError without opt-in."""
from sherlock_wrapper import run_sherlock
with self.assertRaises(RuntimeError) as ctx:
run_sherlock("alice", opt_in=False)
self.assertIn("opt-in only", str(ctx.exception).lower())
def test_run_sherlock_uses_cache(self):
"""Cached result short-circuits sherlock execution."""
cached = {
"schema_version": "1.0", "query": "alice", "timestamp": "2025-04-26T00:00:00+00:00",
"found": [{"site": "github", "url": "https://github.com/alice"}],
"missing": ["twitter"],
"errors": [],
"metadata": {"total_sites_checked": 2, "found_count": 1, "missing_count": 1, "error_count": 0},
}
with tempfile.TemporaryDirectory() as tmp:
with patch("sherlock_wrapper.CACHE_DB", Path(tmp) / "cache.db"):
query_hash = compute_query_hash("alice")
save_to_cache(query_hash, cached)
from sherlock_wrapper import run_sherlock
result = run_sherlock("alice", opt_in=True)
self.assertEqual(result, cached)

0
tools/__init__.py Normal file
View File

249
tools/sherlock_wrapper.py Normal file
View File

@@ -0,0 +1,249 @@
#!/usr/bin/env python3
"""
Sherlock username recon wrapper — opt-in, cached, normalized JSON output.
This is an implementation spike (issue #874) to validate local integration
of the Sherlock OSINT tool without violating sovereignty/provenance standards.
"""
import argparse
import hashlib
import json
import os
import sqlite3
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Dict, Any, List
# Opt-in gate: must have SHERLOCK_ENABLED=1 or --opt-in flag
SHERLOCK_ENABLED = os.environ.get("SHERLOCK_ENABLED", "0") == "1"
# Cache location
CACHE_DIR = Path.home() / ".cache" / "timmy"
CACHE_DB = CACHE_DIR / "sherlock_cache.db"
# Normalized output schema version
SCHEMA_VERSION = "1.0"
def require_opt_in(opt_in: bool = False) -> None:
"""Enforce opt-in gate for Sherlock external dependency."""
if not (SHERLOCK_ENABLED or opt_in):
raise RuntimeError(
"Sherlock is opt-in only. Set SHERLOCK_ENABLED=1 or pass --opt-in."
)
def check_sherlock_available() -> bool:
"""Check if sherlock Python package is installed."""
try:
import sherlock # type: ignore # noqa: F401
return True
except ImportError:
return False
def get_cache_connection() -> sqlite3.Connection:
"""Initialize cache directory and return DB connection."""
CACHE_DIR.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(CACHE_DB))
conn.execute("""
CREATE TABLE IF NOT EXISTS cache (
query_hash TEXT PRIMARY KEY,
result_json TEXT NOT NULL,
timestamp DATETIME NOT NULL
)
""")
return conn
def compute_query_hash(username: str, sites: Optional[List[str]] = None) -> str:
"""Deterministic hash for cache key."""
components = [username.lower().strip()]
if sites:
components.extend(sorted(sites))
raw = "|".join(components)
return hashlib.sha256(raw.encode()).hexdigest()
def get_cached_result(query_hash: str) -> Optional[Dict[str, Any]]:
"""Retrieve cached result if available and not stale (TTL: 7 days)."""
conn = get_cache_connection()
cur = conn.execute(
"SELECT result_json, timestamp FROM cache WHERE query_hash = ?",
(query_hash,)
)
row = cur.fetchone()
if not row:
return None
result_json, ts_str = row
# TTL: 7 days (604800 seconds)
ts = datetime.fromisoformat(ts_str)
age_seconds = (datetime.now(timezone.utc) - ts).total_seconds()
if age_seconds >= 604800:
return None
return json.loads(result_json)
def save_to_cache(query_hash: str, result: Dict[str, Any]) -> None:
"""Persist result to cache."""
conn = get_cache_connection()
conn.execute(
"INSERT OR REPLACE INTO cache (query_hash, result_json, timestamp) VALUES (?, ?, ?)",
(query_hash, json.dumps(result), datetime.now(timezone.utc).isoformat())
)
conn.commit()
conn.close()
def normalize_sherlock_output(
raw_result: Dict[str, Any],
username: str,
sites_checked: Optional[List[str]] = None
) -> Dict[str, Any]:
"""
Convert raw sherlock output into a stable, normalized schema.
Expected sherlock result shape (via Python API):
{
"site_name": {"url": "...", "status": "found"|"not found"|"error", ...},
...
}
"""
found: List[Dict[str, str]] = []
missing: List[str] = []
errors: List[Dict[str, str]] = []
for site_name, site_data in raw_result.items():
status = site_data.get("status", "")
url = site_data.get("url", "")
if status == "found" and url:
found.append({"site": site_name, "url": url})
elif status == "not found":
missing.append(site_name)
else:
errors.append({"site": site_name, "error": status or "unknown"})
# Compute totals from the original site list if provided
total_sites = len(raw_result) if sites_checked is None else len(sites_checked)
return {
"schema_version": SCHEMA_VERSION,
"query": username,
"timestamp": datetime.now(timezone.utc).isoformat(),
"found": found,
"missing": missing,
"errors": errors,
"metadata": {
"total_sites_checked": total_sites,
"found_count": len(found),
"missing_count": len(missing),
"error_count": len(errors),
},
}
def run_sherlock(
username: str,
sites: Optional[List[str]] = None,
timeout: Optional[int] = None,
opt_in: bool = False
) -> Dict[str, Any]:
"""
Execute Sherlock wrapper with opt-in gate, caching, and normalization.
"""
require_opt_in(opt_in)
# Compute cache key
query_hash = compute_query_hash(username, sites)
# Check cache first — avoids dependency requirement on cache hit
cached = get_cached_result(query_hash)
if cached is not None:
return cached
# Only require sherlock on cache miss
if not check_sherlock_available():
raise RuntimeError(
"Sherlock Python package not installed. "
"Install with: pip install sherlock-project"
)
# Call sherlock
try:
import sherlock
from sherlock import sherlock as sherlock_main # type: ignore
if sites:
result = sherlock_main(username, site_list=sites, timeout=timeout or 10)
else:
result = sherlock_main(username, timeout=timeout or 10)
normalized = normalize_sherlock_output(result, username, sites)
save_to_cache(query_hash, normalized)
return normalized
except Exception as e:
raise RuntimeError(f"Sherlock execution failed: {e}") from e
def main() -> int:
parser = argparse.ArgumentParser(
description="Sherlock username OSINT wrapper — opt-in, cached, normalized JSON"
)
parser.add_argument(
"--query", "-q", required=True,
help="Username to search across sites"
)
parser.add_argument(
"--opt-in", action="store_true",
help="Explicit opt-in flag (alternatively set SHERLOCK_ENABLED=1)"
)
parser.add_argument(
"--sites", "-s", nargs="+",
help="Specific sites to check (default: all supported)"
)
parser.add_argument(
"--timeout", "-t", type=int, default=10,
help="Request timeout per site (default: 10)"
)
parser.add_argument(
"--json", action="store_true",
help="Output normalized JSON to stdout"
)
parser.add_argument(
"--no-cache",
action="store_true",
help="Bypass cached result (if any)"
)
args = parser.parse_args()
try:
result = run_sherlock(
username=args.query,
sites=args.sites,
timeout=args.timeout,
opt_in=args.opt_in
)
if args.json:
print(json.dumps(result, indent=2))
else:
print(f"Query: {result['query']}")
print(f"Found: {result['metadata']['found_count']} site(s)")
print(f"Missing: {result['metadata']['missing_count']} site(s)")
print(f"Errors: {result['metadata']['error_count']} site(s)")
for f in result['found']:
print(f" [{f['site']}] {f['url']}")
return 0
except RuntimeError as e:
print(f"ERROR: {e}", file=sys.stderr)
return 1
if __name__ == "__main__":
sys.exit(main())