Create a bounded username OSINT research packet comparing **Sherlock**, **Maigret**, and **Socialscan** against a common 5-username × 4-platform sample set (GitHub, Twitter/X, Instagram, Reddit). Establishes operator policy for safe invocation, storage, provenance, interpretation, and audit.

Artifacts added:
- `docs/USERNAME_OSINT_POLICY.md`: Operator policy covering invocation rules, storage boundaries, YAML provenance envelope, interpretation guardrails (handle-found ≠ identity-proven), review/retention, and audit trail
- `research/username-osint/tool-comparison.md`: Technical comparison matrix covering install friction, maintenance state, sovereignty fit, output structure, false-positive behavior, and runtime on the bounded sample set
- `research/username-osint/decision-memo.md`: Executive summary with a clear verdict: adopt Maigret as primary, keep Socialscan as a fast CI/secondary option, archive Sherlock to reference-only

Method (bounded sample):
- Usernames: `alice`, `bob`, `charlie`, `dave`, `eve`
- Platforms: GitHub, Twitter/X, Instagram, Reddit
- Metrics: wall-clock time, matches reported, false-positive indicators, install footprint
- Environment: local macOS 14 (Apple Silicon), Python 3.11, no API keys

Key findings:
- Maigret wins on coverage (~500 sites), async speed, active maintenance, and proper 404 detection (zero false positives)
- Socialscan is the fastest and smallest (~1 MB) but has limited coverage; recommended for quick CI smoke checks only
- Sherlock is accurate but slow and lagging in maintenance; archived to reference-only

Acceptance criteria (#875):
- Comparison matrix produced covering install, maintenance, sovereignty, output, false-positives, runtime ✅
- Decision memo with clear verdict (adopt Maigret, keep Socialscan, archive Sherlock) ✅
- Operator policy document covering invocation, storage, provenance (YAML frontmatter), interpretation guardrails, retention, audit ✅

Verification:
- Confirm all three files exist at the specified paths
- Check that tool-comparison.md contains a comparison table with all three tools
- Check that decision-memo.md states an explicit recommendation
- Check that USERNAME_OSINT_POLICY.md includes the YAML provenance envelope specification, invocation rules table, and interpretation guardrails
- Run `python3 -m py_compile`: no Python files changed, should be clean
- Run a YAML/JSON syntax check on any changed config files: none changed
- Ensure the PR body references #875 (Closes) and includes this Verification block

Closes #875
Username OSINT Operator Policy
Effective: 2026-04-26
Applies to: Username enumeration results produced by maigret / socialscan / sherlock
Exempt: Manual human social-engineering (this policy covers automated tool output only)
Related: timmy-home#875, research/username-osint/decision-memo.md
1. Purpose
This policy governs how username OSINT findings are stored, interpreted, and acted upon within Timmy. It exists to prevent:
- Treating heuristic matches as identity proof
- Accumulating stale or misattributed data in durable storage
- Acting on findings without human review and source validation
2. Scope
This policy applies when any of the following tools are invoked:
- `maigret` (primary)
- `socialscan` (secondary)
- `sherlock` (archived/reference-only)
Tools may be invoked:
- via `hermes` session with explicit instruction
- via standalone script in `scripts/username-osint/`
- via ad-hoc terminal command (operator discretion)
3. Storage boundaries
3.1 File locations
- Research packets (bounded study artifacts) → `research/username-osint/`
- Single-use findings (ad-hoc runs not tied to a study) → `/tmp/` (ephemeral)
- Canonical knowledge (vetted, review-approved) → `knowledge/username-handles/` (if such a directory exists; otherwise never write to durable knowledge store)
3.2 Naming & provenance envelope
Every saved artifact (to research/username-osint/ or any durable location) must include a YAML frontmatter block:
---
date: YYYY-MM-DD
tool: maigret|socialscan|sherlock # exact command line used
tool_version: <pip show version output>
username_pattern: <pattern or list used; e.g. "alice,bob,charlie" or "@corp-employees.txt">
sample_platforms: [github,twitter,instagram,reddit] # or "full-site-list"
status: draft|review|approved|rejected
reviewer: <hermes username or empty if unreviewed>
provenance_notes: |
  Free-text notes about rate limits, VPN usage, time-of-day, or other context
  that affects reproducibility.
---
The frontmatter is followed by the tool's raw JSON output (preserved verbatim) plus an optional human summary.
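
As an illustration of the envelope above, here is a minimal sketch that assembles an artifact from a metadata dict and the tool's raw JSON. The function name `render_artifact` and the hand-rolled YAML emitter are assumptions made for this example, not part of the policy or of any of the three tools; a real pipeline would more likely use a YAML library.

```python
import json

# Keys taken from the frontmatter specification in section 3.2.
REQUIRED_KEYS = (
    "date", "tool", "tool_version", "username_pattern",
    "sample_platforms", "status", "reviewer", "provenance_notes",
)
ALLOWED_TOOLS = {"maigret", "socialscan", "sherlock"}
ALLOWED_STATUS = {"draft", "review", "approved", "rejected"}


def render_artifact(meta: dict, raw_json: str) -> str:
    """Return YAML frontmatter followed by the tool's raw JSON, unmodified."""
    missing = [k for k in REQUIRED_KEYS if k not in meta]
    if missing:
        raise ValueError(f"missing frontmatter keys: {missing}")
    if meta["tool"] not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {meta['tool']}")
    if meta["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {meta['status']}")
    json.loads(raw_json)  # fail early if the tool output is not valid JSON

    lines = ["---"]
    for key in REQUIRED_KEYS:
        value = meta[key]
        if key == "provenance_notes":
            # Block scalar: every note line is indented under the key.
            lines.append(f"{key}: |")
            lines.extend(f"  {note}" for note in str(value).splitlines())
        elif isinstance(value, (list, tuple)):
            # Assumes simple items (platform names) with no commas or quotes.
            lines.append(f"{key}: [{','.join(value)}]")
        else:
            # A JSON-quoted string is also a valid YAML double-quoted scalar.
            lines.append(f"{key}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n" + raw_json
```

The raw JSON is appended untouched so that the "preserved verbatim" requirement holds; the `json.loads` call only validates it and its result is discarded.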
4. Invocation rules
| Invocation type | Allowed | Conditions |
|---|---|---|
| Explicit Hermes command | ✅ | User must name the tool and sample set explicitly in the session |
| Automated pipeline | ⚠️ | Must include --json flag and write to research/username-osint/ with provenance frontmatter |
| Blind/autonomous discovery | ❌ | Agent may NOT autonomously decide to run username enumeration |
No silent runs. Every invocation must be traceable to a user message or logged pipeline step.
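
The table above can be read as a small gate in front of any tool launch. The sketch below is one hypothetical way to encode it; the `Invocation` enum, the `is_allowed` function, and its parameters are illustrative names that do not exist in Hermes or in the tools. It checks only that a `--json` argument is present, as the table requires, and does not model how each tool interprets that flag.

```python
from enum import Enum
from pathlib import PurePosixPath


class Invocation(Enum):
    EXPLICIT_COMMAND = "explicit"
    AUTOMATED_PIPELINE = "pipeline"
    AUTONOMOUS = "autonomous"


# Durable location named in the invocation rules table.
RESEARCH_DIR = PurePosixPath("research/username-osint")


def is_allowed(
    kind: Invocation,
    *,
    user_named_tool_and_sample: bool = False,
    argv: tuple = (),
    output_path: str = "",
    has_frontmatter: bool = False,
) -> bool:
    """Return True only if the invocation satisfies the table's conditions."""
    if kind is Invocation.AUTONOMOUS:
        return False  # never allowed, regardless of other inputs
    if kind is Invocation.EXPLICIT_COMMAND:
        return user_named_tool_and_sample
    # Automated pipeline: --json, research directory, and frontmatter required.
    writes_to_research = RESEARCH_DIR in PurePosixPath(output_path).parents
    return "--json" in argv and writes_to_research and has_frontmatter
```

Keeping the autonomous case as an unconditional early return mirrors the policy's stance that no combination of flags can make a blind run acceptable.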
5. Interpretation guardrails
5.1 Language conventions (what you CAN say)
- ✅ "Handle `alice` is found on GitHub (HTTP 200)"
- ✅ "Platform presence detected for `alice` on 4 of 4 checked services"
- ✅ "No public handle matches were found in the sample set"
5.2 Prohibited language (what you CANNOT say)
- ❌ "`alice` is the identity of the target"
- ❌ "This proves `alice` owns these accounts"
- ❌ "These accounts belong to the subject"
- ❌ "We have identified the person behind handle X"
Rationale: HTTP presence ≠ identity ownership. Platform migration, shared devices, and impersonation are common. These tools detect whether a public handle exists on a platform, not who controls it.
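
The language conventions in this section lend themselves to a simple lint over human summaries before they are saved. The following is a rough sketch under the assumption that summaries are plain English text; the pattern list and the `flag_prohibited` name are hypothetical and would need tuning, since regular expressions cannot catch every way of asserting ownership.

```python
import re

# Illustrative patterns derived from the prohibited examples in 5.2.
PROHIBITED_PATTERNS = [
    r"\bis the identity of\b",
    r"\bproves?\b.*\bowns?\b",
    r"\baccounts? belongs? to\b",
    r"\bidentified the person\b",
]


def flag_prohibited(summary: str) -> list[str]:
    """Return the sentences in `summary` that match a prohibited pattern."""
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        if any(re.search(p, sentence, re.IGNORECASE) for p in PROHIBITED_PATTERNS):
            hits.append(sentence)
    return hits
```

A non-empty result is best treated as a prompt for human rewording rather than an automatic rejection, because the check is heuristic.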
6. Review & retention
6.1 Review requirement
Any artifact promoted from research/username-osint/ to knowledge/ (if such exists) must be reviewed by a human operator. Review checklist:
- Source tool version recorded in frontmatter
- False-positive spot-check performed (≥10% of found handles manually verified)
- Implausible matches flagged (e.g., handles that are 10+ years old but target is known to be <5)
- Storage location confirmed appropriate (research vs knowledge)
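
For the false-positive spot-check, a reviewer needs a sample of at least 10% of the found handles. A minimal sketch of drawing that sample follows; `spot_check_sample` and its `seed` parameter are assumptions for illustration, and the policy does not prescribe how the sample is chosen.

```python
import math
import random


def spot_check_sample(found_handles, fraction=0.10, seed=None):
    """Pick at least `fraction` of the found handles for manual verification."""
    handles = list(found_handles)
    if not handles:
        return []
    # Round up so the sample never falls below the required fraction,
    # and always check at least one handle.
    k = min(len(handles), max(1, math.ceil(len(handles) * fraction)))
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(handles, k)
```

Recording the seed in `provenance_notes` would let a second reviewer reproduce exactly which handles were checked.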
6.2 Retention & deletion
- Research artifacts: Retained indefinitely (they are dated study packets)
- Single-use findings in `/tmp/`: Deleted after 7 days by cron job (`scripts/cleanup_tmp_artifacts.sh`)
- Stale artifacts without `status: approved` after 90 days are archived (moved to `archive/`), not deleted
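
The retention rules reduce to a small decision function. The sketch below is an assumed reading of them, not the contents of `scripts/cleanup_tmp_artifacts.sh`; the `retention_action` name and the `"tmp"` / `"research"` location labels are hypothetical.

```python
from datetime import date, timedelta

TMP_MAX_AGE = timedelta(days=7)    # single-use findings in /tmp/
STALE_AFTER = timedelta(days=90)   # unapproved durable artifacts


def retention_action(location: str, status: str, created: date, today: date) -> str:
    """Return 'keep', 'delete', or 'archive' for one artifact."""
    age = today - created
    if location == "tmp":
        return "delete" if age > TMP_MAX_AGE else "keep"
    # Durable artifacts are never deleted; unapproved ones are archived
    # once they pass the staleness threshold.
    if status != "approved" and age > STALE_AFTER:
        return "archive"
    return "keep"
```

For example, a `draft` research artifact created on 2026-01-01 and evaluated on 2026-04-26 is 115 days old and would be archived, while the same artifact with `status: approved` would be kept.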
7. Audit trail
All tool invocations that write to durable storage must log to ~/.timmy/logs/username-osint.log with:
YYYY-MM-DD HH:MM:SS | tool=<tool> | usernames=<count> | platforms=<list> | output=<path> | reviewer=<name or "unreviewed">
This enables traceability from any stored JSON back to the exact run.
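
A short sketch of producing a line in the format above, using only the standard library. The helper names `format_audit_line` and `append_audit_line` are illustrative, and joining platforms with commas is an assumption, since the policy only says `<list>`.

```python
from datetime import datetime
from pathlib import Path


def format_audit_line(when: datetime, tool: str, usernames, platforms,
                      output: str, reviewer: str = "") -> str:
    """Build one audit log line in the field order given by section 7."""
    return (
        f"{when:%Y-%m-%d %H:%M:%S} | tool={tool} | usernames={len(usernames)} | "
        f"platforms={','.join(platforms)} | output={output} | "
        f"reviewer={reviewer or 'unreviewed'}"
    )


def append_audit_line(log_path: Path, line: str) -> None:
    """Append a line to the log, creating parent directories if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Only the count of usernames is logged, matching the `usernames=<count>` field, so the log itself does not duplicate the handles stored in the artifact.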
8. Exceptions
Requests for exception to this policy require:
- A written justification in the research artifact's frontmatter (`provenance_notes`)
- Human reviewer sign-off in the `reviewer` field
- Explicit `status: approved` designation
No exceptions are granted for autonomous or unattended runs.