timmy-home/docs/USERNAME_OSINT_POLICY.md
STEP35 Burn Worker 83b708b0e6
[Sherlock] Study packet — comparison, operator policy, and knowledge artifact
Create a bounded username OSINT research packet comparing **Sherlock**,
**Maigret**, and **Socialscan** against a common 5-username × 4-platform sample
set (GitHub, Twitter/X, Instagram, Reddit). Establishes operator policy for
safe invocation, storage, provenance, interpretation, and audit.

Artifacts added:
- `docs/USERNAME_OSINT_POLICY.md` — Operator policy covering invocation rules,
  storage boundaries, YAML provenance envelope, interpretation guardrails
  (handle-found ≠ identity-proven), review/retention, and audit trail
- `research/username-osint/tool-comparison.md` — Technical comparison matrix:
  install friction, maintenance state, sovereignty fit, output structure,
  false-positive behavior, runtime on bounded sample set
- `research/username-osint/decision-memo.md` — Executive summary with clear
  verdict: adopt Maigret as primary, keep Socialscan as fast CI/secondary
  option, archive Sherlock to reference-only

Method (bounded sample):
- Usernames: `alice`, `bob`, `charlie`, `dave`, `eve`
- Platforms: GitHub, Twitter/X, Instagram, Reddit
- Metrics: wall-clock time, matches reported, false-positive indicators,
  install footprint
- Environment: local macOS 14 (Apple Silicon), Python 3.11, no API keys

Key findings:
- Maigret wins on coverage (~500 sites), async speed, active maintenance, and
  proper 404 detection (zero false positives on the bounded sample)
- Socialscan is fastest/smallest (~1 MB) but limited coverage — recommended for
  quick CI smoke checks only
- Sherlock accurate but slow and maintenance-lagging — archived to reference-only

Acceptance criteria (#875):
- Comparison matrix produced covering install, maintenance, sovereignty,
  output, false-positives, runtime 
- Decision memo with clear verdict (adopt Maigret, keep Socialscan, archive
  Sherlock) 
- Operator policy document covering invocation, storage, provenance (YAML
  frontmatter), interpretation guardrails, retention, audit 

Verification:
- Confirm all three files exist at the specified paths
- Check that tool-comparison.md contains comparison table with all three tools
- Check that decision-memo.md states explicit recommendation
- Check that USERNAME_OSINT_POLICY.md includes YAML provenance envelope
  specification, invocation rules table, and interpretation guardrails
- Run `python3 -m py_compile` — no Python files changed, should be clean
- Run YAML/JSON syntax checks on any changed config files (none changed)
- Ensure PR body references #875 (Closes) and includes this Verification block

Closes #875
2026-04-29 02:20:29 -04:00


Username OSINT Operator Policy

Effective: 2026-04-26
Applies to: Username enumeration results produced by maigret / socialscan / sherlock
Exempt: Manual human social-engineering (this policy covers automated tool output only)
Related: timmy-home#875, research/username-osint/decision-memo.md


1. Purpose

This policy governs how username OSINT findings are stored, interpreted, and acted upon within Timmy. It exists to prevent:

  • Treating heuristic matches as identity proof
  • Accumulating stale or misattributed data in durable storage
  • Acting on findings without human review and source validation

2. Scope

This policy applies when any of the following tools are invoked:

  • maigret (primary)
  • socialscan (secondary)
  • sherlock (archived/reference-only)

Tools may be invoked:

  • via hermes session with explicit instruction
  • via standalone script in scripts/username-osint/
  • via ad-hoc terminal command (operator discretion)

3. Storage boundaries

3.1 File locations

  • Research packets (bounded study artifacts) → research/username-osint/
  • Single-use findings (ad-hoc runs not tied to a study) → /tmp/ (ephemeral)
  • Canonical knowledge (vetted, review-approved) → knowledge/username-handles/, only if that directory exists; otherwise do not write findings to any durable knowledge store

3.2 Naming & provenance envelope

Every saved artifact (to research/username-osint/ or any durable location) must include a YAML frontmatter block:

---
date: YYYY-MM-DD
tool: maigret|socialscan|sherlock  # exact command line used
tool_version: <pip show version output>
username_pattern: <pattern or list used; e.g. "alice,bob,charlie" or "@corp-employees.txt">
sample_platforms: [github,twitter,instagram,reddit]  # or "full-site-list"
status: draft|review|approved|rejected
reviewer: <hermes username or empty if unreviewed>
provenance_notes: |
  Free-text notes about rate limits, VPN usage, time-of-day, or other context
  that affects reproducibility.
---

The frontmatter is followed by the tool's raw JSON output (preserved verbatim) plus an optional human summary.
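
A minimal sketch of assembling such an artifact, assuming Python 3.11 and only the standard library. The function name `render_artifact` and the hand-written YAML emitter are illustrative assumptions, not an existing Timmy script; a real pipeline might prefer a YAML library.

```python
import json

REQUIRED_FIELDS = (
    "date", "tool", "tool_version", "username_pattern",
    "sample_platforms", "status", "reviewer", "provenance_notes",
)
ALLOWED_TOOLS = {"maigret", "socialscan", "sherlock"}
ALLOWED_STATUS = {"draft", "review", "approved", "rejected"}


def render_artifact(meta: dict, raw_json: str) -> str:
    """Return the provenance frontmatter followed by the raw JSON, unchanged."""
    missing = [key for key in REQUIRED_FIELDS if key not in meta]
    if missing:
        raise ValueError(f"missing frontmatter fields: {missing}")
    if meta["tool"] not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {meta['tool']}")
    if meta["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {meta['status']}")
    json.loads(raw_json)  # fail early if the tool output is not valid JSON

    lines = ["---"]
    for key in REQUIRED_FIELDS:
        value = meta[key]
        if key == "provenance_notes":
            # Emit free text as a YAML block scalar, indented two spaces.
            lines.append(f"{key}: |")
            lines.extend(f"  {line}" for line in str(value).splitlines())
        elif isinstance(value, list):
            lines.append(f"{key}: [{','.join(value)}]")
        else:
            # json.dumps produces a double-quoted scalar, which is valid YAML.
            lines.append(f"{key}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n" + raw_json
```

The raw JSON is appended verbatim rather than re-serialized, so the stored bytes match what the tool produced.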


4. Invocation rules

| Invocation type | Allowed | Conditions |
|---|---|---|
| Explicit Hermes command | Yes | User must name the tool and sample set explicitly in the session |
| Automated pipeline | ⚠️ Conditional | Must include --json flag and write to research/username-osint/ with provenance frontmatter |
| Blind/autonomous discovery | No | Agent may NOT autonomously decide to run username enumeration |

No silent runs. Every invocation must be traceable to a user message or logged pipeline step.
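
The rules above could be encoded as a pre-flight check along these lines. This is a sketch only: the `Invocation` enum, the `is_allowed` function, and its boolean parameters are hypothetical names chosen for illustration, and the path test is a simple prefix check rather than a hardened one.

```python
from enum import Enum
from pathlib import PurePosixPath

RESEARCH_DIR = PurePosixPath("research/username-osint")


class Invocation(Enum):
    EXPLICIT_COMMAND = "explicit"
    AUTOMATED_PIPELINE = "pipeline"
    AUTONOMOUS = "autonomous"


def is_allowed(
    kind: Invocation,
    *,
    user_named_tool_and_sample: bool = False,
    uses_json_flag: bool = False,
    output_path: str = "",
    has_frontmatter: bool = False,
) -> bool:
    """Return True only when the invocation satisfies the policy conditions."""
    if kind is Invocation.EXPLICIT_COMMAND:
        # The user must name both the tool and the sample set in the session.
        return user_named_tool_and_sample
    if kind is Invocation.AUTOMATED_PIPELINE:
        writes_to_research = PurePosixPath(output_path).is_relative_to(RESEARCH_DIR)
        return uses_json_flag and writes_to_research and has_frontmatter
    # Autonomous discovery is never permitted, whatever the other flags say.
    return False
```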


5. Interpretation guardrails

5.1 Language conventions (what you CAN say)

  • "Handle alice is found on GitHub (HTTP 200)"
  • "Platform presence detected for alice on 4 of 4 checked services"
  • "No public handle matches were found in the sample set"

5.2 Prohibited language (what you CANNOT say)

  • "alice is the identity of the target"
  • "This proves alice owns these accounts"
  • "These accounts belong to the subject"
  • "We have identified the person behind handle X"

Rationale: HTTP presence ≠ identity ownership. Platform migration, shared devices, and impersonation are common. These tools detect that a public handle exists on a platform, not who owns it.
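
A rough way to catch the prohibited phrasings before a summary is saved is a pattern check like the one below. The patterns and the function name `find_prohibited_language` are assumptions made for this sketch; they cover the four examples in 5.2 and would need extending for other wordings. A clean result does not replace human review.

```python
import re

# One pattern per prohibited example in section 5.2 (illustrative, not exhaustive).
PROHIBITED_PATTERNS = [
    re.compile(r"\bis the identity of\b", re.IGNORECASE),
    re.compile(r"\bproves?\b.*\bowns?\b", re.IGNORECASE),
    re.compile(r"\baccounts? belongs? to\b", re.IGNORECASE),
    re.compile(r"\bidentified the person\b", re.IGNORECASE),
]


def find_prohibited_language(text: str) -> list[str]:
    """Return every line of text that matches a prohibited pattern."""
    flagged = []
    for line in text.splitlines():
        if any(pattern.search(line) for pattern in PROHIBITED_PATTERNS):
            flagged.append(line.strip())
    return flagged
```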


6. Review & retention

6.1 Review requirement

Any artifact promoted from research/username-osint/ to knowledge/ (if that directory exists) must be reviewed by a human operator before promotion. Review checklist:

  • Source tool version recorded in frontmatter
  • False-positive spot-check performed (≥10% of found handles manually verified)
  • Implausible matches flagged (e.g., handles registered 10+ years ago when the target is known to be less than 5 years old)
  • Storage location confirmed appropriate (research vs knowledge)
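
For the ≥10% spot-check, the required sample size can be computed with integer arithmetic, which avoids floating-point rounding surprises. The helper names below are hypothetical, and the seeded random selection is just one possible way to choose which handles a reviewer verifies.

```python
import random


def required_spot_checks(found_count: int) -> int:
    """Smallest number of manual checks that covers at least 10% of found handles."""
    if found_count < 0:
        raise ValueError("found_count must be non-negative")
    return (found_count + 9) // 10  # integer ceiling of found_count / 10


def pick_spot_checks(found_handles: list[str], seed: int = 0) -> list[str]:
    """Choose a reproducible random subset of handles for manual verification."""
    rng = random.Random(seed)
    return rng.sample(found_handles, required_spot_checks(len(found_handles)))
```

Recording the seed in provenance_notes would let a second reviewer reproduce the same selection.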

6.2 Retention & deletion

  • Research artifacts: Retained indefinitely (they are dated study packets)
  • Single-use findings in /tmp/: Deleted after 7 days by cron job (scripts/cleanup_tmp_artifacts.sh)
  • Stale artifacts: any artifact that has not reached status: approved within 90 days is archived (moved to archive/), not deleted
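
The 90-day staleness rule can be sketched as a small predicate over the frontmatter's date and status fields. The function name and the choice to treat "after 90 days" as strictly more than 90 days are assumptions of this example.

```python
from datetime import date, timedelta


def is_stale(artifact_date: date, status: str, today: date, max_age_days: int = 90) -> bool:
    """True when an unapproved artifact is older than max_age_days."""
    if status == "approved":
        return False
    return (today - artifact_date) > timedelta(days=max_age_days)
```

An archiving job would move artifacts for which this returns True into archive/ instead of deleting them.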

7. Audit trail

All tool invocations that write to durable storage must log to ~/.timmy/logs/username-osint.log with:

YYYY-MM-DD HH:MM:SS | tool=<tool> | usernames=<count> | platforms=<list> | output=<path> | reviewer=<name or "unreviewed">

This enables traceability from any stored JSON back to the exact run.
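
A minimal sketch of producing one log line in the format above. The function name `format_audit_line` is hypothetical, and joining platforms with commas is an assumption, since the policy only specifies `<list>`.

```python
from datetime import datetime


def format_audit_line(
    when: datetime,
    tool: str,
    username_count: int,
    platforms: list[str],
    output_path: str,
    reviewer: str = "",
) -> str:
    """Build a single audit-log line in the policy's pipe-separated format."""
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"{stamp} | tool={tool} | usernames={username_count} | "
        f"platforms={','.join(platforms)} | output={output_path} | "
        f"reviewer={reviewer or 'unreviewed'}"
    )
```

The caller would append the returned line, plus a newline, to ~/.timmy/logs/username-osint.log.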


8. Exceptions

Requests for exception to this policy require:

  1. A written justification in the research artifact's frontmatter (provenance_notes)
  2. Human reviewer sign-off in the reviewer field
  3. Explicit status: approved designation

No exceptions are granted for autonomous or unattended runs.