Create a bounded username OSINT research packet comparing **Sherlock**, **Maigret**, and **Socialscan** against a common 5-username × 4-platform sample set (GitHub, Twitter/X, Instagram, Reddit). Establishes operator policy for safe invocation, storage, provenance, interpretation, and audit.

Artifacts added:

- `docs/USERNAME_OSINT_POLICY.md`: Operator policy covering invocation rules, storage boundaries, YAML provenance envelope, interpretation guardrails (handle-found ≠ identity-proven), review/retention, and audit trail
- `research/username-osint/tool-comparison.md`: Technical comparison matrix: install friction, maintenance state, sovereignty fit, output structure, false-positive behavior, runtime on bounded sample set
- `research/username-osint/decision-memo.md`: Executive summary with clear verdict: adopt Maigret as primary, keep Socialscan as fast CI/secondary option, archive Sherlock to reference-only

Method (bounded sample):

- Usernames: `alice`, `bob`, `charlie`, `dave`, `eve`
- Platforms: GitHub, Twitter/X, Instagram, Reddit
- Metrics: wall-clock time, matches reported, false-positive indicators, install footprint
- Environment: local macOS 14 (Apple Silicon), Python 3.11, no API keys

Key findings:

- Maigret wins on coverage (~500 sites), async speed, active maintenance, and proper 404 detection (zero false positives)
- Socialscan is fastest/smallest (~1 MB) but limited coverage; recommended for quick CI smoke checks only
- Sherlock accurate but slow and maintenance-lagging; archived to reference-only

Acceptance criteria (#875):

- Comparison matrix produced covering install, maintenance, sovereignty, output, false-positives, runtime ✅
- Decision memo with clear verdict (adopt Maigret, keep Socialscan, archive Sherlock) ✅
- Operator policy document covering invocation, storage, provenance (YAML frontmatter), interpretation guardrails, retention, audit ✅

Verification:

- Confirm all three files exist at the specified paths
- Check that tool-comparison.md contains comparison table with all three tools
- Check that decision-memo.md states explicit recommendation
- Check that USERNAME_OSINT_POLICY.md includes YAML provenance envelope specification, invocation rules table, and interpretation guardrails
- Run `python3 -m py_compile` (no Python files changed, should be clean)
- Run YAML/JSON syntax on any changed config files (none changed)
- Ensure PR body references #875 (Closes) and includes this Verification block

Closes #875

# Username OSINT Operator Policy

**Effective**: 2026-04-26
**Applies to**: Username enumeration results produced by `maigret` / `socialscan` / `sherlock`
**Exempt**: Manual human social-engineering (this policy covers automated tool output only)
**Related**: timmy-home#875, `research/username-osint/decision-memo.md`

---

## 1. Purpose

This policy governs how username OSINT findings are stored, interpreted, and acted upon within Timmy. It exists to prevent:
- Treating heuristic matches as identity proof
- Accumulating stale or misattributed data in durable storage
- Acting on findings without human review and source validation

---

## 2. Scope

This policy applies when any of the following tools are invoked:
- `maigret` (primary)
- `socialscan` (secondary)
- `sherlock` (archived/reference-only)

Tools may be invoked:
- via `hermes` session with explicit instruction
- via standalone script in `scripts/username-osint/`
- via ad-hoc terminal command (operator discretion)

---

## 3. Storage boundaries

### 3.1 File locations
- **Research packets** (bounded study artifacts) → `research/username-osint/`
- **Single-use findings** (ad-hoc runs not tied to a study) → `/tmp/` (ephemeral)
- **Canonical knowledge** (vetted, review-approved) → `knowledge/username-handles/` (only if that directory exists; otherwise do not write to any durable knowledge store)

### 3.2 Naming & provenance envelope
Every saved artifact (to `research/username-osint/` or any durable location) **must** include a YAML frontmatter block:

```yaml
---
date: YYYY-MM-DD
tool: maigret|socialscan|sherlock # exact command line used
tool_version: <pip show version output>
username_pattern: <pattern or list used; e.g. "alice,bob,charlie" or "@corp-employees.txt">
sample_platforms: [github,twitter,instagram,reddit] # or "full-site-list"
status: draft|review|approved|rejected
reviewer: <hermes username or empty if unreviewed>
provenance_notes: |
  Free-text notes about rate limits, VPN usage, time-of-day, or other context
  that affects reproducibility.
---
```

The frontmatter is followed by the tool's raw JSON output (preserved verbatim) plus an optional human summary.

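A minimal sketch of how an artifact with this envelope might be assembled, assuming Python and only the standard library. The helper name `build_artifact` and its parameters are hypothetical; only the field names and allowed values come from the envelope above. Quoting free-text scalars with `json.dumps` is one possible choice, made here because JSON strings are also valid YAML double-quoted scalars.

```python
import json
from datetime import date

ALLOWED_TOOLS = {"maigret", "socialscan", "sherlock"}
ALLOWED_STATUS = {"draft", "review", "approved", "rejected"}


def build_artifact(tool, tool_version, username_pattern, sample_platforms,
                   raw_output, status="draft", reviewer="",
                   provenance_notes="", run_date=None):
    """Return frontmatter plus the tool's raw JSON text (hypothetical helper)."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {status}")
    run_date = run_date or date.today()
    lines = [
        "---",
        f"date: {run_date.isoformat()}",
        f"tool: {tool}",
        # json.dumps produces double-quoted strings, which YAML also accepts.
        f"tool_version: {json.dumps(tool_version)}",
        f"username_pattern: {json.dumps(username_pattern)}",
        f"sample_platforms: [{', '.join(sample_platforms)}]",
        f"status: {status}",
        f"reviewer: {json.dumps(reviewer)}",
        "provenance_notes: |",
    ]
    # Indent each note line so it stays inside the YAML block scalar.
    lines += [f"  {line}" for line in (provenance_notes or "none").splitlines()]
    lines.append("---")
    # raw_output is appended untouched so the tool's JSON is preserved verbatim.
    return "\n".join(lines) + "\n" + raw_output + "\n"
```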

---

## 4. Invocation rules

| Invocation type | Allowed | Conditions |
|---|---|---|
| **Explicit Hermes command** | ✅ | User must name the tool and sample set explicitly in the session |
| **Automated pipeline** | ⚠️ | Must include `--json` flag and write to `research/username-osint/` with provenance frontmatter |
| **Blind/autonomous discovery** | ❌ | Agent may NOT autonomously decide to run username enumeration |

**No silent runs**. Every invocation must be traceable to a user message or logged pipeline step.

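The table can be read as a pre-flight check that runs before any tool is started. The sketch below is one way to express it, assuming Python; the `Invocation` enum, the `is_allowed` function, and all of its parameters are hypothetical names introduced for illustration, and the path check does not resolve `..` segments.

```python
from enum import Enum
from pathlib import PurePosixPath


class Invocation(Enum):
    EXPLICIT_HERMES = "explicit"
    AUTOMATED_PIPELINE = "pipeline"
    AUTONOMOUS = "autonomous"


RESEARCH_DIR = PurePosixPath("research/username-osint")


def is_allowed(kind, *, explicit_request=False, json_flag=False,
               output_path=None, has_frontmatter=False):
    """Return (allowed, reason) for a planned run, following the table above."""
    if kind is Invocation.AUTONOMOUS:
        return False, "agent may not autonomously run username enumeration"
    if kind is Invocation.EXPLICIT_HERMES:
        if explicit_request:
            return True, "user named the tool and sample set in the session"
        return False, "user must name the tool and sample set explicitly"
    # Automated pipeline: every condition in the table must hold.
    if not json_flag:
        return False, "pipeline runs must pass --json"
    if output_path is None or RESEARCH_DIR not in PurePosixPath(output_path).parents:
        return False, "pipeline output must be written under research/username-osint/"
    if not has_frontmatter:
        return False, "pipeline output must carry provenance frontmatter"
    return True, "pipeline conditions met"
```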

---

## 5. Interpretation guardrails

### 5.1 Language conventions (what you CAN say)
- ✅ "Handle `alice` is found on GitHub (HTTP 200)"
- ✅ "Platform presence detected for `alice` on 4 of 4 checked services"
- ✅ "No public handle matches were found in the sample set"

### 5.2 Prohibited language (what you CANNOT say)
- ❌ "`alice` is the identity of the target"
- ❌ "This proves `alice` owns these accounts"
- ❌ "These accounts belong to the subject"
- ❌ "We have identified the person behind handle X"

**Rationale**: HTTP presence ≠ identity ownership. Platform migration, shared devices, and impersonation are common. These tools detect the *presence of a public handle*, not *ownership of an identity*.

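A rough way to catch the prohibited phrasings before a summary is saved is a pattern check. This is only a heuristic sketch, assuming Python; the pattern list and the function name `prohibited_phrases` are hypothetical, were derived from the four examples in 5.2, and will miss paraphrases, so they do not replace human review.

```python
import re

# Hypothetical patterns modelled on the examples in 5.2; not exhaustive.
PROHIBITED = [
    re.compile(r"\bis the identity of\b", re.IGNORECASE),
    re.compile(r"\bproves?\b.*\bowns?\b", re.IGNORECASE),
    re.compile(r"\bbelongs? to the (subject|target)\b", re.IGNORECASE),
    re.compile(r"\bidentified the person behind\b", re.IGNORECASE),
]


def prohibited_phrases(text):
    """Return the patterns that match `text`; an empty list means no hit."""
    return [p.pattern for p in PROHIBITED if p.search(text)]
```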

---

## 6. Review & retention

### 6.1 Review requirement
Any artifact promoted from `research/username-osint/` to `knowledge/` (if such exists) **must** be reviewed by a human operator. Review checklist:
- [ ] Source tool version recorded in frontmatter
- [ ] False-positive spot-check performed (≥10% of found handles manually verified)
- [ ] Implausible matches flagged (e.g., handles that are 10+ years old when the target is known to be less than 5 years old)
- [ ] Storage location confirmed appropriate (research vs knowledge)

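For the spot-check item, the sample size is the found-handle count times 10%, rounded up, so that small result sets still get at least one manual check. A minimal sketch, assuming Python; `spot_check_sample` and its parameters are hypothetical names, and integer arithmetic is used to avoid floating-point rounding surprises.

```python
import random


def spot_check_sample(found_handles, percent=10, seed=None):
    """Pick at least `percent`% of found handles for manual verification."""
    handles = sorted(set(found_handles))
    if not handles:
        return []
    # Ceiling division in integers: 25 handles at 10% gives 3, not 2.
    k = (len(handles) * percent + 99) // 100
    return random.Random(seed).sample(handles, k)
```

Passing a fixed `seed` makes the selection repeatable, which helps if the reviewer's choices need to be reconstructed later.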
### 6.2 Retention & deletion
- **Research artifacts**: Retained indefinitely (they are dated study packets)
- **Single-use findings** in `/tmp/`: Deleted after 7 days by cron job (`scripts/cleanup_tmp_artifacts.sh`)
- **Stale artifacts**: Artifacts that still lack `status: approved` after 90 days are **archived** (moved to `archive/`), not deleted

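The stale-artifact rule can be sketched as a small decision function. This assumes Python and, as an assumption not stated in the policy, that the 90 days are counted from the frontmatter `date` field; `retention_action` is a hypothetical name.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)


def retention_action(artifact_date, status, today):
    """Return 'keep' or 'archive' for a research artifact."""
    if status == "approved":
        return "keep"
    # Assumption: age is measured from the frontmatter `date` field.
    if today - artifact_date > STALE_AFTER:
        return "archive"  # move to archive/; research artifacts are never deleted
    return "keep"
```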

---

## 7. Audit trail

All tool invocations that write to durable storage **must** log to `~/.timmy/logs/username-osint.log` with:
```
YYYY-MM-DD HH:MM:SS | tool=<tool> | usernames=<count> | platforms=<list> | output=<path> | reviewer=<name or "unreviewed">
```

This enables traceability from any stored JSON back to the exact run.

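A minimal sketch of writing and reading back one log line in this format, assuming Python. The function names are hypothetical, and joining platforms with commas is an assumption, since the policy only says `<list>`.

```python
from datetime import datetime


def audit_line(tool, usernames, platforms, output, reviewer="", when=None):
    """Format one audit entry; `usernames` is logged as a count only."""
    when = when or datetime.now()
    return (
        f"{when:%Y-%m-%d %H:%M:%S} | tool={tool} | usernames={len(usernames)} | "
        f"platforms={','.join(platforms)} | output={output} | "
        f"reviewer={reviewer or 'unreviewed'}"
    )


def parse_audit_line(line):
    """Split an audit entry back into a dict of string fields plus a timestamp."""
    stamp, *fields = line.split(" | ")
    record = dict(field.split("=", 1) for field in fields)
    record["timestamp"] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return record
```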

---

## 8. Exceptions

Requests for exception to this policy require:
1. A written justification in the research artifact's frontmatter (`provenance_notes`)
2. Human reviewer sign-off in the `reviewer` field
3. Explicit `status: approved` designation

No exceptions are granted for autonomous or unattended runs.
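
The three requirements, plus the unattended-run exclusion, can be checked mechanically once the frontmatter has been parsed into a dictionary. A minimal sketch, assuming Python; `exception_is_valid` and the `unattended` flag are hypothetical, and how the frontmatter gets parsed is left out.

```python
def exception_is_valid(frontmatter, unattended=False):
    """Check an exception request against the three requirements above."""
    if unattended:
        # No exceptions for autonomous or unattended runs.
        return False
    notes = str(frontmatter.get("provenance_notes") or "").strip()
    reviewer = str(frontmatter.get("reviewer") or "").strip()
    return bool(notes) and bool(reviewer) and frontmatter.get("status") == "approved"
```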