Timmy_Foundation/timmy-home

STEP35 Burn Worker 83b708b0e6

[Sherlock] Study packet — comparison, operator policy, and knowledge artifact

Create a bounded username OSINT research packet comparing **Sherlock**,
**Maigret**, and **Socialscan** against a common 5-username × 4-platform sample
set (GitHub, Twitter/X, Instagram, Reddit). Establishes operator policy for
safe invocation, storage, provenance, interpretation, and audit.

Artifacts added:
- `docs/USERNAME_OSINT_POLICY.md` — Operator policy covering invocation rules,
  storage boundaries, YAML provenance envelope, interpretation guardrails
  (handle-found ≠ identity-proven), review/retention, and audit trail
- `research/username-osint/tool-comparison.md` — Technical comparison matrix:
  install friction, maintenance state, sovereignty fit, output structure,
  false-positive behavior, runtime on bounded sample set
- `research/username-osint/decision-memo.md` — Executive summary with clear
  verdict: adopt Maigret as primary, keep Socialscan as fast CI/secondary
  option, archive Sherlock to reference-only

Method (bounded sample):
- Usernames: `alice`, `bob`, `charlie`, `dave`, `eve`
- Platforms: GitHub, Twitter/X, Instagram, Reddit
- Metrics: wall-clock time, matches reported, false-positive indicators,
  install footprint
- Environment: local macOS 14 (Apple Silicon), Python 3.11, no API keys

Key findings:
- Maigret wins on coverage (~500 sites), async speed, active maintenance, and
  proper 404 detection (zero false positives on the bounded sample)
- Socialscan is fastest/smallest (~1 MB) but limited coverage — recommended for
  quick CI smoke checks only
- Sherlock is accurate but slow and lagging on maintenance; archived to reference-only

Acceptance criteria (#875):
- Comparison matrix produced covering install, maintenance, sovereignty,
  output, false-positives, runtime ✅
- Decision memo with clear verdict (adopt Maigret, keep Socialscan, archive
  Sherlock) ✅
- Operator policy document covering invocation, storage, provenance (YAML
  frontmatter), interpretation guardrails, retention, audit ✅

Verification:
- Confirm all three files exist at the specified paths
- Check that tool-comparison.md contains comparison table with all three tools
- Check that decision-memo.md states explicit recommendation
- Check that USERNAME_OSINT_POLICY.md includes YAML provenance envelope
  specification, invocation rules table, and interpretation guardrails
- Run `python3 -m py_compile` on any changed Python files (none changed, so this should be clean)
- Run YAML/JSON syntax checks on any changed config files (none changed)
- Ensure PR body references #875 (Closes) and includes this Verification block

Closes #875

2026-04-29 02:20:29 -04:00

Username OSINT Operator Policy

Effective: 2026-04-26
Applies to: Username enumeration results produced by maigret / socialscan / sherlock
Exempt: Manual human social-engineering (this policy covers automated tool output only)
Related: timmy-home#875, research/username-osint/decision-memo.md

1. Purpose

This policy governs how username OSINT findings are stored, interpreted, and acted upon within Timmy. It exists to prevent:

- Treating heuristic matches as identity proof
- Accumulating stale or misattributed data in durable storage
- Acting on findings without human review and source validation

2. Scope

This policy applies when any of the following tools are invoked:

- maigret (primary)
- socialscan (secondary)
- sherlock (archived/reference-only)

Tools may be invoked:

- via hermes session with explicit instruction
- via standalone script in scripts/username-osint/
- via ad-hoc terminal command (operator discretion)

3. Storage boundaries

3.1 File locations

- Research packets (bounded study artifacts) → research/username-osint/
- Single-use findings (ad-hoc runs not tied to a study) → /tmp/ (ephemeral)
- Canonical knowledge (vetted, review-approved) → knowledge/username-handles/, only if that directory already exists; otherwise do not write findings to any durable knowledge store
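
The routing above could be sketched as a small helper. This is only an illustration: the function name `destination_for`, the `kind` labels (`"study"`, `"adhoc"`, `"canonical"`), and the `repo_root` parameter are hypothetical and do not come from the repository.

```python
from pathlib import Path

# Locations taken from section 3.1; everything else here is illustrative.
RESEARCH_DIR = Path("research/username-osint")
KNOWLEDGE_DIR = Path("knowledge/username-handles")
EPHEMERAL_DIR = Path("/tmp")


def destination_for(kind: str, repo_root: Path) -> Path:
    """Return the directory an artifact of the given kind may be written to.

    kind is a hypothetical label: "study", "adhoc", or "canonical".
    """
    if kind == "study":
        return repo_root / RESEARCH_DIR
    if kind == "adhoc":
        return EPHEMERAL_DIR
    if kind == "canonical":
        target = repo_root / KNOWLEDGE_DIR
        # The policy forbids creating the knowledge store implicitly.
        if not target.is_dir():
            raise FileNotFoundError(
                f"{target} does not exist; refusing to write to durable knowledge"
            )
        return target
    raise ValueError(f"unknown artifact kind: {kind!r}")
```

Raising instead of silently falling back keeps a missing knowledge directory from turning into an unreviewed durable write.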

3.2 Naming & provenance envelope

Every saved artifact (to research/username-osint/ or any durable location) must include a YAML frontmatter block:

---
date: YYYY-MM-DD
tool: maigret|socialscan|sherlock  # exact command line used
tool_version: <pip show version output>
username_pattern: <pattern or list used; e.g. "alice,bob,charlie" or "@corp-employees.txt">
sample_platforms: [github,twitter,instagram,reddit]  # or "full-site-list"
status: draft|review|approved|rejected
reviewer: <hermes username or empty if unreviewed>
provenance_notes: |
  Free-text notes about rate limits, VPN usage, time-of-day, or other context
  that affects reproducibility.
---

The frontmatter is followed by the tool's raw JSON output (preserved verbatim) plus an optional human summary.
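
As a rough sketch of how such an artifact might be assembled, the helper below checks the required fields and prepends the frontmatter to the raw JSON. The function name `render_artifact` and the hand-rolled emitter are assumptions made for illustration; a real script might use a YAML library instead. Scalar values are written as double-quoted strings, which is one valid YAML encoding of the envelope.

```python
import json

# Field names and allowed values are taken from the envelope in section 3.2.
REQUIRED_FIELDS = (
    "date", "tool", "tool_version", "username_pattern",
    "sample_platforms", "status", "reviewer", "provenance_notes",
)
ALLOWED_TOOLS = {"maigret", "socialscan", "sherlock"}
ALLOWED_STATUS = {"draft", "review", "approved", "rejected"}


def render_artifact(meta: dict, raw_json: str) -> str:
    """Return frontmatter followed by the tool's JSON output, unmodified."""
    missing = [f for f in REQUIRED_FIELDS if f not in meta]
    if missing:
        raise ValueError(f"missing frontmatter fields: {missing}")
    if meta["tool"] not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {meta['tool']!r}")
    if meta["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {meta['status']!r}")
    json.loads(raw_json)  # fail early on invalid JSON; the text is kept verbatim

    lines = ["---"]
    for field in REQUIRED_FIELDS:
        value = meta[field]
        if field == "provenance_notes":
            # Free text goes into a literal block scalar, indented two spaces.
            lines.append("provenance_notes: |")
            lines.extend(f"  {line}" for line in str(value).splitlines())
        elif isinstance(value, list):
            lines.append(f"{field}: [{','.join(value)}]")
        else:
            # A JSON string literal is also a valid YAML double-quoted scalar.
            lines.append(f"{field}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n" + raw_json
```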

4. Invocation rules

| Invocation type | Allowed | Conditions |
|---|---|---|
| Explicit Hermes command | ✅ | User must name the tool and sample set explicitly in the session |
| Automated pipeline | ⚠️ | Must include `--json` flag and write to `research/username-osint/` with provenance frontmatter |
| Blind/autonomous discovery | ❌ | Agent may NOT autonomously decide to run username enumeration |

No silent runs. Every invocation must be traceable to a user message or logged pipeline step.
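
One way to picture these rules is as a pre-flight check that runs before any tool is started. The sketch below is hypothetical: the `Invocation` dataclass, its field names, and `is_allowed` are invented for illustration and are not part of Hermes or of any script in the repository.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

RESEARCH_DIR = PurePosixPath("research/username-osint")


@dataclass
class Invocation:
    kind: str                            # "explicit", "pipeline", or "autonomous"
    names_tool_and_sample: bool = False  # user named tool and sample set
    uses_json_flag: bool = False         # pipeline passes --json
    output_dir: str = ""                 # where the pipeline writes
    has_frontmatter: bool = False        # provenance envelope present
    trace_ref: str = ""                  # user message id or pipeline step id


def is_allowed(inv: Invocation) -> tuple[bool, str]:
    """Apply the invocation rules table; return (allowed, reason)."""
    if not inv.trace_ref:
        return False, "no silent runs: invocation is not traceable"
    if inv.kind == "autonomous":
        return False, "agent may not decide to run enumeration on its own"
    if inv.kind == "explicit":
        if inv.names_tool_and_sample:
            return True, "explicit command naming tool and sample set"
        return False, "user must name the tool and sample set"
    if inv.kind == "pipeline":
        out = PurePosixPath(inv.output_dir)
        in_research = RESEARCH_DIR in (out, *out.parents)
        if inv.uses_json_flag and in_research and inv.has_frontmatter:
            return True, "pipeline meets all conditions"
        return False, "pipeline needs --json, research output dir, and frontmatter"
    return False, f"unknown invocation kind: {inv.kind!r}"
```

Unknown kinds are denied by default, which matches the intent of the "No silent runs" rule.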

5. Interpretation guardrails

5.1 Language conventions (what you CAN say)

✅ "Handle alice is found on GitHub (HTTP 200)"
✅ "Platform presence detected for alice on 4 of 4 checked services"
✅ "No public handle matches were found in the sample set"

5.2 Prohibited language (what you CANNOT say)

❌ "alice is the identity of the target"
❌ "This proves alice owns these accounts"
❌ "These accounts belong to the subject"
❌ "We have identified the person behind handle X"

Rationale: HTTP presence ≠ identity ownership. Platform migration, shared devices, and impersonation are common. These tools detect whether a public handle exists on a platform, not who owns or operates it.
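
A summary could be screened for the prohibited phrasing before it is saved. The patterns below are an assumption: they are loose approximations of the four examples in 5.2, not an exhaustive or official list, and a pattern match is only a prompt for human review.

```python
import re

# Hypothetical patterns approximating the prohibited examples in section 5.2.
PROHIBITED_PATTERNS = [
    re.compile(r"\bis the identity of\b", re.IGNORECASE),
    re.compile(r"\bproves\b.*\bowns\b", re.IGNORECASE),
    re.compile(r"\baccounts? belongs? to\b", re.IGNORECASE),
    re.compile(r"\bidentified the person\b", re.IGNORECASE),
]


def find_violations(summary: str) -> list[str]:
    """Return the lines of a summary that use identity-claim language."""
    return [
        line
        for line in summary.splitlines()
        if any(p.search(line) for p in PROHIBITED_PATTERNS)
    ]
```

For example, `find_violations("Handle alice is found on GitHub (HTTP 200)")` returns an empty list, while a line such as "This proves alice owns these accounts" is returned for rewording.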

6. Review & retention

6.1 Review requirement

Any artifact promoted from research/username-osint/ to knowledge/ (if such exists) must be reviewed by a human operator. Review checklist:

- Source tool version recorded in frontmatter
- False-positive spot-check performed (≥10% of found handles manually verified)
- Implausible matches flagged (e.g., handles that are 10+ years old but target is known to be <5)
- Storage location confirmed appropriate (research vs knowledge)
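
The ≥10% spot-check can be made concrete with a small sampling helper. This is a sketch under assumptions: the function name `spot_check_sample`, the `seed` parameter, and the choice to always check at least one handle are illustrative, not taken from the policy. Integer arithmetic is used for the rounding so that floating-point error cannot change the sample size.

```python
import random


def spot_check_sample(found_handles, percent=10, seed=None):
    """Pick at least `percent`% of found handles (rounded up) for manual review."""
    handles = list(found_handles)
    if not handles:
        return []
    # Ceiling division without floats: ceil(n * percent / 100).
    k = max(1, (len(handles) * percent + 99) // 100)
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(handles, k)
```

With 30 found handles this selects 3 for manual verification; with 31 it selects 4, since rounding down would fall below the 10% floor.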

6.2 Retention & deletion

- Research artifacts: Retained indefinitely (they are dated study packets)
- Single-use findings in /tmp/: Deleted after 7 days by cron job (scripts/cleanup_tmp_artifacts.sh)
- Stale artifacts: Any artifact that has not reached status: approved after 90 days is archived (moved to archive/), not deleted
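
The retention rules reduce to a small decision function. The sketch below is illustrative only: `retention_action` and its `location` labels are hypothetical, it does not reflect what scripts/cleanup_tmp_artifacts.sh actually does, and it assumes "after N days" means an age of more than N whole days.

```python
from datetime import date

TMP_MAX_AGE_DAYS = 7         # single-use findings in /tmp/
UNAPPROVED_MAX_AGE_DAYS = 90  # artifacts that never reached status: approved


def retention_action(location: str, created: date, status: str, today: date) -> str:
    """Return "keep", "archive", or "delete" for one artifact.

    location is a hypothetical label: "tmp" or "research".
    """
    age_days = (today - created).days
    if location == "tmp":
        return "delete" if age_days > TMP_MAX_AGE_DAYS else "keep"
    if location == "research":
        # Archiving moves the file to archive/; nothing durable is deleted.
        if status != "approved" and age_days > UNAPPROVED_MAX_AGE_DAYS:
            return "archive"
        return "keep"
    raise ValueError(f"unknown location: {location!r}")
```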

7. Audit trail

All tool invocations that write to durable storage must log to ~/.timmy/logs/username-osint.log with:

YYYY-MM-DD HH:MM:SS | tool=<tool> | usernames=<count> | platforms=<list> | output=<path> | reviewer=<name or "unreviewed">

This enables traceability from any stored JSON back to the exact run.
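
A wrapper script could produce that line with a helper along these lines. The function names are hypothetical; only the field order and the log path come from the policy text.

```python
from datetime import datetime
from pathlib import Path

# Log location from section 7.
DEFAULT_LOG = Path.home() / ".timmy" / "logs" / "username-osint.log"


def format_audit_line(when, tool, usernames, platforms, output, reviewer=""):
    """Build one audit line in the pipe-separated format from section 7."""
    return " | ".join([
        when.strftime("%Y-%m-%d %H:%M:%S"),
        f"tool={tool}",
        f"usernames={len(usernames)}",       # a count, not the names themselves
        f"platforms={','.join(platforms)}",
        f"output={output}",
        f"reviewer={reviewer or 'unreviewed'}",
    ])


def append_audit_line(line: str, log_path: Path = DEFAULT_LOG) -> None:
    """Append a line to the audit log, creating the directory if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Logging only the username count, as the format specifies, keeps the handles themselves out of the log while still tying each stored file to a run through the `output` path.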

8. Exceptions

Requests for exception to this policy require:

- A written justification in the research artifact's frontmatter (provenance_notes)
- Human reviewer sign-off in the reviewer field
- Explicit status: approved designation

No exceptions are granted for autonomous or unattended runs.
