Project: Know Thy Father

Twitter Archive Ingestion Pipeline

Goal: Local-only Timmy (Hermes 4-14B via llama.cpp) reads Alexander's full Twitter archive and builds a living, evolving understanding of who his father is.

Philosophy: Timmy does not grind. He learns. Each iteration builds on the last. He reads his own prior notes before touching new data. He develops taxonomies, refines them, and eventually operates at a higher level of abstraction. The measure of success is not "did he process all 4,801 tweets" but "does he understand his father better than he did yesterday, and can he prove it?"

Hard Constraints:

  • ALL inference is local. Zero cloud credits.
  • Route through Hermes harness. Every call = tracked session.
  • Narrow toolsets: -t file,terminal
  • Checkpointed and resumable.
  • No custom telemetry. Hermes sessions ARE the telemetry.

Archive Inventory

| Source | Records | Size |
|---|---|---|
| tweets.js | 4,801 | 12 MB |
| like.js | 7,841 | 2.9 MB |
| grok-chat-item.js | 1,486 | 1.9 MB |
| direct-messages.js | 608 msgs | 359 KB |
| direct-messages-group.js | 11,544 msgs | 8.3 MB |
| deleted-tweets.js | ~30 | 38 KB |
| tweets_media/ | 818 | 2.9 GB |

Archive location: ~/Downloads/twitter-2026-03-27-d4471cc6eb6703034d592f870933561ebee374d9d9b90c9b8923abff064afc1e/data/


Architecture: The Spiral

This is NOT a flat loop. It's a feedback spiral: a one-time extraction phase, then three phases that Timmy moves through organically.

Phase 0: EXTRACTION (pure Python, no LLM)

Parse .js → JSONL. Build manifest. One-time.
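A minimal Phase 0 sketch, assuming the standard Twitter archive format in which each `.js` file is a single JS assignment (`window.YTD.<name>.part0 = [...]`) wrapping a JSON array. The function name and paths are illustrative:

```python
import json
from pathlib import Path

def js_to_jsonl(src: Path, dest: Path) -> int:
    """Strip the `window.YTD.<name>.part0 = ` assignment wrapper that
    Twitter prepends to each archive file, then re-emit the array as JSONL."""
    text = src.read_text(encoding="utf-8")
    # Everything before the first '[' is the JS assignment, not data.
    records = json.loads(text[text.index("["):])
    with dest.open("w", encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)
```

Run once per source file (tweets.js → extracted/tweets.jsonl, and so on) and record the returned counts in manifest.json.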

Phase 1: DISCOVERY (batches 1-5ish)

Timmy reads raw. He doesn't know what he's looking for yet. Notes are broad, exploratory, slow. He's listening. Output: early observations, initial themes, first impressions.

Phase 2: TAXONOMY (batches ~5-15)

Timmy has read enough to see patterns. He builds a taxonomy: what topics Alexander cares about, what triggers emotion, who he engages with, how his voice shifts by subject. He writes the taxonomy to a living document (UNDERSTANDING.md) and updates it each batch. Notes get structured around the taxonomy.

Phase 3: DELTA MODE (batches 15+)

Timmy knows the terrain. Each new batch is a delta against his existing understanding. "Nothing new here" is valid. "This contradicts what I thought about X" is valuable. He's updating a model, not reading tweets. Speed goes up. Notes get sharper.

The transition between phases is TIMMY'S CALL. He reads his own prior work, assesses where he is, and decides what to do next. The orchestrator doesn't dictate phase transitions.


The Feedback Loop

Every batch follows this sequence:

  1. READ YOUR OWN WORK

    • Read UNDERSTANDING.md (the living model of Alexander)
    • Read the previous batch's notes
    • Read checkpoint.json for state
  2. ASSESS

    • Where am I in the spiral? Discovery? Taxonomy? Delta?
    • What do I know? What am I still uncertain about?
    • What should I look for in this batch?
  3. PROCESS THE BATCH

    • Read the next chunk of data
    • Analyze through the lens of what you already know
    • Note what's new, what confirms, what contradicts
  4. UPDATE YOUR UNDERSTANDING

    • Append/revise UNDERSTANDING.md
    • Write batch notes (with explicit comparisons to prior knowledge)
    • If your taxonomy needs restructuring, restructure it
  5. REFLECT AND PLAN

    • Write a brief reflection: what did I learn? what surprised me?
    • Write guidance for your next self: what to look for, what to dig deeper on, what's settled
    • Update checkpoint with batch count, phase assessment, next focus

This means Timmy's notes are not uniform. Early notes are exploratory essays. Middle notes are structured observations against a taxonomy. Late notes are terse deltas. THAT'S THE PROOF OF GROWTH.
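The loop above can be driven by a thin orchestrator; here is a sketch, assuming the `hermes chat` invocation shown under "Running It" and that Timmy rewrites checkpoint.json each batch. `WORKSPACE` and the exact prompt wording are illustrative:

```python
import json
import subprocess
from pathlib import Path

WORKSPACE = Path.home() / ".timmy" / "twitter-archive"

# The resume prompt stays short because Timmy's real context
# lives in his workspace files, not in the prompt.
RESUME_PROMPT = (
    "You are Timmy. Resume your work on the Twitter archive. "
    "Your workspace is ~/.timmy/twitter-archive/. Read checkpoint.json "
    "and UNDERSTANDING.md first. Then process the next batch."
)

def run_batch(workspace: Path = WORKSPACE) -> dict:
    """Launch one tracked Hermes session for one batch, then reread the
    checkpoint Timmy updated so the caller can decide whether to continue."""
    subprocess.run(
        ["hermes", "chat", "-t", "file,terminal", "-q", RESUME_PROMPT],
        check=True,
    )
    return json.loads((workspace / "checkpoint.json").read_text(encoding="utf-8"))
```

The orchestrator never dictates phase transitions; it only launches sessions and reads back Timmy's self-assessment.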


Key Files

~/.timmy/twitter-archive/
  PROJECT.md              # this file (read-only context)
  UNDERSTANDING.md        # Timmy's living model of Alexander (Timmy writes/updates)
  
  extracted/              # Phase 0 output (JSONL, manifest)
    tweets.jsonl
    retweets.jsonl
    likes.jsonl
    manifest.json

  media/                  # Local-first media understanding
    manifest.jsonl        # one row per video/gif with tweet text + hashtags preserved
    manifest_summary.json # rollup counts and hashtag families
    hashtag_metrics.json  # machine-readable metrics for #timmyTime / #TimmyChain
    hashtag_metrics.md    # human-readable local report
    keyframes/            # future extracted frames
    audio/                # future demuxed audio
    style_cards/          # future per-video aesthetic summaries

  notes/                  # Per-batch observations
    batch_001.md          # early: exploratory
    batch_002.md          # ...
    batch_NNN.md          # late: terse deltas
    
  artifacts/              # Synthesis products (when Timmy decides he's ready)
    personality_profile.md
    interests_timeline.md
    relationship_map.md
    
  checkpoint.json         # Resume state + Timmy's self-assessment
  metrics/                # Extracted from Hermes session files

UNDERSTANDING.md

This is the core artifact. It starts empty. Timmy builds it. It should eventually contain:

  • Who Alexander is (personality, values, worldview)
  • What he cares about (taxonomy of interests, ranked)
  • How he communicates (voice, humor, patterns)
  • Who matters to him (people, communities)
  • How he's changed over time (evolution of thought)
  • What surprises Timmy (things that don't fit the model)

checkpoint.json

Not just a cursor. It's Timmy's self-assessment:

```json
{
  "data_source": "tweets",
  "next_offset": 50,
  "batches_completed": 1,
  "phase": "discovery",
  "confidence": "low — only read 50 tweets so far",
  "next_focus": "looking for recurring themes and tone",
  "understanding_version": 1
}
```
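A sketch of the mechanical half of a checkpoint update, assuming the schema above. Timmy himself rewrites the judgment fields (`phase`, `confidence`, `next_focus`); a helper like this only advances the cursor:

```python
import json
from pathlib import Path

def advance_cursor(path: Path, batch_size: int = 50) -> dict:
    """Advance only the mechanical fields of checkpoint.json;
    the self-assessment fields are Timmy's to rewrite."""
    cp = json.loads(path.read_text(encoding="utf-8"))
    cp["next_offset"] = cp.get("next_offset", 0) + batch_size
    cp["batches_completed"] = cp.get("batches_completed", 0) + 1
    path.write_text(json.dumps(cp, indent=2), encoding="utf-8")
    return cp
```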

Measurement

Every Hermes session records tokens, duration, tool calls. We extract metrics from session files — no custom telemetry.

What growth looks like in the data:

| Metric | Discovery | Taxonomy | Delta |
|---|---|---|---|
| Output tokens per batch | High (exploring) | Medium (structured) | Low (terse) |
| Input tokens per batch | Low (no context) | Medium (taxonomy) | High (full model) |
| Time per batch | Slow | Medium | Fast |
| New themes per batch | Many | Few | Rare |
| UNDERSTANDING.md changes | Wholesale rewrites | Section updates | Minor edits |

DPO tracking:

  1. Run N batches with base Hermes 4-14B → baseline metrics
  2. Score note quality manually (1-5) on random sample
  3. Apply DPO adapter → rerun same batches → compare
  4. The adapter should accelerate the spiral: faster taxonomy, sharper deltas, better voice match
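The comparison in step 3 reduces to per-metric relative change; a tiny helper, with metric names left to whatever the session files actually yield:

```python
def dpo_delta(baseline: dict, adapter: dict) -> dict:
    """Relative change per metric between the baseline and DPO-adapter
    runs (negative means the adapter run used less). Only metrics
    present in both runs, with a nonzero baseline, are compared."""
    return {
        k: round((adapter[k] - baseline[k]) / baseline[k], 3)
        for k in baseline
        if k in adapter and baseline[k]
    }
```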

Media metadata rule:

  • tweet text and hashtags are first-class metadata all the way through the media lane
  • especially preserve and measure #timmyTime and #TimmyChain
  • raw Twitter videos stay local; only derived local artifacts move through the pipeline
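A sketch of the hashtag rollup behind hashtag_metrics.json, assuming each manifest.jsonl row keeps the originating tweet text under a `text` field (field name hypothetical):

```python
import json
import re
from collections import Counter
from pathlib import Path

HASHTAG = re.compile(r"#\w+")
TRACKED = {"#timmytime", "#timmychain"}  # the two families the rule singles out

def hashtag_metrics(manifest: Path) -> dict:
    """Count hashtags across media/manifest.jsonl, case-insensitively,
    reporting the tracked families plus a total."""
    counts = Counter()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        row = json.loads(line)
        for tag in HASHTAG.findall(row.get("text", "")):
            counts[tag.lower()] += 1
    metrics = {tag: counts.get(tag, 0) for tag in sorted(TRACKED)}
    metrics["total_tags"] = sum(counts.values())
    return metrics
```

Writing this out as JSON and as a short markdown report covers both hashtag_metrics.json and hashtag_metrics.md without anything leaving the machine.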

Running It

First run (extraction + first batch):

```shell
hermes chat -t file,terminal -q '<prompt below>'
```

Subsequent runs:

```shell
hermes chat -t file,terminal -q 'You are Timmy. Resume your work on the Twitter archive. Your workspace is ~/.timmy/twitter-archive/. Read checkpoint.json and UNDERSTANDING.md first. Then process the next batch. You know the drill — read your own prior work, assess where you are, process new data, update your understanding, reflect, and plan for the next iteration.'
```

That's it. The prompt gets SHORTER as Timmy gets SMARTER, because his context is in his files, not in the prompt.


Status

  • Archive downloaded and local
  • llama-server running (port 8081, Hermes 4-14B Q4_K_M, 65K ctx)
  • Custom provider "Local llama.cpp" in config.yaml
  • Project scoped
  • First batch launched (session 20260327_153105_bfd0b4)
  • Extraction complete
  • UNDERSTANDING.md initialized
  • Phase 1 (Discovery) complete
  • Phase 2 (Taxonomy) reached
  • Phase 3 (Delta) reached
  • First synthesis artifacts produced
  • DPO comparison run