# Project: Know Thy Father
## Twitter Archive Ingestion Pipeline

**Goal:** Local-only Timmy (Hermes 4-14B via llama.cpp) reads Alexander's full
Twitter archive and builds a living, evolving understanding of who his father is.

**Philosophy:** Timmy does not grind. He learns. Each iteration builds on the
last. He reads his own prior notes before touching new data. He develops
taxonomies, refines them, and eventually operates at a higher level of
abstraction. The measure of success is not "did he process all 4,801 tweets"
but "does he understand his father better than he did yesterday, and can he
prove it?"

**Hard Constraints:**
- ALL inference is local. Zero cloud credits.
- Route through Hermes harness. Every call = tracked session.
- Narrow toolsets: `-t file,terminal`
- Checkpointed and resumable.
- No custom telemetry. Hermes sessions ARE the telemetry.

---

## Archive Inventory

| Source                   | Records     | Size  |
|--------------------------|-------------|-------|
| tweets.js                | 4,801       | 12MB  |
| like.js                  | 7,841       | 2.9MB |
| grok-chat-item.js        | 1,486       | 1.9MB |
| direct-messages.js       | 608 msgs    | 359KB |
| direct-messages-group.js | 11,544 msgs | 8.3MB |
| deleted-tweets.js        | ~30         | 38KB  |
| tweets_media/            | 818         | 2.9GB |

Archive location:
`~/Downloads/twitter-2026-03-27-d4471cc6eb6703034d592f870933561ebee374d9d9b90c9b8923abff064afc1e/data/`

---

## Architecture: The Spiral

This is NOT a flat loop. It's a feedback spiral with three phases
that Timmy moves through organically.

### Phase 0: EXTRACTION (pure Python, no LLM)
Parse .js → JSONL. Build manifest. One-time.

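The extraction step can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual extractor: it assumes each archive `.js` file is a single JavaScript assignment whose right-hand side is one JSON array (a `window.YTD...` prefix followed by `[ ... ]`), which should be checked against the real files. The function names `parse_archive_js` and `write_jsonl` are hypothetical.

```python
import json
from pathlib import Path


def parse_archive_js(text: str) -> list:
    """Drop the JavaScript assignment prefix and parse the JSON array after it."""
    # Assumption: the payload is one top-level JSON array and the prefix has no "[".
    start = text.index("[")
    return json.loads(text[start:])


def write_jsonl(records: list, out_path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

The count returned by `write_jsonl` is the kind of value a manifest could record per source, so later batches can tell how much data remains.
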
### Phase 1: DISCOVERY (batches 1-5ish)
Timmy reads raw. He doesn't know what he's looking for yet.
Notes are broad, exploratory, slow. He's listening.
Output: early observations, initial themes, first impressions.

### Phase 2: TAXONOMY (batches ~5-15)
Timmy has read enough to see patterns. He builds a taxonomy:
what topics Alexander cares about, what triggers emotion, who
he engages with, how his voice shifts by subject. He writes
the taxonomy to a living document (UNDERSTANDING.md) and
updates it each batch. Notes get structured around the taxonomy.

### Phase 3: DELTA MODE (batches 15+)
Timmy knows the terrain. Each new batch is a delta against his
existing understanding. "Nothing new here" is valid. "This
contradicts what I thought about X" is valuable. He's updating
a model, not reading tweets. Speed goes up. Notes get sharper.

The transition between phases is TIMMY'S CALL. He reads his own
prior work, assesses where he is, and decides what to do next.
The orchestrator doesn't dictate phase transitions.

---

## The Feedback Loop

Every batch follows this sequence:

1. READ YOUR OWN WORK
   - Read UNDERSTANDING.md (the living model of Alexander)
   - Read the previous batch's notes
   - Read checkpoint.json for state

2. ASSESS
   - Where am I in the spiral? Discovery? Taxonomy? Delta?
   - What do I know? What am I still uncertain about?
   - What should I look for in this batch?

3. PROCESS THE BATCH
   - Read the next chunk of data
   - Analyze through the lens of what you already know
   - Note what's new, what confirms, what contradicts

4. UPDATE YOUR UNDERSTANDING
   - Append/revise UNDERSTANDING.md
   - Write batch notes (with explicit comparisons to prior knowledge)
   - If your taxonomy needs restructuring, restructure it

5. REFLECT AND PLAN
   - Write a brief reflection: what did I learn? what surprised me?
   - Write guidance for your next self: what to look for, what to
     dig deeper on, what's settled
   - Update checkpoint with batch count, phase assessment, next focus

This means Timmy's notes are not uniform. Early notes are exploratory
essays. Middle notes are structured observations against a taxonomy.
Late notes are terse deltas. THAT'S THE PROOF OF GROWTH.

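The "read the next chunk of data" step in the sequence above could look like the sketch below. It assumes the Phase 0 JSONL has one record per line with no blank lines, and a batch size of 50 (inferred from the checkpoint example, not stated as a rule). `read_batch` is a hypothetical helper name.

```python
import json
from itertools import islice
from pathlib import Path


def read_batch(jsonl_path: Path, offset: int, batch_size: int = 50) -> tuple:
    """Return (records, next_offset) for the slice starting at line index `offset`."""
    with jsonl_path.open(encoding="utf-8") as fh:
        # islice skips the first `offset` lines, then yields at most batch_size lines.
        batch = [json.loads(line) for line in islice(fh, offset, offset + batch_size)]
    # Advance by what was actually read, so the final short batch is handled.
    return batch, offset + len(batch)
```

An empty list with an unchanged offset signals that the source is exhausted, which is one possible cue for moving on to the next data source.
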
---

## Key Files

```
~/.timmy/twitter-archive/
  PROJECT.md              # this file (read-only context)
  UNDERSTANDING.md        # Timmy's living model of Alexander (Timmy writes/updates)

  extracted/              # Phase 0 output (JSONL, manifest)
    tweets.jsonl
    retweets.jsonl
    likes.jsonl
    manifest.json

  media/                  # Local-first media understanding
    manifest.jsonl        # one row per video/gif with tweet text + hashtags preserved
    manifest_summary.json # rollup counts and hashtag families
    hashtag_metrics.json  # machine-readable metrics for #timmyTime / #TimmyChain
    hashtag_metrics.md    # human-readable local report
    keyframes/            # future extracted frames
    audio/                # future demuxed audio
    style_cards/          # future per-video aesthetic summaries

  notes/                  # Per-batch observations
    batch_001.md          # early: exploratory
    batch_002.md          # ...
    batch_NNN.md          # late: terse deltas

  artifacts/              # Synthesis products (when Timmy decides he's ready)
    personality_profile.md
    interests_timeline.md
    relationship_map.md

  checkpoint.json         # Resume state + Timmy's self-assessment
  metrics/                # Extracted from Hermes session files
```

### UNDERSTANDING.md

This is the core artifact. It starts empty. Timmy builds it.
It should eventually contain:
- Who Alexander is (personality, values, worldview)
- What he cares about (taxonomy of interests, ranked)
- How he communicates (voice, humor, patterns)
- Who matters to him (people, communities)
- How he's changed over time (evolution of thought)
- What surprises Timmy (things that don't fit the model)

### checkpoint.json

Not just a cursor. It's Timmy's self-assessment:
```json
{
  "data_source": "tweets",
  "next_offset": 50,
  "batches_completed": 1,
  "phase": "discovery",
  "confidence": "low, only read 50 tweets so far",
  "next_focus": "looking for recurring themes and tone",
  "understanding_version": 1
}
```

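Because the pipeline must be checkpointed and resumable, the checkpoint is worth writing in a way that a crash cannot leave half-written. The sketch below is one way to do that, assuming the field names from the example above; the default values and the helper names (`load_checkpoint`, `save_checkpoint`, `advance`) are hypothetical, and in practice Timmy may simply edit the file through his file tool.

```python
import json
import os
import tempfile
from pathlib import Path

# Assumed starting state: field names mirror the example, values are placeholders.
DEFAULT_STATE = {
    "data_source": "tweets",
    "next_offset": 0,
    "batches_completed": 0,
    "phase": "discovery",
    "confidence": "none yet",
    "next_focus": "",
    "understanding_version": 0,
}


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh copy of the defaults on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return dict(DEFAULT_STATE)


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then swap it in with os.replace."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, ensure_ascii=False)
    os.replace(tmp_name, path)


def advance(state: dict, records_read: int) -> dict:
    """Return a new state with the cursor and batch count moved forward."""
    return {
        **state,
        "next_offset": state["next_offset"] + records_read,
        "batches_completed": state["batches_completed"] + 1,
    }
```

The self-assessment fields (`phase`, `confidence`, `next_focus`) are deliberately left untouched by `advance`: those are Timmy's call, not the orchestrator's.
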
---

## Measurement

Every Hermes session records tokens, duration, tool calls.
We extract metrics from session files; no custom telemetry.

### What growth looks like in the data:

| Metric                   | Discovery          | Taxonomy            | Delta             |
|--------------------------|--------------------|---------------------|-------------------|
| Output tokens per batch  | High (exploring)   | Medium (structured) | Low (terse)       |
| Input tokens per batch   | Low (no context)   | Medium (taxonomy)   | High (full model) |
| Time per batch           | Slow               | Medium              | Fast              |
| New themes per batch     | Many               | Few                 | Rare              |
| UNDERSTANDING.md changes | Wholesale rewrites | Section updates     | Minor edits       |

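Once per-batch numbers have been pulled out of the session files, the table above can be checked with a simple rollup by phase. This is a sketch under assumptions: the Hermes session file format is not described here, so the record fields (`phase`, `output_tokens`, `input_tokens`) and the function name `summarize_by_phase` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_phase(batches: list) -> dict:
    """Group per-batch records by phase and average their token counts."""
    grouped = defaultdict(list)
    for record in batches:
        grouped[record["phase"]].append(record)
    return {
        phase: {
            "batches": len(rows),
            "avg_output_tokens": mean(r["output_tokens"] for r in rows),
            "avg_input_tokens": mean(r["input_tokens"] for r in rows),
        }
        for phase, rows in grouped.items()
    }
```

If the spiral is working, average output tokens should fall and average input tokens should rise from discovery to delta.
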
### DPO tracking:
1. Run N batches with base Hermes 4-14B → baseline metrics
2. Score note quality manually (1-5) on random sample
3. Apply DPO adapter → rerun same batches → compare
4. The adapter should accelerate the spiral: faster taxonomy,
   sharper deltas, better voice match

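The compare step can be kept honest by scoring only batches that appear in both runs. A minimal sketch, assuming each run is recorded as a mapping from batch id to its manual 1-5 score; the function name `compare_runs` and that data layout are hypothetical.

```python
from statistics import mean


def compare_runs(baseline: dict, adapted: dict) -> dict:
    """Compare manual scores for the batches scored in both runs."""
    shared = sorted(baseline.keys() & adapted.keys())
    if not shared:
        raise ValueError("no batches were scored in both runs")
    deltas = [adapted[b] - baseline[b] for b in shared]
    return {
        "batches_compared": len(shared),
        "baseline_mean": mean(baseline[b] for b in shared),
        "adapted_mean": mean(adapted[b] for b in shared),
        "mean_delta": mean(deltas),
        "improved": sum(d > 0 for d in deltas),
    }
```

A paired comparison like this avoids crediting the adapter for differences that come from sampling different batches.
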
### Media metadata rule:
- tweet posts and hashtags are first-class metadata all the way through the media lane
- especially preserve and measure `#timmyTime` and `#TimmyChain`
- raw Twitter videos stay local; only derived local artifacts move through the pipeline

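Measuring the two tracked hashtags can be done locally with a regular expression over tweet text. The sketch below assumes hashtags should be matched case-insensitively (so `#timmytime` counts toward `#timmyTime`) and that every occurrence counts, including repeats within one tweet; both choices, and the function names, are assumptions rather than project rules.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")
TRACKED = ("timmyTime", "TimmyChain")  # the two hashtags named in the rule above


def hashtag_counts(texts) -> Counter:
    """Count every hashtag occurrence across the texts, lowercased."""
    counts = Counter()
    for text in texts:
        for tag in HASHTAG_RE.findall(text):
            counts[tag.lower()] += 1
    return counts


def tracked_metrics(texts) -> dict:
    """Return occurrence counts for the tracked hashtags only."""
    counts = hashtag_counts(texts)
    return {f"#{tag}": counts[tag.lower()] for tag in TRACKED}
```

The dictionary from `tracked_metrics` is the sort of payload that could be written to `hashtag_metrics.json`.
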
---

## Running It

### First run (extraction + first batch):
```bash
hermes chat -t file,terminal -q '<prompt below>'
```

### Subsequent runs:
```bash
hermes chat -t file,terminal -q 'You are Timmy. Resume your work on the Twitter archive. Your workspace is ~/.timmy/twitter-archive/. Read checkpoint.json and UNDERSTANDING.md first. Then process the next batch. You know the drill — read your own prior work, assess where you are, process new data, update your understanding, reflect, and plan for the next iteration.'
```

That's it. The prompt gets SHORTER as Timmy gets SMARTER, because
his context is in his files, not in the prompt.

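If the runs are ever launched from a script rather than by hand, the command can be built as an argument list so the long prompt needs no shell quoting. This is a sketch only: it uses just the `-t` and `-q` flags shown above, the helper names are hypothetical, and how `hermes` reports failure through its exit code is an assumption.

```python
import subprocess


def build_command(prompt: str, toolsets: str = "file,terminal") -> list:
    """Build the argv for one tracked Hermes session with a narrow toolset."""
    return ["hermes", "chat", "-t", toolsets, "-q", prompt]


def run_batch(prompt: str) -> int:
    """Run one session and return the process exit code."""
    # Passing a list (no shell=True) keeps quotes in the prompt intact.
    return subprocess.run(build_command(prompt), check=False).returncode
```

Each call to `run_batch` is still one Hermes session, so the "every call = tracked session" constraint holds.
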
---

## Status

- [x] Archive downloaded and local
- [x] llama-server running (port 8081, Hermes 4-14B Q4_K_M, 65K ctx)
- [x] Custom provider "Local llama.cpp" in config.yaml
- [x] Project scoped
- [x] First batch launched (session 20260327_153105_bfd0b4)
- [ ] Extraction complete
- [ ] UNDERSTANDING.md initialized
- [ ] Phase 1 (Discovery) complete
- [ ] Phase 2 (Taxonomy) reached
- [ ] Phase 3 (Delta) reached
- [ ] First synthesis artifacts produced
- [ ] DPO comparison run