Build private Twitter archive learning pipeline for Timmy #3

Closed
opened 2026-03-27 22:09:04 +00:00 by codex-agent · 2 comments
Member

Summary

Build a private local-only Twitter archive learning pipeline for Timmy on top of the existing Hermes + know_thy_father flow.

The goal is not just to summarize tweets. The goal is to make Timmy systematically better at:

  • reading information about Alexander
  • extracting grounded knowledge
  • producing actionable insights
  • turning archive work into DPO/adaptation signal
  • improving local models over time without checking raw data into shared repos

Decisions Locked

  • v1 ingests tweets + retweets only
  • primary output is training signal
  • archive-derived artifacts stay local under ~/.timmy/twitter-archive/
  • memory/training generation is mostly automatic
  • model training/promotion is mostly automatic, but only after offline eval gates pass
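
As a rough illustration of the last decision, the gate could reduce to a single predicate over two offline eval reports. This is a minimal sketch, not the merged implementation: `EvalReport`, its field names, and the `min_gain` margin are hypothetical, and the condition itself is taken from the acceptance criteria (archive eval improves, safety does not regress).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalReport:
    """Hypothetical offline eval summary; field names are illustrative only."""
    archive_score: float  # higher is better
    safety_score: float   # higher is better


def should_promote(baseline: EvalReport, candidate: EvalReport,
                   min_gain: float = 0.0) -> bool:
    """Return True only if archive eval improves and safety does not regress."""
    improved = candidate.archive_score > baseline.archive_score + min_gain
    safe = candidate.safety_score >= baseline.safety_score
    return improved and safe
```

A tie on the archive score does not promote in this sketch; whether the real thresholds should be stricter is one of the review questions below.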

Deliverables

  • deterministic archive extractor
  • two-pass batch reader (draft + critique/rewrite)
  • structured knowledge candidate schema
  • consolidated profile.json
  • weekly actionable insight artifact
  • DPO pair builder from archive sessions
  • eval gates for auto-train and auto-promotion
  • docs/spec for repo boundaries and privacy rules
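
One way the two-pass reader and the DPO pair builder might fit together is sketched below. It assumes (this issue does not state it) that the critique/rewrite output is treated as the preferred response and the first draft as the rejected one, and that a pair is dropped when it carries no evidence links. `DpoPair`, `build_dpo_pair`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DpoPair:
    """Hypothetical training record built from one two-pass batch session."""
    prompt: str
    chosen: str               # assumption: the critique/rewrite pass output
    rejected: str             # assumption: the first-pass draft
    evidence_ids: list[str]   # tweet IDs that ground the chosen answer


def build_dpo_pair(prompt: str, draft: str, rewrite: str,
                   evidence_ids: Sequence[str]) -> Optional[DpoPair]:
    """Return a pair, or None when the session carries no usable signal."""
    if not evidence_ids:
        return None  # durable knowledge must be evidence-linked
    if draft.strip() == rewrite.strip():
        return None  # identical passes give no preference signal
    return DpoPair(prompt=prompt, chosen=rewrite, rejected=draft,
                   evidence_ids=list(evidence_ids))
```

Records like this could be serialized with `dataclasses.asdict` and written as JSON lines under the private workspace.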

Acceptance Criteria

  • raw archive never lands in tracked repo content
  • each batch produces notes, structured knowledge, and training examples
  • durable knowledge is evidence-linked
  • DPO pairs are generated automatically from archive work
  • candidate models only auto-promote when archive eval improves and safety specs do not regress
  • interrupted runs resume from checkpoint without duplicating work
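
The resume criterion can be illustrated with a small checkpoint loop. This is a sketch under stated assumptions: the checkpoint is a JSON file listing completed batch IDs, and `run_batches`, the file layout, and the `process` callback are hypothetical rather than taken from the merged scripts.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_batches(batch_ids: Iterable[str],
                process: Callable[[str], None],
                checkpoint_path: Path) -> list[str]:
    """Process each batch at most once, recording progress after every batch."""
    done: set[str] = set()
    if checkpoint_path.exists():
        done = set(json.loads(checkpoint_path.read_text())["done"])

    processed_now: list[str] = []
    for batch_id in batch_ids:
        if batch_id in done:
            continue  # finished in an earlier run; skip to avoid duplicates
        process(batch_id)
        done.add(batch_id)
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a truncated checkpoint behind.
        tmp = checkpoint_path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"done": sorted(done)}))
        tmp.replace(checkpoint_path)
        processed_now.append(batch_id)
    return processed_now
```

If a run is interrupted inside `process`, that batch is not recorded and is retried on the next run, so `process` itself would still need to be safe to repeat for a single batch.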

Implementation Boundary

  • timmy-config: orchestration/tasks/prompts/scheduling
  • timmy-home: scripts/schemas/eval rubrics/spec
  • ~/.timmy/twitter-archive/: private runtime artifacts only
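
To make the boundary concrete, scripts that write runtime artifacts could route every output path through a guard like the one below. It is only a sketch: `private_path` and `ARCHIVE_ROOT` are hypothetical names, and the check covers path containment only, not git tracking status.

```python
from pathlib import Path

# The private workspace named in this issue.
ARCHIVE_ROOT = Path.home() / ".timmy" / "twitter-archive"


def private_path(relative: str, root: Path = ARCHIVE_ROOT) -> Path:
    """Resolve `relative` under the private root, refusing anything outside it."""
    resolved_root = root.resolve()
    target = (resolved_root / relative).resolve()
    # Path.is_relative_to requires Python 3.9+.
    if not target.is_relative_to(resolved_root):
        raise ValueError(f"{target} escapes private workspace {resolved_root}")
    return target
```

An absolute path or a `../` escape raises `ValueError` instead of silently writing into a tracked repo checkout.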

Review Focus

  • privacy boundary
  • evidence requirements
  • whether the auto-promotion thresholds are strict enough
  • whether the two-pass batch design is the right way to create training signal

Assumptions

  • ~/.timmy/twitter-archive/ remains the canonical private workspace because it already exists and already contains extracted tweet artifacts
  • local-only means the private archive workspace lives under ~/.timmy and its derived artifacts remain untracked and unpushed
  • v1 does not ingest likes, DMs, deleted tweets, group DMs, or Grok chat
  • v1 does not depend on a new external training harness; it builds on Hermes session capture and local offline eval
Member

Both PRs merged:

  • timmy-home PR #4: deterministic pipeline scripts, schemas, eval contracts, privacy boundary
  • timmy-config PR #29: two-pass orchestration, periodic tick, eval-gated training/promotion

All five review criteria pass: privacy boundary clean, repo boundaries respected, batch artifacts emit DPO pairs and profile data, training/promotion gated behind explicit eval checks, periodic tick is safe with per-step gating.

Member

Both PRs merged:

  • timmy-home PR #4: deterministic pipeline scripts, schemas, eval contracts, privacy boundary
  • timmy-config PR #29: two-pass orchestration, periodic tick, eval-gated training/promotion

All five review criteria pass: privacy boundary clean, repo boundaries respected, batch artifacts emit DPO pairs and profile data, training/promotion gated behind explicit eval checks, periodic tick is safe with per-step gating.

Reference: Timmy_Foundation/timmy-home#3