timmy-home/specs/twitter-archive-learning-pipeline.md
2026-03-27 18:09:28 -04:00

Twitter Archive Learning Pipeline

This repo owns the tracked code, schemas, prompts, and eval contracts for Timmy's private Twitter archive learning loop.

Privacy Boundary

  • Raw archive files stay outside git.
  • Derived runtime artifacts live under ~/.timmy/twitter-archive/.
  • twitter-archive/ is ignored by timmy-home so private notes and training artifacts do not get pushed by accident.

Tracked here:

  • deterministic extraction and consolidation scripts
  • output schemas
  • eval gate contract
  • prompt/orchestration code in timmy-config

Not tracked here:

  • raw tweets
  • extracted tweet text
  • batch notes
  • private profile artifacts
  • local-only DPO pairs
  • local eval outputs

Runtime Layout

The runtime workspace is:

~/.timmy/twitter-archive/
  extracted/
  notes/
  knowledge/
  insights/
  training/
  checkpoint.json
  metrics/progress.json
  source_config.json
  pipeline_config.json
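
As an illustration of the layout above, a minimal sketch that creates the workspace directories. The helper name `ensure_workspace` and the `SUBDIRS` constant are hypothetical and not part of the tracked code; the directory names are taken from the listing, and the JSON files are assumed to be written later by the pipeline itself.

```python
from pathlib import Path

# Directory names taken from the runtime layout; "metrics" is included
# because progress.json lives under it.
SUBDIRS = ("extracted", "notes", "knowledge", "insights", "training", "metrics")


def ensure_workspace(root: Path) -> Path:
    """Create any missing workspace directories under root and return root.

    Hypothetical helper: the real pipeline may create these lazily instead.
    """
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

A caller would typically pass `Path.home() / ".timmy" / "twitter-archive"` as the root. Because `exist_ok=True` is set, running it twice is harmless.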

Source Config

Optional local file source_config.json:

{
  "source_path": "~/Downloads/twitter-.../data"
}

Environment override:

TIMMY_TWITTER_ARCHIVE_SOURCE=~/Downloads/twitter-.../data
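
One way the lookup could work, sketched under the assumption that the environment variable takes precedence over source_config.json and that `~` is expanded at read time. The function name `resolve_source_path` is hypothetical; only the variable name and the `source_path` key come from this spec.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "TIMMY_TWITTER_ARCHIVE_SOURCE"


def resolve_source_path(
    config_file: Path, env: Mapping[str, str] = os.environ
) -> Optional[Path]:
    """Return the archive source directory, or None if nothing is configured.

    Assumption: a non-empty environment variable overrides the config file.
    """
    raw = env.get(ENV_VAR)
    if not raw and config_file.is_file():
        data = json.loads(config_file.read_text(encoding="utf-8"))
        raw = data.get("source_path")
    # expanduser() turns a leading "~" into the user's home directory.
    return Path(raw).expanduser() if raw else None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing source is an error or simply means there is nothing to extract yet.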

Knowledge Candidate Schema

Each batch candidate file contains:

  • id
  • category
  • claim
  • evidence_tweet_ids
  • evidence_quotes
  • confidence
  • status
  • first_seen_at
  • last_confirmed_at
  • contradicts

The consolidator uses these fields to classify each candidate as durable, provisional, or retracted.
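
A minimal sketch of a structural check on a candidate record. The field names are taken from the list above; treating every field as required, and the helper name `missing_fields`, are assumptions. The spec does not state field types, so none are checked here.

```python
from typing import Any, List, Mapping

# Field names from the Knowledge Candidate Schema list, in the same order.
REQUIRED_FIELDS = (
    "id",
    "category",
    "claim",
    "evidence_tweet_ids",
    "evidence_quotes",
    "confidence",
    "status",
    "first_seen_at",
    "last_confirmed_at",
    "contradicts",
)


def missing_fields(candidate: Mapping[str, Any]) -> List[str]:
    """Return the schema fields absent from a candidate, in schema order.

    Assumption: all ten fields must be present as keys (values may be empty).
    """
    return [name for name in REQUIRED_FIELDS if name not in candidate]
```

An empty result means the record has every key the consolidator is described as reading; it says nothing about whether the values are well formed.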

Candidate Eval Contract

Local eval JSON files under ~/.timmy/twitter-archive/training/evals/ must use:

{
  "candidate_id": "timmy-archive-v0.1",
  "baseline_composite": 0.71,
  "candidate_composite": 0.76,
  "refusal_over_fabrication_regression": false,
  "source_distinction_regression": false,
  "evidence_citation_rate": 0.98,
  "rollback_model": "timmy-archive-v0.0"
}

Promotion gate:

  • candidate composite improves on the baseline composite by at least 5%
  • no refusal-over-fabrication regression
  • no source distinction regression
  • evidence citation rate stays at or above 95% (0.95)
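
The gate can be sketched as a single predicate over an eval file. This reads "at least 5%" as a relative gain over the baseline, which is an assumption (the spec does not say whether the threshold is relative or absolute). The function name `passes_promotion_gate` and the constants are hypothetical.

```python
from typing import Any, Mapping

MIN_RELATIVE_GAIN = 0.05  # assumption: "5%" means relative gain over baseline
MIN_CITATION_RATE = 0.95

# The example eval payload from the contract above.
EXAMPLE_REPORT = {
    "candidate_id": "timmy-archive-v0.1",
    "baseline_composite": 0.71,
    "candidate_composite": 0.76,
    "refusal_over_fabrication_regression": False,
    "source_distinction_regression": False,
    "evidence_citation_rate": 0.98,
    "rollback_model": "timmy-archive-v0.0",
}


def passes_promotion_gate(report: Mapping[str, Any]) -> bool:
    """Return True only if all four promotion conditions hold."""
    baseline = report["baseline_composite"]
    candidate = report["candidate_composite"]
    # A non-positive baseline cannot express a relative gain, so it fails.
    improved = baseline > 0 and candidate >= baseline * (1 + MIN_RELATIVE_GAIN)
    return bool(
        improved
        and not report["refusal_over_fabrication_regression"]
        and not report["source_distinction_regression"]
        and report["evidence_citation_rate"] >= MIN_CITATION_RATE
    )
```

Under this reading the example payload passes: 0.76 is above 0.71 × 1.05 = 0.7455, both regression flags are false, and 0.98 is above 0.95.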

Training Command Contract

Optional local file pipeline_config.json can define:

{
  "train_command": "bash -lc 'echo train me'",
  "promote_command": "bash -lc 'echo promote me'"
}

If these commands are absent, the pipeline still prepares artifacts and run manifests, but training/promotion stays in a ready state instead of executing.
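
A minimal sketch of that fallback, assuming the command strings are split shell-style and run as a subprocess. The helper name `run_stage` and the "executed" and "failed" labels are hypothetical; only the "ready" state and the two config keys come from this spec.

```python
import shlex
import subprocess
from typing import Any, Mapping


def run_stage(config: Mapping[str, Any], key: str) -> str:
    """Run the command stored under key, or report "ready" if it is absent.

    key is expected to be "train_command" or "promote_command".
    """
    command = config.get(key)
    if not command:
        # No command configured: artifacts are prepared, nothing executes.
        return "ready"
    # shlex.split keeps quoted arguments together, e.g.
    # "bash -lc 'echo train me'" -> ["bash", "-lc", "echo train me"].
    result = subprocess.run(shlex.split(command), check=False)
    return "executed" if result.returncode == 0 else "failed"
```

Passing an argument list instead of `shell=True` avoids a second layer of shell interpretation; the configured command can still invoke `bash -lc` itself when it needs a login shell.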