# Twitter Archive Learning Pipeline

This repo owns the tracked code, schemas, prompts, and eval contracts for Timmy's private Twitter archive learning loop.

## Privacy Boundary

- Raw archive files stay outside git.
- Derived runtime artifacts live under `~/.timmy/twitter-archive/`.
- `twitter-archive/` is ignored by `timmy-home` so private notes and training artifacts do not get pushed by accident.

Tracked here:

- deterministic extraction and consolidation scripts
- output schemas
- eval gate contract
- prompt/orchestration code in `timmy-config`

Not tracked here:

- raw tweets
- extracted tweet text
- batch notes
- private profile artifacts
- local-only DPO pairs
- local eval outputs

## Runtime Layout

The runtime workspace is:

```text
~/.timmy/twitter-archive/
  extracted/
  notes/
  knowledge/
  insights/
  training/
  checkpoint.json
  metrics/progress.json
  source_config.json
  pipeline_config.json
```

## Source Config

Optional local file:

```json
{
  "source_path": "~/Downloads/twitter-.../data"
}
```

Environment override:

```bash
TIMMY_TWITTER_ARCHIVE_SOURCE=~/Downloads/twitter-.../data
```

## Knowledge Candidate Schema

Each batch candidate file contains:

- `id`
- `category`
- `claim`
- `evidence_tweet_ids`
- `evidence_quotes`
- `confidence`
- `status`
- `first_seen_at`
- `last_confirmed_at`
- `contradicts`

The consolidator computes durable vs provisional vs retracted from these fields.
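
As a rough illustration of the candidate schema, the sketch below checks that a candidate record carries every listed field. The field names come from the list above; the function name `missing_fields`, the presence-only check, and all example values are hypothetical and are not taken from the tracked scripts.

```python
# Field names are copied from the Knowledge Candidate Schema list.
# Everything else here (function name, example values) is an assumption.
REQUIRED_FIELDS = (
    "id",
    "category",
    "claim",
    "evidence_tweet_ids",
    "evidence_quotes",
    "confidence",
    "status",
    "first_seen_at",
    "last_confirmed_at",
    "contradicts",
)


def missing_fields(candidate: dict) -> list[str]:
    """Return the schema fields absent from a candidate, in schema order."""
    return [name for name in REQUIRED_FIELDS if name not in candidate]


# Hypothetical record; value types and formats are placeholders only.
example_candidate = {
    "id": "example-candidate",
    "category": "example-category",
    "claim": "placeholder claim text",
    "evidence_tweet_ids": ["0"],
    "evidence_quotes": ["placeholder quote"],
    "confidence": 0.5,
    "status": "provisional",
    "first_seen_at": "1970-01-01T00:00:00Z",
    "last_confirmed_at": "1970-01-01T00:00:00Z",
    "contradicts": [],
}

if __name__ == "__main__":
    print(missing_fields(example_candidate))
    print(missing_fields({"id": "only-an-id"}))
```

A check like this only confirms that the keys exist. It does not validate value types or reproduce the consolidator's durable/provisional/retracted logic, which this README does not specify.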
## Candidate Eval Contract

Local eval JSON files under `~/.timmy/twitter-archive/training/evals/` must use:

```json
{
  "candidate_id": "timmy-archive-v0.1",
  "baseline_composite": 0.71,
  "candidate_composite": 0.76,
  "refusal_over_fabrication_regression": false,
  "source_distinction_regression": false,
  "evidence_citation_rate": 0.98,
  "rollback_model": "timmy-archive-v0.0"
}
```

Promotion gate:

- candidate composite improves by at least 5%
- no refusal regression
- no source distinction regression
- evidence citation rate stays at or above 95%

## Training Command Contract

Optional local file `pipeline_config.json` can define:

```json
{
  "train_command": "bash -lc 'echo train me'",
  "promote_command": "bash -lc 'echo promote me'"
}
```

If these commands are absent, the pipeline still prepares artifacts and run manifests, but training/promotion stays in a ready state instead of executing.
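
The promotion gate listed under Candidate Eval Contract can be sketched as a single predicate over one eval JSON document. This is a minimal sketch and not the tracked implementation. The function name `passes_promotion_gate` is hypothetical, and it assumes "improves by at least 5%" means improvement relative to the baseline composite; if the contract means an absolute gain of 0.05 instead, the comparison would change accordingly.

```python
def passes_promotion_gate(
    report: dict,
    min_improvement: float = 0.05,
    min_citation_rate: float = 0.95,
) -> bool:
    """Apply the four promotion-gate rules to one eval report.

    Assumption: the 5% threshold is a relative gain over the baseline.
    """
    baseline = report["baseline_composite"]
    candidate = report["candidate_composite"]

    # A non-positive baseline makes relative improvement undefined,
    # so this sketch treats that case as a failed gate.
    improved = baseline > 0 and (candidate - baseline) / baseline >= min_improvement

    return (
        improved
        and not report["refusal_over_fabrication_regression"]
        and not report["source_distinction_regression"]
        and report["evidence_citation_rate"] >= min_citation_rate
    )


# The example report from the Candidate Eval Contract section.
example_report = {
    "candidate_id": "timmy-archive-v0.1",
    "baseline_composite": 0.71,
    "candidate_composite": 0.76,
    "refusal_over_fabrication_regression": False,
    "source_distinction_regression": False,
    "evidence_citation_rate": 0.98,
    "rollback_model": "timmy-archive-v0.0",
}

if __name__ == "__main__":
    print(passes_promotion_gate(example_report))
```

Under the relative-gain assumption the example report passes, since the composite rises from 0.71 to 0.76 (about 7%) with no regressions and a 0.98 citation rate.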