`timmy-home/specs/twitter-archive-learning-pipeline.md`

# Twitter Archive Learning Pipeline
This repo owns the tracked code, schemas, prompts, and eval contracts for
Timmy's private Twitter archive learning loop.
## Privacy Boundary

- Raw archive files stay outside git.
- Derived runtime artifacts live under `~/.timmy/twitter-archive/`.
- `twitter-archive/` is ignored by `timmy-home` so private notes and training
  artifacts do not get pushed by accident.

Tracked here:

- deterministic extraction and consolidation scripts
- output schemas
- eval gate contract
- prompt/orchestration code in `timmy-config`

Not tracked here:

- raw tweets
- extracted tweet text
- batch notes
- private profile artifacts
- local-only DPO pairs
- local eval outputs
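
For reference, the ignore rule implied above would look like this in `timmy-home` (illustrative; the actual ignore file contents are not shown in this spec):

```text
# .gitignore (timmy-home) — keep private runtime artifacts out of the repo
twitter-archive/
```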
## Runtime Layout
The runtime workspace is:
```text
~/.timmy/twitter-archive/
  extracted/
  notes/
  knowledge/
  insights/
  training/
  checkpoint.json
  metrics/progress.json
  source_config.json
  pipeline_config.json
```
## Source Config
Optional local file:
```json
{
  "source_path": "~/Downloads/twitter-.../data"
}
```
Environment override:
```bash
TIMMY_TWITTER_ARCHIVE_SOURCE=~/Downloads/twitter-.../data
```
## Knowledge Candidate Schema
Each batch candidate file contains:
- `id`
- `category`
- `claim`
- `evidence_tweet_ids`
- `evidence_quotes`
- `confidence`
- `status`
- `first_seen_at`
- `last_confirmed_at`
- `contradicts`
The consolidator derives each candidate's `status` (durable, provisional, or retracted) from these fields.
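
One plausible reading of that consolidation rule, as a sketch (the thresholds and the `consolidate_status` helper are assumptions, not part of this spec):

```python
def consolidate_status(candidate: dict) -> str:
    """Derive durable / provisional / retracted from candidate fields (illustrative logic)."""
    # A candidate with recorded contradictions is retracted (assumed interpretation).
    if candidate.get("contradicts"):
        return "retracted"
    # High confidence backed by multiple evidence tweets counts as durable (assumed thresholds).
    if candidate.get("confidence", 0.0) >= 0.8 and len(candidate.get("evidence_tweet_ids", [])) >= 2:
        return "durable"
    return "provisional"
```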
## Candidate Eval Contract
Local eval JSON files under `~/.timmy/twitter-archive/training/evals/` must use:
```json
{
  "candidate_id": "timmy-archive-v0.1",
  "baseline_composite": 0.71,
  "candidate_composite": 0.76,
  "refusal_over_fabrication_regression": false,
  "source_distinction_regression": false,
  "evidence_citation_rate": 0.98,
  "rollback_model": "timmy-archive-v0.0"
}
```
```
Promotion gate (all conditions must hold):

- the candidate composite improves on the baseline composite by at least 5%
- no refusal-over-fabrication regression
- no source-distinction regression
- the evidence citation rate stays at or above 95%
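
The four gate conditions can be checked mechanically against an eval file of the shape above. A sketch (`passes_gate` is a hypothetical name, and reading the 5% as a relative improvement over the baseline is an assumption):

```python
def passes_gate(ev: dict) -> bool:
    """Promotion gate over one eval record; '5%' read as relative improvement (assumption)."""
    improved = ev["candidate_composite"] >= ev["baseline_composite"] * 1.05
    no_refusal_regression = not ev["refusal_over_fabrication_regression"]
    no_source_regression = not ev["source_distinction_regression"]
    citation_ok = ev["evidence_citation_rate"] >= 0.95
    return improved and no_refusal_regression and no_source_regression and citation_ok
```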
## Training Command Contract
Optional local file `pipeline_config.json` can define:
```json
{
  "train_command": "bash -lc 'echo train me'",
  "promote_command": "bash -lc 'echo promote me'"
}
```
If these commands are absent, the pipeline still prepares artifacts and run
manifests, but training/promotion stays in a ready state instead of executing.
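
Under that contract, a stage runner might look like this (a sketch; `run_stage` and the returned status strings are hypothetical, only the config keys come from this spec):

```python
import json
import subprocess
from pathlib import Path

def run_stage(stage: str, config_path: Path) -> str:
    """Execute the configured command for 'train' or 'promote', or stay ready."""
    key = f"{stage}_command"  # "train_command" or "promote_command"
    cmd = None
    if config_path.exists():
        cmd = json.loads(config_path.read_text()).get(key)
    if not cmd:
        # Artifacts and run manifests are still prepared; execution waits
        # until a command is configured.
        return "ready"
    subprocess.run(cmd, shell=True, check=True)
    return "executed"
```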