feat: session transcript → training pair harvester #91

Closed
opened 2026-04-15 02:59:56 +00:00 by Rockachopa · 0 comments
Owner

Context

Our sessions contain high-quality human-AI interactions that could be used to train smaller models. We need a harvester that extracts terse→rich pairs: a short user prompt paired with the detailed AI response that answered it.

Acceptance Criteria

  • Scan session JSONL files for Q&A patterns
  • Extract: user_short_prompt, ai_detailed_response pairs
  • Output JSONL format matching timmy-config training pairs spec
  • Deduplicate by content hash
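
A minimal sketch of how the criteria above might fit together, not the actual implementation. It assumes each session JSONL line is an object with `role` and `content` string fields, which the issue does not confirm. The length thresholds, the `harvest_pairs` name, and the `content_hash` output field are all hypothetical, and the output keys only mirror the names used in this issue; the real timmy-config training pairs spec may differ.

```python
import hashlib
import json
from typing import Iterable, Iterator

# Hypothetical thresholds: the issue does not define "terse" or "rich".
MAX_PROMPT_CHARS = 200
MIN_RESPONSE_CHARS = 400


def harvest_pairs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield deduplicated terse->rich pairs from session JSONL lines.

    Assumes (not confirmed by the issue) that each line is a JSON object
    with string fields "role" ("user" or "assistant") and "content".
    """
    seen = set()          # content hashes already emitted
    pending = None        # most recent user message awaiting a reply
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            continue      # skip malformed lines rather than abort the scan
        if not isinstance(msg, dict) or not isinstance(msg.get("content"), str):
            pending = None
            continue
        role, content = msg.get("role"), msg["content"].strip()
        if role == "user":
            pending = content
        elif role == "assistant" and pending is not None:
            prompt, response = pending, content
            pending = None
            if not prompt or len(prompt) > MAX_PROMPT_CHARS:
                continue  # prompt is empty or not terse
            if len(response) < MIN_RESPONSE_CHARS:
                continue  # response is not rich enough
            # Hash prompt and response together; the NUL byte keeps
            # ("ab", "c") distinct from ("a", "bc").
            digest = hashlib.sha256(
                (prompt + "\x00" + response).encode("utf-8")
            ).hexdigest()
            if digest in seen:
                continue
            seen.add(digest)
            yield {
                "user_short_prompt": prompt,
                "ai_detailed_response": response,
                "content_hash": digest,  # hypothetical extra field
            }
        else:
            pending = None  # any other role breaks the Q&A pattern


def write_jsonl(pairs: Iterable[dict], path: str) -> int:
    """Write one JSON object per line and return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for pair in pairs:
            out.write(json.dumps(pair, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Hashing the prompt and response together means the same question with a different answer is kept as a separate pair; hashing only the prompt would be a stricter choice if one answer per prompt is wanted.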

Labels

training-data, harvesting, integration

codex-agent was assigned by Rockachopa 2026-04-15 03:47:04 +00:00

Reference: Timmy_Foundation/compounding-intelligence#91