[PIPELINE] DPO cycle — conversation corrections become training signal #13

Closed
opened 2026-03-26 14:01:27 +00:00 by Timmy · 7 comments
Owner

The Learning Loop

This is the ticket that closes the neural feedback loop. Alexander's one wish.

Pipeline

Alexander talks to Timmy (Hermes session captured)
    |
    v
Alexander corrects Timmy or approves response
    |
    v
Session export extracts (prompt, chosen, rejected) pairs
    - chosen = response Alexander approved or didn't correct
    - rejected = response from the stock model OR the response Alexander corrected
    |
    v
DPO training on MLX (timmy-config #5)
    |
    v
Merge + convert to GGUF
    |
    v
New timmy model deployed to Ollama
    |
    v
Timmy is better at predicting what Alexander prefers
    |
    v (loop)
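
The export step in the diagram can be sketched roughly as follows. This is a minimal sketch, not the actual Huey export task: the `Turn` schema, the `revised` field, and the `stock_model` callable are all hypothetical names introduced for illustration. It also assumes that when Alexander corrects a response, the answer he accepts afterwards serves as the chosen side; the issue itself does not specify this.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Turn:
    """One exchange in a captured session (hypothetical schema)."""
    prompt: str
    response: str                  # what Timmy answered
    corrected: bool = False        # True if Alexander corrected this response
    revised: Optional[str] = None  # assumed: the answer accepted after the correction


def build_dpo_pairs(
    turns: Iterable[Turn],
    stock_model: Callable[[str], str],
) -> list[dict]:
    """Turn session exchanges into (prompt, chosen, rejected) records."""
    pairs = []
    for turn in turns:
        if turn.corrected:
            # The corrected response is the rejected side. Without an accepted
            # revision there is nothing to prefer, so the turn is skipped.
            if turn.revised is None:
                continue
            chosen, rejected = turn.revised, turn.response
        else:
            # Approved or uncorrected: Timmy's answer is chosen and the stock
            # model's answer to the same prompt is rejected.
            chosen, rejected = turn.response, stock_model(turn.prompt)
        if chosen != rejected:  # identical texts carry no preference signal
            pairs.append(
                {"prompt": turn.prompt, "chosen": chosen, "rejected": rejected}
            )
    return pairs
```

Passing the stock model in as a plain callable keeps the sketch independent of how rejections are actually generated, so a stub can stand in for it during testing.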

What exists

  • [x] Session capture (Hermes does this automatically)
  • [x] Session export Huey task (every 4h, 2001 files exported)
  • [ ] DPO trainer (timmy-config #5; antigravity working on it)
  • [ ] Rejection generation (run stock model on same prompts)
  • [ ] Merge + GGUF conversion pipeline
  • [ ] Automated eval before deployment
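
For orientation on the DPO trainer item, here is the standard per-pair DPO objective written out in plain Python. It is a sketch of the published loss only, not the timmy-config #5 trainer and not MLX code; the function name, argument names, and the default `beta` are assumptions. Each argument is the summed log-probability of a full response under either the model being trained (policy) or the frozen stock model (reference).

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy shifts probability toward the chosen response relative to the rejected one.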

Acceptance criteria

  • [ ] One full cycle completed: session → DPO pairs → train → merge → GGUF → deploy → verify
  • [ ] Timmy's response quality measurably improves on the eval suite
  • [ ] Process documented so it can repeat without manual intervention
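
One way to make "measurably improves" checkable before deployment is a paired comparison on the eval suite. The sketch below is an assumption about how such a gate could look, not an existing part of the pipeline: `should_deploy`, `min_gain`, and the idea of one numeric score per eval prompt are all hypothetical.

```python
from statistics import mean


def should_deploy(
    baseline: list[float],
    candidate: list[float],
    min_gain: float = 0.0,
) -> bool:
    """Return True when the candidate beats the baseline on the same eval prompts."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("score lists must be non-empty and aligned per prompt")
    # Per-prompt differences keep the comparison paired: same prompt, two models.
    gains = [c - b for b, c in zip(baseline, candidate)]
    return mean(gains) > min_gain
```

A gate like this would run after the GGUF conversion and before the Ollama deploy step, so a regression keeps the previous model in place.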

This is the sovereignty loop. Conversations become weights. The apprentice learns from the master.

Rockachopa was assigned by Timmy 2026-03-26 14:01:27 +00:00
Timmy self-assigned this 2026-03-26 14:01:27 +00:00
Author
Owner

Dispatched to kimi. Huey task queued.

Author
Owner

Dispatched to grok. Huey task queued.

Author
Owner

Dispatched to perplexity. Huey task queued.

Member

🔧 gemini working on this via Huey. Branch: gemini/issue-13

Member

🔧 grok working on this via Huey. Branch: grok/issue-13

Member

⚠️ grok produced no changes for this issue. Skipping.

Author
Owner

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.

Timmy closed this issue 2026-03-28 04:53:08 +00:00

Reference: Timmy_Foundation/timmy-config#13