[claude] Export conversation trajectories to ShareGPT JSONL (#1102) #1121

Closed
claude wants to merge 1 commit from claude/issue-1102 into main
Collaborator

Fixes #1102

What this does

Implements AutoLoRA Step 3 of 7: exports Timmy conversation trajectories to ShareGPT-compatible JSONL for Hermes 4 LoRA fine-tuning.

New files

  • scripts/export_trajectories.py — main export script
  • tests/scripts/test_export_trajectories.py — 26 unit tests (all passing)

Usage

# Export everything to ~/timmy-training-data.jsonl
python scripts/export_trajectories.py

# Custom paths
python scripts/export_trajectories.py --logs-dir ./logs --output ~/training.jsonl

# Validate existing output
python scripts/export_trajectories.py --validate-only --output ~/timmy-training-data.jsonl

# Fail if fewer than 100 examples
python scripts/export_trajectories.py --min-examples 100

Data sources (priority order)

  1. logs/session_*.jsonl — session logs with full tool call info (preferred)
  2. data/chat.db — SQLite chat history fallback
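The priority order above could be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/export_trajectories.py`: the function name `pick_source` and its return shape are hypothetical, and it assumes "fallback" means the SQLite database is consulted only when no session logs are found.

```python
from pathlib import Path


def pick_source(logs_dir, db_path):
    """Choose a data source: session logs first, chat.db as the fallback.

    Hypothetical helper; returns a (kind, paths) tuple.
    """
    # Preferred source: session logs, which carry full tool call info.
    session_logs = sorted(Path(logs_dir).glob("session_*.jsonl"))
    if session_logs:
        return ("session_logs", session_logs)

    # Fallback (assumed): use the SQLite chat history only if no logs exist.
    db = Path(db_path)
    if db.is_file():
        return ("chat_db", [db])

    # Nothing to export.
    return ("none", [])
```

Calling `pick_source("logs", "data/chat.db")` from the repository root would then report which source an export run would read.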

Output format (ShareGPT)

{"conversations": [
  {"from": "human", "value": "list my files"},
  {"from": "gpt", "value": "done", "tool_calls": [{"name": "shell", "arguments": {}}]},
  {"from": "tool", "value": "a.py\nb.py", "tool": "shell"}
]}
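A record of this shape can be checked with a few structural rules. The sketch below is an assumption about what a validation pass might look at, not the actual behaviour of `--validate-only`: the helper name `validate_record` is hypothetical, and the allowed `from` values are taken only from the example above.

```python
import json

# Roles seen in the example record; the real script may accept others.
ALLOWED_ROLES = {"human", "gpt", "tool"}


def validate_record(line):
    """Return a list of problems for one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    turns = record.get("conversations") if isinstance(record, dict) else None
    if not isinstance(turns, list) or not turns:
        return ["missing or empty 'conversations' list"]

    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        if turn.get("from") not in ALLOWED_ROLES:
            problems.append(f"turn {i}: unexpected 'from' value {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' must be a string")
    return problems
```

Running this over each line of the output file and collecting the non-empty results gives a simple per-line error report.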

Conversations are split on 30-minute gaps between entries. Conversations with fewer than 2 meaningful turns are discarded.
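The splitting and filtering rule can be illustrated with a short sketch. It is written under stated assumptions rather than copied from the script: entries are assumed to be dicts with a `timestamp` (`datetime`) and a `value` string, already sorted by time; a gap strictly greater than 30 minutes starts a new conversation; and a "meaningful" turn is taken to mean one with non-blank text. The name `split_conversations` is hypothetical.

```python
from datetime import timedelta

GAP = timedelta(minutes=30)  # from the PR description
MIN_TURNS = 2                # from the PR description


def split_conversations(entries, gap=GAP, min_turns=MIN_TURNS):
    """Group time-sorted entries into conversations and drop short ones."""
    conversations = []
    current = []
    last_ts = None

    for entry in entries:
        ts = entry["timestamp"]
        # Assumption: a gap strictly longer than `gap` starts a new conversation.
        if last_ts is not None and ts - last_ts > gap and current:
            conversations.append(current)
            current = []
        current.append(entry)
        last_ts = ts

    if current:
        conversations.append(current)

    def meaningful(turn):
        # Assumption: "meaningful" means the turn has non-blank text.
        return bool(turn["value"].strip())

    return [
        conv for conv in conversations
        if sum(meaningful(t) for t in conv) >= min_turns
    ]
```

Each surviving group would then be written out as one `{"conversations": [...]}` line in the JSONL file.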

claude added 1 commit 2026-03-23 18:20:37 +00:00
feat: export conversation trajectories to ShareGPT JSONL for LoRA fine-tuning
Some checks failed
Tests / test (pull_request) Has been skipped
Tests / lint (pull_request) Failing after 16s
a2f8989c39
Implements AutoLoRA Step 3 of 7: a script that reads Timmy's session logs
and chat history, groups entries into conversation trajectories, and writes
ShareGPT-compatible JSONL suitable for Hermes 4 LoRA fine-tuning.

Sources (priority order):
  1. logs/session_*.jsonl — rich logs with tool calls
  2. data/chat.db         — SQLite chat history fallback

Usage:
  python scripts/export_trajectories.py [--output ~/timmy-training-data.jsonl]
  python scripts/export_trajectories.py --validate-only --output <file>
  python scripts/export_trajectories.py --min-examples 100

Fixes #1102

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

LGTM — clean feature with 26 tests. Has merge conflicts. Please rebase on main and force-push, then I will merge.

Owner

Closing: PR has merge conflicts with main that cannot be auto-rebased. The underlying issue remains open for re-implementation.

Timmy closed this pull request 2026-03-23 18:27:48 +00:00
Owner

LGTM — clean ShareGPT export implementation with good tests. Has merge conflicts against main. Please rebase and force-push, then I will merge.

Owner

Acknowledged — reopening, rebasing against main, and force-pushing. Will merge once clean.

Timmy reopened this pull request 2026-03-23 18:38:59 +00:00
Owner

ShareGPT export for AutoLoRA — solid. 26 tests, clean design. Merge conflicts though — rebase on main and force-push. Will merge once mergeable.

Owner

LGTM — clean additive feature, good test coverage. Merge conflicts — please rebase on main and force-push.

Timmy closed this pull request 2026-03-23 19:11:54 +00:00

Pull request closed


Reference: Rockachopa/Timmy-time-dashboard#1121