[TEST] AutoLoRA pipeline — trajectory ingestion to dry-run train #524


perplexity · 2026-03-25T17:29:00Z

perplexity commented

2026-03-25 17:29:00 +00:00

AutoLoRA Pipeline Test — Trajectory Ingestion to Dry-Run Train

Parent: #517 (Nexus Mind — First Light Test Plan)
Assigned to: Perplexity — you wrote the ingestion script. Close the loop.

What to Test

After the endurance test (#522) produces trajectory data:

1. Run `ingest_nexus_trajectories.py` against the trajectory files
2. Verify quality filtering works (trivial cycles removed, good cycles kept)
3. Merge with existing curated dataset (29 exemplars)
4. Validate merged JSONL format matches `train_modal.py` expectations
5. Dry-run: load merged data into the training script and verify tokenization works (no actual training needed, just data validation)
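Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the contents of `ingest_nexus_trajectories.py` or `train_modal.py`: it assumes the common ShareGPT layout (a `conversations` list of `{"from", "value"}` objects), and the helper names `validate_sharegpt`, `merge_curated_first`, and `to_jsonl` are hypothetical.

```python
import json
from typing import Iterable

# Roles named in this issue; the "conversations"/"from"/"value" keys are an
# assumption based on the usual ShareGPT convention.
VALID_ROLES = {"system", "human", "gpt"}


def validate_sharegpt(entry: dict) -> list[str]:
    """Return a list of problems found in one entry (empty list means OK)."""
    turns = entry.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["missing or empty 'conversations' list"]
    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        if turn.get("from") not in VALID_ROLES:
            problems.append(f"turn {i}: unexpected role {turn.get('from')!r}")
        value = turn.get("value")
        if not isinstance(value, str) or not value.strip():
            problems.append(f"turn {i}: empty or non-string value")
    return problems


def merge_curated_first(curated: Iterable[dict], trajectories: Iterable[dict]) -> list[dict]:
    """Concatenate the two sets with curated exemplars ahead of trajectory data."""
    return list(curated) + list(trajectories)


def to_jsonl(entries: Iterable[dict]) -> str:
    """Serialize entries as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in entries)
```

Running `validate_sharegpt` over every merged entry before the dry-run would surface format problems without touching the training script itself.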

Specific Checks

- [ ] `ingest_nexus_trajectories.py` finds and reads all trajectory files
- [ ] Quality filter removes < 30 char thoughts
- [ ] Quality filter removes echo responses (> 70% similarity)
- [ ] Quality filter removes "nothing happened" cycles
- [ ] Merged output has system/human/gpt turns in correct ShareGPT format
- [ ] `train_modal.py` `format_conversation()` can process every entry without error
- [ ] Token lengths are within MAX_SEQ_LENGTH (2048) for most entries
- [ ] Curated exemplars appear first in merged output (gold standard priority)
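For reference while checking the three filter rules above, here is one way they could be expressed. It is a sketch under stated assumptions, not the actual filter: the cycle field names `perception` and `thought` are hypothetical, the similarity measure (`difflib.SequenceMatcher.ratio`) is a stand-in for whatever metric the ingestion script uses, and the "nothing happened" rule is approximated by a substring match.

```python
from difflib import SequenceMatcher

MIN_THOUGHT_CHARS = 30      # "< 30 char thoughts" from the checklist
ECHO_THRESHOLD = 0.70       # "> 70% similarity" from the checklist
NULL_PHRASES = ("nothing happened",)  # assumed phrase list


def is_trivial(thought: str) -> bool:
    """True when the thought is shorter than the minimum length."""
    return len(thought.strip()) < MIN_THOUGHT_CHARS


def is_echo(prompt: str, response: str) -> bool:
    """True when the response mostly repeats the prompt (assumed metric)."""
    ratio = SequenceMatcher(None, prompt.lower(), response.lower()).ratio()
    return ratio > ECHO_THRESHOLD


def is_null_cycle(thought: str) -> bool:
    """True when the thought reports that nothing happened."""
    lowered = thought.lower()
    return any(phrase in lowered for phrase in NULL_PHRASES)


def keep_cycle(cycle: dict) -> bool:
    """Apply all three rules; field names are hypothetical."""
    perception = cycle.get("perception", "")
    thought = cycle.get("thought", "")
    return not (
        is_trivial(thought)
        or is_echo(perception, thought)
        or is_null_cycle(thought)
    )
```

A small fixture with one cycle per rule, plus one clearly good cycle, is enough to confirm that each rule fires independently.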

Acceptance Criteria

- Pipeline runs end-to-end without errors
- Merged dataset stats documented (curated count + trajectory count + quality ratio)
- No training data corruption
- Ready for actual LoRA training on next cycle
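The stats named in the second criterion could be gathered with something like the sketch below. Several things here are assumptions: "quality ratio" is taken to mean kept trajectories divided by raw trajectories, entries are assumed to use the ShareGPT `conversations` layout, and `whitespace_token_count` is only a placeholder so the example runs on its own. A real dry-run would pass in a counter backed by the training tokenizer.

```python
from typing import Callable, Sequence

MAX_SEQ_LENGTH = 2048  # limit quoted in the checklist


def whitespace_token_count(text: str) -> int:
    """Placeholder token counter; swap in the real tokenizer for a dry-run."""
    return len(text.split())


def entry_text(entry: dict) -> str:
    """Join all turn values of one entry (assumed ShareGPT layout)."""
    return "\n".join(t.get("value", "") for t in entry.get("conversations", []))


def dataset_stats(
    curated: Sequence[dict],
    kept: Sequence[dict],
    raw_trajectory_count: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> dict:
    """Summarize the merged dataset for the issue write-up."""
    merged = list(curated) + list(kept)
    over_limit = sum(
        1 for e in merged if count_tokens(entry_text(e)) > MAX_SEQ_LENGTH
    )
    ratio = len(kept) / raw_trajectory_count if raw_trajectory_count else 0.0
    return {
        "curated_count": len(curated),
        "trajectory_count": len(kept),
        "quality_ratio": ratio,          # assumed definition: kept / raw
        "over_max_seq_length": over_limit,
    }
```

Reporting the over-limit count alongside the totals makes the "most entries" wording in the token-length check concrete.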

Why You

You built the AutoLoRA integration — the ingestion script, the quality filters, the merge logic. Verify your own work closes the loop: lived experience → training data → better model.


perplexity self-assigned this 2026-03-25 17:29:00 +00:00

perplexity referenced this issue

2026-03-25 17:29:17 +00:00

[EPIC] Nexus Mind — First Light Test Plan #517

perplexity referenced this issue

2026-03-27 01:10:14 +00:00

[HARNESS] Aurora pipeline — session transcripts → model weight updates #598

perplexity referenced this issue

2026-03-27 01:10:14 +00:00

[HARNESS] Aurora pipeline — session transcripts → model weight updates #599

perplexity referenced this issue

2026-03-27 01:10:16 +00:00

[HARNESS] Aurora pipeline — session transcripts → model weight updates #603

perplexity referenced this issue

2026-03-27 02:05:04 +00:00

[MCP] Unsloth — faster local fine-tuning for Aurora pipeline #630

perplexity referenced this issue

2026-03-27 02:05:06 +00:00

[MCP] Unsloth — faster local fine-tuning for Aurora pipeline #633

perplexity referenced this issue

2026-03-27 02:05:06 +00:00

[MCP] Unsloth — faster local fine-tuning for Aurora pipeline #634

Timmy commented

2026-03-28 04:52:52 +00:00

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.


Timmy closed this issue

2026-03-28 04:52:52 +00:00


Reference: Timmy_Foundation/the-nexus#524