[Retro] Build batch harvester for parallel processing #18

Open
opened 2026-04-14 15:15:24 +00:00 by Timmy · 0 comments
Owner

Epic: #5 (Retroactive Harvest)

Task

Process the session backlog using mimo-swarm workers in parallel.

Architecture

sampler.py → priority queue → N workers (mimo-swarm) → harvester.py → knowledge store
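
The pipeline above could be wired up roughly as follows. This is a sketch, not the project's actual code: `harvest_session` stands in for the harvester.py entry point, and the priority values are assumed to come from sampler.py.

```python
import queue
import threading

def harvest_session(session):
    # Hypothetical stand-in for harvester.py; the real version would run
    # local model inference over the session transcript.
    return [f"fact-from-{session}"]

def run_pipeline(sessions, num_workers=3):
    """Feed prioritized (priority, session) pairs to N workers; collect facts."""
    q = queue.PriorityQueue()
    results = []
    lock = threading.Lock()

    for priority, session in sessions:
        q.put((priority, session))

    def worker():
        while True:
            try:
                _, session = q.get_nowait()
            except queue.Empty:
                return  # queue drained; worker exits
            facts = harvest_session(session)
            with lock:  # serialize appends across workers
                results.extend(facts)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```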

Interface

python3 batch-harvest.py --workers 3 --limit 500
python3 batch-harvest.py --resume  # Resume from last checkpoint
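
A minimal argument parser matching the interface above. The flag names come straight from the issue; the defaults (3 workers, 500-session limit) are assumptions borrowed from the acceptance criteria.

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="batch-harvest.py")
    p.add_argument("--workers", type=int, default=3,
                   help="number of parallel mimo-swarm workers")
    p.add_argument("--limit", type=int, default=500,
                   help="maximum sessions to process this run")
    p.add_argument("--resume", action="store_true",
                   help="resume from the last checkpoint instead of starting fresh")
    return p
```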

Requirements

  • Checkpoint after each session (don't re-harvest on restart)
  • Rate-limit to avoid overwhelming local inference
  • Track: processed, failed, facts extracted, time elapsed
  • Deduplicate across parallel workers
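
The checkpoint and dedup requirements could be sketched like this, assuming a JSON checkpoint file of processed session IDs and dedup keyed on a normalized fact hash. File name and class names are illustrative, not the project's actual API.

```python
import hashlib
import json
import os
import threading

CHECKPOINT = "harvest.checkpoint.json"  # assumed location

def load_checkpoint(path=CHECKPOINT):
    """Return the set of already-processed session IDs (empty on first run)."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def save_checkpoint(done, path=CHECKPOINT):
    """Atomically persist processed IDs so a restart never re-harvests."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)  # atomic rename: checkpoint survives a crash mid-write

class FactDeduper:
    """Thread-safe dedup shared across parallel workers."""
    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def add(self, fact):
        # Normalize whitespace/case so trivially reworded duplicates collapse.
        key = hashlib.sha256(fact.strip().lower().encode()).hexdigest()
        with self._lock:
            if key in self._seen:
                return False  # duplicate; caller should drop it
            self._seen.add(key)
            return True
```

Checkpointing after each session (rather than batching) trades a little I/O for the guarantee that at most one session is re-harvested after a crash.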

Acceptance Criteria

  • Processes 500 sessions in <2 hours using 3 workers
  • Checkpoint survives restart
  • No duplicate facts from parallel workers
  • Progress report every 50 sessions
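
For the tracking and progress-report criteria, a small counter object is enough. The report cadence (every 50 sessions) and the tracked fields come from the issue; everything else here is an assumed shape.

```python
import time

class Progress:
    """Track processed / failed / facts / elapsed and report periodically."""
    def __init__(self, report_every=50):
        self.processed = 0
        self.failed = 0
        self.facts = 0
        self.start = time.monotonic()
        self.report_every = report_every

    def record(self, ok, fact_count=0):
        self.processed += 1
        if not ok:
            self.failed += 1
        self.facts += fact_count
        if self.processed % self.report_every == 0:
            elapsed = time.monotonic() - self.start
            print(f"[{self.processed}] failed={self.failed} "
                  f"facts={self.facts} elapsed={elapsed:.0f}s")
```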
Timmy added the retroactive, pipeline, and milestone:4 labels 2026-04-14 15:15:24 +00:00
hermes was assigned by Rockachopa 2026-04-15 01:50:36 +00:00

Reference: Timmy_Foundation/compounding-intelligence#18