[Retro] Build session sampler — find highest-value sessions to harvest #17

Open
opened 2026-04-14 15:15:24 +00:00 by Timmy · 0 comments

Epic: #5 (Retroactive Harvest)

Task

We have 20,991 sessions. Can't harvest all at once. Need to pick the best first.

Sampling strategy (score each session)

  • Recency: last 7 days = 3pts, last 30 = 2pts, older = 1pt
  • Length: >50 messages = 3pts, >20 = 2pts, ≤20 = 1pt
  • Repo uniqueness: first session for a repo = 5pts, otherwise = 1pt
  • Outcome: failure = 3pts (most to learn), success = 2pts, unknown = 1pt
  • Tool calls: >10 = 2pts (complex sessions have more to extract)
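A minimal sketch of how the scoring rules above could combine into one number, assuming the per-criterion points are simply summed. The `Session` fields and the `score_session` name are hypothetical, not taken from the actual session format. The issue does not say what a session with 10 or fewer tool calls earns, so 0 points is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    """Hypothetical per-session metadata; the real session schema may differ."""
    path: str
    repo: str
    started_at: datetime
    message_count: int
    tool_call_count: int
    outcome: str          # "failure", "success", or anything else for unknown
    first_for_repo: bool  # True if this is the first session seen for its repo


def score_session(s: Session, now: datetime) -> int:
    """Sum the per-criterion points listed in the sampling strategy."""
    age = now - s.started_at
    if age <= timedelta(days=7):
        recency = 3
    elif age <= timedelta(days=30):
        recency = 2
    else:
        recency = 1

    if s.message_count > 50:
        length = 3
    elif s.message_count > 20:
        length = 2
    else:
        length = 1

    uniqueness = 5 if s.first_for_repo else 1
    outcome = {"failure": 3, "success": 2}.get(s.outcome, 1)

    # Assumption: sessions with 10 or fewer tool calls get no bonus.
    tools = 2 if s.tool_call_count > 10 else 0

    return recency + length + uniqueness + outcome + tools


now = datetime(2026, 4, 14, tzinfo=timezone.utc)
recent_failure = Session("a.json", "the-nexus", now - timedelta(days=2),
                         60, 15, "failure", True)
old_unknown = Session("b.json", "the-nexus", now - timedelta(days=100),
                      5, 0, "unknown", False)
print(score_session(recent_failure, now), score_session(old_unknown, now))
```

Under these assumptions the maximum score is 16 and the minimum is 4, so the one-time 5pt repo-uniqueness bonus can outweigh recency and length together.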

Interface

python3 sampler.py --count 100            # Top 100 sessions to harvest
python3 sampler.py --repo the-nexus --count 20  # Top 20 for a specific repo
python3 sampler.py --since 2026-04-01     # All sessions since date
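The three invocations above could be parsed with `argparse` roughly as follows. This is a sketch, not the actual `sampler.py`: `build_parser` is a hypothetical helper, and `--min-score` is an assumed flag name added only because the acceptance criteria mention a score threshold.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the Interface section."""
    parser = argparse.ArgumentParser(
        prog="sampler.py",
        description="Rank sessions by harvest value.",
    )
    parser.add_argument("--count", type=int, default=None,
                        help="return only the top N sessions")
    parser.add_argument("--repo", default=None,
                        help="restrict to sessions for one repo")
    # date.fromisoformat raises ValueError on bad input, which argparse
    # turns into a normal usage error.
    parser.add_argument("--since", type=date.fromisoformat, default=None,
                        help="only sessions on or after YYYY-MM-DD")
    # Assumption: this flag does not appear in the issue's examples.
    parser.add_argument("--min-score", type=int, default=None,
                        help="drop sessions scoring below this value")
    return parser


args = build_parser().parse_args(["--repo", "the-nexus", "--count", "20"])
print(args.repo, args.count, args.since)
```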

Output

Ordered list of session file paths with scores.
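One way to produce that ordered list, sketched under two assumptions: scores have already been computed as `(path, score)` pairs, and ties are broken by path so the output is deterministic. The tab-separated `score<TAB>path` line format and the file names are illustrative only.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, int]  # (session file path, score)


def rank(scored: List[Scored], count: Optional[int] = None) -> List[Scored]:
    """Sort by score descending, then by path; optionally keep the top `count`."""
    ordered = sorted(scored, key=lambda item: (-item[1], item[0]))
    return ordered if count is None else ordered[:count]


def format_lines(ranked: List[Scored]) -> List[str]:
    """Render one 'score<TAB>path' line per session (assumed format)."""
    return [f"{score}\t{path}" for path, score in ranked]


# Hypothetical paths and scores, for illustration only.
example = [("sessions/b.json", 9), ("sessions/a.json", 9), ("sessions/c.json", 12)]
for line in format_lines(rank(example, count=2)):
    print(line)
```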

Acceptance Criteria

  • Scores all 20,991 sessions in <30 seconds
  • Output matches intuition (recent, complex, failed sessions score highest)
  • Can filter by repo, date range, score threshold
Timmy added the retroactive and milestone:4 labels 2026-04-14 15:15:24 +00:00
hermes was assigned by Rockachopa 2026-04-15 01:50:36 +00:00
Reference: Timmy_Foundation/compounding-intelligence#17