feat: add sampler.py — session value scorer (closes #17) #119

Closed
Rockachopa wants to merge 0 commits from burn/17-session-sampler into main
Owner

What

Build session_sampler that scores and ranks 20k+ sessions by harvest value, so the harvester processes the best ones first.

Scoring strategy

  • Recency: last 7d=3pts, last 30d=2pts, older=1pt
  • Length: >50 msgs=3pts, >20=2pts, otherwise=1pt
  • Repo uniqueness: first session for a repo=5pts, otherwise=1pt
  • Outcome: failure=3pts (most to learn), success=2pts, unknown=1pt
  • Tool calls: >10 invocations=2pts (complex sessions)
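A minimal sketch of how the weights above could combine into one score. The `SessionInfo` container and `score_session` function are hypothetical names, not taken from sampler.py, and two points are assumptions: a session with exactly 20 messages scores 1pt for length, and a session with 10 or fewer tool calls gets no tool-call points (the list above does not say).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SessionInfo:
    """Hypothetical container for the fields the scorer needs."""
    last_active: datetime      # timezone-aware timestamp of the last message
    message_count: int
    repo_first_seen: bool      # True if this is the first session for its repo
    outcome: str               # "failure", "success", or anything else for unknown
    tool_calls: int


def score_session(s: SessionInfo, now: Optional[datetime] = None) -> int:
    """Sum the five components from the scoring strategy."""
    now = now or datetime.now(timezone.utc)
    age = now - s.last_active

    # Recency: last 7d=3pts, last 30d=2pts, older=1pt
    if age <= timedelta(days=7):
        score = 3
    elif age <= timedelta(days=30):
        score = 2
    else:
        score = 1

    # Length: >50 msgs=3pts, >20=2pts, otherwise 1pt (assumed for exactly 20)
    if s.message_count > 50:
        score += 3
    elif s.message_count > 20:
        score += 2
    else:
        score += 1

    # Repo uniqueness: first session for a repo=5pts, otherwise=1pt
    score += 5 if s.repo_first_seen else 1

    # Outcome: failure=3pts, success=2pts, unknown=1pt
    score += {"failure": 3, "success": 2}.get(s.outcome, 1)

    # Tool calls: >10 invocations=2pts; assumed 0pts otherwise
    if s.tool_calls > 10:
        score += 2

    return score
```

Under these assumptions scores range from 4 to 16, so a threshold such as `--min-score 8` sits in the lower half of that range.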

CLI

python3 sampler.py --count 100                          # Top 100
python3 sampler.py --repo the-nexus --count 20          # Per-repo
python3 sampler.py --since 2026-04-01                   # Date filter
python3 sampler.py --count 50 --min-score 8             # High-value only
python3 sampler.py --format paths                       # Pipe to harvester
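For reference, the flags used above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in sampler.py: the flag names come from this description, while the defaults, types, and the `build_parser` helper are assumptions.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the flags named in this PR description."""
    p = argparse.ArgumentParser(
        prog="sampler.py",
        description="Score and rank sessions by harvest value",
    )
    p.add_argument("--count", type=int, default=None,
                   help="keep only the top N sessions")
    p.add_argument("--repo", default=None,
                   help="only sessions that touch this repo")
    # date.fromisoformat raises ValueError on bad input, which argparse
    # turns into a normal usage error.
    p.add_argument("--since", type=date.fromisoformat, default=None,
                   help="only sessions on or after this date (YYYY-MM-DD)")
    p.add_argument("--min-score", type=int, default=0,
                   help="drop sessions scoring below this value")
    p.add_argument("--format", choices=["table", "json", "paths"],
                   default="table", help="output format")
    p.add_argument("--top-percent", type=float, default=None,
                   help="keep only the top P percent of sessions")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```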

Implementation

  • Fast scanning: reads only first line + last 8KB of each JSONL file
  • Processes 20k sessions in seconds (no full parse)
  • Extracts repos from tool call arguments
  • Detects failure signals from content keywords
  • 3 output formats: table (human), json (machine), paths (pipeable)
  • --top-percent mode for percentage-based selection
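The fast scan can be illustrated with a short sketch that reads the first line and the last 8KB of a JSONL file without parsing anything in between. The function name `read_head_and_tail` and the `TAIL_BYTES` constant are hypothetical; the actual code in sampler.py may differ.

```python
import json
import os
from typing import Any, Dict, List, Tuple

TAIL_BYTES = 8 * 1024  # size of the tail window, per the description


def read_head_and_tail(path: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (first record, records found in the last 8KB) of a JSONL file."""
    with open(path, "rb") as f:
        first_line = f.readline()
        size = os.fstat(f.fileno()).st_size
        start = max(size - TAIL_BYTES, 0)
        f.seek(start)
        tail = f.read()

    try:
        head = json.loads(first_line) if first_line.strip() else {}
    except ValueError:
        head = {}

    lines = tail.split(b"\n")
    if start > 0:
        # The window most likely begins mid-record, so drop the first
        # (possibly partial) line. This can discard one complete record
        # when the window happens to start exactly on a line boundary.
        lines = lines[1:]

    records = []
    for line in lines:
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except ValueError:
            # Skip lines that are not valid JSON instead of failing the scan.
            continue
    return head, records
```

For files smaller than 8KB the tail window covers the whole file, so the first record also appears in the returned tail records; a caller that counts records would need to account for that.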

Acceptance criteria

  • Scores all 20k sessions (fast scan, not full parse)
  • Output matches spec scoring weights
  • Filters by repo, date range, score threshold
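One way the filters and the selection modes could fit together, shown as a sketch. `ScoredSession` and `select` are hypothetical names; the order of operations (filter, rank, then apply `--top-percent` before `--count`) and rounding the percentage up are assumptions, not confirmed behaviour of sampler.py.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ScoredSession:
    """Hypothetical record produced by the scoring pass."""
    path: str
    repo: str
    day: date
    score: int


def select(
    sessions: List[ScoredSession],
    repo: Optional[str] = None,
    since: Optional[date] = None,
    min_score: int = 0,
    count: Optional[int] = None,
    top_percent: Optional[float] = None,
) -> List[ScoredSession]:
    """Filter by repo, date, and score threshold, then rank by score."""
    kept = [
        s for s in sessions
        if (repo is None or s.repo == repo)
        and (since is None or s.day >= since)
        and s.score >= min_score
    ]
    # sorted() is stable, so sessions with equal scores keep their input order.
    kept = sorted(kept, key=lambda s: s.score, reverse=True)

    if top_percent is not None:
        # Assumed rounding: keep at least the ceiling of the requested share.
        kept = kept[: math.ceil(len(kept) * top_percent / 100)]
    if count is not None:
        kept = kept[:count]
    return kept
```

Emitting `s.path` for each selected session, one per line, would give the `paths` format that is piped to the harvester.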

Closes #17

Rockachopa added 1 commit 2026-04-15 03:03:36 +00:00
Timmy approved these changes 2026-04-15 04:13:27 +00:00
Timmy left a comment
Owner

Feature implementation reviewed - looks solid.

Scope: 1 file(s) changed (353+ / 0-)

Suggestions

  • Feature PR without tests. Consider adding test coverage.
  • Found 10 print/console.log statements - verify these are not leftover debugging.
Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Rockachopa closed this pull request 2026-04-16 01:54:18 +00:00
Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Rockachopa reopened this pull request 2026-04-16 02:04:19 +00:00
Rockachopa closed this pull request 2026-04-16 02:14:25 +00:00
Rockachopa reopened this pull request 2026-04-16 02:41:59 +00:00
Rockachopa closed this pull request 2026-04-16 02:52:42 +00:00

Pull request closed
