[Harvester] Build harvester.py — extract knowledge from a single session #8

Open
opened 2026-04-14 15:15:17 +00:00 by Timmy · 1 comment
Owner

Epic: #2 (Session Harvester)

Task

Build the harvester script that combines session_reader + extraction prompt.

Flow

  1. Read session JSONL with session_reader.py
  2. Truncate to fit context window (keep first 50 + last 50 messages for long sessions)
  3. Run extraction prompt through local mimo
  4. Parse structured output
  5. Deduplicate against existing knowledge (check knowledge/ dir)
  6. Write new facts to knowledge store
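
Steps 2 and 5 above could be sketched roughly as follows. This is a minimal illustration, not the code on the branch: the function names (`truncate_messages`, `fact_key`, `filter_new_facts`) are hypothetical, and it assumes facts can be compared as plain strings after whitespace and case normalization. The real knowledge store format is not specified in this issue.

```python
import hashlib
import re
from typing import List, Sequence


def truncate_messages(messages: Sequence, head: int = 50, tail: int = 50) -> list:
    """Keep the first `head` and last `tail` messages of a long session.

    Sessions with at most head + tail messages are returned unchanged.
    """
    if len(messages) <= head + tail:
        return list(messages)
    # Slice from an explicit index so that tail=0 yields an empty tail.
    return list(messages[:head]) + list(messages[len(messages) - tail:])


def fact_key(text: str) -> str:
    """Hash a fact after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_new_facts(candidates: List[str], existing: List[str]) -> List[str]:
    """Return candidates that match neither an existing fact nor an earlier candidate."""
    seen = {fact_key(fact) for fact in existing}
    fresh = []
    for fact in candidates:
        key = fact_key(fact)
        if key not in seen:
            seen.add(key)
            fresh.append(fact)
    return fresh
```

Exact-match hashing only catches near-identical wording; paraphrased duplicates would need a fuzzier comparison, which this sketch does not attempt.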

Interface

python3 harvester.py --session ~/.hermes/sessions/session_xxx.jsonl --output knowledge/
python3 harvester.py --batch --since 2026-04-01 --limit 100
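
One way the two invocations above might map onto `argparse` is sketched below. The flag names come from the commands in this issue; everything else is an assumption: that `--session` and `--batch` are mutually exclusive, that `--output` defaults to `knowledge/` when omitted, and that `--since` is an ISO date.

```python
import argparse
from datetime import date
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser for the two modes shown in the issue (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="harvester.py",
        description="Extract knowledge from session JSONL files.",
    )
    # Assumption: exactly one of --session / --batch must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--session", type=Path, help="path to a single session JSONL file")
    mode.add_argument("--batch", action="store_true", help="process multiple sessions")
    # Assumption: knowledge/ is the default output directory.
    parser.add_argument("--output", type=Path, default=Path("knowledge"),
                        help="knowledge store directory")
    # Assumption: --since takes an ISO date such as 2026-04-01.
    parser.add_argument("--since", type=date.fromisoformat,
                        help="batch mode: only sessions on or after this date")
    parser.add_argument("--limit", type=int,
                        help="batch mode: maximum number of sessions to process")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Using `date.fromisoformat` as the `type` lets `argparse` reject a malformed date with a usage error instead of a traceback.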

Acceptance Criteria

  • Processes one session in <30 seconds
  • Deduplicates against existing knowledge
  • Writes to correct knowledge/ subdirectory
  • Handles extraction failures gracefully (logs, doesn't crash)
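
The last criterion (log, don't crash) and the 30-second budget could be approached along these lines. This is a sketch under stated assumptions: the issue does not say what the structured output looks like, so a JSON array of objects is assumed here, and `parse_extraction`, `harvest_safely`, and the `extract` callable are hypothetical names standing in for the real model call.

```python
import json
import logging
import time
from typing import Callable, List

logger = logging.getLogger("harvester")

TIME_BUDGET_SECONDS = 30  # from the acceptance criteria


def parse_extraction(raw) -> List[dict]:
    """Parse model output, returning [] instead of raising on bad input.

    Assumption: the extraction prompt yields a JSON array of objects.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError) as exc:
        logger.warning("extraction output was not valid JSON: %s", exc)
        return []
    if not isinstance(data, list):
        logger.warning("expected a JSON array, got %s", type(data).__name__)
        return []
    # Drop entries that are not objects rather than failing the whole session.
    return [item for item in data if isinstance(item, dict)]


def harvest_safely(session_path: str, extract: Callable[[str], str]) -> List[dict]:
    """Run one extraction, logging failures and budget overruns without crashing."""
    start = time.monotonic()
    try:
        raw = extract(session_path)
    except Exception:
        logger.exception("extraction failed for %s", session_path)
        return []
    facts = parse_extraction(raw)
    elapsed = time.monotonic() - start
    if elapsed > TIME_BUDGET_SECONDS:
        logger.warning("%s took %.1fs, over the %ds budget",
                       session_path, elapsed, TIME_BUDGET_SECONDS)
    return facts
```

Returning an empty list on failure lets a batch run continue with the next session; the log records which sessions need another look.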
Timmy added the harvester and milestone:1 labels 2026-04-14 15:15:17 +00:00
hermes was assigned by Rockachopa 2026-04-15 01:50:45 +00:00
Author
Owner

Starting Work on Issue #8

I'm picking up this issue to complete the harvester implementation.

Context

  • Issue #7 (knowledge extraction prompt) is complete
  • harvester.py exists in burn/8-harvester-py branch (PR #20 was closed)
  • Need to merge harvester.py and ensure it works with mimo-v2-pro

Plan

  1. Review existing harvester.py code
  2. Test with the 5 test sessions from issue #7
  3. Ensure acceptance criteria are met:
    • Processes one session in <30 seconds
    • Deduplicates against existing knowledge
    • Writes to correct knowledge/ subdirectory
    • Handles extraction failures gracefully

Next Steps

  • Run harvester.py against test sessions
  • Validate output format
  • Create PR to merge to main

Starting now.

Reference: Timmy_Foundation/compounding-intelligence#8