Knowledge Mine: Parallel Batch Harvester — Process 20K Sessions #24

Open
opened 2026-04-14 17:57:27 +00:00 by claude · 0 comments

Knowledge Mine — Sub-Issue

The core engine. Read sessions, extract knowledge, store it.

Task

Build ~/.hermes/pipelines/knowledge-mine.py:

  1. Load session scores from session-sampler
  2. For each session (high-value first):
    • Read .jsonl transcript
    • Send to LLM for knowledge extraction (prompt: extract decisions, patterns, errors, tool quirks, user preferences, environment facts)
    • Parse structured output
    • Store in ~/.hermes/knowledge/sessions/{session_id}.json
  3. Run 10 parallel workers
  4. Track progress and resume after failure
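
The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the intended implementation: the score format (a dict of session ID to score), the transcript directory layout, and the names `run`, `process_session`, and `extract_knowledge` are hypothetical, and `extract_knowledge` is a stand-in for the real LLM call. Resume is handled here by skipping any session whose output file already exists.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def extract_knowledge(transcript_lines):
    """Hypothetical stand-in for the LLM call; returns a structured dict."""
    return {"decisions": [], "patterns": [], "errors": [], "tool_quirks": [],
            "message_count": len(transcript_lines)}


def process_session(session_id, transcript_dir, out_dir):
    """Read one .jsonl transcript, extract knowledge, write {session_id}.json."""
    path = transcript_dir / f"{session_id}.jsonl"
    lines = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    knowledge = extract_knowledge(lines)
    tmp = out_dir / f"{session_id}.json.tmp"
    tmp.write_text(json.dumps(knowledge))
    # Rename only after a full write, so a crash never leaves a partial .json file.
    tmp.replace(out_dir / f"{session_id}.json")
    return session_id


def run(scores, transcript_dir, out_dir, workers=10):
    """Process sessions highest score first, skipping ones already stored."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done = {p.stem for p in out_dir.glob("*.json")}  # resume: existing outputs
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    pending = [sid for sid, _ in ordered if sid not in done]
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_session, sid, transcript_dir, out_dir): sid
                   for sid in pending}
        for fut in as_completed(futures):
            try:
                fut.result()
            except Exception:
                # One bad session should not stop the batch; retry it on the next run.
                failed.append(futures[fut])
    return pending, failed
```

Threads are used here on the assumption that the work is dominated by waiting on the LLM call. Because failed sessions never get an output file, a second call to `run` picks them up again without extra bookkeeping.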

Knowledge Extraction Prompt

For each session, extract:

  • Decisions: What was decided and why
  • Patterns: Recurring approaches that worked
  • Errors: What went wrong and how it was fixed
  • Tool quirks: Non-obvious behavior of tools/APIs
  • User preferences: Corrections, style, workflow preferences
  • Environment facts: OS, hardware, installed tools, configs
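
One possible shape for the "parse structured output" step, given the six categories above, is sketched below. The key names in `CATEGORIES` and the function `parse_extraction` are assumptions made for illustration; the issue does not specify the JSON schema. The sketch assumes the LLM is asked to reply with a single JSON object holding a list of strings per category.

```python
import json

# Hypothetical key names, one per category listed in the extraction prompt.
CATEGORIES = ("decisions", "patterns", "errors", "tool_quirks",
              "user_preferences", "environment_facts")


def parse_extraction(raw):
    """Parse an LLM reply into a dict with a list of strings per category.

    Raises ValueError if the reply is not a JSON object or a category is not
    a list. Missing categories become empty lists; unknown keys are dropped.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    result = {}
    for key in CATEGORIES:
        items = data.get(key, [])
        if not isinstance(items, list):
            raise ValueError(f"{key!r} must be a list")
        # Normalise entries to stripped strings and drop blank ones.
        result[key] = [str(item).strip() for item in items if str(item).strip()]
    return result
```

Rejecting malformed replies with an exception lets the caller treat them like any other per-session failure and retry later.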

Acceptance Criteria

  • Processes 20K sessions in <6 hours with 10 workers
  • Each session produces a structured JSON knowledge file
  • Resumable (can restart from last processed session)
  • Token budget tracked and reported
  • Output quality spot-checked on 100 random sessions
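
As a rough check on the first and last criteria, the arithmetic below works out what the 6-hour target implies per worker, and shows one way to draw the 100-session spot-check sample. The function name `spot_check_sample` and the fixed seed are assumptions; the calculation also assumes work is spread evenly across workers.

```python
import random

SESSIONS = 20_000
WORKERS = 10
DEADLINE_HOURS = 6

# Aggregate rate needed to finish inside the deadline.
sessions_per_second = SESSIONS / (DEADLINE_HOURS * 3600)

# Each worker handles SESSIONS / WORKERS sessions, so its average time
# budget per session (read + LLM call + parse + write) is:
seconds_per_session = DEADLINE_HOURS * 3600 * WORKERS / SESSIONS


def spot_check_sample(session_ids, k=100, seed=0):
    """Pick k distinct session IDs at random; the seed makes the pick repeatable."""
    ids = list(session_ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))
```

Under these assumptions each worker has to average under 10.8 seconds per session, which is about 0.93 sessions per second across the pool.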

Tokens: ~200M (10K per session × 20K sessions)
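
To make "token budget tracked and reported" concrete, here is a minimal sketch of a counter that parallel workers could share. The class `TokenBudget` and its method names are hypothetical, and the per-session token count is assumed to come from whatever usage figure the LLM client reports.

```python
import threading

TOKENS_PER_SESSION = 10_000   # estimate from the issue
SESSION_COUNT = 20_000
TOTAL_BUDGET = TOKENS_PER_SESSION * SESSION_COUNT  # 200_000_000, i.e. ~200M


class TokenBudget:
    """Thread-safe running total of tokens spent against a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.sessions = 0
        self._lock = threading.Lock()

    def record(self, tokens):
        """Add one session's token usage to the running total."""
        with self._lock:
            self.used += tokens
            self.sessions += 1

    def report(self):
        """Return totals plus the observed average tokens per session."""
        with self._lock:
            avg = self.used / self.sessions if self.sessions else 0.0
            return {"used": self.used,
                    "remaining": self.limit - self.used,
                    "sessions": self.sessions,
                    "avg_per_session": avg}
```

Comparing `avg_per_session` with the 10K estimate early in a run gives a quick signal of whether the ~200M total is realistic.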

Depends On: Knowledge Mine: Session Sampler

claude added the batch-pipeline and harvester labels 2026-04-14 17:57:27 +00:00
hermes was assigned by Rockachopa 2026-04-15 01:50:32 +00:00

Reference: Timmy_Foundation/compounding-intelligence#24