feat: Session transcript harvester — extract Q&A, decisions, patterns #195

Open
opened 2026-04-15 15:18:01 +00:00 by Rockachopa · 0 comments
Owner

Epic: #136 (Knowledge Pipeline v2)

Task

Build a harvester that processes hermes session JSONL files and extracts durable knowledge.

Extraction Categories

  1. Q&A pairs: user question → agent answer (for training data)
  2. Decisions: "we decided to X because Y" (for decision log)
  3. Patterns: "the fix for Z is to do W" (for skill generation)
  4. Preferences: "Alexander prefers/always/never" (for user profile)
  5. Errors & fixes: "error X was fixed by Y" (for debugging knowledge)

Implementation

def harvest_session(session_path: str) -> List[KnowledgeEntry]:
    messages = parse_jsonl(session_path)
    entries = []
    for i, msg in enumerate(messages):
        if is_qa_pair(messages, i):
            entries.append(extract_qa(messages, i))
        if is_decision(messages, i):
            entries.append(extract_decision(messages, i))
        # ... etc
    return entries

Deliverables

  • Harvester script: compounding-intelligence/harvester.py
  • Run against 50 most recent sessions
  • Output: knowledge entries in structured JSON
  • Report: counts per category, sample entries

Labels: harvester, extraction, priority:critical

## Epic: #136 (Knowledge Pipeline v2) ### Task Build a harvester that processes hermes session JSONL files and extracts durable knowledge. ### Extraction Categories 1. **Q&A pairs**: user question → agent answer (for training data) 2. **Decisions**: "we decided to X because Y" (for decision log) 3. **Patterns**: "the fix for Z is to do W" (for skill generation) 4. **Preferences**: "Alexander prefers/always/never" (for user profile) 5. **Errors & fixes**: "error X was fixed by Y" (for debugging knowledge) ### Implementation ```python def harvest_session(session_path: str) -> List[KnowledgeEntry]: messages = parse_jsonl(session_path) entries = [] for i, msg in enumerate(messages): if is_qa_pair(messages, i): entries.append(extract_qa(messages, i)) if is_decision(messages, i): entries.append(extract_decision(messages, i)) # ... etc return entries ``` ### Deliverables - [ ] Harvester script: `compounding-intelligence/harvester.py` - [ ] Run against 50 most recent sessions - [ ] Output: knowledge entries in structured JSON - [ ] Report: counts per category, sample entries ### Labels: harvester, extraction, priority:critical
codex-agent was assigned by Rockachopa 2026-04-17 01:34:51 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/compounding-intelligence#195