feat(session): add Session Knowledge Extractor for entity/relationship harvesting #258

Open
Rockachopa wants to merge 1 commit from step35/148-8-5-session-knowledge-extrac into main
Owner

This PR implements the Session Knowledge Extractor (issue #148).

What it does

  • scripts/session_knowledge_extractor.py: New module that parses Hermes session JSONL transcripts and extracts session-level entities and relationships
  • Extracts agent, task, tools used, and outcome from each session
  • Uses an LLM (configurable, default xiaomi/mimo-v2-pro) with a focused prompt to generate 10+ knowledge facts per session
  • Writes to the knowledge store (knowledge/index.json + per-repo markdown)
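
The parsing and entity-extraction steps listed above could look roughly like the sketch below. It is not the code in scripts/session_knowledge_extractor.py: the record fields (`agent`, `role`, `content`, `tool_calls`, `outcome`) and both function names are assumptions made for illustration, since the PR description does not show the Hermes JSONL schema.

```python
import json


def parse_session_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank or malformed lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupted line
        if isinstance(obj, dict):
            records.append(obj)
    return records


def extract_session_entities(records):
    """Collect agent, task, tools used, and outcome from parsed records.

    All field names below are hypothetical; adjust them to the real schema.
    """
    agent = None
    task = None
    tools = []
    outcome = "unknown"
    for rec in records:
        if agent is None and rec.get("agent"):
            agent = rec["agent"]
        # Assumption: the first user message states the task.
        if task is None and rec.get("role") == "user" and rec.get("content"):
            task = rec["content"]
        for call in rec.get("tool_calls") or []:
            name = call.get("name") if isinstance(call, dict) else None
            if name and name not in tools:
                tools.append(name)  # keep first-seen order, no duplicates
        # Assumption: the last record carrying an outcome wins.
        if rec.get("outcome"):
            outcome = rec["outcome"]
    return {"agent": agent, "task": task, "tools": tools, "outcome": outcome}
```

Skipping malformed lines rather than raising is a design choice for the sketch: one bad line in a transcript should not stop the rest of the session from being harvested.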

Files added

  • scripts/session_knowledge_extractor.py — ~450 lines, complete standalone extractor
  • templates/session-entity-prompt.md — LLM prompt for entity/relationship extraction
  • scripts/test_session_knowledge_extractor.py — Smoke test (no LLM) validating the full pipeline

Acceptance criteria

  • Parses session JSONL
  • Extracts: agent, task, tools used, outcome
  • Creates session entities and relationships (as facts in knowledge store)
  • 10+ facts per session (smoke test verifies 12 facts produced)

Closes #148

Rockachopa added 1 commit 2026-04-26 11:28:38 +00:00
- scripts/session_knowledge_extractor.py: new module that parses session
  JSONL, extracts agent/task/tools/outcome, and generates 10+ facts via LLM
- templates/session-entity-prompt.md: focused prompt for session entities
- scripts/test_session_knowledge_extractor.py: smoke test (no LLM) verifying
  10+ facts per session, entity extraction, dedup, store roundtrip
- Extracts session entities (agent, task, tools used, outcome) and writes
  relationships to knowledge/index.json and per-repo markdown files
- Target: 10+ knowledge facts per non-trivial session transcript
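
The dedup and store-roundtrip behaviour that the smoke test is described as verifying might be sketched as follows. The index layout (`{"facts": [...]}` with a `fact` string per entry) and the helper names `dedup_facts` and `merge_into_index` are hypothetical; the actual format of knowledge/index.json is not shown in this PR.

```python
import json
from pathlib import Path


def dedup_facts(facts):
    """Drop facts whose text matches an earlier one, ignoring case and spacing."""
    seen = set()
    unique = []
    for fact in facts:
        key = " ".join(str(fact.get("fact", "")).lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(fact)
    return unique


def merge_into_index(index_path, new_facts):
    """Merge new facts into a JSON index file and return the stored list.

    Assumed layout: {"facts": [{"fact": "..."}, ...]}.
    """
    path = Path(index_path)
    existing = []
    if path.exists():
        existing = json.loads(path.read_text(encoding="utf-8")).get("facts", [])
    merged = dedup_facts(existing + new_facts)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"facts": merged}, indent=2), encoding="utf-8")
    return merged
```

Under these assumptions, running the merge twice with the same facts leaves the index unchanged, which is the property a roundtrip check would assert alongside the 10+ facts threshold.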
Owner

🛡️ Goblin Patrol Alert 🛡️

Hey brother — this PR has been idle for 6 days and is unassigned.

The goblin fleet has been notified. A goblin may claim this if it remains stale.

— Timmy Goblin Wizard King

Some checks failed
Test / pytest (pull_request) Failing after 8s
This pull request can be merged automatically.
This branch is out-of-date with the base branch