Add session_metadata.py — structured session metadata extractor (#6) #52
Closed
Rockachopa wants to merge 0 commits from feat/session-metadata into main
Reference: Timmy_Foundation/compounding-intelligence#52
Closes #6
Adds session_metadata.py module that works alongside the existing session_reader.py to provide structured session metadata extraction.
Design
This module complements (does not replace) session_reader.py:
Features
python session_metadata.py path/to/session.jsonl
Compatibility
Review: session_metadata.py (#6)
Overall this is a clean, well-structured metadata extractor. A few issues to address:
Bug: read_session import mismatch. from session_reader import read_session imports a bare function, but session_reader.py (from PR #29) defines read_session as a method on the SessionReader class, not as a module-level function. This import will fail at runtime with ImportError. Either add a module-level read_session() wrapper to session_reader.py, or change this to instantiate SessionReader and call reader.read_session(path).
Bug: read_session return type assumption. The code assumes read_session() returns a flat list of message dicts, but SessionReader.read_session() returns a dict with a messages key. The subsequent for entry in messages loop would iterate over the dict's keys instead of the message objects.
Minor: duplicate import re. re is imported at the top of the file and again inside extract_session_metadata in the regex blocks. Remove the inner imports.
Minor: timestamp parsing. The timestamp parsing loop with multiple strptime formats is fine, but it could use datetime.fromisoformat() as a fallback (as session_reader.py does) to handle timezone-aware ISO strings.
Positive:
The import/API mismatch with session_reader is a blocking issue — this will crash on any invocation. Please fix the integration with SessionReader before merge.
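A minimal sketch of one way to address the two integration bugs and the timestamp suggestion, assuming (as the review states) that SessionReader.read_session(path) returns a dict with a messages key. The SessionReader class below is a hypothetical stand-in so the snippet runs on its own; in the real file you would import SessionReader from session_reader instead. The format list in STRPTIME_FORMATS and the parse_timestamp helper are also illustrative, not the module's actual code.

```python
from __future__ import annotations

import json
from datetime import datetime
from typing import Any, Optional


class SessionReader:
    """Hypothetical stand-in for session_reader.SessionReader (PR #29).

    Assumed contract, per the review: read_session(path) returns a dict
    with a "messages" key holding the parsed JSONL entries.
    """

    def read_session(self, path: str) -> dict[str, Any]:
        messages = []
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines between records
                    messages.append(json.loads(line))
        return {"path": path, "messages": messages}


def read_session(path: str) -> list[dict[str, Any]]:
    """Module-level wrapper: instantiate the reader and unwrap the message list.

    Returning result["messages"] means a caller's loop iterates over
    message dicts rather than over the keys of the wrapper dict.
    """
    result = SessionReader().read_session(path)
    return result.get("messages", [])


# Hypothetical format list; the real module's formats may differ.
STRPTIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def parse_timestamp(raw: str) -> Optional[datetime]:
    """Try explicit strptime formats first, then fall back to fromisoformat()."""
    for fmt in STRPTIME_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    try:
        # Before Python 3.11, fromisoformat() rejects a trailing "Z",
        # so normalise it to an explicit UTC offset first.
        iso = raw[:-1] + "+00:00" if raw.endswith("Z") else raw
        return datetime.fromisoformat(iso)  # keeps any timezone offset
    except ValueError:
        return None
```

Putting strptime first keeps existing behaviour for the formats the module already handles; fromisoformat() only runs when none of them match, which is where offsets such as "+02:00" end up.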
Review: session_metadata.py
Overall: Well-structured module that complements session_reader.py with higher-level metadata extraction. Good separation of concerns.
Strengths:
- SessionSummary: good use of typed fields with sensible defaults.
Issues to address:
1. len(content.split()) counts words, not tokens. The comment says ~4 chars/token, but the code counts words. Either use len(content) // 4 or rename the field to word_count_estimate.
2. The break after finding a repo match means that if the first content match is a false positive (e.g., a URL in an error message), the real repo is never found. Consider scanning all messages and taking the most frequent match.
3. __main__ guard: the script has CLI functionality but no if __name__ == "__main__": guard, which means importing it will trigger argparse.
4. total_tokens_estimate counted as words: the field name says tokens, but the accumulator counts split() results (words). This will undercount by ~1.3x.
Minor:
- The asdict import is unused in the visible code; confirm it is used in the full file or remove it.
- to_json() convenience method on SessionSummary.
Approve with the token estimation fix.
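A minimal sketch of the token-estimation, repo-detection, and to_json() suggestions together. Everything here is hypothetical: REPO_PATTERN, the subset of SessionSummary fields, and the summarize() helper are illustrative names, not the module's real code. It uses the chars/4 heuristic from the review and a collections.Counter to pick the most frequent repo match, and gives the asdict import a use inside to_json().

```python
from __future__ import annotations

import json
import re
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Any, Iterable, Optional

# Hypothetical pattern: captures "owner/repo" immediately before ".git".
REPO_PATTERN = re.compile(r"([\w.-]+/[\w.-]+)\.git\b")


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters-per-token heuristic."""
    return len(text) // 4


def detect_repo(messages: Iterable[dict[str, Any]]) -> Optional[str]:
    """Count repo matches across all messages and return the most frequent.

    Scanning everything (instead of breaking on the first hit) means a single
    false positive, such as a URL quoted in an error, cannot hide the real repo.
    """
    counts: Counter[str] = Counter()
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            counts.update(REPO_PATTERN.findall(content))
    if not counts:
        return None
    # Ties go to the match seen first (most_common keeps insertion order).
    return counts.most_common(1)[0][0]


@dataclass
class SessionSummary:
    """Hypothetical subset of fields; the real dataclass likely has more."""

    session_id: str = ""
    repo: Optional[str] = None
    message_count: int = 0
    total_tokens_estimate: int = 0
    errors: list[str] = field(default_factory=list)

    def to_json(self, indent: int = 2) -> str:
        # asdict() recursively converts the dataclass (and nested lists) to plain data.
        return json.dumps(asdict(self), indent=indent)


def summarize(session_id: str, messages: list[dict[str, Any]]) -> SessionSummary:
    """Build a summary with chars/4 token estimates and a frequency-based repo scan."""
    summary = SessionSummary(session_id=session_id, message_count=len(messages))
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            summary.total_tokens_estimate += estimate_tokens(content)
    summary.repo = detect_repo(messages)
    return summary
```

With a first-match break, a session whose first hit is a repo URL inside an error message would report that repo; counting every match lets the repo the session actually worked in win.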
The session_metadata.py module is well-structured with a clear dataclass, proper error handling for missing files, and sensible defaults. A few issues to address:
1. Import dependency: from session_reader import read_session will fail unless session_reader.py is on sys.path. The script does not add its own directory to sys.path as other scripts in this repo do (e.g., harvester.py uses sys.path.insert(0, str(SCRIPT_DIR))). This will cause an ImportError when run from outside the scripts/ directory.
2. Outcome classification is fragile: checking for words like "done" or "completed" in the last assistant message is prone to false positives (e.g., "I could not get this done"). Consider more robust heuristics, or at minimum check for negative modifiers.
3. Error message truncation inconsistency: errors are filtered to len(error_msg) < 200 and then truncated to [:200], so the filter makes the truncation redundant. Pick one approach.
4. Token estimation by word count: total_tokens += len(content.split()) counts words, not tokens. The field is named total_tokens_estimate, which is fine, but word count significantly underestimates token count (typically by a factor of about 1.3). Consider using a chars/4 heuristic for consistency with bootstrapper.py.
The import path issue (item 1) is a real bug that will break usage. Please fix that before merge.
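A minimal sketch of the outcome-classification and error-truncation suggestions, under stated assumptions: the SUCCESS_WORDS and NEGATIONS vocabularies, the four-word look-back window, the "success"/"failure"/"unknown" labels, and the helper names classify_outcome and normalize_error are all hypothetical and would need tuning against real transcripts. A success word only counts when no negation appears shortly before it.

```python
from __future__ import annotations

import re

# Hypothetical vocabularies; tune them against real transcripts.
SUCCESS_WORDS = {"done", "completed", "finished", "fixed", "merged"}
NEGATIONS = {"not", "no", "never", "cannot", "can't", "couldn't", "didn't",
             "wasn't", "isn't", "unable", "failed"}
MAX_ERROR_LEN = 200

_WORD_RE = re.compile(r"[a-z']+")


def classify_outcome(last_assistant_message: str, window: int = 4) -> str:
    """Return "success", "failure", or "unknown" (hypothetical labels).

    A success word counts as success only when none of the `window` words
    before it is a negation, so "I could not get this done" is a failure.
    """
    # Lowercase, split into word tokens, and strip stray quote characters.
    words = [w.strip("'") for w in _WORD_RE.findall(last_assistant_message.lower())]
    saw_negated = False
    for i, word in enumerate(words):
        if word in SUCCESS_WORDS:
            preceding = words[max(0, i - window):i]
            if any(w in NEGATIONS for w in preceding):
                saw_negated = True  # remember it, but keep looking for a clean hit
            else:
                return "success"
    return "failure" if saw_negated else "unknown"


def normalize_error(error_msg: str, max_len: int = MAX_ERROR_LEN) -> str:
    """Keep every error but cap its length (truncate rather than filter)."""
    if len(error_msg) <= max_len:
        return error_msg
    return error_msg[: max_len - 3] + "..."
```

Truncating instead of filtering is one way to resolve item 3: long errors such as stack traces stay visible in the summary instead of being silently dropped.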
Closing: unmergeable due to conflicts or branch protection. Reopen if needed.
Pull request closed