feat: measurer.py — compounding intelligence metrics engine #30

Rockachopa · 2026-04-14T18:18:03Z

Rockachopa commented

2026-04-14 18:18:03 +00:00

Closes #14

What

Metrics engine that computes 7 intelligence metrics from session transcripts and the knowledge store.

Metrics

| Metric | Source | Direction |
|--------|--------|-----------|
| Knowledge velocity | index.json | Up = good |
| Knowledge coverage | YAML files | Up = good |
| Hit rate | Session messages | Up = good |
| Error recurrence | Session messages | Down = good |
| Task completion | Session end_reason | Up = good |
| First-try success | tool/msg ratio | Up = good |
| Knowledge age | last_confirmed dates | Up = good |

Features

- `--format json|markdown` output
- `--since YYYY-MM-DD` time range filter
- `--repo NAME` per-domain metrics
- `--save-snapshot` persists for 7-day trend tracking
- Runs in 33ms on 11.9K sessions / 192K messages
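
For readers who have not opened the diff, the flags above could be wired up roughly as follows. This is a minimal sketch and not the actual measurer.py source: the default output format, the help strings, and the repo name `hermes` in the usage line are assumptions made for illustration.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags listed in the PR description."""
    parser = argparse.ArgumentParser(prog="measurer.py")
    # Default of "markdown" is an assumption; the PR does not state a default.
    parser.add_argument("--format", choices=["json", "markdown"], default="markdown",
                        help="output format")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD.
    parser.add_argument("--since", type=date.fromisoformat, default=None,
                        help="only include sessions on or after this date")
    parser.add_argument("--repo", default=None,
                        help="restrict metrics to one repo/domain")
    parser.add_argument("--save-snapshot", action="store_true",
                        help="persist results for 7-day trend tracking")
    return parser


# Hypothetical invocation; "hermes" is a made-up repo name.
args = build_parser().parse_args(
    ["--format", "json", "--since", "2026-04-01", "--repo", "hermes"]
)
```

Parsing `--since` with `date.fromisoformat` means a malformed date is rejected by argparse up front instead of surfacing later as a bad SQL filter.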

Acceptance Criteria (from issue)

- [x] All 7 metrics computable from session transcripts + knowledge store
- [x] Runs in <10 seconds (33ms actual)
- [x] Outputs JSON for dashboard + markdown for human reading
- [x] Tracks 7-day trend for each metric
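
To make the 7-day trend criterion concrete, one way to derive a trend from saved snapshots is sketched below. It assumes snapshots are keyed by date and hold one numeric value per metric; the function name and the trend labels are hypothetical and may not match what measurer.py does.

```python
from datetime import date, timedelta
from typing import Dict


def seven_day_trend(snapshots: Dict[date, float], today: date,
                    higher_is_better: bool = True) -> str:
    """Compare today's value with the snapshot taken exactly 7 days earlier."""
    current = snapshots.get(today)
    baseline = snapshots.get(today - timedelta(days=7))
    if current is None or baseline is None:
        return "unknown"  # not enough history yet
    if current == baseline:
        return "flat"
    # For "Down = good" metrics such as error recurrence, a drop is an improvement.
    improved = (current > baseline) == higher_is_better
    return "improving" if improved else "regressing"
```

The `higher_is_better` switch mirrors the Direction column in the metrics table, so error recurrence can share the same code path as the "Up = good" metrics.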

Note

Also includes knowledge store schema (SCHEMA.md) and YAML files that were on this branch from other work.

Rockachopa added 1 commit 2026-04-14 18:18:04 +00:00

feat: measurer.py — compounding intelligence metrics engine bf003cd944

Implements issue #14: 7 metrics that prove knowledge compounding.

Metrics:
- Knowledge velocity: new facts/day (from index.json)
- Knowledge coverage: % domains with 10+ facts (from YAML files)
- Hit rate: % sessions referencing bootstrap knowledge
- Error recurrence: same errors across sessions (should decrease)
- Task completion: % sessions with successful end_reason
- First-try success: actions without backtracking (tool/msg ratio)
- Knowledge age: staleness of facts (freshness score)

Data sources:
- knowledge/index.json + YAML files for fact metrics
- ~/.hermes/state.db sessions + messages tables

Features:
- JSON and markdown output formats
- --since, --repo, --format flags
- 7-day trend tracking via snapshot persistence
- Runs in 33ms on 11.9K sessions / 192K messages
- Dashboard auto-generation with --save-snapshot

Closes #14
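
As a reading aid for the first two metric definitions in the commit message (new facts/day, and % of domains with 10+ facts), here is a small sketch. The fact fields `created` and `domain` are assumed names, not confirmed index.json or YAML fields, and the real implementation may differ.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def knowledge_velocity(facts: List[Dict[str, str]], since: date, until: date) -> float:
    """New facts per day over the inclusive window [since, until]."""
    days = (until - since).days + 1
    # "created" is a hypothetical ISO-date field on each fact.
    new = sum(1 for f in facts if since <= date.fromisoformat(f["created"]) <= until)
    return new / days


def knowledge_coverage(facts: List[Dict[str, str]], min_facts: int = 10) -> float:
    """Percentage of domains that hold at least min_facts facts."""
    per_domain = Counter(f["domain"] for f in facts)  # "domain" is a hypothetical field
    if not per_domain:
        return 0.0
    covered = sum(1 for count in per_domain.values() if count >= min_facts)
    return 100.0 * covered / len(per_domain)
```

Note that this version only sees domains with at least one fact; if empty domains should count against coverage, the domain list would have to come from somewhere other than the facts themselves.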

Timmy referenced this pull request

2026-04-14 19:14:18 +00:00

feat(#10): knowledge file format schema + example knowledge files #31

Timmy requested changes 2026-04-14 19:14:19 +00:00

Dismissed

Timmy left a comment

Changes requested.

1. This PR creates SCHEMA.md, which conflicts with #31. #31 includes the same schema plus example knowledge files, making it the more complete PR. If #31 merges first, this PR's SCHEMA.md will conflict.
2. The measurer.py component needs to be reviewed separately; recommend splitting it into its own PR without the schema file, since #31 covers the schema better.
3. The knowledge files (pitfalls.yaml, patterns.yaml) overlap with #31's examples.

Recommend: Rebase after #31 merges, remove the SCHEMA.md and duplicate knowledge files, and keep only the measurer.py component.

Timmy reviewed 2026-04-14 22:12:52 +00:00

Timmy left a comment

Review: measurer.py — Compounding Intelligence Metrics Engine

Overall: Good metrics engine design with 7 well-chosen metrics. The SCHEMA.md included here is a superset of the one in PR #31 — these PRs have overlapping content that needs coordination.

Strengths:

- 7 metrics cover the key dimensions of intelligence compounding (velocity, coverage, hit rate, error recurrence, task completion, first-try success, knowledge age)
- Clean separation between metric computation and reporting
- JSON output format is machine-consumable for dashboards

Issues:

1. Overlapping content with PR #31: this PR includes its own version of SCHEMA.md and the same pitfalls.yaml. If PR #31 merges first, this PR will have merge conflicts. Coordinate merge order or rebase.
2. The diff is very large (1190 lines): it includes knowledge files that belong in the schema PR (#31), not the metrics PR. The measurer.py should be the only new file here.
3. Metrics rely on index.json structure: ensure the index.json schema is stable before merging the measurer. If the schema changes (e.g., field renames), the measurer breaks silently.

Recommendation: Rebase onto PR #31 after it merges, then the diff will be just measurer.py. This will be much cleaner to review and merge.
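
On the index.json concern above, one option is to validate the index on load so a field rename fails loudly rather than silently skewing the metrics. The sketch below assumes a top-level `facts` list and the required keys `id`, `domain`, and `last_confirmed`; only `last_confirmed` appears in this PR, so treat the rest as placeholders until the schema settles.

```python
import json
from typing import Any, Dict, List

# Hypothetical required keys; adjust once the index.json schema is final.
REQUIRED_FACT_FIELDS = ("id", "domain", "last_confirmed")


class IndexSchemaError(ValueError):
    """Raised when index.json does not have the shape the measurer expects."""


def load_facts(raw: str) -> List[Dict[str, Any]]:
    """Parse index.json text and check every fact before computing metrics."""
    data = json.loads(raw)
    facts = data.get("facts") if isinstance(data, dict) else None
    if not isinstance(facts, list):
        raise IndexSchemaError("index.json: expected a top-level 'facts' list")
    for i, fact in enumerate(facts):
        if not isinstance(fact, dict):
            raise IndexSchemaError(f"index.json: fact #{i} is not an object")
        missing = [key for key in REQUIRED_FACT_FIELDS if key not in fact]
        if missing:
            raise IndexSchemaError(f"index.json: fact #{i} is missing {missing}")
    return facts
```

A check like this costs one pass over the facts and turns a schema drift into an immediate, attributable error.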

Timmy referenced this pull request

2026-04-14 22:13:09 +00:00

feat: Add session harvester with auto-harvest cron (#9) #29

Timmy requested changes 2026-04-15 00:19:02 +00:00

Timmy left a comment

This PR has significant overlap with PR #31 (both add SCHEMA.md, YAML knowledge files, index.json, validate_knowledge.py, and repo YAML files). Beyond the shared content, it adds measurer.py and metrics output. Reviewing the measurer-specific additions:

Metrics dashboard/snapshot: The generated metrics look reasonable, but the hit_rate metric shows 8058/11922 sessions (about 67.6%), which seems implausibly high for a system described as early-stage. This suggests the hit_rate calculation may be matching too broadly (e.g., counting any session that contains words that happen to appear in the knowledge store).

Conflict concern: This PR will have massive merge conflicts with PR #31 since both modify the same files (SCHEMA.md, index.json, all YAML files). These should be coordinated — one should merge first and the other should rebase.

Missing measurer.py source: The diff includes the output files (dashboard.md, latest_snapshot.json), but the measurer.py script itself still needs to be reviewed for correctness of the metric calculations. The metrics output format is clean and well-structured.

Requesting changes primarily due to the merge conflict issue with #31 — please coordinate the merge order and rebase whichever lands second.
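
To illustrate the hit_rate concern, the sketch below contrasts a broad matcher (any shared word counts as a hit) with a stricter one (the whole fact phrase must appear). Neither is taken from measurer.py, which was not available for review here; the function names and sample strings are hypothetical.

```python
import re
from typing import Callable, Iterable, List


def broad_hit(session_text: str, facts: List[str]) -> bool:
    """Hit if the session shares any single word with any fact (over-matches)."""
    words = set(re.findall(r"\w+", session_text.lower()))
    return any(words & set(re.findall(r"\w+", fact.lower())) for fact in facts)


def strict_hit(session_text: str, facts: List[str]) -> bool:
    """Hit only if a whole fact phrase appears in the session on word boundaries."""
    text = session_text.lower()
    return any(
        re.search(r"\b" + re.escape(fact.lower()) + r"\b", text) is not None
        for fact in facts
    )


def hit_rate(sessions: Iterable[str], facts: List[str],
             matcher: Callable[[str, List[str]], bool]) -> float:
    """Fraction of sessions for which the matcher reports a hit."""
    sessions = list(sessions)
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if matcher(s, facts)) / len(sessions)
```

With the single fact "rebase before merge", a session that says only "merge the docs" counts as a hit under `broad_hit` but not under `strict_hit`, which is the kind of inflation that could explain a rate this high.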

Timmy referenced this pull request

2026-04-15 00:19:44 +00:00

fix(#19): Migrate MemPalace + fact_store into knowledge store #27

Rockachopa commented

2026-04-16 01:55:26 +00:00

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Rockachopa closed this pull request

2026-04-16 01:55:28 +00:00

Rockachopa commented

2026-04-16 01:55:38 +00:00

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.