feat: measurer.py — compounding intelligence metrics engine #30

Closed
Rockachopa wants to merge 1 commit from fix/14-measurer into main
Owner

Closes #14

What

Metrics engine that computes 7 intelligence metrics from session transcripts and the knowledge store.

Metrics

Metric                     Source                Direction
Knowledge velocity         index.json            Up = good
Knowledge coverage         YAML files            Up = good
Hit rate                   Session messages      Up = good
Error recurrence           Session messages      Down = good
Task completion            Session end_reason    Up = good
First-try success          tool/msg ratio        Up = good
Knowledge age (freshness)  last_confirmed dates  Up = good
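A minimal sketch of the three fact-side metrics follows. It assumes each fact record exposes `domain`, `created`, and `last_confirmed` fields and that the list of domains comes from the YAML file names. Those field names, the 30-day freshness window, and the helper names are hypothetical. The PR only states that velocity is new facts per day, coverage is the share of domains with 10+ facts, and age is reported as a freshness score.

```python
from collections import Counter
from datetime import date

# Hypothetical fact shape: {"domain": str, "created": "YYYY-MM-DD",
# "last_confirmed": "YYYY-MM-DD"}. The real index.json schema may differ.

def knowledge_velocity(facts, since, today):
    """New facts per day over the inclusive window [since, today]."""
    days = (today - since).days + 1
    new = sum(1 for f in facts
              if since <= date.fromisoformat(f["created"]) <= today)
    return new / days

def knowledge_coverage(facts, domains, min_facts=10):
    """Percentage of domains that hold at least `min_facts` facts."""
    counts = Counter(f["domain"] for f in facts)
    covered = sum(1 for d in domains if counts[d] >= min_facts)
    return 100.0 * covered / len(domains) if domains else 0.0

def knowledge_freshness(facts, today, max_age_days=30):
    """Percentage of facts confirmed within the last `max_age_days` days (hypothetical window)."""
    if not facts:
        return 0.0
    fresh = sum(1 for f in facts
                if (today - date.fromisoformat(f["last_confirmed"])).days <= max_age_days)
    return 100.0 * fresh / len(facts)
```

Reporting freshness as a percentage keeps the "Up = good" direction consistent with the other fact metrics.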

Features

  • --format json|markdown output
  • --since YYYY-MM-DD time range filter
  • --repo NAME per-domain metrics
  • --save-snapshot persists for 7-day trend tracking
  • Runs in 33ms on 11.9K sessions / 192K messages
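The four flags could be wired up with the standard-library `argparse` module roughly as follows. This is an illustrative sketch, not the code in measurer.py. The `markdown` default, the help text, and the `render` helper are assumptions.

```python
import argparse
import json
from datetime import date

def parse_args(argv=None):
    # Flag names come from the PR description; defaults and help text are assumptions.
    p = argparse.ArgumentParser(description="Compounding intelligence metrics")
    p.add_argument("--format", choices=["json", "markdown"], default="markdown")
    p.add_argument("--since", type=date.fromisoformat,
                   help="only count data on or after YYYY-MM-DD")
    p.add_argument("--repo", help="restrict metrics to one domain")
    p.add_argument("--save-snapshot", action="store_true",
                   help="persist today's metrics for trend tracking")
    return p.parse_args(argv)

def render(metrics, fmt):
    """Render a flat {name: number} dict as JSON or as a markdown table."""
    if fmt == "json":
        return json.dumps(metrics, indent=2, sort_keys=True)
    rows = ["| Metric | Value |", "|--------|-------|"]
    rows += [f"| {name} | {value:.2f} |" for name, value in sorted(metrics.items())]
    return "\n".join(rows)
```

Passing `date.fromisoformat` as the `type` means a malformed `--since` value is rejected by argparse with a usage error instead of surfacing later as a bad comparison.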

Acceptance Criteria (from issue)

  • [x] All 7 metrics computable from session transcripts + knowledge store
  • [x] Runs in <10 seconds (33ms actual)
  • [x] Outputs JSON for dashboard + markdown for human reading
  • [x] Tracks 7-day trend for each metric
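One way to meet the 7-day trend criterion is to write one JSON snapshot per day and compare today's values with the file from seven days earlier. The Direction column then decides whether a change counts as good. The per-day file naming, the metric keys, and the function names below are hypothetical.

```python
import json
from datetime import timedelta
from pathlib import Path

# +1 means "up = good", -1 means "down = good" (from the Direction column).
# The metric keys themselves are hypothetical.
DIRECTION = {
    "knowledge_velocity": +1,
    "knowledge_coverage": +1,
    "hit_rate": +1,
    "error_recurrence": -1,
    "task_completion": +1,
    "first_try_success": +1,
    "knowledge_age": +1,
}

def save_snapshot(metrics, day, snapshot_dir: Path):
    """Write one JSON file per day, e.g. <snapshot_dir>/2026-04-14.json."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{day.isoformat()}.json"
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))
    return path

def seven_day_trend(metrics, day, snapshot_dir: Path):
    """Compare today's metrics with the snapshot saved seven days earlier."""
    old_path = snapshot_dir / f"{(day - timedelta(days=7)).isoformat()}.json"
    if not old_path.exists():
        return {}
    old = json.loads(old_path.read_text())
    trend = {}
    for name, value in metrics.items():
        if name not in old or name not in DIRECTION:
            continue
        delta = value - old[name]
        signed = delta * DIRECTION[name]  # positive means "moved the good way"
        status = "improving" if signed > 0 else "regressing" if signed < 0 else "flat"
        trend[name] = {"delta": delta, "status": status}
    return trend
```

Keeping the direction table beside the comparison means a falling `error_recurrence` is reported as an improvement instead of every metric being treated as up = good.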

Note

Also includes knowledge store schema (SCHEMA.md) and YAML files that were on this branch from other work.

Rockachopa added 1 commit 2026-04-14 18:18:04 +00:00
Implements issue #14: 7 metrics that prove knowledge compounding.

Metrics:
- Knowledge velocity: new facts/day (from index.json)
- Knowledge coverage: % domains with 10+ facts (from YAML files)
- Hit rate: % sessions referencing bootstrap knowledge
- Error recurrence: same errors across sessions (should decrease)
- Task completion: % sessions with successful end_reason
- First-try success: actions without backtracking (tool/msg ratio)
- Knowledge age: staleness of facts (freshness score)

Data sources:
- knowledge/index.json + YAML files for fact metrics
- ~/.hermes/state.db sessions + messages tables
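For the two session-side metrics that come straight from SQL, a sketch using Python's built-in `sqlite3` module might look like this. The column names, the ISO-text `started_at` format, and the set of "successful" `end_reason` values are assumptions about state.db. Grouping errors by their exact text is also a simplification, because real error messages often embed paths or timestamps.

```python
import sqlite3

# Assumed table layout (the real ~/.hermes/state.db schema may differ):
#   sessions(id, started_at, end_reason)   -- started_at as ISO text
#   messages(session_id, role, content)
SUCCESS_REASONS = ("completed",)  # hypothetical "successful" end_reason values

def task_completion(conn, since=None):
    """Percentage of sessions whose end_reason counts as successful."""
    marks = ",".join("?" * len(SUCCESS_REASONS))
    sql = (f"SELECT COUNT(*), SUM(end_reason IN ({marks})) FROM sessions"
           + (" WHERE started_at >= ?" if since else ""))
    params = [*SUCCESS_REASONS] + ([since] if since else [])
    total, ok = conn.execute(sql, params).fetchone()
    return 100.0 * (ok or 0) / total if total else 0.0

def error_recurrence(conn):
    """Count distinct tool error texts that appear in more than one session."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT content FROM messages"
        "  WHERE role = 'tool' AND content LIKE '%error%'"  # LIKE is case-insensitive for ASCII
        "  GROUP BY content HAVING COUNT(DISTINCT session_id) > 1)"
    ).fetchone()
    return n
```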

Features:
- JSON and markdown output formats
- --since, --repo, --format flags
- 7-day trend tracking via snapshot persistence
- Runs in 33ms on 11.9K sessions / 192K messages
- Dashboard auto-generation with --save-snapshot

Closes #14
Timmy requested changes 2026-04-14 19:14:19 +00:00
Dismissed
Timmy left a comment
Owner

Changes requested.

  1. This PR creates SCHEMA.md which conflicts with #31. #31 includes the same schema plus example knowledge files, making it the more complete PR. If #31 merges first, this PR's SCHEMA.md will conflict.

  2. The measurer.py component needs to be reviewed separately — recommend splitting it into its own PR without the schema file, since #31 covers the schema better.

  3. The knowledge files (pitfalls.yaml, patterns.yaml) overlap with #31's examples.

Recommend: Rebase after #31 merges, remove the SCHEMA.md and duplicate knowledge files, and keep only the measurer.py component.

Timmy reviewed 2026-04-14 22:12:52 +00:00
Timmy left a comment
Owner

Review: measurer.py — Compounding Intelligence Metrics Engine

Overall: Good metrics engine design with 7 well-chosen metrics. The SCHEMA.md included here is a superset of the one in PR #31 — these PRs have overlapping content that needs coordination.

Strengths:

  • 7 metrics cover the key dimensions of intelligence compounding (velocity, coverage, hit rate, error recurrence, task completion, first-try success, knowledge age)
  • Clean separation between metric computation and reporting
  • JSON output format is machine-consumable for dashboards

Issues:

  1. Overlapping content with PR #31 — This PR includes its own version of SCHEMA.md and the same pitfalls.yaml. If PR #31 merges first, this PR will have merge conflicts. Coordinate merge order or rebase.
  2. The diff is very large (1190 lines) — includes knowledge files that belong in the schema PR (#31), not the metrics PR. The measurer.py should be the only new file here.
  3. Metrics rely on index.json structure — ensure the index.json schema is stable before merging the measurer. If the schema changes (e.g., field renames), the measurer breaks silently.
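To turn a silent break into a loud one, the measurer could validate index.json before computing anything. The following is a minimal sketch that assumes a top-level `facts` list whose entries carry the fields named in `REQUIRED_FACT_FIELDS`. That field set is hypothetical and would need to be aligned with the real schema.

```python
import json

# Hypothetical required fields; align this set with the real index.json schema.
REQUIRED_FACT_FIELDS = {"id", "domain", "created", "last_confirmed"}

class SchemaError(ValueError):
    """index.json no longer matches what the measurer expects."""

def load_facts(text):
    """Parse index.json text and fail loudly on structural drift."""
    data = json.loads(text)
    facts = data.get("facts") if isinstance(data, dict) else None
    if not isinstance(facts, list):
        raise SchemaError("index.json: expected a top-level 'facts' list")
    for i, fact in enumerate(facts):
        if not isinstance(fact, dict):
            raise SchemaError(f"index.json: fact #{i} is not an object")
        missing = REQUIRED_FACT_FIELDS - set(fact)
        if missing:
            raise SchemaError(f"index.json: fact #{i} missing {sorted(missing)}")
    return facts
```

A field rename such as `last_confirmed` to `confirmed_at` then raises `SchemaError` naming the missing key, instead of quietly producing a zero freshness score.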

Recommendation: Rebase onto PR #31 after it merges, then the diff will be just measurer.py. This will be much cleaner to review and merge.

Timmy requested changes 2026-04-15 00:19:02 +00:00
Timmy left a comment
Owner

This PR has significant overlap with PR #31 (both add SCHEMA.md, YAML knowledge files, index.json, validate_knowledge.py, and repo YAML files). Beyond the shared content, it adds measurer.py and metrics output. Reviewing the measurer-specific additions:

Metrics dashboard/snapshot: The generated metrics look reasonable, but the hit_rate metric shows 8058/11922 sessions (about 67.6%), which seems implausibly high for a system described as early-stage. This suggests the hit_rate calculation may be matching too broadly (e.g., any session containing words that happen to appear in the knowledge store).
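If facts carry stable identifiers that agents cite, hit rate could be measured by exact ID matches instead of vocabulary overlap. That would test the reviewer's hypothesis directly. The sketch below assumes such IDs exist (`pitfall-001` is a made-up example) and that sessions are available as lists of message strings. The PR confirms neither.

```python
import re

def hit_rate(sessions, fact_ids):
    """Percentage of sessions whose messages cite at least one known fact ID.

    `sessions` maps session_id -> list of message strings; `fact_ids` is a set
    of stable identifiers. Both shapes are assumptions for this sketch.
    """
    if not sessions or not fact_ids:
        return 0.0
    # Longest IDs first so a shorter ID cannot shadow a longer one in the alternation.
    alternatives = "|".join(map(re.escape, sorted(fact_ids, key=len, reverse=True)))
    # \b on both sides: "pitfall-001" must not match inside "pitfall-0011".
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    hits = sum(1 for msgs in sessions.values() if any(pattern.search(m) for m in msgs))
    return 100.0 * hits / len(sessions)
```

Comparing this number with the 8058/11922 figure would show how much of the current rate comes from incidental word matches.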

Conflict concern: This PR will have massive merge conflicts with PR #31 since both modify the same files (SCHEMA.md, index.json, all YAML files). These should be coordinated — one should merge first and the other should rebase.

Missing measurer.py source: The diff includes the output files (dashboard.md, latest_snapshot.json) but the measurer.py script itself should be reviewed for correctness of the metric calculations. The metrics output format is clean and well-structured.

Requesting changes primarily due to the merge conflict issue with #31 — please coordinate the merge order and rebase whichever lands second.

Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Rockachopa closed this pull request 2026-04-16 01:55:28 +00:00
Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Rockachopa reopened this pull request 2026-04-16 02:04:29 +00:00
Author
Owner

Superseded by merged PR #31. Closing as content is already in main.

Rockachopa closed this pull request 2026-04-16 02:13:41 +00:00
Author
Owner

Closed — subset of PR #27 (all 9 files: measurer, validator, schema, configs included in #27). Merging #27 instead.


Pull request closed
