7 Commits

16 changed files with 632 additions and 1135 deletions

@@ -1,164 +0,0 @@
# Knowledge File Format Specification
**Version:** 1
**Issue:** #10
**Status:** Draft
---
## Overview
The knowledge system has two layers:
1. **index.json** — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
2. **Knowledge files** (YAML) — Human-readable, editable facts organized by domain.
The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
---
## index.json Schema
```json
{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}
```
### Fact Object
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
| `fact` | string | yes | One-sentence description of the knowledge |
| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
| `confidence` | float | yes | 0.0-1.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels: `["git", "auth", "gitea"]` |
| `source_count` | int | no | How many sessions confirmed this fact |
| `first_seen` | date | no | ISO-8601 date first extracted |
| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
| `expires` | date | no | Optional. After this date, fact is stale |
| `related` | string[] | no | IDs of related facts |
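
The overview describes `index.json` as the layer for fast lookups by ID, category, repo, and tags. A minimal sketch of how a reader of the index might build those lookups in memory is shown here. The function name `build_lookups`, the keys of the returned dict, and the sample `last_updated` value are illustrative assumptions, not part of the harvester or bootstrapper.

```python
from collections import defaultdict


def build_lookups(index: dict) -> dict:
    """Group the facts of a parsed index.json by id, category, domain, and tag."""
    by_id = {}
    by_category = defaultdict(list)
    by_domain = defaultdict(list)
    by_tag = defaultdict(list)
    for fact in index.get("facts", []):
        by_id[fact["id"]] = fact
        by_category[fact["category"]].append(fact)
        by_domain[fact["domain"]].append(fact)
        for tag in fact.get("tags", []):  # tags is optional in the schema
            by_tag[tag].append(fact)
    return {
        "id": by_id,
        "category": dict(by_category),
        "domain": dict(by_domain),
        "tag": dict(by_tag),
    }


# Sample index holding a single fact taken from this specification.
example_index = {
    "version": 1,
    "last_updated": "2026-04-13T00:00:00+00:00",  # placeholder timestamp
    "total_facts": 1,
    "facts": [
        {
            "id": "global:tool-quirk:001",
            "fact": "Gitea token stored at ~/.config/gitea/token, not env var",
            "category": "tool-quirk",
            "domain": "global",
            "confidence": 0.95,
            "tags": ["git", "auth", "gitea"],
        }
    ],
}

lookups = build_lookups(example_index)
print(lookups["id"]["global:tool-quirk:001"]["confidence"])
print([f["id"] for f in lookups["tag"]["gitea"]])
```

Building the dictionaries once and reusing them keeps each lookup a constant-time dict access, which is one plausible reading of "fast lookups" for an index of this size.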
### ID Format
```
{domain}:{category}:{sequence}
```
- `domain` — repo name, `global`, or agent type
- `category` — one of the 5 categories
- `sequence` — zero-padded 3-digit number: `001`, `002`, ...
Examples:
- `the-nexus:pitfall:001`
- `global:tool-quirk:012`
- `hermes-agent:pattern:003`
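
The ID format above can be parsed and produced mechanically. The following is a small sketch, assuming the sequence is always exactly three digits and that the domain itself contains no colon; `FactId`, `parse_fact_id`, and `format_fact_id` are hypothetical helper names, not functions from this repository.

```python
import re
from typing import NamedTuple

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}

# domain and category may contain anything except a colon; sequence is 3 digits.
ID_RE = re.compile(r"(?P<domain>[^:]+):(?P<category>[^:]+):(?P<sequence>\d{3})")


class FactId(NamedTuple):
    domain: str
    category: str
    sequence: int


def parse_fact_id(raw: str) -> FactId:
    """Split an ID such as 'global:tool-quirk:012' into its three parts."""
    match = ID_RE.fullmatch(raw)
    if match is None:
        raise ValueError(f"not a valid fact id: {raw!r}")
    if match["category"] not in CATEGORIES:
        raise ValueError(f"unknown category in fact id: {raw!r}")
    return FactId(match["domain"], match["category"], int(match["sequence"]))


def format_fact_id(domain: str, category: str, sequence: int) -> str:
    """Build an ID with the zero-padded 3-digit sequence the spec requires."""
    return f"{domain}:{category}:{sequence:03d}"


print(parse_fact_id("global:tool-quirk:012"))
print(format_fact_id("the-nexus", "pitfall", 1))
```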
### Categories
| Category | Definition | Example |
|----------|------------|---------|
| `fact` | Concrete, verifiable information | "Gitea API requires token auth at /api/v1" |
| `pitfall` | Errors, wrong assumptions, time-wasters | "Assumed env var GITEA_TOKEN; actual path is ~/.config/gitea/token" |
| `pattern` | Successful sequences of actions | "To deploy: test -> build -> push -> webhook" |
| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash on macOS" |
| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
### Confidence Scoring
| Range | Meaning |
|-------|---------|
| 0.9-1.0 | Explicitly stated and verified |
| 0.7-0.8 | Clearly implied by multiple data points |
| 0.5-0.6 | Suggested but not fully verified |
| 0.3-0.4 | Inferred from limited data |
| 0.1-0.2 | Speculative or uncertain |
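
The table leaves small gaps between bands (for example, 0.85 falls between the two top rows, and the fact files do use 0.85). One way to map any score to a label is to treat each band's lower bound as a threshold. This is a sketch under that assumption; the threshold reading and the name `confidence_label` are not defined by this specification.

```python
# Lower bound of each band, highest first, paired with the table's wording.
CONFIDENCE_BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_label(confidence: float) -> str:
    """Return the label of the highest band whose lower bound is reached."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    for lower_bound, label in CONFIDENCE_BANDS:
        if confidence >= lower_bound:
            return label
    # Scores below 0.1 have no row in the table.
    return "Unlabelled (below 0.1)"


print(confidence_label(0.95))
print(confidence_label(0.85))
```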
---
## Knowledge Files (YAML)
Human-readable files stored in `knowledge/` subdirectories.
### Directory Structure
```
knowledge/
├── index.json              # Machine-readable fact index
├── SCHEMA.md               # This file
├── global/                 # Cross-repo knowledge
│   ├── pitfalls.yaml       # Pitfalls that span multiple repos
│   ├── patterns.yaml       # Proven workflows
│   └── tool-quirks.yaml    # Environment behaviors
├── repos/                  # Per-repo knowledge
│   ├── the-nexus.yaml
│   ├── hermes-agent.yaml
│   └── ...
└── agents/                 # Agent-type knowledge
    ├── mimo-sprint.yaml
    └── ...
```
### YAML File Format
```yaml
---
domain: global # or repo name or agent name
category: tool-quirk # fact, pitfall, pattern, tool-quirk, question
version: 1
last_updated: "2026-04-13"
---
# Tool Quirks (Global)
Cross-environment behaviors that bite you if you don't know them.
## Authentication
- id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var"
  confidence: 0.95
  tags: [git, auth, gitea]
  source_count: 23
  first_seen: "2026-03-27"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:003]
```
### Rules
1. **One file per domain; `global/` is further split by category.** `global/pitfalls.yaml` holds the global pitfalls, while `repos/the-nexus.yaml` holds all the-nexus facts.
2. **Markdown sections for humans.** YAML items live under markdown headers for Gitea UI readability.
3. **ID is the link.** The `id` field connects YAML facts to index.json entries.
4. **Harvester writes, humans edit.** Harvester appends. Humans correct confidence, add tags, mark expired.
---
## Sync Rules
1. **Harvester -> YAML:** Appends new facts to the appropriate YAML file.
2. **Harvester -> index.json:** Adds/updates fact entries.
3. **Human edits YAML:** Changes propagate to index.json on next harvester run.
4. **Confidence decay:** Facts that go unconfirmed for 30 or more sessions have their confidence multiplied by 0.9.
5. **Expiration:** Facts whose `expires` date is earlier than the current date are marked `stale`.
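
Sync rules 4 and 5 can be sketched as two small functions. This is an illustration only: it assumes the decay is applied once per harvester run, and that the caller already knows how many sessions have passed since the fact was last confirmed (the schema has no such field, so `sessions_since_confirmed` is a hypothetical input). The names `decay_confidence` and `is_stale` are also assumptions.

```python
from datetime import date


def decay_confidence(fact: dict, sessions_since_confirmed: int,
                     threshold: int = 30, factor: float = 0.9) -> dict:
    """Return a copy of the fact, with confidence multiplied by factor once
    if it has gone unconfirmed for at least `threshold` sessions."""
    updated = dict(fact)
    if sessions_since_confirmed >= threshold:
        updated["confidence"] = round(fact["confidence"] * factor, 4)
    return updated


def is_stale(fact: dict, today: date) -> bool:
    """A fact is stale when it has an expires date earlier than today."""
    expires = fact.get("expires")  # optional field
    if not expires:
        return False
    return date.fromisoformat(expires) < today


fact = {"id": "global:pitfall:005", "confidence": 0.85, "expires": "2026-04-01"}
print(decay_confidence(fact, sessions_since_confirmed=30)["confidence"])
print(is_stale(fact, today=date(2026, 4, 13)))
```

Returning a copy rather than mutating the input keeps the sketch easy to test; a real harvester might update the entry in place before rewriting `index.json`.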
---
## Validation
Facts must pass these checks:
1. `id` matches format `{domain}:{category}:{sequence}`
2. `category` is one of the 5 allowed values
3. `confidence` is between 0.0 and 1.0
4. `fact` is non-empty string, max 280 characters
5. `domain` is non-empty string
6. `tags` are lowercase alphanumeric + hyphens
7. No duplicate IDs in index.json
Validation script: `scripts/validate_knowledge.py`
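
The seven checks above translate fairly directly into code. The sketch below is not the contents of `scripts/validate_knowledge.py`, which is not shown here; it is a minimal illustration that assumes a 3-digit sequence in the ID and treats booleans as invalid confidence values. `validate_fact` and `validate_index` are hypothetical names.

```python
import re

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
ID_RE = re.compile(r"[^:]+:[^:]+:\d{3}")
TAG_RE = re.compile(r"[a-z0-9-]+")


def validate_fact(fact: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    fact_id = fact.get("id")
    if not isinstance(fact_id, str) or not ID_RE.fullmatch(fact_id):
        errors.append("id does not match {domain}:{category}:{sequence}")
    category = fact.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        errors.append("category is not one of the 5 allowed values")
    confidence = fact.get("confidence")
    if (isinstance(confidence, bool)
            or not isinstance(confidence, (int, float))
            or not 0.0 <= confidence <= 1.0):
        errors.append("confidence is not between 0.0 and 1.0")
    text = fact.get("fact")
    if not isinstance(text, str) or not text.strip() or len(text) > 280:
        errors.append("fact must be a non-empty string of at most 280 characters")
    domain = fact.get("domain")
    if not isinstance(domain, str) or not domain:
        errors.append("domain must be a non-empty string")
    for tag in fact.get("tags", []):
        if not isinstance(tag, str) or not TAG_RE.fullmatch(tag):
            errors.append(f"tag {tag!r} is not lowercase alphanumeric + hyphens")
    return errors


def validate_index(index: dict) -> list:
    """Validate every fact and report duplicate IDs across the index."""
    errors = []
    seen = set()
    for fact in index.get("facts", []):
        fact_id = fact.get("id")
        errors.extend(f"{fact_id}: {problem}" for problem in validate_fact(fact))
        if isinstance(fact_id, str):
            if fact_id in seen:
                errors.append(f"{fact_id}: duplicate id")
            seen.add(fact_id)
    return errors
```

Collecting all problems instead of stopping at the first one lets a human fix a YAML file in a single pass.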

@@ -1,80 +0,0 @@
---
domain: global
category: pitfall
version: 1
last_updated: "2026-04-13"
---
# Pitfalls (Global)
Cross-repo traps that waste time across the fleet.
## Git & Forge

- id: global:pitfall:001
  fact: "Branch protection requires 1 approval on main - API merges fail with 405 without it"
  confidence: 0.95
  tags: [git, merge, branch-protection, gitea]
  source_count: 12
  first_seen: "2026-04-05"
  last_confirmed: "2026-04-13"
  related: [the-nexus:pitfall:001]

- id: global:pitfall:002
  fact: "Never use --no-verify on git commits - it bypasses all hooks including safety checks"
  confidence: 0.95
  tags: [git, hooks, safety]
  source_count: 5
  first_seen: "2026-03-28"
  last_confirmed: "2026-04-13"

- id: global:pitfall:003
  fact: "Gitea PR creation workaround needed on the-nexus - direct API call fails, use alternative endpoint"
  confidence: 0.9
  tags: [gitea, pr, api, workaround]
  source_count: 4
  first_seen: "2026-04-06"
  last_confirmed: "2026-04-12"

## Agent Operations

- id: global:pitfall:004
  fact: "Anthropic is BANNED from fallback chain - if fallback triggers to Anthropic, something is wrong"
  confidence: 0.95
  tags: [provider, anthropic, fallback]
  source_count: 7
  first_seen: "2026-03-30"
  last_confirmed: "2026-04-13"

- id: global:pitfall:005
  fact: "Telegram tokens expired - don't assume Telegram notifications work without checking"
  confidence: 0.85
  tags: [telegram, notifications, token]
  source_count: 3
  first_seen: "2026-04-02"

- id: global:pitfall:006
  fact: "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run"
  confidence: 0.9
  tags: [gateway, cron, process]
  source_count: 4
  first_seen: "2026-04-04"
  last_confirmed: "2026-04-11"

## Testing

- id: global:pitfall:007
  fact: "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory"
  confidence: 0.9
  tags: [pytest, test, collection]
  source_count: 3
  first_seen: "2026-04-07"
  last_confirmed: "2026-04-13"

- id: global:pitfall:008
  fact: "TDD: test 1 before building 55 - verify the cycle works before scaling"
  confidence: 0.95
  tags: [tdd, testing, methodology]
  source_count: 8
  first_seen: "2026-03-25"
  last_confirmed: "2026-04-13"

@@ -1,73 +0,0 @@
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---
# Tool Quirks (Global)
Cross-environment behaviors that bite you if you don't know them.
## Authentication

- id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN"
  confidence: 0.95
  tags: [git, auth, gitea, token]
  source_count: 23
  first_seen: "2026-03-27"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:001]

- id: global:tool-quirk:002
  fact: "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer"
  confidence: 0.9
  tags: [git, api, gitea]
  source_count: 8
  first_seen: "2026-03-28"
  last_confirmed: "2026-04-12"

- id: global:tool-quirk:003
  fact: "Gitea Issues API type=issues param does NOT filter PRs - use truthiness check on pull_request field"
  confidence: 0.95
  tags: [gitea, api, issues, pr]
  source_count: 6
  first_seen: "2026-04-01"
  last_confirmed: "2026-04-13"

## Paths & Environment

- id: global:tool-quirk:004
  fact: "~/.hermes is the default hermes home - check get_hermes_home() not the path literal"
  confidence: 0.9
  tags: [paths, hermes, env]
  source_count: 10
  first_seen: "2026-03-30"
  last_confirmed: "2026-04-13"
  related: [hermes-agent:pitfall:005]

- id: global:tool-quirk:005
  fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin - standard ansible-vault fails"
  confidence: 0.85
  tags: [ansible, vault, config]
  source_count: 3
  first_seen: "2026-04-02"

## Model & Inference

- id: global:tool-quirk:006
  fact: "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available"
  confidence: 0.95
  tags: [model, provider, nous, default]
  source_count: 15
  first_seen: "2026-03-25"
  last_confirmed: "2026-04-13"

- id: global:tool-quirk:007
  fact: "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume"
  confidence: 0.95
  tags: [hermes, model, restart, session]
  source_count: 8
  first_seen: "2026-03-29"
  last_confirmed: "2026-04-12"

@@ -1,82 +0,0 @@
---
domain: hermes-agent
category: pitfall
version: 1
last_updated: "2026-04-13"
---
# Pitfalls (hermes-agent)
Things that go wrong in this repo if you don't know the traps.
## Cron & Deployment

- id: hermes-agent:pitfall:001
  fact: "deploy-crons.py leaves jobs in mixed model format - some have provider/model, some just model"
  confidence: 0.95
  tags: [cron, deploy, model, config]
  source_count: 5
  first_seen: "2026-04-08"
  last_confirmed: "2026-04-13"
  related: [hermes-agent:pitfall:002, hermes-agent:pitfall:003]

- id: hermes-agent:pitfall:002
  fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list, breaking older jobs"
  confidence: 0.9
  tags: [cron, deploy, skills]
  source_count: 3
  first_seen: "2026-04-09"
  last_confirmed: "2026-04-13"
  related: [hermes-agent:pitfall:001]

- id: hermes-agent:pitfall:003
  fact: "Cron jobs with blank fallback_model fields trigger spurious gateway warnings"
  confidence: 0.9
  tags: [cron, model, fallback]
  source_count: 4
  first_seen: "2026-04-07"
  last_confirmed: "2026-04-12"
  related: [hermes-agent:pitfall:001]

- id: hermes-agent:pitfall:004
  fact: "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms"
  confidence: 0.9
  tags: [watchdog, model, config]
  source_count: 3
  first_seen: "2026-04-08"
  last_confirmed: "2026-04-13"

## Path & Environment

- id: hermes-agent:pitfall:005
  fact: "10+ files read HERMES_HOME directly instead of get_hermes_home() - breaks on custom paths"
  confidence: 0.85
  tags: [paths, env, hermes-home]
  source_count: 6
  first_seen: "2026-04-06"
  last_confirmed: "2026-04-12"
  related: [global:pitfall:002]

- id: hermes-agent:pitfall:006
  fact: "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set"
  confidence: 0.8
  tags: [paths, env, bug]
  source_count: 2
  first_seen: "2026-04-05"

## SSH & Dispatch

- id: hermes-agent:pitfall:007
  fact: "vps-agent-dispatch reports OK while remote hermes binary path is broken"
  confidence: 0.9
  tags: [ssh, dispatch, vps]
  source_count: 4
  first_seen: "2026-04-07"
  last_confirmed: "2026-04-11"

- id: hermes-agent:pitfall:008
  fact: "nightwatch-health-monitor SSH check fails on cloud-model-only deployments"
  confidence: 0.85
  tags: [ssh, health, cloud]
  source_count: 2
  first_seen: "2026-04-10"

@@ -1,65 +0,0 @@
---
domain: the-nexus
category: pitfall
version: 1
last_updated: "2026-04-13"
---
# Pitfalls (the-nexus)
Things that go wrong in this repo if you don't know the traps.
## Git & Merging

- id: the-nexus:pitfall:001
  fact: "Merges fail with HTTP 405 due to branch protection - must use merge API with 1 approval"
  confidence: 0.95
  tags: [git, merge, branch-protection, gitea]
  source_count: 12
  first_seen: "2026-04-05"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:001]

- id: the-nexus:pitfall:002
  fact: "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests"
  confidence: 0.95
  tags: [server, concurrency, bridge]
  source_count: 5
  first_seen: "2026-04-10"
  last_confirmed: "2026-04-13"

- id: the-nexus:pitfall:003
  fact: "ChatLog.log() crashes on message persistence when index.html has orphaned button tags"
  confidence: 0.9
  tags: [html, crash, chatlog]
  source_count: 3
  first_seen: "2026-04-12"
  last_confirmed: "2026-04-13"

## Three.js & Performance

- id: the-nexus:pitfall:004
  fact: "Three.js LOD not implemented - local hardware struggles with full scene without texture optimization"
  confidence: 0.85
  tags: [threejs, performance, lod]
  source_count: 4
  first_seen: "2026-04-09"
  last_confirmed: "2026-04-13"

- id: the-nexus:pitfall:005
  fact: "Duplicate content blocks appear in index.html when PR merges conflict silently"
  confidence: 0.8
  tags: [html, merge-conflict, duplicate]
  source_count: 3
  first_seen: "2026-04-11"
  last_confirmed: "2026-04-13"

## Deployment

- id: the-nexus:pitfall:006
  fact: "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS"
  confidence: 0.9
  tags: [deploy, websocket, http, cors]
  source_count: 4
  first_seen: "2026-04-10"
  last_confirmed: "2026-04-13"

@@ -1,61 +0,0 @@
# Compounding Intelligence Metrics
**Generated:** 2026-04-14T18:12:26.469085+00:00
## knowledge_velocity
New facts extracted per day. Higher = compounding loop working.
**Value:** 1.61 | **7d trend:** N/A --- (unknown)
- total_facts: 29
- period_days: 18
- new_facts: 29
## knowledge_coverage
Percentage of domains/repos with 10+ facts. Measures breadth.
**Value:** 0.333 | **7d trend:** N/A --- (unknown)
- covered_domains: 1
- total_domains: 3
## hit_rate
Percentage of sessions referencing bootstrapped knowledge.
**Value:** 0.676 | **7d trend:** N/A --- (unknown)
- hit_sessions: 8058
- total_sessions: 11922
## error_recurrence
Ratio of recurring errors. Lower = fleet learning from mistakes.
**Value:** 0.17 | **7d trend:** N/A --- (unknown)
- unique_errors: 53615
- recurring_errors: 9093
## task_completion
Percentage of sessions ending with successful completion.
**Value:** 0.452 | **7d trend:** N/A --- (unknown)
- normal_end_rate: 0.56
- completed: 5385
- total: 11922
## first_try_success
Percentage of sessions completed without backtracking.
**Value:** 0.818 | **7d trend:** N/A --- (unknown)
- avg_tool_msg_ratio: 0.392
- sampled: 5923
## knowledge_age
Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale.
**Value:** 0.973 | **7d trend:** N/A --- (unknown)
- avg_age_days: 2.4
- stale_facts: 0
- total_facts: 29

@@ -1,127 +0,0 @@
{
"generated_at": "2026-04-14T18:12:26.469085+00:00",
"knowledge_velocity": {
"value": 1.61,
"total_facts": 29,
"period_days": 18,
"new_facts": 29
},
"knowledge_coverage": {
"value": 0.333,
"covered_domains": 1,
"total_domains": 3,
"domain_details": {
"global": 15,
"hermes-agent": 8,
"the-nexus": 6
}
},
"hit_rate": {
"value": 0.676,
"hit_sessions": 8058,
"total_sessions": 11922
},
"error_recurrence": {
"value": 0.17,
"unique_errors": 53615,
"recurring_errors": 9093,
"top_errors": [
{
"error": "s, report the error details.",
"sessions": 1185
},
{
"error": "\": \"traceback (most recent call last):\\n file \\\"/private/var/folders/9k/v07xkpp",
"sessions": 695
},
{
"error": "\", \"output\": \"\\n--- stderr ---\\ntraceback (most recent call last):\\n file \\\"/pr",
"sessions": 684
},
{
"error": "ures \u2192 file an issue with the traceback and tag [bug]",
"sessions": 320
},
{
"error": "s you encounter \u2192 file an issue with reproduction steps",
"sessions": 320
},
{
"error": "s, file a [bug] issue first.",
"sessions": 320
},
{
"error": ", fix the code.",
"sessions": 314
},
{
"error": "fix it before doing anything else.",
"sessions": 313
},
{
"error": "ures \u2192 add a review comment explaining what's wrong",
"sessions": 303
},
{
"error": "ures \u2014 they're your roadmap for guardrails",
"sessions": 303
}
]
},
"task_completion": {
"value": 0.452,
"normal_end_rate": 0.56,
"completed": 5385,
"total": 11922,
"breakdown": {
"cron_complete": 5354,
"unknown": 5248,
"compression": 1092,
"cli_close": 197,
"session_reset": 31
}
},
"first_try_success": {
"value": 0.818,
"avg_tool_msg_ratio": 0.392,
"sampled": 5923,
"interpretation": "Higher value = fewer backtracks = better first-try success"
},
"knowledge_age": {
"value": 0.973,
"avg_age_days": 2.4,
"stale_facts": 0,
"total_facts": 29,
"interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old"
},
"trend_7d": {
"knowledge_velocity": {
"delta": "N/A",
"direction": "unknown"
},
"knowledge_coverage": {
"delta": "N/A",
"direction": "unknown"
},
"hit_rate": {
"delta": "N/A",
"direction": "unknown"
},
"error_recurrence": {
"delta": "N/A",
"direction": "unknown"
},
"task_completion": {
"delta": "N/A",
"direction": "unknown"
},
"first_try_success": {
"delta": "N/A",
"direction": "unknown"
},
"knowledge_age": {
"delta": "N/A",
"direction": "unknown"
}
}
}

@@ -1,403 +0,0 @@
#!/usr/bin/env python3
"""
Compounding Intelligence Metrics Engine.

Computes 7 metrics that indicate whether the knowledge compounding loop is working:
  1. Knowledge velocity -- new facts per day
  2. Knowledge coverage -- fraction of domains with 10+ facts
  3. Hit rate -- fraction of sessions referencing bootstrap knowledge
  4. Error recurrence -- same errors across sessions (should decrease)
  5. Task completion -- fraction of sessions ending successfully
  6. First-try success -- actions without backtracking
  7. Knowledge age -- staleness of facts

Usage:
    python3 measurer.py                              # All metrics, all time
    python3 measurer.py --since 2026-04-01           # Time range
    python3 measurer.py --repo the-nexus             # Per-repo metrics
    python3 measurer.py --format json                # JSON output (default)
    python3 measurer.py --format markdown            # Human-readable
    python3 measurer.py --knowledge-dir ./knowledge  # Custom knowledge path
    python3 measurer.py --db ~/.hermes/state.db      # Custom DB path

Data sources:
  - knowledge/index.json -- fact index
  - knowledge/ -- YAML fact files for coverage
  - ~/.hermes/state.db -- session/message metadata
"""
import argparse
import json
import os
import re
import sqlite3
import sys
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any
# --- Defaults ---
DEFAULT_KNOWLEDGE_DIR = Path(__file__).parent.parent / "knowledge"
DEFAULT_DB_PATH = Path.home() / ".hermes" / "state.db"
SEVEN_DAYS = timedelta(days=7)
# --- Knowledge Store ---
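def load_facts(knowledge_dir):
    """Load the fact list from index.json, or return [] if the index is missing."""
    index_path = knowledge_dir / "index.json"
    if not index_path.exists():
        return []
    with open(index_path) as f:
        data = json.load(f)
    return data.get("facts", [])


def count_yaml_facts(knowledge_dir):
    """Count `- id:` entries per domain across the YAML knowledge files."""
    domain_counts = {}
    for subdir in ["repos", "global", "agents"]:
        dirpath = knowledge_dir / subdir
        if not dirpath.exists():
            continue
        for yaml_file in dirpath.glob("*.yaml"):
            count = 0
            try:
                content = yaml_file.read_text()
                count = len(re.findall(r"^\s*-\s*id:", content, re.MULTILINE))
            except Exception:
                pass
            # Files under global/ are named by category (pitfalls.yaml), so the
            # directory name is the domain there; elsewhere the file stem is.
            domain = subdir if subdir == "global" else yaml_file.stem
            domain_counts[domain] = domain_counts.get(domain, 0) + count
    return domain_counts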
# --- Session Database ---
def open_db(db_path):
    """Open the session database, or return None (with a warning) if absent."""
    if not db_path.exists():
        print(f"WARNING: Database not found at {db_path}", file=sys.stderr)
        return None
    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row
    return conn


def query_sessions(conn, since=None, repo=None):
    # Note: `repo` is accepted but not applied to the query.
    if conn is None:
        return []
    query = "SELECT id, started_at, ended_at, end_reason, message_count, tool_call_count, model FROM sessions WHERE 1=1"
    params = []
    if since:
        since_ts = datetime.fromisoformat(since).replace(tzinfo=timezone.utc).timestamp()
        query += " AND started_at >= ?"
        params.append(since_ts)
    query += " ORDER BY started_at ASC"
    cur = conn.execute(query, params)
    return [dict(row) for row in cur.fetchall()]


def query_messages(conn, session_ids=None, since_ts=None):
    if conn is None:
        return []
    query = "SELECT m.session_id, m.role, m.content, m.tool_name, m.timestamp FROM messages m WHERE 1=1"
    params = []
    if since_ts:
        query += " AND m.timestamp >= ?"
        params.append(since_ts)
    if session_ids:
        placeholders = ",".join("?" for _ in session_ids)
        query += f" AND m.session_id IN ({placeholders})"
        params.extend(session_ids)
    cur = conn.execute(query, params)
    return [dict(row) for row in cur.fetchall()]
# --- Metric Computations ---
def compute_knowledge_velocity(facts, since=None):
    """New facts per day, measured between the earliest and latest first_seen."""
    if not facts:
        return {"value": 0.0, "total_facts": 0, "period_days": 0, "new_facts": 0}
    dates = []
    for f in facts:
        d = f.get("first_seen") or f.get("created")
        if d:
            try:
                dt = datetime.fromisoformat(d.replace("Z", "+00:00"))
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                dates.append(dt)
            except (ValueError, AttributeError):
                pass
    if not dates:
        return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0}
    if since:
        cutoff = datetime.fromisoformat(since).replace(tzinfo=timezone.utc)
        dates = [d for d in dates if d >= cutoff]
        if not dates:
            return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0}
    earliest = min(dates)
    latest = max(dates)
    period_days = max((latest - earliest).days, 1)
    return {"value": round(len(dates) / period_days, 2), "total_facts": len(facts), "period_days": period_days, "new_facts": len(dates)}
def compute_knowledge_coverage(facts, yaml_counts):
    """Fraction of domains that hold at least 10 facts."""
    domain_fact_counts = defaultdict(int)
    for f in facts:
        domain = f.get("domain", "unknown")
        domain_fact_counts[domain] += 1
    for domain, count in yaml_counts.items():
        domain_fact_counts[domain] = max(domain_fact_counts[domain], count)
    total_domains = len(domain_fact_counts)
    if total_domains == 0:
        return {"value": 0.0, "covered_domains": 0, "total_domains": 0, "domain_details": {}}
    covered = sum(1 for c in domain_fact_counts.values() if c >= 10)
    return {"value": round(covered / total_domains, 3), "covered_domains": covered, "total_domains": total_domains, "domain_details": dict(sorted(domain_fact_counts.items(), key=lambda x: -x[1])[:20])}
def compute_hit_rate(sessions, messages, facts):
    """Fraction of sessions whose messages contain any fact text or fact word."""
    if not sessions or not facts:
        return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)}
    fact_fragments = set()
    for f in facts:
        text = f.get("fact", "").lower().strip()
        if len(text) > 10:
            fact_fragments.add(text)
            # Every word of 4+ characters also counts as a matchable fragment.
            words = re.findall(r'\w{4,}', text)
            for w in words:
                fact_fragments.add(w)
    if not fact_fragments:
        return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)}
    session_messages = defaultdict(list)
    for m in messages:
        content = (m.get("content") or "").lower()
        if content:
            session_messages[m["session_id"]].append(content)
    hit_sessions = 0
    for session in sessions:
        sid = session["id"]
        all_content = " ".join(session_messages.get(sid, []))
        if any(frag in all_content for frag in fact_fragments):
            hit_sessions += 1
    return {"value": round(hit_sessions / len(sessions), 3) if sessions else 0.0, "hit_sessions": hit_sessions, "total_sessions": len(sessions)}
def compute_error_recurrence(messages):
    """Ratio of error signatures seen in more than one session."""
    if not messages:
        return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []}
    # IGNORECASE already covers the upper- and mixed-case spellings.
    error_pattern = re.compile(r'(?:error|failed|fail|exception)[:\s]*(.{10,80})', re.IGNORECASE)
    error_to_sessions = defaultdict(set)
    for m in messages:
        content = m.get("content") or ""
        if not content:
            continue
        for match in error_pattern.finditer(content):
            sig = match.group(1).strip().lower()
            sig = re.sub(r'\s+', ' ', sig)
            if len(sig) > 5:
                error_to_sessions[sig].add(m["session_id"])
    if not error_to_sessions:
        return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []}
    recurring = {e: s for e, s in error_to_sessions.items() if len(s) > 1}
    total_errors = len(error_to_sessions)
    recurring_count = len(recurring)
    top = sorted(recurring.items(), key=lambda x: -len(x[1]))[:10]
    return {"value": round(recurring_count / total_errors, 3) if total_errors else 0.0, "unique_errors": total_errors, "recurring_errors": recurring_count, "top_errors": [{"error": e, "sessions": len(s)} for e, s in top]}
def compute_task_completion(sessions):
    """Fraction of sessions whose end_reason counts as a completion."""
    if not sessions:
        return {"value": 0.0, "completed": 0, "total": 0, "breakdown": {}}
    breakdown = Counter()
    for s in sessions:
        reason = s.get("end_reason") or "unknown"
        breakdown[reason] += 1
    completed = breakdown.get("cron_complete", 0) + breakdown.get("session_reset", 0)
    normal_endings = completed + breakdown.get("cli_close", 0) + breakdown.get("compression", 0)
    return {"value": round(completed / len(sessions), 3) if sessions else 0.0, "normal_end_rate": round(normal_endings / len(sessions), 3) if sessions else 0.0, "completed": completed, "total": len(sessions), "breakdown": dict(breakdown.most_common())}


def compute_first_try_success(sessions):
    """Fraction of sampled sessions with a tool-call/message ratio below 0.5."""
    if not sessions:
        return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0}
    ratios = []
    for s in sessions:
        msgs = s.get("message_count", 0) or 0
        tools = s.get("tool_call_count", 0) or 0
        if msgs > 2:  # msgs is non-zero here, so the division is safe
            ratios.append(tools / msgs)
    if not ratios:
        return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0}
    avg_ratio = sum(ratios) / len(ratios)
    first_try = sum(1 for r in ratios if r < 0.5)
    return {"value": round(first_try / len(ratios), 3), "avg_tool_msg_ratio": round(avg_ratio, 3), "sampled": len(ratios), "interpretation": "Higher value = fewer backtracks = better first-try success"}
def compute_knowledge_age(facts):
    """Freshness score: 1.0 when all facts are new, 0.0 at an average age of 90+ days."""
    if not facts:
        return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": 0}
    now = datetime.now(timezone.utc)
    ages = []
    stale_count = 0
    for f in facts:
        confirmed = f.get("last_confirmed") or f.get("first_seen")
        if confirmed:
            try:
                dt = datetime.fromisoformat(confirmed.replace("Z", "+00:00"))
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                age = (now - dt).days
                ages.append(age)
                if age > 30:
                    stale_count += 1
            except (ValueError, AttributeError):
                pass
    if not ages:
        return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": len(facts)}
    avg_age = sum(ages) / len(ages)
    freshness = max(0.0, 1.0 - (avg_age / 90))
    return {"value": round(freshness, 3), "avg_age_days": round(avg_age, 1), "stale_facts": stale_count, "total_facts": len(facts), "interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old"}
# --- Trend Computation ---
def compute_trend(current, previous, metric_key="value", lower_is_better=False):
    """Compare a metric against the previous snapshot.

    metric_key names a field inside the metric dict (usually "value"), so the
    polarity is passed explicitly: set lower_is_better for metrics such as
    error_recurrence, where a falling value is the good outcome.
    """
    if not previous:
        return {"delta": "N/A", "direction": "unknown"}
    curr_val = current.get(metric_key, 0)
    prev_val = previous.get(metric_key, 0)
    if prev_val == 0:
        return {"delta": "N/A (no baseline)", "direction": "unknown"}
    pct = ((curr_val - prev_val) / abs(prev_val)) * 100
    direction = "up" if pct > 0 else "down" if pct < 0 else "flat"
    if pct == 0:
        direction_label = "neutral"
    else:
        improved = pct < 0 if lower_is_better else pct > 0
        direction_label = "good" if improved else "bad"
    return {"delta": f"{'+' if pct > 0 else ''}{pct:.1f}%", "direction": direction, "assessment": direction_label}


# --- Output Formatters ---
def format_json(metrics):
    return json.dumps(metrics, indent=2)


def format_markdown(metrics):
    lines = ["# Compounding Intelligence Metrics", f"**Generated:** {metrics.get('generated_at', 'unknown')}", ""]
    trend = metrics.get("trend_7d", {})

    def metric_block(name, data, desc, good_direction="up"):
        val = data.get("value", 0)
        t = trend.get(name, {})
        delta = t.get("delta", "N/A")
        assessment = t.get("assessment", "unknown")
        arrow = "up" if assessment == "good" else "down" if assessment == "bad" else "---"
        lines.extend([f"## {name}", desc, "", f"**Value:** {val} | **7d trend:** {delta} {arrow} ({assessment})", ""])
        for k, v in data.items():
            if k != "value" and k != "interpretation":
                if isinstance(v, (int, float, str)):
                    lines.append(f"- {k}: {v}")
        lines.append("")

    metric_block("knowledge_velocity", metrics.get("knowledge_velocity", {}), "New facts extracted per day. Higher = compounding loop working.")
    metric_block("knowledge_coverage", metrics.get("knowledge_coverage", {}), "Percentage of domains/repos with 10+ facts. Measures breadth.")
    metric_block("hit_rate", metrics.get("hit_rate", {}), "Percentage of sessions referencing bootstrapped knowledge.")
    metric_block("error_recurrence", metrics.get("error_recurrence", {}), "Ratio of recurring errors. Lower = fleet learning from mistakes.", good_direction="down")
    metric_block("task_completion", metrics.get("task_completion", {}), "Percentage of sessions ending with successful completion.")
    metric_block("first_try_success", metrics.get("first_try_success", {}), "Percentage of sessions completed without backtracking.")
    metric_block("knowledge_age", metrics.get("knowledge_age", {}), "Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale.", good_direction="up")
    return "\n".join(lines)


# --- Snapshot Persistence ---
def load_snapshot(metrics_dir):
    snapshot_path = metrics_dir / "latest_snapshot.json"
    if snapshot_path.exists():
        with open(snapshot_path) as f:
            return json.load(f)
    return {}


def save_snapshot(metrics_dir, metrics):
    metrics_dir.mkdir(parents=True, exist_ok=True)
    snapshot_path = metrics_dir / "latest_snapshot.json"
    with open(snapshot_path, "w") as f:
        json.dump(metrics, f, indent=2)


# --- Main ---
def main():
    parser = argparse.ArgumentParser(description="Compounding Intelligence Metrics")
    parser.add_argument("--since", help="Start date (YYYY-MM-DD)")
    parser.add_argument("--repo", help="Filter by repo/domain")
    parser.add_argument("--format", choices=["json", "markdown"], default="json")
    parser.add_argument("--knowledge-dir", type=Path, default=DEFAULT_KNOWLEDGE_DIR)
    parser.add_argument("--db", type=Path, default=DEFAULT_DB_PATH)
    parser.add_argument("--save-snapshot", action="store_true", help="Save current metrics as snapshot for trend tracking")
    parser.add_argument("--metrics-dir", type=Path, default=Path(__file__).parent.parent / "metrics", help="Directory for snapshots and dashboard")
    args = parser.parse_args()

    facts = load_facts(args.knowledge_dir)
    yaml_counts = count_yaml_facts(args.knowledge_dir)
    if args.repo:
        facts = [f for f in facts if f.get("domain") == args.repo]

    conn = open_db(args.db)
    sessions = query_sessions(conn, since=args.since)
    messages = query_messages(conn) if conn else []
    if conn:
        conn.close()

    velocity = compute_knowledge_velocity(facts, since=args.since)
    coverage = compute_knowledge_coverage(facts, yaml_counts)
    hit_rate = compute_hit_rate(sessions, messages, facts)
    error_recurrence = compute_error_recurrence(messages)
    task_completion = compute_task_completion(sessions)
    first_try = compute_first_try_success(sessions)
    age = compute_knowledge_age(facts)

    previous = load_snapshot(args.metrics_dir)
    trend = {
        "knowledge_velocity": compute_trend(velocity, previous.get("knowledge_velocity", {})),
        "knowledge_coverage": compute_trend(coverage, previous.get("knowledge_coverage", {})),
        "hit_rate": compute_trend(hit_rate, previous.get("hit_rate", {})),
        # A falling recurrence ratio is the good outcome for this metric.
        "error_recurrence": compute_trend(error_recurrence, previous.get("error_recurrence", {}), lower_is_better=True),
        "task_completion": compute_trend(task_completion, previous.get("task_completion", {})),
        "first_try_success": compute_trend(first_try, previous.get("first_try_success", {})),
        "knowledge_age": compute_trend(age, previous.get("knowledge_age", {})),
    }

    now = datetime.now(timezone.utc).isoformat()
    metrics = {
        "generated_at": now,
        "knowledge_velocity": velocity,
        "knowledge_coverage": coverage,
        "hit_rate": hit_rate,
        "error_recurrence": error_recurrence,
        "task_completion": task_completion,
        "first_try_success": first_try,
        "knowledge_age": age,
        "trend_7d": trend,
    }
    if args.since:
        metrics["since"] = args.since

    if args.save_snapshot:
        save_snapshot(args.metrics_dir, metrics)
        dashboard_path = args.metrics_dir / "dashboard.md"
        with open(dashboard_path, "w") as f:
            f.write(format_markdown(metrics))

    if args.format == "json":
        print(format_json(metrics))
    else:
        print(format_markdown(metrics))


if __name__ == "__main__":
    main()

scripts/sampler.py Normal file
@@ -0,0 +1,353 @@
#!/usr/bin/env python3
"""
sampler.py — Score and rank sessions by harvest value.
With 20k+ sessions on disk, we can't harvest all at once. This script
scores each session by how likely it is to contain valuable knowledge,
so the harvester processes the best ones first.
Scoring strategy:
- Recency: last 7d=3pts, last 30d=2pts, older=1pt
- Length: >50 messages=3pts, >20=2pts, <=20=1pt
- Repo uniqueness: first session for a repo=5pts, repo already seen=1pt, no repo mentioned=0pts
- Outcome: failure=3pts (most to learn), success=2pts, unknown=1pt
- Tool calls: >10 tool invocations=2pts (complex sessions), otherwise 0pts
Usage:
python3 sampler.py --count 100 # Top 100 sessions
python3 sampler.py --repo the-nexus --count 20 # Top 20 for a repo
python3 sampler.py --since 2026-04-01 # All sessions since date
python3 sampler.py --count 50 --min-score 8 # Only high-value sessions
python3 sampler.py --count 100 --output sample.json # Save to file
"""
import argparse
import json
import os
import sys
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional
# --- Fast session scanning (no full parse) ---
def scan_session_fast(path: str) -> dict:
    """Extract scoring metadata from a session without parsing the full JSONL.

    Reads only the first line and the tail of the file (the last 8 KB, of
    which at most 30 lines are parsed). For large files the message count is
    an estimate. This processes 20k sessions in seconds instead of minutes.
    """
    meta = {
        'path': path,
        'message_count': 0,
        'has_tool_calls': False,
        'tool_call_count': 0,
        'first_timestamp': '',
        'last_timestamp': '',
        'is_failure': False,
        'repos_mentioned': set(),
        'first_role': '',
        'last_content_preview': '',
    }
    try:
        file_size = os.path.getsize(path)
        if file_size == 0:
            # Return a list, not a set, so the result stays JSON-serializable
            meta['repos_mentioned'] = []
            return meta
        with open(path, 'r', encoding='utf-8', errors='replace') as f:
            # Read first line for timestamp + role
            first_line = f.readline().strip()
            if first_line:
                try:
                    first_msg = json.loads(first_line)
                    if isinstance(first_msg, dict):
                        meta['first_timestamp'] = first_msg.get('timestamp', '')
                        meta['first_role'] = first_msg.get('role', '')
                except json.JSONDecodeError:
                    pass
            # Fast line count + collect tail lines
            tail_lines = []
            line_count = 1  # already read first
            if file_size > 8192:
                # Seek to last 8KB for tail sampling
                f.seek(max(0, file_size - 8192))
                f.readline()  # skip partial line
                for line in f:
                    line = line.strip()
                    if line:
                        tail_lines.append(line)
                        line_count += 1
                # The exact count is lost for big files, so estimate from file size
                # (average JSONL line is ~500 bytes)
                if line_count < 100:
                    line_count = max(line_count, file_size // 500)
            else:
                # Small file: read all remaining lines
                for line in f:
                    line = line.strip()
                    if line:
                        tail_lines.append(line)
                        line_count += 1
        meta['message_count'] = line_count
        # Parse tail lines for outcome, tool calls, repos
        failure_signals = ['error', 'failed', 'cannot', 'unable',
                           'exception', 'traceback', 'rejected', 'denied']
        repo_patterns = ['Timmy_Foundation/', 'Rockachopa/', 'compounding-intelligence',
                         'the-nexus', 'timmy-home', 'hermes-agent', 'the-beacon', 'the-door']
        for line in tail_lines[-30:]:  # last 30 non-empty lines
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(msg, dict):
                continue
            # Track last timestamp
            ts = msg.get('timestamp', '')
            if ts:
                meta['last_timestamp'] = ts
            # Count tool calls
            tool_calls = msg.get('tool_calls') or []
            if tool_calls:
                meta['has_tool_calls'] = True
                meta['tool_call_count'] += len(tool_calls)
            # Detect failure signals in content
            content = ''
            if isinstance(msg.get('content'), str):
                content = msg['content'].lower()
            elif isinstance(msg.get('content'), list):
                for part in msg['content']:
                    if isinstance(part, dict) and part.get('type') == 'text':
                        content += part.get('text', '').lower()
            if content:
                meta['last_content_preview'] = content[:200]
                if any(sig in content for sig in failure_signals):
                    meta['is_failure'] = True
            # Extract repo references from tool call arguments
            for tc in tool_calls:
                if not isinstance(tc, dict):
                    continue
                func = tc.get('function', {})
                # Arguments may be nested under 'function' or sit beside it
                if isinstance(func, dict):
                    call_args = func.get('arguments', '')
                else:
                    call_args = tc.get('arguments', '')
                if not isinstance(call_args, str):
                    call_args = json.dumps(call_args)
                # Look for repo patterns
                for pattern in repo_patterns:
                    if pattern in call_args:
                        meta['repos_mentioned'].add(pattern.rstrip('/'))
    except (IOError, OSError):
        pass
    meta['repos_mentioned'] = list(meta['repos_mentioned'])
    return meta

# --- Filename timestamp parsing ---
def parse_session_timestamp(filename: str) -> Optional[datetime]:
    """Parse a UTC timestamp from a session filename.

    Common formats:
        session_20260413_123456_hash.jsonl
        20260413_123456_hash.jsonl

    Returns None when no date is found; callers fall back to the file's mtime.
    """
    stem = Path(filename).stem
    parts = stem.split('_')
    # Look for a YYYYMMDD part, optionally followed by an HHMMSS part
    for i, part in enumerate(parts):
        if len(part) == 8 and part.isdigit():
            date_part = part
            has_time = (i + 1 < len(parts)
                        and len(parts[i + 1]) == 6
                        and parts[i + 1].isdigit())
            time_part = parts[i + 1] if has_time else '000000'
            try:
                return datetime.strptime(f"{date_part}_{time_part}", '%Y%m%d_%H%M%S').replace(tzinfo=timezone.utc)
            except ValueError:
                continue
    # No timestamp in the filename
    return None

# --- Scoring ---
def score_session(meta: dict, now: datetime, seen_repos: set) -> tuple[int, dict]:
    """Score a session for harvest value. Returns (score, breakdown).

    Note: seen_repos is updated in place, so the repo bonus depends on the
    order in which sessions are scored.
    """
    score = 0
    breakdown = {}
    # 1. Recency
    ts = parse_session_timestamp(os.path.basename(meta['path']))
    if ts is None:
        # Fallback to mtime
        try:
            ts = datetime.fromtimestamp(os.path.getmtime(meta['path']), tz=timezone.utc)
        except OSError:
            ts = now - timedelta(days=365)
    age_days = (now - ts).days
    if age_days <= 7:
        recency = 3
    elif age_days <= 30:
        recency = 2
    else:
        recency = 1
    score += recency
    breakdown['recency'] = recency
    # 2. Length
    count = meta['message_count']
    if count > 50:
        length = 3
    elif count > 20:
        length = 2
    else:
        length = 1
    score += length
    breakdown['length'] = length
    # 3. Repo uniqueness (first session mentioning a repo gets bonus;
    #    sessions that mention no repo score 0 here)
    repo_score = 0
    for repo in meta.get('repos_mentioned', []):
        if repo not in seen_repos:
            seen_repos.add(repo)
            repo_score = max(repo_score, 5)
        else:
            repo_score = max(repo_score, 1)
    score += repo_score
    breakdown['repo_unique'] = repo_score
    # 4. Outcome
    if meta.get('is_failure'):
        outcome = 3
    elif meta.get('last_content_preview', '').strip():
        outcome = 2  # has some content = likely completed
    else:
        outcome = 1
    score += outcome
    breakdown['outcome'] = outcome
    # 5. Tool calls (counted only in the sampled tail of the session)
    if meta.get('tool_call_count', 0) > 10:
        tool = 2
    else:
        tool = 0
    score += tool
    breakdown['tool_calls'] = tool
    return score, breakdown

# --- Main ---
def main():
    parser = argparse.ArgumentParser(description="Score and rank sessions for harvesting")
    parser.add_argument('--sessions-dir', default=os.path.expanduser('~/.hermes/sessions'),
                        help='Directory containing session files')
    parser.add_argument('--count', type=int, default=100, help='Number of top sessions to return')
    parser.add_argument('--repo', default='', help='Filter to sessions mentioning this repo')
    parser.add_argument('--since', default='', help='Only score sessions after this date (YYYY-MM-DD)')
    parser.add_argument('--min-score', type=int, default=0, help='Minimum score threshold')
    parser.add_argument('--output', default='', help='Output file (JSON). Default: stdout')
    parser.add_argument('--format', choices=['json', 'paths', 'table'], default='table',
                        help='Output format: json (full), paths (one per line), table (human)')
    parser.add_argument('--top-percent', type=float, default=0, help='Return top N%% instead of --count')
    args = parser.parse_args()
    sessions_dir = Path(args.sessions_dir)
    if not sessions_dir.is_dir():
        print(f"ERROR: Sessions directory not found: {sessions_dir}", file=sys.stderr)
        sys.exit(1)
    # Find all JSONL files
    print(f"Scanning {sessions_dir}...", file=sys.stderr)
    t0 = time.time()
    session_files = list(sessions_dir.glob('*.jsonl'))
    total = len(session_files)
    print(f"Found {total} session files", file=sys.stderr)
    # Parse since date
    since_dt = None
    if args.since:
        try:
            since_dt = datetime.strptime(args.since, '%Y-%m-%d').replace(tzinfo=timezone.utc)
        except ValueError:
            parser.error(f"--since must be YYYY-MM-DD, got: {args.since}")
    # Score all sessions
    now = datetime.now(timezone.utc)
    seen_repos = set()  # Track repos for uniqueness scoring
    scored = []
    for i, sf in enumerate(session_files):
        # Date filter (fast path: check filename first).
        # Files without a timestamp in their name are not filtered out.
        if since_dt:
            ts = parse_session_timestamp(sf.name)
            if ts and ts < since_dt:
                continue
        meta = scan_session_fast(str(sf))
        # Repo filter
        if args.repo:
            repos = meta.get('repos_mentioned', [])
            if args.repo.lower() not in [r.lower() for r in repos]:
                # Also check filename
                if args.repo.lower() not in sf.name.lower():
                    continue
        score, breakdown = score_session(meta, now, seen_repos)
        if score >= args.min_score:
            scored.append({
                'path': str(sf),
                'filename': sf.name,
                'score': score,
                'breakdown': breakdown,
                'message_count': meta['message_count'],
                'repos': meta['repos_mentioned'],
                'is_failure': meta['is_failure'],
            })
        if (i + 1) % 5000 == 0:
            elapsed = time.time() - t0
            print(f" Scanned {i + 1}/{total} ({elapsed:.1f}s)", file=sys.stderr)
    elapsed = time.time() - t0
    print(f"Scored {len(scored)} sessions in {elapsed:.1f}s", file=sys.stderr)
    # Sort by score descending
    scored.sort(key=lambda x: x['score'], reverse=True)
    # Apply count or percent
    if args.top_percent > 0:
        count = max(1, int(len(scored) * args.top_percent / 100))
    else:
        count = args.count
    scored = scored[:count]
    # Output (--output always writes JSON, regardless of --format)
    if args.output:
        with open(args.output, 'w', encoding='utf-8') as f:
            json.dump(scored, f, indent=2)
        print(f"Wrote {len(scored)} sessions to {args.output}", file=sys.stderr)
    elif args.format == 'json':
        json.dump(scored, sys.stdout, indent=2)
    elif args.format == 'paths':
        for s in scored:
            print(s['path'])
    else:  # table
        print(f"{'SCORE':>5} {'MSGS':>5} {'REPOS':<25} {'FILE'}")
        print(f"{'-'*5} {'-'*5} {'-'*25} {'-'*40}")
        for s in scored:
            repos = ', '.join(s['repos'][:2]) if s['repos'] else '-'
            fail = ' FAIL' if s['is_failure'] else ''
            print(f"{s['score']:>5} {s['message_count']:>5} {repos:<25} {s['filename'][:40]}{fail}")


if __name__ == '__main__':
    main()
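
The tail sampling in `scan_session_fast` seeks to a computed byte offset on a file opened in text mode. The Python documentation states that for text files only offsets returned by `tell()` (or zero) are valid, so the result of such a seek is not guaranteed. A minimal sketch of the same idea in binary mode is shown below; `read_tail_lines` and its `max_bytes` parameter are hypothetical names for illustration and are not part of sampler.py.

```python
import os


def read_tail_lines(path: str, max_bytes: int = 8192) -> list[str]:
    """Return the non-empty lines found in the last max_bytes of a file.

    Hypothetical helper: opens the file in binary mode so that seeking to
    an arbitrary byte offset is well defined, then decodes each line.
    """
    size = os.path.getsize(path)
    start = max(0, size - max_bytes)
    with open(path, 'rb') as f:
        f.seek(start)
        chunk = f.read()
    pieces = chunk.split(b'\n')
    if start > 0:
        # The first piece may be a partial line, so drop it
        pieces = pieces[1:]
    return [
        piece.decode('utf-8', errors='replace').strip()
        for piece in pieces
        if piece.strip()
    ]
```

Decoding after splitting on the newline byte keeps a multi-byte character that straddles the seek offset inside the discarded first piece, which is why the split happens on bytes rather than on decoded text.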


@@ -0,0 +1,212 @@
#!/usr/bin/env python3
"""
Comprehensive test script for knowledge extraction prompt.
Validates prompt structure, requirements, and consistency.
"""
import json
import re
from pathlib import Path
def test_prompt_structure():
    """Test that the prompt has the required structure."""
    prompt_path = Path("templates/harvest-prompt.md")
    if not prompt_path.exists():
        return False, "harvest-prompt.md not found"
    content = prompt_path.read_text()
    # Check for required sections
    required_sections = [
        "System Prompt",
        "Instructions",
        "Categories",
        "Output Format",
        "Confidence Scoring",
        "Constraints",
        "Example"
    ]
    for section in required_sections:
        if section.lower() not in content.lower():
            return False, f"Missing required section: {section}"
    # Check for required categories
    required_categories = ["fact", "pitfall", "pattern", "tool-quirk", "question"]
    for category in required_categories:
        if category not in content:
            return False, f"Missing required category: {category}"
    # Check for required output fields
    required_fields = ["fact", "category", "repo", "confidence"]
    for field in required_fields:
        if field not in content:
            return False, f"Missing required output field: {field}"
    # Check prompt size (should be ~1k tokens, roughly 4k chars)
    if len(content) > 5000:
        return False, f"Prompt too large: {len(content)} chars (max ~5000)"
    if len(content) < 1000:
        return False, f"Prompt too small: {len(content)} chars (min ~1000)"
    return True, "Prompt structure is valid"

def test_confidence_scoring():
    """Test that confidence scoring is properly defined."""
    prompt_path = Path("templates/harvest-prompt.md")
    content = prompt_path.read_text()
    # Check for confidence scale definitions
    confidence_levels = [
        ("0.9-1.0", "explicitly stated"),
        ("0.7-0.8", "clearly implied"),
        ("0.5-0.6", "suggested"),
        ("0.3-0.4", "inferred"),
        ("0.1-0.2", "speculative")
    ]
    for level, description in confidence_levels:
        if level not in content:
            return False, f"Missing confidence level: {level}"
        if description.lower() not in content.lower():
            return False, f"Missing confidence description: {description}"
    return True, "Confidence scoring is properly defined"

def test_example_quality():
    """Test that examples are clear and complete."""
    prompt_path = Path("templates/harvest-prompt.md")
    content = prompt_path.read_text()
    # Check for example input/output
    if "example" not in content.lower():
        return False, "No examples provided"
    # Take everything from the first mention of "example" onward
    example_section = content[content.lower().find("example"):]
    # Look for JSON example
    json_match = re.search(r'\{[\s\S]*"knowledge"[\s\S]*\}', example_section)
    if not json_match:
        return False, "No JSON example found"
    example_json = json_match.group(0)
    # Check for all categories in example
    for category in ["fact", "pitfall", "pattern", "tool-quirk", "question"]:
        if category not in example_json:
            return False, f"Example missing category: {category}"
    return True, "Examples are clear and complete"

def test_constraint_coverage():
    """Test that constraints cover all requirements."""
    prompt_path = Path("templates/harvest-prompt.md")
    content = prompt_path.read_text()
    required_constraints = [
        "No hallucination",
        "only extract",
        "explicitly",
        "partial",
        "failed sessions",
        "1k tokens"
    ]
    for constraint in required_constraints:
        if constraint.lower() not in content.lower():
            return False, f"Missing constraint: {constraint}"
    return True, "Constraints cover all requirements"

def test_test_sessions():
    """Test that test sessions exist and are valid."""
    test_sessions_dir = Path("test_sessions")
    if not test_sessions_dir.exists():
        return False, "test_sessions directory not found"
    session_files = list(test_sessions_dir.glob("*.jsonl"))
    if len(session_files) < 5:
        return False, f"Only {len(session_files)} test sessions found, need at least 5"
    # Check each session file
    for session_file in session_files:
        content = session_file.read_text()
        lines = content.strip().split("\n")
        # Check that each line is valid JSON
        for i, line in enumerate(lines, 1):
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                return False, f"Invalid JSON in {session_file.name}, line {i}: {e}"
    return True, f"Found {len(session_files)} valid test sessions"

def run_all_tests():
    """Run all tests and return results."""
    tests = [
        ("Prompt Structure", test_prompt_structure),
        ("Confidence Scoring", test_confidence_scoring),
        ("Example Quality", test_example_quality),
        ("Constraint Coverage", test_constraint_coverage),
        ("Test Sessions", test_test_sessions)
    ]
    results = []
    all_passed = True
    for test_name, test_func in tests:
        try:
            passed, message = test_func()
            results.append({
                "test": test_name,
                "passed": passed,
                "message": message
            })
            if not passed:
                all_passed = False
        except Exception as e:
            results.append({
                "test": test_name,
                "passed": False,
                "message": f"Error: {str(e)}"
            })
            all_passed = False
    # Print results
    print("=" * 60)
    print("HARVEST PROMPT TEST RESULTS")
    print("=" * 60)
    for result in results:
        status = "✓ PASS" if result["passed"] else "✗ FAIL"
        print(f"{status}: {result['test']}")
        print(f" {result['message']}")
        print()
    print("=" * 60)
    if all_passed:
        print("ALL TESTS PASSED!")
    else:
        print("SOME TESTS FAILED!")
    print("=" * 60)
    return all_passed, results

if __name__ == "__main__":
    import sys
    from datetime import datetime, timezone

    all_passed, results = run_all_tests()
    # Save results to file, stamped with the actual run time
    with open("test_results.json", "w", encoding="utf-8") as f:
        json.dump({
            "all_passed": all_passed,
            "results": results,
            "timestamp": datetime.now(timezone.utc).isoformat()
        }, f, indent=2)
    print("Results saved to test_results.json")
    # Exit with appropriate code
    sys.exit(0 if all_passed else 1)


@@ -1,80 +0,0 @@
#!/usr/bin/env python3
"""
Validate knowledge files and index.json against the schema.
Usage:
python scripts/validate_knowledge.py
"""
import json
import sys
from pathlib import Path
VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED_FACT_FIELDS = {"id", "fact", "category", "domain", "confidence"}
MAX_FACT_LENGTH = 280
def validate_fact(fact, source=""):
    """Return a list of schema errors for a single fact object."""
    errors = []
    for field in REQUIRED_FACT_FIELDS:
        if field not in fact:
            errors.append(f"{source}: missing required field '{field}'")
    if "fact" in fact:
        if not isinstance(fact["fact"], str) or len(fact["fact"].strip()) == 0:
            errors.append(f"{source}: 'fact' must be non-empty string")
        elif len(fact["fact"]) > MAX_FACT_LENGTH:
            errors.append(f"{source}: 'fact' exceeds {MAX_FACT_LENGTH} chars")
    if "category" in fact and fact["category"] not in VALID_CATEGORIES:
        errors.append(f"{source}: invalid category '{fact['category']}'")
    if "confidence" in fact:
        confidence = fact["confidence"]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            errors.append(f"{source}: 'confidence' must be a number")
        elif not (0.0 <= confidence <= 1.0):
            errors.append(f"{source}: 'confidence' must be 0.0-1.0")
    if "id" in fact:
        if not isinstance(fact["id"], str):
            errors.append(f"{source}: 'id' must be a string")
        else:
            parts = fact["id"].split(":")
            if len(parts) != 3:
                errors.append(f"{source}: 'id' must be domain:category:sequence")
            elif parts[1] not in VALID_CATEGORIES:
                errors.append(f"{source}: id category '{parts[1]}' invalid")
    return errors

def main():
    repo_root = Path(__file__).parent.parent
    index_path = repo_root / "knowledge" / "index.json"
    all_errors = []
    if not index_path.exists():
        print(f"VALIDATION FAILED: index.json not found at {index_path}")
        sys.exit(1)
    with open(index_path) as f:
        data = json.load(f)
    if "version" not in data:
        all_errors.append("index.json: missing 'version'")
    if "facts" not in data or not isinstance(data["facts"], list):
        all_errors.append("index.json: missing or invalid 'facts'")
    seen_ids = set()
    for i, fact in enumerate(data.get("facts", [])):
        all_errors.extend(validate_fact(fact, f"facts[{i}]"))
        if "id" in fact:
            if fact["id"] in seen_ids:
                all_errors.append(f"index.json: duplicate id '{fact['id']}'")
            seen_ids.add(fact["id"])
    if all_errors:
        print(f"VALIDATION FAILED - {len(all_errors)} error(s):\n")
        for e in all_errors:
            print(f" x {e}")
        sys.exit(1)
    else:
        print(f"VALIDATION PASSED - {len(data.get('facts', []))} facts, schema v{data.get('version', '?')}")
        sys.exit(0)


if __name__ == "__main__":
    main()
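
`validate_fact` checks that an `id` has three colon-separated parts and a known category, but it does not check the sequence part, which the ID format describes as a zero-padded 3-digit number. A sketch of a stricter check follows. `ID_PATTERN` and `is_valid_id` are hypothetical names, and the sketch assumes that domains never contain a colon and that sequences stay within three digits.

```python
import re

VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}

# Assumed shape: {domain}:{category}:{sequence}, sequence = exactly 3 digits
ID_PATTERN = re.compile(r"(?P<domain>[^:]+):(?P<category>[^:]+):(?P<sequence>[0-9]{3})")


def is_valid_id(fact_id: object) -> bool:
    """Return True if fact_id matches the assumed ID format."""
    if not isinstance(fact_id, str):
        return False
    match = ID_PATTERN.fullmatch(fact_id)
    if match is None:
        return False
    return match.group("category") in VALID_CATEGORIES
```

Using `fullmatch` rather than `match` rejects trailing characters such as a stray newline after the sequence.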


@@ -0,0 +1,18 @@
{"role": "user", "content": "Deploy the application to production"}
{"role": "assistant", "content": "I'll deploy the application to production.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@prod-server 'cd /app && git pull && docker-compose up -d'"}}]}
{"role": "tool", "content": "ssh: connect to host prod-server port 22: Connection refused"}
{"role": "assistant", "content": "Connection refused. Let me check the server status.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ping -c 3 prod-server"}}]}
{"role": "tool", "content": "PING prod-server (192.168.1.100): 56 data bytes\nRequest timeout for icmp_seq 0\nRequest timeout for icmp_seq 1\nRequest timeout for icmp_seq 2\n--- prod-server ping statistics ---\n3 packets transmitted, 0 packets received, 100.0% packet loss"}
{"role": "assistant", "content": "The server appears to be down. Let me check if there's an alternative deployment method."}
{"role": "user", "content": "Try using the VPS instead"}
{"role": "assistant", "content": "I'll try deploying to the VPS.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@vps-server 'cd /app && git pull && docker-compose up -d'"}}]}
{"role": "tool", "content": "Permission denied (publickey)."}
{"role": "assistant", "content": "Permission denied. The SSH key might not be set up correctly. Let me check the SSH configuration.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat ~/.ssh/config"}}]}
{"role": "tool", "content": "Host vps-server\n HostName 167.99.126.228\n User deploy\n IdentityFile ~/.ssh/id_rsa_vps"}
{"role": "assistant", "content": "The SSH config looks correct. Let me check if the key exists.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la ~/.ssh/id_rsa_vps"}}]}
{"role": "tool", "content": "ls: /Users/user/.ssh/id_rsa_vps: No such file or directory"}
{"role": "assistant", "content": "The SSH key doesn't exist. This is a pitfall - we need to generate the SSH key first."}
{"role": "user", "content": "Generate a new SSH key for the VPS"}
{"role": "assistant", "content": "I'll generate a new SSH key.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_vps -N ''"}}]}
{"role": "tool", "content": "Generating public/private rsa key pair.\nYour identification has been saved in /Users/user/.ssh/id_rsa_vps\nYour public key has been saved in /Users/user/.ssh/id_rsa_vps.pub\nThe key fingerprint is:\nSHA256:abc123... user@machine"}
{"role": "assistant", "content": "SSH key generated. Now I need to add it to the VPS."}


@@ -0,0 +1,14 @@
{"role": "user", "content": "Check the Gitea API for open issues"}
{"role": "assistant", "content": "I'll check the Gitea API for open issues.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://forge.alexanderwhitestone.com/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "curl: (6) Could not resolve host: forge.alexanderwhitestone.com"}
{"role": "assistant", "content": "DNS resolution failed. Let me check the network connectivity.", "tool_calls": [{"function": "terminal", "arguments": {"command": "nslookup forge.alexanderwhitestone.com"}}]}
{"role": "tool", "content": "Server: 8.8.8.8\nAddress: 8.8.8.8#53\n\n** server can't find forge.alexanderwhitestone.com: NXDOMAIN"}
{"role": "assistant", "content": "The domain doesn't resolve. This might be a DNS issue. Let me try using the IP address directly.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "curl: (60) SSL certificate problem: self signed certificate"}
{"role": "assistant", "content": "SSL certificate issue. The server is using a self-signed certificate. Let me try with -k flag to skip SSL verification.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "[]"}
{"role": "assistant", "content": "No open issues found. The API returned an empty array."}
{"role": "user", "content": "Check if there are any pull requests"}
{"role": "assistant", "content": "Let me check for pull requests.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/pulls?state=open"}}]}
{"role": "tool", "content": "[{\"id\": 20, \"title\": \"feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)\", \"state\": \"open\", \"user\": {\"login\": \"Rockachopa\"}, \"created_at\": \"2026-04-14T17:28:20Z\"}]"}
{"role": "assistant", "content": "Found 1 open pull request: PR #20 - \"feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)\" by Rockachopa."}


@@ -0,0 +1,10 @@
{"role": "user", "content": "How do I deploy a new version of the application?"}
{"role": "assistant", "content": "Here's the deployment process:\n1. Run tests: pytest\n2. Build Docker image: docker build -t app:latest .\n3. Push to registry: docker push registry.example.com/app:latest\n4. Update production: ssh prod 'cd /app && docker-compose pull && docker-compose up -d'\n5. Verify: curl https://app.example.com/health"}
{"role": "user", "content": "What if tests fail?"}
{"role": "assistant", "content": "If tests fail:\n1. Check the error logs: pytest -v\n2. Fix the failing tests\n3. Re-run tests: pytest\n4. If still failing, check recent changes: git log --oneline -10\n5. Consider reverting: git revert HEAD"}
{"role": "user", "content": "What if the Docker build fails?"}
{"role": "assistant", "content": "If Docker build fails:\n1. Check Dockerfile syntax: docker build --no-cache -t app:latest .\n2. Check for missing dependencies: cat requirements.txt\n3. Check disk space: df -h\n4. Clean up Docker: docker system prune -f\n5. Try building with verbose output: docker build --progress=plain -t app:latest ."}
{"role": "user", "content": "What if the push to registry fails?"}
{"role": "assistant", "content": "If push to registry fails:\n1. Check registry credentials: docker login registry.example.com\n2. Check registry space: curl https://registry.example.com/v2/_catalog\n3. Check image size: docker images app:latest\n4. Try tagging with version: docker tag app:latest registry.example.com/app:v1.0.0\n5. Push with version tag: docker push registry.example.com/app:v1.0.0"}
{"role": "user", "content": "What if the production update fails?"}
{"role": "assistant", "content": "If production update fails:\n1. Check SSH connection: ssh prod 'echo connected'\n2. Check disk space on prod: ssh prod 'df -h'\n3. Check Docker status: ssh prod 'docker ps -a'\n4. Check logs: ssh prod 'docker-compose logs --tail=100'\n5. Rollback: ssh prod 'cd /app && git checkout HEAD~1 && docker-compose up -d'"}


@@ -0,0 +1,12 @@
{"role": "user", "content": "I need to implement a new feature for the application"}
{"role": "assistant", "content": "I can help you implement a new feature. What feature would you like to add?"}
{"role": "user", "content": "I want to add user authentication"}
{"role": "assistant", "content": "User authentication is a good feature. I have a few questions:\n1. What authentication method do you prefer? (JWT, OAuth, Session-based)\n2. Do you need social login? (Google, GitHub, etc.)\n3. What user data do you need to store?\n4. Do you need role-based access control?\n5. What's your database? (PostgreSQL, MySQL, SQLite)"}
{"role": "user", "content": "I prefer JWT, no social login, just email and password, no roles, PostgreSQL"}
{"role": "assistant", "content": "Great choices. Here's the implementation plan:\n1. Create user model with email and password hash\n2. Create JWT token generation and validation\n3. Create login endpoint: POST /api/auth/login\n4. Create register endpoint: POST /api/auth/register\n5. Create middleware for protected routes\n6. Add password hashing with bcrypt\n7. Add rate limiting for login attempts\n8. Add password reset functionality\n9. Add email verification\n10. Test all endpoints"}
{"role": "user", "content": "What about password reset?"}
{"role": "assistant", "content": "For password reset:\n1. Create reset token table\n2. Create forgot password endpoint: POST /api/auth/forgot-password\n3. Send reset email with token\n4. Create reset password endpoint: POST /api/auth/reset-password\n5. Validate reset token and expiration\n6. Update password and delete token\n7. Send confirmation email"}
{"role": "user", "content": "What about email verification?"}
{"role": "assistant", "content": "For email verification:\n1. Add email_verified field to user model\n2. Create verification token table\n3. Send verification email on registration\n4. Create verify email endpoint: GET /api/auth/verify-email?token=...\n5. Validate token and update email_verified\n6. Add middleware to check email_verified for certain routes\n7. Add resend verification email endpoint"}
{"role": "user", "content": "What about rate limiting?"}
{"role": "assistant", "content": "For rate limiting:\n1. Use Redis for rate limit storage\n2. Create rate limit middleware\n3. Set limits: 5 login attempts per minute, 3 password resets per hour\n4. Return 429 Too Many Requests when limit exceeded\n5. Add X-RateLimit headers to responses\n6. Log rate limit violations\n7. Add whitelist for trusted IPs"}


@@ -0,0 +1,13 @@
{"role": "user", "content": "Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence"}
{"role": "assistant", "content": "I'll clone the repository for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence.git"}}]}
{"role": "tool", "content": "Cloning into 'compounding-intelligence'...\nremote: Enumerating objects: 15, done.\nremote: Counting objects: 100% (15/15), done.\nremote: Compressing objects: 100% (12/12), done.\nremote: Total 15 (delta 2), reused 0 (delta 0), pack-reused 0\nReceiving objects: 100% (15/15), done.\nResolving deltas: 100% (2/2), done."}
{"role": "assistant", "content": "Repository cloned successfully. Let me check the structure.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/"}}]}
{"role": "tool", "content": "total 8\ndrwxr-xr-x 5 user staff 160 Apr 14 17:21 .\ndrwxr-xr-x 12 user staff 384 Apr 14 17:21 ..\ndrwxr-xr-x 13 user staff 416 Apr 14 17:21 .git\n-rw-r--r-- 1 user staff 1234 Apr 14 17:21 README.md\ndrwxr-xr-x 3 user staff 96 Apr 14 17:21 knowledge\n-rw-r--r-- 1 user staff 456 Apr 14 17:21 metrics\ndrwxr-xr-x 4 user staff 128 Apr 14 17:21 scripts\ndrwxr-xr-x 3 user staff 96 Apr 14 17:21 templates"}
{"role": "user", "content": "What files are in the templates directory?"}
{"role": "assistant", "content": "Let me check the templates directory.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/templates/"}}]}
{"role": "tool", "content": "total 0\ndrwxr-xr-x 3 user staff 96 Apr 14 17:21 .\ndrwxr-xr-x 5 user staff 160 Apr 14 17:21 ..\n-rw-r--r-- 1 user staff 0 Apr 14 17:21 .gitkeep"}
{"role": "assistant", "content": "The templates directory currently only contains a .gitkeep file. No templates have been added yet."}
{"role": "user", "content": "Create a harvest-prompt.md template in the templates directory."}
{"role": "assistant", "content": "I'll create the harvest-prompt.md template for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat > compounding-intelligence/templates/harvest-prompt.md << 'EOF'\n# Knowledge Extraction Prompt\n\nExtract knowledge from session transcripts.\nEOF"}}]}
{"role": "tool", "content": ""}
{"role": "assistant", "content": "Template created successfully. The harvest-prompt.md file is now in the templates directory."}
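
The fixture sessions record tool calls as `{"function": "terminal", "arguments": {...}}`, with `function` as a string and `arguments` as a sibling object, whereas sampler.py reads `tc['function']['arguments']` and expects a string. A hedged sketch of a helper that tolerates either shape is given below. `tool_call_text` is a hypothetical name, and the nested shape is assumed only from how sampler.py reads it.

```python
import json


def tool_call_text(tool_call: dict) -> str:
    """Return a tool call's arguments as one searchable string.

    Hypothetical helper: accepts arguments nested under 'function' (a dict)
    or stored beside a string-valued 'function', as in the fixtures.
    """
    func = tool_call.get("function")
    if isinstance(func, dict):
        arguments = func.get("arguments", "")
    else:
        arguments = tool_call.get("arguments", "")
    if isinstance(arguments, str):
        return arguments
    # Structured arguments are serialized so substring checks still work
    return json.dumps(arguments)


# Flat shape, as used in the fixture sessions
flat = {
    "function": "terminal",
    "arguments": {"command": "ls -la compounding-intelligence/templates/"},
}
# Nested shape (assumed), with arguments already serialized as a string
nested = {"function": {"name": "terminal", "arguments": "{\"command\": \"ls the-nexus\"}"}}

flat_text = tool_call_text(flat)
nested_text = tool_call_text(nested)
```

With both shapes reduced to a string, a repo-pattern check such as `'the-nexus' in nested_text` can run on either kind of session.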