Compare commits

81 Commits
fix/10-kno...step35/232

| SHA1 |
|---|
| 759abffd00 |
| 4b5a675355 |
| 345d2451d0 |
| 8aa9c9f018 |
| 277f9e3a2b |
| 21f654a159 |
| 12abaad838 |
| c106db2e28 |
| 242c77cc99 |
| fe94130380 |
| 4181065f60 |
| cc215e3ed7 |
| baa2c84c3f |
| 6dd354385f |
| 55adcb31dc |
| ec0e9d65ca |
| b732172dcc |
| f7c479c4eb |
| c203010e3a |
| e1e42c3f8e |
| 7a4677c752 |
| 229c327c9e |
| 537bb1b61b |
| fe8a70adc1 |
| a4cbfbd5bf |
| 10096977bd |
| 08c4237dca |
| 661f2730f2 |
| 80cc656fc1 |
| 522fa84151 |
| 69992cf82a |
| d0a422deef |
| 4210a4ea9d |
| bac0e0336b |
| 64b823c598 |
| 27459271ac |
| 45cd835bd2 |
| 5e8a201a26 |
| fa43d82f79 |
| c7b7b38510 |
| 908fbbb26c |
| ea33c331f6 |
| f55d31cd02 |
| 8ca3132990 |
| 363407293d |
| 4bd0a5d7ac |
| 93bc3fc18a |
| 9668034ad6 |
| 49365c64d2 |
| 341abab2a0 |
| e8359cf10a |
| b3592e14ad |
| f1175df79d |
| be805a1b4c |
| 1d47665dd4 |
| 5eab5e4aac |
| 71dd801575 |
| e6f1b07f16 |
| 81c02f6709 |
| c2c3c6a3b9 |
| d664119b9c |
| 764414d4d5 |
| 54f3bef7fc |
| 4fcd372de4 |
| 77a753f6f2 |
| cbebd93cbb |
| b36f617d4a |
| b5466dc938 |
| 55797c8a3e |
| 7342fc7cb2 |
| 206cfbb498 |
| cdb71adddf |
| 160dfcf419 |
| 8d716ff03f |
| 920510996e |
| 1fafeaf5a4 |
| 36b440f998 |
| 9f3caabf42 |
| a21f3a44e1 |
| b65256bf76 |
| da073ad7cf |
22 .gitea/workflows/test.yml Normal file
@@ -0,0 +1,22 @@
name: Test

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install test dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run test suite
        run: |
          make test
2 .gitignore vendored Normal file
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
251 GENOME.md Normal file
@@ -0,0 +1,251 @@
# GENOME.md — compounding-intelligence

**Generated:** 2026-04-17
**Repo:** Timmy_Foundation/compounding-intelligence
**Description:** Turn 1B+ daily agent tokens into durable, compounding fleet intelligence.

---

## Project Overview

Every agent session starts at zero. The same HTTP 405 gets rediscovered as a branch protection issue. The same token path gets searched from scratch. Intelligence evaporates when the session ends.

Compounding-intelligence solves this with three pipelines forming a loop:

```
SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                                 ↓
                             MEASURER → Prove it's working
```

**Status:** Active development. Core pipelines implemented. 20+ scripts, 14 test files, knowledge store populated with real data.

## Architecture

```mermaid
graph TD
    TRANS[Session Transcripts<br/>~/.hermes/sessions/*.jsonl] --> READER[session_reader.py]
    READER --> HARVESTER[harvester.py]
    HARVESTER -->|LLM extraction| PROMPT[harvest-prompt.md]
    HARVESTER --> DEDUP[deduplicate()]
    DEDUP --> INDEX[knowledge/index.json]
    DEDUP --> GLOBAL[knowledge/global/*.yaml]
    DEDUP --> REPO[knowledge/repos/*.yaml]

    INDEX --> BOOTSTRAPPER[bootstrapper.py]
    BOOTSTRAPPER -->|filter + rank + truncate| CONTEXT[Bootstrap Context<br/>2k token injection]
    CONTEXT --> SESSION[New Session starts smarter]

    INDEX --> VALIDATOR[validate_knowledge.py]
    INDEX --> STALENESS[knowledge_staleness_check.py]
    INDEX --> GAPS[knowledge_gap_identifier.py]

    TRANS --> SAMPLER[sampler.py]
    SAMPLER -->|score + rank| BEST[High-value sessions]
    BEST --> HARVESTER

    TRANS --> METADATA[session_metadata.py]
    METADATA --> SUMMARY[SessionSummary objects]

    KNOWLEDGE --> DIFF[diff_analyzer.py]
    DIFF --> PROPOSALS[improvement_proposals.py]
    PROPOSALS --> PRIORITIES[priority_rebalancer.py]
```

## Entry Points

### Core Pipelines

| Script | Purpose | Key Functions |
|--------|---------|---------------|
| `harvester.py` | Extract knowledge from session transcripts | `harvest_session()`, `call_llm()`, `deduplicate()`, `validate_fact()` |
| `bootstrapper.py` | Build pre-session context from knowledge store | `build_bootstrap_context()`, `filter_facts()`, `sort_facts()`, `truncate_to_tokens()` |
| `session_reader.py` | Parse JSONL session transcripts | `read_session()`, `extract_conversation()`, `messages_to_text()` |
| `sampler.py` | Score and rank sessions for harvesting value | `scan_session_fast()`, `score_session()` |
| `session_metadata.py` | Extract structured metadata from sessions | `extract_session_metadata()`, `SessionSummary` |

### Analysis & Quality

| Script | Purpose |
|--------|---------|
| `validate_knowledge.py` | Validate knowledge index schema compliance |
| `knowledge_staleness_check.py` | Detect stale knowledge (source changed since extraction) |
| `knowledge_gap_identifier.py` | Find untested functions, undocumented APIs, missing tests |
| `diff_analyzer.py` | Analyze code diffs for improvement signals |
| `improvement_proposals.py` | Generate ranked improvement proposals |
| `priority_rebalancer.py` | Rebalance priorities across proposals |
| `automation_opportunity_finder.py` | Find manual steps that can be automated |
| `dead_code_detector.py` | Detect unused code |
| `dependency_graph.py` | Map dependency relationships |
| `perf_bottleneck_finder.py` | Find performance bottlenecks |
| `refactoring_opportunity_finder.py` | Identify refactoring targets |
| `gitea_issue_parser.py` | Parse Gitea issues for knowledge extraction |

### Automation

| Script | Purpose |
|--------|---------|
| `session_pair_harvester.py` | Extract training pairs from sessions |

## Data Flow

```
1. Session ends → .jsonl written to ~/.hermes/sessions/
2. sampler.py scores sessions by age, recency, repo coverage
3. harvester.py reads top sessions, calls LLM with harvest-prompt.md
4. LLM extracts facts/pitfalls/patterns/quirks/questions
5. deduplicate() checks against existing index via fact_fingerprint()
6. validate_fact() checks schema compliance
7. write_knowledge() appends to knowledge/index.json + per-repo YAML
8. On next session start, bootstrapper.py:
   a. Loads knowledge/index.json
   b. Filters by session's repo and agent type
   c. Sorts by confidence (high first), then recency
   d. Truncates to 2k token budget
   e. Injects as pre-context
9. Agent starts with full situational awareness instead of zero
```
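Step 5 hinges on a stable fingerprint per fact. `harvester.py` is not part of this diff, so the following is only a minimal sketch of what `fact_fingerprint()` and the dedup step could look like, assuming normalization plus a content hash; the names and rules are illustrative, not the actual implementation.

```python
import hashlib
import re


def fact_fingerprint(fact: dict) -> str:
    """Hypothetical sketch: derive a stable dedup key for a knowledge item.

    Assumes dedup should ignore case, punctuation, and whitespace so that
    near-identical restatements of the same fact collide.
    """
    text = fact.get("fact", "")
    normalized = re.sub(r"[^\w\s]", "", text.lower())
    normalized = " ".join(normalized.split())
    # Scope by repo so identical wording in two repos stays distinct
    scoped = f"{fact.get('repo', 'global')}::{normalized}"
    return hashlib.sha256(scoped.encode("utf-8")).hexdigest()[:16]


def deduplicate_sketch(new_facts: list[dict], index_facts: list[dict]) -> list[dict]:
    """Keep only facts whose fingerprint is not already in the index."""
    seen = {fact_fingerprint(f) for f in index_facts}
    return [f for f in new_facts if fact_fingerprint(f) not in seen]
```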
## Key Abstractions

### Knowledge Item (fact/pitfall/pattern/quirk/question)
```json
{
  "fact": "Gitea token is at ~/.config/gitea/token",
  "category": "tool-quirk",
  "repo": "global",
  "confidence": 0.9,
  "evidence": "Found during clone attempt",
  "source_session": "2026-04-13_abc123",
  "extracted_at": "2026-04-13T20:00:00Z"
}
```

### SessionSummary (session_metadata.py)
Extracted metadata per session: duration, token count, tools used, repos touched, error count, outcome.

### Gap / GapReport (knowledge_gap_identifier.py)
Structured gap analysis: untested functions, undocumented APIs, missing tests. Severity: critical/high/medium/low.

### Knowledge Index (knowledge/index.json)
Machine-readable fact store. 12KB, populated with real data. Categories: fact, pitfall, pattern, tool-quirk, question.

## Knowledge Store

```
knowledge/
├── index.json            # Master fact store (12KB, populated)
├── SCHEMA.md             # Schema documentation
├── global/
│   ├── pitfalls.yaml     # Cross-repo pitfalls (2KB)
│   └── tool-quirks.yaml  # Tool-specific quirks (2KB)
├── repos/
│   ├── hermes-agent.yaml # hermes-agent knowledge (2KB)
│   └── the-nexus.yaml    # the-nexus knowledge (2KB)
└── agents/               # Per-agent knowledge (empty)
```

## API Surface

### LLM API (consumed)
| Provider | Endpoint | Usage |
|----------|----------|-------|
| Nous Research | `https://inference-api.nousresearch.com/v1` | Knowledge extraction |
| Ollama | `http://localhost:11434/v1` | Local fallback |
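Both endpoints end in `/v1`, which suggests an OpenAI-compatible chat API. A minimal sketch of how the extraction call with local fallback might look; the payload shape, model name, and error handling here are assumptions, not `harvester.py`'s actual code.

```python
import json
import urllib.error
import urllib.request

ENDPOINTS = [
    ("https://inference-api.nousresearch.com/v1", "NOUS_API_KEY"),  # primary
    ("http://localhost:11434/v1", None),                            # local fallback
]


def call_llm_sketch(prompt: str, api_keys: dict) -> str:
    """Hypothetical sketch: try the hosted endpoint, fall back to Ollama.

    Assumes both providers speak the OpenAI-compatible /chat/completions API.
    """
    for base_url, key_name in ENDPOINTS:
        body = json.dumps({
            "model": "hermes",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        if key_name and key_name in api_keys:
            headers["Authorization"] = f"Bearer {api_keys[key_name]}"
        req = urllib.request.Request(
            f"{base_url}/chat/completions", data=body, headers=headers
        )
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                data = json.loads(resp.read())
            return data["choices"][0]["message"]["content"]
        except (OSError, KeyError, json.JSONDecodeError):
            continue  # try the next endpoint
    raise RuntimeError("All LLM endpoints failed")
```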
### File API (consumed/produced)
| Path | Format | Direction |
|------|--------|-----------|
| `~/.hermes/sessions/*.jsonl` | JSONL | Input (session transcripts) |
| `knowledge/index.json` | JSON | Output (master fact store) |
| `knowledge/global/*.yaml` | YAML | Output (cross-repo knowledge) |
| `knowledge/repos/*.yaml` | YAML | Output (per-repo knowledge) |
| `templates/harvest-prompt.md` | Markdown | Config (extraction prompt) |

## Test Coverage

**14 test files** covering core pipelines:

| Test File | Covers |
|-----------|--------|
| `test_harvest_prompt.py` | Prompt validation, hallucination detection |
| `test_harvest_prompt_comprehensive.py` | Extended prompt testing |
| `test_harvester_pipeline.py` | Harvester extraction + dedup |
| `test_bootstrapper.py` | Context building, filtering, truncation |
| `test_session_pair_harvester.py` | Training pair extraction |
| `test_improvement_proposals.py` | Proposal generation |
| `test_priority_rebalancer.py` | Priority scoring |
| `test_knowledge_staleness.py` | Staleness detection |
| `test_automation_opportunity_finder.py` | Automation detection |
| `test_diff_analyzer.py` | Diff analysis |
| `test_gitea_issue_parser.py` | Issue parsing |
| `test_refactoring_opportunity_finder.py` | Refactoring signals |
| `test_knowledge_gap_identifier.py` | Gap analysis |
| `test_perf_bottleneck_finder.py` | Perf bottleneck detection |

### Coverage Gaps

1. **session_reader.py** — No dedicated test file (tested indirectly)
2. **sampler.py** — No test file (scoring logic untested)
3. **session_metadata.py** — No test file
4. **validate_knowledge.py** — No test file
5. **knowledge_staleness_check.py** — Tested but limited

## Security Considerations

### API Key Handling
- `harvester.py` reads API key from `~/.hermes/auth.json` or env vars
- Key passed to LLM API in request headers only
- No key logging

### Knowledge Integrity
- `validate_fact()` checks schema before writing
- `deduplicate()` prevents duplicate entries via fingerprint
- `knowledge_staleness_check.py` detects when source code changed but knowledge didn't
- Confidence scores prevent low-quality knowledge from polluting the store
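Based on the Knowledge Item schema shown under Key Abstractions, a minimal sketch of the checks `validate_fact()` plausibly performs; the required keys come from that JSON example, but the exact rules are an assumption.

```python
REQUIRED_KEYS = {"fact", "category", "repo", "confidence", "source_session"}
VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def validate_fact_sketch(entry: dict) -> list[str]:
    """Hypothetical sketch: return a list of schema violations (empty = valid)."""
    errors = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if entry.get("category") not in VALID_CATEGORIES:
        errors.append(f"unknown category: {entry.get('category')!r}")
    confidence = entry.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be a number in [0.0, 1.0]")
    if not isinstance(entry.get("fact"), str) or not entry["fact"].strip():
        errors.append("fact text must be a non-empty string")
    return errors
```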
### File Safety
- Knowledge writes are append-only (never deletes)
- Bootstrap context is truncated to budget (no prompt injection via knowledge)
- Session reader handles malformed JSONL gracefully

## File Index

```
scripts/
  harvester.py (473 lines) — Core knowledge extraction
  bootstrapper.py (302 lines) — Pre-session context builder
  session_reader.py (137 lines) — JSONL session parser
  sampler.py (363 lines) — Session scoring + ranking
  session_metadata.py (271 lines) — Session metadata extraction
  validate_knowledge.py (44 lines) — Index validation
  knowledge_staleness_check.py (125 lines) — Staleness detection
  knowledge_gap_identifier.py (291 lines) — Gap analysis engine
  diff_analyzer.py (203 lines) — Diff analysis
  improvement_proposals.py (518 lines) — Proposal generation
  priority_rebalancer.py (745 lines) — Priority scoring
  automation_opportunity_finder.py (600 lines) — Automation detection
  dead_code_detector.py (270 lines) — Dead code detection
  dependency_graph.py (220 lines) — Dependency mapping
  perf_bottleneck_finder.py (635 lines) — Perf analysis
  refactoring_opportunity_finder.py (46 lines) — Refactoring signals
  gitea_issue_parser.py (140 lines) — Gitea issue parsing
  session_pair_harvester.py (224 lines) — Training pair extraction
knowledge/
  index.json (12KB) — Master fact store
  SCHEMA.md (3KB) — Schema docs
  global/pitfalls.yaml (2KB) — Cross-repo pitfalls
  global/tool-quirks.yaml (2KB) — Tool quirks
  repos/hermes-agent.yaml (2KB) — Repo-specific knowledge
  repos/the-nexus.yaml (2KB) — Repo-specific knowledge
templates/
  harvest-prompt.md (4KB) — Extraction prompt
test_sessions/ (5 files) — Sample transcripts
tests/ + scripts/test_* (14 files) — Test suite
```

**Total:** ~6,500 lines of code across 18 scripts + 14 test files.

---

*Generated by Codebase Genome pipeline — Issue #676*
4 Makefile Normal file
@@ -0,0 +1,4 @@
.PHONY: test

test:
	python3 -m pytest tests/test_ci_config.py scripts/test_*.py -v
207 docs/swarm-memory-design.md Normal file
@@ -0,0 +1,207 @@
# Swarm Memory Architecture — Design Note

**Issue:** #232 — [ATLAS][Research] Solve the swarm-memory gap for concurrent subagents
**Repo:** Timmy_Foundation/compounding-intelligence
**Status:** Research — Design Draft
**Author:** step35 (burn)
**Date:** 2026-04-26

---

## 1. Problem Statement

The compounding-intelligence pipelines assume a **session-bounded** memory model: each agent session starts with injected bootstrap context, runs, produces a transcript, then ends. Knowledge is harvested *after* the session and injected *before* the next.

But **concurrent subagents** (multiple simultaneous agents working parallel tasks) break this model:

- **No shared scratch space:** Each subagent operates in isolation; discoveries in sibling sessions aren't visible until the next harvest cycle.
- **Race conditions on promotion:** Two subagents may discover the same fact; both write it, causing duplication or conflicts.
- **Lost correlation:** Without a shared event log, you cannot reconstruct what happened across the swarm.
- **Stale shared state:** If a fact is promoted to global memory while subagents are still running, they may act on outdated assumptions.

**Core question:** What memory semantics should exist across concurrent subagents so they can cooperate without corrupting each other or losing important results?

---

## 2. Session Memory vs Swarm Memory

### Session Memory (Current)

| Property | Description |
|---|---|
| **Scope** | Single agent process lifetime |
| **Storage** | In-memory context window + transient tool state |
| **Visibility** | Private to that session |
| **Lifetime** | Ephemeral — disappears on exit |
| **Promotion** | Post-session harvester extracts durable facts |
| **Example** | "I read the config file and saw port 8080" |

### Swarm Memory (What's Missing)

| Property | Desired |
|---|---|
| **Scope** | All concurrent subagents in a task group |
| **Storage** | Shared, durable, versioned |
| **Visibility** | Readable by all siblings; write semantics TBD |
| **Lifetime** | Persists for duration of the coordinated task |
| **Promotion** | Real-time or near-real-time synchronization |
| **Example** | "Agent A found that the API returns 405 on main; all agents should know this now" |

**Key insight:** Session memory is **private and accumulated**; swarm memory is **shared and coordinated**. The harvester/bootstrapper loop is too slow for real-time coordination.

---

## 3. Candidate Designs

### Design A — Append-Only Event Log + Synthesis

**Overview:** All subagents write to a shared, append-only event log. A background synthesis process reads the log and extracts high-level facts into the knowledge store. Subagents also read the log to stay current (a minimal append/tail sketch follows the read path below).

**Data model:**
```
swarm-memory/
  event-log.jsonl      # Immutable, ordered, concurrent-safe append
  event-index/         # By agent, by type, by timestamp
  synthesized-facts/   # Periodic distillation into durable facts
  checkpoints/         # Snapshot every N events for fast replay
```

**Write path:**
1. Subagent observes something → `event_log.append({agent, type, content, timestamp, session_id})`
2. Other subagents can tail the log (like a changelog)

**Read path:**
1. Before each action, subagent queries recent events (last N minutes or last M entries)
2. Background job periodically runs a synthesis LLM to convert raw events → distilled facts
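A minimal sketch of the append/tail API that the proposed `scripts/swarm_event_log.py` (§5) could expose. This is illustrative only: it leans on POSIX `O_APPEND` semantics (small single `write()` calls to an append-mode file do not interleave) plus an in-process lock; names and fields mirror the write path above.

```python
import json
import os
import threading
import time


class SwarmEventLog:
    """Hypothetical sketch of Design A's shared event log (JSONL, append-only)."""

    def __init__(self, path: str):
        self.path = path
        self._lock = threading.Lock()  # serialize appends within one process

    def append(self, agent: str, event_type: str, content: str, session_id: str) -> None:
        event = {
            "agent": agent,
            "type": event_type,
            "content": content,
            "timestamp": time.time(),
            "session_id": session_id,
        }
        line = json.dumps(event) + "\n"
        with self._lock:
            # O_APPEND keeps concurrent writers from clobbering each other
            fd = os.open(self.path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
            try:
                os.write(fd, line.encode("utf-8"))
            finally:
                os.close(fd)

    def tail(self, offset: int = 0) -> tuple[list[dict], int]:
        """Return (events after byte offset, new offset) — a changelog-style read."""
        if not os.path.exists(self.path):
            return [], offset
        with open(self.path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        events = [json.loads(l) for l in chunk.splitlines() if l.strip()]
        return events, offset + len(chunk)
```

A subagent would call `log.append(...)` at milestones and `events, off = log.tail(off)` before each step.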
**Pros:**
- **Lossless:** Nothing is ever overwritten; full audit trail
- **Concurrent-safe:** Append-only, no locking
- **Causality preserved:** Order of discoveries is visible
- **Replayable:** Any subagent can reconstruct state from checkpoint + tail

**Cons:**
- **Signal/noise:** Raw events are noisy; synthesis latency means swarm facts lag
- **Storage growth:** Event log grows unbounded without a pruning policy
- **Query performance:** Finding "all facts about X" requires synthesis or a full scan
- **Coordination latency:** Subagents only learn of discoveries after they're written and tailed

**Failure modes:**
- **Duplication:** Multiple agents write the same observation → synthesis dedups
- **Contradiction:** Two agents report conflicting facts → synthesis must reconcile
- **Stale state:** Agent reads the log at T0, then new events arrive before it acts

---

### Design B — Shared Board + Evidence Links

**Overview:** A shared, mutable board stores distilled facts. Each fact includes provenance links to the agent sessions that discovered it. Agents read-before-write and update via compare-and-swap (a minimal sketch follows the read path below).

**Data model:**
```
swarm-memory/
  board.yaml        # Current set of facts with version stamps
  evidence-links/   # Mapping: fact_id → [session_id, turn_range]
  fact-history/     # Append-only log of fact revisions (for audit)
```

**Write path (compare-and-swap):**
1. Agent reads current fact version
2. Agent proposes update with new evidence
3. System accepts if version unchanged since read; rejects with retry if conflict
4. On accept → append to fact-history, increment board version

**Read path:**
1. Agent reads board.yaml (small, distilled)
2. If deeper verification needed, follow evidence-links to source sessions
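A minimal sketch of the CAS cycle with the retry-plus-jitter mitigation §6 calls for. Two simplifications to stay stdlib-only: a version counter inside the board file, and JSON instead of the proposed `board.yaml`; the check-then-write window noted in the comments is exactly the atomicity gap Design B has to close.

```python
import json
import os
import random
import time


def cas_update_sketch(board_path: str, fact_id: str, new_value: dict,
                      max_retries: int = 5) -> bool:
    """Hypothetical sketch: read-modify-write with version check and backoff."""
    for attempt in range(max_retries):
        board = {"version": 0, "facts": {}}
        if os.path.exists(board_path):
            with open(board_path) as f:
                board = json.load(f)
        expected_version = board["version"]

        # Propose the update against the version we read
        board["facts"][fact_id] = new_value
        board["version"] = expected_version + 1

        # Re-read before committing; accept only if nobody moved the version.
        # (A real implementation must hold a file lock across check + write,
        # otherwise two writers can still interleave here.)
        current = {"version": 0}
        if os.path.exists(board_path):
            with open(board_path) as f:
                current = json.load(f)
        if current["version"] == expected_version:
            tmp = board_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(board, f, indent=2)
            os.replace(tmp, board_path)  # atomic rename = atomic publish
            return True

        # Conflict: back off exponentially with jitter to avoid a thundering herd
        time.sleep((2 ** attempt) * 0.05 + random.uniform(0, 0.05))
    return False
```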
**Pros:**
- **Low-latency reads:** Board is small and current
- **Explicit provenance:** Every fact knows which sessions contributed
- **Conflict detection:** CAS catches concurrent updates
- **Intentional updates:** Agents must justify changes with evidence

**Cons:**
- **Write contention:** Multiple agents writing the same fact cause retry storms
- **Central point:** board.yaml is a single source of truth (but versioned)
- **Merge complexity:** CAS retries need backoff logic; writers could stall
- **Staleness window:** Between read and act, the board may change

**Failure modes:**
- **Thundering herd:** Many agents CAS-fail on the same hot fact → exponential backoff needed
- **Missing promotions:** A fact is discovered but never written because the agent crashed pre-write
- **Board corruption:** If CAS is not atomic, two writes could interleave
- **Evidence loss:** If evidence-links point to deleted session transcripts, verification fails

---

## 4. Trade-off Matrix

| Dimension | Event Log | Shared Board |
|---|---|---|
| **Write concurrency** | Unbounded (append-only) | Contention on hot keys |
| **Read latency** | Must scan/synthesize | Direct read (constant-time) |
| **Storage efficiency** | Redundant raw events | Condensed facts |
| **Auditability** | Full reconstruction | Requires fact-history |
| **Coordination speed** | Lag between event → synthesis | Near-real-time (CAS cycle) |
| **Complexity** | Log management + synthesis worker | CAS protocol + retry logic |

**Verdict:** Start with the **Event Log** (simpler, safer, no coordination overhead), then layer the Board on as a *view* over synthesized facts if read latency becomes a bottleneck.

---

## 5. Proposed Experimental Prototype

**Scope:** Minimal viable swarm-memory path for a controlled parallel task.

**Task:** Have 3 concurrent subagents process a set of GitHub issues. Each agent:
1. Reads issue details
2. Searches the codebase for relevant files
3. Drafts a fix
4. **Writes discovery events to the swarm event log**
5. Reads peer discoveries before the next step

**Metrics to collect** (a computation sketch follows this list):
- Duplication rate: how many agents found the same root cause independently?
- Correlation lift: did reading peer discoveries change agent behavior?
- Latency: time from discovery to visibility across the swarm
- Synthesis quality: can an LLM summarize raw events into a coherent fact?
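A sketch of how two of these metrics could be computed from the Design A event log. The event fields follow the `append()` payload above; the `"discovery"` and `"peer_read"` event types are hypothetical instrumentation, not part of the current design.

```python
def duplication_rate(events: list[dict]) -> float:
    """Fraction of discovery events whose content was already reported
    by a *different* agent (same normalized content = same root cause)."""
    first_reporter: dict[str, str] = {}
    duplicates = 0
    discoveries = 0
    for ev in events:
        if ev.get("type") != "discovery":
            continue
        discoveries += 1
        key = " ".join(ev.get("content", "").lower().split())
        if key in first_reporter and first_reporter[key] != ev.get("agent"):
            duplicates += 1
        first_reporter.setdefault(key, ev.get("agent", "?"))
    return duplicates / discoveries if discoveries else 0.0


def visibility_latency(events: list[dict]) -> list[float]:
    """Seconds from a discovery being appended to the first time a peer
    reads it, assuming peers emit a hypothetical 'peer_read' event."""
    discovered_at: dict[str, float] = {}
    latencies: list[float] = []
    for ev in events:
        key = " ".join(ev.get("content", "").lower().split())
        if ev.get("type") == "discovery" and key not in discovered_at:
            discovered_at[key] = ev["timestamp"]
        elif ev.get("type") == "peer_read" and key in discovered_at:
            latencies.append(ev["timestamp"] - discovered_at.pop(key))
    return latencies
```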
**Implementation plan:**
1. `scripts/swarm_event_log.py` — thread-safe JSONL append + tail API
2. `scripts/swarm_synthesizer.py` — periodic batch that consumes the event log, emits distilled facts
3. Patch `hermes-agent` burn worker to emit events at key milestones
4. Simple dashboard: `metrics/swarm_memory_dashboard.md`

**Success criteria:** Prototype runs end-to-end with 3 agents; the event log captures discoveries; the synthesizer produces at least one cross-agent insight.

---

## 6. Failure Modes to Watch

| Mode | Symptom | Mitigation |
|---|---|---|
| Duplication | Same fact appears from 3 agents | Synthesis dedup; evidence-link counts |
| Contradiction | Agent A says "port 8080", Agent B says "port 3000" | Evidence-weighted majority; timestamp priority |
| Stale shared state | Agent reads board, acts, board changed under it | Version vectors; read-modify-write CAS with retry |
| Missing promotion | Discovery lost on agent crash | Event log is durable before action; recovery from last checkpoint |
| Race on hot fact | Two agents try to write the same fact simultaneously | CAS backoff; random jitter |
| Log unbounded | Event log grows 10GB/day | Checkpoint + prune: keep summary + recent window |

---

## 7. Next Steps (Out of Scope for This Note)

- Build the event log implementation (Design A, phase 1)
- Wire hermes-agent to emit events
- Run the 3-agent parallel experiment
- Measure and compare Board vs Log read patterns
- Decide: ship to prod or iterate

---

## 8. References

- Parent: Timmy_Foundation/hermes-agent#984 — [ATLAS] Steal highest-leverage ecosystem patterns
- Related: compounding-intelligence#229 — Telemetry ingestion (Tokscale)
- Related: hermes-agent#985 — Lossless context + memory subsystem (LCM/GBrain)
297 quality_gate.py Normal file
@@ -0,0 +1,297 @@
#!/usr/bin/env python3
"""
quality_gate.py — Score and filter knowledge entries.

Scores each entry on 4 dimensions:
- Specificity: concrete examples vs vague generalities
- Actionability: can this be used to do something?
- Freshness: is this still accurate?
- Source quality: was the model/provider reliable?

Usage:
    from quality_gate import score_entry, filter_entries, quality_report

    score = score_entry(entry)
    filtered = filter_entries(entries, threshold=0.5)
    report = quality_report(entries)
"""

import json
import math
import re
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Any, Optional

# Source quality scores (higher = more reliable)
SOURCE_QUALITY = {
    "claude-sonnet": 0.9,
    "claude-opus": 0.95,
    "gpt-4": 0.85,
    "gpt-4-turbo": 0.85,
    "gpt-5": 0.9,
    "mimo-v2-pro": 0.8,
    "gemini-pro": 0.8,
    "llama-3-70b": 0.75,
    "llama-3-8b": 0.7,
    "ollama": 0.6,
    "unknown": 0.5,
}

DEFAULT_SOURCE_QUALITY = 0.5

# Specificity indicators
SPECIFIC_INDICATORS = [
    r"\b\d+\.\d+",                  # decimal numbers
    r"\b\d{4}-\d{2}-\d{2}",         # dates
    r"\b[A-Z][a-z]+\s[A-Z][a-z]+",  # proper nouns
    r"`[^`]+`",                     # code/commands
    r"https?://",                   # URLs
    r"\b(example|instance|specifically|concretely)\b",
    r"\b(step \d|first|second|third)\b",
    r"\b(exactly|precisely|measured|counted)\b",
]

# Vagueness indicators (penalty)
VAGUE_INDICATORS = [
    r"\b(generally|usually|often|sometimes|might|could|perhaps)\b",
    r"\b(various|several|many|some|few)\b",
    r"\b(it depends|varies|differs)\b",
    r"\b(basically|essentially|fundamentally)\b",
    r"\b(everyone knows|it's obvious|clearly)\b",
]

# Actionability indicators
ACTIONABLE_INDICATORS = [
    r"\b(run|execute|install|deploy|configure|set up)\b",
    r"\b(use|apply|implement|create|build)\b",
    r"\b(check|verify|test|validate|confirm)\b",
    r"\b(fix|resolve|solve|debug|troubleshoot)\b",
    r"\b(if .+ then|when .+ do|to .+ use)\b",
    r"```[a-z]*\n",  # code blocks
    r"\$\s",         # shell commands
    r"\b\d+\.\s",    # numbered steps
]


def score_specificity(content: str) -> float:
    """Score specificity: 0=vague, 1=very specific."""
    content_lower = content.lower()
    score = 0.5  # baseline

    # Check for specific indicators
    specific_count = sum(
        len(re.findall(p, content, re.IGNORECASE))
        for p in SPECIFIC_INDICATORS
    )

    # Check for vague indicators
    vague_count = sum(
        len(re.findall(p, content_lower))
        for p in VAGUE_INDICATORS
    )

    # Adjust score
    score += min(specific_count * 0.05, 0.4)
    score -= min(vague_count * 0.08, 0.3)

    # Length bonus (longer = more detail, up to a point)
    word_count = len(content.split())
    if word_count > 50:
        score += min((word_count - 50) * 0.001, 0.1)

    return max(0.0, min(1.0, score))


def score_actionability(content: str) -> float:
    """Score actionability: 0=abstract, 1=highly actionable."""
    content_lower = content.lower()
    score = 0.3  # baseline (most knowledge is informational)

    # Check for actionable indicators
    actionable_count = sum(
        len(re.findall(p, content_lower))
        for p in ACTIONABLE_INDICATORS
    )

    score += min(actionable_count * 0.1, 0.6)

    # Code blocks are highly actionable
    if "```" in content:
        score += 0.2

    # Numbered steps are actionable
    if re.search(r"\d+\.\s+\w", content):
        score += 0.1

    return max(0.0, min(1.0, score))


def score_freshness(timestamp: Optional[str]) -> float:
    """Score freshness: 1=new, decays over time."""
    if not timestamp:
        return 0.5

    try:
        if isinstance(timestamp, str):
            ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        else:
            ts = timestamp

        now = datetime.now(timezone.utc)
        age_days = (now - ts).days

        # Exponential decay: 1.0 at day 0, 0.5 at ~180 days, 0.1 at ~365 days
        score = math.exp(-age_days / 180)
        return max(0.1, min(1.0, score))
    except (ValueError, TypeError):
        return 0.5


def score_source_quality(model: Optional[str]) -> float:
    """Score source quality based on model/provider."""
    if not model:
        return DEFAULT_SOURCE_QUALITY

    # Normalize model name
    model_lower = model.lower()
    for key, score in SOURCE_QUALITY.items():
        if key in model_lower:
            return score

    return DEFAULT_SOURCE_QUALITY


def score_entry(entry: dict) -> float:
    """
    Score a knowledge entry on quality (0.0-1.0).

    Weights:
    - specificity: 0.3
    - actionability: 0.3
    - freshness: 0.2
    - source_quality: 0.2
    """
    content = entry.get("content", entry.get("text", entry.get("response", "")))
    model = entry.get("model", entry.get("provenance", {}).get("model"))
    timestamp = entry.get("timestamp", entry.get("provenance", {}).get("timestamp"))

    specificity = score_specificity(content)
    actionability = score_actionability(content)
    freshness = score_freshness(timestamp)
    source = score_source_quality(model)

    return round(
        0.3 * specificity +
        0.3 * actionability +
        0.2 * freshness +
        0.2 * source,
        4
    )


def score_entry_detailed(entry: dict) -> dict:
    """Score with breakdown."""
    content = entry.get("content", entry.get("text", entry.get("response", "")))
    model = entry.get("model", entry.get("provenance", {}).get("model"))
    timestamp = entry.get("timestamp", entry.get("provenance", {}).get("timestamp"))

    specificity = score_specificity(content)
    actionability = score_actionability(content)
    freshness = score_freshness(timestamp)
    source = score_source_quality(model)

    return {
        "score": round(0.3 * specificity + 0.3 * actionability + 0.2 * freshness + 0.2 * source, 4),
        "specificity": round(specificity, 4),
        "actionability": round(actionability, 4),
        "freshness": round(freshness, 4),
        "source_quality": round(source, 4),
    }


def filter_entries(entries: List[dict], threshold: float = 0.5) -> List[dict]:
    """Filter entries below quality threshold."""
    filtered = []
    for entry in entries:
        if score_entry(entry) >= threshold:
            filtered.append(entry)
    return filtered


def quality_report(entries: List[dict]) -> str:
    """Generate quality distribution report."""
    if not entries:
        return "No entries to analyze."

    scores = [score_entry(e) for e in entries]

    avg = sum(scores) / len(scores)
    min_score = min(scores)
    max_score = max(scores)

    # Distribution buckets
    buckets = {"high": 0, "medium": 0, "low": 0, "rejected": 0}
    for s in scores:
        if s >= 0.7:
            buckets["high"] += 1
        elif s >= 0.5:
            buckets["medium"] += 1
        elif s >= 0.3:
            buckets["low"] += 1
        else:
            buckets["rejected"] += 1

    lines = [
        "=" * 50,
        " QUALITY GATE REPORT",
        "=" * 50,
        f" Total entries: {len(entries)}",
        f" Average score: {avg:.3f}",
        f" Min: {min_score:.3f}",
        f" Max: {max_score:.3f}",
        "",
        " Distribution:",
    ]

    for bucket, count in buckets.items():
        pct = count / len(entries) * 100
        bar = "█" * int(pct / 5)
        lines.append(f" {bucket:<12} {count:>5} ({pct:>5.1f}%) {bar}")

    passed = buckets["high"] + buckets["medium"]
    lines.append(f"\n Pass rate (>= 0.5): {passed}/{len(entries)} ({passed/len(entries)*100:.1f}%)")
    lines.append("=" * 50)

    return "\n".join(lines)


def main():
    import argparse
    parser = argparse.ArgumentParser(description="Knowledge quality gate")
    parser.add_argument("files", nargs="+", help="JSONL files to score")
    parser.add_argument("--threshold", type=float, default=0.5, help="Quality threshold")
    parser.add_argument("--json", action="store_true", help="JSON output")
    parser.add_argument("--filter", action="store_true", help="Filter and write back")
    args = parser.parse_args()

    all_entries = []
    for filepath in args.files:
        with open(filepath) as f:
            for line in f:
                if line.strip():
                    all_entries.append(json.loads(line))

    if args.json:
        results = [{"entry": e, **score_entry_detailed(e)} for e in all_entries]
        print(json.dumps(results, indent=2))
    elif args.filter:
        filtered = filter_entries(all_entries, args.threshold)
        print(f"Kept {len(filtered)}/{len(all_entries)} entries (threshold: {args.threshold})")
    else:
        print(quality_report(all_entries))


if __name__ == "__main__":
    main()
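A short usage example of the module API above, run against a hand-made entry. The entry fields follow the fallback keys `score_entry` actually reads (`content`/`model`/`timestamp`); the comments describe how the scorers react, and the exact numbers depend on when it runs.

```python
from quality_gate import score_entry, score_entry_detailed

entry = {
    "content": "Run `pip install -r requirements.txt`, then `make test` to verify. "
               "Fixed on 2026-04-13.",
    "model": "claude-sonnet",
    "timestamp": "2026-04-13T20:00:00Z",
}

print(score_entry(entry))           # single weighted score in [0.0, 1.0]
print(score_entry_detailed(entry))  # per-dimension breakdown
# Backticked commands and a date push specificity up; "run"/"install"/"test"
# push actionability up; a recent timestamp keeps freshness near 1.0.
```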
1 requirements.txt Normal file
@@ -0,0 +1 @@
pytest>=8,<9
554 scripts/automation_opportunity_finder.py Normal file
@@ -0,0 +1,554 @@
#!/usr/bin/env python3
"""
Automation Opportunity Finder — Scan fleet for manual processes that could be automated.

Analyzes:
1. Cron jobs — finds manual steps between scheduled tasks
2. Documentation — extracts TODO/FIXME/manual-step patterns
3. Scripts — detects repeated command sequences
4. Session transcripts — finds repeated tool-call patterns

Usage:
    python3 scripts/automation_opportunity_finder.py --hermes-home ~/.hermes
    python3 scripts/automation_opportunity_finder.py --hermes-home ~/.hermes --json
    python3 scripts/automation_opportunity_finder.py --hermes-home ~/.hermes --output proposals.json
"""

import argparse
import json
import os
import re
import sys
from collections import Counter, defaultdict
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional


# ---------------------------------------------------------------------------
# Patterns that signal manual work
# ---------------------------------------------------------------------------

MANUAL_STEP_PATTERNS = [
    # Explicit manual markers
    (r"(?i)\bTODO[:\s]", "todo"),
    (r"(?i)\bFIXME[:\s]", "fixme"),
    (r"(?i)\bMANUAL[:\s]", "manual_step"),
    (r"(?i)\bHACK[:\s]", "hack"),
    (r"(?i)\bWORKAROUND[:\s]", "workaround"),
    # Step-by-step instructions in docs
    (r"(?i)^(\d+)[.\)]\s+(run|execute|ssh|scp|curl|cd|make|docker|ansible|git)", "sequential_step"),
    # Explicitly manual operations
    (r"(?i)\bmanually\b", "manual_keyword"),
    (r"(?i)\bby hand\b", "manual_keyword"),
    (r"(?i)\bdon\'?t forget to\b", "manual_keyword"),
    (r"(?i)\bremember to\b", "manual_keyword"),
    (r"(?i)\bmake sure to\b", "manual_keyword"),
]

# Shell commands that appear frequently in runbooks — signal automatable workflows
SHELL_COMMAND_PATTERNS = [
    "ssh ", "scp ", "rsync ", "curl ", "wget ",
    "docker ", "docker-compose ", "kubectl ",
    "ansible-playbook ", "terraform ", "systemctl ",
    "systemctl restart", "systemctl status",
    "git push", "git pull", "git merge", "git checkout",
    "pip install", "npm install", "cargo build",
]

# Directories to skip during scans — large/uninteresting trees
EXCLUDE_DIRS = frozenset({
    "node_modules", "venv", ".venv", "__pycache__", ".git",
    "site-packages", "dist", "build", ".tox", ".mypy_cache",
    ".pytest_cache", "coverage", ".next", "vendor",
    "skills",  # hermes skills dir is huge
    "audio_cache", "skins", "profiles",
})

# Session tool calls that appear repeatedly — candidates for workflow automation
TOOL_SEQUENCE_MIN_OCCURRENCES = 3


# ---------------------------------------------------------------------------
# Analyzers
# ---------------------------------------------------------------------------

def analyze_cron_jobs(hermes_home: str) -> List[Dict[str, Any]]:
    """Analyze cron job definitions for automation gaps."""
    proposals = []
    cron_dir = Path(hermes_home) / "cron"
    jobs_file = cron_dir / "jobs.json"

    if not jobs_file.exists():
        # Try YAML format
        for ext in (".yaml", ".yml"):
            alt = cron_dir / f"jobs{ext}"
            if alt.exists():
                jobs_file = alt
                break

    if not jobs_file.exists():
        return proposals

    try:
        if jobs_file.suffix == ".json":
            with open(jobs_file) as f:
                jobs = json.load(f)
        else:
            try:
                import yaml
                with open(jobs_file) as f:
                    jobs = yaml.safe_load(f)
            except ImportError:
                return proposals
    except Exception:
        return proposals

    if not isinstance(jobs, list):
        return proposals

    # Look for disabled jobs (someone turned them off — might need a different approach)
    disabled = [j for j in jobs if not j.get("enabled", True)]
    if disabled:
        names = [j.get("name", j.get("id", "?")) for j in disabled[:5]]
        proposals.append({
            "category": "cron_disabled",
            "title": f"{len(disabled)} disabled cron job(s) may need automation rework",
            "description": f"These jobs were disabled: {', '.join(names)}. Investigate why and whether a different automation approach is needed.",
            "confidence": 0.7,
            "impact": "medium",
            "sources": [str(jobs_file)],
        })

    # Look for jobs with high error counts
    error_jobs = [j for j in jobs if j.get("last_status") == "error"]
    if error_jobs:
        names = [j.get("name", j.get("id", "?")) for j in error_jobs[:5]]
        proposals.append({
            "category": "cron_errors",
            "title": f"{len(error_jobs)} cron job(s) failing — may need automation rework",
            "description": f"Jobs with errors: {', '.join(names)}. Failure patterns suggest missing dependencies or fragile automation.",
            "confidence": 0.8,
            "impact": "high",
            "sources": [str(jobs_file)],
        })

    # Look for jobs with delivery errors (platform issues)
    delivery_errors = [j for j in jobs if j.get("last_delivery_error")]
    if delivery_errors:
        proposals.append({
            "category": "cron_delivery",
            "title": f"{len(delivery_errors)} cron job(s) have delivery failures",
            "description": "Delivery failures suggest missing retry logic or platform integration gaps.",
            "confidence": 0.75,
            "impact": "medium",
            "sources": [str(jobs_file)],
        })

    # Look for jobs on short intervals that could be event-driven
    for job in jobs:
        schedule = job.get("schedule", "")
        # Check for very frequent schedules (every minute, every 5 min)
        if isinstance(schedule, str) and re.match(r"^\*\/([1-5])\s", schedule):
            proposals.append({
                "category": "cron_frequency",
                "title": f"Job '{job.get('name', job.get('id', '?'))}' runs every {schedule.split()[0]} — consider event-driven",
                "description": f"High-frequency cron ({schedule}) may be better as event-driven or daemon.",
                "confidence": 0.6,
                "impact": "low",
                "sources": [str(jobs_file)],
            })

    return proposals


def analyze_documents(root_dirs: List[str]) -> List[Dict[str, Any]]:
    """Scan documentation for manual step patterns."""
    proposals = []
    doc_extensions = {".md", ".txt", ".rst", ".adoc"}
    findings_by_category = defaultdict(list)

    for root_dir in root_dirs:
        root = Path(root_dir)
        if not root.exists():
            continue

        for path in root.rglob("*"):
            if path.is_dir():
                continue
            if path.suffix not in doc_extensions:
                continue
            # Skip excluded dirs and hidden dirs
            parts = path.relative_to(root).parts if root in path.parents or root == path.parent else path.parts
            if any(p.startswith(".") or p in EXCLUDE_DIRS for p in parts):
                continue
            if len(parts) > 8:
                continue

            try:
                content = path.read_text(errors="replace")
            except (PermissionError, OSError):
                continue

            lines = content.split("\n")
            for i, line in enumerate(lines):
                stripped = line.strip()
                if not stripped:
                    continue

                for pattern, category in MANUAL_STEP_PATTERNS:
                    if re.search(pattern, stripped):
                        findings_by_category[category].append({
                            "file": str(path),
                            "line": i + 1,
                            "text": stripped[:200],
                        })

    # Generate proposals from findings
    for category, findings in findings_by_category.items():
        if len(findings) < 2:
            continue

        file_count = len(set(f["file"] for f in findings))
        proposals.append({
            "category": f"manual_{category}",
            "title": f"{len(findings)} '{category}' markers across {file_count} doc(s)",
            "description": f"Found in: {', '.join(set(Path(f['file']).name for f in findings[:5]))}",
            "confidence": 0.65,
            "impact": "medium",
            "sources": list(set(f["file"] for f in findings[:10])),
            "details": findings[:5],  # sample
        })

    return proposals


def analyze_scripts(root_dirs: List[str]) -> List[Dict[str, Any]]:
    """Detect repeated command sequences in scripts."""
    proposals = []
    script_extensions = {".py", ".sh", ".bash", ".zsh"}
    command_counter = Counter()
    command_locations = defaultdict(list)

    for root_dir in root_dirs:
        root = Path(root_dir)
        if not root.exists():
            continue

        for path in root.rglob("*"):
            if path.is_dir():
                continue
            if path.suffix not in script_extensions:
                continue
            parts = path.relative_to(root).parts if root in path.parents or root == path.parent else path.parts
            if any(p.startswith(".") or p in EXCLUDE_DIRS for p in parts):
                continue
            if len(parts) > 8:
                continue

            try:
                content = path.read_text(errors="replace")
            except (PermissionError, OSError):
                continue

            lines = content.split("\n")
            for i, line in enumerate(lines):
                stripped = line.strip()
                if not stripped or stripped.startswith("#"):
                    continue

                for cmd_prefix in SHELL_COMMAND_PATTERNS:
                    if cmd_prefix in stripped:
                        # Normalize the command
                        normalized = re.sub(r"\s+", " ", stripped)[:120]
                        command_counter[normalized] += 1
                        command_locations[normalized].append(f"{path}:{i+1}")

    # Proposals for commands appearing 3+ times
    for cmd, count in command_counter.most_common(20):
        if count < 3:
            break
        locs = command_locations[cmd]
        file_count = len(set(loc.split(":")[0] for loc in locs))
        proposals.append({
            "category": "repeated_command",
            "title": f"Command repeated {count}x across {file_count} file(s): {cmd[:80]}",
            "description": f"Locations: {', '.join(locs[:3])}",
            "confidence": min(0.5 + (count * 0.1), 0.95),
            "impact": "medium",
            "sources": list(set(loc.split(":")[0] for loc in locs)),
        })

    return proposals


def analyze_session_transcripts(session_dirs: List[str]) -> List[Dict[str, Any]]:
    """Find repeated tool-call patterns in session transcripts."""
    proposals = []
    tool_sequence_counter = Counter()
    tool_sequence_examples = {}

    for session_dir in session_dirs:
        session_path = Path(session_dir)
        if not session_path.exists():
            continue

        for path in session_path.rglob("*.jsonl"):
            try:
                content = path.read_text(errors="replace")
            except (PermissionError, OSError):
                continue

            # Extract tool calls in sequence
            tool_sequence = []
            for line in content.split("\n"):
                line = line.strip()
                if not line:
                    continue
                try:
                    msg = json.loads(line)
                except json.JSONDecodeError:
                    continue

                # Look for tool calls in assistant messages
                if msg.get("role") == "assistant" and msg.get("tool_calls"):
                    for tc in msg["tool_calls"]:
                        func_name = tc.get("function", {}).get("name", "?")
                        tool_sequence.append(func_name)

            # Find 2-call sequences
            for i in range(len(tool_sequence) - 1):
                seq = (tool_sequence[i], tool_sequence[i + 1])
                tool_sequence_counter[seq] += 1
                if seq not in tool_sequence_examples:
                    tool_sequence_examples[seq] = str(path.name)

            # Find 3-call sequences
            for i in range(len(tool_sequence) - 2):
                seq = (tool_sequence[i], tool_sequence[i + 1], tool_sequence[i + 2])
                tool_sequence_counter[seq] += 1
                if seq not in tool_sequence_examples:
                    tool_sequence_examples[seq] = str(path.name)

    # Generate proposals for frequently repeated sequences
    for seq, count in tool_sequence_counter.most_common(20):
        if count < TOOL_SEQUENCE_MIN_OCCURRENCES:
            break
        seq_str = " -> ".join(seq)
        proposals.append({
            "category": "tool_sequence",
            "title": f"Tool sequence '{seq_str}' repeated {count} times",
            "description": "Consider creating a workflow/skill that automates this sequence.",
            "confidence": min(0.5 + (count * 0.05), 0.9),
            "impact": "medium",
            "sources": [tool_sequence_examples.get(seq, "unknown")],
        })

    return proposals


def analyze_shell_history(root_dirs: List[str]) -> List[Dict[str, Any]]:
    """Find repeated shell commands from history files."""
    proposals = []
    command_counter = Counter()

    for root_dir in root_dirs:
        root = Path(root_dir)
        history_files = []

        # Look for shell history files
        for name in (".bash_history", ".zsh_history", ".python_history"):
            p = root / name
            if p.exists():
                history_files.append(p)

        # Also check in hermes home
        for p in root.glob("**/*history*"):
            if p.is_file() and p.suffix in ("", ".txt", ".log"):
                history_files.append(p)

        for hf in history_files:
            try:
                content = hf.read_text(errors="replace")
            except (PermissionError, OSError):
                continue

            for line in content.split("\n"):
                stripped = line.strip()
                # ZSH history format: ": 1234567890:0;command"
                stripped = re.sub(r"^:\s*\d+:\d+;", "", stripped)
                if not stripped or len(stripped) < 5:
                    continue
                # Skip trivial commands
                if stripped in ("ls", "cd", "pwd", "clear", "exit"):
                    continue
                command_counter[stripped] += 1

    for cmd, count in command_counter.most_common(10):
        if count < 5:
            break
        proposals.append({
            "category": "shell_repetition",
            "title": f"Shell command run {count}+ times: {cmd[:80]}",
            "description": "Frequently repeated shell command — candidate for alias, function, or script.",
            "confidence": min(0.4 + (count * 0.05), 0.85),
            "impact": "low",
            "sources": ["shell_history"],
        })

    return proposals


# ---------------------------------------------------------------------------
# Proposal output
# ---------------------------------------------------------------------------

def deduplicate_proposals(proposals: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove duplicate proposals based on title similarity."""
    seen_titles = set()
    unique = []
    for p in proposals:
        # Normalize title for dedup
        key = re.sub(r"\d+", "N", p["title"]).lower()
        if key not in seen_titles:
            seen_titles.add(key)
            unique.append(p)
    return unique


def rank_proposals(proposals: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort proposals by impact * confidence (highest first)."""
    impact_weight = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}
    return sorted(
        proposals,
        key=lambda p: impact_weight.get(p.get("impact", "low"), 0.2) * p.get("confidence", 0.5),
        reverse=True,
    )


def format_text_report(proposals: List[Dict[str, Any]]) -> str:
    """Format proposals as human-readable text."""
    if not proposals:
        return "No automation opportunities found."

    lines = [
        "=" * 70,
        " AUTOMATION OPPORTUNITY REPORT",
        f" Generated: {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')}",
        f" Proposals: {len(proposals)}",
        "=" * 70,
        "",
    ]

    for i, p in enumerate(proposals, 1):
        score = p.get("confidence", 0.5) * {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}.get(p.get("impact", "low"), 0.2)
        lines.append(f"[{i}] {p['title']}")
        lines.append(f"    Category: {p['category']} | Impact: {p.get('impact','?')} | Confidence: {p.get('confidence',0):.0%} | Score: {score:.2f}")
        lines.append(f"    {p['description']}")
        if p.get("sources"):
            lines.append(f"    Sources: {', '.join(p['sources'][:3])}")
        lines.append("")

    # Summary by category
    cat_counts = Counter(p["category"] for p in proposals)
    lines.append("-" * 70)
    lines.append("Summary by category:")
    for cat, count in cat_counts.most_common():
        lines.append(f"  {cat}: {count}")

    return "\n".join(lines)


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(description="Find automation opportunities across the fleet")
    parser.add_argument("--hermes-home", default=os.path.expanduser("~/.hermes"),
                        help="Path to hermes home directory (default: ~/.hermes)")
    parser.add_argument("--scan-dirs", nargs="*",
                        help="Additional directories to scan (default: hermes-home + cwd)")
    parser.add_argument("--session-dirs", nargs="*",
                        help="Session transcript directories (default: hermes-home/sessions)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", "-o", help="Write proposals to file")
    parser.add_argument("--min-confidence", type=float, default=0.3,
                        help="Minimum confidence threshold (default: 0.3)")
    parser.add_argument("--categories", nargs="*",
                        help="Only include these categories (cron, docs, scripts, sessions, shell)")
    args = parser.parse_args()

    hermes_home = os.path.expanduser(args.hermes_home)

    # Default scan directories
    scan_dirs = [hermes_home, "."]
    if args.scan_dirs:
        scan_dirs.extend(args.scan_dirs)

    session_dirs = [os.path.join(hermes_home, "sessions")]
    if args.session_dirs:
        session_dirs.extend(args.session_dirs)

    # Also check common session locations
    for subdir in ("transcripts", "session-db"):
        p = os.path.join(hermes_home, subdir)
        if os.path.isdir(p):
            session_dirs.append(p)

    categories = set(args.categories) if args.categories else {"cron", "docs", "scripts", "sessions", "shell"}

    # Run analyzers
    all_proposals = []

    if "cron" in categories:
        all_proposals.extend(analyze_cron_jobs(hermes_home))

    if "docs" in categories:
        all_proposals.extend(analyze_documents(scan_dirs))

    if "scripts" in categories:
        all_proposals.extend(analyze_scripts(scan_dirs))

    if "sessions" in categories:
        all_proposals.extend(analyze_session_transcripts(session_dirs))

    if "shell" in categories:
        all_proposals.extend(analyze_shell_history(scan_dirs))

    # Deduplicate and rank
    all_proposals = deduplicate_proposals(all_proposals)
    all_proposals = rank_proposals(all_proposals)

    # Filter by confidence
    all_proposals = [p for p in all_proposals if p.get("confidence", 0) >= args.min_confidence]

    # Output
    output = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "hermes_home": hermes_home,
        "scan_dirs": scan_dirs,
        "total_proposals": len(all_proposals),
        "proposals": all_proposals,
    }

    if args.json:
        result = json.dumps(output, indent=2)
    else:
        result = format_text_report(all_proposals)

    if args.output:
        with open(args.output, "w") as f:
            if args.json:
                json.dump(output, f, indent=2)
            else:
                f.write(result)
        print(f"Written to {args.output}", file=sys.stderr)
    else:
        print(result)

    return 0 if all_proposals else 1


if __name__ == "__main__":
    sys.exit(main())
359 scripts/bootstrapper.py Normal file
@@ -0,0 +1,359 @@
|
||||
#!/usr/bin/env python3
"""
Bootstrapper — assemble pre-session context from knowledge store.

Reads the knowledge store and produces a compact context block (2k tokens max)
that can be injected into a new session so it starts with situational awareness.

Usage:
    python3 bootstrapper.py --repo the-nexus --agent mimo-sprint
    python3 bootstrapper.py --repo timmy-home --global
    python3 bootstrapper.py --global
    python3 bootstrapper.py --repo the-nexus --max-tokens 1000
"""

import argparse
import json
import sys
from pathlib import Path
from typing import Optional

# Resolve knowledge root relative to this script's parent
SCRIPT_DIR = Path(__file__).resolve().parent
REPO_ROOT = SCRIPT_DIR.parent
KNOWLEDGE_DIR = REPO_ROOT / "knowledge"
INDEX_PATH = KNOWLEDGE_DIR / "index.json"

# Approximate token count: ~4 chars per token for English text
CHARS_PER_TOKEN = 4

# Category sort priority (lower = shown first)
CATEGORY_PRIORITY = {
    "pitfall": 0,
    "tool-quirk": 1,
    "pattern": 2,
    "fact": 3,
    "question": 4,
}


def load_index(index_path: Path = INDEX_PATH) -> dict:
    """Load and validate the knowledge index."""
    if not index_path.exists():
        return {"version": 1, "total_facts": 0, "facts": []}

    with open(index_path) as f:
        data = json.load(f)

    if "facts" not in data:
        print("WARNING: index.json missing 'facts' key", file=sys.stderr)
        return {"version": 1, "total_facts": 0, "facts": []}

    return data


def filter_facts(
    facts: list[dict],
    repo: Optional[str] = None,
    agent: Optional[str] = None,
    include_global: bool = True,
) -> list[dict]:
    """Filter facts by repo, agent, and global scope."""
    filtered = []

    for fact in facts:
        fact_repo = fact.get("repo", "global")
        fact_agent = fact.get("agent", "")

        # Match by repo (regardless of agent)
        if repo and fact_repo == repo:
            filtered.append(fact)
            continue

        # Match by exact agent type
        if agent and fact_agent == agent:
            filtered.append(fact)
            continue

        # Include global facts without agent restriction (universal facts)
        if include_global and fact_repo == "global" and not fact_agent:
            filtered.append(fact)

    return filtered


def sort_facts(facts: list[dict]) -> list[dict]:
    """
    Sort facts by: confidence (desc), then category priority, then fact text.
    Most reliable and most dangerous facts come first.
    """

    def sort_key(f):
        confidence = f.get("confidence", 0.5)
        category = f.get("category", "fact")
        cat_priority = CATEGORY_PRIORITY.get(category, 5)
        return (-confidence, cat_priority, f.get("fact", ""))

    return sorted(facts, key=sort_key)


def load_repo_knowledge(repo: str) -> Optional[str]:
    """Load per-repo knowledge markdown if it exists."""
    repo_path = KNOWLEDGE_DIR / "repos" / f"{repo}.md"
    if repo_path.exists():
        return repo_path.read_text().strip()
    return None


def load_agent_knowledge(agent: str) -> Optional[str]:
    """Load per-agent knowledge markdown if it exists."""
    agent_path = KNOWLEDGE_DIR / "agents" / f"{agent}.md"
    if agent_path.exists():
        return agent_path.read_text().strip()
    return None


def load_global_knowledge() -> list[str]:
    """Load all global knowledge markdown files."""
    global_dir = KNOWLEDGE_DIR / "global"
    if not global_dir.exists():
        return []

    chunks = []
    for md_file in sorted(global_dir.glob("*.md")):
        content = md_file.read_text().strip()
        if content:
            chunks.append(content)
    return chunks


def render_facts_section(facts: list[dict], category: str, label: str) -> str:
    """Render a section of facts for a single category."""
    cat_facts = [f for f in facts if f.get("category") == category]
    if not cat_facts:
        return ""

    lines = [f"### {label}\n"]
    for f in cat_facts:
        conf = f.get("confidence", 0.5)
        fact_text = f.get("fact", "")
        repo_tag = f.get("repo", "")
        if repo_tag and repo_tag != "global":
            lines.append(f"- [{conf:.0%}] ({repo_tag}) {fact_text}")
        else:
            lines.append(f"- [{conf:.0%}] {fact_text}")

    return "\n".join(lines) + "\n"


def estimate_tokens(text: str) -> int:
    """Rough token estimate."""
    return len(text) // CHARS_PER_TOKEN


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    """Truncate text to approximately max_tokens, cutting at line boundaries."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text

    # Cut at last newline before the limit
    truncated = text[:max_chars]
    last_newline = truncated.rfind("\n")
    if last_newline > 0:
        truncated = truncated[:last_newline]

    return truncated + "\n\n[... truncated to fit context window ...]"


def build_bootstrap_context(
    repo: Optional[str] = None,
    agent: Optional[str] = None,
    include_global: bool = True,
    max_tokens: int = 2000,
    index_path: Path = INDEX_PATH,
) -> str:
    """
    Build the full bootstrap context block.

    Returns a markdown string suitable for injection into a session prompt.
    """
    index = load_index(index_path)
    facts = index.get("facts", [])

    # Filter
    filtered = filter_facts(facts, repo=repo, agent=agent, include_global=include_global)

    # Sort
    sorted_facts = sort_facts(filtered)

    # Build sections
    sections = ["## What You Know (bootstrapped)\n"]

    # Per-repo markdown knowledge
    if repo:
        repo_md = load_repo_knowledge(repo)
        if repo_md:
            sections.append(f"### Repo Notes: {repo}\n")
            sections.append(repo_md + "\n")

    # Structured facts by category
    if sorted_facts:
        # Group by source
        repo_facts = [f for f in sorted_facts if f.get("repo") == repo] if repo else []
        global_facts = [f for f in sorted_facts if f.get("repo") == "global"]
        agent_facts = [f for f in sorted_facts if f.get("agent") == agent] if agent else []

        if repo_facts:
            sections.append(f"### Repo: {repo}\n")
            for cat, label in [
                ("pitfall", "PITFALLS"),
                ("tool-quirk", "QUIRKS"),
                ("pattern", "PATTERNS"),
                ("fact", "FACTS"),
                ("question", "OPEN QUESTIONS"),
            ]:
                section = render_facts_section(repo_facts, cat, label)
                if section:
                    sections.append(section)

        if global_facts:
            sections.append("### Global\n")
            for cat, label in [
                ("pitfall", "PITFALLS"),
                ("tool-quirk", "QUIRKS"),
                ("pattern", "PATTERNS"),
                ("fact", "FACTS"),
            ]:
                section = render_facts_section(global_facts, cat, label)
                if section:
                    sections.append(section)

        if agent_facts:
            sections.append(f"### Agent Notes ({agent})\n")
            for cat, label in [
                ("pitfall", "PITFALLS"),
                ("tool-quirk", "QUIRKS"),
                ("pattern", "PATTERNS"),
                ("fact", "FACTS"),
            ]:
                section = render_facts_section(agent_facts, cat, label)
                if section:
                    sections.append(section)

    # Per-agent markdown knowledge
    if agent:
        agent_md = load_agent_knowledge(agent)
        if agent_md:
            sections.append(f"### Agent Profile: {agent}\n")
            sections.append(agent_md + "\n")

    # Global markdown knowledge
    global_chunks = load_global_knowledge()
    if global_chunks:
        sections.append("### Global Notes\n")
        sections.extend(chunk + "\n" for chunk in global_chunks)

    # If nothing was found
    if len(sections) == 1:
        sections.append("_No relevant knowledge found. Starting fresh._\n")
        if not facts:
            sections.append(
                "_Knowledge store is empty. Run the harvester to populate it._\n"
            )

    # Join and truncate
    context = "\n".join(sections)
    context = truncate_to_tokens(context, max_tokens)

    return context


def main():
    parser = argparse.ArgumentParser(
        description="Assemble pre-session context from knowledge store"
    )
    parser.add_argument(
        "--repo",
        type=str,
        default=None,
        help="Repository name to filter facts by",
    )
    parser.add_argument(
        "--agent",
        type=str,
        default=None,
        help="Agent type to filter facts by (e.g., mimo-sprint, groq-fast)",
    )
    parser.add_argument(
        "--global",
        dest="include_global",
        action="store_true",
        default=True,
        help="Include global facts (default: true)",
    )
    parser.add_argument(
        "--no-global",
        dest="include_global",
        action="store_false",
        help="Exclude global facts",
    )
    parser.add_argument(
        "--max-tokens",
        type=int,
        default=2000,
        help="Maximum token count for output (default: 2000)",
    )
    parser.add_argument(
        "--index",
        type=str,
        default=None,
        help="Path to index.json (default: knowledge/index.json)",
    )
    parser.add_argument(
        "--json",
        dest="output_json",
        action="store_true",
        help="Output raw JSON instead of markdown",
    )

    args = parser.parse_args()

    index_path = Path(args.index) if args.index else INDEX_PATH

    if args.output_json:
        # JSON mode: return the filtered, sorted facts
        index = load_index(index_path)
        facts = index.get("facts", [])
        filtered = filter_facts(
            facts,
            repo=args.repo,
            agent=args.agent,
            include_global=args.include_global,
        )
        sorted_facts = sort_facts(filtered)
        output = {
            "repo": args.repo,
            "agent": args.agent,
            "include_global": args.include_global,
            "total_indexed": len(facts),
            "matched": len(sorted_facts),
            "facts": sorted_facts,
        }
        print(json.dumps(output, indent=2))
    else:
        # Markdown mode: full bootstrap context
        context = build_bootstrap_context(
            repo=args.repo,
            agent=args.agent,
            include_global=args.include_global,
            max_tokens=args.max_tokens,
            index_path=index_path,
        )
        print(context)

    return 0


if __name__ == "__main__":
    sys.exit(main())
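A minimal sketch of driving the bootstrapper as a library rather than a CLI; the `scripts.bootstrapper` import path is an assumption about how the repo sits on `sys.path`:

```python
# Sketch: build a prompt prefix from the knowledge store before a session starts.
from scripts.bootstrapper import build_bootstrap_context, estimate_tokens

context = build_bootstrap_context(repo="the-nexus", agent="mimo-sprint", max_tokens=1500)
print(f"~{estimate_tokens(context)} tokens of prior knowledge")

prompt = context + "\n\n## Task\nInvestigate the failing CI job."
```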
282
scripts/dead_code_detector.py
Normal file
@@ -0,0 +1,282 @@
#!/usr/bin/env python3
"""
Dead Code Detector for Python Codebases

AST-based analysis to find defined but never-called functions and classes.
Excludes entry points, plugin hooks, __init__ exports.

Usage:
    python3 scripts/dead_code_detector.py /path/to/repo/
    python3 scripts/dead_code_detector.py hermes-agent/ --format json
    python3 scripts/dead_code_detector.py . --exclude tests/,venv/

Output: file:line, function/class name, last git author (if available)
"""

import argparse
import ast
import json
import os
import subprocess
import sys
from collections import defaultdict
from pathlib import Path
from typing import Optional


# Names that are expected to be unused (entry points, protocol methods, etc.)
SAFE_UNUSED_PATTERNS = {
    # Python dunders
    "__init__", "__str__", "__repr__", "__eq__", "__hash__", "__len__",
    "__getitem__", "__setitem__", "__contains__", "__iter__", "__next__",
    "__enter__", "__exit__", "__call__", "__bool__", "__del__",
    "__post_init__", "__class_getitem__",
    # Common entry points
    "main", "app", "handler", "setup", "teardown", "fixture",
    # pytest
    "conftest", "test_", "pytest_",  # prefix patterns
    # Protocols / abstract
    "abstractmethod", "abc_",
}


def is_safe_unused(name: str, filepath: str) -> bool:
    """Check if an unused name is expected to be unused."""
    # Test files are exempt
    if "test" in filepath.lower():
        return True

    # Known patterns
    for pattern in SAFE_UNUSED_PATTERNS:
        if name.startswith(pattern) or name == pattern:
            return True

    # __init__.py exports are often unused internally
    if filepath.endswith("__init__.py"):
        return True

    return False


def get_git_blame(filepath: str, lineno: int) -> Optional[str]:
    """Get last author of a line via git blame."""
    try:
        result = subprocess.run(
            ["git", "blame", "-L", f"{lineno},{lineno}", "--porcelain", filepath],
            capture_output=True, text=True, timeout=5
        )
        for line in result.stdout.split("\n"):
            if line.startswith("author "):
                return line[7:]
    except Exception:
        pass
    return None


class DefinitionCollector(ast.NodeVisitor):
    """Collect all function and class definitions."""

    def __init__(self):
        self.definitions = []  # (name, type, lineno)

    def visit_FunctionDef(self, node):
        self.definitions.append((node.name, "function", node.lineno))
        self.generic_visit(node)

    def visit_AsyncFunctionDef(self, node):
        self.definitions.append((node.name, "async_function", node.lineno))
        self.generic_visit(node)

    def visit_ClassDef(self, node):
        self.definitions.append((node.name, "class", node.lineno))
        self.generic_visit(node)


class NameUsageCollector(ast.NodeVisitor):
    """Collect all name references (calls, imports, attribute access)."""

    def __init__(self):
        self.names = set()
        self.calls = set()
        self.imports = set()

    def visit_Name(self, node):
        self.names.add(node.id)
        self.generic_visit(node)

    def visit_Attribute(self, node):
        if isinstance(node.value, ast.Name):
            self.names.add(node.value.id)
        self.generic_visit(node)

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):
            self.calls.add(node.func.id)
        elif isinstance(node.func, ast.Attribute):
            if isinstance(node.func.value, ast.Name):
                self.names.add(node.func.value.id)
            self.calls.add(node.func.attr)
        self.generic_visit(node)

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.add(alias.asname or alias.name)
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        for alias in node.names:
            self.imports.add(alias.asname or alias.name)
        self.generic_visit(node)


def analyze_file(filepath: str) -> dict:
    """Analyze a single Python file for dead code."""
    path = Path(filepath)
    try:
        content = path.read_text()
        tree = ast.parse(content, filename=str(filepath))
    except (SyntaxError, UnicodeDecodeError):
        return {"error": f"Could not parse {filepath}"}

    # Collect definitions
    def_collector = DefinitionCollector()
    def_collector.visit(tree)
    definitions = def_collector.definitions

    # Collect usage
    usage_collector = NameUsageCollector()
    usage_collector.visit(tree)
    used_names = usage_collector.names | usage_collector.calls | usage_collector.imports

    # Cross-file references to this file's definitions are checked at the
    # repo level (see scan_repo), not here.
    dead = []
    for name, def_type, lineno in definitions:
        if name not in used_names:
            if not is_safe_unused(name, filepath):
                dead.append({
                    "name": name,
                    "type": def_type,
                    "file": filepath,
                    "line": lineno,
                })

    return {"definitions": len(definitions), "dead": dead}


def scan_repo(repo_path: str, exclude_patterns: list = None) -> dict:
    """Scan an entire repo for dead code."""
    path = Path(repo_path)
    exclude = exclude_patterns or ["venv", ".venv", "node_modules", "__pycache__",
                                   ".git", "dist", "build", ".tox", "vendor"]

    all_definitions = {}  # name -> [{file, line, type}]
    all_files = []
    dead_code = []

    # First pass: collect all definitions across repo
    for fpath in path.rglob("*.py"):
        parts = fpath.parts
        if any(ex in parts for ex in exclude):
            continue
        if fpath.name.startswith("."):
            continue

        try:
            content = fpath.read_text(errors="ignore")
            tree = ast.parse(content, filename=str(fpath))
        except Exception:
            continue

        all_files.append(str(fpath))
        collector = DefinitionCollector()
        collector.visit(tree)

        for name, def_type, lineno in collector.definitions:
            rel_path = str(fpath.relative_to(path))
            if name not in all_definitions:
                all_definitions[name] = []
            all_definitions[name].append({
                "file": rel_path,
                "line": lineno,
                "type": def_type,
            })

    # Second pass: check each name for usage across entire repo
    all_used_names = set()
    for fpath_str in all_files:
        try:
            content = Path(fpath_str).read_text(errors="ignore")
            tree = ast.parse(content)
        except Exception:
            continue
        usage = NameUsageCollector()
        usage.visit(tree)
        all_used_names.update(usage.names)
        all_used_names.update(usage.calls)
        all_used_names.update(usage.imports)

    # Find dead code
    for name, locations in all_definitions.items():
        if name not in all_used_names:
            for loc in locations:
                if not is_safe_unused(name, loc["file"]):
                    dead_code.append({
                        "name": name,
                        "type": loc["type"],
                        "file": loc["file"],
                        "line": loc["line"],
                    })

    return {
        "repo": path.name,
        "files_scanned": len(all_files),
        "total_definitions": sum(len(v) for v in all_definitions.values()),
        "dead_code_count": len(dead_code),
        "dead_code": sorted(dead_code, key=lambda x: (x["file"], x["line"])),
    }


def main():
    parser = argparse.ArgumentParser(description="Find dead code in Python codebases")
    parser.add_argument("repo", help="Repository path to scan")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--exclude", help="Comma-separated patterns to exclude")
    parser.add_argument("--git-blame", action="store_true", help="Include git blame info")
    args = parser.parse_args()

    exclude = args.exclude.split(",") if args.exclude else None
    result = scan_repo(args.repo, exclude)

    if args.format == "json":
        print(json.dumps(result, indent=2))
    else:
        print(f"Dead Code Report: {result['repo']}")
        print(f"Files scanned: {result['files_scanned']}")
        print(f"Total definitions: {result['total_definitions']}")
        print(f"Dead code found: {result['dead_code_count']}")
        print()

        if result["dead_code"]:
            print(f"{'File':<45} {'Line':>4} {'Type':<10} {'Name'}")
            print("-" * 85)
            for item in result["dead_code"]:
                author = ""
                if args.git_blame:
                    author = get_git_blame(
                        os.path.join(args.repo, item["file"]),
                        item["line"]
                    ) or ""
                author = f" ({author})" if author else ""
                print(f"{item['file']:<45} {item['line']:>4} {item['type']:<10} {item['name']}{author}")
        else:
            print("No dead code detected!")


if __name__ == "__main__":
    main()
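Note that `scan_repo` deliberately works in two passes: definitions and usages are collected repo-wide before anything is flagged, so a helper called only from another file is not a false positive. A throwaway-repo sketch, assuming the module is importable as `scripts.dead_code_detector`:

```python
# Sketch: cross-file usage keeps "used" alive; only "orphan" is flagged.
import pathlib
import tempfile

from scripts.dead_code_detector import scan_repo

repo = pathlib.Path(tempfile.mkdtemp())
(repo / "a.py").write_text("def used():\n    pass\n\ndef orphan():\n    pass\n")
(repo / "b.py").write_text("from a import used\nused()\n")

result = scan_repo(str(repo))
print([d["name"] for d in result["dead_code"]])  # expected: ['orphan']
```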
317
scripts/dedup.py
Normal file
@@ -0,0 +1,317 @@
#!/usr/bin/env python3
"""
dedup.py — Knowledge deduplication: content hash + semantic similarity.

Deduplicates harvested knowledge entries to avoid training on duplicates.
Uses content hashing for exact matches and token overlap for near-duplicates.

Usage:
    python3 dedup.py --input knowledge/index.json --output knowledge/index_deduped.json
    python3 dedup.py --input knowledge/index.json --dry-run
    python3 dedup.py --test   # Run built-in dedup test
"""

import argparse
import hashlib
import json
import re
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Dict, Optional, Tuple


def normalize_text(text: str) -> str:
    """Normalize text for hashing: lowercase, collapse whitespace, strip."""
    text = text.lower().strip()
    text = re.sub(r'\s+', ' ', text)
    return text


def content_hash(text: str) -> str:
    """SHA256 hash of normalized text for exact dedup."""
    normalized = normalize_text(text)
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()


def tokenize(text: str) -> set:
    """Simple tokenizer: lowercase words, 3+ chars."""
    words = re.findall(r'[a-z0-9_]{3,}', text.lower())
    return set(words)


def token_similarity(a: str, b: str) -> float:
    """Token-based Jaccard similarity (0.0-1.0).

    Fast local alternative to embedding similarity.
    Good enough for near-duplicate detection.
    """
    tokens_a = tokenize(a)
    tokens_b = tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    intersection = tokens_a & tokens_b
    union = tokens_a | tokens_b
    return len(intersection) / len(union)


def quality_score(fact: dict) -> float:
    """Compute quality score for merge ranking.

    Higher is better. Factors:
    - confidence (0-1)
    - source_count (more confirmations = better)
    - has tags (richer metadata)
    """
    confidence = fact.get('confidence', 0.5)
    source_count = fact.get('source_count', 1)
    has_tags = 1.0 if fact.get('tags') else 0.0
    has_related = 1.0 if fact.get('related') else 0.0

    # Weighted composite
    score = (
        confidence * 0.5 +
        min(source_count / 10, 1.0) * 0.3 +
        has_tags * 0.1 +
        has_related * 0.1
    )
    return round(score, 4)


def merge_facts(keep: dict, drop: dict) -> dict:
    """Merge two near-duplicate facts, keeping higher-quality fields.

    The 'keep' fact is enriched with metadata from 'drop'.
    """
    # Merge tags (union)
    keep_tags = set(keep.get('tags', []))
    drop_tags = set(drop.get('tags', []))
    keep['tags'] = sorted(keep_tags | drop_tags)

    # Merge related (union)
    keep_related = set(keep.get('related', []))
    drop_related = set(drop.get('related', []))
    keep['related'] = sorted(keep_related | drop_related)

    # Update source_count (sum)
    keep['source_count'] = keep.get('source_count', 1) + drop.get('source_count', 1)

    # Update confidence (max — we've now seen it from multiple sources)
    keep['confidence'] = max(keep.get('confidence', 0), drop.get('confidence', 0))

    # Track that we merged
    if '_merged_from' not in keep:
        keep['_merged_from'] = []
    keep['_merged_from'].append(drop.get('id', 'unknown'))

    return keep


def dedup_facts(
    facts: List[dict],
    exact_threshold: float = 1.0,
    near_threshold: float = 0.95,
    dry_run: bool = False,
) -> Tuple[List[dict], dict]:
    """Deduplicate a list of knowledge facts.

    Args:
        facts: List of fact dicts (from index.json)
        exact_threshold: Hash match = exact duplicate
        near_threshold: Token similarity above this = near-duplicate
        dry_run: If True, don't merge metadata, just report

    Returns:
        (deduped_facts, stats_dict)
    """
    if not facts:
        return [], {"total": 0, "exact_dupes": 0, "near_dupes": 0, "unique": 0}

    # Phase 1: Exact dedup by content hash
    hash_seen = {}  # hash -> index in deduped list
    exact_dupes = 0
    deduped = []

    for fact in facts:
        text = fact.get('fact', '')
        h = content_hash(text)

        if h in hash_seen:
            # Exact duplicate — merge metadata into existing
            existing_idx = hash_seen[h]
            if not dry_run:
                deduped[existing_idx] = merge_facts(deduped[existing_idx], fact)
            exact_dupes += 1
        else:
            hash_seen[h] = len(deduped)
            deduped.append(fact)

    # Phase 2: Near-dup by token similarity
    near_dupes = 0
    i = 0
    while i < len(deduped):
        j = i + 1
        popped_i = False
        while j < len(deduped):
            sim = token_similarity(deduped[i].get('fact', ''), deduped[j].get('fact', ''))
            if sim >= near_threshold:
                near_dupes += 1
                # Near-duplicate — keep higher quality
                q_i = quality_score(deduped[i])
                q_j = quality_score(deduped[j])
                if q_i >= q_j:
                    if not dry_run:
                        deduped[i] = merge_facts(deduped[i], deduped[j])
                    deduped.pop(j)
                else:
                    # j is higher quality — merge i into j, then remove i
                    if not dry_run:
                        deduped[j] = merge_facts(deduped[j], deduped[i])
                    deduped.pop(i)
                    popped_i = True
                    break  # element at i changed; re-scan this slot
            else:
                j += 1
        if not popped_i:
            i += 1

    stats = {
        "total": len(facts),
        "exact_dupes": exact_dupes,
        "near_dupes": near_dupes,
        "unique": len(deduped),
        "removed": len(facts) - len(deduped),
    }

    return deduped, stats


def dedup_index_file(
    input_path: str,
    output_path: Optional[str] = None,
    near_threshold: float = 0.95,
    dry_run: bool = False,
) -> dict:
    """Deduplicate an index.json file.

    Args:
        input_path: Path to index.json
        output_path: Where to write deduped file (default: overwrite input)
        near_threshold: Token similarity threshold for near-dupes
        dry_run: Report only, don't write

    Returns stats dict.
    """
    path = Path(input_path)
    if not path.exists():
        raise FileNotFoundError(f"Index file not found: {input_path}")

    with open(path) as f:
        data = json.load(f)

    facts = data.get('facts', [])
    deduped, stats = dedup_facts(facts, near_threshold=near_threshold, dry_run=dry_run)

    if not dry_run:
        data['facts'] = deduped
        data['total_facts'] = len(deduped)
        data['last_dedup'] = datetime.now(timezone.utc).isoformat()

        out_path = Path(output_path) if output_path else path
        with open(out_path, 'w') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)

    return stats


def generate_test_duplicates(n: int = 20) -> List[dict]:
    """Generate test facts with intentional duplicates for testing.

    Creates n unique facts plus n/4 exact dupes and n/4 near-dupes.
    """
    import random
    random.seed(42)

    unique_facts = []
    for i in range(n):
        topic = random.choice(["git", "python", "docker", "rust", "nginx"])
        tip = random.choice(["use verbose flags", "check logs first", "restart service", "clear cache", "update config"])
        unique_facts.append({
            "id": f"test:fact:{i:03d}",
            "fact": f"When working with {topic}, always {tip} before deploying.",
            "category": "fact",
            "domain": "test",
            "confidence": round(random.uniform(0.5, 1.0), 2),
            "source_count": random.randint(1, 5),
            "tags": [topic, "test"],
        })

    # Add exact duplicates (same text, different IDs)
    duped = list(unique_facts)
    for i in range(n // 4):
        original = unique_facts[i]
        dupe = dict(original)
        dupe["id"] = f"test:fact:dup{i:03d}"
        dupe["confidence"] = round(random.uniform(0.3, 0.8), 2)
        duped.append(dupe)

    # Add near-duplicates (slightly different phrasing)
    for i in range(n // 4):
        original = unique_facts[i]
        near = dict(original)
        near["id"] = f"test:fact:near{i:03d}"
        near["fact"] = original["fact"].replace("always", "should").replace("before deploying", "prior to deployment")
        near["confidence"] = round(random.uniform(0.4, 0.9), 2)
        duped.append(near)

    return duped


def main():
    parser = argparse.ArgumentParser(description="Knowledge deduplication")
    parser.add_argument("--input", help="Path to index.json")
    parser.add_argument("--output", help="Output path (default: overwrite input)")
    parser.add_argument("--threshold", type=float, default=0.95,
                        help="Near-dup similarity threshold (default: 0.95)")
    parser.add_argument("--dry-run", action="store_true", help="Report only, don't write")
    parser.add_argument("--test", action="store_true", help="Run built-in dedup test")
    parser.add_argument("--json", action="store_true", help="JSON output")
    args = parser.parse_args()

    if args.test:
        test_facts = generate_test_duplicates(20)
        print(f"Generated {len(test_facts)} test facts (20 unique + dupes)")
        deduped, stats = dedup_facts(test_facts, near_threshold=args.threshold)
        print("\nDedup results:")
        print(f"  Total input:   {stats['total']}")
        print(f"  Exact dupes:   {stats['exact_dupes']}")
        print(f"  Near dupes:    {stats['near_dupes']}")
        print(f"  Unique output: {stats['unique']}")
        print(f"  Removed:       {stats['removed']}")

        # Verify: should end up with ~20 unique (some merged)
        assert stats['unique'] <= 20, f"Too many unique: {stats['unique']} > 20"
        assert stats['unique'] >= 15, f"Too few unique: {stats['unique']} < 15"
        assert stats['removed'] > 0, "No duplicates removed"
        print("\nOK: Dedup test passed")
        return

    if not args.input:
        print("ERROR: Provide --input or --test")
        sys.exit(1)

    stats = dedup_index_file(args.input, args.output, args.threshold, args.dry_run)

    if args.json:
        print(json.dumps(stats, indent=2))
    else:
        print("Dedup results:")
        print(f"  Total input:   {stats['total']}")
        print(f"  Exact dupes:   {stats['exact_dupes']}")
        print(f"  Near dupes:    {stats['near_dupes']}")
        print(f"  Unique output: {stats['unique']}")
        print(f"  Removed:       {stats['removed']}")
        if args.dry_run:
            print("  (dry run — no changes written)")


if __name__ == "__main__":
    main()
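The default near-dup threshold of 0.95 is strict on purpose: only rephrasings that share essentially all of their 3+ character tokens get merged. A small sketch of both phases (the `scripts.dedup` import path is assumed):

```python
from scripts.dedup import dedup_facts, token_similarity

a = {"id": "1", "fact": "Gitea returns HTTP 405 when branch protection blocks a push.", "confidence": 0.9}
b = {"id": "2", "fact": "Gitea returns HTTP 405 when branch protection blocks a push.", "confidence": 0.6}
c = {"id": "3", "fact": "gitea returns http 405 when Branch Protection blocks a push", "confidence": 0.7}

# a/b share a content hash after normalization (phase 1); c differs only by
# casing and a trailing period, so its token overlap with a is 1.0 (phase 2).
print(token_similarity(a["fact"], c["fact"]))  # 1.0
deduped, stats = dedup_facts([a, b, c])
print(stats)  # total=3, exact_dupes=1, near_dupes=1, unique=1
```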
249
scripts/dependency_graph.py
Normal file
@@ -0,0 +1,249 @@
#!/usr/bin/env python3
"""
Cross-Repo Dependency Graph Builder

Scans repos for import/require/reference patterns and builds a directed
dependency graph. Detects circular dependencies. Outputs DOT and Mermaid.

Usage:
    python3 scripts/dependency_graph.py /path/to/repos/
    python3 scripts/dependency_graph.py --repos repo1,repo2,repo3 --format mermaid
    python3 scripts/dependency_graph.py /path/to/repos/ --format dot --output deps.dot

Patterns detected:
    - Python: import X, from X import Y
    - JavaScript: require("X"), import ... from "X"
    - Go: import "X"
    - Ansible: include_role, import_role
    - Docker/Compose: image: X, depends_on
    - Config references: repo-name in YAML/TOML/JSON
"""

import argparse
import json
import os
import re
import sys
from collections import defaultdict
from pathlib import Path


# Known repo names for matching
KNOWN_REPOS = [
    "hermes-agent", "timmy-config", "timmy-home", "the-nexus", "the-door",
    "the-beacon", "fleet-ops", "burn-fleet", "timmy-dispatch", "turboquant",
    "compounding-intelligence", "the-playground", "second-son-of-timmy",
    "ai-safety-review", "the-echo-pattern", "timmy-academy", "wolf",
    "the-testament",
]


def normalize_repo_name(name: str) -> str:
    """Normalize a repo name for comparison."""
    return name.lower().replace("_", "-").replace(".git", "").strip()


def scan_file_for_deps(filepath: str, content: str, own_repo: str) -> set:
    """Scan a file's content for references to other repos."""
    deps = set()
    own_norm = normalize_repo_name(own_repo)

    for repo in KNOWN_REPOS:
        repo_norm = normalize_repo_name(repo)
        if repo_norm == own_norm:
            continue

        # Direct name references
        patterns = [
            repo,                         # exact name
            repo.replace("-", "_"),       # underscore variant
            repo.replace("-", ""),        # no separator
            f"/{repo}/",                  # path reference
            f'"{repo}"',                  # quoted
            f"'{repo}'",                  # single quoted
            f"Timmy_Foundation/{repo}",   # full Gitea path
            f"Timmy_Foundation.{repo}",   # Python module path
        ]

        for pattern in patterns:
            if pattern in content:
                deps.add(repo)
                break

    return deps


def scan_repo(repo_path: str, repo_name: str = None) -> dict:
    """Scan a repo directory for dependencies."""
    path = Path(repo_path)
    if not path.is_dir():
        return {"error": f"Not a directory: {repo_path}"}

    if not repo_name:
        repo_name = path.name

    deps = set()
    files_scanned = 0
    exts = {".py", ".js", ".ts", ".go", ".yaml", ".yml", ".toml", ".json",
            ".md", ".sh", ".bash", ".Dockerfile", ".tf", ".hcl"}

    for fpath in path.rglob("*"):
        if not fpath.is_file():
            continue
        if fpath.suffix not in exts:
            continue
        # Skip common non-source dirs
        parts = fpath.parts
        if any(p in (".git", "node_modules", "__pycache__", ".venv", "venv",
                     "vendor", "dist", "build", ".tox") for p in parts):
            continue

        try:
            content = fpath.read_text(errors="ignore")
        except Exception:
            continue

        file_deps = scan_file_for_deps(str(fpath), content, repo_name)
        deps.update(file_deps)
        files_scanned += 1

    return {
        "repo": repo_name,
        "dependencies": sorted(deps),
        "files_scanned": files_scanned,
    }


def detect_cycles(graph: dict) -> list:
    """Detect circular dependencies using DFS."""
    cycles = []
    visited = set()
    rec_stack = set()

    def dfs(node, path):
        visited.add(node)
        rec_stack.add(node)

        for neighbor in graph.get(node, {}).get("dependencies", []):
            if neighbor not in visited:
                result = dfs(neighbor, path + [neighbor])
                if result:
                    return result
            elif neighbor in rec_stack:
                cycle_start = path.index(neighbor)
                return path[cycle_start:] + [neighbor]

        rec_stack.remove(node)
        return None

    for node in graph:
        if node not in visited:
            cycle = dfs(node, [node])
            if cycle:
                cycles.append(cycle)

    return cycles


def to_dot(graph: dict) -> str:
    """Generate DOT format output."""
    lines = ["digraph dependencies {"]
    lines.append("  rankdir=LR;")
    lines.append('  node [shape=box, style=filled, fillcolor="#1a1a2e", fontcolor="#e6edf3"];')
    lines.append('  edge [color="#4a4a6a"];')
    lines.append("")

    for repo, data in sorted(graph.items()):
        dep_count = len(data.get("dependencies", []))
        fill = "#2d1b69" if dep_count > 2 else "#16213e"
        lines.append(f'  "{repo}" [fillcolor="{fill}"];')
        for dep in data.get("dependencies", []):
            lines.append(f'  "{repo}" -> "{dep}";')

    lines.append("}")
    return "\n".join(lines)


def to_mermaid(graph: dict) -> str:
    """Generate Mermaid format output."""
    lines = ["graph LR"]

    for repo, data in sorted(graph.items()):
        for dep in data.get("dependencies", []):
            lines.append(f"  {repo.replace('-', '_')} --> {dep.replace('-', '_')}")

    # Add node labels
    lines.append("")
    for repo in sorted(graph.keys()):
        lines.append(f"  {repo.replace('-', '_')}[{repo}]")

    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(description="Build cross-repo dependency graph")
    parser.add_argument("repos_dir", nargs="?", help="Directory containing repos")
    parser.add_argument("--repos", help="Comma-separated list of repo paths")
    parser.add_argument("--format", choices=["dot", "mermaid", "json"], default="json")
    parser.add_argument("--output", "-o", help="Output file (default: stdout)")
    parser.add_argument("--cycles-only", action="store_true", help="Only report cycles")
    args = parser.parse_args()

    results = {}
    repo_paths = []

    if args.repos:
        repo_paths = [p.strip() for p in args.repos.split(",")]
    elif args.repos_dir:
        base = Path(args.repos_dir)
        repo_paths = [str(p) for p in base.iterdir() if p.is_dir() and not p.name.startswith(".")]
    else:
        parser.print_help()
        sys.exit(1)

    for rpath in repo_paths:
        name = Path(rpath).name
        print(f"Scanning {name}...", file=sys.stderr)
        result = scan_repo(rpath, name)
        if "error" not in result:
            results[name] = result

    # Detect cycles
    cycles = detect_cycles(results)

    if args.cycles_only:
        if cycles:
            print("CIRCULAR DEPENDENCIES DETECTED:")
            for cycle in cycles:
                print(f"  {' -> '.join(cycle)}")
            sys.exit(1)
        else:
            print("No circular dependencies found.")
            sys.exit(0)

    # Output
    if args.format == "dot":
        output = to_dot(results)
    elif args.format == "mermaid":
        output = to_mermaid(results)
    else:
        output = json.dumps({
            "repos": results,
            "cycles": cycles,
            "summary": {
                "total_repos": len(results),
                "total_deps": sum(len(r["dependencies"]) for r in results.values()),
                "cycles_found": len(cycles),
            }
        }, indent=2)

    if args.output:
        Path(args.output).write_text(output)
        print(f"Written to {args.output}", file=sys.stderr)
    else:
        print(output)


if __name__ == "__main__":
    main()
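`detect_cycles` takes the same dict shape that `scan_repo` returns, so it can be exercised on a toy graph without touching disk (the `scripts.dependency_graph` import path is assumed):

```python
from scripts.dependency_graph import detect_cycles, to_mermaid

graph = {
    "the-nexus": {"dependencies": ["fleet-ops"]},
    "fleet-ops": {"dependencies": ["the-nexus"]},  # deliberate cycle
    "timmy-home": {"dependencies": []},
}
print(detect_cycles(graph))  # [['the-nexus', 'fleet-ops', 'the-nexus']]
print(to_mermaid(graph))
```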
216
scripts/diff_analyzer.py
Normal file
@@ -0,0 +1,216 @@
#!/usr/bin/env python3
"""
Diff Analyzer — Parse unified diffs and categorize every change.

Pipeline 6.1 for Compounding Intelligence.
"""

import re
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import List, Dict, Any, Optional


class ChangeCategory(Enum):
    ADDED = "added"
    DELETED = "deleted"
    MODIFIED = "modified"
    MOVED = "moved"
    CONTEXT = "context"


@dataclass
class Hunk:
    """A single diff hunk with header, line ranges, and category."""
    header: str
    old_start: int
    old_count: int
    new_start: int
    new_count: int
    lines: List[str] = field(default_factory=list)
    category: ChangeCategory = ChangeCategory.CONTEXT

    def to_dict(self) -> Dict[str, Any]:
        d = asdict(self)
        d["category"] = self.category.value
        return d


@dataclass
class FileChange:
    """A single file's changes."""
    path: str
    old_path: Optional[str] = None  # For renames
    hunks: List[Hunk] = field(default_factory=list)
    added_lines: int = 0
    deleted_lines: int = 0
    is_new: bool = False
    is_deleted: bool = False
    is_renamed: bool = False
    is_binary: bool = False

    def to_dict(self) -> Dict[str, Any]:
        return {
            "path": self.path,
            "old_path": self.old_path,
            "hunks": [h.to_dict() for h in self.hunks],
            "added_lines": self.added_lines,
            "deleted_lines": self.deleted_lines,
            "is_new": self.is_new,
            "is_deleted": self.is_deleted,
            "is_renamed": self.is_renamed,
            "is_binary": self.is_binary,
        }


@dataclass
class ChangeSummary:
    """Aggregate stats + per-file breakdown."""
    files: List[FileChange] = field(default_factory=list)
    total_added: int = 0
    total_deleted: int = 0
    total_files_changed: int = 0
    total_hunks: int = 0
    new_files: int = 0
    deleted_files: int = 0
    renamed_files: int = 0
    binary_files: int = 0

    def to_dict(self) -> Dict[str, Any]:
        return {
            "total_files_changed": self.total_files_changed,
            "total_added": self.total_added,
            "total_deleted": self.total_deleted,
            "total_hunks": self.total_hunks,
            "new_files": self.new_files,
            "deleted_files": self.deleted_files,
            "renamed_files": self.renamed_files,
            "binary_files": self.binary_files,
            "files": [f.to_dict() for f in self.files],
        }


class DiffAnalyzer:
    """Parses unified diff format and produces a structured ChangeSummary."""

    HUNK_HEADER_RE = re.compile(r"^@@\s+-(\d+)(?:,(\d+))?\s+\+(\d+)(?:,(\d+))?\s+@@(.*)$")
    DIFF_FILE_RE = re.compile(r"^diff --git a/(.*) b/(.*)")
    RENAME_RE = re.compile(r"^rename from (.+)$")
    RENAME_TO_RE = re.compile(r"^rename to (.+)$")
    NEW_FILE_RE = re.compile(r"^new file mode")
    DELETED_FILE_RE = re.compile(r"^deleted file mode")
    BINARY_RE = re.compile(r"^Binary files .* differ")

    def analyze(self, diff_text: str) -> ChangeSummary:
        """Parse a unified diff and return a ChangeSummary."""
        summary = ChangeSummary()
        if not diff_text or not diff_text.strip():
            return summary

        # Split diff into per-file sections
        file_diffs = self._split_files(diff_text)

        for file_diff in file_diffs:
            fc = self._parse_file_diff(file_diff)
            summary.files.append(fc)
            summary.total_added += fc.added_lines
            summary.total_deleted += fc.deleted_lines
            summary.total_hunks += len(fc.hunks)
            if fc.is_new:
                summary.new_files += 1
            if fc.is_deleted:
                summary.deleted_files += 1
            if fc.is_renamed:
                summary.renamed_files += 1
            if fc.is_binary:
                summary.binary_files += 1

        summary.total_files_changed = len(summary.files)
        return summary

    def _split_files(self, diff_text: str) -> List[str]:
        """Split a multi-file diff into individual file diffs."""
        lines = diff_text.split("\n")
        chunks = []
        current = []
        for line in lines:
            if line.startswith("diff --git ") and current:
                chunks.append("\n".join(current))
                current = [line]
            else:
                current.append(line)
        if current:
            chunks.append("\n".join(current))
        return chunks

    def _parse_file_diff(self, diff_text: str) -> FileChange:
        """Parse a single file's diff section."""
        lines = diff_text.split("\n")
        fc = FileChange(path="")

        # Extract file paths
        for line in lines:
            m = self.DIFF_FILE_RE.match(line)
            if m:
                fc.path = m.group(2)
                break

        # Check for special states
        for line in lines:
            if self.NEW_FILE_RE.match(line):
                fc.is_new = True
            elif self.DELETED_FILE_RE.match(line):
                fc.is_deleted = True
            elif (rm := self.RENAME_RE.match(line)):
                fc.old_path = rm.group(1)
                fc.is_renamed = True
            elif self.BINARY_RE.match(line):
                fc.is_binary = True
                return fc  # No hunks for binary

        # Rename TO
        for line in lines:
            m = self.RENAME_TO_RE.match(line)
            if m and fc.is_renamed:
                fc.path = m.group(1)

        # Parse hunks
        current_hunk = None
        for line in lines:
            m = self.HUNK_HEADER_RE.match(line)
            if m:
                if current_hunk:
                    self._classify_hunk(current_hunk, fc)
                    fc.hunks.append(current_hunk)
                current_hunk = Hunk(
                    header=m.group(5).strip(),
                    old_start=int(m.group(1)),
                    old_count=int(m.group(2) or 1),
                    new_start=int(m.group(3)),
                    new_count=int(m.group(4) or 1),
                )
            elif current_hunk and (line.startswith("+") or line.startswith("-") or line.startswith(" ")):
                current_hunk.lines.append(line)

        if current_hunk:
            self._classify_hunk(current_hunk, fc)
            fc.hunks.append(current_hunk)

        return fc

    def _classify_hunk(self, hunk: Hunk, fc: FileChange):
        """Classify a hunk and count lines."""
        added = sum(1 for l in hunk.lines if l.startswith("+"))
        deleted = sum(1 for l in hunk.lines if l.startswith("-"))

        fc.added_lines += added
        fc.deleted_lines += deleted

        if added > 0 and deleted == 0:
            hunk.category = ChangeCategory.ADDED
        elif deleted > 0 and added == 0:
            hunk.category = ChangeCategory.DELETED
        elif added > 0 and deleted > 0:
            hunk.category = ChangeCategory.MODIFIED
        else:
            hunk.category = ChangeCategory.CONTEXT
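A quick sanity check of the analyzer on a synthetic two-line change; note the `---`/`+++` file-header lines are ignored because no hunk is open yet when they appear (the `scripts.diff_analyzer` import path is assumed):

```python
from scripts.diff_analyzer import DiffAnalyzer

diff = '''diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,2 +1,2 @@
-print("hello")
+print("hello, world")
 # unchanged comment
'''

summary = DiffAnalyzer().analyze(diff)
print(summary.total_files_changed)                 # 1
print(summary.files[0].hunks[0].category)          # ChangeCategory.MODIFIED
print(summary.total_added, summary.total_deleted)  # 1 1
```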
387
scripts/freshness.py
Normal file
@@ -0,0 +1,387 @@
#!/usr/bin/env python3
|
||||
"""
|
||||
Knowledge Freshness Cron — Detect stale entries from code changes (Issue #200)
|
||||
|
||||
Automatically detects when knowledge entries become stale due to code changes.
|
||||
|
||||
Detection Method:
|
||||
1. Track source file hash alongside knowledge entry
|
||||
2. Compare current file hashes vs stored
|
||||
3. Mismatch → flag entry as potentially stale
|
||||
4. Report stale entries and optionally re-extract
|
||||
|
||||
Usage:
|
||||
python3 scripts/freshness.py --knowledge-dir knowledge/
|
||||
python3 scripts/freshness.py --knowledge-dir knowledge/ --json
|
||||
python3 scripts/freshness.py --knowledge-dir knowledge/ --repo /path/to/repo
|
||||
python3 scripts/freshness.py --knowledge-dir knowledge/ --auto-reextract
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import yaml
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Any, Optional, Tuple
|
||||
|
||||
|
||||
def compute_file_hash(filepath: str) -> Optional[str]:
|
||||
"""Compute SHA-256 hash of a file. Returns None if file doesn't exist."""
|
||||
try:
|
||||
with open(filepath, "rb") as f:
|
||||
return "sha256:" + hashlib.sha256(f.read()).hexdigest()
|
||||
except (FileNotFoundError, IsADirectoryError, PermissionError):
|
||||
return None
|
||||
|
||||
|
||||
def get_git_file_changes(repo_path: str, days: int = 1) -> Dict[str, List[str]]:
|
||||
"""
|
||||
Get files changed in git in the last N days.
|
||||
|
||||
Returns dict with 'modified', 'added', 'deleted' lists of file paths.
|
||||
"""
|
||||
changes = {"modified": [], "added": [], "deleted": []}
|
||||
|
||||
try:
|
||||
# Get commits from last N days
|
||||
cmd = [
|
||||
"git", "-C", repo_path, "log",
|
||||
f"--since={days} days ago",
|
||||
"--name-status",
|
||||
"--pretty=format:",
|
||||
"--diff-filter=MAD"
|
||||
]
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
|
||||
|
||||
if result.returncode != 0:
|
||||
return changes
|
||||
|
||||
for line in result.stdout.splitlines():
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
|
||||
parts = line.split('\t', 1)
|
||||
if len(parts) != 2:
|
||||
continue
|
||||
|
||||
status, filepath = parts
|
||||
if status == 'M':
|
||||
changes["modified"].append(filepath)
|
||||
elif status == 'A':
|
||||
changes["added"].append(filepath)
|
||||
elif status == 'D':
|
||||
changes["deleted"].append(filepath)
|
||||
|
||||
except (subprocess.TimeoutExpired, FileNotFoundError):
|
||||
pass
|
||||
|
||||
# Deduplicate
|
||||
for key in changes:
|
||||
changes[key] = list(set(changes[key]))
|
||||
|
||||
return changes
|
||||
|
||||
|
||||
def load_knowledge_entries(knowledge_dir: str) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Load knowledge entries from YAML files in the knowledge directory.
|
||||
|
||||
Supports:
|
||||
- knowledge/index.json (legacy format)
|
||||
- knowledge/global/*.yaml
|
||||
- knowledge/repos/*.yaml
|
||||
- knowledge/agents/*.yaml
|
||||
"""
|
||||
entries = []
|
||||
|
||||
# Load from index.json if exists
|
||||
index_path = os.path.join(knowledge_dir, "index.json")
|
||||
if os.path.exists(index_path):
|
||||
try:
|
||||
with open(index_path) as f:
|
||||
data = json.load(f)
|
||||
for fact in data.get("facts", []):
|
||||
entries.append({
|
||||
"source": "index.json",
|
||||
"fact": fact.get("fact", ""),
|
||||
"source_file": fact.get("source_file"),
|
||||
"source_hash": fact.get("source_hash"),
|
||||
"category": fact.get("category", "unknown"),
|
||||
"confidence": fact.get("confidence", 0.5)
|
||||
})
|
||||
except (json.JSONDecodeError, KeyError):
|
||||
pass
|
||||
|
||||
# Load from YAML files
|
||||
for subdir in ["global", "repos", "agents"]:
|
||||
subdir_path = os.path.join(knowledge_dir, subdir)
|
||||
if not os.path.isdir(subdir_path):
|
||||
continue
|
||||
|
||||
for filename in os.listdir(subdir_path):
|
||||
if not filename.endswith((".yaml", ".yml")):
|
||||
continue
|
||||
|
||||
filepath = os.path.join(subdir_path, filename)
|
||||
try:
|
||||
with open(filepath) as f:
|
||||
data = yaml.safe_load(f)
|
||||
|
||||
if not data or not isinstance(data, dict):
|
||||
continue
|
||||
|
||||
# Extract entries from YAML structure
|
||||
for key, value in data.items():
|
||||
if isinstance(value, list):
|
||||
for item in value:
|
||||
if isinstance(item, dict):
|
||||
entries.append({
|
||||
"source": f"{subdir}/{filename}",
|
||||
"fact": item.get("description", item.get("fact", "")),
|
||||
"source_file": item.get("source_file"),
|
||||
"source_hash": item.get("source_hash"),
|
||||
"category": item.get("category", "unknown"),
|
||||
"confidence": item.get("confidence", 0.5)
|
||||
})
|
||||
elif isinstance(value, dict):
|
||||
entries.append({
|
||||
"source": f"{subdir}/{filename}",
|
||||
"fact": value.get("description", value.get("fact", "")),
|
||||
"source_file": value.get("source_file"),
|
||||
"source_hash": value.get("source_hash"),
|
||||
"category": value.get("category", "unknown"),
|
||||
"confidence": value.get("confidence", 0.5)
|
||||
})
|
||||
except (yaml.YAMLError, IOError):
|
||||
pass
|
||||
|
||||
return entries
|
||||
|
||||
|
||||
def check_freshness(knowledge_dir: str, repo_root: str = ".",
|
||||
days: int = 1) -> Dict[str, Any]:
|
||||
"""
|
||||
Check freshness of knowledge entries against recent code changes.
|
||||
|
||||
Returns:
|
||||
{
|
||||
"timestamp": ISO timestamp,
|
||||
"total_entries": int,
|
||||
"stale_entries": [...],
|
||||
"fresh_entries": [...],
|
||||
"git_changes": {...},
|
||||
"summary": {...}
|
||||
}
|
||||
"""
|
||||
entries = load_knowledge_entries(knowledge_dir)
|
||||
git_changes = get_git_file_changes(repo_root, days)
|
||||
|
||||
stale_entries = []
|
||||
fresh_entries = []
|
||||
|
||||
for entry in entries:
|
||||
source_file = entry.get("source_file")
|
||||
if not source_file:
|
||||
# Entry without source file reference
|
||||
fresh_entries.append({**entry, "status": "no_source"})
|
||||
continue
|
||||
|
||||
# Check if source file was recently modified
|
||||
is_stale = False
|
||||
reason = ""
|
||||
|
||||
if source_file in git_changes["modified"]:
|
||||
is_stale = True
|
||||
reason = "source_modified"
|
||||
elif source_file in git_changes["deleted"]:
|
||||
is_stale = True
|
||||
reason = "source_deleted"
|
||||
elif source_file in git_changes["added"]:
|
||||
is_stale = True
|
||||
reason = "source_added"
|
||||
|
||||
# Also check hash if available
|
||||
stored_hash = entry.get("source_hash")
|
||||
if stored_hash:
|
||||
full_path = os.path.join(repo_root, source_file)
|
||||
current_hash = compute_file_hash(full_path)
|
||||
|
||||
if current_hash is None:
|
||||
is_stale = True
|
||||
reason = "source_missing"
|
||||
elif current_hash != stored_hash:
|
||||
is_stale = True
|
||||
reason = "hash_mismatch"
|
||||
|
||||
if is_stale:
|
||||
stale_entries.append({
|
||||
**entry,
|
||||
"status": "stale",
|
||||
"reason": reason
|
||||
})
|
||||
else:
|
||||
fresh_entries.append({**entry, "status": "fresh"})
|
||||
|
||||
# Compute summary
|
||||
total = len(entries)
|
||||
stale_count = len(stale_entries)
|
||||
fresh_count = len(fresh_entries)
|
||||
|
||||
# Group stale entries by reason
|
||||
stale_by_reason = {}
|
||||
for entry in stale_entries:
|
||||
reason = entry.get("reason", "unknown")
|
||||
if reason not in stale_by_reason:
|
||||
stale_by_reason[reason] = 0
|
||||
stale_by_reason[reason] += 1
|
||||
|
||||
return {
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
"total_entries": total,
|
||||
"stale_entries": stale_entries,
|
||||
"fresh_entries": fresh_entries,
|
||||
"git_changes": git_changes,
|
||||
"summary": {
|
||||
"total": total,
|
||||
"stale": stale_count,
|
||||
"fresh": fresh_count,
|
||||
"stale_percentage": round(stale_count / total * 100, 1) if total > 0 else 0,
|
||||
"stale_by_reason": stale_by_reason,
|
||||
"git_changes_summary": {
|
||||
"modified": len(git_changes["modified"]),
|
||||
"added": len(git_changes["added"]),
|
||||
"deleted": len(git_changes["deleted"])
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
def update_stale_hashes(knowledge_dir: str, repo_root: str = ".") -> int:
|
||||
"""
|
||||
Update hashes for stale entries. Returns count of updated entries.
|
||||
"""
|
||||
entries = load_knowledge_entries(knowledge_dir)
|
||||
updated = 0
|
||||
|
||||
# This is a simplified version - in practice, you'd need to
|
||||
# write back to the specific YAML files
|
||||
for entry in entries:
|
||||
source_file = entry.get("source_file")
|
||||
if not source_file:
|
||||
continue
|
||||
|
||||
full_path = os.path.join(repo_root, source_file)
|
||||
current_hash = compute_file_hash(full_path)
|
||||
|
||||
if current_hash and entry.get("source_hash") != current_hash:
|
||||
# Mark for update (in practice, you'd write back to the file)
|
||||
updated += 1
|
||||
|
||||
return updated
|
||||
|
||||
|
||||
def format_report(result: Dict[str, Any], max_items: int = 20) -> str:
|
||||
"""Format freshness check results as a human-readable report."""
|
||||
timestamp = result["timestamp"]
|
||||
summary = result["summary"]
|
||||
stale_entries = result["stale_entries"]
|
||||
git_changes = result["git_changes"]
|
||||
|
||||
lines = [
|
||||
"Knowledge Freshness Report",
|
||||
"=" * 50,
|
||||
f"Generated: {timestamp}",
|
||||
f"Total entries: {summary['total']}",
|
||||
f"Stale entries: {summary['stale']} ({summary['stale_percentage']}%)",
|
||||
f"Fresh entries: {summary['fresh']}",
|
||||
""
|
||||
]
|
||||
|
||||
# Git changes summary
|
||||
lines.extend([
|
||||
"Git Changes (last 24h):",
|
||||
f" Modified: {len(git_changes['modified'])} files",
|
||||
f" Added: {len(git_changes['added'])} files",
|
||||
f" Deleted: {len(git_changes['deleted'])} files",
|
||||
""
|
||||
])
|
||||
|
||||
# Stale entries by reason
|
||||
if summary.get("stale_by_reason"):
|
||||
lines.extend([
|
||||
"Stale Entries by Reason:",
|
||||
""
|
||||
])
|
||||
for reason, count in summary["stale_by_reason"].items():
|
||||
lines.append(f" {reason}: {count}")
|
||||
lines.append("")
|
||||
|
||||
# List stale entries
|
||||
if stale_entries:
|
||||
lines.extend([
|
||||
"Stale Entries:",
|
||||
""
|
||||
])
|
||||
for i, entry in enumerate(stale_entries[:max_items], 1):
|
||||
source = entry.get("source_file", "?")
|
||||
reason = entry.get("reason", "unknown")
|
||||
fact = entry.get("fact", "")[:60]
|
||||
lines.append(f"{i:2d}. [{reason}] {source}")
|
||||
if fact:
|
||||
lines.append(f" {fact}")
|
||||
|
||||
if len(stale_entries) > max_items:
|
||||
lines.append(f"\n... and {len(stale_entries) - max_items} more")
|
||||
else:
|
||||
lines.append("No stale entries found. All knowledge is fresh!")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Knowledge Freshness Cron — detect stale entries from code changes")
|
||||
parser.add_argument("--knowledge-dir", required=True,
|
||||
help="Path to knowledge directory")
|
||||
parser.add_argument("--repo", default=".",
|
||||
help="Path to repository for git change detection")
|
||||
parser.add_argument("--days", type=int, default=1,
|
||||
help="Number of days to check for git changes (default: 1)")
|
||||
parser.add_argument("--json", action="store_true",
|
||||
help="Output as JSON instead of human-readable")
|
||||
parser.add_argument("--max", type=int, default=20,
|
||||
help="Maximum stale entries to show (default: 20)")
|
||||
parser.add_argument("--auto-reextract", action="store_true",
|
||||
help="Auto-re-extract knowledge for stale entries")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not os.path.isdir(args.knowledge_dir):
|
||||
print(f"Error: {args.knowledge_dir} is not a directory", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
if not os.path.isdir(args.repo):
|
||||
print(f"Error: {args.repo} is not a directory", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
result = check_freshness(args.knowledge_dir, args.repo, args.days)
|
||||
|
||||
if args.json:
|
||||
print(json.dumps(result, indent=2))
|
||||
else:
|
||||
print(format_report(result, args.max))
|
||||
|
||||
# Auto-re-extract if requested
|
||||
if args.auto_reextract and result["stale_entries"]:
|
||||
print(f"\nAuto-re-extracting {len(result['stale_entries'])} stale entries...")
|
||||
# In a real implementation, this would call the harvester
|
||||
print("(Auto-re-extraction not yet implemented)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
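
# Example schedule (an assumption, not from the original file): run nightly
# and append the JSON report to a metrics log.
#   0 6 * * * cd /path/to/compounding-intelligence && python3 scripts/knowledge_freshness_cron.py --knowledge-dir knowledge --repo . --json >> metrics/freshness.log
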
131
scripts/gitea_issue_parser.py
Normal file
@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""
Gitea Issue Body Parser — Extract structured data from markdown issue bodies.

Usage:
    cat issue_body.txt | python3 scripts/gitea_issue_parser.py --stdin --pretty
    python3 scripts/gitea_issue_parser.py --url https://forge.../api/v1/repos/.../issues/123 --pretty
    python3 scripts/gitea_issue_parser.py body.txt --title "Fix thing (#42)" --labels pipeline extraction
"""

import argparse
import json
import re
import sys
from typing import Dict, List, Any, Optional


def parse_issue_body(body: str, title: str = "", labels: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse a Gitea issue markdown body into structured JSON.

    Extracted fields:
    - title: Issue title
    - context: Background/description section
    - criteria[]: Acceptance criteria (checkboxes or numbered lists)
    - labels[]: Issue labels
    - epic_ref: Parent/epic issue reference (from "Closes #N" or title)
    - sections{}: All ## sections as key-value pairs
    """
    result = {
        "title": title,
        "context": "",
        "criteria": [],
        "labels": labels or [],
        "epic_ref": None,
        "sections": {},
    }

    if not body:
        return result

    # Extract epic reference from title or body
    epic_patterns = [
        r"(?:closes|fixes|addresses|refs?)\s+#(\d+)",
        r"#(\d+)",
    ]
    for pattern in epic_patterns:
        match = re.search(pattern, (title + " " + body).lower())
        if match:
            result["epic_ref"] = int(match.group(1))
            break

    # Parse ## sections
    section_pattern = r"^##\s+(.+?)$\n((?:^(?!##\s).*$\n?)*)"
    for match in re.finditer(section_pattern, body, re.MULTILINE):
        section_name = match.group(1).strip().lower().replace(" ", "_")
        section_content = match.group(2).strip()
        result["sections"][section_name] = section_content

    # Extract acceptance criteria (checkboxes)
    checkbox_pattern = r"^\s*-\s*\[([ xX])\]\s*(.+)$"
    for match in re.finditer(checkbox_pattern, body, re.MULTILINE):
        checked = match.group(1).lower() == "x"
        text = match.group(2).strip()
        result["criteria"].append({"text": text, "checked": checked})

    # If no checkboxes, try numbered lists in "Acceptance Criteria" or "Criteria" section
    if not result["criteria"]:
        for section_name in ["acceptance_criteria", "criteria"]:
            if section_name in result["sections"]:
                numbered = r"^\s*\d+\.\s*(.+)$"
                for match in re.finditer(numbered, result["sections"][section_name], re.MULTILINE):
                    result["criteria"].append({"text": match.group(1).strip(), "checked": False})
                break

    # Extract context (first section or first paragraph before any ## heading)
    first_heading = body.find("## ")
    if first_heading > 0:
        context_text = body[:first_heading].strip()
    else:
        context_text = body.split("\n\n")[0].strip()
    # Clean up: remove "## Context" or "## Problem" header if present
    context_text = re.sub(r"^#+\s*\w+\s*\n?", "", context_text).strip()
    result["context"] = context_text[:500]  # Cap at 500 chars

    return result
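
# Illustrative usage sketch (added for clarity; the sample body, title, and
# expected values are invented, not taken from a real issue):
def _demo_parse_issue_body() -> None:  # hypothetical helper, not part of the original API
    body = "Fix the parser.\n\n## Acceptance Criteria\n- [ ] parses checkboxes\n- [x] handles labels\n"
    parsed = parse_issue_body(body, title="Fix parser (#42)")
    assert parsed["epic_ref"] == 42
    assert parsed["criteria"][0] == {"text": "parses checkboxes", "checked": False}
    assert "acceptance_criteria" in parsed["sections"]
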

def fetch_issue_from_url(url: str) -> Dict[str, Any]:
    """Fetch an issue from a Gitea API URL and parse it."""
    import urllib.request
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())

    return parse_issue_body(
        body=data.get("body", ""),
        title=data.get("title", ""),
        labels=[l["name"] for l in data.get("labels", [])]
    )


def main():
    parser = argparse.ArgumentParser(description="Parse Gitea issue markdown into structured JSON")
    parser.add_argument("file", nargs="?", help="Issue body file (or use --stdin)")
    parser.add_argument("--stdin", action="store_true", help="Read from stdin")
    parser.add_argument("--url", help="Gitea API URL to fetch issue from")
    parser.add_argument("--title", default="", help="Issue title")
    parser.add_argument("--labels", nargs="*", default=[], help="Issue labels")
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")

    args = parser.parse_args()

    if args.url:
        result = fetch_issue_from_url(args.url)
    elif args.stdin:
        body = sys.stdin.read()
        result = parse_issue_body(body, args.title, args.labels)
    elif args.file:
        with open(args.file) as f:
            body = f.read()
        result = parse_issue_body(body, args.title, args.labels)
    else:
        parser.print_help()
        sys.exit(1)

    indent = 2 if args.pretty else None
    print(json.dumps(result, indent=indent))


if __name__ == "__main__":
    main()
447
scripts/harvester.py
Normal file
@@ -0,0 +1,447 @@
#!/usr/bin/env python3
"""
harvester.py — Extract durable knowledge from Hermes session transcripts.

Combines session_reader + extraction prompt + LLM inference to pull
facts, pitfalls, patterns, and tool quirks from finished sessions.

Usage:
    python3 harvester.py --session ~/.hermes/sessions/session_xxx.jsonl --output knowledge/
    python3 harvester.py --batch --since 2026-04-01 --limit 100
    python3 harvester.py --session session.jsonl --dry-run  # Preview without writing
"""

import argparse
import json
import os
import sys
import time
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Add scripts dir to path for sibling imports
SCRIPT_DIR = Path(__file__).parent.absolute()
sys.path.insert(0, str(SCRIPT_DIR))

from session_reader import read_session, extract_conversation, truncate_for_context, messages_to_text

# --- Configuration ---

DEFAULT_API_BASE = os.environ.get("HARVESTER_API_BASE", "https://api.nousresearch.com/v1")
DEFAULT_API_KEY = os.environ.get("HARVESTER_API_KEY", "")
DEFAULT_MODEL = os.environ.get("HARVESTER_MODEL", "xiaomi/mimo-v2-pro")
KNOWLEDGE_DIR = os.environ.get("HARVESTER_KNOWLEDGE_DIR", "knowledge")
PROMPT_PATH = os.environ.get("HARVESTER_PROMPT_PATH", str(SCRIPT_DIR.parent / "templates" / "harvest-prompt.md"))

# Where to look for API keys if not set via env
API_KEY_PATHS = [
    os.path.expanduser("~/.config/nous/key"),
    os.path.expanduser("~/.hermes/keymaxxing/active/minimax.key"),
    os.path.expanduser("~/.config/openrouter/key"),
]


def find_api_key() -> str:
    """Find API key from common locations."""
    for path in API_KEY_PATHS:
        if os.path.exists(path):
            with open(path) as f:
                key = f.read().strip()
                if key:
                    return key
    return ""


def load_extraction_prompt() -> str:
    """Load the extraction prompt template."""
    path = Path(PROMPT_PATH)
    if not path.exists():
        print(f"ERROR: Extraction prompt not found at {path}", file=sys.stderr)
        print("Expected templates/harvest-prompt.md from issue #7", file=sys.stderr)
        sys.exit(1)
    return path.read_text(encoding='utf-8')


def call_llm(prompt: str, transcript: str, api_base: str, api_key: str, model: str) -> Optional[list[dict]]:
    """Call the LLM API to extract knowledge from a transcript."""
    import urllib.request

    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": f"Extract knowledge from this session transcript:\n\n{transcript}"}
    ]

    payload = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": 0.1,  # Low temp for consistent extraction
        "max_tokens": 4096
    }).encode('utf-8')

    req = urllib.request.Request(
        f"{api_base}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        method="POST"
    )

    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            result = json.loads(resp.read().decode('utf-8'))
            content = result["choices"][0]["message"]["content"]
            return parse_extraction_response(content)
    except Exception as e:
        print(f"ERROR: LLM API call failed: {e}", file=sys.stderr)
        return None


def parse_extraction_response(content: str) -> Optional[list[dict]]:
    """Parse the LLM response to extract knowledge items.

    Handles various response formats: raw JSON, markdown-wrapped JSON, etc.
    """
    # Try direct JSON parse first
    try:
        data = json.loads(content)
        if isinstance(data, dict) and 'knowledge' in data:
            return data['knowledge']
        if isinstance(data, list):
            return data
    except json.JSONDecodeError:
        pass

    # Try extracting JSON from markdown code blocks
    import re
    json_match = re.search(r'```(?:json)?\s*({.*?})\s*```', content, re.DOTALL)
    if json_match:
        try:
            data = json.loads(json_match.group(1))
            if isinstance(data, dict) and 'knowledge' in data:
                return data['knowledge']
            if isinstance(data, list):
                return data
        except json.JSONDecodeError:
            pass

    # Try finding any JSON object with a knowledge array
    json_match = re.search(r'({[^{}]*"knowledge"[^{}]*\[[\s\S]*?\][^{}]*})', content)
    if json_match:
        try:
            data = json.loads(json_match.group(1))
            return data.get('knowledge', [])
        except json.JSONDecodeError:
            pass

    print("WARNING: Could not parse LLM response as JSON", file=sys.stderr)
    print(f"Response preview: {content[:500]}", file=sys.stderr)
    return None
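
# Illustrative sketch (invented payload): one response shape this parser
# accepts, JSON wrapped in a markdown fence. Useful as a quick self-check.
def _demo_parse_extraction_response() -> None:  # hypothetical helper
    reply = '```json\n{"knowledge": [{"fact": "make test runs pytest"}]}\n```'
    assert parse_extraction_response(reply) == [{"fact": "make test runs pytest"}]
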

def load_existing_knowledge(knowledge_dir: str) -> dict:
    """Load the existing knowledge index."""
    index_path = Path(knowledge_dir) / "index.json"
    if not index_path.exists():
        return {"version": 1, "last_updated": "", "total_facts": 0, "facts": []}

    try:
        with open(index_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except (json.JSONDecodeError, IOError) as e:
        print(f"WARNING: Could not load knowledge index: {e}", file=sys.stderr)
        return {"version": 1, "last_updated": "", "total_facts": 0, "facts": []}


def fact_fingerprint(fact: dict) -> str:
    """Generate a deduplication fingerprint for a fact.

    Uses the fact text normalized (lowercase, stripped) as the key.
    Similar facts will have similar fingerprints.
    """
    text = fact.get('fact', '').lower().strip()
    # Normalize whitespace
    text = ' '.join(text.split())
    return hashlib.md5(text.encode('utf-8')).hexdigest()
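
# Illustrative sketch: fingerprinting normalizes case and whitespace before
# hashing, so these two phrasings (invented) collide by design.
def _demo_fact_fingerprint() -> None:  # hypothetical helper
    a = {"fact": "Branch protection returns HTTP 405"}
    b = {"fact": "  branch   protection returns http 405 "}
    assert fact_fingerprint(a) == fact_fingerprint(b)
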

def deduplicate(new_facts: list[dict], existing: list[dict], similarity_threshold: float = 0.8) -> list[dict]:
    """Remove duplicate facts from new_facts that already exist in the knowledge store.

    Uses fingerprint matching for exact dedup and simple overlap check for near-dupes.
    """
    existing_fingerprints = set()
    existing_texts = []
    for f in existing:
        fp = fact_fingerprint(f)
        existing_fingerprints.add(fp)
        existing_texts.append(f.get('fact', '').lower().strip())

    unique = []
    for fact in new_facts:
        fp = fact_fingerprint(fact)
        if fp in existing_fingerprints:
            continue

        # Check for near-duplicates using simple word overlap
        fact_words = set(fact.get('fact', '').lower().split())
        is_dup = False
        for existing_text in existing_texts:
            existing_words = set(existing_text.split())
            if not fact_words or not existing_words:
                continue
            overlap = len(fact_words & existing_words) / max(len(fact_words | existing_words), 1)
            if overlap >= similarity_threshold:
                is_dup = True
                break

        if not is_dup:
            unique.append(fact)
            existing_fingerprints.add(fp)
            existing_texts.append(fact.get('fact', '').lower().strip())

    return unique
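
# Illustrative sketch: the near-duplicate test is plain Jaccard overlap on
# word sets. These invented facts share 4 of 5 distinct words (overlap 0.8),
# exactly the default threshold, so the second one would be dropped.
def _demo_word_overlap() -> None:  # hypothetical helper
    new_words = set("run make test before pushing".split())
    old_words = set("run make test before".split())
    overlap = len(new_words & old_words) / max(len(new_words | old_words), 1)
    assert overlap == 0.8
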

def validate_fact(fact: dict) -> bool:
    """Validate a single knowledge item has required fields."""
    required = ['fact', 'category', 'repo', 'confidence']
    for field in required:
        if field not in fact:
            return False

    if not isinstance(fact['fact'], str) or not fact['fact'].strip():
        return False

    valid_categories = ['fact', 'pitfall', 'pattern', 'tool-quirk', 'question']
    if fact['category'] not in valid_categories:
        return False

    if not isinstance(fact.get('confidence', 0), (int, float)):
        return False

    if not (0.0 <= fact['confidence'] <= 1.0):
        return False

    return True
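
# Illustrative sketch: the smallest dict that passes validation. All four
# required fields present, category from the allowed list, confidence in
# [0.0, 1.0]. Values are invented.
def _demo_validate_fact() -> None:  # hypothetical helper
    assert validate_fact({"fact": "CI runs via make test", "category": "fact",
                          "repo": "compounding-intelligence", "confidence": 0.9})
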

def write_knowledge(index: dict, new_facts: list[dict], knowledge_dir: str, source_session: str = ""):
    """Write new facts to the knowledge store."""
    kdir = Path(knowledge_dir)
    kdir.mkdir(parents=True, exist_ok=True)

    # Add source tracking to each fact
    for fact in new_facts:
        fact['source_session'] = source_session
        fact['harvested_at'] = datetime.now(timezone.utc).isoformat()

    # Update index
    index['facts'].extend(new_facts)
    index['total_facts'] = len(index['facts'])
    index['last_updated'] = datetime.now(timezone.utc).isoformat()

    # Write index
    index_path = kdir / "index.json"
    with open(index_path, 'w', encoding='utf-8') as f:
        json.dump(index, f, indent=2, ensure_ascii=False)

    # Also write per-repo markdown files for human reading
    repos = {}
    for fact in new_facts:
        repo = fact.get('repo', 'global')
        repos.setdefault(repo, []).append(fact)

    for repo, facts in repos.items():
        if repo == 'global':
            md_path = kdir / "global" / "harvested.md"
        else:
            md_path = kdir / "repos" / f"{repo}.md"

        md_path.parent.mkdir(parents=True, exist_ok=True)

        # Append to existing or create new
        mode = 'a' if md_path.exists() else 'w'
        with open(md_path, mode, encoding='utf-8') as f:
            if mode == 'w':
                f.write(f"# Knowledge: {repo}\n\n")
            f.write(f"## Harvested {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')}\n\n")
            for fact in facts:
                icon = {'fact': '📋', 'pitfall': '⚠️', 'pattern': '🔄', 'tool-quirk': '🔧', 'question': '❓'}.get(fact['category'], '•')
                f.write(f"- {icon} **{fact['category']}** (conf: {fact['confidence']:.1f}): {fact['fact']}\n")
            f.write("\n")


def harvest_session(session_path: str, knowledge_dir: str, api_base: str, api_key: str,
                    model: str, dry_run: bool = False, min_confidence: float = 0.3) -> dict:
    """Harvest knowledge from a single session.

    Returns: dict with stats (facts_found, facts_new, facts_dup, elapsed_seconds, error)
    """
    start_time = time.time()
    stats = {
        'session': session_path,
        'facts_found': 0,
        'facts_new': 0,
        'facts_dup': 0,
        'elapsed_seconds': 0,
        'error': None
    }

    try:
        # 1. Read session
        messages = read_session(session_path)
        if not messages:
            stats['error'] = "Empty session file"
            return stats

        # 2. Extract conversation
        conv = extract_conversation(messages)
        if not conv:
            stats['error'] = "No conversation turns found"
            return stats

        # 3. Truncate for context window
        truncated = truncate_for_context(conv, head=50, tail=50)
        transcript = messages_to_text(truncated)

        # 4. Load extraction prompt
        prompt = load_extraction_prompt()

        # 5. Call LLM
        raw_facts = call_llm(prompt, transcript, api_base, api_key, model)
        if raw_facts is None:
            stats['error'] = "LLM extraction failed"
            return stats

        # 6. Validate
        valid_facts = [f for f in raw_facts if validate_fact(f) and f.get('confidence', 0) >= min_confidence]
        stats['facts_found'] = len(valid_facts)

        # 7. Deduplicate
        existing_index = load_existing_knowledge(knowledge_dir)
        existing_facts = existing_index.get('facts', [])
        new_facts = deduplicate(valid_facts, existing_facts)
        stats['facts_new'] = len(new_facts)
        stats['facts_dup'] = len(valid_facts) - len(new_facts)

        # 8. Write (unless dry run)
        if new_facts and not dry_run:
            write_knowledge(existing_index, new_facts, knowledge_dir, source_session=session_path)

        stats['elapsed_seconds'] = round(time.time() - start_time, 2)
        return stats

    except Exception as e:
        stats['error'] = str(e)
        stats['elapsed_seconds'] = round(time.time() - start_time, 2)
        return stats


def batch_harvest(sessions_dir: str, knowledge_dir: str, api_base: str, api_key: str,
                  model: str, since: str = "", limit: int = 0, dry_run: bool = False) -> list[dict]:
    """Harvest knowledge from multiple sessions in batch."""
    sessions_path = Path(sessions_dir)
    if not sessions_path.is_dir():
        print(f"ERROR: Sessions directory not found: {sessions_dir}", file=sys.stderr)
        return []

    # Find session files
    session_files = sorted(sessions_path.glob("*.jsonl"), reverse=True)  # Newest first

    # Filter by date if --since provided
    if since:
        since_dt = datetime.fromisoformat(since.replace('Z', '+00:00'))
        filtered = []
        for sf in session_files:
            # Try to parse timestamp from filename (common format: session_YYYYMMDD_HHMMSS_hash.jsonl)
            try:
                parts = sf.stem.split('_')
                if len(parts) >= 3:
                    date_str = parts[1]
                    file_dt = datetime.strptime(date_str, '%Y%m%d').replace(tzinfo=timezone.utc)
                    if file_dt >= since_dt:
                        filtered.append(sf)
            except (ValueError, IndexError):
                # If we can't parse the date, include the file (be permissive)
                filtered.append(sf)
        session_files = filtered

    # Apply limit
    if limit > 0:
        session_files = session_files[:limit]

    print(f"Harvesting {len(session_files)} sessions...")

    results = []
    for i, sf in enumerate(session_files, 1):
        print(f"[{i}/{len(session_files)}] {sf.name}...", end=" ", flush=True)
        stats = harvest_session(str(sf), knowledge_dir, api_base, api_key, model, dry_run)
        if stats['error']:
            print(f"ERROR: {stats['error']}")
        else:
            print(f"{stats['facts_new']} new, {stats['facts_dup']} dup ({stats['elapsed_seconds']}s)")
        results.append(stats)

    return results


def main():
    parser = argparse.ArgumentParser(description="Harvest knowledge from session transcripts")
    parser.add_argument('--session', help='Path to a single session JSONL file')
    parser.add_argument('--batch', action='store_true', help='Batch mode: process multiple sessions')
    parser.add_argument('--sessions-dir', default=os.path.expanduser('~/.hermes/sessions'),
                        help='Directory containing session files (default: ~/.hermes/sessions)')
    parser.add_argument('--output', default='knowledge', help='Output directory for knowledge store')
    parser.add_argument('--since', default='', help='Only process sessions after this date (YYYY-MM-DD)')
    parser.add_argument('--limit', type=int, default=0, help='Max sessions to process (0=unlimited)')
    parser.add_argument('--api-base', default=DEFAULT_API_BASE, help='LLM API base URL')
    parser.add_argument('--api-key', default='', help='LLM API key (or set HARVESTER_API_KEY)')
    parser.add_argument('--model', default=DEFAULT_MODEL, help='Model to use for extraction')
    parser.add_argument('--dry-run', action='store_true', help='Preview without writing to knowledge store')
    parser.add_argument('--min-confidence', type=float, default=0.3, help='Minimum confidence threshold')

    args = parser.parse_args()

    # Resolve API key
    api_key = args.api_key or DEFAULT_API_KEY or find_api_key()
    if not api_key:
        print("ERROR: No API key found. Set HARVESTER_API_KEY or store in one of:", file=sys.stderr)
        for p in API_KEY_PATHS:
            print(f"  {p}", file=sys.stderr)
        sys.exit(1)

    # Resolve knowledge directory
    knowledge_dir = args.output
    if not os.path.isabs(knowledge_dir):
        knowledge_dir = os.path.join(SCRIPT_DIR.parent, knowledge_dir)

    if args.session:
        # Single session mode
        stats = harvest_session(
            args.session, knowledge_dir, args.api_base, api_key, args.model,
            dry_run=args.dry_run, min_confidence=args.min_confidence
        )
        print(json.dumps(stats, indent=2))
        if stats['error']:
            sys.exit(1)
    elif args.batch:
        # Batch mode
        results = batch_harvest(
            args.sessions_dir, knowledge_dir, args.api_base, api_key, args.model,
            since=args.since, limit=args.limit, dry_run=args.dry_run
        )
        total_new = sum(r['facts_new'] for r in results)
        total_dup = sum(r['facts_dup'] for r in results)
        errors = sum(1 for r in results if r['error'])
        print(f"\nDone: {total_new} new facts, {total_dup} duplicates, {errors} errors")
    else:
        parser.print_help()
        sys.exit(1)


if __name__ == '__main__':
    main()
451
scripts/improvement_proposals.py
Normal file
@@ -0,0 +1,451 @@
#!/usr/bin/env python3
"""
Improvement Proposal Generator for compounding-intelligence.

Analyzes fleet session data to identify waste patterns and generates
concrete improvement proposals with ROI estimates.

Input: Session analytics JSON (from fleet metrics or session database)
Output: Markdown proposal document + JSON proposals

Usage:
    python3 scripts/improvement_proposals.py --input analytics.json
    python3 scripts/improvement_proposals.py --input analytics.json --format json
    python3 scripts/improvement_proposals.py --input analytics.json --output proposals.md
    python3 scripts/improvement_proposals.py --input analytics.json --threshold 5.0
"""

import argparse
import json
import os
import sys
from collections import defaultdict
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


# ── Data types ─────────────────────────────────────────────────

@dataclass
class WastePattern:
    """A detected waste pattern in fleet sessions."""
    pattern_type: str  # "repeated_error", "manual_process", "slow_tool", "failed_retry"
    description: str
    occurrences: int
    total_time_hours: float
    affected_repos: List[str] = field(default_factory=list)
    example_sessions: List[str] = field(default_factory=list)


@dataclass
class ImprovementProposal:
    """A concrete improvement proposal with ROI estimate."""
    title: str
    problem: str
    proposed_solution: str
    estimated_weekly_hours_saved: float
    estimated_monthly_hours_saved: float
    implementation_hours: float
    roi_weeks: float  # weeks to break even
    priority: str  # "critical", "high", "medium", "low"
    affected_area: str
    supporting_evidence: List[str] = field(default_factory=list)


# ── Session analysis ──────────────────────────────────────────

def analyze_sessions(sessions: List[dict]) -> List[WastePattern]:
    """
    Analyze session data to find waste patterns.

    Looks for:
    - Repeated errors (same error across sessions)
    - Manual processes (long sequences of similar actions)
    - Slow tools (tools with high latency)
    - Failed retries (multiple attempts at same task)
    """
    patterns = []

    # Track error frequency across sessions
    error_counts: Dict[str, List[dict]] = defaultdict(list)
    tool_latencies: Dict[str, List[float]] = defaultdict(list)
    retry_counts: Dict[str, int] = defaultdict(int)
    manual_sequences: List[dict] = []

    for session in sessions:
        session_id = session.get("session_id", "unknown")
        repo = session.get("repo", "global")
        messages = session.get("messages", [])
        errors = session.get("errors", [])
        tool_calls = session.get("tool_calls", [])
        duration_min = session.get("duration_minutes", 0)

        # 1. Repeated errors
        for err in errors:
            err_key = _normalize_error(err.get("message", ""))
            if err_key:
                error_counts[err_key].append({
                    "session_id": session_id,
                    "repo": repo,
                })

        # 2. Tool latency tracking
        for tc in tool_calls:
            tool_name = tc.get("tool", "unknown")
            latency = tc.get("latency_ms", 0)
            if latency > 0:
                tool_latencies[tool_name].append(latency)

        # 3. Failed retries (same tool called 3+ times in sequence)
        prev_tool = None
        streak = 0
        for tc in tool_calls:
            tool_name = tc.get("tool", "unknown")
            if tool_name == prev_tool:
                streak += 1
            else:
                if streak >= 3:
                    retry_counts[prev_tool] += 1
                streak = 1
            prev_tool = tool_name
        if streak >= 3:
            retry_counts[prev_tool] += 1

        # 4. Manual processes (10+ sequential tool calls with no automation)
        if len(tool_calls) > 10:
            tool_sequence = [tc.get("tool") for tc in tool_calls]
            unique_tools = set(tool_sequence)
            if len(unique_tools) <= 3 and len(tool_calls) > 10:
                manual_sequences.append({
                    "session_id": session_id,
                    "repo": repo,
                    "tool_count": len(tool_calls),
                    "unique_tools": list(unique_tools),
                    "duration_min": duration_min,
                })

    # Generate patterns from collected data

    # Repeated errors (appearing in 3+ sessions)
    for err_key, occurrences in error_counts.items():
        if len(occurrences) >= 3:
            repos = list(set(o["repo"] for o in occurrences))
            sessions_list = [o["session_id"] for o in occurrences[:5]]
            # Estimate time wasted: 5 min per error occurrence
            hours = len(occurrences) * 5 / 60
            patterns.append(WastePattern(
                pattern_type="repeated_error",
                description=f"Error: {err_key[:100]}",
                occurrences=len(occurrences),
                total_time_hours=round(hours, 1),
                affected_repos=repos,
                example_sessions=sessions_list,
            ))

    # Slow tools (avg latency > 5000ms across 5+ calls)
    for tool, latencies in tool_latencies.items():
        if len(latencies) >= 5:
            avg_ms = sum(latencies) / len(latencies)
            if avg_ms > 5000:
                hours = sum(latencies) / 1000 / 3600
                patterns.append(WastePattern(
                    pattern_type="slow_tool",
                    description=f"Tool '{tool}' averages {avg_ms:.0f}ms latency",
                    occurrences=len(latencies),
                    total_time_hours=round(hours, 1),
                    affected_repos=["global"],
                    example_sessions=[],
                ))

    # Failed retries
    for tool, count in retry_counts.items():
        if count >= 2:
            hours = count * 10 / 60  # ~10 min per failed retry sequence
            patterns.append(WastePattern(
                pattern_type="failed_retry",
                description=f"Tool '{tool}' had {count} retry sequences (3+ consecutive calls)",
                occurrences=count,
                total_time_hours=round(hours, 1),
                affected_repos=["global"],
                example_sessions=[],
            ))

    # Manual processes
    for seq in manual_sequences:
        hours = seq["duration_min"] / 60
        patterns.append(WastePattern(
            pattern_type="manual_process",
            description=f"Session {seq['session_id'][:12]}: {seq['tool_count']} tool calls with only {len(seq['unique_tools'])} unique tools",
            occurrences=1,
            total_time_hours=round(hours, 1),
            affected_repos=[seq["repo"]],
            example_sessions=[seq["session_id"]],
        ))

    return sorted(patterns, key=lambda p: p.total_time_hours, reverse=True)
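
# Illustrative sketch: the minimal session shape analyze_sessions consumes.
# Field names match the .get() calls above; the values are invented. The same
# normalized error seen in 3+ sessions is what produces a repeated_error
# pattern.
def _demo_analyze_sessions() -> None:  # hypothetical helper
    session = {
        "session_id": "s-001",
        "repo": "compounding-intelligence",
        "errors": [{"message": "HTTP 405 from branch protection"}],
        "tool_calls": [{"tool": "bash", "latency_ms": 120}],
        "duration_minutes": 12,
    }
    patterns = analyze_sessions([session] * 3)
    assert any(p.pattern_type == "repeated_error" for p in patterns)
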

def _normalize_error(message: str) -> str:
    """Normalize error message to a common key."""
    if not message:
        return ""
    msg = message.lower().strip()
    # Remove variable parts (paths, IDs, timestamps)
    import re
    msg = re.sub(r'/\S+', '/PATH', msg)
    msg = re.sub(r'\b[0-9a-f]{8,}\b', 'HASH', msg)
    msg = re.sub(r'\d{4}-\d{2}-\d{2}[tT]\d{2}:\d{2}', 'TIME', msg)
    return msg[:150]
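
# Illustrative sketch: two invented error strings collapse to one key once
# the variable path is masked, which is what lets analyze_sessions count
# them as a single repeated error.
def _demo_normalize_error() -> None:  # hypothetical helper
    a = _normalize_error("ENOENT: /tmp/build_a1b2c3/out.log missing")
    b = _normalize_error("ENOENT: /tmp/build_99ffee/out.log missing")
    assert a == b == "enoent: /PATH missing"
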

# ── Proposal generation ───────────────────────────────────────

def generate_proposals(
    patterns: List[WastePattern],
    hourly_rate: float = 50.0,
    implementation_overhead: float = 1.5,
) -> List[ImprovementProposal]:
    """
    Generate improvement proposals from waste patterns.

    Args:
        patterns: Detected waste patterns
        hourly_rate: Developer hourly rate for ROI calculation
        implementation_overhead: Multiplier for implementation time estimate
    """
    proposals = []

    # Group patterns by type
    by_type: Dict[str, List[WastePattern]] = defaultdict(list)
    for p in patterns:
        by_type[p.pattern_type].append(p)

    # 1. Repeated errors → Create fix/skill
    for p in by_type.get("repeated_error", []):
        weekly_hours = p.total_time_hours / 4  # monthly → weekly
        impl_hours = max(1.0, p.occurrences * 0.25) * implementation_overhead
        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')

        proposals.append(ImprovementProposal(
            title=f"Fix repeated error: {p.description[:60]}",
            problem=f"This error occurred {p.occurrences} times across {len(p.affected_repos)} repos, wasting ~{p.total_time_hours:.1f} hours.",
            proposed_solution="Root-cause the error and create a permanent fix or mitigation skill.",
            estimated_weekly_hours_saved=round(weekly_hours, 1),
            estimated_monthly_hours_saved=round(p.total_time_hours, 1),
            implementation_hours=round(impl_hours, 1),
            roi_weeks=round(roi_weeks, 1),
            priority=_priority_from_roi(roi_weeks),
            affected_area="reliability",
            supporting_evidence=[f"{p.occurrences} occurrences in sessions: {', '.join(p.example_sessions[:3])}"],
        ))

    # 2. Slow tools → Optimize or replace
    for p in by_type.get("slow_tool", []):
        weekly_hours = p.total_time_hours / 4
        impl_hours = 3.0 * implementation_overhead  # optimization task
        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')

        proposals.append(ImprovementProposal(
            title=f"Optimize slow tool: {p.description[:60]}",
            problem=f"Tool has {p.occurrences} calls with high latency, wasting ~{p.total_time_hours:.1f} hours total.",
            proposed_solution="Profile the tool, add caching, or replace with a faster alternative.",
            estimated_weekly_hours_saved=round(weekly_hours, 1),
            estimated_monthly_hours_saved=round(p.total_time_hours, 1),
            implementation_hours=round(impl_hours, 1),
            roi_weeks=round(roi_weeks, 1),
            priority=_priority_from_roi(roi_weeks),
            affected_area="performance",
            supporting_evidence=[f"{p.occurrences} slow calls detected"],
        ))

    # 3. Failed retries → Add retry logic or validation
    for p in by_type.get("failed_retry", []):
        weekly_hours = p.total_time_hours / 4
        impl_hours = 2.0 * implementation_overhead
        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')

        proposals.append(ImprovementProposal(
            title=f"Reduce retries for tool '{p.description[:50]}'",
            problem=f"Tool had {p.occurrences} retry sequences, wasting ~{p.total_time_hours:.1f} hours.",
            proposed_solution="Add input validation, pre-flight checks, or automatic retry with backoff.",
            estimated_weekly_hours_saved=round(weekly_hours, 1),
            estimated_monthly_hours_saved=round(p.total_time_hours, 1),
            implementation_hours=round(impl_hours, 1),
            roi_weeks=round(roi_weeks, 1),
            priority=_priority_from_roi(roi_weeks),
            affected_area="reliability",
            supporting_evidence=[f"{p.occurrences} retry sequences detected"],
        ))

    # 4. Manual processes → Automate
    total_manual_hours = sum(p.total_time_hours for p in by_type.get("manual_process", []))
    manual_patterns = by_type.get("manual_process", [])
    if manual_patterns:
        weekly_hours = total_manual_hours / 4
        impl_hours = len(manual_patterns) * 2.0 * implementation_overhead
        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')

        proposals.append(ImprovementProposal(
            title=f"Automate {len(manual_patterns)} manual processes",
            problem=f"{len(manual_patterns)} sessions had long manual tool sequences, wasting ~{total_manual_hours:.1f} hours.",
            proposed_solution="Create composite skills or scripts that combine the repeated tool sequences into single operations.",
            estimated_weekly_hours_saved=round(weekly_hours, 1),
            estimated_monthly_hours_saved=round(total_manual_hours, 1),
            implementation_hours=round(impl_hours, 1),
            roi_weeks=round(roi_weeks, 1),
            priority=_priority_from_roi(roi_weeks),
            affected_area="automation",
            supporting_evidence=[f"{len(manual_patterns)} manual sessions detected"],
        ))

    return sorted(proposals, key=lambda p: p.estimated_monthly_hours_saved, reverse=True)


def _priority_from_roi(roi_weeks: float) -> str:
    """Determine priority from ROI break-even time."""
    if roi_weeks <= 1:
        return "critical"
    elif roi_weeks <= 4:
        return "high"
    elif roi_weeks <= 12:
        return "medium"
    return "low"
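
# Illustrative sketch of the break-even bands above: pays for itself within
# a week -> critical; within a month -> high; within a quarter -> medium;
# slower -> low.
def _demo_priority_bands() -> None:  # hypothetical helper
    assert _priority_from_roi(0.5) == "critical"
    assert _priority_from_roi(3.0) == "high"
    assert _priority_from_roi(10.0) == "medium"
    assert _priority_from_roi(52.0) == "low"
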

# ── Output formatting ─────────────────────────────────────────

def format_proposals_markdown(
    proposals: List[ImprovementProposal],
    patterns: List[WastePattern],
    generated_at: str,
) -> str:
    """Format proposals as a markdown document."""
    lines = [
        "# Improvement Proposals",
        "",
        f"Generated: {generated_at}",
        f"Based on analysis of {sum(p.occurrences for p in patterns)} waste events across {len(set(r for p in patterns for r in p.affected_repos))} repos.",
        "",
        "---",
        "",
        "## Summary",
        "",
        "| Metric | Value |",
        "|--------|-------|",
        f"| Total proposals | {len(proposals)} |",
        f"| Critical priority | {sum(1 for p in proposals if p.priority == 'critical')} |",
        f"| Total monthly hours wasted | {sum(p.estimated_monthly_hours_saved for p in proposals):.1f}h |",
        f"| Total weekly hours recoverable | {sum(p.estimated_weekly_hours_saved for p in proposals):.1f}h |",
        f"| Implementation investment | {sum(p.implementation_hours for p in proposals):.1f}h |",
        "",
        "---",
        "",
    ]

    for i, prop in enumerate(proposals, 1):
        lines.extend([
            f"## {i}. {prop.title}",
            "",
            f"**Priority:** {prop.priority.upper()}  ",
            f"**Area:** {prop.affected_area}  ",
            f"**ROI break-even:** {prop.roi_weeks:.1f} weeks",
            "",
            "### Problem",
            "",
            prop.problem,
            "",
            "### Proposed Solution",
            "",
            prop.proposed_solution,
            "",
            "### ROI Estimate",
            "",
            f"- Weekly hours saved: **{prop.estimated_weekly_hours_saved}h**",
            f"- Monthly hours saved: **{prop.estimated_monthly_hours_saved}h**",
            f"- Implementation time: **{prop.implementation_hours}h**",
            f"- Break-even: **{prop.roi_weeks:.1f} weeks**",
            "",
        ])

        if prop.supporting_evidence:
            lines.extend([
                "### Evidence",
                "",
            ])
            for ev in prop.supporting_evidence:
                lines.append(f"- {ev}")
            lines.append("")

        lines.extend(["---", ""])

    # Waste pattern appendix
    lines.extend([
        "## Appendix: Detected Waste Patterns",
        "",
        "| Type | Description | Occurrences | Hours |",
        "|------|-------------|-------------|-------|",
    ])
    for p in patterns[:20]:
        lines.append(
            f"| {p.pattern_type} | {p.description[:60]} | {p.occurrences} | {p.total_time_hours}h |"
        )
    lines.append("")

    return "\n".join(lines)


def format_proposals_json(proposals: List[ImprovementProposal]) -> str:
    """Format proposals as JSON."""
    return json.dumps(
        [asdict(p) for p in proposals],
        indent=2,
        default=str,
    )


# ── Main ──────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="Generate improvement proposals from session analytics")
    parser.add_argument("--input", required=True, help="Path to session analytics JSON file")
    parser.add_argument("--output", help="Output file path (default: stdout)")
    parser.add_argument("--format", choices=["markdown", "json"], default="markdown", help="Output format")
    parser.add_argument("--hourly-rate", type=float, default=50.0, help="Developer hourly rate for ROI calc")
    parser.add_argument("--threshold", type=float, default=2.0, help="Min monthly hours to include proposal")

    args = parser.parse_args()

    with open(args.input) as f:
        data = json.load(f)

    sessions = data if isinstance(data, list) else data.get("sessions", [])

    if not sessions:
        print("No sessions found in input file.", file=sys.stderr)
        sys.exit(1)

    # Analyze
    patterns = analyze_sessions(sessions)
    proposals = generate_proposals(patterns, hourly_rate=args.hourly_rate)

    # Filter by threshold
    proposals = [p for p in proposals if p.estimated_monthly_hours_saved >= args.threshold]

    generated_at = datetime.now(timezone.utc).isoformat()

    if args.format == "markdown":
        output = format_proposals_markdown(proposals, patterns, generated_at)
    else:
        output = format_proposals_json(proposals)

    if args.output:
        with open(args.output, "w") as f:
            f.write(output)
        print(f"Wrote {len(proposals)} proposals to {args.output}", file=sys.stderr)
    else:
        print(output)


if __name__ == "__main__":
    main()
275
scripts/knowledge_gap_identifier.py
Normal file
@@ -0,0 +1,275 @@
"""
Knowledge Gap Identifier — Pipeline 10.7

Cross-references code, docs, and tests to find gaps:
- Undocumented functions/classes
- Untested code paths
- Documented but missing implementations
- Test files without corresponding source

Produces a gap report with severity and suggestions.
"""

from __future__ import annotations

import ast
import os
import re
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional, Set


class GapSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


class GapType(Enum):
    UNDOCUMENTED = "undocumented"
    UNTESTED = "untested"
    MISSING_IMPLEMENTATION = "missing_implementation"
    ORPHAN_TEST = "orphan_test"
    STALE_DOC = "stale_doc"


@dataclass
class Gap:
    """A single knowledge gap."""
    gap_type: GapType
    severity: GapSeverity
    file: str
    line: Optional[int]
    name: str
    description: str
    suggestion: str


@dataclass
class GapReport:
    """Full gap analysis report."""
    repo_path: str
    gaps: List[Gap] = field(default_factory=list)
    stats: Dict[str, int] = field(default_factory=dict)

    def summary(self) -> str:
        lines = [f"Gap Report for {self.repo_path}", "=" * 40]
        by_type = {}
        for g in self.gaps:
            by_type.setdefault(g.gap_type.value, []).append(g)

        for gtype, items in sorted(by_type.items()):
            lines.append(f"\n{gtype.upper()} ({len(items)}):")
            for g in items:
                loc = f"{g.file}:{g.line}" if g.line else g.file
                lines.append(f"  [{g.severity.value}] {g.name} @ {loc}")
                lines.append(f"    {g.description}")

        lines.append(f"\nTotal gaps: {len(self.gaps)}")
        self.stats = {k: len(v) for k, v in by_type.items()}
        return "\n".join(lines)

    def to_dict(self) -> dict:
        return {
            "repo_path": self.repo_path,
            "total_gaps": len(self.gaps),
            "stats": {gt.value: len(gs) for gt, gs in
                      {gt: [g for g in self.gaps if g.gap_type == gt]
                       for gt in GapType}.items() if gs},
            "gaps": [
                {
                    "type": g.gap_type.value,
                    "severity": g.severity.value,
                    "file": g.file,
                    "line": g.line,
                    "name": g.name,
                    "description": g.description,
                    "suggestion": g.suggestion,
                }
                for g in self.gaps
            ],
        }


def _collect_python_files(root: Path) -> List[Path]:
    """Collect .py files, excluding venv/node_modules/.git."""
    skip = {".git", "venv", "env", ".venv", "node_modules", "__pycache__", ".tox", ".mypy_cache"}
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip]
        for f in filenames:
            if f.endswith(".py"):
                files.append(Path(dirpath) / f)
    return files


def _extract_python_symbols(filepath: Path) -> Set[str]:
    """Extract top-level function and class names from a Python file."""
    symbols = set()
    try:
        source = filepath.read_text(encoding="utf-8", errors="replace")
        tree = ast.parse(source, filename=str(filepath))
    except (SyntaxError, UnicodeDecodeError):
        return symbols

    for node in ast.iter_child_nodes(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.add(node.name)
    return symbols


def _extract_doc_symbols(filepath: Path) -> Set[str]:
    """Extract function/class names mentioned in markdown docs."""
    symbols = set()
    try:
        text = filepath.read_text(encoding="utf-8", errors="replace")
    except (UnicodeDecodeError, OSError):
        return symbols

    # Match backtick-quoted identifiers: `ClassName`, `func_name`, `func()`
    for m in re.finditer(r"`([A-Za-z_]\w+)(?:\(\))?`", text):
        symbols.add(m.group(1))
    # Match ## ClassName or ### func_name headings
    for m in re.finditer(r"^#{1,4}\s+(\w+)", text, re.MULTILINE):
        symbols.add(m.group(1))
    return symbols


def _collect_test_files(root: Path) -> Dict[str, Path]:
    """Map test module names to their file paths."""
    test_map = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in {".git", "venv", "node_modules"}]
        for f in filenames:
            if f.startswith("test_") and f.endswith(".py"):
                # test_foo.py -> foo
                module_name = f[5:-3]
                test_map[module_name] = Path(dirpath) / f
    return test_map


class KnowledgeGapIdentifier:
    """Analyzes a repo for knowledge gaps between code, docs, and tests."""

    def analyze(self, repo_path: str) -> GapReport:
        root = Path(repo_path).resolve()
        report = GapReport(repo_path=str(root))

        if not root.is_dir():
            report.gaps.append(Gap(
                gap_type=GapType.UNDOCUMENTED,
                severity=GapSeverity.ERROR,
                file=str(root),
                line=None,
                name="repo",
                description="Path is not a directory",
                suggestion="Provide a valid repo directory",
            ))
            return report

        # Collect artifacts
        py_files = _collect_python_files(root)
        doc_files = list(root.glob("docs/**/*.md")) + list(root.glob("*.md"))
        test_map = _collect_test_files(root / "tests") if (root / "tests").is_dir() else {}

        # Extract symbols from each source file
        source_symbols: Dict[str, Set[str]] = {}  # relative_path -> symbols
        all_source_symbols: Set[str] = set()

        for pf in py_files:
            rel = str(pf.relative_to(root))
            # Skip test files and setup/config
            if "/tests/" in rel or rel.startswith("tests/") or rel.startswith("test_"):
                continue
            if pf.name in ("setup.py", "conftest.py", "conf.py"):
                continue

            syms = _extract_python_symbols(pf)
            if syms:
                source_symbols[rel] = syms
                all_source_symbols.update(syms)

        # Extract documented symbols
        doc_symbols: Set[str] = set()
        for df in doc_files:
            doc_symbols.update(_extract_doc_symbols(df))

        # Extract test-covered symbols
        tested_modules: Set[str] = set(test_map.keys())

        # --- Find gaps ---

        # 1. Undocumented: source symbols not in any doc
        for rel_path, syms in source_symbols.items():
            for sym in sorted(syms):
                if sym.startswith("_") and not sym.startswith("__"):
                    continue  # Skip private
                if sym not in doc_symbols:
                    report.gaps.append(Gap(
                        gap_type=GapType.UNDOCUMENTED,
                        severity=GapSeverity.WARNING,
                        file=rel_path,
                        line=None,
                        name=sym,
                        description=f"{sym} defined in {rel_path} but not referenced in any docs",
                        suggestion=f"Add documentation for {sym} in a .md file",
                    ))

        # 2. Untested: source modules without a corresponding test file
        for rel_path in source_symbols:
            module_name = Path(rel_path).stem
            if module_name not in tested_modules and module_name not in ("__init__", "main", "config"):
                report.gaps.append(Gap(
                    gap_type=GapType.UNTESTED,
                    severity=GapSeverity.ERROR,
                    file=rel_path,
                    line=None,
                    name=module_name,
                    description=f"No test file found for {rel_path}",
                    suggestion=f"Create tests/test_{module_name}.py",
                ))

        # 3. Missing implementation: doc references symbol not in any source
        referenced_but_missing = doc_symbols - all_source_symbols
        for sym in sorted(referenced_but_missing):
            # Filter out common non-code terms
            if sym.lower() in {"todo", "fixme", "note", "example", "usage", "api",
                               "install", "setup", "config", "license", "contributing",
                               "changelog", "readme", "python", "bash", "json", "yaml",
                               "http", "url", "cli", "gui", "ui", "rest"}:
                continue
            if len(sym) < 3:
                continue
            report.gaps.append(Gap(
                gap_type=GapType.MISSING_IMPLEMENTATION,
                severity=GapSeverity.INFO,
                file="(docs)",
                line=None,
                name=sym,
                description=f"{sym} referenced in docs but not found in source code",
                suggestion=f"Verify if {sym} should be implemented or update docs",
            ))

        # 4. Orphan tests: test files without a matching source module
        source_modules = {Path(f).stem for f in source_symbols}
        for test_mod, test_path in test_map.items():
            if test_mod not in source_modules and not any(
                test_mod in Path(f).stem for f in source_symbols
            ):
                # Check if any source file partially matches
                matches_source = any(test_mod.replace("_", "-") in f or test_mod.replace("_", "") in Path(f).stem
                                     for f in source_symbols)
                if not matches_source:
                    rel = str(test_path.relative_to(root))
                    report.gaps.append(Gap(
                        gap_type=GapType.ORPHAN_TEST,
                        severity=GapSeverity.WARNING,
                        file=rel,
                        line=None,
                        name=test_mod,
                        description=f"Test file {rel} exists but no matching source module found",
                        suggestion="Verify if the source was renamed or removed",
                    ))

        return report
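
# Minimal CLI sketch (an assumption; the original module defines no entry
# point): analyze a checkout passed on the command line and print the report.
if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(KnowledgeGapIdentifier().analyze(target).summary())
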
131
scripts/knowledge_staleness_check.py
Normal file
@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""
Knowledge Store Staleness Detector — Detect stale knowledge entries by comparing source file hashes.

Usage:
    python3 scripts/knowledge_staleness_check.py --index knowledge/index.json
    python3 scripts/knowledge_staleness_check.py --index knowledge/index.json --json
    python3 scripts/knowledge_staleness_check.py --index knowledge/index.json --fix
"""

import argparse
import hashlib
import json
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Any, Optional


def compute_file_hash(filepath: str) -> Optional[str]:
    """Compute SHA-256 hash of a file. Returns None if file doesn't exist."""
    try:
        with open(filepath, "rb") as f:
            return "sha256:" + hashlib.sha256(f.read()).hexdigest()
    except (FileNotFoundError, IsADirectoryError, PermissionError):
        return None


def check_staleness(index_path: str, repo_root: str = ".") -> List[Dict[str, Any]]:
    """Check all entries in knowledge index for staleness.

    Returns list of entries with staleness info:
    - status: "fresh" | "stale" | "missing_source" | "no_hash" | "no_source"
    - current_hash: computed hash (if source exists)
    - stored_hash: hash from index
    """
    with open(index_path) as f:
        data = json.load(f)

    facts = data.get("facts", [])
    results = []

    for entry in facts:
        source_file = entry.get("source_file")
        stored_hash = entry.get("source_hash")

        if not source_file:
            results.append({**entry, "status": "no_source", "current_hash": None})
            continue

        full_path = os.path.join(repo_root, source_file)
        current_hash = compute_file_hash(full_path)

        if current_hash is None:
            results.append({**entry, "status": "missing_source", "current_hash": None})
        elif not stored_hash:
            results.append({**entry, "status": "no_hash", "current_hash": current_hash})
        elif current_hash != stored_hash:
            results.append({**entry, "status": "stale", "current_hash": current_hash})
        else:
            results.append({**entry, "status": "fresh", "current_hash": current_hash})

    return results
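
# Illustrative sketch (index path assumed, entry values invented): tally
# the statuses emitted above. fresh = hash matches disk; stale = file changed
# since harvest; missing_source = file gone; no_hash = entry predates hash
# tracking (see --fix); no_source = entry has no source_file at all.
def _demo_staleness_statuses() -> None:  # hypothetical helper
    counts: Dict[str, int] = {}
    for r in check_staleness("knowledge/index.json"):
        counts[r["status"]] = counts.get(r["status"], 0) + 1
    print(counts)
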

def fix_hashes(index_path: str, repo_root: str = ".") -> int:
    """Add hashes to entries missing them. Returns count of fixed entries."""
    with open(index_path) as f:
        data = json.load(f)

    fixed = 0
    for entry in data.get("facts", []):
        if entry.get("source_hash"):
            continue
        source_file = entry.get("source_file")
        if not source_file:
            continue
        full_path = os.path.join(repo_root, source_file)
        h = compute_file_hash(full_path)
        if h:
            entry["source_hash"] = h
            fixed += 1

    with open(index_path, "w") as f:
        json.dump(data, f, indent=2)

    return fixed


def main():
    parser = argparse.ArgumentParser(description="Check knowledge store staleness")
    parser.add_argument("--index", required=True, help="Path to knowledge/index.json")
    parser.add_argument("--repo", default=".", help="Repo root for source file resolution")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--fix", action="store_true", help="Add hashes to entries missing them")

    args = parser.parse_args()

    if args.fix:
        fixed = fix_hashes(args.index, args.repo)
        print(f"Fixed {fixed} entries with missing hashes.")
        return

    results = check_staleness(args.index, args.repo)

    if args.json:
        print(json.dumps(results, indent=2))
    else:
        stale = [r for r in results if r["status"] != "fresh"]
        fresh = [r for r in results if r["status"] == "fresh"]

        print("Knowledge Store Staleness Check")
        print(f"  Total entries: {len(results)}")
        print(f"  Fresh: {len(fresh)}")
        print(f"  Stale/Issues: {len(stale)}")
        print()

        if stale:
            print("Issues found:")
            for r in stale:
                status = r["status"]
                fact = r.get("fact", "?")[:60]
                source = r.get("source_file", "?")
                print(f"  [{status}] {source}: {fact}")
        else:
            print("All entries are fresh!")


if __name__ == "__main__":
    main()
551
scripts/perf_bottleneck_finder.py
Normal file
@@ -0,0 +1,551 @@
|
||||
#!/usr/bin/env python3
"""
Performance Bottleneck Finder — Identify slow tests, builds, and CI steps.

Analyzes:
1. Pytest output for slow tests
2. Build logs for slow steps
3. CI workflow durations
4. File system for large/slow artifacts

Usage:
    python3 scripts/perf_bottleneck_finder.py --repo /path/to/repo
    python3 scripts/perf_bottleneck_finder.py --repo /path/to/repo --json
    python3 scripts/perf_bottleneck_finder.py --repo /path/to/repo --report metrics/perf_report.md

Weekly cron:
    0 9 * * 1 cd /path/to/compounding-intelligence && python3 scripts/perf_bottleneck_finder.py --repo /path/to/target --report metrics/perf_report.md
"""

import argparse
import json
import os
import re
import subprocess
import sys
from collections import defaultdict
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple


# ── Configuration ──────────────────────────────────────────────────

SLOW_TEST_THRESHOLD_S = 2.0         # Tests slower than this are flagged
SLOW_BUILD_STEP_THRESHOLD_S = 10.0
TOP_N_BOTTLENECKS = 10              # Report top N bottlenecks
PYTEST_DURATIONS_COUNT = 20         # Number of slow tests to collect
LOG_EXTENSIONS = {".log", ".txt"}


@dataclass
class Bottleneck:
    """A single performance bottleneck."""
    category: str                     # "test", "build", "ci", "artifact", "import"
    name: str                         # What's slow
    duration_s: float                 # How long it takes
    severity: str                     # "critical", "warning", "info"
    recommendation: str               # How to fix
    file_path: Optional[str] = None
    line_number: Optional[int] = None


@dataclass
class PerfReport:
    """Full performance report."""
    timestamp: str
    repo_path: str
    bottlenecks: List[Bottleneck] = field(default_factory=list)
    summary: Dict[str, Any] = field(default_factory=dict)
    test_stats: Dict[str, Any] = field(default_factory=dict)
    build_stats: Dict[str, Any] = field(default_factory=dict)
    ci_stats: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)


# ── Test Analysis ──────────────────────────────────────────────────

def find_slow_tests_pytest(repo_path: str) -> List[Bottleneck]:
    """Run pytest --durations and parse slow tests."""
    bottlenecks = []

    # Try to run pytest with durations (best-effort; durations are read from
    # the cache left by the last real test run)
    try:
        subprocess.run(
            ["python3", "-m", "pytest", "--co", "-q", "--durations=0"],
            cwd=repo_path, capture_output=True, text=True, timeout=30
        )
        # If tests exist, try to get durations from last run
        durations_file = os.path.join(repo_path, ".pytest_cache", "v", "cache", "durations")
        if os.path.exists(durations_file):
            with open(durations_file) as f:
                for line in f:
                    parts = line.strip().split()
                    if len(parts) >= 2:
                        try:
                            duration = float(parts[0])
                            test_name = " ".join(parts[1:])
                            if duration > SLOW_TEST_THRESHOLD_S:
                                severity = "critical" if duration > 10 else "warning"
                                bottlenecks.append(Bottleneck(
                                    category="test",
                                    name=test_name,
                                    duration_s=duration,
                                    severity=severity,
                                    recommendation=f"Test takes {duration:.1f}s. Consider mocking slow I/O, using fixtures, or marking with @pytest.mark.slow."
                                ))
                        except ValueError:
                            continue
    except (subprocess.TimeoutExpired, FileNotFoundError):
        pass

    return bottlenecks


def find_slow_tests_by_scan(repo_path: str) -> List[Bottleneck]:
    """Scan test files for patterns that indicate slow tests."""
    bottlenecks = []
    test_patterns = [
        (r"time\.sleep\((\d+(?:\.\d+)?)\)", "Contains time.sleep() — consider using mock or async wait"),
        (r"subprocess\.run\(.*timeout=(\d+)", "Subprocess with timeout — may block test"),
        (r"requests\.(get|post|put|delete)\(", "Real HTTP call — mock with responses or httpretty"),
        (r"open\([^)]*[\x27\x22]w[\x27\x22]", "File I/O in test — use tmp_path fixture"),
    ]

    for root, dirs, files in os.walk(repo_path):
        # Skip hidden and cache dirs
        dirs[:] = [d for d in dirs if not d.startswith(('.', '__pycache__', 'node_modules'))]

        for fname in files:
            if not fname.endswith(".py"):
                continue
            if not (fname.startswith("test_") or fname.endswith("_test.py")):
                continue

            fpath = os.path.join(root, fname)
            rel_path = os.path.relpath(fpath, repo_path)

            try:
                with open(fpath) as f:
                    lines = f.readlines()
            except (PermissionError, UnicodeDecodeError):
                continue

            for i, line in enumerate(lines):
                for pattern, recommendation in test_patterns:
                    match = re.search(pattern, line)
                    if match:
                        duration = 1.0  # Default estimate
                        if "sleep" in pattern:
                            try:
                                duration = float(match.group(1))
                            except (ValueError, IndexError):
                                duration = 1.0
                        elif "timeout" in pattern:
                            try:
                                duration = float(match.group(1))
                            except (ValueError, IndexError):
                                duration = 10.0
                        else:
                            duration = 2.0  # Estimated

                        bottlenecks.append(Bottleneck(
                            category="test",
                            name=f"{rel_path}:{i+1}",
                            duration_s=duration,
                            severity="warning" if duration < 5 else "critical",
                            recommendation=recommendation,
                            file_path=rel_path,
                            line_number=i + 1
                        ))

    return bottlenecks

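# Illustrative aside (not in the original file): the estimates above are
# heuristics. A test containing `time.sleep(3)` is recorded as a ~3.0s "test"
# bottleneck with severity "warning", while `time.sleep(12)` crosses the 5s
# cut-off used above and would be flagged "critical".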
# ── Build Analysis ─────────────────────────────────────────────────

def analyze_build_artifacts(repo_path: str) -> List[Bottleneck]:
    """Find large build artifacts that slow down builds."""
    bottlenecks = []
    large_dirs = {
        "node_modules": "Consider using npm ci --production or yarn --production",
        "__pycache__": "Consider .gitignore and cleaning before builds",
        ".tox": "Consider caching tox environments",
        ".pytest_cache": "Consider cleaning between CI runs",
        "dist": "Check if dist/ artifacts are being rebuilt unnecessarily",
        "build": "Check if build/ artifacts are being rebuilt unnecessarily",
        ".next": "Next.js cache — consider incremental builds",
        "venv": "Virtual env in repo — move outside or use Docker",
    }

    for dirname, recommendation in large_dirs.items():
        dirpath = os.path.join(repo_path, dirname)
        if os.path.isdir(dirpath):
            total_size = 0
            file_count = 0
            for root, dirs, files in os.walk(dirpath):
                for f in files:
                    try:
                        fpath = os.path.join(root, f)
                        total_size += os.path.getsize(fpath)
                        file_count += 1
                    except OSError:
                        pass

            if total_size > 10 * 1024 * 1024:  # > 10MB
                size_mb = total_size / (1024 * 1024)
                bottlenecks.append(Bottleneck(
                    category="build",
                    name=f"{dirname}/ ({size_mb:.1f}MB, {file_count} files)",
                    duration_s=size_mb * 0.5,  # Rough estimate
                    severity="critical" if size_mb > 100 else "warning",
                    recommendation=recommendation
                ))

    return bottlenecks


def analyze_makefile_targets(repo_path: str) -> List[Bottleneck]:
    """Analyze Makefiles for potentially slow targets."""
    bottlenecks = []
    makefiles = []

    for root, dirs, files in os.walk(repo_path):
        dirs[:] = [d for d in dirs if not d.startswith(('.', '__pycache__'))]
        for f in files:
            if f in ("Makefile", "makefile", "GNUmakefile"):
                makefiles.append(os.path.join(root, f))

    slow_patterns = [
        (r"pip install", "pip install without --no-deps or constraints"),
        (r"npm install(?!.*--production)", "npm install without --production flag"),
        (r"docker build", "Docker build — consider multi-stage and layer caching"),
        (r"pytest(?!.*-x|--maxfail)", "pytest without early exit on failure"),
        (r"mypy", "Type checking — consider incremental mode"),
    ]

    for mfile in makefiles:
        rel_path = os.path.relpath(mfile, repo_path)
        try:
            with open(mfile) as f:
                content = f.read()
        except (PermissionError, UnicodeDecodeError):
            continue

        for pattern, recommendation in slow_patterns:
            if re.search(pattern, content):
                bottlenecks.append(Bottleneck(
                    category="build",
                    name=f"{rel_path}: {pattern}",
                    duration_s=5.0,
                    severity="info",
                    recommendation=recommendation,
                    file_path=rel_path
                ))

    return bottlenecks


# ── CI Analysis ────────────────────────────────────────────────────

def analyze_github_actions(repo_path: str) -> List[Bottleneck]:
    """Analyze GitHub Actions workflow files for inefficiencies."""
    bottlenecks = []
    workflow_dir = os.path.join(repo_path, ".github", "workflows")

    if not os.path.isdir(workflow_dir):
        return bottlenecks

    slow_patterns = [
        (r"runs-on:\s*ubuntu-latest", 0, "Consider caching dependencies between runs"),
        (r"npm install", 2, "Use npm ci instead of npm install for reproducible builds"),
        (r"pip install(?!.*--cache-dir)", 2, "Add --cache-dir or use pip cache action"),
        (r"docker build(?!.*--cache-from)", 5, "Use Docker layer caching"),
        (r"python -m pytest(?!.*-n|--numprocesses)", 3, "Consider pytest-xdist for parallel test execution"),
    ]

    for fname in os.listdir(workflow_dir):
        if not fname.endswith(('.yml', '.yaml')):
            continue

        fpath = os.path.join(workflow_dir, fname)
        try:
            with open(fpath) as f:
                content = f.read()
        except (PermissionError, UnicodeDecodeError):
            continue

        for pattern, est_savings, recommendation in slow_patterns:
            if re.search(pattern, content):
                bottlenecks.append(Bottleneck(
                    category="ci",
                    name=f"{fname}: {pattern}",
                    duration_s=est_savings,
                    severity="info",
                    recommendation=recommendation,
                    file_path=f".github/workflows/{fname}"
                ))

    return bottlenecks


def analyze_gitea_ci(repo_path: str) -> List[Bottleneck]:
    """Analyze Gitea/Drone CI config files."""
    bottlenecks = []
    ci_files = [".gitea/workflows", ".drone.yml", ".woodpecker.yml"]

    for ci_path in ci_files:
        full_path = os.path.join(repo_path, ci_path)
        if os.path.isfile(full_path):
            try:
                with open(full_path) as f:
                    content = f.read()
            except (PermissionError, UnicodeDecodeError):
                continue

            if "pip install" in content and "--cache-dir" not in content:
                bottlenecks.append(Bottleneck(
                    category="ci",
                    name=f"{ci_path}: pip without cache",
                    duration_s=5.0,
                    severity="warning",
                    recommendation="Add --cache-dir or mount pip cache volume",
                    file_path=ci_path
                ))

        elif os.path.isdir(full_path):
            for fname in os.listdir(full_path):
                if not fname.endswith(('.yml', '.yaml')):
                    continue
                fpath = os.path.join(full_path, fname)
                try:
                    with open(fpath) as f:
                        content = f.read()
                except (PermissionError, UnicodeDecodeError):
                    continue

                if "pip install" in content and "--cache-dir" not in content:
                    bottlenecks.append(Bottleneck(
                        category="ci",
                        name=f"{ci_path}/{fname}: pip without cache",
                        duration_s=5.0,
                        severity="warning",
                        recommendation="Add --cache-dir or mount pip cache volume",
                        file_path=f"{ci_path}/{fname}"
                    ))

    return bottlenecks


# ── Import Analysis ────────────────────────────────────────────────

def find_slow_imports(repo_path: str) -> List[Bottleneck]:
    """Find Python files with heavy import chains."""
    bottlenecks = []
    heavy_imports = {
        "pandas": 0.5,
        "numpy": 0.3,
        "torch": 2.0,
        "tensorflow": 3.0,
        "scipy": 0.5,
        "matplotlib": 0.8,
        "sklearn": 0.5,
        "transformers": 1.5,
    }

    for root, dirs, files in os.walk(repo_path):
        dirs[:] = [d for d in dirs if not d.startswith(('.', '__pycache__', 'node_modules'))]
        for fname in files:
            if not fname.endswith(".py"):
                continue

            fpath = os.path.join(root, fname)
            rel_path = os.path.relpath(fpath, repo_path)

            try:
                with open(fpath) as f:
                    lines = f.readlines()
            except (PermissionError, UnicodeDecodeError):
                continue

            for i, line in enumerate(lines):
                stripped = line.strip()
                if stripped.startswith("import ") or stripped.startswith("from "):
                    for heavy, est_time in heavy_imports.items():
                        if heavy in stripped:
                            bottlenecks.append(Bottleneck(
                                category="import",
                                name=f"{rel_path}:{i+1}: import {heavy}",
                                duration_s=est_time,
                                severity="info" if est_time < 1.0 else "warning",
                                recommendation=f"Heavy import ({heavy} ~{est_time}s). Consider lazy import or conditional import.",
                                file_path=rel_path,
                                line_number=i + 1
                            ))

    return bottlenecks

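# Illustrative aside (not in the original file): severity_sort_key below orders
# critical before warning before info, breaking ties by longer duration, so a
# 125s "critical" node_modules finding always precedes a 3s "warning" test,
# which in turn precedes a 2s "info" CI hint.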
# ── Report Generation ──────────────────────────────────────────────

def severity_sort_key(b: Bottleneck) -> Tuple[int, float]:
    """Sort by severity then duration."""
    sev_order = {"critical": 0, "warning": 1, "info": 2}
    return (sev_order.get(b.severity, 3), -b.duration_s)


def generate_report(repo_path: str) -> PerfReport:
    """Run all analyses and generate a performance report."""
    report = PerfReport(
        timestamp=datetime.now(timezone.utc).isoformat(),
        repo_path=os.path.abspath(repo_path)
    )

    # Collect all bottlenecks
    all_bottlenecks = []

    print("Scanning for slow tests (pytest cache)...")
    all_bottlenecks.extend(find_slow_tests_pytest(repo_path))

    print("Scanning for slow test patterns...")
    all_bottlenecks.extend(find_slow_tests_by_scan(repo_path))

    print("Analyzing build artifacts...")
    all_bottlenecks.extend(analyze_build_artifacts(repo_path))

    print("Analyzing Makefiles...")
    all_bottlenecks.extend(analyze_makefile_targets(repo_path))

    print("Analyzing CI workflows...")
    all_bottlenecks.extend(analyze_github_actions(repo_path))
    all_bottlenecks.extend(analyze_gitea_ci(repo_path))

    print("Scanning for heavy imports...")
    all_bottlenecks.extend(find_slow_imports(repo_path))

    # Sort by severity and duration
    all_bottlenecks.sort(key=severity_sort_key)
    report.bottlenecks = all_bottlenecks[:TOP_N_BOTTLENECKS * 2]  # Keep more for stats

    # Compute summary
    by_category = defaultdict(list)
    for b in all_bottlenecks:
        by_category[b.category].append(b)

    report.summary = {
        "total_bottlenecks": len(all_bottlenecks),
        "critical": sum(1 for b in all_bottlenecks if b.severity == "critical"),
        "warning": sum(1 for b in all_bottlenecks if b.severity == "warning"),
        "info": sum(1 for b in all_bottlenecks if b.severity == "info"),
        "estimated_total_slowdown_s": sum(b.duration_s for b in all_bottlenecks),
        "by_category": {cat: len(items) for cat, items in by_category.items()},
    }

    report.test_stats = {
        "slow_tests": len(by_category.get("test", [])),
        "total_estimated_s": sum(b.duration_s for b in by_category.get("test", [])),
    }

    report.build_stats = {
        "build_issues": len(by_category.get("build", [])),
        "total_estimated_s": sum(b.duration_s for b in by_category.get("build", [])),
    }

    report.ci_stats = {
        "ci_issues": len(by_category.get("ci", [])),
        "total_estimated_s": sum(b.duration_s for b in by_category.get("ci", [])),
    }

    return report


def format_markdown(report: PerfReport) -> str:
    """Format report as markdown."""
    lines = []
    lines.append("# Performance Bottleneck Report")
    lines.append("")
    lines.append(f"Generated: {report.timestamp}")
    lines.append(f"Repository: {report.repo_path}")
    lines.append("")

    # Summary
    s = report.summary
    lines.append("## Summary")
    lines.append("")
    lines.append(f"- **Total bottlenecks:** {s['total_bottlenecks']}")
    lines.append(f"- **Critical:** {s['critical']} | **Warning:** {s['warning']} | **Info:** {s['info']}")
    lines.append(f"- **Estimated total slowdown:** {s['estimated_total_slowdown_s']:.1f}s")
    lines.append(f"- **By category:** {', '.join(f'{k}: {v}' for k, v in s['by_category'].items())}")
    lines.append("")

    # Top bottlenecks
    lines.append(f"## Top {min(TOP_N_BOTTLENECKS, len(report.bottlenecks))} Bottlenecks")
    lines.append("")

    for i, b in enumerate(report.bottlenecks[:TOP_N_BOTTLENECKS], 1):
        icon = {"critical": "🔴", "warning": "🟡", "info": "🔵"}.get(b.severity, "⚪")
        loc = f" ({b.file_path}:{b.line_number})" if b.file_path else ""
        lines.append(f"{i}. {icon} **{b.category}** — {b.name}{loc}")
        lines.append(f"   - Duration: ~{b.duration_s:.1f}s | Severity: {b.severity}")
        lines.append(f"   - Fix: {b.recommendation}")
    lines.append("")

    # Category breakdowns
    for cat in ["test", "build", "ci", "import"]:
        items = [b for b in report.bottlenecks if b.category == cat]
        if items:
            lines.append(f"## {cat.title()} Bottlenecks")
            lines.append("")
            for b in items:
                icon = {"critical": "🔴", "warning": "🟡", "info": "🔵"}.get(b.severity, "⚪")
                loc = f" ({b.file_path}:{b.line_number})" if b.file_path else ""
                lines.append(f"- {icon} {b.name}{loc} — ~{b.duration_s:.1f}s — {b.recommendation}")
            lines.append("")

    return "\n".join(lines)


# ── Main ───────────────────────────────────────────────────────────

def main():
    # Declared before first use so --threshold can override the module default,
    # which the test scanners read from module scope.
    global SLOW_TEST_THRESHOLD_S

    parser = argparse.ArgumentParser(description="Performance Bottleneck Finder")
    parser.add_argument("--repo", default=".", help="Path to repository to analyze")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--report", help="Write markdown report to file")
    parser.add_argument("--threshold", type=float, default=SLOW_TEST_THRESHOLD_S,
                        help="Slow test threshold in seconds")
    args = parser.parse_args()

    SLOW_TEST_THRESHOLD_S = args.threshold

    if not os.path.isdir(args.repo):
        print(f"Error: {args.repo} is not a directory", file=sys.stderr)
        sys.exit(1)

    report = generate_report(args.repo)

    if args.json:
        print(json.dumps(report.to_dict(), indent=2))
    else:
        md = format_markdown(report)
        if args.report:
            os.makedirs(os.path.dirname(args.report) or ".", exist_ok=True)
            with open(args.report, "w") as f:
                f.write(md)
            print(f"Report written to {args.report}")
        else:
            print(md)

    # Exit code: 1 if critical bottlenecks found
    if report.summary.get("critical", 0) > 0:
        sys.exit(1)


if __name__ == "__main__":
    main()
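Because the finder exits 1 when any critical bottleneck is present, a scheduled wrapper can fail loudly instead of letting regressions accumulate. A minimal sketch, assuming the cron-style paths from the docstring (both paths are placeholders, not project config):

```python
# Minimal wrapper sketch; the repo path and report location are examples only.
import subprocess

proc = subprocess.run([
    "python3", "scripts/perf_bottleneck_finder.py",
    "--repo", "/path/to/target",
    "--report", "metrics/perf_report.md",
])
if proc.returncode == 1:
    # Exit code 1 means at least one "critical" bottleneck was reported.
    print("critical performance bottlenecks found; see metrics/perf_report.md")
```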
351
scripts/pr_complexity_scorer.py
Normal file
@@ -0,0 +1,351 @@
#!/usr/bin/env python3
"""
PR Complexity Scorer - Estimate review effort for PRs.
"""

import argparse
import json
import os
import re
import sys
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional
import urllib.request
import urllib.error

GITEA_BASE = "https://forge.alexanderwhitestone.com/api/v1"

DEPENDENCY_FILES = {
    "requirements.txt", "pyproject.toml", "setup.py", "setup.cfg",
    "Pipfile", "poetry.lock", "package.json", "yarn.lock", "Gemfile",
    "go.mod", "Cargo.toml", "pom.xml", "build.gradle"
}

TEST_PATTERNS = [
    r"tests?/.*\.py$", r".*_test\.py$", r"test_.*\.py$",
    r"spec/.*\.rb$", r".*_spec\.rb$",
    r"__tests__/", r".*\.test\.(js|ts|jsx|tsx)$"
]

WEIGHT_FILES = 0.25
WEIGHT_LINES = 0.25
WEIGHT_DEPS = 0.30
WEIGHT_TEST_COV = 0.20

SMALL_FILES = 5
MEDIUM_FILES = 20
LARGE_FILES = 50

SMALL_LINES = 100
MEDIUM_LINES = 500
LARGE_LINES = 2000

TIME_PER_POINT = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25, 6: 30, 7: 45, 8: 60, 9: 90, 10: 120}


@dataclass
class PRComplexity:
    pr_number: int
    title: str
    files_changed: int
    additions: int
    deletions: int
    has_dependency_changes: bool
    test_coverage_delta: Optional[int]
    score: int
    estimated_minutes: int
    reasons: List[str]

    def to_dict(self) -> dict:
        return asdict(self)


class GiteaClient:
    def __init__(self, token: str):
        self.token = token
        self.base_url = GITEA_BASE.rstrip("/")

    def _request(self, path: str, params: Dict = None) -> Any:
        url = f"{self.base_url}{path}"
        if params:
            qs = "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
            url += f"?{qs}"

        req = urllib.request.Request(url)
        req.add_header("Authorization", f"token {self.token}")
        req.add_header("Content-Type", "application/json")

        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return json.loads(resp.read().decode())
        except urllib.error.HTTPError as e:
            print(f"API error {e.code}: {e.read().decode()[:200]}", file=sys.stderr)
            return None
        except urllib.error.URLError as e:
            print(f"Network error: {e}", file=sys.stderr)
            return None

    def get_open_prs(self, org: str, repo: str) -> List[Dict]:
        prs = []
        page = 1
        while True:
            batch = self._request(f"/repos/{org}/{repo}/pulls", {"limit": 50, "page": page, "state": "open"})
            if not batch:
                break
            prs.extend(batch)
            if len(batch) < 50:
                break
            page += 1
        return prs

    def get_pr_files(self, org: str, repo: str, pr_number: int) -> List[Dict]:
        files = []
        page = 1
        while True:
            batch = self._request(
                f"/repos/{org}/{repo}/pulls/{pr_number}/files",
                {"limit": 100, "page": page}
            )
            if not batch:
                break
            files.extend(batch)
            if len(batch) < 100:
                break
            page += 1
        return files

    def post_comment(self, org: str, repo: str, pr_number: int, body: str) -> bool:
        data = json.dumps({"body": body}).encode("utf-8")
        req = urllib.request.Request(
            f"{self.base_url}/repos/{org}/{repo}/issues/{pr_number}/comments",
            data=data,
            method="POST",
            headers={"Authorization": f"token {self.token}", "Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.status in (200, 201)
        except urllib.error.HTTPError:
            return False


def is_dependency_file(filename: str) -> bool:
    return any(filename.endswith(dep) for dep in DEPENDENCY_FILES)


def is_test_file(filename: str) -> bool:
    return any(re.search(pattern, filename) for pattern in TEST_PATTERNS)


def score_pr(
    files_changed: int,
    additions: int,
    deletions: int,
    has_dependency_changes: bool,
    test_coverage_delta: Optional[int] = None
) -> tuple[int, int, List[str]]:
    reasons = []

    # Files changed
    if files_changed <= SMALL_FILES:
        fscore = 1.0
        reasons.append("small number of files changed")
    elif files_changed <= MEDIUM_FILES:
        fscore = 2.0
        reasons.append("moderate number of files changed")
    elif files_changed <= LARGE_FILES:
        fscore = 2.5
        reasons.append("large number of files changed")
    else:
        fscore = 3.0
        reasons.append("very large PR spanning many files")

    # Lines changed
    total_lines = additions + deletions
    if total_lines <= SMALL_LINES:
        lscore = 1.0
        reasons.append("small change size")
    elif total_lines <= MEDIUM_LINES:
        lscore = 2.0
        reasons.append("moderate change size")
    elif total_lines <= LARGE_LINES:
        lscore = 3.0
        reasons.append("large change size")
    else:
        lscore = 4.0
        reasons.append("very large change")

    # Dependency changes
    if has_dependency_changes:
        dscore = 2.5
        reasons.append("dependency changes (architectural impact)")
    else:
        dscore = 0.0

    # Test coverage delta
    tscore = 0.0
    if test_coverage_delta is not None:
        if test_coverage_delta > 0:
            reasons.append(f"test additions (+{test_coverage_delta} test files)")
            tscore = -min(2.0, test_coverage_delta / 2.0)
        elif test_coverage_delta < 0:
            reasons.append(f"test removals ({abs(test_coverage_delta)} test files)")
            tscore = min(2.0, abs(test_coverage_delta) * 0.5)
    else:
        reasons.append("test coverage change not assessed")

    # Weighted sum, scaled by 3 to use the full 1-10 range
    bonus = (fscore * WEIGHT_FILES) + (lscore * WEIGHT_LINES) + (dscore * WEIGHT_DEPS) + (tscore * WEIGHT_TEST_COV)
    score = 1.0 + bonus * 3.0

    final_score = max(1, min(10, int(round(score))))
    est_minutes = TIME_PER_POINT.get(final_score, 30)

    return final_score, est_minutes, reasons

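# Worked example (invented PR stats, real weights from above): a 12-file PR
# with +400/-50 lines that touches requirements.txt and no test files gives
#   fscore=2.0, lscore=2.0, dscore=2.5, tscore=0.0
#   bonus = 2.0*0.25 + 2.0*0.25 + 2.5*0.30 = 1.75, score = 1 + 1.75*3 = 6.25
# which rounds to 6/10, and TIME_PER_POINT maps 6 to ~30 minutes of review.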
def analyze_pr(client: GiteaClient, org: str, repo: str, pr_data: Dict) -> PRComplexity:
    pr_num = pr_data["number"]
    title = pr_data.get("title", "")
    files = client.get_pr_files(org, repo, pr_num)

    additions = sum(f.get("additions", 0) for f in files)
    deletions = sum(f.get("deletions", 0) for f in files)
    filenames = [f.get("filename", "") for f in files]

    has_deps = any(is_dependency_file(f) for f in filenames)

    test_added = sum(1 for f in files if f.get("status") == "added" and is_test_file(f.get("filename", "")))
    test_removed = sum(1 for f in files if f.get("status") == "removed" and is_test_file(f.get("filename", "")))
    test_delta = test_added - test_removed if (test_added or test_removed) else None

    score, est_min, reasons = score_pr(
        files_changed=len(files),
        additions=additions,
        deletions=deletions,
        has_dependency_changes=has_deps,
        test_coverage_delta=test_delta
    )

    return PRComplexity(
        pr_number=pr_num,
        title=title,
        files_changed=len(files),
        additions=additions,
        deletions=deletions,
        has_dependency_changes=has_deps,
        test_coverage_delta=test_delta,
        score=score,
        estimated_minutes=est_min,
        reasons=reasons
    )


def build_comment(complexity: PRComplexity) -> str:
    change_desc = f"{complexity.files_changed} files, +{complexity.additions}/-{complexity.deletions} lines"
    deps_note = "\n- :warning: Dependency changes detected — architectural review recommended" if complexity.has_dependency_changes else ""
    test_note = ""
    if complexity.test_coverage_delta is not None:
        if complexity.test_coverage_delta > 0:
            test_note = f"\n- :+1: {complexity.test_coverage_delta} test file(s) added"
        elif complexity.test_coverage_delta < 0:
            test_note = f"\n- :warning: {abs(complexity.test_coverage_delta)} test file(s) removed"

    comment = "## 📊 PR Complexity Analysis\n\n"
    comment += f"**PR #{complexity.pr_number}: {complexity.title}**\n\n"
    comment += "| Metric | Value |\n|--------|-------|\n"
    comment += f"| Changes | {change_desc} |\n"
    comment += f"| Complexity Score | **{complexity.score}/10** |\n"
    comment += f"| Estimated Review Time | ~{complexity.estimated_minutes} minutes |\n\n"
    comment += "### Scoring rationale:"
    for r in complexity.reasons:
        comment += f"\n- {r}"
    if deps_note:
        comment += deps_note
    if test_note:
        comment += test_note
    comment += "\n\n---\n"
    comment += "*Generated by PR Complexity Scorer — [issue #135](https://forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence/issues/135)*"
    return comment


def main():
    parser = argparse.ArgumentParser(description="PR Complexity Scorer")
    parser.add_argument("--org", default="Timmy_Foundation")
    parser.add_argument("--repo", default="compounding-intelligence")
    parser.add_argument("--token", default=os.environ.get("GITEA_TOKEN") or os.path.expanduser("~/.config/gitea/token"))
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--apply", action="store_true")
    parser.add_argument("--output", default="metrics/pr_complexity.json")
    args = parser.parse_args()

    # --token may be either a literal token or a path to a token file
    token_path = args.token
    if os.path.exists(token_path):
        with open(token_path) as f:
            token = f.read().strip()
    else:
        token = args.token

    if not token:
        print("ERROR: No Gitea token provided", file=sys.stderr)
        sys.exit(1)

    client = GiteaClient(token)

    print(f"Fetching open PRs for {args.org}/{args.repo}...")
    prs = client.get_open_prs(args.org, args.repo)
    if not prs:
        print("No open PRs found.")
        sys.exit(0)

    print(f"Found {len(prs)} open PR(s). Analyzing...")

    results = []
    Path(args.output).parent.mkdir(parents=True, exist_ok=True)

    for pr in prs:
        pr_num = pr["number"]
        title = pr.get("title", "")
        print(f"  Analyzing PR #{pr_num}: {title[:60]}")

        try:
            complexity = analyze_pr(client, args.org, args.repo, pr)
            results.append(complexity.to_dict())

            comment = build_comment(complexity)

            if args.dry_run:
                print(f"    → Score: {complexity.score}/10, Est: {complexity.estimated_minutes}min [DRY-RUN]")
            elif args.apply:
                success = client.post_comment(args.org, args.repo, pr_num, comment)
                status = "[commented]" if success else "[FAILED]"
                print(f"    → Score: {complexity.score}/10, Est: {complexity.estimated_minutes}min {status}")
            else:
                print(f"    → Score: {complexity.score}/10, Est: {complexity.estimated_minutes}min [no action]")

        except Exception as e:
            print(f"    ERROR analyzing PR #{pr_num}: {e}", file=sys.stderr)

    with open(args.output, "w") as f:
        json.dump({
            "org": args.org,
            "repo": args.repo,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "pr_count": len(results),
            "results": results
        }, f, indent=2)

    if results:
        scores = [r["score"] for r in results]
        print(f"\nResults saved to {args.output}")
        print(f"Summary: {len(results)} PRs, scores range {min(scores)}-{max(scores)}")
    else:
        print("\nNo results to save.")


if __name__ == "__main__":
    main()
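The scorer can be exercised locally without any Gitea access, since `score_pr` is a pure function. A minimal sketch, assuming the repo root is on `sys.path` (the import path is an assumption, not project config):

```python
# Quick local check of the scorer; the input values are illustrative only.
from scripts.pr_complexity_scorer import score_pr

score, minutes, reasons = score_pr(
    files_changed=12, additions=400, deletions=50,
    has_dependency_changes=True, test_coverage_delta=None,
)
print(score, minutes)  # → 6 30 with the weights defined in this file
```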
682
scripts/priority_rebalancer.py
Normal file
@@ -0,0 +1,682 @@
#!/usr/bin/env python3
"""
Priority Rebalancer — Re-evaluate issue priorities based on accumulated data.

Reads pipeline outputs, knowledge store, and Gitea issues to suggest
priority changes based on what the fleet has learned.

Usage:
    python3 scripts/priority_rebalancer.py --org Timmy_Foundation
    python3 scripts/priority_rebalancer.py --org Timmy_Foundation --repo compounding-intelligence
    python3 scripts/priority_rebalancer.py --org Timmy_Foundation --dry-run
    python3 scripts/priority_rebalancer.py --org Timmy_Foundation --apply

Output:
    metrics/priority_report.json — full analysis
    metrics/priority_suggestions.md — human-readable suggestions
"""

import argparse
import json
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Any, Optional, Tuple
from dataclasses import dataclass, field, asdict
from collections import Counter, defaultdict
import urllib.request
import urllib.error


# ============================================================
# Data Models
# ============================================================

@dataclass
class IssueScore:
    issue_id: int
    repo: str
    title: str
    current_labels: List[str]
    current_priority: Optional[str]
    suggested_priority: Optional[str]
    score: float
    reasons: List[str]
    age_days: int
    comment_count: int
    assignee: Optional[str]
    dependencies: List[str] = field(default_factory=list)
    blocking: List[str] = field(default_factory=list)


@dataclass
class PipelineSignal:
    source: str       # "knowledge", "metrics", "sessions", "staleness"
    signal_type: str  # "stale_knowledge", "high_error_rate", "missing_coverage", etc.
    weight: float     # 0.0 - 1.0
    detail: str
    affected_repos: List[str] = field(default_factory=list)
    affected_issues: List[int] = field(default_factory=list)


# ============================================================
# Gitea API Client
# ============================================================

class GiteaClient:
    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        self.token = token

    def _request(self, path: str, params: Dict = None) -> Any:
        url = f"{self.base_url}/api/v1{path}"
        if params:
            qs = "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
            url += f"?{qs}"

        req = urllib.request.Request(url)
        req.add_header("Authorization", f"token {self.token}")
        req.add_header("Content-Type", "application/json")

        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return json.loads(resp.read().decode())
        except urllib.error.HTTPError as e:
            print(f"API error {e.code} for {path}: {e.read().decode()[:200]}", file=sys.stderr)
            return None

    def get_org_repos(self, org: str) -> List[Dict]:
        repos = []
        page = 1
        while True:
            batch = self._request(f"/orgs/{org}/repos", {"limit": 50, "page": page})
            if not batch:
                break
            repos.extend(batch)
            if len(batch) < 50:
                break
            page += 1
        return repos

    def get_issues(self, org: str, repo: str, state: str = "open") -> List[Dict]:
        issues = []
        page = 1
        while True:
            batch = self._request(f"/repos/{org}/{repo}/issues",
                                  {"state": state, "limit": 50, "page": page, "type": "issues"})
            if not batch:
                break
            issues.extend(batch)
            if len(batch) < 50:
                break
            page += 1
        return issues

    def add_label_to_issue(self, org: str, repo: str, issue_num: int, label_ids: List[int]) -> bool:
        url = f"{self.base_url}/api/v1/repos/{org}/{repo}/issues/{issue_num}/labels"
        data = json.dumps({"labels": label_ids}).encode()
        req = urllib.request.Request(url, data=data, method="POST")
        req.add_header("Authorization", f"token {self.token}")
        req.add_header("Content-Type", "application/json")
        try:
            with urllib.request.urlopen(req, timeout=15) as resp:
                return resp.status == 200
        except Exception:
            return False

    def remove_label_from_issue(self, org: str, repo: str, issue_num: int, label_id: int) -> bool:
        url = f"{self.base_url}/api/v1/repos/{org}/{repo}/issues/{issue_num}/labels/{label_id}"
        req = urllib.request.Request(url, method="DELETE")
        req.add_header("Authorization", f"token {self.token}")
        try:
            with urllib.request.urlopen(req, timeout=15) as resp:
                return resp.status == 200
        except Exception:
            return False

    def get_repo_labels(self, org: str, repo: str) -> List[Dict]:
        labels = []
        page = 1
        while True:
            batch = self._request(f"/repos/{org}/{repo}/labels", {"limit": 50, "page": page})
            if not batch:
                break
            labels.extend(batch)
            if len(batch) < 50:
                break
            page += 1
        return labels

    def add_comment(self, org: str, repo: str, issue_num: int, body: str) -> bool:
        url = f"{self.base_url}/api/v1/repos/{org}/{repo}/issues/{issue_num}/comments"
        data = json.dumps({"body": body}).encode()
        req = urllib.request.Request(url, data=data, method="POST")
        req.add_header("Authorization", f"token {self.token}")
        req.add_header("Content-Type", "application/json")
        try:
            with urllib.request.urlopen(req, timeout=15) as resp:
                return resp.status == 201
        except Exception:
            return False

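# Illustrative usage of the client above (org/repo names are just examples):
#   client = GiteaClient("https://forge.alexanderwhitestone.com", token)
#   issues = client.get_issues("Timmy_Foundation", "compounding-intelligence")
# Each list endpoint pages in batches of 50; a short batch marks the last page.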
# ============================================================
# Pipeline Signal Collectors
# ============================================================

def collect_knowledge_signals(knowledge_dir: str) -> List[PipelineSignal]:
    """Analyze knowledge store for coverage gaps and staleness."""
    signals = []
    index_path = os.path.join(knowledge_dir, "index.json")

    if not os.path.exists(index_path):
        signals.append(PipelineSignal(
            source="knowledge",
            signal_type="missing_index",
            weight=0.8,
            detail="knowledge/index.json not found — no knowledge base exists"
        ))
        return signals

    try:
        with open(index_path) as f:
            data = json.load(f)
    except (json.JSONDecodeError, IOError) as e:
        signals.append(PipelineSignal(
            source="knowledge",
            signal_type="corrupt_index",
            weight=0.9,
            detail=f"knowledge/index.json is corrupt: {e}"
        ))
        return signals

    facts = data.get("facts", [])
    total = len(facts)

    if total == 0:
        signals.append(PipelineSignal(
            source="knowledge",
            signal_type="empty_knowledge",
            weight=0.7,
            detail="Knowledge store has 0 facts — harvester not running or not finding sessions"
        ))
        return signals

    # Check staleness
    stale_count = 0
    missing_source = 0
    for fact in facts:
        status = fact.get("status", "unknown")
        if status == "stale":
            stale_count += 1
        elif status in ("missing_source", "no_source"):
            missing_source += 1

    if stale_count > 0:
        signals.append(PipelineSignal(
            source="knowledge",
            signal_type="stale_knowledge",
            weight=min(1.0, stale_count / max(1, total)),
            detail=f"{stale_count}/{total} facts are stale (source files changed)"
        ))

    if missing_source > 0:
        signals.append(PipelineSignal(
            source="knowledge",
            signal_type="missing_sources",
            weight=min(1.0, missing_source / max(1, total)),
            detail=f"{missing_source}/{total} facts have missing source files"
        ))

    # Coverage by repo
    repo_counts = Counter(f.get("repo", "unknown") for f in facts)
    if len(repo_counts) < 3:
        signals.append(PipelineSignal(
            source="knowledge",
            signal_type="low_coverage",
            weight=0.5,
            detail=f"Knowledge covers only {len(repo_counts)} repos — expand harvester scope",
            affected_repos=list(repo_counts.keys())
        ))

    return signals


def collect_staleness_signals(scripts_dir: str, knowledge_dir: str) -> List[PipelineSignal]:
    """Run staleness checker if available."""
    signals = []
    checker = os.path.join(scripts_dir, "knowledge_staleness_check.py")
    index_path = os.path.join(knowledge_dir, "index.json")

    if not os.path.exists(checker) or not os.path.exists(index_path):
        return signals

    try:
        import subprocess
        result = subprocess.run(
            ["python3", checker, "--index", index_path, "--json"],
            capture_output=True, text=True, timeout=30
        )
        if result.returncode == 0:
            # The checker's --json mode prints a list of per-entry results,
            # so count non-fresh statuses rather than reading summary keys.
            entries = json.loads(result.stdout)
            total = len(entries)
            stale = sum(1 for e in entries if e.get("status") != "fresh")
            if stale > 0:
                signals.append(PipelineSignal(
                    source="staleness",
                    signal_type="stale_knowledge",
                    weight=min(1.0, stale / max(1, total)),
                    detail=f"Staleness checker found {stale}/{total} stale entries"
                ))
    except Exception:
        pass

    return signals


def collect_metrics_signals(metrics_dir: str) -> List[PipelineSignal]:
    """Analyze metrics directory for pipeline health."""
    signals = []

    if not os.path.isdir(metrics_dir):
        return signals

    files = os.listdir(metrics_dir)
    if len(files) <= 1:  # Only .gitkeep
        signals.append(PipelineSignal(
            source="metrics",
            signal_type="no_metrics",
            weight=0.6,
            detail="Metrics directory is empty — measurer pipeline not producing output"
        ))

    return signals

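# Worked example (illustrative numbers): with 4 of 12 facts stale, the
# staleness signal carries weight min(1.0, 4 / 12) ≈ 0.33, so an issue whose
# title mentions "stale" gets roughly a 0.33 * 20 ≈ +6.6 boost from
# compute_issue_score below.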
# ============================================================
# Priority Scoring Engine
# ============================================================

PRIORITY_LEVELS = ["P0", "P1", "P2", "P3", "none"]
PRIORITY_LABELS = {"p0": "P0", "p1": "P1", "p2": "P2", "p3": "P3",
                   "priority:critical": "P0", "priority:high": "P1",
                   "priority:medium": "P2", "priority:low": "P3"}


def extract_priority(labels: List[str]) -> Optional[str]:
    """Extract priority level from issue labels."""
    for label in labels:
        lower = label.lower().strip()
        if lower in PRIORITY_LABELS:
            return PRIORITY_LABELS[lower]
    return None


def compute_issue_score(
    issue: Dict,
    repo: str,
    signals: List[PipelineSignal],
    now: datetime
) -> IssueScore:
    """Compute priority score for a single issue."""

    labels = [l.get("name", "") if isinstance(l, dict) else l for l in issue.get("labels", [])]
    title = issue.get("title", "")
    issue_id = issue.get("number", 0)
    current_priority = extract_priority(labels)

    # Parse dates (created_at is ISO 8601 with a trailing Z; compared as
    # timezone-aware datetimes against the aware `now` passed in from main)
    created_str = issue.get("created_at", "")
    try:
        created = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
        age_days = (now - created).days
    except (ValueError, AttributeError):
        age_days = 0

    comment_count = issue.get("comments", 0)
    assignee = None
    assignees = issue.get("assignees") or []
    if assignees:
        assignee = assignees[0].get("login") if isinstance(assignees[0], dict) else str(assignees[0])

    # Base score
    score = 0.0
    reasons = []

    # Age factor: older issues drift down unless actively discussed
    if age_days > 90 and comment_count < 2:
        score -= 15
        reasons.append(f"Dormant: {age_days} days old with only {comment_count} comments")
    elif age_days > 30:
        score -= 5
        reasons.append(f"Aging: {age_days} days old")

    # Activity factor: recent discussion suggests urgency
    if comment_count > 5:
        score += 10
        reasons.append(f"Active discussion: {comment_count} comments")
    elif comment_count > 0:
        score += 3

    # Assignment factor: unassigned issues need triage
    if not assignee:
        score += 5
        reasons.append("Unassigned — needs triage")

    # Pipeline signal alignment
    title_lower = title.lower()
    for signal in signals:
        if signal.signal_type == "stale_knowledge" and "stale" in title_lower:
            score += signal.weight * 20
            reasons.append(f"Matches signal: {signal.detail}")
        elif signal.signal_type == "empty_knowledge" and ("harvester" in title_lower or "knowledge" in title_lower):
            score += signal.weight * 25
            reasons.append(f"Critical gap: {signal.detail}")
        elif signal.signal_type == "no_metrics" and "measur" in title_lower:
            score += signal.weight * 15
            reasons.append(f"Pipeline gap: {signal.detail}")
        elif signal.signal_type == "low_coverage" and any(r.lower() in title_lower for r in signal.affected_repos):
            score += signal.weight * 10
            reasons.append(f"Coverage gap: {signal.detail}")

    # Keyword boosts
    keyword_scores = {
        "broken": 20, "bug": 15, "fix": 10, "error": 12, "fail": 15,
        "security": 25, "auth": 20, "data loss": 30, "crash": 25,
        "blocker": 20, "urgent": 15, "critical": 15,
        "epic": 8, "feature": -3, "nice to have": -10, "someday": -15
    }
    for keyword, boost in keyword_scores.items():
        if keyword in title_lower:
            score += boost
            if boost > 0:
                reasons.append(f"Keyword match: '{keyword}' (+{boost})")

    # Label-based adjustments
    for label in labels:
        lower = label.lower()
        if lower == "pipeline":
            score += 5  # Pipeline issues are infrastructure
        elif lower == "bug":
            score += 12
        elif lower == "enhancement":
            score -= 2
        elif lower == "documentation":
            score -= 5
        elif "epic" in lower:
            score += 3

    # Determine suggested priority
    if score >= 40:
        suggested = "P0"
    elif score >= 25:
        suggested = "P1"
    elif score >= 10:
        suggested = "P2"
    elif score >= 0:
        suggested = "P3"
    else:
        suggested = None  # Consider closing or deprioritizing

    return IssueScore(
        issue_id=issue_id,
        repo=repo,
        title=title,
        current_labels=labels,
        current_priority=current_priority,
        suggested_priority=suggested,
        score=round(score, 1),
        reasons=reasons if reasons else ["No strong signals"],
        age_days=age_days,
        comment_count=comment_count,
        assignee=assignee
    )

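# Worked example (invented issue, real thresholds from above): an unassigned
# 10-day-old issue titled "fix: harvester broken" while an empty_knowledge
# signal (weight 0.7) is active scores
#   +5 (unassigned) + 0.7*25 (signal) + 20 ("broken") + 10 ("fix") = 52.5
# which clears the >= 40 cut-off, so the rebalancer suggests P0.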
# ============================================================
|
||||
# Report Generation
|
||||
# ============================================================
|
||||
|
||||
def generate_report(
|
||||
scores: List[IssueScore],
|
||||
signals: List[PipelineSignal],
|
||||
org: str,
|
||||
repos_scanned: List[str]
|
||||
) -> Dict[str, Any]:
|
||||
"""Generate the full priority report."""
|
||||
now = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
# Categorize changes
|
||||
upgrades = [s for s in scores if s.suggested_priority and s.current_priority and
|
||||
PRIORITY_LEVELS.index(s.suggested_priority) < PRIORITY_LEVELS.index(s.current_priority)]
|
||||
downgrades = [s for s in scores if s.suggested_priority and s.current_priority and
|
||||
PRIORITY_LEVELS.index(s.suggested_priority) > PRIORITY_LEVELS.index(s.current_priority)]
|
||||
new_assignments = [s for s in scores if s.suggested_priority and not s.current_priority]
|
||||
no_change = [s for s in scores if s.suggested_priority == s.current_priority]
|
||||
|
||||
return {
|
||||
"generated_at": now,
|
||||
"org": org,
|
||||
"repos_scanned": repos_scanned,
|
||||
"total_issues": len(scores),
|
||||
"signals": [asdict(s) for s in signals],
|
||||
"summary": {
|
||||
"suggested_upgrades": len(upgrades),
|
||||
"suggested_downgrades": len(downgrades),
|
||||
"suggested_new_priorities": len(new_assignments),
|
||||
"unchanged": len(no_change)
|
||||
},
|
||||
"top_priority": [asdict(s) for s in sorted(scores, key=lambda x: x.score, reverse=True)[:10]],
|
||||
"upgrades": [asdict(s) for s in upgrades],
|
||||
"downgrades": [asdict(s) for s in downgrades],
|
||||
"new_assignments": [asdict(s) for s in new_assignments],
|
||||
"all_scores": [asdict(s) for s in sorted(scores, key=lambda x: x.score, reverse=True)]
|
||||
}
|
||||
|
||||
|
||||
def generate_markdown_report(report: Dict[str, Any]) -> str:
|
||||
"""Generate human-readable markdown report."""
|
||||
lines = []
|
||||
lines.append("# Priority Rebalancer Report")
|
||||
lines.append(f"**Generated:** {report['generated_at']}")
|
||||
lines.append(f"**Org:** {report['org']}")
|
||||
lines.append(f"**Repos scanned:** {', '.join(report['repos_scanned'])}")
|
||||
lines.append(f"**Issues analyzed:** {report['total_issues']}")
|
||||
lines.append("")
|
||||
|
||||
# Signals
|
||||
if report["signals"]:
|
||||
lines.append("## Pipeline Signals")
|
||||
for sig in report["signals"]:
|
||||
weight_bar = "█" * int(sig["weight"] * 10) + "░" * (10 - int(sig["weight"] * 10))
|
||||
lines.append(f"- [{weight_bar}] **{sig['source']}/{sig['signal_type']}** — {sig['detail']}")
|
||||
lines.append("")
|
||||
|
||||
# Summary
|
||||
s = report["summary"]
|
||||
lines.append("## Summary")
|
||||
lines.append(f"- Suggested upgrades: **{s['suggested_upgrades']}**")
|
||||
lines.append(f"- Suggested downgrades: **{s['suggested_downgrades']}**")
|
||||
lines.append(f"- New priority assignments: **{s['suggested_new_priorities']}**")
|
||||
lines.append(f"- Unchanged: **{s['unchanged']}**")
|
||||
lines.append("")
|
||||
|
||||
# Top 10
|
||||
lines.append("## Top 10 by Score")
|
||||
for i, item in enumerate(report["top_priority"][:10], 1):
|
||||
cur = item["current_priority"] or "none"
|
||||
sug = item["suggested_priority"] or "none"
|
||||
arrow = "↑" if PRIORITY_LEVELS.index(sug) < PRIORITY_LEVELS.index(cur) else "↓" if PRIORITY_LEVELS.index(sug) > PRIORITY_LEVELS.index(cur) else "→"
|
||||
lines.append(f"{i}. **[{item['repo']}#{item['issue_id']}]** {item['title']}")
|
||||
lines.append(f" Score: {item['score']} | Current: {cur} {arrow} Suggested: {sug}")
|
||||
lines.append(f" Reasons: {'; '.join(item['reasons'][:3])}")
|
||||
lines.append("")
|
||||
|
||||
# Upgrades
|
||||
if report["upgrades"]:
|
||||
lines.append("## Suggested Upgrades")
|
||||
for item in report["upgrades"]:
|
||||
lines.append(f"- **[{item['repo']}#{item['issue_id']}]** {item['title']}")
|
||||
lines.append(f" {item['current_priority']} → {item['suggested_priority']} (score: {item['score']})")
|
||||
lines.append(f" {'; '.join(item['reasons'][:2])}")
|
||||
lines.append("")
|
||||
|
||||
# Downgrades
|
||||
if report["downgrades"]:
|
||||
lines.append("## Suggested Downgrades")
|
||||
for item in report["downgrades"]:
|
||||
lines.append(f"- **[{item['repo']}#{item['issue_id']}]** {item['title']}")
|
||||
lines.append(f" {item['current_priority']} → {item['suggested_priority']} (score: {item['score']})")
|
||||
lines.append(f" {'; '.join(item['reasons'][:2])}")
|
||||
lines.append("")
|
||||
|
||||
# New assignments
|
||||
if report["new_assignments"]:
|
||||
lines.append("## New Priority Suggestions (currently unlabelled)")
|
||||
for item in report["new_assignments"][:20]:
|
||||
lines.append(f"- **[{item['repo']}#{item['issue_id']}]** {item['title']}")
|
||||
lines.append(f" Suggested: {item['suggested_priority']} (score: {item['score']})")
|
||||
lines.append(f" {'; '.join(item['reasons'][:2])}")
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
# ============================================================
|
||||
# Main
|
||||
# ============================================================
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Priority Rebalancer — re-score issues based on pipeline data")
|
||||
parser.add_argument("--org", default="Timmy_Foundation", help="Gitea org name")
|
||||
parser.add_argument("--repo", help="Single repo to analyze (default: all)")
|
||||
parser.add_argument("--base-url", default="https://forge.alexanderwhitestone.com", help="Gitea base URL")
|
||||
parser.add_argument("--knowledge-dir", default=None, help="Path to knowledge directory")
|
||||
parser.add_argument("--metrics-dir", default=None, help="Path to metrics directory")
|
||||
parser.add_argument("--scripts-dir", default=None, help="Path to scripts directory")
|
||||
parser.add_argument("--output-dir", default=None, help="Path to output directory")
|
||||
parser.add_argument("--dry-run", action="store_true", help="Show what would change without applying")
|
||||
parser.add_argument("--apply", action="store_true", help="Apply priority changes via API")
|
||||
parser.add_argument("--json", action="store_true", help="Output JSON instead of markdown")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Resolve paths relative to script location
|
||||
script_dir = Path(__file__).parent
|
||||
repo_root = script_dir.parent
|
||||
|
||||
knowledge_dir = args.knowledge_dir or str(repo_root / "knowledge")
|
||||
metrics_dir = args.metrics_dir or str(repo_root / "metrics")
|
||||
scripts_dir = args.scripts_dir or str(repo_root / "scripts")
|
||||
output_dir = args.output_dir or str(repo_root / "metrics")
|
||||
|
||||
# Get token
|
||||
token = os.environ.get("GITEA_TOKEN")
|
||||
if not token:
|
||||
token_path = os.path.expanduser("~/.config/gitea/token")
|
||||
if os.path.exists(token_path):
|
||||
with open(token_path) as f:
|
||||
token = f.read().strip()
|
||||
|
||||
if not token:
|
||||
print("Error: No Gitea token found. Set GITEA_TOKEN or create ~/.config/gitea/token", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
client = GiteaClient(args.base_url, token)
|
||||
now = datetime.utcnow()
|
||||
|
||||
# Collect pipeline signals
|
||||
print("Collecting pipeline signals...", file=sys.stderr)
|
||||
signals = []
|
||||
signals.extend(collect_knowledge_signals(knowledge_dir))
|
||||
signals.extend(collect_staleness_signals(scripts_dir, knowledge_dir))
|
||||
signals.extend(collect_metrics_signals(metrics_dir))
|
||||
print(f" Found {len(signals)} signals", file=sys.stderr)
|
||||
|
||||
# Get repos
|
||||
if args.repo:
|
||||
repos = [{"name": args.repo}]
|
||||
else:
|
||||
repos = client.get_org_repos(args.org)
|
||||
|
||||
repo_names = [r["name"] for r in repos]
|
||||
print(f"Scanning {len(repo_names)} repos: {', '.join(repo_names[:5])}{'...' if len(repo_names) > 5 else ''}", file=sys.stderr)
|
||||
|
||||
# Score all issues
|
||||
all_scores = []
|
||||
for repo in repos:
|
||||
repo_name = repo["name"]
|
||||
issues = client.get_issues(args.org, repo_name)
|
||||
print(f" {repo_name}: {len(issues)} open issues", file=sys.stderr)
|
||||
|
||||
for issue in issues:
|
||||
if issue.get("pull_request"):
|
||||
continue
|
||||
score = compute_issue_score(issue, repo_name, signals, now)
|
||||
all_scores.append(score)
|
||||
|
||||
# Generate report
|
||||
report = generate_report(all_scores, signals, args.org, repo_names)
|
||||
|
||||
# Output
|
||||
os.makedirs(output_dir, exist_ok=True)
|
||||
|
||||
if args.json:
|
||||
print(json.dumps(report, indent=2))
|
||||
else:
|
||||
md = generate_markdown_report(report)
|
||||
print(md)
|
||||
|
||||
# Write files
|
||||
report_path = os.path.join(output_dir, "priority_report.json")
|
||||
with open(report_path, "w") as f:
|
||||
json.dump(report, f, indent=2)
|
||||
print(f"\nFull report: {report_path}", file=sys.stderr)
|
||||
|
||||
md_path = os.path.join(output_dir, "priority_suggestions.md")
|
||||
with open(md_path, "w") as f:
|
||||
f.write(generate_markdown_report(report))
|
||||
print(f"Suggestions: {md_path}", file=sys.stderr)
|
||||
|
||||
# Apply changes if requested
|
||||
if args.apply:
|
||||
print("\nApplying priority changes...", file=sys.stderr)
|
||||
applied = 0
|
||||
|
||||
# Get label IDs for priority labels
|
||||
priority_label_map = {}
|
||||
for repo_name in repo_names:
|
||||
labels = client.get_repo_labels(args.org, repo_name)
|
||||
for label in labels:
|
||||
name = label.get("name", "").lower()
|
||||
if name in ("p0", "p1", "p2", "p3"):
|
||||
priority_label_map[(repo_name, name)] = label["id"]
|
||||
|
||||
for score in all_scores:
|
||||
if score.suggested_priority and score.suggested_priority != score.current_priority:
|
||||
sug_lower = score.suggested_priority.lower()
|
||||
label_id = priority_label_map.get((score.repo, sug_lower))
|
||||
if label_id:
|
||||
ok = client.add_label_to_issue(args.org, score.repo, score.issue_id, [label_id])
|
||||
if ok:
|
||||
applied += 1
|
||||
print(f" Applied: [{score.repo}#{score.issue_id}] → {score.suggested_priority}", file=sys.stderr)
|
||||
|
||||
# Add comment explaining the change
|
||||
comment = f"**Priority Rebalancer** suggested: **{score.suggested_priority}** (was: {score.current_priority or 'none'})\n\n"
|
||||
comment += f"Score: {score.score}\n"
|
||||
comment += f"Reasons:\n"
|
||||
for r in score.reasons[:5]:
|
||||
comment += f"- {r}\n"
|
||||
client.add_comment(args.org, score.repo, score.issue_id, comment)
|
||||
|
||||
print(f"Applied {applied} priority changes", file=sys.stderr)
|
||||
|
||||
elif args.dry_run:
|
||||
print(f"\nDry run — {report['summary']['suggested_upgrades'] + report['summary']['suggested_downgrades'] + report['summary']['suggested_new_priorities']} changes would be applied", file=sys.stderr)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
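As a rough usage sketch, the rebalancer can be driven from another script. Only `--dry-run`, `--json`, and `--output-dir` are defined in the excerpt above; `--repo` is an assumption inferred from the `args.repo` reference in `main()`.

```python
import subprocess

# Hedged sketch: --dry-run is defined above; --repo is assumed from args.repo.
result = subprocess.run(
    ["python3", "scripts/priority_rebalancer.py", "--repo", "the-nexus", "--dry-run"],
    capture_output=True, text=True,
)
print(result.stdout)          # markdown report goes to stdout
print(result.stderr[-300:])   # progress lines + dry-run summary land on stderr
```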
290
scripts/refactoring_opportunity_finder.py
Executable file
@@ -0,0 +1,290 @@
#!/usr/bin/env python3
"""
Finds refactoring opportunities in codebases

Engine ID: 10.4

Usage:
    python3 scripts/refactoring_opportunity_finder.py --output proposals/refactoring_opportunity_finder.json
    python3 scripts/refactoring_opportunity_finder.py --output proposals/refactoring_opportunity_finder.json --dry-run
"""

import argparse
import ast
import json
import os
import sys
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


# ── Data Classes ────────────────────────────────────────────────────────

@dataclass
class FileMetrics:
    """Metrics for a single source file."""
    path: str
    lines: int = 0
    complexity: float = 0.0
    max_complexity: int = 0
    functions: int = 0
    classes: int = 0
    churn_30d: int = 0
    churn_90d: int = 0
    test_coverage: Optional[float] = None
    refactoring_score: float = 0.0


# ── Complexity Analysis ─────────────────────────────────────────────────

class ComplexityVisitor(ast.NodeVisitor):
    """AST visitor that computes cyclomatic complexity per function."""

    def __init__(self):
        self.complexities = []
        self.function_count = 0
        self.class_count = 0
        self._current_complexity = 0
        self._in_function = False

    def visit_FunctionDef(self, node):
        self.function_count += 1
        old_complexity = self._current_complexity
        old_in_function = self._in_function
        self._current_complexity = 1  # Base complexity
        self._in_function = True

        self.generic_visit(node)

        self.complexities.append(self._current_complexity)
        self._current_complexity = old_complexity
        self._in_function = old_in_function

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_ClassDef(self, node):
        self.class_count += 1
        self.generic_visit(node)

    def visit_If(self, node):
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)

    def visit_For(self, node):
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)

    visit_AsyncFor = visit_For

    def visit_While(self, node):
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)

    def visit_ExceptHandler(self, node):
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)

    def visit_With(self, node):
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)

    visit_AsyncWith = visit_With

    def visit_Assert(self, node):
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)

    def visit_BoolOp(self, node):
        # Each 'and'/'or' adds a branch
        if self._in_function:
            self._current_complexity += len(node.values) - 1
        self.generic_visit(node)

    def visit_IfExp(self, node):
        # Ternary expression
        if self._in_function:
            self._current_complexity += 1
        self.generic_visit(node)


def compute_file_complexity(filepath: str) -> Tuple[float, int, int, int, int]:
    """
    Compute cyclomatic complexity for a Python file.

    Returns:
        (avg_complexity, max_complexity, function_count, class_count, line_count)
    """
    try:
        with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
            source = f.read()
    except (IOError, OSError):
        return 0.0, 0, 0, 0, 0

    try:
        tree = ast.parse(source, filename=filepath)
    except SyntaxError:
        return 0.0, 0, 0, 0, 0

    visitor = ComplexityVisitor()
    visitor.visit(tree)

    line_count = len(source.splitlines())

    if not visitor.complexities:
        # No functions, but might have classes
        return 0.0, 0, visitor.function_count, visitor.class_count, line_count

    avg = sum(visitor.complexities) / len(visitor.complexities)
    max_c = max(visitor.complexities)

    return avg, max_c, visitor.function_count, visitor.class_count, line_count


# ── Refactoring Score ───────────────────────────────────────────────────

def calculate_refactoring_score(metrics: FileMetrics) -> float:
    """
    Calculate a refactoring priority score (0-100) based on file metrics.

    Higher score = higher priority for refactoring.
    Components:
    - Complexity (0-30 points): higher avg/max complexity = higher score
    - Size (0-20 points): larger files = higher score
    - Churn (0-30 points): more changes recently = higher score
    - Coverage (0-20 points): lower test coverage = higher score
    """
    score = 0.0

    # Complexity component (0-30)
    # avg=10+ or max=20+ → 30 points
    complexity_score = min(30.0, (metrics.complexity * 2) + (metrics.max_complexity * 0.5))
    score += max(0.0, complexity_score)

    # Size component (0-20)
    # 500+ lines → 20 points
    size_score = min(20.0, metrics.lines / 25.0)
    score += max(0.0, size_score)

    # Churn component (0-30)
    # Weighted: recent churn (30d) counts more than older (90d)
    churn_score = min(30.0, (metrics.churn_30d * 2) + (metrics.churn_90d * 0.5))
    score += max(0.0, churn_score)

    # Coverage component (0-20)
    # Lower coverage → higher score
    if metrics.test_coverage is not None:
        # coverage=0 → 20 points, coverage=1 → 0 points
        coverage_score = (1.0 - metrics.test_coverage) * 20.0
    else:
        # No data → assume medium risk (10 points)
        coverage_score = 10.0
    score += max(0.0, coverage_score)

    return min(100.0, max(0.0, score))


# ── Proposal Generation ─────────────────────────────────────────────────

def scan_directory(directory: str, extensions: tuple = ('.py',)) -> list:
    """Scan directory for source files."""
    files = []
    for root, dirs, filenames in os.walk(directory):
        # Skip hidden dirs and common non-source dirs
        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in (
            '__pycache__', 'node_modules', 'venv', '.venv', 'env',
            'build', 'dist', '.git', '.tox'
        )]
        for fname in filenames:
            if any(fname.endswith(ext) for ext in extensions):
                files.append(os.path.join(root, fname))
    return files


def generate_proposals(directory: str = '.', min_score: float = 30.0) -> list:
    """Generate refactoring proposals by analyzing source files."""
    proposals = []
    files = scan_directory(directory)

    for filepath in files:
        avg, max_c, funcs, classes, lines = compute_file_complexity(filepath)

        if funcs == 0 and classes == 0:
            continue

        metrics = FileMetrics(
            path=filepath,
            lines=lines,
            complexity=avg,
            max_complexity=max_c,
            functions=funcs,
            classes=classes
        )
        score = calculate_refactoring_score(metrics)
        metrics.refactoring_score = score

        if score >= min_score:
            reasons = []
            if max_c > 10:
                reasons.append(f"high max complexity ({max_c})")
            if avg > 5:
                reasons.append(f"high avg complexity ({avg:.1f})")
            if lines > 300:
                reasons.append(f"large file ({lines} lines)")

            proposals.append({
                "title": f"Refactor {os.path.basename(filepath)} (score: {score:.0f})",
                "description": f"{filepath}: {', '.join(reasons) if reasons else 'general improvement candidate'}",
                "impact": min(10, int(score / 10)),
                "effort": min(10, max(1, int(max_c / 3))),
                "category": "refactoring",
                "source_engine": "10.4",
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "metrics": {
                    "path": filepath,
                    "score": round(score, 2),
                    "avg_complexity": round(avg, 2),
                    "max_complexity": max_c,
                    "lines": lines,
                    "functions": funcs,
                    "classes": classes
                }
            })

    # Sort by score descending
    proposals.sort(key=lambda p: p.get('metrics', {}).get('score', 0), reverse=True)
    return proposals


# ── CLI ─────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="Finds refactoring opportunities in codebases")
    parser.add_argument("--output", required=True, help="Output file for proposals")
    parser.add_argument("--dry-run", action="store_true", help="Don't write output file")
    parser.add_argument("--directory", default=".", help="Directory to scan")
    parser.add_argument("--min-score", type=float, default=30.0, help="Minimum score threshold")

    args = parser.parse_args()

    proposals = generate_proposals(args.directory, args.min_score)

    if not args.dry_run:
        os.makedirs(os.path.dirname(args.output) or '.', exist_ok=True)
        with open(args.output, "w") as f:
            json.dump({"proposals": proposals}, f, indent=2)
        print(f"Generated {len(proposals)} proposals -> {args.output}")
    else:
        print(f"Would generate {len(proposals)} proposals")
        for p in proposals:
            print(f" - {p['title']}")


if __name__ == "__main__":
    main()
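A worked example of the score formula above, on hypothetical metrics (the numbers are illustrative, not from a real scan):

```python
m = FileMetrics(path="scripts/big_module.py", lines=500,
                complexity=6.0, max_complexity=18,
                churn_30d=5, churn_90d=10, test_coverage=0.25)
# complexity: min(30, 6.0*2 + 18*0.5) = 21.0
# size:       min(20, 500/25)         = 20.0
# churn:      min(30, 5*2 + 10*0.5)   = 15.0
# coverage:   (1 - 0.25) * 20         = 15.0
assert calculate_refactoring_score(m) == 71.0
```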
353
scripts/sampler.py
Normal file
@@ -0,0 +1,353 @@
#!/usr/bin/env python3
"""
sampler.py — Score and rank sessions by harvest value.

With 20k+ sessions on disk, we can't harvest all at once. This script
scores each session by how likely it is to contain valuable knowledge,
so the harvester processes the best ones first.

Scoring strategy:
- Recency: last 7d=3pts, last 30d=2pts, older=1pt
- Length: >50 messages=3pts, >20=2pts, <20=1pt
- Repo uniqueness: first session for a repo=5pts, otherwise=1pt
- Outcome: failure=3pts (most to learn), success=2pts, unknown=1pt
- Tool calls: >10 tool invocations=2pts (complex sessions)

Usage:
    python3 sampler.py --count 100                        # Top 100 sessions
    python3 sampler.py --repo the-nexus --count 20        # Top 20 for a repo
    python3 sampler.py --since 2026-04-01                 # All sessions since date
    python3 sampler.py --count 50 --min-score 8           # Only high-value sessions
    python3 sampler.py --count 100 --output sample.json   # Save to file
"""

import argparse
import json
import os
import sys
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


# --- Fast session scanning (no full parse) ---

def scan_session_fast(path: str) -> dict:
    """Extract scoring metadata from a session without parsing the full JSONL.

    Reads only: first line, last ~30 lines, and line count. This processes
    20k sessions in seconds instead of minutes.
    """
    meta = {
        'path': path,
        'message_count': 0,
        'has_tool_calls': False,
        'tool_call_count': 0,
        'first_timestamp': '',
        'last_timestamp': '',
        'is_failure': False,
        'repos_mentioned': set(),
        'first_role': '',
        'last_content_preview': '',
    }

    try:
        file_size = os.path.getsize(path)
        if file_size == 0:
            return meta

        with open(path, 'r', encoding='utf-8', errors='replace') as f:
            # Read first line for timestamp + role
            first_line = f.readline().strip()
            if first_line:
                try:
                    first_msg = json.loads(first_line)
                    meta['first_timestamp'] = first_msg.get('timestamp', '')
                    meta['first_role'] = first_msg.get('role', '')
                except json.JSONDecodeError:
                    pass

            # Fast line count + collect tail lines
            # For the tail, seek to near end of file
            tail_lines = []
            line_count = 1  # already read first

            if file_size > 8192:
                # Seek to last 8KB for tail sampling
                f.seek(max(0, file_size - 8192))
                f.readline()  # skip partial line
                for line in f:
                    line = line.strip()
                    if line:
                        tail_lines.append(line)
                        line_count += 1
                # We lost the exact count for big files — estimate from file size
                # Average JSONL line is ~500 bytes
                if line_count < 100:
                    line_count = max(line_count, file_size // 500)
            else:
                # Small file — read all
                for line in f:
                    line = line.strip()
                    if line:
                        tail_lines.append(line)
                        line_count += 1

        meta['message_count'] = line_count

        # Parse tail lines for outcome, tool calls, repos
        for line in tail_lines[-30:]:  # last 30 non-empty lines
            try:
                msg = json.loads(line)

                # Track last timestamp
                ts = msg.get('timestamp', '')
                if ts:
                    meta['last_timestamp'] = ts

                # Count tool calls
                if msg.get('tool_calls'):
                    meta['has_tool_calls'] = True
                    meta['tool_call_count'] += len(msg['tool_calls'])

                # Detect failure signals in content
                content = ''
                if isinstance(msg.get('content'), str):
                    content = msg['content'].lower()
                elif isinstance(msg.get('content'), list):
                    for part in msg['content']:
                        if isinstance(part, dict) and part.get('type') == 'text':
                            content += part.get('text', '').lower()

                if content:
                    meta['last_content_preview'] = content[:200]
                    failure_signals = ['error', 'failed', 'cannot', 'unable',
                                       'exception', 'traceback', 'rejected', 'denied']
                    if any(sig in content for sig in failure_signals):
                        meta['is_failure'] = True

                # Extract repo references from tool call arguments
                if msg.get('tool_calls'):
                    for tc in msg['tool_calls']:
                        args = tc.get('function', {}).get('arguments', '')
                        if isinstance(args, str):
                            # Look for repo patterns
                            for pattern in ['Timmy_Foundation/', 'Rockachopa/', 'compounding-intelligence', 'the-nexus', 'timmy-home', 'hermes-agent', 'the-beacon', 'the-door']:
                                if pattern in args:
                                    repo = pattern.rstrip('/')
                                    meta['repos_mentioned'].add(repo)

            except json.JSONDecodeError:
                continue

    except (IOError, OSError):
        pass

    meta['repos_mentioned'] = list(meta['repos_mentioned'])
    return meta


# --- Filename timestamp parsing ---

def parse_session_timestamp(filename: str) -> Optional[datetime]:
    """Parse timestamp from session filename.

    Common formats:
        session_20260413_123456_hash.jsonl
        20260413_123456_hash.jsonl
    """
    stem = Path(filename).stem
    parts = stem.split('_')

    # Try session_YYYYMMDD_HHMMSS format
    for i, part in enumerate(parts):
        if len(part) == 8 and part.isdigit():
            date_part = part
            time_part = parts[i + 1] if i + 1 < len(parts) and len(parts[i + 1]) == 6 else '000000'
            try:
                return datetime.strptime(f"{date_part}_{time_part}", '%Y%m%d_%H%M%S').replace(tzinfo=timezone.utc)
            except ValueError:
                continue

    # Fallback: use file modification time
    return None


# --- Scoring ---

def score_session(meta: dict, now: datetime, seen_repos: set) -> tuple[int, dict]:
    """Score a session for harvest value. Returns (score, breakdown)."""
    score = 0
    breakdown = {}

    # 1. Recency
    ts = parse_session_timestamp(os.path.basename(meta['path']))
    if ts is None:
        # Fallback to mtime
        try:
            ts = datetime.fromtimestamp(os.path.getmtime(meta['path']), tz=timezone.utc)
        except OSError:
            ts = now - timedelta(days=365)

    age_days = (now - ts).days
    if age_days <= 7:
        recency = 3
    elif age_days <= 30:
        recency = 2
    else:
        recency = 1
    score += recency
    breakdown['recency'] = recency

    # 2. Length
    count = meta['message_count']
    if count > 50:
        length = 3
    elif count > 20:
        length = 2
    else:
        length = 1
    score += length
    breakdown['length'] = length

    # 3. Repo uniqueness (first session mentioning a repo gets bonus)
    repo_score = 0
    for repo in meta.get('repos_mentioned', []):
        if repo not in seen_repos:
            seen_repos.add(repo)
            repo_score = max(repo_score, 5)
        else:
            repo_score = max(repo_score, 1)
    score += repo_score
    breakdown['repo_unique'] = repo_score

    # 4. Outcome
    if meta.get('is_failure'):
        outcome = 3
    elif meta.get('last_content_preview', '').strip():
        outcome = 2  # has some content = likely completed
    else:
        outcome = 1
    score += outcome
    breakdown['outcome'] = outcome

    # 5. Tool calls
    if meta.get('tool_call_count', 0) > 10:
        tool = 2
    else:
        tool = 0
    score += tool
    breakdown['tool_calls'] = tool

    return score, breakdown


# --- Main ---

def main():
    parser = argparse.ArgumentParser(description="Score and rank sessions for harvesting")
    parser.add_argument('--sessions-dir', default=os.path.expanduser('~/.hermes/sessions'),
                        help='Directory containing session files')
    parser.add_argument('--count', type=int, default=100, help='Number of top sessions to return')
    parser.add_argument('--repo', default='', help='Filter to sessions mentioning this repo')
    parser.add_argument('--since', default='', help='Only score sessions after this date (YYYY-MM-DD)')
    parser.add_argument('--min-score', type=int, default=0, help='Minimum score threshold')
    parser.add_argument('--output', default='', help='Output file (JSON). Default: stdout')
    parser.add_argument('--format', choices=['json', 'paths', 'table'], default='table',
                        help='Output format: json (full), paths (one per line), table (human)')
    parser.add_argument('--top-percent', type=float, default=0, help='Return top N%% instead of --count')

    args = parser.parse_args()

    sessions_dir = Path(args.sessions_dir)
    if not sessions_dir.is_dir():
        print(f"ERROR: Sessions directory not found: {sessions_dir}", file=sys.stderr)
        sys.exit(1)

    # Find all JSONL files
    print(f"Scanning {sessions_dir}...", file=sys.stderr)
    t0 = time.time()

    session_files = list(sessions_dir.glob('*.jsonl'))
    total = len(session_files)
    print(f"Found {total} session files", file=sys.stderr)

    # Parse since date
    since_dt = None
    if args.since:
        since_dt = datetime.strptime(args.since, '%Y-%m-%d').replace(tzinfo=timezone.utc)

    # Score all sessions
    now = datetime.now(timezone.utc)
    seen_repos = set()  # Track repos for uniqueness scoring
    scored = []

    for i, sf in enumerate(session_files):
        # Date filter (fast path: check filename first)
        if since_dt:
            ts = parse_session_timestamp(sf.name)
            if ts and ts < since_dt:
                continue

        meta = scan_session_fast(str(sf))

        # Repo filter
        if args.repo:
            repos = meta.get('repos_mentioned', [])
            if args.repo.lower() not in [r.lower() for r in repos]:
                # Also check filename
                if args.repo.lower() not in sf.name.lower():
                    continue

        score, breakdown = score_session(meta, now, seen_repos)

        if score >= args.min_score:
            scored.append({
                'path': str(sf),
                'filename': sf.name,
                'score': score,
                'breakdown': breakdown,
                'message_count': meta['message_count'],
                'repos': meta['repos_mentioned'],
                'is_failure': meta['is_failure'],
            })

        if (i + 1) % 5000 == 0:
            elapsed = time.time() - t0
            print(f" Scanned {i + 1}/{total} ({elapsed:.1f}s)", file=sys.stderr)

    elapsed = time.time() - t0
    print(f"Scored {len(scored)} sessions in {elapsed:.1f}s", file=sys.stderr)

    # Sort by score descending
    scored.sort(key=lambda x: x['score'], reverse=True)

    # Apply count or percent
    if args.top_percent > 0:
        count = max(1, int(len(scored) * args.top_percent / 100))
    else:
        count = args.count
    scored = scored[:count]

    # Output
    if args.output:
        with open(args.output, 'w', encoding='utf-8') as f:
            json.dump(scored, f, indent=2)
        print(f"Wrote {len(scored)} sessions to {args.output}", file=sys.stderr)
    elif args.format == 'json':
        json.dump(scored, sys.stdout, indent=2)
    elif args.format == 'paths':
        for s in scored:
            print(s['path'])
    else:  # table
        print(f"{'SCORE':>5} {'MSGS':>5} {'REPOS':<25} {'FILE'}")
        print(f"{'-'*5} {'-'*5} {'-'*25} {'-'*40}")
        for s in scored:
            repos = ', '.join(s['repos'][:2]) if s['repos'] else '-'
            fail = ' FAIL' if s['is_failure'] else ''
            print(f"{s['score']:>5} {s['message_count']:>5} {repos:<25} {s['filename'][:40]}{fail}")


if __name__ == '__main__':
    main()
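A quick sketch of the additive scoring on a hypothetical session dict (field names mirror what `scan_session_fast` returns); `now` is pinned to the day after the filename date so the recency bucket is deterministic:

```python
from datetime import datetime, timezone

# Hypothetical metadata — not from a real session file.
meta = {
    'path': 'session_20260415_120000_abc.jsonl',
    'message_count': 60,                 # >50        → length 3
    'tool_call_count': 12,               # >10        → tool_calls 2
    'is_failure': True,                  # failure    → outcome 3
    'repos_mentioned': ['the-nexus'],    # first seen → repo_unique 5
    'last_content_preview': 'error: merge failed',
}
now = datetime(2026, 4, 16, tzinfo=timezone.utc)  # <7d old → recency 3
score, breakdown = score_session(meta, now, seen_repos=set())
assert score == 3 + 3 + 5 + 3 + 2 == 16
```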
276
scripts/session_metadata.py
Normal file
@@ -0,0 +1,276 @@
#!/usr/bin/env python3
"""
session_metadata.py - Extract structured metadata from Hermes session transcripts.
Works alongside session_reader.py to provide higher-level session analysis.
"""

import json
import re
import sys
from dataclasses import dataclass, asdict
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any

# Import from session_reader (the canonical reader)
from session_reader import read_session


@dataclass
class SessionSummary:
    """Structured summary of a Hermes session transcript."""
    session_id: str
    model: str
    repo: str
    outcome: str
    message_count: int
    tool_calls: int
    duration_estimate: str
    key_actions: List[str]
    errors_encountered: List[str]
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    total_tokens_estimate: int = 0
    user_messages: int = 0
    assistant_messages: int = 0
    tool_outputs: int = 0


def extract_session_metadata(file_path: str) -> SessionSummary:
    """
    Extract structured metadata from a Hermes session JSONL transcript.
    Uses session_reader.read_session() for file reading.
    """
    session_id = Path(file_path).stem
    messages = []
    model = "unknown"
    repo = "unknown"
    tool_calls_count = 0
    key_actions = []
    errors = []
    start_time = None
    end_time = None
    total_tokens = 0

    # Common repo patterns to look for
    repo_patterns = [
        r"(?:the-nexus|compounding-intelligence|timmy-config|hermes-agent)",
        r"(?:forge\.alexanderwhitestone\.com/([^/]+/[^/\s]+))",
        r"(?:github\.com/([^/]+/[^/\s]+))",
        r"(?:Timmy_Foundation/([^/\s]+))",
    ]

    try:
        # Use the canonical reader from session_reader.py
        messages = read_session(file_path)
    except FileNotFoundError:
        return SessionSummary(
            session_id=session_id,
            model="unknown",
            repo="unknown",
            outcome="failure",
            message_count=0,
            tool_calls=0,
            duration_estimate="0m",
            key_actions=[],
            errors_encountered=[f"File not found: {file_path}"]
        )

    # Process messages for metadata
    for entry in messages:
        # Extract model from assistant messages
        if entry.get("role") == "assistant" and entry.get("model"):
            model = entry["model"]

        # Extract timestamps
        if entry.get("timestamp"):
            ts = entry["timestamp"]
            if start_time is None:
                start_time = ts
            end_time = ts

        # Count tool calls
        if entry.get("tool_calls"):
            tool_calls_count += len(entry["tool_calls"])
            for tc in entry["tool_calls"]:
                if tc.get("function", {}).get("name"):
                    action = f"{tc['function']['name']}"
                    if action not in key_actions:
                        key_actions.append(action)

        # Estimate tokens from content length
        content = entry.get("content", "")
        if isinstance(content, str):
            total_tokens += len(content.split())
        elif isinstance(content, list):
            for item in content:
                if isinstance(item, dict) and "text" in item:
                    total_tokens += len(item["text"].split())

        # Look for repo mentions in content
        if entry.get("content"):
            content_str = str(entry["content"])
            for pattern in repo_patterns:
                match = re.search(pattern, content_str, re.IGNORECASE)
                if match:
                    if match.groups():
                        repo = match.group(1)
                    else:
                        repo = match.group(0)
                    break

        # Look for error messages
        if entry.get("role") == "tool" and entry.get("is_error"):
            error_msg = entry.get("content", "Unknown error")
            if isinstance(error_msg, str) and len(error_msg) < 200:
                errors.append(error_msg[:200])

    # Count message types
    user_messages = sum(1 for m in messages if m.get("role") == "user")
    assistant_messages = sum(1 for m in messages if m.get("role") == "assistant")
    tool_outputs = sum(1 for m in messages if m.get("role") == "tool")

    # Calculate duration estimate
    duration_estimate = "unknown"
    if start_time and end_time:
        try:
            # Try to parse timestamps
            start_dt = None
            end_dt = None

            # Handle various timestamp formats
            for fmt in ["%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S"]:
                try:
                    if start_dt is None:
                        start_dt = datetime.strptime(start_time, fmt)
                    if end_dt is None:
                        end_dt = datetime.strptime(end_time, fmt)
                except ValueError:
                    continue

            if start_dt and end_dt:
                duration = end_dt - start_dt
                minutes = duration.total_seconds() / 60
                duration_estimate = f"{minutes:.0f}m"
        except Exception:
            pass

    # Classify outcome
    outcome = "unknown"
    if errors:
        # Check if any errors are fatal
        fatal_errors = any("405" in e or "permission" in e.lower() or "authentication" in e.lower()
                           for e in errors)
        if fatal_errors:
            outcome = "failure"
        else:
            outcome = "partial"
    elif messages:
        # Check last message for success indicators
        last_msg = messages[-1]
        if last_msg.get("role") == "assistant":
            content = last_msg.get("content", "")
            if isinstance(content, str):
                success_indicators = ["done", "completed", "success", "merged", "pushed"]
                if any(indicator in content.lower() for indicator in success_indicators):
                    outcome = "success"
                else:
                    outcome = "unknown"

    # Deduplicate key actions (keep unique, limit to 10)
    unique_actions = []
    for action in key_actions:
        if action not in unique_actions:
            unique_actions.append(action)
        if len(unique_actions) >= 10:
            break

    # Deduplicate errors (keep unique, limit to 5)
    unique_errors = []
    for error in errors:
        if error not in unique_errors:
            unique_errors.append(error)
        if len(unique_errors) >= 5:
            break

    return SessionSummary(
        session_id=session_id,
        model=model,
        repo=repo,
        outcome=outcome,
        message_count=len(messages),
        tool_calls=tool_calls_count,
        duration_estimate=duration_estimate,
        key_actions=unique_actions,
        errors_encountered=unique_errors,
        start_time=start_time,
        end_time=end_time,
        total_tokens_estimate=total_tokens,
        user_messages=user_messages,
        assistant_messages=assistant_messages,
        tool_outputs=tool_outputs
    )


def process_session_directory(directory_path: str, output_file: Optional[str] = None) -> List[SessionSummary]:
    """
    Process all JSONL files in a directory.
    """
    directory = Path(directory_path)
    if not directory.exists():
        print(f"Error: Directory {directory_path} does not exist", file=sys.stderr)
        return []

    jsonl_files = list(directory.glob("*.jsonl"))
    if not jsonl_files:
        print(f"Warning: No JSONL files found in {directory_path}", file=sys.stderr)
        return []

    summaries = []
    for jsonl_file in sorted(jsonl_files):
        print(f"Processing {jsonl_file.name}...", file=sys.stderr)
        summary = extract_session_metadata(str(jsonl_file))
        summaries.append(summary)

    if output_file:
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump([asdict(s) for s in summaries], f, indent=2)
        print(f"Wrote {len(summaries)} summaries to {output_file}", file=sys.stderr)

    return summaries


def main():
    """CLI entry point."""
    import argparse

    parser = argparse.ArgumentParser(description="Extract metadata from Hermes session JSONL transcripts")
    parser.add_argument("path", help="Path to JSONL file or directory of session files")
    parser.add_argument("-o", "--output", help="Output JSON file (default: stdout)")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose output")

    args = parser.parse_args()

    path = Path(args.path)

    if path.is_file():
        summary = extract_session_metadata(str(path))
        if args.output:
            with open(args.output, 'w') as f:
                json.dump(asdict(summary), f, indent=2)
            print(f"Wrote summary to {args.output}", file=sys.stderr)
        else:
            print(json.dumps(asdict(summary), indent=2))

    elif path.is_dir():
        summaries = process_session_directory(str(path), args.output)
        if not args.output:
            print(json.dumps([asdict(s) for s in summaries], indent=2))

    else:
        print(f"Error: {args.path} is not a file or directory", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
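A minimal sketch of summarizing a synthetic two-message transcript; the message shape (role/content/timestamp, `model` on assistant turns) follows the fields this extractor reads, but the values are invented:

```python
import json, tempfile

lines = [
    {"role": "user", "content": "fix the CI", "timestamp": "2026-04-13T10:00:00Z"},
    {"role": "assistant", "model": "mimo-sprint",
     "content": "Done — pushed the fix.", "timestamp": "2026-04-13T10:12:00Z"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("\n".join(json.dumps(l) for l in lines))

s = extract_session_metadata(f.name)
assert s.outcome == "success"          # last assistant turn says "done"/"pushed"
assert s.duration_estimate == "12m"    # parsed from the two timestamps
assert s.model == "mimo-sprint"
```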
234
scripts/session_pair_harvester.py
Normal file
@@ -0,0 +1,234 @@
#!/usr/bin/env python3
"""
Session Transcript → Training Pair Harvester

Scans Hermes session JSONL files for Q&A patterns and extracts
terse→rich training pairs. Outputs JSONL matching the timmy-config
training pairs spec.

Usage:
    python3 scripts/session_pair_harvester.py ~/.hermes/sessions/
    python3 scripts/session_pair_harvester.py session.jsonl --output pairs.jsonl
    python3 scripts/session_pair_harvester.py --dir ~/.hermes/sessions/ --min-ratio 2.0

Output format:
    {"terse": "user short prompt", "rich": "ai detailed response", "source": "session_id", "model": "..."}
"""

import argparse
import hashlib
import json
import sys
from pathlib import Path
from typing import Optional


def compute_hash(text: str) -> str:
    """Content hash for deduplication."""
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def extract_pairs_from_session(session_data: dict, min_ratio: float = 1.5,
                               min_response_words: int = 20) -> list:
    """Extract terse→rich pairs from a single session object."""
    pairs = []
    conversations = session_data.get("conversations", [])
    session_id = session_data.get("id", "unknown")
    model = session_data.get("model", "unknown")

    seen_hashes = set()

    for i, msg in enumerate(conversations):
        # Look for assistant/gpt responses
        if msg.get("from") not in ("gpt", "assistant"):
            continue

        response_text = msg.get("value", "")
        if not response_text or len(response_text.split()) < min_response_words:
            continue

        # Find the preceding human message
        prompt_text = ""
        for j in range(i - 1, -1, -1):
            if conversations[j].get("from") == "human":
                prompt_text = conversations[j].get("value", "")
                break

        if not prompt_text:
            continue

        # Filter: skip tool results, system messages embedded as human
        if prompt_text.startswith("{") and "output" in prompt_text[:100]:
            continue  # likely a tool result
        if prompt_text.startswith("# SOUL.md") or prompt_text.startswith("You are"):
            continue  # system prompt leak

        # Quality filters
        prompt_words = len(prompt_text.split())
        response_words = len(response_text.split())

        # Must have meaningful length ratio
        if prompt_words == 0 or response_words == 0:
            continue
        ratio = response_words / prompt_words
        if ratio < min_ratio:
            continue

        # Skip responses that are mostly code
        code_blocks = response_text.count("```")
        if code_blocks >= 4 and len(response_text.replace("```", "").strip()) < 50:
            continue

        # Skip responses with tool call artifacts
        if "tool_call" in response_text[:100] or "function_call" in response_text[:100]:
            continue

        # Deduplicate by content hash
        content_hash = compute_hash(prompt_text + response_text[:200])
        if content_hash in seen_hashes:
            continue
        seen_hashes.add(content_hash)

        # Keep the response as-is for now (no markdown cleanup is applied yet)
        clean_response = response_text

        pairs.append({
            "terse": prompt_text.strip(),
            "rich": clean_response.strip(),
            "source": session_id,
            "model": model,
            "prompt_words": prompt_words,
            "response_words": response_words,
            "ratio": round(ratio, 2),
        })

    return pairs


def extract_from_jsonl_file(filepath: str, **kwargs) -> list:
    """Extract pairs from a session JSONL file."""
    pairs = []
    path = Path(filepath)

    if not path.exists():
        print(f"Warning: {filepath} not found", file=sys.stderr)
        return pairs

    content = path.read_text()
    lines = content.strip().split("\n")

    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            session = json.loads(line)
        except json.JSONDecodeError:
            continue

        session_pairs = extract_pairs_from_session(session, **kwargs)
        pairs.extend(session_pairs)

    return pairs


def deduplicate_pairs(pairs: list) -> list:
    """Remove duplicate pairs across files."""
    seen = set()
    unique = []
    for pair in pairs:
        key = compute_hash(pair["terse"] + pair["rich"][:200])
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


def main():
    parser = argparse.ArgumentParser(description="Harvest training pairs from session transcripts")
    parser.add_argument("input", nargs="?", help="Session JSONL file or directory")
    parser.add_argument("--dir", "-d", help="Directory to scan for session files")
    parser.add_argument("--output", "-o", default="harvested_pairs.jsonl", help="Output file")
    parser.add_argument("--min-ratio", type=float, default=1.5, help="Min response/prompt word ratio")
    parser.add_argument("--min-words", type=int, default=20, help="Min response word count")
    parser.add_argument("--dry-run", action="store_true", help="Print stats without writing")
    args = parser.parse_args()

    all_pairs = []
    files_scanned = 0

    scan_dir = args.dir or args.input
    if not scan_dir:
        parser.print_help()
        sys.exit(1)

    scan_path = Path(scan_dir)
    if scan_path.is_dir():
        jsonl_files = sorted(scan_path.rglob("*.jsonl"))
        print(f"Scanning {len(jsonl_files)} files in {scan_dir}...", file=sys.stderr)
        for fpath in jsonl_files:
            pairs = extract_from_jsonl_file(
                str(fpath),
                min_ratio=args.min_ratio,
                min_response_words=args.min_words
            )
            all_pairs.extend(pairs)
            files_scanned += 1
    else:
        pairs = extract_from_jsonl_file(
            str(scan_path),
            min_ratio=args.min_ratio,
            min_response_words=args.min_words
        )
        all_pairs.extend(pairs)
        files_scanned = 1

    # Deduplicate
    unique_pairs = deduplicate_pairs(all_pairs)

    # Stats
    if unique_pairs:
        avg_prompt = sum(p["prompt_words"] for p in unique_pairs) / len(unique_pairs)
        avg_response = sum(p["response_words"] for p in unique_pairs) / len(unique_pairs)
        avg_ratio = sum(p["ratio"] for p in unique_pairs) / len(unique_pairs)
    else:
        avg_prompt = avg_response = avg_ratio = 0

    stats = {
        "files_scanned": files_scanned,
        "raw_pairs": len(all_pairs),
        "unique_pairs": len(unique_pairs),
        "duplicates_removed": len(all_pairs) - len(unique_pairs),
        "avg_prompt_words": round(avg_prompt, 1),
        "avg_response_words": round(avg_response, 1),
        "avg_ratio": round(avg_ratio, 2),
    }

    print(json.dumps(stats, indent=2), file=sys.stderr)

    if args.dry_run:
        # Print sample pairs
        for pair in unique_pairs[:3]:
            print(f"\n--- Source: {pair['source']} (ratio: {pair['ratio']}) ---", file=sys.stderr)
            print(f"TERSE: {pair['terse'][:100]}...", file=sys.stderr)
            print(f"RICH: {pair['rich'][:150]}...", file=sys.stderr)
        return

    # Write output
    output_path = Path(args.output)
    with open(output_path, "w") as f:
        for pair in unique_pairs:
            # Strip internal fields for output
            output = {
                "terse": pair["terse"],
                "rich": pair["rich"],
                "source": pair["source"],
                "model": pair["model"],
            }
            f.write(json.dumps(output) + "\n")

    print(f"\nWrote {len(unique_pairs)} pairs to {output_path}", file=sys.stderr)


if __name__ == "__main__":
    main()
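For reference, a sketch of the pair filter on a synthetic session in the "conversations" shape this harvester expects (values are invented):

```python
session = {
    "id": "s1", "model": "mimo-sprint",
    "conversations": [
        {"from": "human", "value": "explain the harvest loop"},   # 4 words
        {"from": "gpt", "value": " ".join(["word"] * 40)},        # 40 words
    ],
}
pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=20)
assert len(pairs) == 1
assert pairs[0]["ratio"] == 10.0   # 40 / 4, well above the 1.5 cutoff
```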
142
scripts/session_reader.py
Normal file
@@ -0,0 +1,142 @@
#!/usr/bin/env python3
"""
session_reader.py — Parse Hermes session JSONL transcripts.

Each line in a session file is a JSON object representing a message.
Standard fields: role (user|assistant|system), content (str), timestamp (str).
Tool calls and tool results are also captured.
"""

import json
import sys
from pathlib import Path
from typing import Iterator, Optional


def read_session(path: str) -> list[dict]:
    """Read a session JSONL file and return all messages as a list."""
    messages = []
    with open(path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                msg = json.loads(line)
                messages.append(msg)
            except json.JSONDecodeError as e:
                print(f"WARNING: Skipping malformed JSON at line {line_num}: {e}", file=sys.stderr)
    return messages


def read_session_iter(path: str) -> Iterator[dict]:
    """Iterate over session messages without loading all into memory."""
    with open(path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f"WARNING: Skipping malformed JSON at line {line_num}: {e}", file=sys.stderr)


def extract_conversation(messages: list[dict]) -> list[dict]:
    """Extract user/assistant conversation turns, skipping tool-only messages."""
    conversation = []
    for msg in messages:
        role = msg.get('role', '')
        content = msg.get('content', '')

        # Skip empty messages and pure tool calls
        if role in ('user', 'assistant', 'system'):
            if isinstance(content, str) and content.strip():
                conversation.append({
                    'role': role,
                    'content': content.strip(),
                    'timestamp': msg.get('timestamp', '')
                })
            elif isinstance(content, list):
                # Multimodal content — extract text parts
                text_parts = []
                for part in content:
                    if isinstance(part, dict) and part.get('type') == 'text':
                        text_parts.append(part.get('text', ''))
                if text_parts:
                    conversation.append({
                        'role': role,
                        'content': '\n'.join(text_parts),
                        'timestamp': msg.get('timestamp', '')
                    })
    return conversation


def truncate_for_context(messages: list[dict], head: int = 50, tail: int = 50) -> list[dict]:
    """Truncate long sessions: keep first N + last N messages.

    This preserves session start (initial context) and end (final results),
    skipping the messy middle of long debugging sessions.
    """
    if len(messages) <= head + tail:
        return messages

    truncated = messages[:head]
    truncated.append({
        'role': 'system',
        'content': f'[{len(messages) - head - tail} messages truncated]',
        'timestamp': ''
    })
    truncated.extend(messages[-tail:])
    return truncated


def messages_to_text(messages: list[dict]) -> str:
    """Convert message list to plain text for LLM consumption."""
    lines = []
    for msg in messages:
        role = msg.get('role', 'unknown').upper()
        content = msg.get('content', '')
        if msg.get('role') == 'system' and 'truncated' in content:
            lines.append(f'--- {content} ---')
        else:
            lines.append(f'{role}: {content}')
    return '\n\n'.join(lines)


def get_session_metadata(path: str) -> dict:
    """Extract metadata from a session file (first message often has config info)."""
    messages = read_session(path)
    if not messages:
        return {'path': path, 'message_count': 0}

    first = messages[0]
    last = messages[-1]

    return {
        'path': path,
        'message_count': len(messages),
        'first_timestamp': first.get('timestamp', ''),
        'last_timestamp': last.get('timestamp', ''),
        'first_role': first.get('role', ''),
        'has_tool_calls': any(m.get('tool_calls') for m in messages),
    }


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <session.jsonl>")
        sys.exit(1)

    path = sys.argv[1]
    meta = get_session_metadata(path)
    print(json.dumps(meta, indent=2))

    messages = read_session(path)
    conv = extract_conversation(messages)
    print(f"\nConversation: {len(conv)} turns")

    truncated = truncate_for_context(conv)
    print(f"After truncation: {len(truncated)} turns")
    print(f"\nPreview (first 500 chars):")
    print(messages_to_text(truncated[:5])[:500])
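A small sketch of the head+tail truncation with the default 50/50 window:

```python
msgs = [{'role': 'user', 'content': f'm{i}', 'timestamp': ''} for i in range(120)]
out = truncate_for_context(msgs)
assert len(out) == 101                            # 50 head + marker + 50 tail
assert out[50]['content'] == '[20 messages truncated]'
```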
147
scripts/test_automation_opportunity_finder.py
Normal file
@@ -0,0 +1,147 @@
#!/usr/bin/env python3
"""Tests for scripts/automation_opportunity_finder.py — 8 tests."""

import json
import os
import sys
import tempfile

sys.path.insert(0, os.path.dirname(__file__) or ".")
import importlib.util
spec = importlib.util.spec_from_file_location(
    "aof",
    os.path.join(os.path.dirname(__file__) or ".", "automation_opportunity_finder.py"),
)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)


def test_analyze_cron_jobs_no_file():
    """Returns empty list when no cron jobs file exists."""
    with tempfile.TemporaryDirectory() as tmpdir:
        result = mod.analyze_cron_jobs(tmpdir)
        assert result == []
    print("PASS: test_analyze_cron_jobs_no_file")


def test_analyze_cron_jobs_disabled():
    """Detects disabled cron jobs."""
    with tempfile.TemporaryDirectory() as tmpdir:
        cron_dir = os.path.join(tmpdir, "cron")
        os.makedirs(cron_dir)
        jobs = [
            {"id": "j1", "name": "backup", "enabled": False, "schedule": "0 * * * *"},
            {"id": "j2", "name": "health", "enabled": True, "schedule": "*/5 * * * *"},
        ]
        with open(os.path.join(cron_dir, "jobs.json"), "w") as f:
            json.dump(jobs, f)
        result = mod.analyze_cron_jobs(tmpdir)
        assert any(p["category"] == "cron_disabled" for p in result)
    print("PASS: test_analyze_cron_jobs_disabled")


def test_analyze_cron_jobs_errors():
    """Detects cron jobs with error status."""
    with tempfile.TemporaryDirectory() as tmpdir:
        cron_dir = os.path.join(tmpdir, "cron")
        os.makedirs(cron_dir)
        jobs = [
            {"id": "j1", "name": "broken", "enabled": True, "last_status": "error", "schedule": "0 * * * *"},
        ]
        with open(os.path.join(cron_dir, "jobs.json"), "w") as f:
            json.dump(jobs, f)
        result = mod.analyze_cron_jobs(tmpdir)
        assert any(p["category"] == "cron_errors" for p in result)
    print("PASS: test_analyze_cron_jobs_errors")


def test_analyze_documents_finds_todos():
    """Detects TODO markers in documents."""
    with tempfile.TemporaryDirectory() as tmpdir:
        docs_dir = os.path.join(tmpdir, "docs")
        os.makedirs(docs_dir)
        for i in range(3):
            with open(os.path.join(docs_dir, f"guide{i}.md"), "w") as f:
                f.write(f"# Guide {i}\n\nTODO: Automate this step\n")
        result = mod.analyze_documents([tmpdir])
        assert any(p["category"] == "manual_todo" for p in result)
        todo_proposals = [p for p in result if p["category"] == "manual_todo"]
        assert len(todo_proposals[0]["details"]) == 3
    print("PASS: test_analyze_documents_finds_todos")


def test_analyze_scripts_repeated_commands():
    """Detects repeated shell commands across scripts."""
    with tempfile.TemporaryDirectory() as tmpdir:
        scripts_dir = os.path.join(tmpdir, "scripts")
        os.makedirs(scripts_dir)
        repeated_cmd = "docker restart myapp"
        for i in range(4):
            with open(os.path.join(scripts_dir, f"deploy{i}.sh"), "w") as f:
                f.write(f"#!/bin/bash\n{repeated_cmd}\n")
        result = mod.analyze_scripts([tmpdir])
        assert any(p["category"] == "repeated_command" for p in result)
    print("PASS: test_analyze_scripts_repeated_commands")


def test_analyze_session_transcripts():
    """Detects repeated tool-call sequences."""
    with tempfile.TemporaryDirectory() as tmpdir:
        sessions_dir = os.path.join(tmpdir, "sessions")
        os.makedirs(sessions_dir)
        for i in range(4):
            with open(os.path.join(sessions_dir, f"session{i}.jsonl"), "w") as f:
                f.write(json.dumps({"role": "user", "content": f"task {i}"}) + "\n")
                f.write(json.dumps({
                    "role": "assistant",
                    "content": "working",
                    "tool_calls": [
                        {"function": {"name": "read_file"}},
                        {"function": {"name": "write_file"}},
                    ]
                }) + "\n")
        result = mod.analyze_session_transcripts([sessions_dir])
        assert any(p["category"] == "tool_sequence" for p in result)
        seq_proposals = [p for p in result if p["category"] == "tool_sequence"]
        assert any("read_file" in p["title"] and "write_file" in p["title"] for p in seq_proposals)
    print("PASS: test_analyze_session_transcripts")


def test_deduplicate_proposals():
    """Deduplicates proposals with similar titles."""
    proposals = [
        {"title": "TODO found 3 times", "category": "manual_todo", "confidence": 0.7, "impact": "medium", "description": "x", "sources": []},
        {"title": "TODO found 3 times", "category": "manual_todo", "confidence": 0.7, "impact": "medium", "description": "x", "sources": []},
        {"title": "FIXME found 5 times", "category": "manual_fixme", "confidence": 0.8, "impact": "medium", "description": "y", "sources": []},
    ]
    result = mod.deduplicate_proposals(proposals)
    assert len(result) == 2
    print("PASS: test_deduplicate_proposals")


def test_rank_proposals():
    """Ranks proposals by impact * confidence."""
    proposals = [
        {"title": "low", "category": "x", "confidence": 0.9, "impact": "low", "description": "", "sources": []},
        {"title": "high", "category": "x", "confidence": 0.8, "impact": "high", "description": "", "sources": []},
        {"title": "med", "category": "x", "confidence": 0.7, "impact": "medium", "description": "", "sources": []},
    ]
    result = mod.rank_proposals(proposals)
    assert result[0]["title"] == "high"
    assert result[-1]["title"] == "low"
    print("PASS: test_rank_proposals")


if __name__ == "__main__":
    tests = [v for k, v in globals().items() if k.startswith("test_")]
    passed = 0
    failed = 0
    for t in tests:
        try:
            t()
            passed += 1
        except Exception as e:
            print(f"FAIL: {t.__name__}: {e}")
            failed += 1
    print(f"\n{passed}/{passed+failed} tests passed")
    sys.exit(1 if failed else 0)
239
scripts/test_bootstrapper.py
Normal file
@@ -0,0 +1,239 @@
#!/usr/bin/env python3
"""
Tests for bootstrapper.py — context assembly from knowledge store.
"""

import json
import sys
import tempfile
from pathlib import Path

# Add scripts dir to path for import
sys.path.insert(0, str(Path(__file__).resolve().parent))

from bootstrapper import (
    build_bootstrap_context,
    estimate_tokens,
    filter_facts,
    load_index,
    sort_facts,
    truncate_to_tokens,
)


def make_index(facts: list[dict], tmp_dir: Path) -> Path:
    """Create a temporary index.json with given facts."""
    index = {
        "version": 1,
        "last_updated": "2026-04-13T20:00:00Z",
        "total_facts": len(facts),
        "facts": facts,
    }
    path = tmp_dir / "index.json"
    with open(path, "w") as f:
        json.dump(index, f)
    return path


def test_empty_index():
    """Empty knowledge store produces graceful output."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        index_path = make_index([], tmp_dir)

        # Create empty knowledge dirs
        for sub in ["repos", "agents", "global"]:
            (tmp_dir / sub).mkdir(exist_ok=True)

        context = build_bootstrap_context(
            repo="the-nexus", index_path=index_path
        )
        assert "No relevant knowledge found" in context
        assert "Starting fresh" in context
    print("PASS: empty_index")


def test_filter_by_repo():
    """Filter facts by repository."""
    facts = [
        {"fact": "A", "category": "fact", "repo": "the-nexus", "confidence": 0.9},
        {"fact": "B", "category": "fact", "repo": "fleet-ops", "confidence": 0.8},
        {"fact": "C", "category": "fact", "repo": "global", "confidence": 0.7},
    ]
    filtered = filter_facts(facts, repo="the-nexus", include_global=True)
    texts = [f["fact"] for f in filtered]
    assert "A" in texts
    assert "B" not in texts
    assert "C" in texts
    print("PASS: filter_by_repo")


def test_filter_by_agent():
    """Filter facts by agent type."""
    facts = [
        {"fact": "A", "category": "pattern", "repo": "global", "agent": "mimo-sprint", "confidence": 0.8},
        {"fact": "B", "category": "pattern", "repo": "global", "agent": "groq-fast", "confidence": 0.7},
        {"fact": "C", "category": "fact", "repo": "global", "confidence": 0.9},
    ]
    filtered = filter_facts(facts, agent="mimo-sprint", include_global=True)
    texts = [f["fact"] for f in filtered]
    assert "A" in texts
    assert "B" not in texts
    assert "C" in texts  # global, no agent restriction
    print("PASS: filter_by_agent")


def test_no_global_flag():
    """Excluding global facts works."""
    facts = [
        {"fact": "A", "category": "fact", "repo": "the-nexus", "confidence": 0.9},
        {"fact": "B", "category": "fact", "repo": "global", "confidence": 0.8},
    ]
    filtered = filter_facts(facts, repo="the-nexus", include_global=False)
    texts = [f["fact"] for f in filtered]
    assert "A" in texts
    assert "B" not in texts
    print("PASS: no_global_flag")


def test_sort_by_confidence():
    """Facts sort by confidence descending."""
    facts = [
        {"fact": "low", "category": "fact", "repo": "global", "confidence": 0.3},
        {"fact": "high", "category": "fact", "repo": "global", "confidence": 0.95},
        {"fact": "mid", "category": "fact", "repo": "global", "confidence": 0.7},
    ]
    sorted_f = sort_facts(facts)
    assert sorted_f[0]["fact"] == "high"
    assert sorted_f[1]["fact"] == "mid"
    assert sorted_f[2]["fact"] == "low"
    print("PASS: sort_by_confidence")


def test_sort_pitfalls_first():
    """Pitfalls sort before facts at same confidence."""
    facts = [
        {"fact": "regular fact", "category": "fact", "repo": "global", "confidence": 0.8},
        {"fact": "danger pitfall", "category": "pitfall", "repo": "global", "confidence": 0.8},
    ]
    sorted_f = sort_facts(facts)
    assert sorted_f[0]["category"] == "pitfall"
    print("PASS: sort_pitfalls_first")


def test_truncate_to_tokens():
    """Truncation cuts at line boundary."""
    text = "line1\nline2\nline3\nline4\nline5\n"
    truncated = truncate_to_tokens(text, max_tokens=2)  # ~8 chars
    assert "line1" in truncated
    assert "truncated" in truncated.lower()
    print("PASS: truncate_to_tokens")


def test_estimate_tokens():
    """Token estimation is reasonable."""
    text = "a" * 400
    tokens = estimate_tokens(text)
    assert 90 <= tokens <= 110  # ~100 tokens
    print("PASS: estimate_tokens")


def test_build_full_context():
    """Full context with facts renders correctly."""
    facts = [
        {"fact": "API merges fail with 405", "category": "pitfall", "repo": "the-nexus", "confidence": 0.95},
        {"fact": "Has 50+ open PRs", "category": "fact", "repo": "the-nexus", "confidence": 0.9},
        {"fact": "Token at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9},
        {"fact": "Check git remote -v first", "category": "pattern", "repo": "global", "confidence": 0.8},
    ]

    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        index_path = make_index(facts, tmp_dir)

        # Create knowledge dirs
        for sub in ["repos", "agents", "global"]:
            (tmp_dir / sub).mkdir(exist_ok=True)

        context = build_bootstrap_context(
            repo="the-nexus",
            agent="mimo-sprint",
            include_global=True,
            index_path=index_path,
        )

        assert "What You Know" in context
        assert "PITFALLS" in context
        assert "API merges fail with 405" in context
        assert "the-nexus" in context
        assert "Token at" in context  # global fact included
    print("PASS: build_full_context")


def test_max_tokens_respected():
    """Output respects max_tokens limit."""
    # Generate lots of facts
    facts = [
        {"fact": f"Fact number {i} with some detail about things", "category": "fact", "repo": "global", "confidence": 0.8}
        for i in range(100)
    ]

    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        index_path = make_index(facts, tmp_dir)
        for sub in ["repos", "agents", "global"]:
            (tmp_dir / sub).mkdir(exist_ok=True)

        context = build_bootstrap_context(
            repo=None,
            max_tokens=500,
            index_path=index_path,
        )

        actual_tokens = estimate_tokens(context)
        # Allow 10% overshoot since we cut at line boundaries
        assert actual_tokens <= 550, f"Expected ~500 tokens, got {actual_tokens}"
    print(f"PASS: max_tokens_respected (got {actual_tokens} tokens)")


def test_missing_index_graceful():
    """Missing index.json doesn't crash."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Don't create index.json
        for sub in ["repos", "agents", "global"]:
            (tmp_dir / sub).mkdir(exist_ok=True)

        fake_index = tmp_dir / "nonexistent.json"
        context = build_bootstrap_context(repo="anything", index_path=fake_index)
        assert "No relevant knowledge found" in context
    print("PASS: missing_index_graceful")


if __name__ == "__main__":
    tests = [
        test_empty_index,
test_filter_by_repo,
|
||||
test_filter_by_agent,
|
||||
test_no_global_flag,
|
||||
test_sort_by_confidence,
|
||||
test_sort_pitfalls_first,
|
||||
test_truncate_to_tokens,
|
||||
test_estimate_tokens,
|
||||
test_build_full_context,
|
||||
test_max_tokens_respected,
|
||||
test_missing_index_graceful,
|
||||
]
|
||||
|
||||
passed = 0
|
||||
failed = 0
|
||||
for test in tests:
|
||||
try:
|
||||
test()
|
||||
passed += 1
|
||||
except Exception as e:
|
||||
print(f"FAIL: {test.__name__} — {e}")
|
||||
failed += 1
|
||||
|
||||
print(f"\n{passed} passed, {failed} failed")
|
||||
sys.exit(0 if failed == 0 else 1)
|
||||
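The truncate/estimate tests above pin down the token helpers' contract: roughly 4 characters per token, and truncation that cuts at a line boundary and announces itself. A minimal sketch consistent with those assertions; the real helpers in bootstrapper.py may differ:

```
# Sketch only: names match what the tests import, internals are illustrative.

def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token (matches test_estimate_tokens)."""
    return len(text) // 4


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    """Keep whole lines until the budget is exceeded, then say so."""
    budget_chars = max_tokens * 4
    kept, used = [], 0
    for line in text.splitlines():
        if used + len(line) + 1 > budget_chars:
            kept.append("... (truncated)")
            break
        kept.append(line)
        used += len(line) + 1
    return "\n".join(kept) + "\n"
```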
189
scripts/test_diff_analyzer.py
Normal file
@@ -0,0 +1,189 @@
#!/usr/bin/env python3
"""Tests for scripts/diff_analyzer.py — 10 tests."""

import sys
import os
sys.path.insert(0, os.path.dirname(__file__) or ".")

import importlib.util
spec = importlib.util.spec_from_file_location("da", os.path.join(os.path.dirname(__file__) or ".", "diff_analyzer.py"))
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
DiffAnalyzer = mod.DiffAnalyzer
ChangeCategory = mod.ChangeCategory


SAMPLE_ADD = """diff --git a/new.py b/new.py
new file mode 100644
--- /dev/null
+++ b/new.py
@@ -0,0 +1,3 @@
+def hello():
+    print("world")
+    return True
"""

SAMPLE_DELETE = """diff --git a/old.py b/old.py
deleted file mode 100644
--- a/old.py
+++ /dev/null
@@ -1,2 +0,0 @@
-def goodbye():
-    pass
"""

SAMPLE_MODIFY = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
 def main():
-    print("old")
+    print("new")
+    print("extra")
     return 0
"""

SAMPLE_RENAME = """diff --git a/old_name.py b/new_name.py
rename from old_name.py
rename to new_name.py
--- a/old_name.py
+++ b/new_name.py
@@ -1,1 +1,1 @@
-old content
+new content
"""

SAMPLE_MULTI = """diff --git a/a.py b/a.py
--- a/a.py
+++ b/a.py
@@ -1,1 +1,2 @@
 existing
+added line
diff --git b/b.py b/b.py
new file mode 100644
--- /dev/null
+++ b/b.py
@@ -0,0 +1,1 @@
+new file
"""

SAMPLE_BINARY = """diff --git a/img.png b/img.png
Binary files a/img.png and b/img.png differ
"""


def test_empty():
    a = DiffAnalyzer()
    s = a.analyze("")
    assert s.total_files_changed == 0
    print("PASS: test_empty")


def test_addition():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_ADD)
    assert s.total_files_changed == 1
    assert s.total_added == 3
    assert s.total_deleted == 0
    assert s.new_files == 1
    assert s.files[0].hunks[0].category == ChangeCategory.ADDED
    print("PASS: test_addition")


def test_deletion():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_DELETE)
    assert s.total_deleted == 2
    assert s.deleted_files == 1
    assert s.files[0].hunks[0].category == ChangeCategory.DELETED
    print("PASS: test_deletion")


def test_modification():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_MODIFY)
    assert s.total_added == 2
    assert s.total_deleted == 1
    assert s.files[0].hunks[0].category == ChangeCategory.MODIFIED
    print("PASS: test_modification")


def test_rename():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_RENAME)
    assert s.renamed_files == 1
    assert s.files[0].old_path == "old_name.py"
    assert s.files[0].path == "new_name.py"
    assert s.files[0].is_renamed == True
    print("PASS: test_rename")


def test_multiple_files():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_MULTI)
    assert s.total_files_changed == 2
    assert s.new_files == 1
    print("PASS: test_multiple_files")


def test_binary():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_BINARY)
    assert s.binary_files == 1
    assert s.files[0].is_binary == True
    assert len(s.files[0].hunks) == 0
    print("PASS: test_binary")


def test_to_dict():
    a = DiffAnalyzer()
    s = a.analyze(SAMPLE_MODIFY)
    d = s.to_dict()
    assert "total_files_changed" in d
    assert "files" in d
    assert isinstance(d["files"], list)
    print("PASS: test_to_dict")


def test_context_only():
    diff = """diff --git a/f.py b/f.py
--- a/f.py
+++ b/f.py
@@ -1,3 +1,3 @@
 line1
-old
+new
 line3
"""
    a = DiffAnalyzer()
    s = a.analyze(diff)
    # Has both added and deleted = MODIFIED
    assert s.files[0].hunks[0].category == ChangeCategory.MODIFIED
    print("PASS: test_context_only")


def test_multi_hunk():
    diff = """diff --git a/f.py b/f.py
--- a/f.py
+++ b/f.py
@@ -1,1 +1,2 @@
 existing
+first addition
@@ -10,1 +11,2 @@
 more
+second addition
"""
    a = DiffAnalyzer()
    s = a.analyze(diff)
    assert s.total_hunks == 2
    assert s.total_added == 2
    print("PASS: test_multi_hunk")


def run_all():
    test_empty()
    test_addition()
    test_deletion()
    test_modification()
    test_rename()
    test_multiple_files()
    test_binary()
    test_to_dict()
    test_context_only()
    test_multi_hunk()
    print("\nAll 10 tests passed!")


if __name__ == "__main__":
    run_all()
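The category assertions encode one rule: a hunk with only added lines is ADDED, only removed lines is DELETED, and a mix of both is MODIFIED. A hedged sketch of that classification, assuming hunk lines keep their unified-diff prefixes (the actual DiffAnalyzer internals are not shown here):

```
# Illustrative only: ChangeCategory and the hunk-line representation are assumed.
from enum import Enum

class ChangeCategory(Enum):
    ADDED = "added"
    DELETED = "deleted"
    MODIFIED = "modified"

def categorize_hunk(lines):
    """Classify a hunk from its raw diff lines (assumes at least one +/- line)."""
    has_add = any(l.startswith("+") for l in lines)
    has_del = any(l.startswith("-") for l in lines)
    if has_add and has_del:
        return ChangeCategory.MODIFIED
    return ChangeCategory.ADDED if has_add else ChangeCategory.DELETED
```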
109
scripts/test_gitea_issue_parser.py
Normal file
@@ -0,0 +1,109 @@
#!/usr/bin/env python3
"""Tests for scripts/gitea_issue_parser.py"""

import sys
import os
sys.path.insert(0, os.path.dirname(__file__) or ".")

# Import from sibling
import importlib.util
spec = importlib.util.spec_from_file_location("parser", os.path.join(os.path.dirname(__file__) or ".", "gitea_issue_parser.py"))
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
parse_issue_body = mod.parse_issue_body


def test_basic_parsing():
    body = """## Context

This is the background info.

## Acceptance Criteria

- [ ] First criterion
- [x] Second criterion (done)

## What to build

Some description.
"""
    result = parse_issue_body(body, title="Test (#42)", labels=["bug"])
    assert result["title"] == "Test (#42)"
    assert result["labels"] == ["bug"]
    assert result["epic_ref"] == 42
    assert len(result["criteria"]) == 2
    assert result["criteria"][0]["text"] == "First criterion"
    assert result["criteria"][0]["checked"] == False
    assert result["criteria"][1]["checked"] == True
    assert "context" in result["sections"]
    print("PASS: test_basic_parsing")


def test_numbered_criteria():
    body = """## Acceptance Criteria

1. First item
2. Second item
3. Third item
"""
    result = parse_issue_body(body)
    assert len(result["criteria"]) == 3
    assert result["criteria"][0]["text"] == "First item"
    print("PASS: test_numbered_criteria")


def test_epic_ref_from_body():
    body = "Closes #123\n\nSome description."
    result = parse_issue_body(body)
    assert result["epic_ref"] == 123
    print("PASS: test_epic_ref_from_body")


def test_empty_body():
    result = parse_issue_body("")
    assert result["criteria"] == []
    assert result["context"] == ""
    assert result["sections"] == {}
    print("PASS: test_empty_body")


def test_no_sections():
    body = "Just a plain issue body with no headings."
    result = parse_issue_body(body)
    assert result["context"] == "Just a plain issue body with no headings."
    print("PASS: test_no_sections")


def test_multiple_sections():
    body = """## Problem

Something is broken.

## Fix

Do this instead.

## Notes

Additional info.
"""
    result = parse_issue_body(body)
    assert "problem" in result["sections"]
    assert "fix" in result["sections"]
    assert "notes" in result["sections"]
    assert "Something is broken" in result["sections"]["problem"]
    print("PASS: test_multiple_sections")


def run_all():
    test_basic_parsing()
    test_numbered_criteria()
    test_epic_ref_from_body()
    test_empty_body()
    test_no_sections()
    test_multiple_sections()
    print("\nAll 6 tests passed!")


if __name__ == "__main__":
    run_all()
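Two behaviors these tests fix: the epic reference comes from a `(#42)` suffix in the title or a `Closes #123` in the body, and checkbox bullets parse into text/checked pairs. A minimal sketch of both pieces; the regexes are assumptions, not the parser's actual ones:

```
import re

def extract_epic_ref(title: str, body: str):
    """First #N found in the title, else the body; None if absent. (Sketch.)"""
    m = re.search(r"#(\d+)", title or "") or re.search(r"#(\d+)", body or "")
    return int(m.group(1)) if m else None

def parse_checkbox_criteria(section: str):
    """'- [ ] text' / '- [x] text' lines -> {'text': ..., 'checked': ...}. (Sketch.)"""
    out = []
    for m in re.finditer(r"^[-*] \[( |x)\] (.+)$", section, re.MULTILINE):
        out.append({"text": m.group(2).strip(), "checked": m.group(1) == "x"})
    return out
```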
scripts/test_harvest_prompt.py
@@ -1,41 +1,129 @@
#!/usr/bin/env python3
"""
Test script for knowledge extraction prompt.
Validates that the prompt produces consistent, structured output.
Test harness for knowledge extraction prompt.
Validates output structure, content quality, and hallucination resistance.

Usage:
    python3 scripts/test_harvest_prompt.py                    # Run all tests
    python3 scripts/test_harvest_prompt.py --transcript FILE  # Test against a real transcript
    python3 scripts/test_harvest_prompt.py --validate FILE    # Validate an existing extraction JSON
"""

import json
import sys
import argparse
from pathlib import Path

def validate_knowledge_item(item):
    """Validate a single knowledge item."""
    required_fields = ["fact", "category", "repo", "confidence"]
    for field in required_fields:
        if field not in item:
            return False, f"Missing field: {field}"

    if not isinstance(item["fact"], str) or len(item["fact"].strip()) == 0:
        return False, "Fact must be a non-empty string"

    valid_categories = ["fact", "pitfall", "pattern", "tool-quirk", "question"]
    if item["category"] not in valid_categories:
        return False, f"Invalid category: {item['category']}"

    if not isinstance(item["repo"], str):
        return False, "Repo must be a string"

    if not isinstance(item["confidence"], (int, float)):
        return False, "Confidence must be a number"

    if not (0.0 <= item["confidence"] <= 1.0):
        return False, "Confidence must be between 0.0 and 1.0"

    return True, "Valid"
VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED_FIELDS = {"fact", "category", "repo", "confidence", "evidence"}
REQUIRED_META = {"session_outcome", "tools_used", "repos_touched", "error_count", "knowledge_count"}

def test_sample_transcript():
    """Test with a sample transcript."""
    sample_transcript = """

def validate_knowledge_item(item, idx):
    """Validate a single knowledge item. Returns list of errors."""
    errors = []
    if not isinstance(item, dict):
        return [f"Item {idx}: not a dict"]
    for field in REQUIRED_FIELDS:
        if field not in item:
            errors.append(f"Item {idx}: missing field '{field}'")
    if not isinstance(item.get("fact", ""), str) or len(item.get("fact", "").strip()) == 0:
        errors.append(f"Item {idx}: fact must be a non-empty string")
    if item.get("category") not in VALID_CATEGORIES:
        errors.append(f"Item {idx}: invalid category '{item.get('category')}'")
    if not isinstance(item.get("repo", ""), str) or len(item.get("repo", "").strip()) == 0:
        errors.append(f"Item {idx}: repo must be a non-empty string")
    conf = item.get("confidence")
    if not isinstance(conf, (int, float)) or not (0.0 <= conf <= 1.0):
        errors.append(f"Item {idx}: confidence must be a number 0.0-1.0, got {conf}")
    if not isinstance(item.get("evidence", ""), str) or len(item.get("evidence", "").strip()) == 0:
        errors.append(f"Item {idx}: evidence must be a non-empty string (hallucination check)")
    return errors


def validate_extraction(data):
    """Validate a full extraction result. Returns (is_valid, errors, warnings)."""
    errors = []
    warnings = []

    if not isinstance(data, dict):
        return False, ["Root is not a JSON object"], []

    if "knowledge" not in data:
        return False, ["Missing 'knowledge' array"], []

    if not isinstance(data["knowledge"], list):
        return False, ["'knowledge' is not an array"], []

    for i, item in enumerate(data["knowledge"]):
        errors.extend(validate_knowledge_item(item, i))

    # Meta block validation
    if "meta" not in data:
        warnings.append("Missing 'meta' block (session_outcome, tools_used, etc.)")
    else:
        meta = data["meta"]
        for field in REQUIRED_META:
            if field not in meta:
                warnings.append(f"Meta missing field '{field}'")

    # Quality checks
    facts = data["knowledge"]
    if len(facts) == 0:
        warnings.append("No knowledge extracted (empty session or extraction failure)")

    # Check for near-duplicate facts
    seen_facts = set()
    for item in facts:
        normalized = item.get("fact", "").lower().strip()[:80]
        if normalized in seen_facts:
            warnings.append(f"Duplicate fact detected: '{normalized[:50]}...'")
        seen_facts.add(normalized)

    # Check confidence distribution
    confidences = [item.get("confidence", 0) for item in facts]
    if confidences:
        avg_conf = sum(confidences) / len(confidences)
        if avg_conf > 0.9:
            warnings.append(f"Average confidence {avg_conf:.2f} is suspiciously high (possible hallucination)")
        if avg_conf < 0.4:
            warnings.append(f"Average confidence {avg_conf:.2f} is very low (extraction may be too cautious)")

    return len(errors) == 0, errors, warnings


def validate_transcript_coverage(data, transcript):
    """
    Check that extracted facts are actually supported by the transcript.
    This is a heuristic — checks that key terms from facts appear in transcript.
    Returns list of potential hallucinations.
    """
    hallucinations = []
    transcript_lower = transcript.lower()

    for item in data.get("knowledge", []):
        fact = item.get("fact", "")
        evidence = item.get("evidence", "")

        # Check if evidence string appears in transcript
        if evidence and evidence.lower() not in transcript_lower:
            # Partial match — check if key terms are present
            evidence_words = set(evidence.lower().split())
            transcript_words = set(transcript_lower.split())
            overlap = evidence_words & transcript_words
            if len(overlap) < min(3, len(evidence_words) * 0.3):
                hallucinations.append({
                    "fact": fact[:80],
                    "evidence": evidence[:80],
                    "issue": "Evidence not found in transcript — possible hallucination"
                })

    return hallucinations
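# Example (hypothetical values): evidence whose terms never appear in the
# transcript gets flagged by the overlap heuristic above.
#
# >>> data = {"knowledge": [{"fact": "Database port is 5433",
# ...                        "evidence": "PostgreSQL listening on port 5433"}]}
# >>> validate_transcript_coverage(data, "Agent: Clone successful with token")
# [{'fact': 'Database port is 5433',
#   'evidence': 'PostgreSQL listening on port 5433',
#   'issue': 'Evidence not found in transcript — possible hallucination'}]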

# --- Test transcripts ---

SAMPLE_TRANSCRIPT_SUCCESS = """
User: Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence
Agent: Trying git clone... Error: 401 Unauthorized
Agent: Checking for token... Found in ~/.config/gitea/token
@@ -44,30 +132,253 @@ Agent: Clone successful with token authentication
Agent: Now examining the repo structure...
Agent: Found knowledge/index.json with 0 facts
Agent: Creating harvest-prompt.md in templates/
Agent: Done. Files created: templates/harvest-prompt.md, scripts/test_harvest_prompt.py
"""

    # This would be replaced with actual prompt execution
    print("Sample transcript processed")
    print("Expected categories: fact, pitfall, pattern, tool-quirk, question")
    return True

if __name__ == "__main__":
    print("Testing knowledge extraction prompt...")

    # Test 1: Validate prompt file exists
SAMPLE_TRANSCRIPT_FAILURE = """
User: Deploy the cron jobs to production
Agent: Connecting to VPS... SSH timeout after 30s
Agent: Retrying with verbose output... Connection refused on port 22
Agent: Checking firewall rules... iptables shows port 22 is open
Agent: Checking sshd config... sshd is not running
Agent: Attempting to start sshd... systemctl start sshd failed: unit not found
Agent: ERROR: OpenSSH server not installed on this VPS
Agent: Cannot proceed without SSH access. Deployment blocked.
"""

SAMPLE_TRANSCRIPT_API_QUIRK = """
User: List all open issues in hermes-agent
Agent: Querying Gitea API... endpoint: /repos/Timmy_Foundation/hermes-agent/issues?state=open&limit=50
Agent: Response: 0 items. That's odd.
Agent: Trying without state filter... endpoint: /repos/Timmy_Foundation/hermes-agent/issues?limit=50
Agent: Response: 50 items returned, but all have pull_request field. These are PRs, not issues.
Agent: Gitea issues API returns PRs alongside issues. Must filter by presence of pull_request key.
Agent: Also discovered: state=open parameter returns empty when used with limit=50, but works with limit=5.
Agent: This appears to be a Gitea API quirk — pagination interacts with state filter unexpectedly.
Agent: Workaround: fetch without state filter, then filter client-side for state=='open' and 'pull_request' not in item.
"""


def run_tests():
    """Run the built-in test suite."""
    tests_passed = 0
    tests_failed = 0

    print("=" * 60)
    print("KNOWLEDGE EXTRACTION PROMPT — TEST SUITE")
    print("=" * 60)

    # Test 1: Prompt file exists and is under 2k tokens (~8k chars)
    print("\n[Test 1] Prompt file size constraint")
    prompt_path = Path("templates/harvest-prompt.md")
    if not prompt_path.exists():
        print("ERROR: harvest-prompt.md not found")
        sys.exit(1)

    print(f"OK: Prompt file exists: {prompt_path}")

    # Test 2: Check prompt size
    prompt_size = prompt_path.stat().st_size
    print(f"OK: Prompt size: {prompt_size} bytes")

    # Test 3: Test sample transcript processing
    if test_sample_transcript():
        print("OK: Sample transcript test passed")

    print("\nAll tests passed!")
        print(" FAIL: harvest-prompt.md not found")
        tests_failed += 1
    else:
        size = prompt_path.stat().st_size
        # Rough token estimate: ~4 chars per token
        est_tokens = size / 4
        print(f" Prompt size: {size} bytes (~{est_tokens:.0f} tokens)")
        if est_tokens > 2000:
            print(f" WARN: Prompt exceeds ~2000 tokens (target: ~1000)")
        else:
            print(f" PASS: Within token budget")
            tests_passed += 1

    # Test 2: Validate a well-formed extraction
    print("\n[Test 2] Valid extraction passes validation")
    valid_extraction = {
        "knowledge": [
            {
                "fact": "Gitea auth token is at ~/.config/gitea/token",
                "category": "tool-quirk",
                "repo": "global",
                "confidence": 0.9,
                "evidence": "Found in ~/.config/gitea/token"
            },
            {
                "fact": "Clone fails with 401 when no token is provided",
                "category": "pitfall",
                "repo": "compounding-intelligence",
                "confidence": 0.9,
                "evidence": "Error: 401 Unauthorized"
            }
        ],
        "meta": {
            "session_outcome": "success",
            "tools_used": ["git"],
            "repos_touched": ["compounding-intelligence"],
            "error_count": 1,
            "knowledge_count": 2
        }
    }
    is_valid, errors, warnings = validate_extraction(valid_extraction)
    if is_valid:
        print(f" PASS: Valid extraction accepted ({len(warnings)} warnings)")
        tests_passed += 1
    else:
        print(f" FAIL: Valid extraction rejected: {errors}")
        tests_failed += 1

    # Test 3: Reject missing fields
    print("\n[Test 3] Missing fields are rejected")
    bad_extraction = {
        "knowledge": [
            {"fact": "Something learned", "category": "fact"}  # Missing repo, confidence, evidence
        ]
    }
    is_valid, errors, warnings = validate_extraction(bad_extraction)
    if not is_valid:
        print(f" PASS: Rejected with {len(errors)} errors")
        tests_passed += 1
    else:
        print(f" FAIL: Should have rejected missing fields")
        tests_failed += 1

    # Test 4: Reject invalid category
    print("\n[Test 4] Invalid category is rejected")
    bad_cat = {
        "knowledge": [
            {"fact": "Test", "category": "discovery", "repo": "x", "confidence": 0.8, "evidence": "test"}
        ]
    }
    is_valid, errors, warnings = validate_extraction(bad_cat)
    if not is_valid and any("category" in e for e in errors):
        print(f" PASS: Invalid category 'discovery' rejected")
        tests_passed += 1
    else:
        print(f" FAIL: Should have rejected invalid category")
        tests_failed += 1

    # Test 5: Detect near-duplicates
    print("\n[Test 5] Duplicate detection")
    dup_extraction = {
        "knowledge": [
            {"fact": "Token is at ~/.config/gitea/token", "category": "fact", "repo": "x", "confidence": 0.9, "evidence": "a"},
            {"fact": "Token is at ~/.config/gitea/token", "category": "fact", "repo": "x", "confidence": 0.9, "evidence": "b"}
        ],
        "meta": {"session_outcome": "success", "tools_used": [], "repos_touched": [], "error_count": 0, "knowledge_count": 2}
    }
    is_valid, errors, warnings = validate_extraction(dup_extraction)
    if any("Duplicate" in w for w in warnings):
        print(f" PASS: Duplicate detected")
        tests_passed += 1
    else:
        print(f" FAIL: Should have detected duplicate")
        tests_failed += 1

    # Test 6: Hallucination check against transcript
    print("\n[Test 6] Hallucination detection")
    hallucinated = {
        "knowledge": [
            {
                "fact": "Database port is 5433",
                "category": "fact",
                "repo": "x",
                "confidence": 0.9,
                "evidence": "PostgreSQL listening on port 5433"
            }
        ],
        "meta": {"session_outcome": "success", "tools_used": [], "repos_touched": [], "error_count": 0, "knowledge_count": 1}
    }
    hallucinations = validate_transcript_coverage(hallucinated, SAMPLE_TRANSCRIPT_SUCCESS)
    if hallucinations:
        print(f" PASS: Hallucination detected ({len(hallucinations)} items)")
        tests_passed += 1
    else:
        print(f" FAIL: Should have detected hallucinated evidence")
        tests_failed += 1

    # Test 7: Failed session should extract pitfalls
    print("\n[Test 7] Failed session extraction shape")
    failed_extraction = {
        "knowledge": [
            {
                "fact": "SSH server not installed on target VPS",
                "category": "pitfall",
                "repo": "global",
                "confidence": 0.9,
                "evidence": "ERROR: OpenSSH server not installed on this VPS"
            },
            {
                "fact": "VPS blocks deployment without SSH access",
                "category": "question",
                "repo": "global",
                "confidence": 0.7,
                "evidence": "Cannot proceed without SSH access. Deployment blocked."
            }
        ],
        "meta": {
            "session_outcome": "failed",
            "tools_used": ["ssh", "systemctl"],
            "repos_touched": [],
            "error_count": 3,
            "knowledge_count": 2
        }
    }
    is_valid, errors, warnings = validate_extraction(failed_extraction)
    if is_valid:
        categories = [item["category"] for item in failed_extraction["knowledge"]]
        if "pitfall" in categories:
            print(f" PASS: Failed session extracted {len(categories)} items including pitfalls")
            tests_passed += 1
        else:
            print(f" FAIL: Failed session should extract pitfalls")
            tests_failed += 1
    else:
        print(f" FAIL: {errors}")
        tests_failed += 1

    # Test 8: Empty extraction is warned
    print("\n[Test 8] Empty extraction warning")
    empty = {"knowledge": [], "meta": {"session_outcome": "success", "tools_used": [], "repos_touched": [], "error_count": 0, "knowledge_count": 0}}
    is_valid, errors, warnings = validate_extraction(empty)
    if any("No knowledge" in w for w in warnings):
        print(f" PASS: Empty extraction warned")
        tests_passed += 1
    else:
        print(f" FAIL: Should warn on empty extraction")
        tests_failed += 1

    # Summary
    print(f"\n{'=' * 60}")
    print(f"Results: {tests_passed} passed, {tests_failed} failed")
    print(f"{'=' * 60}")
    return tests_failed == 0


def validate_file(filepath):
    """Validate an existing extraction JSON file."""
    path = Path(filepath)
    if not path.exists():
        print(f"ERROR: {filepath} not found")
        return False

    data = json.loads(path.read_text())
    is_valid, errors, warnings = validate_extraction(data)

    print(f"Validation of {filepath}:")
    print(f" Knowledge items: {len(data.get('knowledge', []))}")
    print(f" Errors: {len(errors)}")
    print(f" Warnings: {len(warnings)}")

    for e in errors:
        print(f" ERROR: {e}")
    for w in warnings:
        print(f" WARN: {w}")

    return is_valid


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Test knowledge extraction prompt")
    parser.add_argument("--validate", help="Validate an existing extraction JSON file")
    parser.add_argument("--transcript", help="Test against a real transcript file (informational)")
    args = parser.parse_args()

    if args.validate:
        success = validate_file(args.validate)
        sys.exit(0 if success else 1)
    else:
        success = run_tests()
        sys.exit(0 if success else 1)
72
scripts/test_harvest_prompt_comprehensive.py
Normal file
@@ -0,0 +1,72 @@
#!/usr/bin/env python3
"""Comprehensive tests for knowledge extraction prompt."""
import json, re
from pathlib import Path

def check_prompt_structure():
    p = Path("templates/harvest-prompt.md")
    if not p.exists(): return False, "harvest-prompt.md not found"
    c = p.read_text()
    for s in ["System Prompt", "Instructions", "Categories", "Output Format", "Confidence Scoring", "Constraints", "Example"]:
        if s.lower() not in c.lower(): return False, f"Missing section: {s}"
    for cat in ["fact", "pitfall", "pattern", "tool-quirk", "question"]:
        if cat not in c: return False, f"Missing category: {cat}"
    if len(c) > 5000: return False, f"Too large: {len(c)}"
    if len(c) < 1000: return False, f"Too small: {len(c)}"
    return True, "Prompt structure is valid"

def check_confidence_scoring():
    c = Path("templates/harvest-prompt.md").read_text()
    for l in ["0.9-1.0", "0.7-0.8", "0.5-0.6", "0.3-0.4", "0.1-0.2"]:
        if l not in c: return False, f"Missing level: {l}"
    return True, "Confidence scoring defined"

def check_example_quality():
    c = Path("templates/harvest-prompt.md").read_text()
    if "example" not in c.lower(): return False, "No examples"
    m = re.search(r'"knowledge"', c[c.lower().find("example"):])
    if not m: return False, "No JSON example"
    return True, "Examples present"

def check_constraint_coverage():
    c = Path("templates/harvest-prompt.md").read_text()
    for x in ["no hallucination", "explicitly", "partial", "failed sessions"]:
        if x not in c.lower(): return False, f"Missing: {x}"
    return True, "Constraints covered"

def check_test_sessions():
    d = Path("test_sessions")
    if not d.exists(): return False, "test_sessions/ not found"
    files = list(d.glob("*.jsonl"))
    if len(files) < 5: return False, f"Only {len(files)} sessions"
    for f in files:
        for i, line in enumerate(f.read_text().strip().split("\n"), 1):
            try: json.loads(line)
            except json.JSONDecodeError as e: return False, f"{f.name}:{i}: {e}"
    return True, f"{len(files)} valid sessions"

def test_prompt_structure():
    passed, msg = check_prompt_structure()
    assert passed, msg

def test_confidence_scoring():
    passed, msg = check_confidence_scoring()
    assert passed, msg

def test_example_quality():
    passed, msg = check_example_quality()
    assert passed, msg

def test_constraint_coverage():
    passed, msg = check_constraint_coverage()
    assert passed, msg

def test_test_sessions():
    passed, msg = check_test_sessions()
    assert passed, msg

if __name__ == "__main__":
    checks = [check_prompt_structure, check_confidence_scoring, check_example_quality, check_constraint_coverage, check_test_sessions]
    for fn in checks:
        ok, msg = fn()
        print(f"{'PASS' if ok else 'FAIL'}: {fn.__name__} -- {msg}")
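Because every check returns an `(ok, msg)` tuple, the same functions can also drive a parametrized pytest run (CI invokes `make test`). A sketch, assuming this file imports cleanly as a module from the scripts directory:

```
# Sketch only: the parametrized wrapper is an assumption, not repo code.
import pytest
from test_harvest_prompt_comprehensive import (
    check_prompt_structure, check_confidence_scoring, check_example_quality,
    check_constraint_coverage, check_test_sessions,
)

@pytest.mark.parametrize("check", [
    check_prompt_structure, check_confidence_scoring, check_example_quality,
    check_constraint_coverage, check_test_sessions,
])
def test_check(check):
    ok, msg = check()
    assert ok, msg
```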
162
scripts/test_harvester_pipeline.py
Normal file
@@ -0,0 +1,162 @@
#!/usr/bin/env python3
"""
Smoke test for harvester pipeline — verifies the full chain:
    session_reader -> prompt -> LLM (mocked) -> validate -> deduplicate -> store

Does NOT call the real LLM. Tests plumbing only.
"""

import json
import sys
import tempfile
import os
from pathlib import Path

# Setup path
SCRIPT_DIR = Path(__file__).parent.absolute()
sys.path.insert(0, str(SCRIPT_DIR))

from session_reader import read_session, extract_conversation, truncate_for_context, messages_to_text
from harvester import validate_fact, deduplicate, load_existing_knowledge, fact_fingerprint


def test_session_reader():
    """Test that session_reader parses JSONL correctly."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
        f.write('{"role": "user", "content": "Hello", "timestamp": "2026-04-13T10:00:00Z"}\n')
        f.write('{"role": "assistant", "content": "Hi there", "timestamp": "2026-04-13T10:00:01Z"}\n')
        f.write('{"role": "user", "content": "Clone the repo", "timestamp": "2026-04-13T10:00:02Z"}\n')
        f.write('{"role": "assistant", "content": "Cloned successfully", "timestamp": "2026-04-13T10:00:05Z"}\n')
        path = f.name

    messages = read_session(path)
    assert len(messages) == 4, f"Expected 4 messages, got {len(messages)}"

    conv = extract_conversation(messages)
    assert len(conv) == 4, f"Expected 4 conversation turns, got {len(conv)}"

    text = messages_to_text(conv)
    assert "USER: Hello" in text
    assert "ASSISTANT: Hi there" in text

    truncated = truncate_for_context(conv, head=2, tail=2)
    assert len(truncated) == 4  # 4 <= head+tail, so no truncation

    os.unlink(path)
    print(" [PASS] session_reader pipeline works")


def test_validate_fact():
    """Test fact validation."""
    good = {"fact": "Gitea token is at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9}
    assert validate_fact(good), "Valid fact should pass"

    bad_missing = {"fact": "Something", "category": "fact"}
    assert not validate_fact(bad_missing), "Missing fields should fail"

    bad_category = {"fact": "Something", "category": "nonsense", "repo": "x", "confidence": 0.5}
    assert not validate_fact(bad_category), "Bad category should fail"

    bad_conf = {"fact": "Something", "category": "fact", "repo": "x", "confidence": 1.5}
    assert not validate_fact(bad_conf), "Confidence > 1.0 should fail"

    print(" [PASS] fact validation works")


def test_deduplicate():
    """Test deduplication."""
    existing = [
        {"fact": "Token is at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9}
    ]
    new = [
        {"fact": "Token is at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9},  # exact dup
        {"fact": "Deploy uses Ansible on port 22", "category": "pattern", "repo": "fleet", "confidence": 0.8},  # unique
    ]
    result = deduplicate(new, existing)
    assert len(result) == 1, f"Expected 1 unique, got {len(result)}"
    assert result[0]["fact"] == "Deploy uses Ansible on port 22"
    print(" [PASS] deduplication works")


def test_knowledge_store_roundtrip():
    """Test loading and writing knowledge index."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Load empty index
        index = load_existing_knowledge(tmpdir)
        assert index["total_facts"] == 0

        # Write a fact
        new_facts = [{"fact": "Test fact", "category": "fact", "repo": "test", "confidence": 0.9}]

        # Use harvester's write function
        from harvester import write_knowledge
        write_knowledge(index, new_facts, tmpdir, source_session="test.jsonl")

        # Reload and verify
        index2 = load_existing_knowledge(tmpdir)
        assert index2["total_facts"] == 1
        assert index2["facts"][0]["fact"] == "Test fact"
        assert index2["facts"][0]["source_session"] == "test.jsonl"

        # Check markdown was written
        md_path = Path(tmpdir) / "repos" / "test.md"
        assert md_path.exists(), "Markdown file should be created"

        print(" [PASS] knowledge store roundtrip works")


def test_full_chain_no_llm():
    """Test the full pipeline minus the LLM call."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
        f.write('{"role": "user", "content": "Clone compounding-intelligence", "timestamp": "2026-04-13T10:00:00Z"}\n')
        f.write('{"role": "assistant", "content": "Cloned successfully", "timestamp": "2026-04-13T10:00:05Z"}\n')
        session_path = f.name

    with tempfile.TemporaryDirectory() as knowledge_dir:
        # Step 1: Read
        messages = read_session(session_path)
        assert len(messages) == 2

        # Step 2: Extract conversation
        conv = extract_conversation(messages)
        assert len(conv) == 2

        # Step 3: Truncate
        truncated = truncate_for_context(conv, head=50, tail=50)

        # Step 4: Convert to text (this goes to the LLM)
        transcript = messages_to_text(truncated)
        assert "Clone compounding-intelligence" in transcript

        # Step 5-7: Would be LLM call, validate, deduplicate
        # We simulate LLM output here
        mock_facts = [
            {"fact": "compounding-intelligence repo was cloned", "category": "fact", "repo": "compounding-intelligence", "confidence": 0.9}
        ]
        valid = [f for f in mock_facts if validate_fact(f)]

        # Step 6: Deduplicate
        index = load_existing_knowledge(knowledge_dir)
        new_facts = deduplicate(valid, index.get("facts", []))
        assert len(new_facts) == 1

        # Step 7: Store
        from harvester import write_knowledge
        write_knowledge(index, new_facts, knowledge_dir, source_session=session_path)

        # Verify
        index2 = load_existing_knowledge(knowledge_dir)
        assert index2["total_facts"] == 1

    os.unlink(session_path)
    print(" [PASS] full chain (reader -> validate -> dedup -> store) works")


if __name__ == "__main__":
    print("Running harvester pipeline smoke tests...")
    test_session_reader()
    test_validate_fact()
    test_deduplicate()
    test_knowledge_store_roundtrip()
    test_full_chain_no_llm()
    print("\nAll tests passed.")
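`fact_fingerprint` is imported above but never exercised directly; the dedup test only requires that an exact repeat of a stored fact is dropped while new facts pass through. A minimal sketch of how `deduplicate` could key on a fingerprint (the normalization is an assumption):

```
# Sketch only: names match the harvester imports, internals are illustrative.

def fact_fingerprint(fact: dict) -> str:
    """Stable key for duplicate detection (normalization is assumed)."""
    return fact.get("fact", "").lower().strip()

def deduplicate(new_facts: list, existing_facts: list) -> list:
    """Keep only facts whose fingerprint has not been seen before."""
    seen = {fact_fingerprint(f) for f in existing_facts}
    unique = []
    for fact in new_facts:
        fp = fact_fingerprint(fact)
        if fp not in seen:
            seen.add(fp)
            unique.append(fact)
    return unique
```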
282
scripts/test_improvement_proposals.py
Normal file
@@ -0,0 +1,282 @@
#!/usr/bin/env python3
"""Tests for scripts/improvement_proposals.py — 15 tests."""

import json
import os
import sys
import tempfile

sys.path.insert(0, os.path.dirname(__file__) or ".")
import importlib.util

spec = importlib.util.spec_from_file_location(
    "ip", os.path.join(os.path.dirname(__file__) or ".", "improvement_proposals.py")
)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

analyze_sessions = mod.analyze_sessions
generate_proposals = mod.generate_proposals
format_proposals_markdown = mod.format_proposals_markdown
format_proposals_json = mod.format_proposals_json
_normalize_error = mod._normalize_error


# ── Helper to build test sessions ─────────────────────────────

def _make_session(session_id, repo="test-repo", errors=None, tool_calls=None, duration=30):
    return {
        "session_id": session_id,
        "repo": repo,
        "errors": [{"message": e} for e in (errors or [])],
        "tool_calls": tool_calls or [],
        "duration_minutes": duration,
        "messages": [],
    }


def _make_tool_calls(repeats):
    """Create tool call list with repeated tools."""
    calls = []
    for tool, count in repeats:
        for _ in range(count):
            calls.append({"tool": tool, "latency_ms": 100})
    return calls


# ── Tests ─────────────────────────────────────────────────────

def test_empty_sessions():
    patterns = analyze_sessions([])
    assert patterns == []
    print("PASS: test_empty_sessions")


def test_no_patterns_on_clean_sessions():
    sessions = [
        _make_session("s1", tool_calls=[{"tool": "read_file", "latency_ms": 50}]),
        _make_session("s2", tool_calls=[{"tool": "write_file", "latency_ms": 80}]),
    ]
    patterns = analyze_sessions(sessions)
    # No repeated errors, no slow tools, no retries
    assert len(patterns) == 0
    print("PASS: test_no_patterns_on_clean_sessions")


def test_repeated_error_detection():
    """Same error across 3+ sessions triggers pattern."""
    sessions = [
        _make_session(f"s{i}", errors=["ModuleNotFoundError: No module named bannerlord_trace"])
        for i in range(4)
    ]
    patterns = analyze_sessions(sessions)
    repeated = [p for p in patterns if p.pattern_type == "repeated_error"]
    assert len(repeated) == 1
    assert repeated[0].occurrences == 4
    print("PASS: test_repeated_error_detection")


def test_repeated_error_threshold():
    """2 occurrences should NOT trigger (threshold is 3)."""
    sessions = [
        _make_session("s1", errors=["TimeoutError: connection timed out"]),
        _make_session("s2", errors=["TimeoutError: connection timed out"]),
    ]
    patterns = analyze_sessions(sessions)
    repeated = [p for p in patterns if p.pattern_type == "repeated_error"]
    assert len(repeated) == 0
    print("PASS: test_repeated_error_threshold")


def test_slow_tool_detection():
    """Tool with avg latency > 5000ms across 5+ calls."""
    calls = [{"tool": "git_push", "latency_ms": 8000} for _ in range(10)]
    sessions = [_make_session("s1", tool_calls=calls)]
    patterns = analyze_sessions(sessions)
    slow = [p for p in patterns if p.pattern_type == "slow_tool"]
    assert len(slow) == 1
    assert "git_push" in slow[0].description
    print("PASS: test_slow_tool_detection")


def test_fast_tool_not_flagged():
    """Tool under 5000ms avg should not trigger."""
    calls = [{"tool": "read_file", "latency_ms": 50} for _ in range(10)]
    sessions = [_make_session("s1", tool_calls=calls)]
    patterns = analyze_sessions(sessions)
    slow = [p for p in patterns if p.pattern_type == "slow_tool"]
    assert len(slow) == 0
    print("PASS: test_fast_tool_not_flagged")


def test_failed_retry_detection():
    """3+ consecutive calls to same tool triggers retry pattern."""
    calls = _make_tool_calls([("execute_code", 5)])
    sessions = [_make_session("s1", tool_calls=calls)]
    sessions.extend([
        _make_session(f"s{i}", tool_calls=_make_tool_calls([("execute_code", 4)]))
        for i in range(2, 5)
    ])
    patterns = analyze_sessions(sessions)
    retries = [p for p in patterns if p.pattern_type == "failed_retry"]
    assert len(retries) >= 1
    print("PASS: test_failed_retry_detection")


def test_manual_process_detection():
    """10+ tool calls with <= 3 unique tools."""
    calls = _make_tool_calls([("terminal", 8), ("read_file", 5)])
    sessions = [_make_session("s1", tool_calls=calls, duration=25)]
    patterns = analyze_sessions(sessions)
    manual = [p for p in patterns if p.pattern_type == "manual_process"]
    assert len(manual) == 1
    print("PASS: test_manual_process_detection")


def test_generate_proposals_from_patterns():
    """Proposals generated from waste patterns."""
    sessions = [
        _make_session(f"s{i}", errors=["Error: push timeout"])
        for i in range(5)
    ]
    patterns = analyze_sessions(sessions)
    proposals = generate_proposals(patterns)
    assert len(proposals) >= 1
    assert proposals[0].estimated_monthly_hours_saved > 0
    assert proposals[0].priority in ("critical", "high", "medium", "low")
    print("PASS: test_generate_proposals_from_patterns")


def test_proposal_roi_positive():
    """ROI weeks should be a positive number for recoverable time."""
    patterns = [mod.WastePattern(
        pattern_type="repeated_error",
        description="Test error",
        occurrences=10,
        total_time_hours=5.0,
        affected_repos=["test"],
    )]
    proposals = generate_proposals(patterns)
    assert len(proposals) == 1
    assert proposals[0].roi_weeks > 0
    assert proposals[0].roi_weeks < 100
    print("PASS: test_proposal_roi_positive")


def test_proposals_sorted_by_impact():
    """Proposals should be sorted by monthly hours saved (descending)."""
    sessions = [
        _make_session("s1", errors=["Minor warning"] * 3, duration=5),
        _make_session("s2", errors=["Critical failure: deploy crashed"] * 5, duration=60),
    ]
    # Add more sessions to cross threshold
    for i in range(3, 7):
        sessions.append(_make_session(f"s{i}", errors=["Critical failure: deploy crashed"]))

    patterns = analyze_sessions(sessions)
    proposals = generate_proposals(patterns)
    if len(proposals) >= 2:
        for i in range(len(proposals) - 1):
            assert proposals[i].estimated_monthly_hours_saved >= proposals[i + 1].estimated_monthly_hours_saved
    print("PASS: test_proposals_sorted_by_impact")


def test_format_markdown():
    """Markdown output should contain expected sections."""
    patterns = [mod.WastePattern(
        pattern_type="repeated_error", description="Test", occurrences=5,
        total_time_hours=2.5, affected_repos=["repo"],
    )]
    proposals = generate_proposals(patterns)
    md = format_proposals_markdown(proposals, patterns, "2026-04-15T00:00:00Z")
    assert "# Improvement Proposals" in md
    assert "## Summary" in md
    assert "### Problem" in md
    assert "### ROI Estimate" in md
    assert "## Appendix" in md
    print("PASS: test_format_markdown")


def test_format_json():
    """JSON output should be valid and parseable."""
    patterns = [mod.WastePattern(
        pattern_type="slow_tool", description="Slow", occurrences=10,
        total_time_hours=3.0, affected_repos=["global"],
    )]
    proposals = generate_proposals(patterns)
    output = format_proposals_json(proposals)
    parsed = json.loads(output)
    assert isinstance(parsed, list)
    assert len(parsed) == len(proposals)
    assert "title" in parsed[0]
    assert "roi_weeks" in parsed[0]
    print("PASS: test_format_json")


def test_normalize_error():
    """Error normalization should remove paths and hashes."""
    err1 = _normalize_error("Failed to clone /Users/apayne/repo with token abc123def456")
    assert "/PATH" in err1
    assert "HASH" in err1
    assert "/Users/apayne" not in err1

    err2 = _normalize_error("")
    assert err2 == ""

    err3 = _normalize_error("Simple error message")
    assert "simple error" in err3
    print("PASS: test_normalize_error")


def test_cli_integration():
    """End-to-end test: write input JSON, run script, check output."""
    import subprocess

    sessions = [
        _make_session(f"s{i}", errors=["Connection refused: port 8080"])
        for i in range(5)
    ]

    with tempfile.TemporaryDirectory() as tmpdir:
        input_path = os.path.join(tmpdir, "analytics.json")
        output_path = os.path.join(tmpdir, "proposals.md")

        with open(input_path, "w") as f:
            json.dump({"sessions": sessions}, f)

        script = os.path.join(os.path.dirname(__file__) or ".", "improvement_proposals.py")
        result = subprocess.run(
            [sys.executable, script, "--input", input_path, "--output", output_path],
            capture_output=True, text=True, timeout=10,
        )

        assert result.returncode == 0, f"CLI failed: {result.stderr}"
        assert os.path.exists(output_path)

        with open(output_path) as f:
            content = f.read()
        assert "# Improvement Proposals" in content
    print("PASS: test_cli_integration")


def run_all():
    test_empty_sessions()
    test_no_patterns_on_clean_sessions()
    test_repeated_error_detection()
    test_repeated_error_threshold()
    test_slow_tool_detection()
    test_fast_tool_not_flagged()
    test_failed_retry_detection()
    test_manual_process_detection()
    test_generate_proposals_from_patterns()
    test_proposal_roi_positive()
    test_proposals_sorted_by_impact()
    test_format_markdown()
    test_format_json()
    test_normalize_error()
    test_cli_integration()
    print("\nAll 15 tests passed!")


if __name__ == "__main__":
    run_all()
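test_normalize_error fixes three behaviors: absolute paths collapse to `/PATH`, long hex-like tokens collapse to `HASH`, and the message is lowercased so identical failures group together. One regex sketch that satisfies those assertions; the script's real patterns may be broader:

```
# Sketch only: the regexes are assumptions pinned to the test's assertions.
import re

def _normalize_error(message: str) -> str:
    """Collapse volatile details so repeated failures hash to one pattern."""
    if not message:
        return ""
    msg = message.lower()
    msg = re.sub(r"(/[\w.~-]+)+", "/PATH", msg)      # absolute paths -> /PATH
    msg = re.sub(r"\b[0-9a-f]{8,}\b", "HASH", msg)   # long hex tokens -> HASH
    return msg

# "Failed to clone /Users/apayne/repo with token abc123def456"
# -> "failed to clone /PATH with token HASH"
```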
129
scripts/test_knowledge_staleness.py
Normal file
@@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""Tests for scripts/knowledge_staleness_check.py — 8 tests."""

import json
import os
import sys
import tempfile

sys.path.insert(0, os.path.dirname(__file__) or ".")
import importlib.util
spec = importlib.util.spec_from_file_location("ks", os.path.join(os.path.dirname(__file__) or ".", "knowledge_staleness_check.py"))
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
check_staleness = mod.check_staleness
fix_hashes = mod.fix_hashes
compute_file_hash = mod.compute_file_hash


def test_fresh_entry():
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "source.py")
        with open(src, "w") as f:
            f.write("print('hello')")
        h = compute_file_hash(src)
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [{"fact": "hello", "source_file": "source.py", "source_hash": h}]}, f)
        results = check_staleness(idx, tmpdir)
        assert results[0]["status"] == "fresh"
        print("PASS: test_fresh_entry")


def test_stale_entry():
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "source.py")
        with open(src, "w") as f:
            f.write("original content")
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [{"fact": "old", "source_file": "source.py", "source_hash": "sha256:wrong"}]}, f)
        # Now change the source
        with open(src, "w") as f:
            f.write("modified content")
        results = check_staleness(idx, tmpdir)
        assert results[0]["status"] == "stale"
        print("PASS: test_stale_entry")


def test_missing_source():
    with tempfile.TemporaryDirectory() as tmpdir:
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [{"fact": "gone", "source_file": "nonexistent.py", "source_hash": "sha256:abc"}]}, f)
        results = check_staleness(idx, tmpdir)
        assert results[0]["status"] == "missing_source"
        print("PASS: test_missing_source")


def test_no_hash():
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "source.py")
        with open(src, "w") as f:
            f.write("content")
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [{"fact": "no hash", "source_file": "source.py"}]}, f)
        results = check_staleness(idx, tmpdir)
        assert results[0]["status"] == "no_hash"
        assert results[0]["current_hash"].startswith("sha256:")
        print("PASS: test_no_hash")


def test_no_source_field():
    with tempfile.TemporaryDirectory() as tmpdir:
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [{"fact": "orphan"}]}, f)
        results = check_staleness(idx, tmpdir)
        assert results[0]["status"] == "no_source"
        print("PASS: test_no_source_field")


def test_fix_hashes():
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "source.py")
        with open(src, "w") as f:
            f.write("content for hashing")
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [{"fact": "needs hash", "source_file": "source.py"}]}, f)
        fixed = fix_hashes(idx, tmpdir)
        assert fixed == 1
        # Verify hash was added
        with open(idx) as f:
            data = json.load(f)
        assert data["facts"][0]["source_hash"].startswith("sha256:")
        print("PASS: test_fix_hashes")


def test_empty_index():
    with tempfile.TemporaryDirectory() as tmpdir:
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": []}, f)
        results = check_staleness(idx, tmpdir)
        assert results == []
        print("PASS: test_empty_index")


def test_compute_hash_nonexistent():
    h = compute_file_hash("/nonexistent/path/file.py")
    assert h is None
    print("PASS: test_compute_hash_nonexistent")


def run_all():
    test_fresh_entry()
    test_stale_entry()
    test_missing_source()
    test_no_hash()
    test_no_source_field()
    test_fix_hashes()
    test_empty_index()
    test_compute_hash_nonexistent()
    print("\nAll 8 tests passed!")


if __name__ == "__main__":
    run_all()
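The five statuses these tests expect (`fresh`, `stale`, `missing_source`, `no_hash`, `no_source`) all fall out of one comparison: hash the source file now and compare against the recorded `source_hash`. A compact sketch under those assumptions (`status_for` is a hypothetical helper, not the script's API):

```
# Sketch only: compute_file_hash matches the tested contract; status_for is assumed.
import hashlib
import os

def compute_file_hash(path: str):
    """sha256-prefixed digest of a file, or None if it doesn't exist."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        return f"sha256:{hashlib.sha256(f.read()).hexdigest()}"

def status_for(fact: dict, base_dir: str) -> str:
    if "source_file" not in fact:
        return "no_source"
    current = compute_file_hash(os.path.join(base_dir, fact["source_file"]))
    if current is None:
        return "missing_source"
    if "source_hash" not in fact:
        return "no_hash"
    return "fresh" if fact["source_hash"] == current else "stale"
```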
170
scripts/test_pr_complexity_scorer.py
Normal file
@@ -0,0 +1,170 @@
#!/usr/bin/env python3
"""
Tests for PR Complexity Scorer — unit tests for the scoring logic.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent))

from pr_complexity_scorer import (
    score_pr,
    is_dependency_file,
    is_test_file,
    TIME_PER_POINT,
    SMALL_FILES,
    MEDIUM_FILES,
    LARGE_FILES,
    SMALL_LINES,
    MEDIUM_LINES,
    LARGE_LINES,
)

PASS = 0
FAIL = 0

def test(name):
    def decorator(fn):
        global PASS, FAIL
        try:
            fn()
            PASS += 1
            print(f" [PASS] {name}")
        except AssertionError as e:
            FAIL += 1
            print(f" [FAIL] {name}: {e}")
        except Exception as e:
            FAIL += 1
            print(f" [FAIL] {name}: Unexpected error: {e}")
    return decorator

def assert_eq(a, b, msg=""):
    if a != b:
        raise AssertionError(f"{msg} expected {b!r}, got {a!r}")

def assert_true(v, msg=""):
    if not v:
        raise AssertionError(msg or "Expected True")

def assert_false(v, msg=""):
    if v:
        raise AssertionError(msg or "Expected False")


print("=== PR Complexity Scorer Tests ===\n")

print("-- File Classification --")

@test("dependency file detection — requirements.txt")
def _():
    assert_true(is_dependency_file("requirements.txt"))
    assert_true(is_dependency_file("src/requirements.txt"))
    assert_false(is_dependency_file("requirements_test.txt"))

@test("dependency file detection — pyproject.toml")
def _():
    assert_true(is_dependency_file("pyproject.toml"))
    assert_false(is_dependency_file("myproject.py"))

@test("test file detection — pytest style")
def _():
    assert_true(is_test_file("tests/test_api.py"))
    assert_true(is_test_file("test_module.py"))
    assert_true(is_test_file("src/module_test.py"))

@test("test file detection — other frameworks")
def _():
    assert_true(is_test_file("spec/feature_spec.rb"))
    assert_true(is_test_file("__tests__/component.test.js"))
    assert_false(is_test_file("testfixtures/helper.py"))


print("\n-- Scoring Logic --")

@test("small PR gets low score (1-3)")
def _():
    score, minutes, _ = score_pr(
        files_changed=3,
        additions=50,
        deletions=10,
        has_dependency_changes=False,
        test_coverage_delta=None
    )
    assert_true(1 <= score <= 3, f"Score should be low, got {score}")
    assert_true(minutes < 20)

@test("medium PR gets medium score (4-6)")
def _():
    score, minutes, _ = score_pr(
        files_changed=15,
        additions=400,
        deletions=100,
        has_dependency_changes=False,
        test_coverage_delta=None
    )
    assert_true(4 <= score <= 6, f"Score should be medium, got {score}")
    assert_true(20 <= minutes <= 45)

@test("large PR gets high score (7-9)")
def _():
    score, minutes, _ = score_pr(
        files_changed=60,
        additions=3000,
        deletions=1500,
        has_dependency_changes=True,
        test_coverage_delta=None
    )
    assert_true(7 <= score <= 9, f"Score should be high, got {score}")
    assert_true(minutes >= 45)

@test("dependency changes boost score")
def _():
    base_score, _, _ = score_pr(
        files_changed=10, additions=200, deletions=50,
        has_dependency_changes=False, test_coverage_delta=None
    )
    dep_score, _, _ = score_pr(
        files_changed=10, additions=200, deletions=50,
        has_dependency_changes=True, test_coverage_delta=None
    )
    assert_true(dep_score > base_score, f"Deps: {base_score} -> {dep_score}")

@test("adding tests lowers complexity")
def _():
    base_score, _, _ = score_pr(
        files_changed=8, additions=150, deletions=20,
        has_dependency_changes=False, test_coverage_delta=None
    )
    better_score, _, _ = score_pr(
        files_changed=8, additions=180, deletions=20,
|
||||
has_dependency_changes=False, test_coverage_delta=3
|
||||
)
|
||||
assert_true(better_score < base_score, f"Tests: {base_score} -> {better_score}")
|
||||
|
||||
@test("removing tests increases complexity")
|
||||
def _():
|
||||
base_score, _, _ = score_pr(
|
||||
files_changed=8, additions=150, deletions=20,
|
||||
has_dependency_changes=False, test_coverage_delta=None
|
||||
)
|
||||
worse_score, _, _ = score_pr(
|
||||
files_changed=8, additions=150, deletions=20,
|
||||
has_dependency_changes=False, test_coverage_delta=-2
|
||||
)
|
||||
assert_true(worse_score > base_score, f"Remove tests: {base_score} -> {worse_score}")
|
||||
|
||||
@test("score bounded 1-10")
|
||||
def _():
|
||||
for files, adds, dels in [(1, 10, 5), (100, 10000, 5000)]:
|
||||
score, _, _ = score_pr(files, adds, dels, False, None)
|
||||
assert_true(1 <= score <= 10, f"Score {score} out of range")
|
||||
|
||||
@test("estimated minutes exist for all scores")
|
||||
def _():
|
||||
for s in range(1, 11):
|
||||
assert_true(s in TIME_PER_POINT, f"Missing time for score {s}")
|
||||
|
||||
|
||||
print(f"\n=== Results: {PASS} passed, {FAIL} failed ===")
|
||||
sys.exit(0 if FAIL == 0 else 1)
|
||||
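`pr_complexity_scorer.py` itself is not shown in this diff. A sketch that satisfies the invariants these tests assert might look like the following; the threshold constants and the `TIME_PER_POINT` table are placeholder values chosen to pass the tests, not the module's real ones:

```python
# Placeholder thresholds — the real values live in pr_complexity_scorer.py.
SMALL_FILES, MEDIUM_FILES, LARGE_FILES = 5, 20, 50
SMALL_LINES, MEDIUM_LINES, LARGE_LINES = 100, 600, 2000
TIME_PER_POINT = {1: 5, 2: 10, 3: 15, 4: 25, 5: 30,
                  6: 40, 7: 50, 8: 65, 9: 80, 10: 100}  # score -> review minutes


def score_pr(files_changed, additions, deletions,
             has_dependency_changes, test_coverage_delta):
    """Return (score 1-10, estimated review minutes, reasons)."""
    reasons = []
    lines = additions + deletions

    # Size components: more files touched and more lines changed each add points.
    if files_changed <= SMALL_FILES:
        score = 1
    elif files_changed <= MEDIUM_FILES:
        score = 2
    elif files_changed <= LARGE_FILES:
        score = 3
    else:
        score = 4
    if lines <= SMALL_LINES:
        score += 1
    elif lines <= MEDIUM_LINES:
        score += 2
    elif lines <= LARGE_LINES:
        score += 3
    else:
        score += 4

    # Dependency changes widen the blast radius.
    if has_dependency_changes:
        score += 1
        reasons.append("dependency changes")
    # Added test coverage lowers risk; removed coverage raises it.
    if test_coverage_delta is not None:
        if test_coverage_delta > 0:
            score -= 1
            reasons.append("adds tests")
        elif test_coverage_delta < 0:
            score += 1
            reasons.append("removes tests")

    score = max(1, min(10, score))
    return score, TIME_PER_POINT[score], reasons
```

With these placeholders, a 3-file/60-line PR scores 2 (10 minutes) and a 60-file/4,500-line PR with dependency changes scores 9 (80 minutes), matching the small/large bands the tests require.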
305
scripts/test_priority_rebalancer.py
Normal file
@@ -0,0 +1,305 @@
#!/usr/bin/env python3
"""
Tests for Priority Rebalancer
"""

import json
import os
import sys
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

# Add script dir to path
sys.path.insert(0, str(Path(__file__).parent))

from priority_rebalancer import (
    GiteaClient,
    IssueScore,
    PipelineSignal,
    compute_issue_score,
    collect_knowledge_signals,
    collect_metrics_signals,
    extract_priority,
    generate_report,
    generate_markdown_report,
    PRIORITY_LEVELS,
)

# ============================================================
# Test Helpers
# ============================================================

PASS = 0
FAIL = 0


def test(name):
    def decorator(fn):
        global PASS, FAIL
        try:
            fn()
            PASS += 1
            print(f"  [PASS] {name}")
        except Exception as e:
            FAIL += 1
            print(f"  [FAIL] {name}: {e}")
    return decorator


def assert_eq(a, b, msg=""):
    if a != b:
        raise AssertionError(f"{msg} expected {b!r}, got {a!r}")


def assert_true(v, msg=""):
    if not v:
        raise AssertionError(msg or "Expected True")


def assert_false(v, msg=""):
    if v:
        raise AssertionError(msg or "Expected False")


# ============================================================
# Priority Extraction Tests
# ============================================================

print("=== Priority Rebalancer Tests ===\n")

print("-- Priority Extraction --")

@test("extract P0 from label")
def _():
    assert_eq(extract_priority(["P0", "bug"]), "P0")

@test("extract P1 from priority:high")
def _():
    assert_eq(extract_priority(["priority:high"]), "P1")

@test("extract P2 from priority:medium")
def _():
    assert_eq(extract_priority(["priority:medium"]), "P2")

@test("extract P3 from priority:low")
def _():
    assert_eq(extract_priority(["priority:low"]), "P3")

@test("returns None for no priority")
def _():
    assert_eq(extract_priority(["bug", "enhancement"]), None)

@test("case insensitive")
def _():
    assert_eq(extract_priority(["p1"]), "P1")
    assert_eq(extract_priority(["PRIORITY:CRITICAL"]), "P0")


# ============================================================
# Issue Scoring Tests
# ============================================================

print("\n-- Issue Scoring --")

def make_issue(**kwargs):
    defaults = {
        "number": 1,
        "title": "Test issue",
        "labels": [],
        "created_at": (datetime.utcnow() - timedelta(days=5)).isoformat() + "Z",
        "comments": 0,
        "assignees": None,
    }
    defaults.update(kwargs)
    return defaults

@test("bug gets score boost")
def _():
    issue = make_issue(title="Incorrect output format", labels=["bug"])
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_true(score.score > 0, f"Bug should boost score, got {score.score}")
    # Bug label alone should be P2 or P3 (not P0)
    assert_true(score.suggested_priority in ("P2", "P3"),
                f"Bug label alone should be P2/P3, got {score.suggested_priority}")

@test("security gets high score")
def _():
    issue = make_issue(title="Security: auth bypass", labels=["bug"])
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_true(score.score >= 25, f"Security should score high, got {score.score}")

@test("old dormant issue gets penalized")
def _():
    issue = make_issue(
        title="Some old feature",
        created_at=(datetime.utcnow() - timedelta(days=120)).isoformat() + "Z",
        comments=0
    )
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_true(score.score < 0, f"Old dormant should be negative, got {score.score}")
    assert_true(any("Dormant" in r for r in score.reasons), "Should mention dormancy")

@test("active discussion boosts score")
def _():
    issue = make_issue(title="Important fix", comments=8)
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_true(score.score > 5, f"Active discussion should boost, got {score.score}")
    assert_true(any("Active" in r for r in score.reasons))

@test("unassigned gets slight boost")
def _():
    issue = make_issue(title="Fix bug", assignees=None)
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_true(any("Unassigned" in r for r in score.reasons))

@test("assigned issue notes assignee")
def _():
    issue = make_issue(title="Fix bug", assignees=[{"login": "alice"}])
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_eq(score.assignee, "alice")

@test("nice-to-have gets penalized")
def _():
    issue = make_issue(title="Nice to have: fancy animation")
    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
    assert_true(score.score < 0, f"Nice-to-have should be negative, got {score.score}")


# ============================================================
# Pipeline Signal Tests
# ============================================================

print("\n-- Pipeline Signals --")

@test("signal alignment boosts matching issues")
def _():
    signals = [PipelineSignal(
        source="knowledge",
        signal_type="stale_knowledge",
        weight=0.8,
        detail="20 stale facts"
    )]
    issue = make_issue(title="Fix stale knowledge entries")
    score = compute_issue_score(issue, "test-repo", signals, datetime.utcnow())
    assert_true(any("Matches signal" in r for r in score.reasons))

@test("empty knowledge boosts harvester issues")
def _():
    signals = [PipelineSignal(
        source="knowledge",
        signal_type="empty_knowledge",
        weight=0.7,
        detail="0 facts"
    )]
    issue = make_issue(title="Implement harvester pipeline")
    score = compute_issue_score(issue, "test-repo", signals, datetime.utcnow())
    assert_true(any("Critical gap" in r for r in score.reasons))


# ============================================================
# Knowledge Signal Collection Tests
# ============================================================

print("\n-- Knowledge Signal Collection --")

@test("missing index generates signal")
def _():
    with tempfile.TemporaryDirectory() as tmpdir:
        signals = collect_knowledge_signals(tmpdir)
        assert_true(len(signals) > 0)
        assert_eq(signals[0].signal_type, "missing_index")

@test("empty knowledge generates signal")
def _():
    with tempfile.TemporaryDirectory() as tmpdir:
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": []}, f)
        signals = collect_knowledge_signals(tmpdir)
        assert_true(any(s.signal_type == "empty_knowledge" for s in signals))

@test("corrupt index generates signal")
def _():
    with tempfile.TemporaryDirectory() as tmpdir:
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            f.write("not json {{{")
        signals = collect_knowledge_signals(tmpdir)
        assert_true(any(s.signal_type == "corrupt_index" for s in signals))

@test("knowledge with facts passes")
def _():
    with tempfile.TemporaryDirectory() as tmpdir:
        idx = os.path.join(tmpdir, "index.json")
        with open(idx, "w") as f:
            json.dump({"facts": [
                {"id": 1, "repo": "test", "status": "fresh"},
                {"id": 2, "repo": "test", "status": "fresh"},
            ]}, f)
        signals = collect_knowledge_signals(tmpdir)
        # Should not generate missing_index or empty_knowledge
        assert_false(any(s.signal_type in ("missing_index", "empty_knowledge") for s in signals))


# ============================================================
# Metrics Signal Collection Tests
# ============================================================

print("\n-- Metrics Signal Collection --")

@test("empty metrics dir generates signal")
def _():
    with tempfile.TemporaryDirectory() as tmpdir:
        signals = collect_metrics_signals(tmpdir)
        assert_true(any(s.signal_type == "no_metrics" for s in signals))

@test("metrics with files passes")
def _():
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create files (simulating real metrics dir with .gitkeep + actual files)
        with open(os.path.join(tmpdir, ".gitkeep"), "w") as f:
            f.write("")
        with open(os.path.join(tmpdir, "report.json"), "w") as f:
            f.write("{}")
        signals = collect_metrics_signals(tmpdir)
        assert_false(any(s.signal_type == "no_metrics" for s in signals))


# ============================================================
# Report Generation Tests
# ============================================================

print("\n-- Report Generation --")

@test("report has correct structure")
def _():
    scores = [
        IssueScore(1, "repo1", "Bug fix", ["bug"], None, "P1", 30.0, ["test"], 5, 3, None),
        IssueScore(2, "repo1", "Feature", ["enhancement"], "P3", None, -5.0, ["test"], 60, 0, "alice"),
    ]
    signals = [PipelineSignal("knowledge", "stale_knowledge", 0.5, "10 stale")]
    report = generate_report(scores, signals, "test-org", ["repo1"])

    assert_eq(report["org"], "test-org")
    assert_eq(report["total_issues"], 2)
    assert_true("generated_at" in report)
    assert_true("summary" in report)
    assert_true("top_priority" in report)
    assert_eq(report["summary"]["suggested_new_priorities"], 1)

@test("markdown report is non-empty")
def _():
    scores = [IssueScore(1, "repo1", "Test", ["bug"], None, "P2", 15.0, ["reason"], 5, 0, None)]
    report = generate_report(scores, [], "test-org", ["repo1"])
    md = generate_markdown_report(report)
    assert_true(len(md) > 100)
    assert_true("Priority Rebalancer Report" in md)
    assert_true("Top 10" in md)


# ============================================================
# Summary
# ============================================================

print(f"\n=== Summary ===")
print(f"Total: {PASS + FAIL} | Passed: {PASS} | Failed: {FAIL}")

if FAIL > 0:
    sys.exit(1)
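The label-to-priority mapping is fully pinned down by the six extraction tests above. A sketch that passes them (the exact synonym table in `priority_rebalancer.py` may differ):

```python
PRIORITY_MAP = {
    "p0": "P0", "priority:critical": "P0",
    "p1": "P1", "priority:high": "P1",
    "p2": "P2", "priority:medium": "P2",
    "p3": "P3", "priority:low": "P3",
}


def extract_priority(labels):
    # Case-insensitive lookup; first matching label wins, else None.
    for label in labels:
        normalized = label.lower()
        if normalized in PRIORITY_MAP:
            return PRIORITY_MAP[normalized]
    return None
```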
242
scripts/test_refactoring_opportunity_finder.py
Normal file
@@ -0,0 +1,242 @@
#!/usr/bin/env python3
"""Tests for scripts/refactoring_opportunity_finder.py — 10 tests."""

import json
import os
import sys
import tempfile

sys.path.insert(0, os.path.dirname(__file__) or ".")
import importlib.util
spec = importlib.util.spec_from_file_location(
    "rof", os.path.join(os.path.dirname(__file__) or ".", "refactoring_opportunity_finder.py"))
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

compute_file_complexity = mod.compute_file_complexity
calculate_refactoring_score = mod.calculate_refactoring_score
FileMetrics = mod.FileMetrics


def test_complexity_simple_function():
    """Simple function should have low complexity."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("""
def simple():
    return 42
""")
        f.flush()
        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
    assert avg == 1.0, f"Expected 1.0, got {avg}"
    assert max_c == 1, f"Expected 1, got {max_c}"
    assert funcs == 1, f"Expected 1, got {funcs}"
    assert classes == 0, f"Expected 0, got {classes}"
    os.unlink(f.name)
    print("PASS: test_complexity_simple_function")


def test_complexity_with_conditionals():
    """Function with if/else should have higher complexity."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("""
def complex_func(x):
    if x > 0:
        if x > 10:
            return "big"
        else:
            return "small"
    elif x < 0:
        return "negative"
    else:
        return "zero"
""")
        f.flush()
        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
    # Base 1 + 3 if/elif + 1 nested if = 5
    assert max_c >= 4, f"Expected max_c >= 4, got {max_c}"
    assert funcs == 1, f"Expected 1, got {funcs}"
    os.unlink(f.name)
    print("PASS: test_complexity_with_conditionals")


def test_complexity_with_loops():
    """Function with loops should increase complexity."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("""
def loop_func(items):
    result = []
    for item in items:
        if item > 0:
            result.append(item)
    while len(result) > 10:
        result.pop()
    return result
""")
        f.flush()
        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
    # Base 1 + 1 for + 1 if + 1 while = 4
    assert max_c >= 3, f"Expected max_c >= 3, got {max_c}"
    os.unlink(f.name)
    print("PASS: test_complexity_with_loops")


def test_complexity_with_class():
    """Class with methods should count both."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("""
class MyClass:
    def method1(self):
        if True:
            pass

    def method2(self):
        for i in range(10):
            pass
""")
        f.flush()
        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
    assert classes == 1, f"Expected 1 class, got {classes}"
    assert funcs == 2, f"Expected 2 functions, got {funcs}"
    os.unlink(f.name)
    print("PASS: test_complexity_with_class")


def test_complexity_syntax_error():
    """File with syntax error should return zeros."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("def broken(:\n pass")
        f.flush()
        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
    assert avg == 0.0, f"Expected 0.0, got {avg}"
    assert funcs == 0, f"Expected 0, got {funcs}"
    os.unlink(f.name)
    print("PASS: test_complexity_syntax_error")


def test_refactoring_score_high_complexity():
    """High complexity should give high score."""
    metrics = FileMetrics(
        path="test.py",
        lines=200,
        complexity=15.0,
        max_complexity=25,
        functions=10,
        classes=2,
        churn_30d=5,
        churn_90d=15,
        test_coverage=0.3,
        refactoring_score=0.0
    )
    score = calculate_refactoring_score(metrics)
    assert score > 50, f"Expected score > 50, got {score}"
    print("PASS: test_refactoring_score_high_complexity")


def test_refactoring_score_low_complexity():
    """Low complexity should give lower score."""
    metrics = FileMetrics(
        path="test.py",
        lines=50,
        complexity=2.0,
        max_complexity=3,
        functions=3,
        classes=0,
        churn_30d=0,
        churn_90d=1,
        test_coverage=0.9,
        refactoring_score=0.0
    )
    score = calculate_refactoring_score(metrics)
    assert score < 30, f"Expected score < 30, got {score}"
    print("PASS: test_refactoring_score_low_complexity")


def test_refactoring_score_high_churn():
    """High churn should increase score."""
    metrics = FileMetrics(
        path="test.py",
        lines=100,
        complexity=5.0,
        max_complexity=8,
        functions=5,
        classes=0,
        churn_30d=10,
        churn_90d=20,
        test_coverage=0.5,
        refactoring_score=0.0
    )
    score = calculate_refactoring_score(metrics)
    # Churn should contribute significantly
    assert score > 40, f"Expected score > 40 for high churn, got {score}"
    print("PASS: test_refactoring_score_high_churn")


def test_refactoring_score_no_coverage():
    """No coverage data should assume medium risk."""
    metrics = FileMetrics(
        path="test.py",
        lines=100,
        complexity=5.0,
        max_complexity=8,
        functions=5,
        classes=0,
        churn_30d=1,
        churn_90d=2,
        test_coverage=None,
        refactoring_score=0.0
    )
    score = calculate_refactoring_score(metrics)
    # Should have some score from the 5-point coverage component
    assert score > 0, f"Expected positive score, got {score}"
    print("PASS: test_refactoring_score_no_coverage")


def test_refactoring_score_large_file():
    """Large files should score higher."""
    metrics_small = FileMetrics(
        path="small.py",
        lines=50,
        complexity=5.0,
        max_complexity=8,
        functions=3,
        classes=0,
        churn_30d=1,
        churn_90d=2,
        test_coverage=0.8,
        refactoring_score=0.0
    )
    metrics_large = FileMetrics(
        path="large.py",
        lines=1000,
        complexity=5.0,
        max_complexity=8,
        functions=3,
        classes=0,
        churn_30d=1,
        churn_90d=2,
        test_coverage=0.8,
        refactoring_score=0.0
    )
    score_small = calculate_refactoring_score(metrics_small)
    score_large = calculate_refactoring_score(metrics_large)
    assert score_large > score_small, \
        f"Large file ({score_large}) should score higher than small ({score_small})"
    print("PASS: test_refactoring_score_large_file")


def run_all():
    test_complexity_simple_function()
    test_complexity_with_conditionals()
    test_complexity_with_loops()
    test_complexity_with_class()
    test_complexity_syntax_error()
    test_refactoring_score_high_complexity()
    test_refactoring_score_low_complexity()
    test_refactoring_score_high_churn()
    test_refactoring_score_no_coverage()
    test_refactoring_score_large_file()
    print("\nAll 10 tests passed!")


if __name__ == "__main__":
    run_all()
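`refactoring_opportunity_finder.py` is loaded via `importlib` above but its body is not in this diff. One way to satisfy the complexity assertions is an `ast`-based cyclomatic count; this is a sketch under that assumption, not the module's actual implementation — in particular, the set of branch nodes counted is a guess:

```python
import ast


def compute_file_complexity(path):
    """Return (avg_complexity, max_complexity, n_functions, n_classes, n_lines).

    Cyclomatic complexity per function: 1 + one point per branch node.
    On a syntax (or read) error, return all zeros, as the tests require.
    """
    try:
        with open(path) as f:
            source = f.read()
        tree = ast.parse(source)
    except (SyntaxError, OSError):
        return 0.0, 0, 0, 0, 0

    # Assumed branch set; a real implementation might count more node types.
    branch_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                    ast.BoolOp, ast.IfExp)
    complexities = []
    n_classes = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            n_classes += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1 + sum(isinstance(child, branch_nodes)
                            for child in ast.walk(node))
            complexities.append(score)

    n_funcs = len(complexities)
    avg = sum(complexities) / n_funcs if n_funcs else 0.0
    return avg, max(complexities, default=0), n_funcs, n_classes, len(source.splitlines())
```

Under this counting, `complex_func` above scores 1 + 3 `If` nodes = 4 and `loop_func` scores 1 + `for` + `if` + `while` = 4, which clears the `>= 4` and `>= 3` bounds the tests assert.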
90
scripts/test_session_pair_harvester.py
Normal file
@@ -0,0 +1,90 @@
#!/usr/bin/env python3
"""Tests for session_pair_harvester."""

import json
import sys
import os
import tempfile

sys.path.insert(0, os.path.dirname(__file__))
from session_pair_harvester import extract_pairs_from_session, deduplicate_pairs, compute_hash


def test_basic_extraction():
    session = {
        "id": "test_001",
        "model": "test-model",
        "conversations": [
            {"from": "system", "value": "You are helpful."},
            {"from": "human", "value": "What is Python?"},
            {"from": "gpt", "value": "Python is a high-level programming language known for its readability and versatility. It supports multiple paradigms including procedural, object-oriented, and functional programming. Python is widely used in web development, data science, machine learning, and automation."},
        ]
    }
    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=10)
    assert len(pairs) == 1
    assert pairs[0]["terse"] == "What is Python?"
    assert "programming language" in pairs[0]["rich"]
    assert pairs[0]["source"] == "test_001"
    print("PASS: test_basic_extraction")


def test_filters_short_responses():
    session = {
        "id": "test_002",
        "model": "test",
        "conversations": [
            {"from": "human", "value": "Hi"},
            {"from": "gpt", "value": "Hello!"},
        ]
    }
    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=20)
    assert len(pairs) == 0
    print("PASS: test_filters_short_responses")


def test_skips_tool_results():
    session = {
        "id": "test_003",
        "model": "test",
        "conversations": [
            {"from": "human", "value": '{"output": "file content", "exit_code": 0}'},
            {"from": "gpt", "value": "The file was read successfully. Now let me analyze the content and provide a detailed summary of what was found in the file system."},
        ]
    }
    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=10)
    assert len(pairs) == 0
    print("PASS: test_skips_tool_results")


def test_deduplication():
    pairs = [
        {"terse": "What is X?", "rich": "X is Y.", "source": "s1", "model": "m"},
        {"terse": "What is X?", "rich": "X is Y.", "source": "s2", "model": "m"},
        {"terse": "What is Z?", "rich": "Z is W.", "source": "s1", "model": "m"},
    ]
    unique = deduplicate_pairs(pairs)
    assert len(unique) == 2
    print("PASS: test_deduplication")


def test_ratio_filter():
    session = {
        "id": "test_005",
        "model": "test",
        "conversations": [
            {"from": "human", "value": "Explain quantum computing in detail with examples and applications"},
            {"from": "gpt", "value": "OK."},
        ]
    }
    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=10)
    assert len(pairs) == 0  # response too short relative to prompt
    print("PASS: test_ratio_filter")


if __name__ == "__main__":
    test_basic_extraction()
    test_filters_short_responses()
    test_skips_tool_results()
    test_deduplication()
    test_ratio_filter()
    print("\nAll tests passed.")
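`session_pair_harvester.py` itself is outside this excerpt. The dedup behavior the tests pin down is first-wins on identical (terse, rich) content regardless of source or model; a sketch under that assumption (the `compute_hash` signature and hash input are guesses — the test imports it but never calls it directly):

```python
import hashlib


def compute_hash(terse, rich):
    # Stable content key over the pair text, ignoring source/model metadata.
    key = f"{terse.strip()}\n{rich.strip()}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def deduplicate_pairs(pairs):
    # Keep the first occurrence of each (terse, rich) pair, drop later repeats.
    seen = set()
    unique = []
    for pair in pairs:
        key = compute_hash(pair["terse"], pair["rich"])
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique
```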
@@ -2,98 +2,107 @@
 ## System Prompt

-You are a knowledge extraction engine. Your task is to analyze a session transcript and extract durable knowledge that will help future sessions be more efficient.
+You are a knowledge extraction engine. You read session transcripts and output ONLY structured JSON. You never infer. You never assume. You extract only what the transcript explicitly states.

-## Instructions
+## Prompt

-Read the session transcript carefully. Extract ONLY information that is explicitly stated in the transcript. Do NOT infer, assume, or hallucinate information.
-
-### Categories
-
-Extract knowledge into these categories:
-
-1. **fact**: Concrete, verifiable information learned (e.g., "Repository X has 5 files", "API returns JSON with field Y")
-2. **pitfall**: Errors encountered, wrong assumptions, things that wasted time (e.g., "Assumed API token was in env var GITEA_TOKEN, but it's in ~/.config/gitea/token")
-3. **pattern**: Successful sequences of actions (e.g., "To deploy: 1. Run tests 2. Build 3. Push to Gitea 4. Trigger webhook")
-4. **tool-quirk**: Environment-specific behaviors (e.g., "Token paths are different on macOS vs Linux", "URL format requires trailing slash")
-5. **question**: Things identified but not answered (e.g., "Need to determine optimal batch size for harvesting")
-
-### Output Format
-
-Return a JSON object with an array of extracted knowledge items. Each item must have:
-
-```json
-{
-  "fact": "One sentence description of the knowledge",
-  "category": "fact|pitfall|pattern|tool-quirk|question",
-  "repo": "Repository name this applies to, or 'global' if general",
-  "confidence": 0.0-1.0
-}
-```
+TASK: Extract durable knowledge from this session transcript.

-### Confidence Scoring
+RULES:
+1. Extract ONLY information explicitly stated in the transcript.
+2. Do NOT infer, assume, or hallucinate.
+3. Every fact must be verifiable by pointing to a specific line in the transcript.
+4. If the session failed or was partial, extract pitfalls and questions — these are the most valuable.
+5. Be specific. "Gitea API is slow" is worthless. "Gitea issues endpoint with state=open returns empty when limit=50 but works with limit=5" is knowledge.

-- 0.9-1.0: Explicitly stated with verification (e.g., "Error message shows X")
-- 0.7-0.8: Clearly implied by multiple data points
-- 0.5-0.6: Suggested but not fully verified
-- 0.3-0.4: Inferred from limited data
-- 0.1-0.2: Speculative or uncertain
+CATEGORIES (assign exactly one per item):
+- fact: Concrete, verifiable thing learned (paths, formats, counts, configs)
+- pitfall: Error hit, wrong assumption, time wasted, thing that didn't work
+- pattern: Successful sequence that should be reused (deploy steps, debug flow)
+- tool-quirk: Environment-specific behavior (token paths, URL formats, API gotchas)
+- question: Something identified but not answered — the NEXT agent should investigate

-### Constraints
+CONFIDENCE:
+- 0.9: Directly observed with error output or explicit verification
+- 0.7: Multiple data points confirm, but not explicitly verified
+- 0.5: Suggested by context, not tested
+- 0.3: Inferred from limited evidence

-1. **No hallucination**: Only extract what's explicitly in the transcript
-2. **Specificity**: Each fact must be specific and actionable
-3. **Relevance**: Only extract knowledge that would help future sessions
-4. **Brevity**: One sentence per fact
-5. **Partial sessions**: Even failed or incomplete sessions may contain valuable pitfalls
-
-### Example Input/Output
-
-**Input Transcript (excerpt):**
-```
-User: Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence
-Agent: Trying git clone... Error: 401 Unauthorized
-Agent: Checking for token... Found in ~/.config/gitea/token
-Agent: Token is gitea_token format, not OAuth
-Agent: Clone successful with token authentication
-```
-
-**Output:**
 ```json
+OUTPUT FORMAT (valid JSON only, no markdown, no explanation):
 {
   "knowledge": [
     {
-      "fact": "Gitea repo at forge.alexanderwhitestone.com requires authentication for cloning",
-      "category": "fact",
-      "repo": "compounding-intelligence",
-      "confidence": 0.9
-    },
-    {
-      "fact": "Gitea authentication token is stored at ~/.config/gitea/token",
-      "category": "tool-quirk",
-      "repo": "global",
-      "confidence": 0.9
-    },
-    {
-      "fact": "Gitea uses gitea_token format, not OAuth for API access",
-      "category": "tool-quirk",
-      "repo": "global",
-      "confidence": 0.8
-    },
-    {
-      "fact": "Clone fails with 401 when no token is provided",
-      "category": "pitfall",
-      "repo": "compounding-intelligence",
-      "confidence": 0.9
+      "fact": "One specific sentence of knowledge",
+      "category": "fact|pitfall|pattern|tool-quirk|question",
+      "repo": "repo-name or global",
+      "confidence": 0.0-1.0,
+      "evidence": "Brief quote or reference from transcript that supports this"
     }
-  ]
+  ],
+  "meta": {
+    "session_outcome": "success|partial|failed",
+    "tools_used": ["tool1", "tool2"],
+    "repos_touched": ["repo1"],
+    "error_count": 0,
+    "knowledge_count": 0
+  }
 }
+
+TRANSCRIPT:
+{{transcript}}
 ```

-## Final Notes
+## Design Notes

-- Process the entire transcript, not just the beginning
-- Pay special attention to errors and corrections
-- Note any environment-specific details
-- Track tool-specific behaviors and quirks
-- If the session failed, focus on pitfalls and questions
+### Why this works with mimo-v2-pro
+
+Mimo needs:
+- Explicit format constraints ("valid JSON only, no markdown")
+- Clear category definitions with concrete examples
+- Hard rules before soft guidance
+- The transcript at the END (so it reads all instructions first)
+
+This prompt front-loads all rules, then gives the transcript last. Mimo follows the pattern.
+
+### Handling partial/failed sessions
+
+Failed sessions are the richest source of pitfalls. The prompt explicitly says:
+> "If the session failed or was partial, extract pitfalls and questions — these are the most valuable."
+
+This reframes failure as valuable output, not noise to discard.
+
+### The `evidence` field
+
+Added to the original spec. Every extracted item must cite where in the transcript it came from. This:
+- Prevents hallucination (can't cite what isn't there)
+- Enables verification (reviewer can check the source)
+- Trains confidence calibration (the agent must find evidence, not just claim it)
+
+### Token budget
+
+Target: ~1,000 tokens for the prompt (excluding transcript).
+
+```
+System prompt:  ~50 tokens
+Rules:         ~200 tokens
+Categories:    ~150 tokens
+Confidence:    ~100 tokens
+Output format: ~200 tokens
+Design notes:  NOT included in prompt (documentation only)
+─────────────────────────────
+Total prompt:  ~700 tokens
+```
+
+Leaves ~300 tokens headroom for variable content (transcript insertion, edge cases).
+
+### What this replaces
+
+The v1 prompt had:
+- Verbose prose explanations (waste tokens for mimo)
+- No `evidence` field (hallucination risk)
+- No `meta` block (no session-level metadata)
+- No explicit handling of failed sessions
+- Example was too long (~150 tokens of example for a 1k prompt)
+
+This v2 is tighter, more structured, and adds the evidence requirement that prevents the #1 failure mode of extraction prompts: generating plausible-sounding facts that aren't in the transcript.
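To make the contract concrete, here is a minimal sketch of the consuming side: rendering the template and validating the model's reply. The helper names (`render_harvest_prompt`, `parse_harvest_output`) are illustrative, not functions from this repo; the category set, confidence range, and `evidence` requirement are exactly what the v2 prompt above specifies.

```python
import json
from pathlib import Path

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def render_harvest_prompt(template_path: str, transcript: str) -> str:
    # Substitute the transcript into the {{transcript}} slot — last, as designed.
    template = Path(template_path).read_text(encoding="utf-8")
    return template.replace("{{transcript}}", transcript)


def parse_harvest_output(raw: str) -> dict:
    # "Valid JSON only, no markdown" means a bare json.loads should succeed.
    data = json.loads(raw)
    for item in data["knowledge"]:
        assert item["category"] in CATEGORIES, f"bad category: {item['category']}"
        assert 0.0 <= item["confidence"] <= 1.0
        assert item["evidence"], "every item must cite the transcript"
    return data
```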
18
test_sessions/session_failure.jsonl
Normal file
@@ -0,0 +1,18 @@
{"role": "user", "content": "Deploy the application to production"}
{"role": "assistant", "content": "I'll deploy the application to production.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@prod-server 'cd /app && git pull && docker-compose up -d'"}}]}
{"role": "tool", "content": "ssh: connect to host prod-server port 22: Connection refused"}
{"role": "assistant", "content": "Connection refused. Let me check the server status.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ping -c 3 prod-server"}}]}
{"role": "tool", "content": "PING prod-server (192.168.1.100): 56 data bytes\nRequest timeout for icmp_seq 0\nRequest timeout for icmp_seq 1\nRequest timeout for icmp_seq 2\n--- prod-server ping statistics ---\n3 packets transmitted, 0 packets received, 100.0% packet loss"}
{"role": "assistant", "content": "The server appears to be down. Let me check if there's an alternative deployment method."}
{"role": "user", "content": "Try using the VPS instead"}
{"role": "assistant", "content": "I'll try deploying to the VPS.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@vps-server 'cd /app && git pull && docker-compose up -d'"}}]}
{"role": "tool", "content": "Permission denied (publickey)."}
{"role": "assistant", "content": "Permission denied. The SSH key might not be set up correctly. Let me check the SSH configuration.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat ~/.ssh/config"}}]}
{"role": "tool", "content": "Host vps-server\n  HostName 167.99.126.228\n  User deploy\n  IdentityFile ~/.ssh/id_rsa_vps"}
{"role": "assistant", "content": "The SSH config looks correct. Let me check if the key exists.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la ~/.ssh/id_rsa_vps"}}]}
{"role": "tool", "content": "ls: /Users/user/.ssh/id_rsa_vps: No such file or directory"}
{"role": "assistant", "content": "The SSH key doesn't exist. This is a pitfall - we need to generate the SSH key first."}
{"role": "user", "content": "Generate a new SSH key for the VPS"}
{"role": "assistant", "content": "I'll generate a new SSH key.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_vps -N ''"}}]}
{"role": "tool", "content": "Generating public/private rsa key pair.\nYour identification has been saved in /Users/user/.ssh/id_rsa_vps\nYour public key has been saved in /Users/user/.ssh/id_rsa_vps.pub\nThe key fingerprint is:\nSHA256:abc123... user@machine"}
{"role": "assistant", "content": "SSH key generated. Now I need to add it to the VPS."}
14
test_sessions/session_partial.jsonl
Normal file
@@ -0,0 +1,14 @@
{"role": "user", "content": "Check the Gitea API for open issues"}
{"role": "assistant", "content": "I'll check the Gitea API for open issues.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://forge.alexanderwhitestone.com/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "curl: (6) Could not resolve host: forge.alexanderwhitestone.com"}
{"role": "assistant", "content": "DNS resolution failed. Let me check the network connectivity.", "tool_calls": [{"function": "terminal", "arguments": {"command": "nslookup forge.alexanderwhitestone.com"}}]}
{"role": "tool", "content": "Server: 8.8.8.8\nAddress: 8.8.8.8#53\n\n** server can't find forge.alexanderwhitestone.com: NXDOMAIN"}
{"role": "assistant", "content": "The domain doesn't resolve. This might be a DNS issue. Let me try using the IP address directly.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "curl: (60) SSL certificate problem: self signed certificate"}
{"role": "assistant", "content": "SSL certificate issue. The server is using a self-signed certificate. Let me try with -k flag to skip SSL verification.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "[]"}
{"role": "assistant", "content": "No open issues found. The API returned an empty array."}
{"role": "user", "content": "Check if there are any pull requests"}
{"role": "assistant", "content": "Let me check for pull requests.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/pulls?state=open"}}]}
{"role": "tool", "content": "[{\"id\": 20, \"title\": \"feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)\", \"state\": \"open\", \"user\": {\"login\": \"Rockachopa\"}, \"created_at\": \"2026-04-14T17:28:20Z\"}]"}
{"role": "assistant", "content": "Found 1 open pull request: PR #20 - \"feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)\" by Rockachopa."}
10
test_sessions/session_patterns.jsonl
Normal file
@@ -0,0 +1,10 @@
{"role": "user", "content": "How do I deploy a new version of the application?"}
{"role": "assistant", "content": "Here's the deployment process:\n1. Run tests: pytest\n2. Build Docker image: docker build -t app:latest .\n3. Push to registry: docker push registry.example.com/app:latest\n4. Update production: ssh prod 'cd /app && docker-compose pull && docker-compose up -d'\n5. Verify: curl https://app.example.com/health"}
{"role": "user", "content": "What if tests fail?"}
{"role": "assistant", "content": "If tests fail:\n1. Check the error logs: pytest -v\n2. Fix the failing tests\n3. Re-run tests: pytest\n4. If still failing, check recent changes: git log --oneline -10\n5. Consider reverting: git revert HEAD"}
{"role": "user", "content": "What if the Docker build fails?"}
{"role": "assistant", "content": "If Docker build fails:\n1. Check Dockerfile syntax: docker build --no-cache -t app:latest .\n2. Check for missing dependencies: cat requirements.txt\n3. Check disk space: df -h\n4. Clean up Docker: docker system prune -f\n5. Try building with verbose output: docker build --progress=plain -t app:latest ."}
{"role": "user", "content": "What if the push to registry fails?"}
{"role": "assistant", "content": "If push to registry fails:\n1. Check registry credentials: docker login registry.example.com\n2. Check registry space: curl https://registry.example.com/v2/_catalog\n3. Check image size: docker images app:latest\n4. Try tagging with version: docker tag app:latest registry.example.com/app:v1.0.0\n5. Push with version tag: docker push registry.example.com/app:v1.0.0"}
{"role": "user", "content": "What if the production update fails?"}
{"role": "assistant", "content": "If production update fails:\n1. Check SSH connection: ssh prod 'echo connected'\n2. Check disk space on prod: ssh prod 'df -h'\n3. Check Docker status: ssh prod 'docker ps -a'\n4. Check logs: ssh prod 'docker-compose logs --tail=100'\n5. Rollback: ssh prod 'cd /app && git checkout HEAD~1 && docker-compose up -d'"}
12
test_sessions/session_questions.jsonl
Normal file
@@ -0,0 +1,12 @@
{"role": "user", "content": "I need to implement a new feature for the application"}
{"role": "assistant", "content": "I can help you implement a new feature. What feature would you like to add?"}
{"role": "user", "content": "I want to add user authentication"}
{"role": "assistant", "content": "User authentication is a good feature. I have a few questions:\n1. What authentication method do you prefer? (JWT, OAuth, Session-based)\n2. Do you need social login? (Google, GitHub, etc.)\n3. What user data do you need to store?\n4. Do you need role-based access control?\n5. What's your database? (PostgreSQL, MySQL, SQLite)"}
{"role": "user", "content": "I prefer JWT, no social login, just email and password, no roles, PostgreSQL"}
{"role": "assistant", "content": "Great choices. Here's the implementation plan:\n1. Create user model with email and password hash\n2. Create JWT token generation and validation\n3. Create login endpoint: POST /api/auth/login\n4. Create register endpoint: POST /api/auth/register\n5. Create middleware for protected routes\n6. Add password hashing with bcrypt\n7. Add rate limiting for login attempts\n8. Add password reset functionality\n9. Add email verification\n10. Test all endpoints"}
{"role": "user", "content": "What about password reset?"}
{"role": "assistant", "content": "For password reset:\n1. Create reset token table\n2. Create forgot password endpoint: POST /api/auth/forgot-password\n3. Send reset email with token\n4. Create reset password endpoint: POST /api/auth/reset-password\n5. Validate reset token and expiration\n6. Update password and delete token\n7. Send confirmation email"}
{"role": "user", "content": "What about email verification?"}
{"role": "assistant", "content": "For email verification:\n1. Add email_verified field to user model\n2. Create verification token table\n3. Send verification email on registration\n4. Create verify email endpoint: GET /api/auth/verify-email?token=...\n5. Validate token and update email_verified\n6. Add middleware to check email_verified for certain routes\n7. Add resend verification email endpoint"}
{"role": "user", "content": "What about rate limiting?"}
{"role": "assistant", "content": "For rate limiting:\n1. Use Redis for rate limit storage\n2. Create rate limit middleware\n3. Set limits: 5 login attempts per minute, 3 password resets per hour\n4. Return 429 Too Many Requests when limit exceeded\n5. Add X-RateLimit headers to responses\n6. Log rate limit violations\n7. Add whitelist for trusted IPs"}
13
test_sessions/session_success.jsonl
Normal file
@@ -0,0 +1,13 @@
{"role": "user", "content": "Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence"}
{"role": "assistant", "content": "I'll clone the repository for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence.git"}}]}
{"role": "tool", "content": "Cloning into 'compounding-intelligence'...\nremote: Enumerating objects: 15, done.\nremote: Counting objects: 100% (15/15), done.\nremote: Compressing objects: 100% (12/12), done.\nremote: Total 15 (delta 2), reused 0 (delta 0), pack-reused 0\nReceiving objects: 100% (15/15), done.\nResolving deltas: 100% (2/2), done."}
{"role": "assistant", "content": "Repository cloned successfully. Let me check the structure.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/"}}]}
{"role": "tool", "content": "total 8\ndrwxr-xr-x   5 user  staff  160 Apr 14 17:21 .\ndrwxr-xr-x  12 user  staff  384 Apr 14 17:21 ..\ndrwxr-xr-x  13 user  staff  416 Apr 14 17:21 .git\n-rw-r--r--   1 user  staff  1234 Apr 14 17:21 README.md\ndrwxr-xr-x   3 user  staff   96 Apr 14 17:21 knowledge\n-rw-r--r--   1 user  staff  456 Apr 14 17:21 metrics\ndrwxr-xr-x   4 user  staff  128 Apr 14 17:21 scripts\ndrwxr-xr-x   3 user  staff   96 Apr 14 17:21 templates"}
{"role": "user", "content": "What files are in the templates directory?"}
{"role": "assistant", "content": "Let me check the templates directory.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/templates/"}}]}
{"role": "tool", "content": "total 0\ndrwxr-xr-x  3 user  staff   96 Apr 14 17:21 .\ndrwxr-xr-x  5 user  staff  160 Apr 14 17:21 ..\n-rw-r--r--  1 user  staff    0 Apr 14 17:21 .gitkeep"}
{"role": "assistant", "content": "The templates directory currently only contains a .gitkeep file. No templates have been added yet."}
{"role": "user", "content": "Create a harvest-prompt.md template in the templates directory."}
{"role": "assistant", "content": "I'll create the harvest-prompt.md template for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat > compounding-intelligence/templates/harvest-prompt.md << 'EOF'\n# Knowledge Extraction Prompt\n\nExtract knowledge from session transcripts.\nEOF"}}]}
{"role": "tool", "content": ""}
{"role": "assistant", "content": "Template created successfully. The harvest-prompt.md file is now in the templates directory."}
19
tests/test_ci_config.py
Normal file
@@ -0,0 +1,19 @@
from pathlib import Path


def test_requirements_makefile_and_workflow_exist() -> None:
    assert Path("requirements.txt").exists()
    assert Path("Makefile").exists()
    assert Path(".gitea/workflows/test.yml").exists()


def test_ci_workflow_runs_project_test_command() -> None:
    workflow = Path(".gitea/workflows/test.yml").read_text(encoding="utf-8")
    requirements = Path("requirements.txt").read_text(encoding="utf-8")
    makefile = Path("Makefile").read_text(encoding="utf-8")

    assert "pytest" in requirements
    assert "test:" in makefile
    assert "python3 -m pytest tests/test_ci_config.py scripts/test_*.py -v" in makefile
    assert "pip install -r requirements.txt" in workflow
    assert "make test" in workflow
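The Makefile itself does not appear in this diff; the minimal `test:` target these assertions pin down would be (the command string is copied verbatim from the assertion above, everything else is assumed):

```make
test:
	python3 -m pytest tests/test_ci_config.py scripts/test_*.py -v
```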
207
tests/test_dedup.py
Normal file
@@ -0,0 +1,207 @@
|
||||
"""Tests for knowledge deduplication module (Issue #196)."""
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from dedup import (
|
||||
normalize_text,
|
||||
content_hash,
|
||||
tokenize,
|
||||
token_similarity,
|
||||
quality_score,
|
||||
merge_facts,
|
||||
dedup_facts,
|
||||
generate_test_duplicates,
|
||||
)
|
||||
|
||||
|
||||
class TestNormalize:
|
||||
def test_lowercases(self):
|
||||
assert normalize_text("Hello World") == "hello world"
|
||||
|
||||
def test_collapses_whitespace(self):
|
||||
assert normalize_text(" hello world ") == "hello world"
|
||||
|
||||
def test_strips(self):
|
||||
assert normalize_text(" text ") == "text"
|
||||
|
||||
|
||||
class TestContentHash:
|
||||
def test_deterministic(self):
|
||||
h1 = content_hash("Hello World")
|
||||
h2 = content_hash("hello world")
|
||||
h3 = content_hash(" Hello World ")
|
||||
assert h1 == h2 == h3
|
||||
|
||||
def test_different_texts(self):
|
||||
h1 = content_hash("Hello")
|
||||
h2 = content_hash("World")
|
||||
assert h1 != h2
|
||||
|
||||
def test_returns_hex(self):
|
||||
h = content_hash("test")
|
||||
assert len(h) == 64 # SHA256
|
||||
assert all(c in '0123456789abcdef' for c in h)
|
||||
|
||||
|
||||
class TestTokenize:
|
||||
def test_extracts_words(self):
|
||||
tokens = tokenize("Hello World Test")
|
||||
assert "hello" in tokens
|
||||
assert "world" in tokens
|
||||
assert "test" in tokens
|
||||
|
||||
def test_skips_short_words(self):
|
||||
tokens = tokenize("a to is the hello")
|
||||
assert "a" not in tokens
|
||||
assert "to" not in tokens
|
||||
assert "hello" in tokens
|
||||
|
||||
def test_returns_set(self):
|
||||
tokens = tokenize("hello hello world")
|
||||
assert isinstance(tokens, set)
|
||||
assert len(tokens) == 2
|
||||
|
||||
|
||||
class TestTokenSimilarity:
|
||||
def test_identical(self):
|
||||
assert token_similarity("hello world", "hello world") == 1.0
|
||||
|
||||
def test_no_overlap(self):
|
||||
assert token_similarity("alpha beta", "gamma delta") == 0.0
|
||||
|
||||
def test_partial_overlap(self):
|
||||
sim = token_similarity("hello world test", "hello universe test")
|
||||
assert 0.3 < sim < 0.7
|
||||
|
||||
def test_empty(self):
|
||||
assert token_similarity("", "hello") == 0.0
|
||||
assert token_similarity("hello", "") == 0.0
|
||||
|
||||
def test_symmetric(self):
|
||||
a = "hello world test"
|
||||
b = "hello universe test"
|
||||
assert token_similarity(a, b) == token_similarity(b, a)
|
||||
|
||||
|
||||
class TestQualityScore:
|
||||
def test_high_confidence(self):
|
||||
fact = {"confidence": 0.95, "source_count": 5, "tags": ["test"], "related": ["x"]}
|
||||
score = quality_score(fact)
|
||||
assert score > 0.7
|
||||
|
||||
def test_low_confidence(self):
|
||||
fact = {"confidence": 0.3, "source_count": 1}
|
||||
score = quality_score(fact)
|
||||
assert score < 0.5
|
||||
|
||||
    def test_defaults(self):
        score = quality_score({})
        assert 0 < score < 1


class TestMergeFacts:
    def test_merges_tags(self):
        keep = {"id": "a", "fact": "test", "tags": ["git"], "confidence": 0.9}
        drop = {"id": "b", "fact": "test", "tags": ["python"], "confidence": 0.8}
        merged = merge_facts(keep, drop)
        assert "git" in merged["tags"]
        assert "python" in merged["tags"]

    def test_merges_source_count(self):
        keep = {"id": "a", "fact": "test", "source_count": 3}
        drop = {"id": "b", "fact": "test", "source_count": 2}
        merged = merge_facts(keep, drop)
        assert merged["source_count"] == 5

    def test_keeps_higher_confidence(self):
        keep = {"id": "a", "fact": "test", "confidence": 0.7}
        drop = {"id": "b", "fact": "test", "confidence": 0.9}
        merged = merge_facts(keep, drop)
        assert merged["confidence"] == 0.9

    def test_tracks_merged_from(self):
        keep = {"id": "a", "fact": "test"}
        drop = {"id": "b", "fact": "test"}
        merged = merge_facts(keep, drop)
        assert "b" in merged["_merged_from"]


class TestDedupFacts:
    def test_removes_exact_dupes(self):
        facts = [
            {"id": "1", "fact": "Always use git rebase"},
            {"id": "2", "fact": "Always use git rebase"},  # exact dupe
            {"id": "3", "fact": "Check logs first"},
        ]
        deduped, stats = dedup_facts(facts)
        assert stats["exact_dupes"] == 1
        assert stats["unique"] == 2

    def test_removes_near_dupes(self):
        facts = [
            {"id": "1", "fact": "Always check logs before deploying to production server"},
            {"id": "2", "fact": "Always check logs before deploying to production environment"},
            {"id": "3", "fact": "Use docker compose for local development environments"},
        ]
        deduped, stats = dedup_facts(facts, near_threshold=0.5)
        assert stats["near_dupes"] >= 1
        assert stats["unique"] == 2

    def test_preserves_unique(self):
        facts = [
            {"id": "1", "fact": "Use git rebase for clean history"},
            {"id": "2", "fact": "Docker containers should be stateless"},
            {"id": "3", "fact": "Always write tests before code"},
        ]
        deduped, stats = dedup_facts(facts)
        assert stats["unique"] == 3
        assert stats["removed"] == 0

    def test_empty_input(self):
        deduped, stats = dedup_facts([])
        assert stats["total"] == 0
        assert stats["unique"] == 0

    def test_keeps_higher_quality_near_dup(self):
        facts = [
            {"id": "1", "fact": "Check logs before deploying to production server", "confidence": 0.5, "source_count": 1},
            {"id": "2", "fact": "Check logs before deploying to production environment", "confidence": 0.9, "source_count": 5, "tags": ["ops"]},
        ]
        deduped, stats = dedup_facts(facts, near_threshold=0.5)
        assert stats["unique"] == 1
        # Higher quality fact should be kept
        assert deduped[0]["confidence"] == 0.9

    def test_dry_run_does_not_modify(self):
        facts = [
            {"id": "1", "fact": "Same text"},
            {"id": "2", "fact": "Same text"},
        ]
        deduped, stats = dedup_facts(facts, dry_run=True)
        assert stats["exact_dupes"] == 1
        # In dry_run, merge_facts is skipped so facts aren't modified
        assert len(deduped) == 1


class TestGenerateTestDuplicates:
    def test_generates_correct_count(self):
        facts = generate_test_duplicates(20)
        assert len(facts) > 20  # 20 unique + duplicates

    def test_has_exact_dupes(self):
        facts = generate_test_duplicates(20)
        hashes = [content_hash(f["fact"]) for f in facts]
        # Should have some duplicate hashes
        assert len(hashes) != len(set(hashes))

    def test_dedup_removes_dupes(self):
        facts = generate_test_duplicates(20)
        deduped, stats = dedup_facts(facts)
        assert stats["unique"] <= 20
        assert stats["removed"] > 0
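The helpers under test — `content_hash`, `quality_score`, `merge_facts`, `dedup_facts` — live in the repo's scripts and are not part of this diff. Below is a minimal sketch consistent with the assertions above; the normalization, the scoring weights, and the `SequenceMatcher` similarity metric are assumptions (the real near-dup measure may be token-based), and `generate_test_duplicates` is omitted:

```python
# Hypothetical sketch inferred from tests/test_dedup.py, not the repo's implementation.
import hashlib
from difflib import SequenceMatcher


def content_hash(text: str) -> str:
    """Stable hash of normalized fact text (the normalization is an assumption)."""
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()


def quality_score(fact: dict) -> float:
    """Blend confidence, sourcing, and tagging into a score strictly inside (0, 1)."""
    score = 0.5 * fact.get("confidence", 0.5)
    score += 0.3 * min(fact.get("source_count", 1), 5) / 5
    score += 0.2 * (1.0 if fact.get("tags") else 0.0)
    return min(score, 0.99)


def merge_facts(keep: dict, drop: dict) -> dict:
    """Fold `drop` into `keep`: union tags, sum source counts, keep max confidence."""
    merged = dict(keep)
    merged["tags"] = sorted(set(keep.get("tags", [])) | set(drop.get("tags", [])))
    merged["source_count"] = keep.get("source_count", 1) + drop.get("source_count", 1)
    merged["confidence"] = max(keep.get("confidence", 0.5), drop.get("confidence", 0.5))
    merged["_merged_from"] = list(keep.get("_merged_from", [])) + [drop["id"]]
    return merged


def dedup_facts(facts: list, near_threshold: float = 0.85, dry_run: bool = False):
    """Return (deduped, stats): exact dupes by hash, near dupes by text similarity."""
    stats = {"total": len(facts), "exact_dupes": 0, "near_dupes": 0}
    kept: list = []
    for fact in facts:
        match = kind = None
        for i, other in enumerate(kept):
            if content_hash(fact["fact"]) == content_hash(other["fact"]):
                match, kind = i, "exact_dupes"
                break
            if SequenceMatcher(None, fact["fact"], other["fact"]).ratio() >= near_threshold:
                match, kind = i, "near_dupes"
                break
        if match is None:
            kept.append(fact)
            continue
        stats[kind] += 1
        if dry_run:
            continue  # count only; leave facts unmodified
        keep, drop = kept[match], fact
        if quality_score(fact) > quality_score(kept[match]):
            keep, drop = fact, kept[match]  # keep the higher-quality side
        kept[match] = merge_facts(keep, drop)
    stats["unique"] = len(kept)
    stats["removed"] = stats["exact_dupes"] + stats["near_dupes"]
    return kept, stats
```

Checking it against the suite: two identical facts yield `exact_dupes == 1`, a near-dup pair at `near_threshold=0.5` merges toward the higher `quality_score` side, and `dry_run=True` counts without merging.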
227
tests/test_freshness.py
Normal file
@@ -0,0 +1,227 @@
#!/usr/bin/env python3
"""Tests for scripts/freshness.py — 8 tests."""

import json
import os
import sys
import tempfile

sys.path.insert(0, os.path.join(os.path.dirname(__file__) or ".", ".."))
import importlib.util
spec = importlib.util.spec_from_file_location(
    "freshness", os.path.join(os.path.dirname(__file__) or ".", "..", "scripts", "freshness.py"))
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

compute_file_hash = mod.compute_file_hash
check_freshness = mod.check_freshness
load_knowledge_entries = mod.load_knowledge_entries


def test_compute_file_hash():
    """File hash should be computed correctly."""
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
        f.write("test content")
        f.flush()
        h = compute_file_hash(f.name)
    assert h is not None
    assert h.startswith("sha256:")
    os.unlink(f.name)
    print("PASS: test_compute_file_hash")


def test_compute_file_hash_nonexistent():
    """Nonexistent file should return None."""
    h = compute_file_hash("/nonexistent/file.txt")
    assert h is None
    print("PASS: test_compute_file_hash_nonexistent")


def test_load_knowledge_entries_empty():
    """Empty knowledge dir should return empty list."""
    with tempfile.TemporaryDirectory() as tmpdir:
        entries = load_knowledge_entries(tmpdir)
        assert entries == []
    print("PASS: test_load_knowledge_entries_empty")


def test_load_knowledge_entries_from_index():
    """Should load entries from index.json."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create index.json
        index_path = os.path.join(tmpdir, "index.json")
        with open(index_path, "w") as f:
            json.dump({
                "facts": [
                    {
                        "fact": "Test fact",
                        "source_file": "test.py",
                        "source_hash": "sha256:abc123",
                        "category": "fact",
                        "confidence": 0.9
                    }
                ]
            }, f)

        entries = load_knowledge_entries(tmpdir)
        assert len(entries) == 1
        assert entries[0]["fact"] == "Test fact"
        assert entries[0]["source_file"] == "test.py"
    print("PASS: test_load_knowledge_entries_from_index")


def test_load_knowledge_entries_from_yaml():
    """Should load entries from YAML files."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create global directory
        global_dir = os.path.join(tmpdir, "global")
        os.makedirs(global_dir)

        # Create YAML file
        yaml_path = os.path.join(global_dir, "test.yaml")
        with open(yaml_path, "w") as f:
            f.write("""
pitfalls:
  - description: "Test pitfall"
    source_file: "test.py"
    source_hash: "sha256:def456"
    category: "pitfall"
    confidence: 0.8
""")

        entries = load_knowledge_entries(tmpdir)
        assert len(entries) == 1
        assert entries[0]["fact"] == "Test pitfall"
        assert entries[0]["category"] == "pitfall"
    print("PASS: test_load_knowledge_entries_from_yaml")


def test_check_freshness_no_changes():
    """With no source file reference, entries should be counted correctly."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create knowledge dir
        knowledge_dir = os.path.join(tmpdir, "knowledge")
        os.makedirs(knowledge_dir)

        # Create repo dir
        repo_dir = os.path.join(tmpdir, "repo")
        os.makedirs(repo_dir)

        # Create index.json with entry that has no source_file
        index_path = os.path.join(knowledge_dir, "index.json")
        with open(index_path, "w") as f:
            json.dump({
                "facts": [
                    {
                        "fact": "General knowledge",
                        "category": "fact",
                        "confidence": 0.9
                        # No source_file or source_hash
                    }
                ]
            }, f)

        result = check_freshness(knowledge_dir, repo_dir, days=1)

        # Entry without source_file should be counted as "fresh" (no_source status)
        assert result["summary"]["total"] == 1
        assert result["summary"]["stale"] == 0
        assert result["summary"]["fresh"] == 1
        assert result["fresh_entries"][0]["status"] == "no_source"
    print("PASS: test_check_freshness_no_changes")


def test_check_freshness_with_hash_mismatch():
    """Hash mismatch should mark entry as stale."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create knowledge dir
        knowledge_dir = os.path.join(tmpdir, "knowledge")
        os.makedirs(knowledge_dir)

        # Create repo dir with a file
        repo_dir = os.path.join(tmpdir, "repo")
        os.makedirs(repo_dir)

        test_file = os.path.join(repo_dir, "test.py")
        with open(test_file, "w") as f:
            f.write("print('hello')")

        # Create index.json with wrong hash
        index_path = os.path.join(knowledge_dir, "index.json")
        with open(index_path, "w") as f:
            json.dump({
                "facts": [
                    {
                        "fact": "Test fact",
                        "source_file": "test.py",
                        "source_hash": "sha256:wronghash",
                        "category": "fact",
                        "confidence": 0.9
                    }
                ]
            }, f)

        # Initialize git repo
        os.system(f"cd {repo_dir} && git init && git add . && git commit -m 'init' 2>/dev/null")

        result = check_freshness(knowledge_dir, repo_dir, days=1)

        assert result["summary"]["total"] == 1
        assert result["summary"]["stale"] == 1
        assert result["summary"]["fresh"] == 0
        assert result["stale_entries"][0]["reason"] == "hash_mismatch"
    print("PASS: test_check_freshness_with_hash_mismatch")


def test_check_freshness_missing_source():
    """Missing source file should mark entry as stale."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create knowledge dir
        knowledge_dir = os.path.join(tmpdir, "knowledge")
        os.makedirs(knowledge_dir)

        # Create repo dir (without the referenced file)
        repo_dir = os.path.join(tmpdir, "repo")
        os.makedirs(repo_dir)

        # Create index.json referencing nonexistent file
        index_path = os.path.join(knowledge_dir, "index.json")
        with open(index_path, "w") as f:
            json.dump({
                "facts": [
                    {
                        "fact": "Test fact",
                        "source_file": "nonexistent.py",
                        "source_hash": "sha256:abc123",
                        "category": "fact",
                        "confidence": 0.9
                    }
                ]
            }, f)

        # Initialize git repo
        os.system(f"cd {repo_dir} && git init && git add . && git commit -m 'init' 2>/dev/null")

        result = check_freshness(knowledge_dir, repo_dir, days=1)

        assert result["summary"]["total"] == 1
        assert result["summary"]["stale"] == 1
        assert result["summary"]["fresh"] == 0
        assert result["stale_entries"][0]["reason"] == "source_missing"
    print("PASS: test_check_freshness_missing_source")


def run_all():
    test_compute_file_hash()
    test_compute_file_hash_nonexistent()
    test_load_knowledge_entries_empty()
    test_load_knowledge_entries_from_index()
    test_load_knowledge_entries_from_yaml()
    test_check_freshness_no_changes()
    test_check_freshness_with_hash_mismatch()
    test_check_freshness_missing_source()
    print("\nAll 8 tests passed!")


if __name__ == "__main__":
    run_all()
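scripts/freshness.py itself isn't shown in this diff. A minimal sketch of the hashing and staleness classification these tests pin down — the `no_source`, `source_missing`, and `hash_mismatch` strings come straight from the assertions, while the `days` handling and the YAML loading path are omitted and everything else is an assumption:

```python
# Hypothetical sketch inferred from tests/test_freshness.py.
import hashlib
import json
import os


def compute_file_hash(path):
    """Return 'sha256:<hex>' of a file's bytes, or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        return "sha256:" + hashlib.sha256(f.read()).hexdigest()


def load_knowledge_entries(knowledge_dir):
    """Read facts from index.json if present (the real module also walks YAML files)."""
    index = os.path.join(knowledge_dir, "index.json")
    if not os.path.isfile(index):
        return []
    with open(index) as f:
        return json.load(f).get("facts", [])


def check_freshness(knowledge_dir, repo_dir, days=7):
    """Bucket entries into fresh/stale by comparing stored and current source hashes."""
    result = {"summary": {"total": 0, "fresh": 0, "stale": 0},
              "fresh_entries": [], "stale_entries": []}
    for entry in load_knowledge_entries(knowledge_dir):
        result["summary"]["total"] += 1
        source = entry.get("source_file")
        if not source:
            status, bucket = "no_source", "fresh"            # nothing to go stale against
        else:
            current = compute_file_hash(os.path.join(repo_dir, source))
            if current is None:
                status, bucket = "source_missing", "stale"   # referenced file is gone
            elif current != entry.get("source_hash"):
                status, bucket = "hash_mismatch", "stale"    # file changed since harvest
            else:
                status, bucket = "verified", "fresh"
        result["summary"][bucket] += 1
        key = "status" if bucket == "fresh" else "reason"
        result[f"{bucket}_entries"].append({**entry, key: status})
    return result
```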
141
tests/test_knowledge_gap_identifier.py
Normal file
@@ -0,0 +1,141 @@
"""Tests for knowledge_gap_identifier module."""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import tempfile
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'scripts'))
|
||||
|
||||
from knowledge_gap_identifier import KnowledgeGapIdentifier, GapType, GapSeverity
|
||||
|
||||
|
||||
def _make_repo(tmpdir, structure):
|
||||
"""Create a test repo from a dict of {path: content}."""
|
||||
for rel_path, content in structure.items():
|
||||
p = Path(tmpdir) / rel_path
|
||||
p.parent.mkdir(parents=True, exist_ok=True)
|
||||
p.write_text(content)
|
||||
|
||||
|
||||
def test_undocumented_symbol():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/calculator.py": "def add(a, b):\n return a + b\n",
|
||||
"README.md": "# Calculator\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
undocumented = [g for g in report.gaps if g.gap_type == GapType.UNDOCUMENTED]
|
||||
assert any(g.name == "add" for g in undocumented), "add should be undocumented"
|
||||
|
||||
|
||||
def test_documented_symbol_no_gap():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/calculator.py": "def add(a, b):\n return a + b\n",
|
||||
"README.md": "# Calculator\nUse `add()` to add numbers.\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
undocumented = [g for g in report.gaps
|
||||
if g.gap_type == GapType.UNDOCUMENTED and g.name == "add"]
|
||||
assert len(undocumented) == 0, "add is documented, should not be flagged"
|
||||
|
||||
|
||||
def test_untested_module():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/calculator.py": "def add(a, b):\n return a + b\n",
|
||||
"src/helper.py": "def format(x):\n return str(x)\n",
|
||||
"tests/test_calculator.py": "from src.calculator import add\nassert add(1,2) == 3\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
untested = [g for g in report.gaps if g.gap_type == GapType.UNTESTED]
|
||||
assert any("helper" in g.name for g in untested), "helper should be untested"
|
||||
|
||||
|
||||
def test_tested_module_no_gap():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/calculator.py": "def add(a, b):\n return a + b\n",
|
||||
"tests/test_calculator.py": "def test_add():\n assert True\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
untested = [g for g in report.gaps
|
||||
if g.gap_type == GapType.UNTESTED and "calculator" in g.name]
|
||||
assert len(untested) == 0, "calculator has tests, should not be flagged"
|
||||
|
||||
|
||||
def test_missing_implementation():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/app.py": "def run():\n pass\n",
|
||||
"docs/api.md": "# API\nUse `NonExistentClass` to do things.\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
missing = [g for g in report.gaps if g.gap_type == GapType.MISSING_IMPLEMENTATION]
|
||||
assert any(g.name == "NonExistentClass" for g in missing)
|
||||
|
||||
|
||||
def test_private_symbols_skipped():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/app.py": "def _internal():\n pass\ndef public():\n pass\n",
|
||||
"README.md": "# App\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
undocumented_names = [g.name for g in report.gaps if g.gap_type == GapType.UNDOCUMENTED]
|
||||
assert "_internal" not in undocumented_names, "Private symbols should be skipped"
|
||||
assert "public" in undocumented_names
|
||||
|
||||
|
||||
def test_empty_repo():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
assert len(report.gaps) == 0
|
||||
|
||||
|
||||
def test_invalid_path():
|
||||
report = KnowledgeGapIdentifier().analyze("/nonexistent/path/xyz")
|
||||
assert len(report.gaps) == 1
|
||||
assert report.gaps[0].severity == GapSeverity.ERROR
|
||||
|
||||
|
||||
def test_report_summary():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/app.py": "class MyService:\n def handle(self):\n pass\n",
|
||||
"README.md": "# App\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
summary = report.summary()
|
||||
assert "UNDOCUMENTED" in summary
|
||||
assert "MyService" in summary
|
||||
|
||||
|
||||
def test_report_to_dict():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
_make_repo(tmpdir, {
|
||||
"src/app.py": "def hello():\n pass\n",
|
||||
"README.md": "# App\n",
|
||||
})
|
||||
report = KnowledgeGapIdentifier().analyze(tmpdir)
|
||||
d = report.to_dict()
|
||||
assert "total_gaps" in d
|
||||
assert "gaps" in d
|
||||
assert isinstance(d["gaps"], list)
|
||||
assert d["total_gaps"] > 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
test_undocumented_symbol()
|
||||
test_documented_symbol_no_gap()
|
||||
test_untested_module()
|
||||
test_tested_module_no_gap()
|
||||
test_missing_implementation()
|
||||
test_private_symbols_skipped()
|
||||
test_empty_repo()
|
||||
test_invalid_path()
|
||||
test_report_summary()
|
||||
test_report_to_dict()
|
||||
print("All 10 tests passed.")
|
||||
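Neither knowledge_gap_identifier.py nor its report type appears in this diff. A plausible minimal data model that the assertions above would hold against — the enum members, the `gaps` attribute, and the `summary()`/`to_dict()` shapes are taken from the tests; names like `Gap` and `GapReport` are assumptions:

```python
# Hypothetical sketch of the knowledge_gap_identifier data model.
from dataclasses import dataclass, field
from enum import Enum


class GapType(Enum):
    UNDOCUMENTED = "undocumented"              # code symbol never mentioned in docs
    UNTESTED = "untested"                      # source module with no matching test
    MISSING_IMPLEMENTATION = "missing_impl"    # docs mention a symbol that doesn't exist


class GapSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Gap:
    gap_type: GapType
    name: str
    severity: GapSeverity = GapSeverity.WARNING
    detail: str = ""


@dataclass
class GapReport:
    gaps: list = field(default_factory=list)

    def summary(self) -> str:
        """One line per gap, e.g. 'UNDOCUMENTED: MyService'."""
        return "\n".join(f"{g.gap_type.name}: {g.name}" for g in self.gaps)

    def to_dict(self) -> dict:
        return {
            "total_gaps": len(self.gaps),
            "gaps": [{"type": g.gap_type.name, "name": g.name,
                      "severity": g.severity.name} for g in self.gaps],
        }
```

The analyzer itself (AST walk of `src/`, grep of README/docs, test-name matching) would populate `GapReport.gaps`; only the surface exercised by the tests is sketched here.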
319
tests/test_perf_bottleneck_finder.py
Normal file
@@ -0,0 +1,319 @@
#!/usr/bin/env python3
"""
Tests for Performance Bottleneck Finder.
"""

import json
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

# Add scripts to path
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "scripts"))

from perf_bottleneck_finder import (
    Bottleneck,
    PerfReport,
    find_slow_tests_by_scan,
    analyze_build_artifacts,
    analyze_makefile_targets,
    find_slow_imports,
    generate_report,
    format_markdown,
    severity_sort_key,
)


class TestBottleneck:
    """Test Bottleneck dataclass."""

    def test_creation(self):
        b = Bottleneck(
            category="test",
            name="test_foo",
            duration_s=5.0,
            severity="warning",
            recommendation="Mock it"
        )
        assert b.category == "test"
        assert b.name == "test_foo"
        assert b.duration_s == 5.0
        assert b.severity == "warning"
        assert b.recommendation == "Mock it"
        assert b.file_path is None
        assert b.line_number is None

    def test_with_location(self):
        b = Bottleneck(
            category="test",
            name="test_bar",
            duration_s=2.0,
            severity="info",
            recommendation="Consider",
            file_path="tests/test_bar.py",
            line_number=42
        )
        assert b.file_path == "tests/test_bar.py"
        assert b.line_number == 42

    def test_to_dict(self):
        b = Bottleneck("test", "x", 1.0, "info", "y")
        d = b.__dict__
        assert "category" in d
        assert "duration_s" in d


class TestPerfReport:
    """Test PerfReport dataclass."""

    def test_creation(self):
        report = PerfReport(
            timestamp="2026-01-01T00:00:00Z",
            repo_path="/tmp/repo"
        )
        assert report.timestamp == "2026-01-01T00:00:00Z"
        assert report.bottlenecks == []
        assert report.summary == {}

    def test_to_dict(self):
        report = PerfReport(
            timestamp="2026-01-01T00:00:00Z",
            repo_path="/tmp/repo",
            bottlenecks=[Bottleneck("test", "x", 1.0, "info", "y")]
        )
        d = report.to_dict()
        assert "bottlenecks" in d
        assert len(d["bottlenecks"]) == 1


class TestSeveritySort:
    """Test severity sorting."""

    def test_critical_first(self):
        items = [
            Bottleneck("test", "a", 1.0, "info", ""),
            Bottleneck("test", "b", 0.5, "critical", ""),
            Bottleneck("test", "c", 2.0, "warning", ""),
        ]
        items.sort(key=severity_sort_key)
        assert items[0].severity == "critical"
        assert items[1].severity == "warning"
        assert items[2].severity == "info"

    def test_duration_within_severity(self):
        items = [
            Bottleneck("test", "slow", 10.0, "warning", ""),
            Bottleneck("test", "fast", 1.0, "warning", ""),
        ]
        items.sort(key=severity_sort_key)
        assert items[0].name == "slow"  # Higher duration first within same severity


class TestSlowTestScan:
    """Test slow test pattern scanning."""

    def test_finds_sleep(self, tmp_path):
        test_file = tmp_path / "test_sleepy.py"
        test_file.write_text(textwrap.dedent('''
            import time

            def test_slow():
                time.sleep(5)
                assert True
        '''))

        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
        assert len(bottlenecks) >= 1
        assert any("sleep" in b.recommendation.lower() for b in bottlenecks)

    def test_finds_http_calls(self, tmp_path):
        test_file = tmp_path / "test_http.py"
        test_file.write_text(textwrap.dedent('''
            import requests

            def test_api():
                resp = requests.get("https://example.com")
                assert resp.status_code == 200
        '''))

        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
        assert len(bottlenecks) >= 1
        assert any("HTTP" in b.recommendation or "mock" in b.recommendation.lower() for b in bottlenecks)

    def test_skips_non_test_files(self, tmp_path):
        src_file = tmp_path / "main.py"
        src_file.write_text("import time\ntime.sleep(10)\n")

        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
        assert len(bottlenecks) == 0

    def test_handles_missing_dir(self):
        bottlenecks = find_slow_tests_by_scan("/nonexistent/path")
        assert bottlenecks == []

    def test_file_path_populated(self, tmp_path):
        test_file = tmp_path / "test_example.py"
        test_file.write_text("import time\n\ndef test_it():\n    time.sleep(2)\n")

        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
        assert len(bottlenecks) >= 1
        assert bottlenecks[0].file_path is not None
        assert bottlenecks[0].line_number is not None


class TestBuildArtifacts:
    """Test build artifact analysis."""

    def test_finds_large_node_modules(self, tmp_path):
        nm = tmp_path / "node_modules"
        nm.mkdir()
        # Create a file > 10MB
        big_file = nm / "big.txt"
        big_file.write_bytes(b"x" * (11 * 1024 * 1024))

        bottlenecks = analyze_build_artifacts(str(tmp_path))
        assert len(bottlenecks) >= 1
        assert any("node_modules" in b.name for b in bottlenecks)

    def test_ignores_small_dirs(self, tmp_path):
        nm = tmp_path / "node_modules"
        nm.mkdir()
        small_file = nm / "small.txt"
        small_file.write_bytes(b"x" * 100)

        bottlenecks = analyze_build_artifacts(str(tmp_path))
        assert not any("node_modules" in b.name for b in bottlenecks)

    def test_finds_pycache(self, tmp_path):
        cache = tmp_path / "__pycache__"
        cache.mkdir()
        big_file = cache / "big.pyc"
        big_file.write_bytes(b"x" * (11 * 1024 * 1024))

        bottlenecks = analyze_build_artifacts(str(tmp_path))
        assert any("__pycache__" in b.name for b in bottlenecks)


class TestMakefileAnalysis:
    """Test Makefile analysis."""

    def test_finds_pip_install(self, tmp_path):
        makefile = tmp_path / "Makefile"
        makefile.write_text(textwrap.dedent('''
            install:
                pip install -r requirements.txt

            test:
                pytest
        '''))

        bottlenecks = analyze_makefile_targets(str(tmp_path))
        assert len(bottlenecks) >= 1

    def test_no_makefile(self, tmp_path):
        bottlenecks = analyze_makefile_targets(str(tmp_path))
        assert bottlenecks == []


class TestImportAnalysis:
    """Test heavy import detection."""

    def test_finds_pandas(self, tmp_path):
        src = tmp_path / "analysis.py"
        src.write_text("import pandas as pd\n")

        bottlenecks = find_slow_imports(str(tmp_path))
        assert len(bottlenecks) >= 1
        assert any("pandas" in b.name for b in bottlenecks)

    def test_finds_torch(self, tmp_path):
        src = tmp_path / "model.py"
        src.write_text("import torch\n")

        bottlenecks = find_slow_imports(str(tmp_path))
        assert any("torch" in b.name for b in bottlenecks)

    def test_skips_light_imports(self, tmp_path):
        src = tmp_path / "utils.py"
        src.write_text("import json\nimport os\nimport sys\n")

        bottlenecks = find_slow_imports(str(tmp_path))
        assert len(bottlenecks) == 0


class TestGenerateReport:
    """Test full report generation."""

    def test_empty_repo(self, tmp_path):
        report = generate_report(str(tmp_path))
        assert report.summary["total_bottlenecks"] >= 0
        assert "critical" in report.summary
        assert "warning" in report.summary

    def test_with_findings(self, tmp_path):
        # Create a test file with issues
        test_file = tmp_path / "test_slow.py"
        test_file.write_text(textwrap.dedent('''
            import time
            import requests

            def test_sleepy():
                time.sleep(3)

            def test_http():
                requests.get("https://example.com")
        '''))

        report = generate_report(str(tmp_path))
        assert report.summary["total_bottlenecks"] >= 2
        assert len(report.bottlenecks) > 0

    def test_summary_categories(self, tmp_path):
        report = generate_report(str(tmp_path))
        assert "by_category" in report.summary


class TestMarkdownReport:
    """Test markdown output."""

    def test_format(self):
        report = PerfReport(
            timestamp="2026-01-01T00:00:00Z",
            repo_path="/tmp/repo",
            bottlenecks=[
                Bottleneck("test", "slow_test", 5.0, "critical", "Fix it")
            ],
            summary={
                "total_bottlenecks": 1,
                "critical": 1,
                "warning": 0,
                "info": 0,
                "estimated_total_slowdown_s": 5.0,
                "by_category": {"test": 1},
            }
        )
        md = format_markdown(report)
        assert "# Performance Bottleneck Report" in md
        assert "slow_test" in md
        assert "🔴" in md
        assert "Fix it" in md

    def test_empty_report(self):
        report = PerfReport(
            timestamp="2026-01-01T00:00:00Z",
            repo_path="/tmp/repo",
            summary={
                "total_bottlenecks": 0,
                "critical": 0,
                "warning": 0,
                "info": 0,
                "estimated_total_slowdown_s": 0,
                "by_category": {},
            }
        )
        md = format_markdown(report)
        assert "Total bottlenecks:** 0" in md
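`severity_sort_key` is the one piece of ordering logic this suite fully pins down: critical before warning before info, and the slowest item first within a severity. A sketch that satisfies those tests, with a matching `Bottleneck` dataclass for self-containment (field order taken from `test_creation`; everything else is an assumption):

```python
# Hypothetical sketch matching TestBottleneck and TestSeveritySort above.
from dataclasses import dataclass
from typing import Optional

_SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Bottleneck:
    category: str
    name: str
    duration_s: float
    severity: str
    recommendation: str
    file_path: Optional[str] = None
    line_number: Optional[int] = None


def severity_sort_key(b: Bottleneck):
    """Most severe first; within a severity, the slowest item first."""
    return (_SEVERITY_RANK.get(b.severity, 3), -b.duration_s)
```

Sorting with a tuple key keeps the ordering stable without a custom comparator, and unknown severities sink to the bottom via the rank default of 3.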
108
tests/test_quality_gate.py
Normal file
@@ -0,0 +1,108 @@
"""
|
||||
Tests for quality_gate.py — Knowledge entry quality scoring.
|
||||
"""
|
||||
|
||||
import unittest
|
||||
from datetime import datetime, timezone, timedelta
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||
|
||||
from quality_gate import (
|
||||
score_specificity,
|
||||
score_actionability,
|
||||
score_freshness,
|
||||
score_source_quality,
|
||||
score_entry,
|
||||
filter_entries,
|
||||
)
|
||||
|
||||
|
||||
class TestScoreSpecificity(unittest.TestCase):
|
||||
def test_specific_content_scores_high(self):
|
||||
content = "Run `python3 deploy.py --env prod` on 2026-04-15. Example: step 1 configure nginx."
|
||||
score = score_specificity(content)
|
||||
self.assertGreater(score, 0.6)
|
||||
|
||||
def test_vague_content_scores_low(self):
|
||||
content = "It generally depends. Various factors might affect this. Basically, it varies."
|
||||
score = score_specificity(content)
|
||||
self.assertLess(score, 0.5)
|
||||
|
||||
def test_empty_scores_baseline(self):
|
||||
score = score_specificity("")
|
||||
self.assertAlmostEqual(score, 0.5, delta=0.1)
|
||||
|
||||
|
||||
class TestScoreActionability(unittest.TestCase):
|
||||
def test_actionable_content_scores_high(self):
|
||||
content = "1. Run `pip install -r requirements.txt`\n2. Execute `python3 train.py`\n3. Verify with `pytest`"
|
||||
score = score_actionability(content)
|
||||
self.assertGreater(score, 0.6)
|
||||
|
||||
def test_abstract_content_scores_low(self):
|
||||
content = "The concept of intelligence is fascinating and multifaceted."
|
||||
score = score_actionability(content)
|
||||
self.assertLess(score, 0.5)
|
||||
|
||||
|
||||
class TestScoreFreshness(unittest.TestCase):
|
||||
def test_recent_timestamp_scores_high(self):
|
||||
recent = datetime.now(timezone.utc).isoformat()
|
||||
score = score_freshness(recent)
|
||||
self.assertGreater(score, 0.9)
|
||||
|
||||
def test_old_timestamp_scores_low(self):
|
||||
old = (datetime.now(timezone.utc) - timedelta(days=365)).isoformat()
|
||||
score = score_freshness(old)
|
||||
self.assertLess(score, 0.2)
|
||||
|
||||
def test_none_returns_baseline(self):
|
||||
score = score_freshness(None)
|
||||
self.assertEqual(score, 0.5)
|
||||
|
||||
|
||||
class TestScoreSourceQuality(unittest.TestCase):
|
||||
def test_claude_scores_high(self):
|
||||
self.assertGreater(score_source_quality("claude-sonnet"), 0.85)
|
||||
|
||||
def test_ollama_scores_lower(self):
|
||||
self.assertLess(score_source_quality("ollama"), 0.7)
|
||||
|
||||
def test_unknown_returns_default(self):
|
||||
self.assertEqual(score_source_quality("unknown"), 0.5)
|
||||
|
||||
|
||||
class TestScoreEntry(unittest.TestCase):
|
||||
def test_good_entry_scores_high(self):
|
||||
entry = {
|
||||
"content": "To deploy: run `kubectl apply -f deployment.yaml`. Verify with `kubectl get pods`.",
|
||||
"model": "claude-sonnet",
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
}
|
||||
score = score_entry(entry)
|
||||
self.assertGreater(score, 0.6)
|
||||
|
||||
def test_poor_entry_scores_low(self):
|
||||
entry = {
|
||||
"content": "It depends. Various things might happen.",
|
||||
"model": "unknown",
|
||||
}
|
||||
score = score_entry(entry)
|
||||
self.assertLess(score, 0.5)
|
||||
|
||||
|
||||
class TestFilterEntries(unittest.TestCase):
|
||||
def test_filters_low_quality(self):
|
||||
entries = [
|
||||
{"content": "Run `deploy.py` to fix the issue.", "model": "claude"},
|
||||
{"content": "It might work sometimes.", "model": "unknown"},
|
||||
{"content": "Configure nginx: step 1 edit nginx.conf", "model": "gpt-4"},
|
||||
]
|
||||
filtered = filter_entries(entries, threshold=0.5)
|
||||
self.assertGreaterEqual(len(filtered), 2)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
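Of the quality_gate.py scorers, `score_freshness` is the most tightly constrained by these tests: roughly 1.0 for a just-now timestamp, below 0.2 at one year, and exactly 0.5 for None. A sketch that satisfies those bounds, assuming exponential decay with a ~90-day time constant (the repo's actual curve may differ):

```python
# Hypothetical sketch consistent with TestScoreFreshness; the real curve may differ.
import math
from datetime import datetime, timezone


def score_freshness(timestamp, decay_days=90.0):
    """Decay toward 0 as an entry ages; undated entries get the 0.5 baseline."""
    if timestamp is None:
        return 0.5
    then = datetime.fromisoformat(timestamp)  # tests pass tz-aware ISO strings
    age_days = (datetime.now(timezone.utc) - then).total_seconds() / 86400
    return math.exp(-max(age_days, 0.0) / decay_days)
```

With a 90-day constant, a fresh entry scores about 1.0 and a 365-day-old one about 0.02, comfortably inside the tested bounds.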