
# Compounding Intelligence — Scripts API Reference

Generated: 2026-04-26 11:02 UTC

This document is an auto-generated reference for the public API surface of every script in `scripts/`. Each section covers one script: module purpose, public functions, and their signatures.
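The extraction behind this document can be sketched with the standard `ast` module. This is a simplified illustration, not the actual generator: the real `extract_functions_from_ast` and `parse_module` track more signature detail, and the helper name `extract_public_api` is hypothetical.

```python
import ast

SOURCE = '''\
"""Example module docstring."""

def greet(name, punctuation="!"):
    """Return a greeting for name.

    Extra detail that should not appear in the summary.
    """
    return f"Hello, {name}{punctuation}"

def _private():
    pass
'''

def extract_public_api(source):
    """Parse source and return (module_doc, [(name, args, summary)])."""
    tree = ast.parse(source)
    module_doc = ast.get_docstring(tree) or ""
    functions = []
    for node in tree.body:
        # Public functions only: top-level defs not starting with "_".
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = [a.arg for a in node.args.args]
            doc = ast.get_docstring(node) or ""
            summary = doc.splitlines()[0] if doc else "-"
            functions.append((node.name, args, summary))
    return module_doc, functions

doc, funcs = extract_public_api(SOURCE)
# doc is the module docstring; funcs holds one entry for greet() with its
# argument names and first-line docstring summary; _private is skipped.
```

Rendering then becomes a matter of emitting one markdown table row per `(name, args, summary)` tuple.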


## scripts/api_doc_generator.py

API Doc Generator — Issue #98

| Function | Signature | Description |
| --- | --- | --- |
| `extract_functions_from_ast` | `extract_functions_from_ast(tree, file_rel)` | Extract public function names, signatures, and first-line doc summaries. |
| `parse_module` | `parse_module(filepath)` | Parse a Python file and return its module-level docstring and public functions. |
| `scan_scripts_dir` | `scan_scripts_dir(scripts_dir)` | Scan all `.py` files in `scripts/` and extract API info. |
| `render_markdown` | `render_markdown(modules)` | Generate full `docs/API.md` content from the scanned modules. |
| `render_json` | `render_json(modules)` | Emit machine-readable JSON version of the API reference. |
| `main` | `main()` | - |

## scripts/automation_opportunity_finder.py

Automation Opportunity Finder — Scan fleet for manual processes that could be automated.

| Function | Signature | Description |
| --- | --- | --- |
| `analyze_cron_jobs` | `analyze_cron_jobs(hermes_home)` | Analyze cron job definitions for automation gaps. |
| `analyze_documents` | `analyze_documents(root_dirs)` | Scan documentation for manual step patterns. |
| `analyze_scripts` | `analyze_scripts(root_dirs)` | Detect repeated command sequences in scripts. |
| `analyze_session_transcripts` | `analyze_session_transcripts(session_dirs)` | Find repeated tool-call patterns in session transcripts. |
| `analyze_shell_history` | `analyze_shell_history(root_dirs)` | Find repeated shell commands from history files. |
| `deduplicate_proposals` | `deduplicate_proposals(proposals)` | Remove duplicate proposals based on title similarity. |
| `rank_proposals` | `rank_proposals(proposals)` | Sort proposals by impact * confidence (highest first). |
| `format_text_report` | `format_text_report(proposals)` | Format proposals as human-readable text. |
| `main` | `main()` | - |

## scripts/bootstrapper.py

Bootstrapper — assemble pre-session context from knowledge store.

| Function | Signature | Description |
| --- | --- | --- |
| `load_index` | `load_index(index_path)` | Load and validate the knowledge index. |
| `filter_facts` | `filter_facts(facts, repo, agent, include_global)` | Filter facts by repo, agent, and global scope. |
| `sort_facts` | `sort_facts(facts)` | Sort facts by: confidence (desc), then category priority, then fact text. |
| `load_repo_knowledge` | `load_repo_knowledge(repo)` | Load per-repo knowledge markdown if it exists. |
| `load_agent_knowledge` | `load_agent_knowledge(agent)` | Load per-agent knowledge markdown if it exists. |
| `load_global_knowledge` | `load_global_knowledge()` | Load all global knowledge markdown files. |
| `render_facts_section` | `render_facts_section(facts, category, label)` | Render a section of facts for a single category. |
| `estimate_tokens` | `estimate_tokens(text)` | Rough token estimate. |
| `truncate_to_tokens` | `truncate_to_tokens(text, max_tokens)` | Truncate text to approximately max_tokens, cutting at line boundaries. |
| `build_bootstrap_context` | `build_bootstrap_context(repo, agent, include_global, max_tokens, index_path)` | Build the full bootstrap context block. |
| `main` | `main()` | - |
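The token-budget pair above can be sketched as follows. This is a minimal sketch assuming a ~4-characters-per-token heuristic; the real `estimate_tokens` may use a different ratio.

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token (an assumption; the real
    # estimate_tokens may estimate differently).
    return len(text) // 4

def truncate_to_tokens(text, max_tokens):
    """Truncate text to roughly max_tokens, cutting at line boundaries."""
    kept, used = [], 0
    for line in text.splitlines():
        cost = estimate_tokens(line) + 1  # +1 for the newline
        if used + cost > max_tokens:
            break  # stop at the last whole line that fits the budget
        kept.append(line)
        used += cost
    return "\n".join(kept)

context = "\n".join(f"fact {i}: " + "x" * 40 for i in range(100))
short = truncate_to_tokens(context, max_tokens=50)
# short keeps only the leading lines that fit within ~50 tokens.
```

Cutting at line boundaries keeps each surviving fact intact rather than ending the context mid-sentence.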

## scripts/dead_code_detector.py

Dead Code Detector for Python Codebases

| Function | Signature | Description |
| --- | --- | --- |
| `is_safe_unused` | `is_safe_unused(name, filepath)` | Check if an unused name is expected to be unused. |
| `get_git_blame` | `get_git_blame(filepath, lineno)` | Get last author of a line via git blame. |
| `analyze_file` | `analyze_file(filepath)` | Analyze a single Python file for dead code. |
| `scan_repo` | `scan_repo(repo_path, exclude_patterns)` | Scan an entire repo for dead code. |
| `main` | `main()` | - |

## scripts/dedup.py

dedup.py — Knowledge deduplication: content hash + semantic similarity.

| Function | Signature | Description |
| --- | --- | --- |
| `normalize_text` | `normalize_text(text)` | Normalize text for hashing: lowercase, collapse whitespace, strip. |
| `content_hash` | `content_hash(text)` | SHA256 hash of normalized text for exact dedup. |
| `tokenize` | `tokenize(text)` | Simple tokenizer: lowercase words, 3+ chars. |
| `token_similarity` | `token_similarity(a, b)` | Token-based Jaccard similarity (0.0-1.0). |
| `quality_score` | `quality_score(fact)` | Compute quality score for merge ranking. |
| `merge_facts` | `merge_facts(keep, drop)` | Merge two near-duplicate facts, keeping higher-quality fields. |
| `dedup_facts` | `dedup_facts(facts, exact_threshold, near_threshold, dry_run)` | Deduplicate a list of knowledge facts. |
| `dedup_index_file` | `dedup_index_file(input_path, output_path, near_threshold, dry_run)` | Deduplicate an index.json file. |
| `generate_test_duplicates` | `generate_test_duplicates(n)` | Generate test facts with intentional duplicates for testing. |
| `main` | `main()` | - |
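The hash-plus-similarity pipeline above can be sketched as follows. This is a simplified version: the exact normalization and tokenization rules in `dedup.py` may differ, but the shape (normalize → hash for exact matches, tokenize → Jaccard for near matches) is as described.

```python
import hashlib
import re

def normalize_text(text):
    """Lowercase, collapse whitespace, strip."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def content_hash(text):
    """SHA-256 of normalized text: exact duplicates hash identically."""
    return hashlib.sha256(normalize_text(text).encode()).hexdigest()

def tokenize(text):
    """Lowercase alphanumeric words of 3+ characters, as a set."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))

def token_similarity(a, b):
    """Jaccard similarity of token sets, in [0.0, 1.0]."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# Exact dedup catches whitespace/case variants; Jaccard catches rewordings.
same = content_hash("Use   pytest -x") == content_hash("use pytest -x")
sim = token_similarity("pytest fails without the -x flag",
                       "pytest fails unless the -x flag is set")
```

Near-duplicate pairs whose similarity clears `near_threshold` would then be merged by `merge_facts`, keeping the higher-quality fields.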

## scripts/dependency_graph.py

Cross-Repo Dependency Graph Builder

| Function | Signature | Description |
| --- | --- | --- |
| `normalize_repo_name` | `normalize_repo_name(name)` | Normalize a repo name for comparison. |
| `scan_file_for_deps` | `scan_file_for_deps(filepath, content, own_repo)` | Scan a file's content for references to other repos. |
| `scan_repo` | `scan_repo(repo_path, repo_name)` | Scan a repo directory for dependencies. |
| `detect_cycles` | `detect_cycles(graph)` | Detect circular dependencies using DFS. |
| `to_dot` | `to_dot(graph)` | Generate DOT format output. |
| `to_mermaid` | `to_mermaid(graph)` | Generate Mermaid format output. |
| `main` | `main()` | - |
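DFS-based cycle detection, as `detect_cycles` is described above, can be sketched with the standard three-color scheme. The cycle-reporting format here is illustrative; the real function may report cycles differently.

```python
def detect_cycles(graph):
    """Return cycles (as node lists) found via depth-first search.

    graph: dict mapping repo name -> list of repos it depends on.
    """
    cycles = []
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {n: WHITE for n in graph}

    def visit(node, path):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                # Back edge to a node on the current path: a cycle.
                cycles.append(path[path.index(dep):] + [dep])
            elif color.get(dep, WHITE) == WHITE:
                visit(dep, path)
        path.pop()
        color[node] = BLACK

    for n in list(graph):
        if color[n] == WHITE:
            visit(n, [])
    return cycles

g = {"hermes": ["compounding-intelligence"],
     "compounding-intelligence": ["hermes"],
     "tools": []}
# The mutual hermes <-> compounding-intelligence dependency is one cycle.
```

The repo names in the example are taken from this document's own context; any graph of the same shape works.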

## scripts/diff_analyzer.py

Diff Analyzer — Parse unified diffs and categorize every change.

_(no public functions — script runs as `main()` only)_

## scripts/freshness.py

Knowledge Freshness Cron — Detect stale entries from code changes (Issue #200)

| Function | Signature | Description |
| --- | --- | --- |
| `compute_file_hash` | `compute_file_hash(filepath)` | Compute SHA-256 hash of a file. Returns None if file doesn't exist. |
| `get_git_file_changes` | `get_git_file_changes(repo_path, days)` | Get files changed in git in the last N days. |
| `load_knowledge_entries` | `load_knowledge_entries(knowledge_dir)` | Load knowledge entries from YAML files in the knowledge directory. |
| `check_freshness` | `check_freshness(knowledge_dir, repo_root, days)` | Check freshness of knowledge entries against recent code changes. |
| `update_stale_hashes` | `update_stale_hashes(knowledge_dir, repo_root)` | Update hashes for stale entries. Returns count of updated entries. |
| `format_report` | `format_report(result, max_items)` | Format freshness check results as a human-readable report. |
| `main` | `main()` | - |

## scripts/gitea_issue_parser.py

Gitea Issue Body Parser — Extract structured data from markdown issue bodies.

| Function | Signature | Description |
| --- | --- | --- |
| `parse_issue_body` | `parse_issue_body(body, title, labels)` | Parse a Gitea issue markdown body into structured JSON. |
| `fetch_issue_from_url` | `fetch_issue_from_url(url)` | Fetch an issue from a Gitea API URL and parse it. |
| `main` | `main()` | - |

## scripts/harvester.py

harvester.py — Extract durable knowledge from Hermes session transcripts.

| Function | Signature | Description |
| --- | --- | --- |
| `find_api_key` | `find_api_key()` | Find API key from common locations. |
| `load_extraction_prompt` | `load_extraction_prompt()` | Load the extraction prompt template. |
| `call_llm` | `call_llm(prompt, transcript, api_base, api_key, model)` | Call the LLM API to extract knowledge from a transcript. |
| `parse_extraction_response` | `parse_extraction_response(content)` | Parse the LLM response to extract knowledge items. |
| `load_existing_knowledge` | `load_existing_knowledge(knowledge_dir)` | Load the existing knowledge index. |
| `fact_fingerprint` | `fact_fingerprint(fact)` | Generate a deduplication fingerprint for a fact. |
| `deduplicate` | `deduplicate(new_facts, existing, similarity_threshold)` | Remove duplicate facts from new_facts that already exist in the knowledge store. |
| `validate_fact` | `validate_fact(fact)` | Validate a single knowledge item has required fields. |
| `write_knowledge` | `write_knowledge(index, new_facts, knowledge_dir, source_session)` | Write new facts to the knowledge store. |
| `harvest_session` | `harvest_session(session_path, knowledge_dir, api_base, api_key, model, dry_run, min_confidence)` | Harvest knowledge from a single session. |
| `batch_harvest` | `batch_harvest(sessions_dir, knowledge_dir, api_base, api_key, model, since, limit, dry_run)` | Harvest knowledge from multiple sessions in batch. |
| `main` | `main()` | - |

## scripts/improvement_proposals.py

Improvement Proposal Generator for compounding-intelligence.

| Function | Signature | Description |
| --- | --- | --- |
| `analyze_sessions` | `analyze_sessions(sessions)` | Analyze session data to find waste patterns. |
| `generate_proposals` | `generate_proposals(patterns, hourly_rate, implementation_overhead)` | Generate improvement proposals from waste patterns. |
| `format_proposals_markdown` | `format_proposals_markdown(proposals, patterns, generated_at)` | Format proposals as a markdown document. |
| `format_proposals_json` | `format_proposals_json(proposals)` | Format proposals as JSON. |
| `main` | `main()` | - |

## scripts/knowledge_gap_identifier.py

Knowledge Gap Identifier — Pipeline 10.7

_(no public functions — script runs as `main()` only)_

## scripts/knowledge_staleness_check.py

Knowledge Store Staleness Detector — Detect stale knowledge entries by comparing source file hashes.

| Function | Signature | Description |
| --- | --- | --- |
| `compute_file_hash` | `compute_file_hash(filepath)` | Compute SHA-256 hash of a file. Returns None if file doesn't exist. |
| `check_staleness` | `check_staleness(index_path, repo_root)` | Check all entries in knowledge index for staleness. |
| `fix_hashes` | `fix_hashes(index_path, repo_root)` | Add hashes to entries missing them. Returns count of fixed entries. |
| `main` | `main()` | - |
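The hash-comparison idea behind the staleness detector can be sketched as below. The helper name `entry_status` and the status strings are illustrative assumptions; the real `check_staleness` works over a whole index file and may classify entries differently.

```python
import hashlib
import tempfile
from pathlib import Path

def compute_file_hash(filepath):
    """SHA-256 hex digest of a file's bytes; None if the file doesn't exist."""
    p = Path(filepath)
    if not p.exists():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()

def entry_status(entry, repo_root):
    # Classify one knowledge entry against its recorded source hash.
    source = entry.get("source")
    if not source:
        return "no-source"
    current = compute_file_hash(Path(repo_root) / source)
    if current is None:
        return "missing-source"
    if not entry.get("hash"):
        return "no-hash"
    return "fresh" if entry["hash"] == current else "stale"

# A knowledge entry goes stale as soon as its source file changes.
repo = tempfile.mkdtemp()
(Path(repo) / "mod.py").write_text("def f(): pass\n")
entry = {"source": "mod.py", "hash": compute_file_hash(Path(repo) / "mod.py")}
status_before = entry_status(entry, repo)
(Path(repo) / "mod.py").write_text("def f(): return 1\n")
status_after = entry_status(entry, repo)
```

`fix_hashes` would then be the repair step: recompute and store hashes for entries that lack one.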

## scripts/perf_bottleneck_finder.py

Performance Bottleneck Finder — Identify slow tests, builds, and CI steps.

| Function | Signature | Description |
| --- | --- | --- |
| `find_slow_tests_pytest` | `find_slow_tests_pytest(repo_path)` | Run pytest --durations and parse slow tests. |
| `find_slow_tests_by_scan` | `find_slow_tests_by_scan(repo_path)` | Scan test files for patterns that indicate slow tests. |
| `analyze_build_artifacts` | `analyze_build_artifacts(repo_path)` | Find large build artifacts that slow down builds. |
| `analyze_makefile_targets` | `analyze_makefile_targets(repo_path)` | Analyze Makefile for potentially slow targets. |
| `analyze_github_actions` | `analyze_github_actions(repo_path)` | Analyze GitHub Actions workflow files for inefficiencies. |
| `analyze_gitea_ci` | `analyze_gitea_ci(repo_path)` | Analyze Gitea/Drone CI config files. |
| `find_slow_imports` | `find_slow_imports(repo_path)` | Find Python files with heavy import chains. |
| `severity_sort_key` | `severity_sort_key(b)` | Sort by severity then duration. |
| `generate_report` | `generate_report(repo_path)` | Run all analyses and generate a performance report. |
| `format_markdown` | `format_markdown(report)` | Format report as markdown. |
| `main` | `main()` | - |

## scripts/priority_rebalancer.py

Priority Rebalancer — Re-evaluate issue priorities based on accumulated data.

| Function | Signature | Description |
| --- | --- | --- |
| `collect_knowledge_signals` | `collect_knowledge_signals(knowledge_dir)` | Analyze knowledge store for coverage gaps and staleness. |
| `collect_staleness_signals` | `collect_staleness_signals(scripts_dir, knowledge_dir)` | Run staleness checker if available. |
| `collect_metrics_signals` | `collect_metrics_signals(metrics_dir)` | Analyze metrics directory for pipeline health. |
| `extract_priority` | `extract_priority(labels)` | Extract priority level from issue labels. |
| `compute_issue_score` | `compute_issue_score(issue, repo, signals, now)` | Compute priority score for a single issue. |
| `generate_report` | `generate_report(scores, signals, org, repos_scanned)` | Generate the full priority report. |
| `generate_markdown_report` | `generate_markdown_report(report)` | Generate human-readable markdown report. |
| `main` | `main()` | - |

## scripts/refactoring_opportunity_finder.py

Finds refactoring opportunities in codebases.

| Function | Signature | Description |
| --- | --- | --- |
| `compute_file_complexity` | `compute_file_complexity(filepath)` | Compute cyclomatic complexity for a Python file. |
| `calculate_refactoring_score` | `calculate_refactoring_score(metrics)` | Calculate a refactoring priority score (0-100) based on file metrics. |
| `scan_directory` | `scan_directory(directory, extensions)` | Scan directory for source files. |
| `generate_proposals` | `generate_proposals(directory, min_score)` | Generate refactoring proposals by analyzing source files. |
| `main` | `main()` | - |
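A rough cyclomatic-complexity count over the AST can be sketched as follows. This is a simplified metric over a source string; the real `compute_file_complexity` takes a filepath and may weigh more node types.

```python
import ast

def compute_complexity(source):
    """Count 1 per function plus 1 per branch point (if/for/while/except)
    and 1 per extra operand in a boolean expression."""
    tree = ast.parse(source)
    complexity = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity += 1
        elif isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two decision points, not one.
            complexity += len(node.values) - 1
    return complexity

simple = "def f():\n    return 1\n"
branchy = "def g(x):\n    if x and x > 0:\n        return 1\n    return 0\n"
# branchy scores higher: one function + one if + one extra bool operand.
```

A scorer like `calculate_refactoring_score` could then combine this number with churn, size, and coverage signals into a 0-100 priority.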

## scripts/sampler.py

sampler.py — Score and rank sessions by harvest value.

| Function | Signature | Description |
| --- | --- | --- |
| `scan_session_fast` | `scan_session_fast(path)` | Extract scoring metadata from a session without parsing the full JSONL. |
| `parse_session_timestamp` | `parse_session_timestamp(filename)` | Parse timestamp from session filename. |
| `score_session` | `score_session(meta, now, seen_repos)` | Score a session for harvest value. Returns (score, breakdown). |
| `main` | `main()` | - |

## scripts/session_metadata.py

session_metadata.py - Extract structured metadata from Hermes session transcripts.

| Function | Signature | Description |
| --- | --- | --- |
| `extract_session_metadata` | `extract_session_metadata(file_path)` | Extract structured metadata from a Hermes session JSONL transcript. |
| `process_session_directory` | `process_session_directory(directory_path, output_file)` | Process all JSONL files in a directory. |
| `main` | `main()` | CLI entry point. |

## scripts/session_pair_harvester.py

Session Transcript → Training Pair Harvester

| Function | Signature | Description |
| --- | --- | --- |
| `compute_hash` | `compute_hash(text)` | Content hash for deduplication. |
| `extract_pairs_from_session` | `extract_pairs_from_session(session_data, min_ratio, min_response_words)` | Extract terse→rich pairs from a single session object. |
| `extract_from_jsonl_file` | `extract_from_jsonl_file(filepath, **kwargs)` | Extract pairs from a session JSONL file. |
| `deduplicate_pairs` | `deduplicate_pairs(pairs)` | Remove duplicate pairs across files. |
| `main` | `main()` | - |
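The terse→rich filter suggested by the `min_ratio` and `min_response_words` parameters can be sketched like this. It is a sketch under stated assumptions: the real `extract_pairs_from_session` operates on full session objects and its exact pairing rules may differ.

```python
def extract_pairs(messages, min_ratio=3.0, min_response_words=30):
    """Pair short user prompts with much longer assistant replies."""
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        if prev.get("role") != "user" or cur.get("role") != "assistant":
            continue
        prompt_words = len(prev.get("content", "").split())
        reply_words = len(cur.get("content", "").split())
        # Keep only replies that are long in absolute terms AND much
        # longer than the prompt that produced them.
        if (prompt_words and reply_words >= min_response_words
                and reply_words / prompt_words >= min_ratio):
            pairs.append((prev["content"], cur["content"]))
    return pairs

messages = [
    {"role": "user", "content": "fix the bug"},
    {"role": "assistant", "content": "The failure comes from " + "detail " * 40},
]
pairs = extract_pairs(messages)
```

Pairs surviving the filter would then go through `compute_hash` and `deduplicate_pairs` before being written out as training data.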

## scripts/session_reader.py

session_reader.py — Parse Hermes session JSONL transcripts.

| Function | Signature | Description |
| --- | --- | --- |
| `read_session` | `read_session(path)` | Read a session JSONL file and return all messages as a list. |
| `read_session_iter` | `read_session_iter(path)` | Iterate over session messages without loading all into memory. |
| `extract_conversation` | `extract_conversation(messages)` | Extract user/assistant conversation turns, skipping tool-only messages. |
| `truncate_for_context` | `truncate_for_context(messages, head, tail)` | Truncate long sessions: keep first N + last N messages. |
| `messages_to_text` | `messages_to_text(messages)` | Convert message list to plain text for LLM consumption. |
| `get_session_metadata` | `get_session_metadata(path)` | Extract metadata from a session file (first message often has config info). |
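The lazy JSONL iteration described above can be sketched as follows. This version takes an open file object for illustration; the real `read_session_iter` takes a path, and its handling of corrupt lines is an assumption.

```python
import io
import json

def read_session_iter(fileobj):
    """Yield one message dict per JSONL line, skipping blank or corrupt
    lines, without loading the whole transcript into memory."""
    for line in fileobj:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate e.g. a truncated final line

transcript = io.StringIO(
    '{"role": "user", "content": "fix the failing test"}\n'
    '{"role": "assistant", "content": "Looking at the traceback..."}\n'
    '{"truncated'  # simulated partial write at end of session
)
messages = list(read_session_iter(transcript))
# Two well-formed messages survive; the truncated line is skipped.
```

Because it is a generator, downstream helpers like `extract_conversation` can stream multi-megabyte sessions without holding them in memory.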

## scripts/test_automation_opportunity_finder.py

Tests for scripts/automation_opportunity_finder.py — 8 tests.

| Function | Signature | Description |
| --- | --- | --- |
| `test_analyze_cron_jobs_no_file` | `test_analyze_cron_jobs_no_file()` | Returns empty list when no cron jobs file exists. |
| `test_analyze_cron_jobs_disabled` | `test_analyze_cron_jobs_disabled()` | Detects disabled cron jobs. |
| `test_analyze_cron_jobs_errors` | `test_analyze_cron_jobs_errors()` | Detects cron jobs with error status. |
| `test_analyze_documents_finds_todos` | `test_analyze_documents_finds_todos()` | Detects TODO markers in documents. |
| `test_analyze_scripts_repeated_commands` | `test_analyze_scripts_repeated_commands()` | Detects repeated shell commands across scripts. |
| `test_analyze_session_transcripts` | `test_analyze_session_transcripts()` | Detects repeated tool-call sequences. |
| `test_deduplicate_proposals` | `test_deduplicate_proposals()` | Deduplicates proposals with similar titles. |
| `test_rank_proposals` | `test_rank_proposals()` | Ranks proposals by impact * confidence. |

## scripts/test_bootstrapper.py

Tests for bootstrapper.py — context assembly from knowledge store.

| Function | Signature | Description |
| --- | --- | --- |
| `make_index` | `make_index(facts, tmp_dir)` | Create a temporary index.json with given facts. |
| `test_empty_index` | `test_empty_index()` | Empty knowledge store produces graceful output. |
| `test_filter_by_repo` | `test_filter_by_repo()` | Filter facts by repository. |
| `test_filter_by_agent` | `test_filter_by_agent()` | Filter facts by agent type. |
| `test_no_global_flag` | `test_no_global_flag()` | Excluding global facts works. |
| `test_sort_by_confidence` | `test_sort_by_confidence()` | Facts sort by confidence descending. |
| `test_sort_pitfalls_first` | `test_sort_pitfalls_first()` | Pitfalls sort before facts at same confidence. |
| `test_truncate_to_tokens` | `test_truncate_to_tokens()` | Truncation cuts at line boundary. |
| `test_estimate_tokens` | `test_estimate_tokens()` | Token estimation is reasonable. |
| `test_build_full_context` | `test_build_full_context()` | Full context with facts renders correctly. |
| `test_max_tokens_respected` | `test_max_tokens_respected()` | Output respects max_tokens limit. |
| `test_missing_index_graceful` | `test_missing_index_graceful()` | Missing index.json doesn't crash. |

## scripts/test_diff_analyzer.py

Tests for scripts/diff_analyzer.py — 10 tests.

| Function | Signature | Description |
| --- | --- | --- |
| `test_empty` | `test_empty()` | - |
| `test_addition` | `test_addition()` | - |
| `test_deletion` | `test_deletion()` | - |
| `test_modification` | `test_modification()` | - |
| `test_rename` | `test_rename()` | - |
| `test_multiple_files` | `test_multiple_files()` | - |
| `test_binary` | `test_binary()` | - |
| `test_to_dict` | `test_to_dict()` | - |
| `test_context_only` | `test_context_only()` | - |
| `test_multi_hunk` | `test_multi_hunk()` | - |
| `run_all` | `run_all()` | - |

## scripts/test_gitea_issue_parser.py

Tests for scripts/gitea_issue_parser.py

| Function | Signature | Description |
| --- | --- | --- |
| `test_basic_parsing` | `test_basic_parsing()` | - |
| `test_numbered_criteria` | `test_numbered_criteria()` | - |
| `test_epic_ref_from_body` | `test_epic_ref_from_body()` | - |
| `test_empty_body` | `test_empty_body()` | - |
| `test_no_sections` | `test_no_sections()` | - |
| `test_multiple_sections` | `test_multiple_sections()` | - |
| `run_all` | `run_all()` | - |

## scripts/test_harvest_prompt.py

Test harness for knowledge extraction prompt.

| Function | Signature | Description |
| --- | --- | --- |
| `validate_knowledge_item` | `validate_knowledge_item(item, idx)` | Validate a single knowledge item. Returns list of errors. |
| `validate_extraction` | `validate_extraction(data)` | Validate a full extraction result. Returns (is_valid, errors, warnings). |
| `validate_transcript_coverage` | `validate_transcript_coverage(data, transcript)` | Check that extracted facts are actually supported by the transcript. |
| `run_tests` | `run_tests()` | Run the built-in test suite. |
| `validate_file` | `validate_file(filepath)` | Validate an existing extraction JSON file. |

## scripts/test_harvest_prompt_comprehensive.py

Comprehensive tests for knowledge extraction prompt.

| Function | Signature | Description |
| --- | --- | --- |
| `check_prompt_structure` | `check_prompt_structure()` | - |
| `check_confidence_scoring` | `check_confidence_scoring()` | - |
| `check_example_quality` | `check_example_quality()` | - |
| `check_constraint_coverage` | `check_constraint_coverage()` | - |
| `check_test_sessions` | `check_test_sessions()` | - |
| `test_prompt_structure` | `test_prompt_structure()` | - |
| `test_confidence_scoring` | `test_confidence_scoring()` | - |
| `test_example_quality` | `test_example_quality()` | - |
| `test_constraint_coverage` | `test_constraint_coverage()` | - |
| `test_test_sessions` | `test_test_sessions()` | - |

## scripts/test_harvester_pipeline.py

Smoke test for harvester pipeline — verifies the full chain.

| Function | Signature | Description |
| --- | --- | --- |
| `test_session_reader` | `test_session_reader()` | Test that session_reader parses JSONL correctly. |
| `test_validate_fact` | `test_validate_fact()` | Test fact validation. |
| `test_deduplicate` | `test_deduplicate()` | Test deduplication. |
| `test_knowledge_store_roundtrip` | `test_knowledge_store_roundtrip()` | Test loading and writing knowledge index. |
| `test_full_chain_no_llm` | `test_full_chain_no_llm()` | Test the full pipeline minus the LLM call. |

## scripts/test_improvement_proposals.py

Tests for scripts/improvement_proposals.py — 15 tests.

| Function | Signature | Description |
| --- | --- | --- |
| `test_empty_sessions` | `test_empty_sessions()` | - |
| `test_no_patterns_on_clean_sessions` | `test_no_patterns_on_clean_sessions()` | - |
| `test_repeated_error_detection` | `test_repeated_error_detection()` | Same error across 3+ sessions triggers pattern. |
| `test_repeated_error_threshold` | `test_repeated_error_threshold()` | 2 occurrences should NOT trigger (threshold is 3). |
| `test_slow_tool_detection` | `test_slow_tool_detection()` | Tool with avg latency > 5000ms across 5+ calls. |
| `test_fast_tool_not_flagged` | `test_fast_tool_not_flagged()` | Tool under 5000ms avg should not trigger. |
| `test_failed_retry_detection` | `test_failed_retry_detection()` | 3+ consecutive calls to same tool triggers retry pattern. |
| `test_manual_process_detection` | `test_manual_process_detection()` | 10+ tool calls with <= 3 unique tools. |
| `test_generate_proposals_from_patterns` | `test_generate_proposals_from_patterns()` | Proposals generated from waste patterns. |
| `test_proposal_roi_positive` | `test_proposal_roi_positive()` | ROI weeks should be a positive number for recoverable time. |
| `test_proposals_sorted_by_impact` | `test_proposals_sorted_by_impact()` | Proposals should be sorted by monthly hours saved (descending). |
| `test_format_markdown` | `test_format_markdown()` | Markdown output should contain expected sections. |
| `test_format_json` | `test_format_json()` | JSON output should be valid and parseable. |
| `test_normalize_error` | `test_normalize_error()` | Error normalization should remove paths and hashes. |
| `test_cli_integration` | `test_cli_integration()` | End-to-end test: write input JSON, run script, check output. |
| `run_all` | `run_all()` | - |

## scripts/test_knowledge_staleness.py

Tests for scripts/knowledge_staleness_check.py — 8 tests.

| Function | Signature | Description |
| --- | --- | --- |
| `test_fresh_entry` | `test_fresh_entry()` | - |
| `test_stale_entry` | `test_stale_entry()` | - |
| `test_missing_source` | `test_missing_source()` | - |
| `test_no_hash` | `test_no_hash()` | - |
| `test_no_source_field` | `test_no_source_field()` | - |
| `test_fix_hashes` | `test_fix_hashes()` | - |
| `test_empty_index` | `test_empty_index()` | - |
| `test_compute_hash_nonexistent` | `test_compute_hash_nonexistent()` | - |
| `run_all` | `run_all()` | - |

## scripts/test_priority_rebalancer.py

Tests for Priority Rebalancer

| Function | Signature | Description |
| --- | --- | --- |
| `test` | `test(name)` | - |
| `assert_eq` | `assert_eq(a, b, msg)` | - |
| `assert_true` | `assert_true(v, msg)` | - |
| `assert_false` | `assert_false(v, msg)` | - |
| `make_issue` | `make_issue(**kwargs)` | - |

## scripts/test_refactoring_opportunity_finder.py

Tests for scripts/refactoring_opportunity_finder.py — 10 tests.

| Function | Signature | Description |
| --- | --- | --- |
| `test_complexity_simple_function` | `test_complexity_simple_function()` | Simple function should have low complexity. |
| `test_complexity_with_conditionals` | `test_complexity_with_conditionals()` | Function with if/else should have higher complexity. |
| `test_complexity_with_loops` | `test_complexity_with_loops()` | Function with loops should increase complexity. |
| `test_complexity_with_class` | `test_complexity_with_class()` | Class with methods should count both. |
| `test_complexity_syntax_error` | `test_complexity_syntax_error()` | File with syntax error should return zeros. |
| `test_refactoring_score_high_complexity` | `test_refactoring_score_high_complexity()` | High complexity should give high score. |
| `test_refactoring_score_low_complexity` | `test_refactoring_score_low_complexity()` | Low complexity should give lower score. |
| `test_refactoring_score_high_churn` | `test_refactoring_score_high_churn()` | High churn should increase score. |
| `test_refactoring_score_no_coverage` | `test_refactoring_score_no_coverage()` | No coverage data should assume medium risk. |
| `test_refactoring_score_large_file` | `test_refactoring_score_large_file()` | Large files should score higher. |
| `run_all` | `run_all()` | - |

## scripts/test_session_pair_harvester.py

Tests for session_pair_harvester.

| Function | Signature | Description |
| --- | --- | --- |
| `test_basic_extraction` | `test_basic_extraction()` | - |
| `test_filters_short_responses` | `test_filters_short_responses()` | - |
| `test_skips_tool_results` | `test_skips_tool_results()` | - |
| `test_deduplication` | `test_deduplication()` | - |
| `test_ratio_filter` | `test_ratio_filter()` | - |

## scripts/validate_knowledge.py

Validate knowledge files and index.json against the schema.

| Function | Signature | Description |
| --- | --- | --- |
| `validate_fact` | `validate_fact(fact, src)` | - |
| `main` | `main()` | - |

Total scripts documented: **33**

_Generated by `scripts/api_doc_generator.py` (Issue #98)_