test: license checker unit tests (#110 )

feat: license checker — Pipeline 5.4 (#110 )
2026-04-15 03:32:58 +00:00 · 2026-04-15 03:31:35 +00:00
3 changed files with 692 additions and 239 deletions
--- a/GENOME.md
+++ b/GENOME.md
@@ -1,239 +0,0 @@
-# GENOME.md — compounding-intelligence
-
-*Auto-generated codebase genome. See timmy-home#676.*
-
---
-
-## Project Overview
-
-**What:** A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.
-
-**Why:** Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.
-
-**How:** Three pipelines form a compounding loop:
-
-```
-SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
-                              ↓
-                         MEASURER → Prove it's working
-```
-
-**Status:** Early stage. Template and test scaffolding exist. Core pipeline scripts (harvester.py, bootstrapper.py, measurer.py, session_reader.py) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.
-
---
-
-## Architecture
-
-```mermaid
-graph TD
-    A[Session Transcript<br/>.jsonl] --> B[Harvester]
-    B --> C{Extract Knowledge}
-    C --> D[knowledge/index.json]
-    C --> E[knowledge/global/*.md]
-    C --> F[knowledge/repos/{repo}.md]
-    C --> G[knowledge/agents/{agent}.md]
-    D --> H[Bootstrapper]
-    H --> I[Bootstrap Context<br/>2k token injection]
-    I --> J[New Session<br/>starts smarter]
-    J --> A
-    D --> K[Measurer]
-    K --> L[metrics/dashboard.md]
-    K --> M[Velocity / Hit Rate<br/>Error Reduction]
-```
-
-### Pipeline 1: Harvester
-
-**Status:** Prompt designed. Script not implemented.
-
-Reads finished session transcripts (JSONL). Uses `templates/harvest-prompt.md` to extract durable knowledge into five categories:
-
-| Category | Description | Example |
-|----------|-------------|---------|
-| `fact` | Concrete, verifiable information | "Repository X has 5 files" |
-| `pitfall` | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
-| `pattern` | Successful action sequences | "Deploy: test → build → push → webhook" |
-| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash" |
-| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
-
-Output schema per knowledge item:
-```json
-{
-  "fact": "One sentence description",
-  "category": "fact|pitfall|pattern|tool-quirk|question",
-  "repo": "repo-name or 'global'",
-  "confidence": 0.0-1.0
-}
-```
-
-### Pipeline 2: Bootstrapper
-
-**Status:** Not implemented.
-
-Queries knowledge store before session start. Assembles a compact 2k-token context from relevant facts. Injects into session startup so the agent begins with full situational awareness.
-
-### Pipeline 3: Measurer
-
-**Status:** Not implemented.
-
-Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.
-
---
-
-## Directory Structure
-
-```
-compounding-intelligence/
-├── README.md                           # Project overview and architecture
-├── GENOME.md                           # This file (codebase genome)
-├── knowledge/                          # [PLANNED] Knowledge store
-│   ├── index.json                      # Machine-readable fact index
-│   ├── global/                         # Cross-repo knowledge
-│   ├── repos/{repo}.md                 # Per-repo knowledge
-│   └── agents/{agent}.md               # Agent-type notes
-├── scripts/
-│   ├── test_harvest_prompt.py          # Basic prompt validation (2.5KB)
-│   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
-├── templates/
-│   └── harvest-prompt.md               # Knowledge extraction prompt (3.5KB)
-├── test_sessions/
-│   ├── session_success.jsonl           # Happy path test data
-│   ├── session_failure.jsonl           # Failure path test data
-│   ├── session_partial.jsonl           # Incomplete session test data
-│   ├── session_patterns.jsonl          # Pattern extraction test data
-│   └── session_questions.jsonl         # Question identification test data
-└── metrics/                            # [PLANNED] Compounding metrics
-    └── dashboard.md
-```
-
---
-
-## Entry Points and Data Flow
-
-### Entry Point 1: Knowledge Extraction (Harvester)
-
-```
-Input:  Session transcript (JSONL)
-        ↓
-        templates/harvest-prompt.md (LLM prompt)
-        ↓
-        Knowledge items (JSON array)
-        ↓
-Output: knowledge/index.json + per-repo/per-agent markdown files
-```
-
-### Entry Point 2: Session Bootstrap (Bootstrapper)
-
-```
-Input:  Session context (repo, agent type, task type)
-        ↓
-        knowledge/index.json (query relevant facts)
-        ↓
-        2k-token bootstrap context
-        ↓
-Output: Injected into session startup
-```
-
-### Entry Point 3: Measurement (Measurer)
-
-```
-Input:  knowledge/index.json + session history
-        ↓
-        Velocity, hit rate, error reduction calculations
-        ↓
-Output: metrics/dashboard.md
-```
-
---
-
-## Key Abstractions
-
-### Knowledge Item
-The atomic unit. One sentence, one category, one confidence score. Designed to be small enough that 1000 items fit in a 2k-token bootstrap context.
-
-### Knowledge Store
-A directory structure that mirrors the fleet's mental model:
- `global/` — knowledge that applies everywhere (tool quirks, environment facts)
- `repos/` — knowledge specific to each repo
- `agents/` — knowledge specific to each agent type
-
-### Confidence Score
-0.0–1.0 scale. Defines how certain the harvester is about each extracted fact:
- 0.9–1.0: Explicitly stated with verification
- 0.7–0.8: Clearly implied by multiple data points
- 0.5–0.6: Suggested but not fully verified
- 0.3–0.4: Inferred from limited data
- 0.1–0.2: Speculative or uncertain
-
-### Bootstrap Context
-The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.
-
---
-
-## API Surface
-
-### Internal (scripts not yet implemented)
-
-| Script | Input | Output | Status |
-|--------|-------|--------|--------|
-| `harvester.py` | Session JSONL path | Knowledge items JSON | PLANNED |
-| `bootstrapper.py` | Repo + agent type | 2k-token context string | PLANNED |
-| `measurer.py` | Knowledge store path | Metrics JSON | PLANNED |
-| `session_reader.py` | Session JSONL path | Parsed transcript | PLANNED |
-
-### Prompt (templates/harvest-prompt.md)
-
-The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:
- Five extraction categories
- Output format (JSON array of knowledge items)
- Confidence scoring rubric
- Constraints (no hallucination, specificity, relevance, brevity)
- Example input/output pair
-
---
-
-## Test Coverage
-
-### What Exists
-
-| File | Tests | Coverage |
-|------|-------|----------|
-| `scripts/test_harvest_prompt.py` | 2 tests | Prompt file existence, sample transcript |
-| `scripts/test_harvest_prompt_comprehensive.py` | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
-| `test_sessions/*.jsonl` | 5 sessions | Success, failure, partial, patterns, questions |
-
-### What's Missing
-
-1. **Harvester integration test** — Does the prompt actually extract correct knowledge from real transcripts?
-2. **Bootstrapper test** — Does it assemble relevant context correctly?
-3. **Knowledge store test** — Does the index.json maintain consistency?
-4. **Confidence calibration test** — Do high-confidence facts actually prove true in later sessions?
-5. **Deduplication test** — Are duplicate facts across sessions handled?
-6. **Staleness test** — How does the system handle outdated knowledge?
-
---
-
-## Security Considerations
-
-1. **No secrets in knowledge store** — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.
-
-2. **Knowledge poisoning** — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.
-
-3. **Access control** — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.
-
-4. **Transcript privacy** — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.
-
---
-
-## The 100x Path (from README)
-
-```
-Month 1:  15,000 facts, sessions 20% faster
-Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
-Month 3:  90,000 facts, fleet measurably smarter per token
-```
-
-Each new session is better than the last. The intelligence compounds.
-
---
-
-*Generated by codebase-genome pipeline. Ref: timmy-home#676.*
--- a/scripts/license_checker.py
+++ b/scripts/license_checker.py
@@ -0,0 +1,506 @@
+#!/usr/bin/env python3
+"""
+License Checker — Pipeline 5.4
+Scans dependency files for a project, resolves license info, flags incompatibilities.
+
+Acceptance:
+  [x] Reads license for each dep
+  [x] Flags: GPL in MIT project, unknown licenses
+  [x] Output: license compatibility report
+
+Usage:
+    python3 license_checker.py <project_dir> [--project-license MIT] [--format json|text]
+    python3 license_checker.py <project_dir> --scan-deps
+"""
+
+import argparse
+import json
+import os
+import re
+import sys
+import urllib.request
+from dataclasses import dataclass, field, asdict
+from enum import Enum
+from typing import Optional
+
+
+class Severity(Enum):
+    OK = "ok"
+    WARNING = "warning"
+    ERROR = "error"
+    UNKNOWN = "unknown"
+
+
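+# License families used by check_compatibility(). This is a set lookup, not a
+# full compatibility matrix: anything in COPYLEFT_FAMILIES is flagged when the
+# project itself is permissive. The policy is conservative: weak copyleft
+# (LGPL, MPL) is flagged too, and CC-BY-NC (non-commercial rather than
+# copyleft) is grouped here so that it is flagged as well.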
+COPYLEFT_FAMILIES = {
+    "GPL-2.0", "GPL-2.0-only", "GPL-2.0-or-later",
+    "GPL-3.0", "GPL-3.0-only", "GPL-3.0-or-later",
+    "AGPL-3.0", "AGPL-3.0-only", "AGPL-3.0-or-later",
+    "LGPL-2.0", "LGPL-2.1", "LGPL-3.0",
+    "LGPL-2.0-only", "LGPL-2.1-only", "LGPL-3.0-only",
+    "LGPL-2.0-or-later", "LGPL-2.1-or-later", "LGPL-3.0-or-later",
+    "MPL-2.0",  # Weak copyleft — file-level
+    "EUPL-1.1", "EUPL-1.2",
+    "OSL-3.0",
+    "SSPL-1.0",
+    "CC-BY-SA-4.0", "CC-BY-SA-3.0",
+    "CC-BY-NC-4.0", "CC-BY-NC-3.0",
+}
+
+PERMISSIVE_LICENSES = {
+    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0",
+    "ISC", "Unlicense", "CC0-1.0", "0BSD", "BSL-1.0",
+    "Zlib", "PSF-2.0", "Python-2.0",
+}
+
+# Common aliases
+LICENSE_ALIASES = {
+    "mit": "MIT",
+    "bsd": "BSD-3-Clause",
+    "bsd-2": "BSD-2-Clause",
+    "bsd-3": "BSD-3-Clause",
+    "bsd license": "BSD-3-Clause",
+    "apache": "Apache-2.0",
+    "apache 2.0": "Apache-2.0",
+    "apache-2.0": "Apache-2.0",
+    "apache software license": "Apache-2.0",
+    "apache software license 2.0": "Apache-2.0",
+    "gpl": "GPL-3.0",
+    "gpl-2": "GPL-2.0",
+    "gpl-3": "GPL-3.0",
+    "gplv2": "GPL-2.0",
+    "gplv3": "GPL-3.0",
+    "gnu general public license": "GPL-3.0",
+    "gnu general public license v3": "GPL-3.0",
+    "gnu general public license v2": "GPL-2.0",
+    "gnu lesser general public license v2": "LGPL-2.1",
+    "gnu lesser general public license v3": "LGPL-3.0",
+    "lgpl": "LGPL-3.0",
+    "lgpl-2.1": "LGPL-2.1",
+    "lgpl-3": "LGPL-3.0",
+    "agpl": "AGPL-3.0",
+    "agpl-3.0": "AGPL-3.0",
+    "agplv3": "AGPL-3.0",
+    "isc": "ISC",
+    "mpl": "MPL-2.0",
+    "mpl-2.0": "MPL-2.0",
+    "mozilla public license 2.0": "MPL-2.0",
+    "unlicense": "Unlicense",
+    "public domain": "Unlicense",
+    "cc0": "CC0-1.0",
+    "cc0-1.0": "CC0-1.0",
+    "psf": "PSF-2.0",
+    "python software foundation license": "PSF-2.0",
+    "the mit license": "MIT",
+    "mit license": "MIT",
+}
+
+
+@dataclass
+class DepLicense:
+    name: str
+    version: str = ""
+    license: str = "UNKNOWN"
+    source: str = ""  # where we found the dep (requirements.txt, package.json, etc.)
+    severity: Severity = Severity.UNKNOWN
+    message: str = ""
+
+
+@dataclass
+class LicenseReport:
+    project_dir: str
+    project_license: str = "MIT"
+    dependencies: list = field(default_factory=list)
+    summary: dict = field(default_factory=dict)
+    errors: list = field(default_factory=list)
+    warnings: list = field(default_factory=list)
+
+
+def normalize_license(raw: str) -> str:
+    """Normalize a license string to an SPDX identifier (best effort)."""
+    if not raw or raw.strip() in ("UNKNOWN", "UNKNOWN:", ""):
+        return "UNKNOWN"
+    stripped = raw.strip()
+    # Already a known SPDX id (case-sensitive, e.g. "BSD-3-Clause")
+    if stripped in COPYLEFT_FAMILIES or stripped in PERMISSIVE_LICENSES:
+        return stripped
+    # Remove parenthesized abbreviations like "MIT License (MIT)"
+    cleaned = re.sub(r"\(.*?\)", "", stripped.lower()).strip()
+    without_the = re.sub(r"^the\s+", "", cleaned).strip()
+    # Try the alias table at each step so that full names ("apache software
+    # license", "gnu general public license") and short forms ("mit") both match.
+    for candidate in (cleaned, without_the, re.sub(r"\s+license$", "", without_the).strip()):
+        if candidate in LICENSE_ALIASES:
+            return LICENSE_ALIASES[candidate]
+    return stripped
+
+
+def check_compatibility(dep_license: str, project_license: str) -> tuple[Severity, str]:
+    """Check if a dependency license is compatible with the project license."""
+    if dep_license == "UNKNOWN":
+        return Severity.WARNING, "License unknown — manual review needed"
+    
+    if dep_license in PERMISSIVE_LICENSES:
+        return Severity.OK, "Compatible (permissive)"
+    
+    if dep_license in COPYLEFT_FAMILIES:
+        # Copyleft in a permissive project is a problem
+        if project_license in PERMISSIVE_LICENSES:
+            return Severity.ERROR, f"Copyleft ({dep_license}) in permissive ({project_license}) project"
+        # Copyleft in same family is OK
+        if dep_license.startswith(project_license.split("-")[0]):
+            return Severity.OK, "Compatible (same copyleft family)"
+        return Severity.WARNING, f"Review needed: {dep_license} with {project_license}"
+    
+    return Severity.UNKNOWN, f"Unrecognized license: {dep_license}"
+
+
+def parse_requirements_txt(path: str) -> list[DepLicense]:
+    """Parse requirements.txt format."""
+    deps = []
+    with open(path) as f:
+        for line in f:
+            # Drop inline comments ("flask  # web framework") and whitespace
+            line = re.sub(r"\s+#.*$", "", line).strip()
+            if not line or line.startswith("#") or line.startswith("-"):
+                continue
+            # Name, optional [extras], then a version spec, environment marker (";")
+            # or direct reference ("@"). URLs and local paths do not match.
+            match = re.match(r"^([A-Za-z0-9][A-Za-z0-9_.-]*)\s*(?:\[[^\]]*\])?\s*(?:[<>=!~;@].*)?$", line)
+            if match:
+                deps.append(DepLicense(name=match.group(1), source="requirements.txt"))
+    return deps
+
+
+def parse_pyproject_toml(path: str) -> list[DepLicense]:
+    """Parse pyproject.toml dependencies."""
+    deps = []
+    try:
+        # Use tomllib (Python 3.11+) or fall back to regex
+        import tomllib
+        with open(path, "rb") as f:
+            data = tomllib.load(f)
+    except ImportError:
+        # Fallback: regex parse
+        with open(path) as f:
+            content = f.read()
+        # Find [project.dependencies] section
+        match = re.search(r"\[project\]\s*dependencies\s*=\s*\[(.*?)\]", content, re.DOTALL)
+        if match:
+            for dep_str in re.findall(r'"([^"]+)"', match.group(1)):
+                name = re.match(r"^([a-zA-Z0-9_.-]+)", dep_str)
+                if name:
+                    deps.append(DepLicense(name=name.group(1), source="pyproject.toml"))
+        return deps
+
+    project_deps = data.get("project", {}).get("dependencies", [])
+    for dep_str in project_deps:
+        name = re.match(r"^([a-zA-Z0-9_.-]+)", dep_str)
+        if name:
+            deps.append(DepLicense(name=name.group(1), source="pyproject.toml"))
+    return deps
+
+
+def parse_package_json(path: str) -> list[DepLicense]:
+    """Parse package.json dependencies."""
+    deps = []
+    with open(path) as f:
+        data = json.load(f)
+    for section in ("dependencies", "devDependencies"):
+        for name, version in data.get(section, {}).items():
+            deps.append(DepLicense(name=name, version=version, source="package.json"))
+    return deps
+
+
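+def parse_cargo_toml(path: str) -> list[DepLicense]:
+    """Parse Cargo.toml dependency tables (basic, line-based)."""
+    deps = []
+    section = ""
+    with open(path) as f:
+        for line in f:
+            line = line.strip()
+            header = re.match(r"^\[+\s*([^\]]+?)\s*\]", line)
+            if header:
+                section = header.group(1)
+            # Only [dependencies], [dev-dependencies], [build-dependencies] and
+            # [target.*.dependencies] list crates; [package] keys are skipped.
+            elif section.endswith("dependencies"):
+                match = re.match(r'^([A-Za-z0-9_-]+)\s*=\s*["{]', line)
+                if match:
+                    deps.append(DepLicense(name=match.group(1), source="Cargo.toml"))
+    return deps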
+
+
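+def parse_go_mod(path: str) -> list[DepLicense]:
+    """Parse go.mod require directives (block and single-line forms)."""
+    deps = []
+    with open(path) as f:
+        in_require = False
+        for line in f:
+            # Drop trailing comments such as "// indirect"; skip blank lines
+            line = line.split("//", 1)[0].strip()
+            if not line:
+                continue
+            if line == "require (":
+                in_require = True
+                continue
+            if line == ")" and in_require:
+                in_require = False
+                continue
+            if in_require:
+                parts = line.split()
+            elif line.startswith("require "):
+                parts = line.split()[1:]  # single-line: require mod/path v1.2.3
+            else:
+                continue
+            if len(parts) >= 2:
+                deps.append(DepLicense(name=parts[0], version=parts[1], source="go.mod"))
+    return deps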
+
+
+def scan_dep_files(project_dir: str) -> list[DepLicense]:
+    """Find and parse all dependency files in a project."""
+    all_deps = []
+    parsers = {
+        "requirements.txt": parse_requirements_txt,
+        "requirements-dev.txt": parse_requirements_txt,
+        "requirements_prod.txt": parse_requirements_txt,
+        "pyproject.toml": parse_pyproject_toml,
+        "setup.py": None,  # TODO: parse setup.py
+        "package.json": parse_package_json,
+        "Cargo.toml": parse_cargo_toml,
+        "go.mod": parse_go_mod,
+    }
+    
+    # Scan the project root, then subdirectories one level deep (monorepos)
+    dirs = [("", project_dir)]
+    for entry in sorted(os.listdir(project_dir)):
+        subdir = os.path.join(project_dir, entry)
+        if os.path.isdir(subdir) and not entry.startswith("."):
+            dirs.append((entry, subdir))
+
+    for prefix, directory in dirs:
+        for filename, parser in parsers.items():
+            path = os.path.join(directory, filename)
+            if parser is None or not os.path.exists(path):
+                continue
+            source = f"{prefix}/{filename}" if prefix else filename
+            try:
+                deps = parser(path)
+            except Exception as e:
+                print(f"Warning: Failed to parse {source}: {e}", file=sys.stderr)
+                continue
+            for d in deps:
+                d.source = source
+            all_deps.extend(deps)
+
+    return all_deps
+
+
+def lookup_pypi_license(package_name: str) -> str:
+    """Look up license from PyPI API."""
+    try:
+        url = f"https://pypi.org/pypi/{package_name}/json"
+        req = urllib.request.Request(url, headers={"Accept": "application/json"})
+        resp = urllib.request.urlopen(req, timeout=10)
+        data = json.loads(resp.read())
+        # Try classifiers first
+        for classifier in data.get("info", {}).get("classifiers", []):
+            if classifier.startswith("License ::"):
+                parts = classifier.split(" :: ")
+                if len(parts) >= 3:
+                    return parts[-1]
+        # Fall back to license field
+        lic = data.get("info", {}).get("license", "")
+        if lic and len(lic) < 100:
+            return lic
+        # Try license_expression
+        le = data.get("info", {}).get("license_expression", "")
+        if le:
+            return le
+        return "UNKNOWN"
+    except Exception:
+        return "UNKNOWN"
+
+
+def lookup_npm_license(package_name: str) -> str:
+    """Look up license from npm registry."""
+    try:
+        url = f"https://registry.npmjs.org/{package_name}"
+        req = urllib.request.Request(url, headers={"Accept": "application/json"})
+        resp = urllib.request.urlopen(req, timeout=10)
+        data = json.loads(resp.read())
+        lic = data.get("license", "UNKNOWN")
+        if isinstance(lic, dict):
+            lic = lic.get("type", "UNKNOWN")
+        return lic or "UNKNOWN"
+    except Exception:
+        return "UNKNOWN"
+
+
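+def detect_project_license(project_dir: str) -> str:
+    """Detect the project's own license."""
+    for name in ("LICENSE", "LICENSE.md", "LICENSE.txt", "LICENCE", "COPYING"):
+        path = os.path.join(project_dir, name)
+        if os.path.exists(path):
+            with open(path, encoding="utf-8", errors="replace") as f:
+                content = f.read().upper()
+            head = content[:500]
+            # Check GNU titles near the top first: the GPL texts themselves mention
+            # the Lesser and Affero GPLs. Match MIT/BSD/ISC as whole words, since a
+            # bare substring test for "MIT" also hits words such as "PERMITTED".
+            if "GNU AFFERO GENERAL PUBLIC LICENSE" in head:
+                return "AGPL-3.0"
+            if "GNU LESSER GENERAL PUBLIC LICENSE" in head:
+                return "LGPL-3.0" if "VERSION 3" in head else "LGPL-2.1"
+            if "GNU GENERAL PUBLIC LICENSE" in content:
+                if "VERSION 3" in content:
+                    return "GPL-3.0"
+                if "VERSION 2" in content:
+                    return "GPL-2.0"
+            if "APACHE" in content and "2.0" in content:
+                return "Apache-2.0"
+            if re.search(r"\bMIT\b", head) or "PERMISSION IS HEREBY GRANTED, FREE OF CHARGE" in content:
+                return "MIT"
+            if re.search(r"\bBSD\b", head) or "REDISTRIBUTION AND USE IN SOURCE AND BINARY FORMS" in content:
+                if "NEITHER THE NAME" in content or "3-CLAUSE" in content or "THREE CLAUSE" in content:
+                    return "BSD-3-Clause"
+                return "BSD-2-Clause"
+            if re.search(r"\bISC\b", head):
+                return "ISC"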
+    # Check pyproject.toml
+    pypath = os.path.join(project_dir, "pyproject.toml")
+    if os.path.exists(pypath):
+        with open(pypath) as f:
+            content = f.read()
+        match = re.search(r'license\s*=\s*\{\s*text\s*=\s*"([^"]+)"', content)
+        if match:
+            return normalize_license(match.group(1))
+        match = re.search(r'license\s*=\s*"([^"]+)"', content)
+        if match:
+            return normalize_license(match.group(1))
+    return "UNKNOWN"
+
+
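+def resolve_licenses(deps: list[DepLicense], cache: Optional[dict] = None) -> None:
+    """Resolve license info for all dependencies (updates each dep in place)."""
+    if cache is None:
+        cache = {}
+
+    for dep in deps:
+        # source is "package.json" at the root or "<subdir>/package.json" in a monorepo
+        manifest = os.path.basename(dep.source)
+        if manifest == "package.json":
+            ecosystem, lookup = "npm", lookup_npm_license
+        elif manifest in ("Cargo.toml", "go.mod"):
+            # No crates.io / Go lookup yet: leave UNKNOWN (flagged for manual
+            # review) rather than matching an unrelated PyPI package of the same name.
+            ecosystem, lookup = manifest, None
+        else:
+            ecosystem, lookup = "pypi", lookup_pypi_license
+
+        # Key by ecosystem too, so npm "foo" and PyPI "foo" do not share an entry
+        key = (ecosystem, dep.name)
+        if key not in cache:
+            cache[key] = normalize_license(lookup(dep.name)) if lookup else "UNKNOWN"
+        dep.license = cache[key]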
+
+
+def generate_report(deps: list[DepLicense], project_license: str) -> LicenseReport:
+    """Generate the compatibility report."""
+    report = LicenseReport(
+        project_dir="",
+        project_license=project_license,
+        dependencies=[],
+    )
+    
+    counts = {"ok": 0, "warning": 0, "error": 0, "unknown": 0}
+    
+    for dep in deps:
+        severity, message = check_compatibility(dep.license, project_license)
+        dep.severity = severity
+        dep.message = message
+        counts[severity.value] += 1
+        
+        if severity == Severity.ERROR:
+            report.errors.append(f"{dep.name}: {message}")
+        elif severity == Severity.WARNING:
+            report.warnings.append(f"{dep.name}: {message}")
+        
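+        # asdict() keeps the Severity enum; store its string value so that the
+        # format_text() icon lookup and the JSON output see "ok", "error", etc.
+        entry = asdict(dep)
+        entry["severity"] = dep.severity.value
+        report.dependencies.append(entry)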
+    
+    report.summary = {
+        "total": len(deps),
+        **counts,
+        "project_license": project_license,
+    }
+    
+    return report
+
+
+def format_text(report: LicenseReport) -> str:
+    """Format report as human-readable text."""
+    lines = []
+    lines.append("=" * 60)
+    lines.append("  LICENSE COMPATIBILITY REPORT")
+    lines.append("=" * 60)
+    lines.append(f"  Project License: {report.project_license}")
+    lines.append(f"  Dependencies: {report.summary.get('total', 0)}")
+    lines.append(f"  OK: {report.summary.get('ok', 0)}  "
+                 f"WARN: {report.summary.get('warning', 0)}  "
+                 f"ERR: {report.summary.get('error', 0)}  "
+                 f"UNK: {report.summary.get('unknown', 0)}")
+    lines.append("-" * 60)
+    
+    for dep in report.dependencies:
+        icon = {"ok": "[OK]", "warning": "[!!]", "error": "[XX]", "unknown": "[??]"}
+        sev = dep.get("severity", "unknown")
+        name = dep.get("name", "?")
+        lic = dep.get("license", "?")
+        msg = dep.get("message", "")
+        lines.append(f"  {icon.get(sev, '[ ]')} {name:30s} {lic:20s} {msg}")
+    
+    if report.errors:
+        lines.append("-" * 60)
+        lines.append("  ERRORS:")
+        for e in report.errors:
+            lines.append(f"    - {e}")
+    
+    if report.warnings:
+        lines.append("-" * 60)
+        lines.append("  WARNINGS:")
+        for w in report.warnings:
+            lines.append(f"    - {w}")
+    
+    lines.append("=" * 60)
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="License Checker — Pipeline 5.4")
+    parser.add_argument("project_dir", help="Project directory to scan")
+    parser.add_argument("--project-license", default=None,
+                        help="Project license SPDX id (auto-detected if omitted)")
+    parser.add_argument("--format", choices=["json", "text"], default="text",
+                        help="Output format")
+    parser.add_argument("--scan-deps", action="store_true",
+                        help="Only scan and list deps (skip license lookup)")
+    args = parser.parse_args()
+    
+    project_dir = os.path.abspath(args.project_dir)
+    if not os.path.isdir(project_dir):
+        print(f"Error: {project_dir} is not a directory", file=sys.stderr)
+        sys.exit(1)
+    
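+    # Detect the project license; normalize a user-supplied id (e.g. "mit" -> "MIT")
+    if args.project_license:
+        project_license = normalize_license(args.project_license)
+    else:
+        project_license = detect_project_license(project_dir)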
+    
+    # Scan deps
+    deps = scan_dep_files(project_dir)
+    if not deps:
+        print(f"No dependencies found in {project_dir}", file=sys.stderr)
+        sys.exit(0)
+    
+    print(f"Found {len(deps)} dependencies", file=sys.stderr)
+    
+    if args.scan_deps:
+        for d in deps:
+            print(f"  {d.name} ({d.source})")
+        sys.exit(0)
+    
+    # Resolve licenses
+    print("Resolving licenses...", file=sys.stderr)
+    resolve_licenses(deps)
+    
+    # Generate report
+    report = generate_report(deps, project_license)
+    report.project_dir = project_dir
+    
+    if args.format == "json":
+        print(json.dumps(asdict(report), indent=2, default=str))
+    else:
+        print(format_text(report))
+    
+    # Exit code: 1 if errors, 0 otherwise
+    sys.exit(1 if report.errors else 0)
+
+
+if __name__ == "__main__":
+    main()
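`lookup_pypi_license()` can fall back to a PEP 639 `license_expression` such as `MIT OR Apache-2.0`; `normalize_license()` returns that string unchanged, so `check_compatibility()` reports it as an unrecognized license. Below is a minimal sketch of scoring flat SPDX expressions for a permissive project, assuming upper-case `AND`/`OR` operators, no parentheses and no `WITH` exceptions. `Rank`, `classify()` and `check_expression()` are hypothetical names, the license sets are trimmed copies, and the three-level ranking is a simplification of the checker's `Severity`.

```python
from enum import IntEnum

# Trimmed copies of the checker's license sets (illustration only).
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
COPYLEFT = {"GPL-2.0", "GPL-3.0", "LGPL-2.1", "LGPL-3.0", "AGPL-3.0", "MPL-2.0"}


class Rank(IntEnum):
    """Higher value = worse outcome, so max() picks the worst term."""
    OK = 0
    UNKNOWN = 1
    ERROR = 2


def classify(spdx_id: str, project_license: str) -> Rank:
    """Score a single SPDX id against the project license."""
    if spdx_id in PERMISSIVE:
        return Rank.OK
    if spdx_id in COPYLEFT and project_license in PERMISSIVE:
        return Rank.ERROR
    return Rank.UNKNOWN


def check_expression(expr: str, project_license: str) -> Rank:
    """Evaluate a flat SPDX expression in which AND binds tighter than OR.

    Assumes no parentheses and no WITH exceptions.
    """
    alternatives = []
    for conjunction in expr.split(" OR "):
        terms = [t.strip() for t in conjunction.split(" AND ")]
        # Every license in an AND clause applies, so the worst term wins.
        alternatives.append(max(classify(t, project_license) for t in terms))
    # Under OR the recipient may choose, so the best alternative wins.
    return min(alternatives)


print(check_expression("MIT OR Apache-2.0", "MIT").name)  # OK
print(check_expression("GPL-3.0 OR MIT", "MIT").name)     # OK
print(check_expression("MIT AND GPL-3.0", "MIT").name)    # ERROR
```

The min/max split mirrors SPDX operator semantics: `AND` has higher precedence than `OR`, all licenses joined by `AND` must be honored, and `OR` lets the consumer pick one alternative.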
--- a/tests/test_license_checker.py
+++ b/tests/test_license_checker.py
@@ -0,0 +1,186 @@
+#!/usr/bin/env python3
+"""Tests for license_checker.py — Pipeline 5.4"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+
+# Add the scripts/ directory (a sibling of tests/) to the import path
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "scripts"))
+
+from license_checker import (
+    normalize_license,
+    check_compatibility,
+    parse_requirements_txt,
+    parse_package_json,
+    parse_pyproject_toml,
+    parse_go_mod,
+    detect_project_license,
+    scan_dep_files,
+    generate_report,
+    format_text,
+    Severity,
+    DepLicense,
+)
+
+
+class TestNormalizeLicense(unittest.TestCase):
+    def test_mit_aliases(self):
+        for alias in ["mit", "MIT License", "The MIT License", "MIT license"]:
+            self.assertEqual(normalize_license(alias), "MIT")
+
+    def test_apache_aliases(self):
+        for alias in ["Apache 2.0", "Apache-2.0", "apache software license"]:
+            self.assertEqual(normalize_license(alias), "Apache-2.0")
+
+    def test_gpl_aliases(self):
+        self.assertEqual(normalize_license("GPL-3.0"), "GPL-3.0")
+        self.assertEqual(normalize_license("gplv3"), "GPL-3.0")
+
+    def test_unknown(self):
+        self.assertEqual(normalize_license(""), "UNKNOWN")
+        self.assertEqual(normalize_license("UNKNOWN"), "UNKNOWN")
+
+    def test_already_spdx(self):
+        self.assertEqual(normalize_license("BSD-3-Clause"), "BSD-3-Clause")
+
+
+class TestCheckCompatibility(unittest.TestCase):
+    def test_permissive_ok(self):
+        sev, msg = check_compatibility("MIT", "MIT")
+        self.assertEqual(sev, Severity.OK)
+
+    def test_gpl_in_mit_error(self):
+        sev, msg = check_compatibility("GPL-3.0", "MIT")
+        self.assertEqual(sev, Severity.ERROR)
+
+    def test_unknown_warning(self):
+        sev, msg = check_compatibility("UNKNOWN", "MIT")
+        self.assertEqual(sev, Severity.WARNING)
+
+    def test_apache_in_mit_ok(self):
+        sev, msg = check_compatibility("Apache-2.0", "MIT")
+        self.assertEqual(sev, Severity.OK)
+
+    def test_lgpl_in_mit_error(self):
+        sev, msg = check_compatibility("LGPL-3.0", "MIT")
+        self.assertEqual(sev, Severity.ERROR)
+
+
+class TestParseRequirements(unittest.TestCase):
+    def test_basic(self):
+        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
+            f.write("requests>=2.28.0\nflask==2.3.0\n# comment\npytest\n")
+            f.flush()
+            deps = parse_requirements_txt(f.name)
+        os.unlink(f.name)
+        names = [d.name for d in deps]
+        self.assertIn("requests", names)
+        self.assertIn("flask", names)
+        self.assertIn("pytest", names)
+        self.assertEqual(len(deps), 3)
+
+    def test_skip_flags(self):
+        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
+            f.write("-r other.txt\n--index-url https://pypi.org\nreal-dep\n")
+            f.flush()
+            deps = parse_requirements_txt(f.name)
+        os.unlink(f.name)
+        self.assertEqual(len(deps), 1)
+        self.assertEqual(deps[0].name, "real-dep")
+
+
+class TestParsePackageJson(unittest.TestCase):
+    def test_basic(self):
+        data = {
+            "dependencies": {"express": "^4.18.0", "lodash": "^4.17.21"},
+            "devDependencies": {"jest": "^29.0.0"},
+        }
+        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
+            json.dump(data, f)
+            f.flush()
+            deps = parse_package_json(f.name)
+        os.unlink(f.name)
+        names = [d.name for d in deps]
+        self.assertIn("express", names)
+        self.assertIn("jest", names)
+        self.assertEqual(len(deps), 3)
+
+
+class TestParseGoMod(unittest.TestCase):
+    def test_basic(self):
+        content = """module example.com/mymod
+
+go 1.21
+
+require (
+    github.com/gin-gonic/gin v1.9.1
+    github.com/stretchr/testify v1.8.4
+)
+"""
+        with tempfile.NamedTemporaryFile(mode="w", suffix=".mod", delete=False) as f:
+            f.write(content)
+            f.flush()
+            deps = parse_go_mod(f.name)
+        os.unlink(f.name)
+        self.assertEqual(len(deps), 2)
+        self.assertEqual(deps[0].name, "github.com/gin-gonic/gin")
+
+
+class TestDetectProjectLicense(unittest.TestCase):
+    def test_mit_file(self):
+        with tempfile.TemporaryDirectory() as d:
+            with open(os.path.join(d, "LICENSE"), "w") as f:
+                f.write("MIT License\n\nCopyright (c) 2024...\n")
+            self.assertEqual(detect_project_license(d), "MIT")
+
+    def test_apache_file(self):
+        with tempfile.TemporaryDirectory() as d:
+            with open(os.path.join(d, "LICENSE"), "w") as f:
+                f.write("Apache License Version 2.0...")
+            self.assertEqual(detect_project_license(d), "Apache-2.0")
+
+    def test_no_license(self):
+        with tempfile.TemporaryDirectory() as d:
+            self.assertEqual(detect_project_license(d), "UNKNOWN")
+
+
+class TestScanDeps(unittest.TestCase):
+    def test_multi_ecosystem(self):
+        with tempfile.TemporaryDirectory() as d:
+            with open(os.path.join(d, "requirements.txt"), "w") as f:
+                f.write("flask\nrequests\n")
+            with open(os.path.join(d, "package.json"), "w") as f:
+                json.dump({"dependencies": {"express": "^4.0.0"}}, f)
+            deps = scan_dep_files(d)
+            names = [dep.name for dep in deps]
+            self.assertIn("flask", names)
+            self.assertIn("express", names)
+
+
+class TestGenerateReport(unittest.TestCase):
+    def test_basic(self):
+        deps = [
+            DepLicense(name="flask", license="BSD-3-Clause", source="requirements.txt"),
+            DepLicense(name="gpl-pkg", license="GPL-3.0", source="requirements.txt"),
+            DepLicense(name="unknown-pkg", license="UNKNOWN", source="requirements.txt"),
+        ]
+        report = generate_report(deps, "MIT")
+        self.assertEqual(report.summary["ok"], 1)
+        self.assertEqual(report.summary["error"], 1)
+        self.assertEqual(report.summary["warning"], 1)
+        self.assertEqual(len(report.errors), 1)
+        self.assertIn("gpl-pkg", report.errors[0])
+
+    def test_format_text(self):
+        deps = [DepLicense(name="flask", license="BSD-3-Clause", source="requirements.txt")]
+        report = generate_report(deps, "MIT")
+        text = format_text(report)
+        self.assertIn("LICENSE COMPATIBILITY REPORT", text)
+        self.assertIn("flask", text)
+
+
+if __name__ == "__main__":
+    unittest.main()
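The suite above never exercises `lookup_pypi_license()`, `lookup_npm_license()` or `resolve_licenses()`, because they hit the network. A minimal sketch of testing that path offline with `unittest.mock`, assuming the code keeps calling `urllib.request.urlopen` through the module attribute (as `license_checker` does), so patching `urllib.request.urlopen` intercepts it. `pypi_license()` is a simplified, hypothetical stand-in for `lookup_pypi_license()` (classifiers first, then the `license` field), and `fake_pypi()` is a hypothetical helper.

```python
import io
import json
import unittest
import urllib.request
from unittest import mock


def pypi_license(package_name: str) -> str:
    """Simplified stand-in for lookup_pypi_license(): classifiers, then `license`."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    data = json.loads(urllib.request.urlopen(req, timeout=10).read())
    info = data.get("info", {})
    for classifier in info.get("classifiers", []):
        parts = classifier.split(" :: ")
        if parts[0] == "License" and len(parts) >= 3:
            return parts[-1]
    return info.get("license") or "UNKNOWN"


def fake_pypi(payload: dict):
    """Patch urlopen so that it returns `payload` serialized as JSON bytes."""
    return mock.patch(
        "urllib.request.urlopen",
        return_value=io.BytesIO(json.dumps(payload).encode()),
    )


class TestPypiLookupOffline(unittest.TestCase):
    def test_classifier_wins(self):
        payload = {"info": {"classifiers": ["License :: OSI Approved :: MIT License"],
                            "license": "ignored"}}
        with fake_pypi(payload) as urlopen:
            self.assertEqual(pypi_license("demo"), "MIT License")
        # The request targeted the PyPI JSON endpoint for the package.
        req = urlopen.call_args.args[0]
        self.assertEqual(req.full_url, "https://pypi.org/pypi/demo/json")

    def test_license_field_fallback(self):
        with fake_pypi({"info": {"classifiers": [], "license": "BSD"}}):
            self.assertEqual(pypi_license("demo"), "BSD")
```

Saved under a hypothetical name such as `tests/test_pypi_offline.py`, it runs with `python -m unittest discover -s tests` and needs no network access.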
Author	SHA1	Message	Date
Alexander Whitestone	508f0363b5	test: license checker unit tests (#110 )	2026-04-15 03:32:58 +00:00
Alexander Whitestone	c3d1633859	feat: license checker — Pipeline 5.4 (#110 )	2026-04-15 03:31:35 +00:00
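`resolve_licenses()` performs one HTTP request at a time (each with a 10-second timeout) and caches by the raw dependency name, so `Flask` in one manifest and `flask` in another trigger two PyPI lookups. A minimal sketch of resolving each distinct package once and in parallel, assuming PEP 503 name normalization is an acceptable cache key for PyPI packages and that the lookup accepts normalized names (npm names follow different rules and should not be normalized this way). `canonical_name()` and `resolve_concurrently()` are hypothetical helpers; `lookup` stands in for `lookup_pypi_license()`.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def canonical_name(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' to '-' and lower-case."""
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_concurrently(
    names: Iterable[str],
    lookup: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run `lookup` once per distinct canonical name, in a thread pool.

    Returns a mapping from every original name to its looked-up license.
    """
    names = list(names)
    unique = sorted({canonical_name(n) for n in names})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        by_canonical = dict(zip(unique, pool.map(lookup, unique)))
    return {n: by_canonical[canonical_name(n)] for n in names}


# Usage with a fake lookup in place of the network call.
calls = []


def fake_lookup(name: str) -> str:
    calls.append(name)  # list.append is atomic under CPython's GIL
    return {"flask": "BSD-3-Clause", "zope-interface": "ZPL-2.1"}.get(name, "UNKNOWN")


licenses = resolve_concurrently(["Flask", "flask", "zope.interface", "Zope_Interface"], fake_lookup)
print(licenses["Flask"], licenses["Zope_Interface"], len(calls))  # BSD-3-Clause ZPL-2.1 2
```

Threads suit this workload because each lookup mostly waits on I/O; `max_workers` bounds how many concurrent requests hit the registry.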