Compare commits

..

1 Commits

Author SHA1 Message Date
Timmy
b32d316023 feat(#10): knowledge file format schema + example knowledge files
- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes
2026-04-14 14:21:21 -04:00
15 changed files with 914 additions and 973 deletions

114
knowledge/SCHEMA.md Normal file
View File

@@ -0,0 +1,114 @@
# Knowledge File Format Specification
**Version:** 1
**Issue:** #10
**Status:** Draft
---
## Overview
The knowledge system has two layers:
1. **index.json** — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
2. **Knowledge files** (YAML) — Human-readable, editable facts organized by domain.
The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
---
## index.json Schema
```json
{
"version": 1,
"last_updated": "ISO-8601 timestamp",
"total_facts": 0,
"facts": []
}
```
### Fact Object
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
| `fact` | string | yes | One-sentence description of the knowledge |
| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
| `confidence` | float | yes | 0.01.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels |
| `source_count` | int | no | How many sessions confirmed this fact |
| `first_seen` | date | no | ISO-8601 date first extracted |
| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
| `expires` | date | no | Optional. After this date, fact is stale |
| `related` | string[] | no | IDs of related facts |
### ID Format: `{domain}:{category}:{sequence}`
### Categories
| Category | Definition |
|----------|------------|
| `fact` | Concrete, verifiable information |
| `pitfall` | Errors, wrong assumptions, time-wasters |
| `pattern` | Successful sequences of actions |
| `tool-quirk` | Environment-specific behaviors |
| `question` | Identified but unanswered |
### Confidence Scoring
| Range | Meaning |
|-------|---------|
| 0.91.0 | Explicitly stated and verified |
| 0.70.8 | Clearly implied by multiple data points |
| 0.50.6 | Suggested but not fully verified |
| 0.30.4 | Inferred from limited data |
| 0.10.2 | Speculative or uncertain |
---
## Directory Structure
```
knowledge/
├── index.json # Machine-readable fact index
├── SCHEMA.md # This file
├── global/ # Cross-repo knowledge
│ ├── pitfalls.yaml
│ ├── patterns.yaml
│ └── tool-quirks.yaml
├── repos/ # Per-repo knowledge
│ ├── {repo-name}.yaml
│ └── ...
└── agents/ # Agent-type knowledge
└── {agent-type}.yaml
```
## YAML File Format
YAML files use frontmatter for metadata, then markdown sections with fact entries:
```yaml
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---
# Title
## Section
- id: global:tool-quirk:001
fact: "Description"
confidence: 0.95
tags: [tag1, tag2]
source_count: 5
first_seen: "2026-03-27"
```
## Validation
Run `python scripts/validate_knowledge.py` to validate index.json.

View File

@@ -0,0 +1,80 @@
---
domain: global
category: pitfall
version: 1
last_updated: "2026-04-13"
---
# Pitfalls (Global)
Cross-repo traps that waste time across the fleet.
## Git & Forge
- id: global:pitfall:001
fact: "Branch protection requires 1 approval on main - API merges fail with 405 without it"
confidence: 0.95
tags: [git, merge, branch-protection, gitea]
source_count: 12
first_seen: "2026-04-05"
last_confirmed: "2026-04-13"
related: [the-nexus:pitfall:001]
- id: global:pitfall:002
fact: "Never use --no-verify on git commits - it bypasses all hooks including safety checks"
confidence: 0.95
tags: [git, hooks, safety]
source_count: 5
first_seen: "2026-03-28"
last_confirmed: "2026-04-13"
- id: global:pitfall:003
fact: "Gitea PR creation workaround needed on the-nexus - direct API call fails, use alternative endpoint"
confidence: 0.9
tags: [gitea, pr, api, workaround]
source_count: 4
first_seen: "2026-04-06"
last_confirmed: "2026-04-12"
## Agent Operations
- id: global:pitfall:004
fact: "Anthropic is BANNED from fallback chain - if fallback triggers to Anthropic, something is wrong"
confidence: 0.95
tags: [provider, anthropic, fallback]
source_count: 7
first_seen: "2026-03-30"
last_confirmed: "2026-04-13"
- id: global:pitfall:005
fact: "Telegram tokens expired - don't assume Telegram notifications work without checking"
confidence: 0.85
tags: [telegram, notifications, token]
source_count: 3
first_seen: "2026-04-02"
- id: global:pitfall:006
fact: "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run"
confidence: 0.9
tags: [gateway, cron, process]
source_count: 4
first_seen: "2026-04-04"
last_confirmed: "2026-04-11"
## Testing
- id: global:pitfall:007
fact: "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory"
confidence: 0.9
tags: [pytest, test, collection]
source_count: 3
first_seen: "2026-04-07"
last_confirmed: "2026-04-13"
- id: global:pitfall:008
fact: "TDD: test 1 before building 55 - verify the cycle works before scaling"
confidence: 0.95
tags: [tdd, testing, methodology]
source_count: 8
first_seen: "2026-03-25"
last_confirmed: "2026-04-13"

View File

@@ -0,0 +1,71 @@
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---
# Tool Quirks (Global)
## Authentication
- id: global:tool-quirk:001
fact: "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN"
confidence: 0.95
tags: [git, auth, gitea, token]
source_count: 23
first_seen: "2026-03-27"
last_confirmed: "2026-04-13"
related: [global:pitfall:001]
- id: global:tool-quirk:002
fact: "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer"
confidence: 0.9
tags: [git, api, gitea]
source_count: 8
first_seen: "2026-03-28"
last_confirmed: "2026-04-12"
- id: global:tool-quirk:003
fact: "Gitea Issues API type=issues param does NOT filter PRs - use truthiness check on pull_request field"
confidence: 0.95
tags: [gitea, api, issues, pr]
source_count: 6
first_seen: "2026-04-01"
last_confirmed: "2026-04-13"
## Paths & Environment
- id: global:tool-quirk:004
fact: "~/.hermes is the default hermes home - check get_hermes_home() not the path literal"
confidence: 0.9
tags: [paths, hermes, env]
source_count: 10
first_seen: "2026-03-30"
last_confirmed: "2026-04-13"
related: [hermes-agent:pitfall:005]
- id: global:tool-quirk:005
fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin"
confidence: 0.85
tags: [ansible, vault, config]
source_count: 3
first_seen: "2026-04-02"
## Model & Inference
- id: global:tool-quirk:006
fact: "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available"
confidence: 0.95
tags: [model, provider, nous, default]
source_count: 15
first_seen: "2026-03-25"
last_confirmed: "2026-04-13"
- id: global:tool-quirk:007
fact: "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume"
confidence: 0.95
tags: [hermes, model, restart, session]
source_count: 8
first_seen: "2026-03-29"
last_confirmed: "2026-04-12"

View File

@@ -1,6 +1,472 @@
{
"version": 1,
"last_updated": "2026-04-13T20:00:00Z",
"total_facts": 0,
"facts": []
"total_facts": 29,
"facts": [
{
"id": "hermes-agent:pitfall:001",
"fact": "deploy-crons.py leaves jobs in mixed model format",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.95,
"tags": [
"cron",
"deploy",
"model",
"config"
],
"source_count": 5,
"first_seen": "2026-04-08",
"last_confirmed": "2026-04-13",
"related": [
"hermes-agent:pitfall:002",
"hermes-agent:pitfall:003"
]
},
{
"id": "hermes-agent:pitfall:002",
"fact": "deploy-crons.py --deploy doesn't set legacy skill field from skills list",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"cron",
"deploy",
"skills"
],
"source_count": 3,
"first_seen": "2026-04-09",
"last_confirmed": "2026-04-13",
"related": [
"hermes-agent:pitfall:001"
]
},
{
"id": "hermes-agent:pitfall:003",
"fact": "Cron jobs with blank fallback_model fields trigger spurious gateway warnings",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"cron",
"model",
"fallback"
],
"source_count": 4,
"first_seen": "2026-04-07",
"last_confirmed": "2026-04-12",
"related": [
"hermes-agent:pitfall:001"
]
},
{
"id": "hermes-agent:pitfall:004",
"fact": "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"watchdog",
"model",
"config"
],
"source_count": 3,
"first_seen": "2026-04-08",
"last_confirmed": "2026-04-13"
},
{
"id": "hermes-agent:pitfall:005",
"fact": "10+ files read HERMES_HOME directly instead of get_hermes_home()",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.85,
"tags": [
"paths",
"env",
"hermes-home"
],
"source_count": 6,
"first_seen": "2026-04-06",
"last_confirmed": "2026-04-12",
"related": [
"global:pitfall:002"
]
},
{
"id": "hermes-agent:pitfall:006",
"fact": "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.8,
"tags": [
"paths",
"env",
"bug"
],
"source_count": 2,
"first_seen": "2026-04-05"
},
{
"id": "hermes-agent:pitfall:007",
"fact": "vps-agent-dispatch reports OK while remote hermes binary path is broken",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"ssh",
"dispatch",
"vps"
],
"source_count": 4,
"first_seen": "2026-04-07",
"last_confirmed": "2026-04-11"
},
{
"id": "hermes-agent:pitfall:008",
"fact": "nightwatch-health-monitor SSH check fails on cloud-model-only deployments",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.85,
"tags": [
"ssh",
"health",
"cloud"
],
"source_count": 2,
"first_seen": "2026-04-10"
},
{
"id": "the-nexus:pitfall:001",
"fact": "Merges fail with HTTP 405 due to branch protection",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.95,
"tags": [
"git",
"merge",
"branch-protection",
"gitea"
],
"source_count": 12,
"first_seen": "2026-04-05",
"last_confirmed": "2026-04-13",
"related": [
"global:pitfall:001"
]
},
{
"id": "the-nexus:pitfall:002",
"fact": "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.95,
"tags": [
"server",
"concurrency",
"bridge"
],
"source_count": 5,
"first_seen": "2026-04-10",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:003",
"fact": "ChatLog.log() crashes on message persistence when index.html has orphaned button tags",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.9,
"tags": [
"html",
"crash",
"chatlog"
],
"source_count": 3,
"first_seen": "2026-04-12",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:004",
"fact": "Three.js LOD not implemented - local hardware struggles with full scene",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.85,
"tags": [
"threejs",
"performance",
"lod"
],
"source_count": 4,
"first_seen": "2026-04-09",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:005",
"fact": "Duplicate content blocks appear in index.html when PR merges conflict silently",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.8,
"tags": [
"html",
"merge-conflict",
"duplicate"
],
"source_count": 3,
"first_seen": "2026-04-11",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:006",
"fact": "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.9,
"tags": [
"deploy",
"websocket",
"http",
"cors"
],
"source_count": 4,
"first_seen": "2026-04-10",
"last_confirmed": "2026-04-13"
},
{
"id": "global:tool-quirk:001",
"fact": "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"git",
"auth",
"gitea",
"token"
],
"source_count": 23,
"first_seen": "2026-03-27",
"last_confirmed": "2026-04-13",
"related": [
"global:pitfall:001"
]
},
{
"id": "global:tool-quirk:002",
"fact": "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.9,
"tags": [
"git",
"api",
"gitea"
],
"source_count": 8,
"first_seen": "2026-03-28",
"last_confirmed": "2026-04-12"
},
{
"id": "global:tool-quirk:003",
"fact": "Gitea Issues API type=issues param does NOT filter PRs",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"gitea",
"api",
"issues",
"pr"
],
"source_count": 6,
"first_seen": "2026-04-01",
"last_confirmed": "2026-04-13"
},
{
"id": "global:tool-quirk:004",
"fact": "~/.hermes is the default hermes home - check get_hermes_home() not the path literal",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.9,
"tags": [
"paths",
"hermes",
"env"
],
"source_count": 10,
"first_seen": "2026-03-30",
"last_confirmed": "2026-04-13",
"related": [
"hermes-agent:pitfall:005"
]
},
{
"id": "global:tool-quirk:005",
"fact": "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.85,
"tags": [
"ansible",
"vault",
"config"
],
"source_count": 3,
"first_seen": "2026-04-02"
},
{
"id": "global:tool-quirk:006",
"fact": "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"model",
"provider",
"nous",
"default"
],
"source_count": 15,
"first_seen": "2026-03-25",
"last_confirmed": "2026-04-13"
},
{
"id": "global:tool-quirk:007",
"fact": "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"hermes",
"model",
"restart",
"session"
],
"source_count": 8,
"first_seen": "2026-03-29",
"last_confirmed": "2026-04-12"
},
{
"id": "global:pitfall:001",
"fact": "Branch protection requires 1 approval on main - API merges fail with 405 without it",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"git",
"merge",
"branch-protection",
"gitea"
],
"source_count": 12,
"first_seen": "2026-04-05",
"last_confirmed": "2026-04-13",
"related": [
"the-nexus:pitfall:001"
]
},
{
"id": "global:pitfall:002",
"fact": "Never use --no-verify on git commits",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"git",
"hooks",
"safety"
],
"source_count": 5,
"first_seen": "2026-03-28",
"last_confirmed": "2026-04-13"
},
{
"id": "global:pitfall:003",
"fact": "Gitea PR creation workaround needed on the-nexus - direct API call fails",
"category": "pitfall",
"domain": "global",
"confidence": 0.9,
"tags": [
"gitea",
"pr",
"api",
"workaround"
],
"source_count": 4,
"first_seen": "2026-04-06",
"last_confirmed": "2026-04-12"
},
{
"id": "global:pitfall:004",
"fact": "Anthropic is BANNED from fallback chain",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"provider",
"anthropic",
"fallback"
],
"source_count": 7,
"first_seen": "2026-03-30",
"last_confirmed": "2026-04-13"
},
{
"id": "global:pitfall:005",
"fact": "Telegram tokens expired - don't assume Telegram notifications work",
"category": "pitfall",
"domain": "global",
"confidence": 0.85,
"tags": [
"telegram",
"notifications",
"token"
],
"source_count": 3,
"first_seen": "2026-04-02"
},
{
"id": "global:pitfall:006",
"fact": "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run",
"category": "pitfall",
"domain": "global",
"confidence": 0.9,
"tags": [
"gateway",
"cron",
"process"
],
"source_count": 4,
"first_seen": "2026-04-04",
"last_confirmed": "2026-04-11"
},
{
"id": "global:pitfall:007",
"fact": "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory",
"category": "pitfall",
"domain": "global",
"confidence": 0.9,
"tags": [
"pytest",
"test",
"collection"
],
"source_count": 3,
"first_seen": "2026-04-07",
"last_confirmed": "2026-04-13"
},
{
"id": "global:pitfall:008",
"fact": "TDD: test 1 before building 55",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"tdd",
"testing",
"methodology"
],
"source_count": 8,
"first_seen": "2026-03-25",
"last_confirmed": "2026-04-13"
}
]
}

View File

@@ -0,0 +1,80 @@
---
domain: hermes-agent
category: pitfall
version: 1
last_updated: "2026-04-13"
---
# Pitfalls (hermes-agent)
## Cron & Deployment
- id: hermes-agent:pitfall:001
fact: "deploy-crons.py leaves jobs in mixed model format - some have provider/model, some just model"
confidence: 0.95
tags: [cron, deploy, model, config]
source_count: 5
first_seen: "2026-04-08"
last_confirmed: "2026-04-13"
related: [hermes-agent:pitfall:002, hermes-agent:pitfall:003]
- id: hermes-agent:pitfall:002
fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list"
confidence: 0.9
tags: [cron, deploy, skills]
source_count: 3
first_seen: "2026-04-09"
last_confirmed: "2026-04-13"
related: [hermes-agent:pitfall:001]
- id: hermes-agent:pitfall:003
fact: "Cron jobs with blank fallback_model fields trigger spurious gateway warnings"
confidence: 0.9
tags: [cron, model, fallback]
source_count: 4
first_seen: "2026-04-07"
last_confirmed: "2026-04-12"
related: [hermes-agent:pitfall:001]
- id: hermes-agent:pitfall:004
fact: "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms"
confidence: 0.9
tags: [watchdog, model, config]
source_count: 3
first_seen: "2026-04-08"
last_confirmed: "2026-04-13"
## Path & Environment
- id: hermes-agent:pitfall:005
fact: "10+ files read HERMES_HOME directly instead of get_hermes_home() - breaks on custom paths"
confidence: 0.85
tags: [paths, env, hermes-home]
source_count: 6
first_seen: "2026-04-06"
last_confirmed: "2026-04-12"
related: [global:pitfall:002]
- id: hermes-agent:pitfall:006
fact: "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set"
confidence: 0.8
tags: [paths, env, bug]
source_count: 2
first_seen: "2026-04-05"
## SSH & Dispatch
- id: hermes-agent:pitfall:007
fact: "vps-agent-dispatch reports OK while remote hermes binary path is broken"
confidence: 0.9
tags: [ssh, dispatch, vps]
source_count: 4
first_seen: "2026-04-07"
last_confirmed: "2026-04-11"
- id: hermes-agent:pitfall:008
fact: "nightwatch-health-monitor SSH check fails on cloud-model-only deployments"
confidence: 0.85
tags: [ssh, health, cloud]
source_count: 2
first_seen: "2026-04-10"

View File

@@ -0,0 +1,63 @@
---
domain: the-nexus
category: pitfall
version: 1
last_updated: "2026-04-13"
---
# Pitfalls (the-nexus)
## Git & Merging
- id: the-nexus:pitfall:001
fact: "Merges fail with HTTP 405 due to branch protection - must use merge API with 1 approval"
confidence: 0.95
tags: [git, merge, branch-protection, gitea]
source_count: 12
first_seen: "2026-04-05"
last_confirmed: "2026-04-13"
related: [global:pitfall:001]
- id: the-nexus:pitfall:002
fact: "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests"
confidence: 0.95
tags: [server, concurrency, bridge]
source_count: 5
first_seen: "2026-04-10"
last_confirmed: "2026-04-13"
- id: the-nexus:pitfall:003
fact: "ChatLog.log() crashes on message persistence when index.html has orphaned button tags"
confidence: 0.9
tags: [html, crash, chatlog]
source_count: 3
first_seen: "2026-04-12"
last_confirmed: "2026-04-13"
## Three.js & Performance
- id: the-nexus:pitfall:004
fact: "Three.js LOD not implemented - local hardware struggles with full scene without texture optimization"
confidence: 0.85
tags: [threejs, performance, lod]
source_count: 4
first_seen: "2026-04-09"
last_confirmed: "2026-04-13"
- id: the-nexus:pitfall:005
fact: "Duplicate content blocks appear in index.html when PR merges conflict silently"
confidence: 0.8
tags: [html, merge-conflict, duplicate]
source_count: 3
first_seen: "2026-04-11"
last_confirmed: "2026-04-13"
## Deployment
- id: the-nexus:pitfall:006
fact: "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS"
confidence: 0.9
tags: [deploy, websocket, http, cors]
source_count: 4
first_seen: "2026-04-10"
last_confirmed: "2026-04-13"

View File

@@ -1,506 +0,0 @@
#!/usr/bin/env python3
"""
License Checker — Pipeline 5.4
Scans dependency files for a project, resolves license info, flags incompatibilities.
Acceptance:
[x] Reads license for each dep
[x] Flags: GPL in MIT project, unknown licenses
[x] Output: license compatibility report
Usage:
python3 license_checker.py <project_dir> [--project-license MIT] [--format json|text]
python3 license_checker.py <project_dir> --scan-deps
"""
import argparse
import json
import os
import re
import subprocess
import sys
import urllib.request
import urllib.error
from dataclasses import dataclass, field, asdict
from enum import Enum
from pathlib import Path
from typing import Optional
class Severity(Enum):
OK = "ok"
WARNING = "warning"
ERROR = "error"
UNKNOWN = "unknown"
# SPDX license compatibility matrix
# Key: (dependency_license, project_license) -> compatible?
# Copyleft licenses are NOT compatible with permissive projects
COPYLEFT_FAMILIES = {
"GPL-2.0", "GPL-2.0-only", "GPL-2.0-or-later",
"GPL-3.0", "GPL-3.0-only", "GPL-3.0-or-later",
"AGPL-3.0", "AGPL-3.0-only", "AGPL-3.0-or-later",
"LGPL-2.0", "LGPL-2.1", "LGPL-3.0",
"LGPL-2.0-only", "LGPL-2.1-only", "LGPL-3.0-only",
"LGPL-2.0-or-later", "LGPL-2.1-or-later", "LGPL-3.0-or-later",
"MPL-2.0", # Weak copyleft — file-level
"EUPL-1.1", "EUPL-1.2",
"OSL-3.0",
"SSPL-1.0",
"CC-BY-SA-4.0", "CC-BY-SA-3.0",
"CC-BY-NC-4.0", "CC-BY-NC-3.0",
}
PERMISSIVE_LICENSES = {
"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0",
"ISC", "Unlicense", "CC0-1.0", "0BSD", "BSL-1.0",
"Zlib", "PSF-2.0", "Python-2.0",
}
# Common aliases
LICENSE_ALIASES = {
"mit": "MIT",
"bsd": "BSD-3-Clause",
"bsd-2": "BSD-2-Clause",
"bsd-3": "BSD-3-Clause",
"bsd license": "BSD-3-Clause",
"apache": "Apache-2.0",
"apache 2.0": "Apache-2.0",
"apache-2.0": "Apache-2.0",
"apache software license": "Apache-2.0",
"apache software license 2.0": "Apache-2.0",
"gpl": "GPL-3.0",
"gpl-2": "GPL-2.0",
"gpl-3": "GPL-3.0",
"gplv2": "GPL-2.0",
"gplv3": "GPL-3.0",
"gnu general public license": "GPL-3.0",
"gnu general public license v3": "GPL-3.0",
"gnu general public license v2": "GPL-2.0",
"gnu lesser general public license v2": "LGPL-2.1",
"gnu lesser general public license v3": "LGPL-3.0",
"lgpl": "LGPL-3.0",
"lgpl-2.1": "LGPL-2.1",
"lgpl-3": "LGPL-3.0",
"agpl": "AGPL-3.0",
"agpl-3.0": "AGPL-3.0",
"agplv3": "AGPL-3.0",
"isc": "ISC",
"mpl": "MPL-2.0",
"mpl-2.0": "MPL-2.0",
"mozilla public license 2.0": "MPL-2.0",
"unlicense": "Unlicense",
"public domain": "Unlicense",
"cc0": "CC0-1.0",
"cc0-1.0": "CC0-1.0",
"psf": "PSF-2.0",
"python software foundation license": "PSF-2.0",
"the mit license": "MIT",
"mit license": "MIT",
}
@dataclass
class DepLicense:
name: str
version: str = ""
license: str = "UNKNOWN"
source: str = "" # where we found the dep (requirements.txt, package.json, etc.)
severity: Severity = Severity.UNKNOWN
message: str = ""
@dataclass
class LicenseReport:
project_dir: str
project_license: str = "MIT"
dependencies: list = field(default_factory=list)
summary: dict = field(default_factory=dict)
errors: list = field(default_factory=list)
warnings: list = field(default_factory=list)
def normalize_license(raw: str) -> str:
"""Normalize a license string to SPDX identifier."""
if not raw or raw.strip() in ("UNKNOWN", "UNKNOWN:", ""):
return "UNKNOWN"
cleaned = raw.strip().lower()
# Remove version specifiers like "MIT License (MIT)"
cleaned = re.sub(r"\(.*?\)", "", cleaned).strip()
cleaned = re.sub(r"\s+license$", "", cleaned).strip()
cleaned = re.sub(r"^the\s+", "", cleaned).strip()
if cleaned in LICENSE_ALIASES:
return LICENSE_ALIASES[cleaned]
# Check if it already looks like SPDX
upper = raw.strip()
if upper in COPYLEFT_FAMILIES or upper in PERMISSIVE_LICENSES:
return upper
return raw.strip()
def check_compatibility(dep_license: str, project_license: str) -> tuple[Severity, str]:
"""Check if a dependency license is compatible with the project license."""
if dep_license == "UNKNOWN":
return Severity.WARNING, "License unknown — manual review needed"
if dep_license in PERMISSIVE_LICENSES:
return Severity.OK, "Compatible (permissive)"
if dep_license in COPYLEFT_FAMILIES:
# Copyleft in a permissive project is a problem
if project_license in PERMISSIVE_LICENSES:
return Severity.ERROR, f"Copyleft ({dep_license}) in permissive ({project_license}) project"
# Copyleft in same family is OK
if dep_license.startswith(project_license.split("-")[0]):
return Severity.OK, "Compatible (same copyleft family)"
return Severity.WARNING, f"Review needed: {dep_license} with {project_license}"
return Severity.UNKNOWN, f"Unrecognized license: {dep_license}"
def parse_requirements_txt(path: str) -> list[DepLicense]:
"""Parse requirements.txt format."""
deps = []
with open(path) as f:
for line in f:
line = line.strip()
if not line or line.startswith("#") or line.startswith("-"):
continue
# Parse name==version or name>=version etc.
match = re.match(r"^([a-zA-Z0-9_.-]+)(?:[>=<!~].*)?$", line)
if match:
deps.append(DepLicense(name=match.group(1), source="requirements.txt"))
return deps
def parse_pyproject_toml(path: str) -> list[DepLicense]:
"""Parse pyproject.toml dependencies."""
deps = []
try:
# Use tomllib (Python 3.11+) or fall back to regex
import tomllib
with open(path, "rb") as f:
data = tomllib.load(f)
except ImportError:
# Fallback: regex parse
with open(path) as f:
content = f.read()
# Find [project.dependencies] section
match = re.search(r"\[project\]\s*dependencies\s*=\s*\[(.*?)\]", content, re.DOTALL)
if match:
for dep_str in re.findall(r'"([^"]+)"', match.group(1)):
name = re.match(r"^([a-zA-Z0-9_.-]+)", dep_str)
if name:
deps.append(DepLicense(name=name.group(1), source="pyproject.toml"))
return deps
project_deps = data.get("project", {}).get("dependencies", [])
for dep_str in project_deps:
name = re.match(r"^([a-zA-Z0-9_.-]+)", dep_str)
if name:
deps.append(DepLicense(name=name.group(1), source="pyproject.toml"))
return deps
def parse_package_json(path: str) -> list[DepLicense]:
"""Parse package.json dependencies."""
deps = []
with open(path) as f:
data = json.load(f)
for section in ("dependencies", "devDependencies"):
for name, version in data.get(section, {}).items():
deps.append(DepLicense(name=name, version=version, source="package.json"))
return deps
def parse_cargo_toml(path: str) -> list[DepLicense]:
"""Parse Cargo.toml dependencies (basic)."""
deps = []
with open(path) as f:
for line in f:
match = re.match(r'^([a-zA-Z0-9_-]+)\s*=\s*"', line.strip())
if match and line.strip()[0] != "[" and line.strip() != "[dependencies]":
deps.append(DepLicense(name=match.group(1), source="Cargo.toml"))
return deps
def parse_go_mod(path: str) -> list[DepLicense]:
"""Parse go.mod dependencies."""
deps = []
with open(path) as f:
in_require = False
for line in f:
line = line.strip()
if line == "require (":
in_require = True
continue
if line == ")" and in_require:
in_require = False
continue
if in_require:
parts = line.split()
if len(parts) >= 2:
deps.append(DepLicense(name=parts[0], version=parts[1], source="go.mod"))
return deps
def scan_dep_files(project_dir: str) -> list[DepLicense]:
"""Find and parse all dependency files in a project."""
all_deps = []
parsers = {
"requirements.txt": parse_requirements_txt,
"requirements-dev.txt": parse_requirements_txt,
"requirements_prod.txt": parse_requirements_txt,
"pyproject.toml": parse_pyproject_toml,
"setup.py": None, # TODO: parse setup.py
"package.json": parse_package_json,
"Cargo.toml": parse_cargo_toml,
"go.mod": parse_go_mod,
}
for filename, parser in parsers.items():
path = os.path.join(project_dir, filename)
if os.path.exists(path) and parser:
try:
deps = parser(path)
all_deps.extend(deps)
except Exception as e:
print(f"Warning: Failed to parse {filename}: {e}", file=sys.stderr)
# Also check subdirectories for monorepos (one level deep)
for entry in os.listdir(project_dir):
subdir = os.path.join(project_dir, entry)
if os.path.isdir(subdir) and not entry.startswith("."):
for filename, parser in parsers.items():
path = os.path.join(subdir, filename)
if os.path.exists(path) and parser:
try:
deps = parser(path)
for d in deps:
d.source = f"{entry}/{filename}"
all_deps.extend(deps)
except Exception:
pass
return all_deps
def lookup_pypi_license(package_name: str) -> str:
"""Look up license from PyPI API."""
try:
url = f"https://pypi.org/pypi/{package_name}/json"
req = urllib.request.Request(url, headers={"Accept": "application/json"})
resp = urllib.request.urlopen(req, timeout=10)
data = json.loads(resp.read())
# Try classifiers first
for classifier in data.get("info", {}).get("classifiers", []):
if classifier.startswith("License ::"):
parts = classifier.split(" :: ")
if len(parts) >= 3:
return parts[-1]
# Fall back to license field
lic = data.get("info", {}).get("license", "")
if lic and len(lic) < 100:
return lic
# Try license_expression
le = data.get("info", {}).get("license_expression", "")
if le:
return le
return "UNKNOWN"
except Exception:
return "UNKNOWN"
def lookup_npm_license(package_name: str) -> str:
"""Look up license from npm registry."""
try:
url = f"https://registry.npmjs.org/{package_name}"
req = urllib.request.Request(url, headers={"Accept": "application/json"})
resp = urllib.request.urlopen(req, timeout=10)
data = json.loads(resp.read())
lic = data.get("license", "UNKNOWN")
if isinstance(lic, dict):
lic = lic.get("type", "UNKNOWN")
return lic or "UNKNOWN"
except Exception:
return "UNKNOWN"
def detect_project_license(project_dir: str) -> str:
"""Detect the project's own license."""
for name in ("LICENSE", "LICENSE.md", "LICENSE.txt", "LICENCE", "COPYING"):
path = os.path.join(project_dir, name)
if os.path.exists(path):
with open(path) as f:
content = f.read().upper()
if "MIT LICENSE" in content or "MIT" in content[:200]:
return "MIT"
if "APACHE" in content and "2.0" in content:
return "Apache-2.0"
if "GNU GENERAL PUBLIC LICENSE" in content:
if "VERSION 3" in content:
return "GPL-3.0"
if "VERSION 2" in content:
return "GPL-2.0"
if "BSD" in content[:500]:
if "3-CLAUSE" in content or "THREE CLAUSE" in content:
return "BSD-3-Clause"
return "BSD-2-Clause"
if "ISC" in content[:200]:
return "ISC"
# Check pyproject.toml
pypath = os.path.join(project_dir, "pyproject.toml")
if os.path.exists(pypath):
with open(pypath) as f:
content = f.read()
match = re.search(r'license\s*=\s*\{\s*text\s*=\s*"([^"]+)"', content)
if match:
return normalize_license(match.group(1))
match = re.search(r'license\s*=\s*"([^"]+)"', content)
if match:
return normalize_license(match.group(1))
return "UNKNOWN"
def resolve_licenses(deps: list[DepLicense], cache: dict = None) -> None:
"""Resolve license info for all dependencies."""
if cache is None:
cache = {}
for dep in deps:
if dep.name in cache:
dep.license = cache[dep.name]
continue
# Determine ecosystem
if dep.source in ("package.json",):
raw = lookup_npm_license(dep.name)
else:
raw = lookup_pypi_license(dep.name)
dep.license = normalize_license(raw)
cache[dep.name] = dep.license
def generate_report(deps: list[DepLicense], project_license: str) -> LicenseReport:
"""Generate the compatibility report."""
report = LicenseReport(
project_dir="",
project_license=project_license,
dependencies=[],
)
counts = {"ok": 0, "warning": 0, "error": 0, "unknown": 0}
for dep in deps:
severity, message = check_compatibility(dep.license, project_license)
dep.severity = severity
dep.message = message
counts[severity.value] += 1
if severity == Severity.ERROR:
report.errors.append(f"{dep.name}: {message}")
elif severity == Severity.WARNING:
report.warnings.append(f"{dep.name}: {message}")
report.dependencies.append(asdict(dep))
report.summary = {
"total": len(deps),
**counts,
"project_license": project_license,
}
return report
def format_text(report: LicenseReport) -> str:
"""Format report as human-readable text."""
lines = []
lines.append("=" * 60)
lines.append(" LICENSE COMPATIBILITY REPORT")
lines.append("=" * 60)
lines.append(f" Project License: {report.project_license}")
lines.append(f" Dependencies: {report.summary.get('total', 0)}")
lines.append(f" OK: {report.summary.get('ok', 0)} "
f"WARN: {report.summary.get('warning', 0)} "
f"ERR: {report.summary.get('error', 0)} "
f"UNK: {report.summary.get('unknown', 0)}")
lines.append("-" * 60)
for dep in report.dependencies:
icon = {"ok": "[OK]", "warning": "[!!]", "error": "[XX]", "unknown": "[??]"}
sev = dep.get("severity", "unknown")
name = dep.get("name", "?")
lic = dep.get("license", "?")
msg = dep.get("message", "")
lines.append(f" {icon.get(sev, '[ ]')} {name:30s} {lic:20s} {msg}")
if report.errors:
lines.append("-" * 60)
lines.append(" ERRORS:")
for e in report.errors:
lines.append(f" - {e}")
if report.warnings:
lines.append("-" * 60)
lines.append(" WARNINGS:")
for w in report.warnings:
lines.append(f" - {w}")
lines.append("=" * 60)
return "\n".join(lines)
def main():
parser = argparse.ArgumentParser(description="License Checker — Pipeline 5.4")
parser.add_argument("project_dir", help="Project directory to scan")
parser.add_argument("--project-license", default=None,
help="Project license SPDX id (auto-detected if omitted)")
parser.add_argument("--format", choices=["json", "text"], default="text",
help="Output format")
parser.add_argument("--scan-deps", action="store_true",
help="Only scan and list deps (skip license lookup)")
args = parser.parse_args()
project_dir = os.path.abspath(args.project_dir)
if not os.path.isdir(project_dir):
print(f"Error: {project_dir} is not a directory", file=sys.stderr)
sys.exit(1)
# Detect project license
project_license = args.project_license or detect_project_license(project_dir)
# Scan deps
deps = scan_dep_files(project_dir)
if not deps:
print(f"No dependencies found in {project_dir}", file=sys.stderr)
sys.exit(0)
print(f"Found {len(deps)} dependencies", file=sys.stderr)
if args.scan_deps:
for d in deps:
print(f" {d.name} ({d.source})")
sys.exit(0)
# Resolve licenses
print("Resolving licenses...", file=sys.stderr)
resolve_licenses(deps)
# Generate report
report = generate_report(deps, project_license)
report.project_dir = project_dir
if args.format == "json":
print(json.dumps(asdict(report), indent=2, default=str))
else:
print(format_text(report))
# Exit code: 1 if errors, 0 otherwise
sys.exit(1 if report.errors else 0)
if __name__ == "__main__":
main()

View File

@@ -1,212 +0,0 @@
#!/usr/bin/env python3
"""
Comprehensive test script for knowledge extraction prompt.
Validates prompt structure, requirements, and consistency.
"""
import json
import re
from pathlib import Path
def test_prompt_structure():
"""Test that the prompt has the required structure."""
prompt_path = Path("templates/harvest-prompt.md")
if not prompt_path.exists():
return False, "harvest-prompt.md not found"
content = prompt_path.read_text()
# Check for required sections
required_sections = [
"System Prompt",
"Instructions",
"Categories",
"Output Format",
"Confidence Scoring",
"Constraints",
"Example"
]
for section in required_sections:
if section.lower() not in content.lower():
return False, f"Missing required section: {section}"
# Check for required categories
required_categories = ["fact", "pitfall", "pattern", "tool-quirk", "question"]
for category in required_categories:
if category not in content:
return False, f"Missing required category: {category}"
# Check for required output fields
required_fields = ["fact", "category", "repo", "confidence"]
for field in required_fields:
if field not in content:
return False, f"Missing required output field: {field}"
# Check prompt size (should be ~1k tokens, roughly 4k chars)
if len(content) > 5000:
return False, f"Prompt too large: {len(content)} chars (max ~5000)"
if len(content) < 1000:
return False, f"Prompt too small: {len(content)} chars (min ~1000)"
return True, "Prompt structure is valid"
def test_confidence_scoring():
"""Test that confidence scoring is properly defined."""
prompt_path = Path("templates/harvest-prompt.md")
content = prompt_path.read_text()
# Check for confidence scale definitions
confidence_levels = [
("0.9-1.0", "explicitly stated"),
("0.7-0.8", "clearly implied"),
("0.5-0.6", "suggested"),
("0.3-0.4", "inferred"),
("0.1-0.2", "speculative")
]
for level, description in confidence_levels:
if level not in content:
return False, f"Missing confidence level: {level}"
if description.lower() not in content.lower():
return False, f"Missing confidence description: {description}"
return True, "Confidence scoring is properly defined"
def test_example_quality():
"""Test that examples are clear and complete."""
prompt_path = Path("templates/harvest-prompt.md")
content = prompt_path.read_text()
# Check for example input/output
if "example" not in content.lower():
return False, "No examples provided"
# Check that example includes all categories
example_section = content[content.lower().find("example"):]
# Look for JSON example
json_match = re.search(r'\{[\s\S]*"knowledge"[\s\S]*\}', example_section)
if not json_match:
return False, "No JSON example found"
example_json = json_match.group(0)
# Check for all categories in example
for category in ["fact", "pitfall", "pattern", "tool-quirk", "question"]:
if category not in example_json:
return False, f"Example missing category: {category}"
return True, "Examples are clear and complete"
def test_constraint_coverage():
"""Test that constraints cover all requirements."""
prompt_path = Path("templates/harvest-prompt.md")
content = prompt_path.read_text()
required_constraints = [
"No hallucination",
"only extract",
"explicitly",
"partial",
"failed sessions",
"1k tokens"
]
for constraint in required_constraints:
if constraint.lower() not in content.lower():
return False, f"Missing constraint: {constraint}"
return True, "Constraints cover all requirements"
def test_test_sessions():
"""Test that test sessions exist and are valid."""
test_sessions_dir = Path("test_sessions")
if not test_sessions_dir.exists():
return False, "test_sessions directory not found"
session_files = list(test_sessions_dir.glob("*.jsonl"))
if len(session_files) < 5:
return False, f"Only {len(session_files)} test sessions found, need 5"
# Check each session file
for session_file in session_files:
content = session_file.read_text()
lines = content.strip().split("\n")
# Check that each line is valid JSON
for i, line in enumerate(lines, 1):
try:
json.loads(line)
except json.JSONDecodeError as e:
return False, f"Invalid JSON in {session_file.name}, line {i}: {e}"
return True, f"Found {len(session_files)} valid test sessions"
def run_all_tests():
"""Run all tests and return results."""
tests = [
("Prompt Structure", test_prompt_structure),
("Confidence Scoring", test_confidence_scoring),
("Example Quality", test_example_quality),
("Constraint Coverage", test_constraint_coverage),
("Test Sessions", test_test_sessions)
]
results = []
all_passed = True
for test_name, test_func in tests:
try:
passed, message = test_func()
results.append({
"test": test_name,
"passed": passed,
"message": message
})
if not passed:
all_passed = False
except Exception as e:
results.append({
"test": test_name,
"passed": False,
"message": f"Error: {str(e)}"
})
all_passed = False
# Print results
print("=" * 60)
print("HARVEST PROMPT TEST RESULTS")
print("=" * 60)
for result in results:
status = "✓ PASS" if result["passed"] else "✗ FAIL"
print(f"{status}: {result['test']}")
print(f" {result['message']}")
print()
print("=" * 60)
if all_passed:
print("ALL TESTS PASSED!")
else:
print("SOME TESTS FAILED!")
print("=" * 60)
return all_passed, results
if __name__ == "__main__":
all_passed, results = run_all_tests()
# Save results to file
with open("test_results.json", "w") as f:
json.dump({
"all_passed": all_passed,
"results": results,
"timestamp": "2026-04-14T19:05:00Z"
}, f, indent=2)
print(f"Results saved to test_results.json")
# Exit with appropriate code
exit(0 if all_passed else 1)

View File

@@ -0,0 +1,38 @@
#!/usr/bin/env python3
"""Validate knowledge files and index.json against the schema."""
import json, sys
from pathlib import Path
VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED = {"id", "fact", "category", "domain", "confidence"}
def validate_fact(fact, src=""):
errs = []
for f in REQUIRED:
if f not in fact: errs.append(f"{src}: missing '{f}'")
if "category" in fact and fact["category"] not in VALID_CATEGORIES:
errs.append(f"{src}: invalid category '{fact['category']}'")
if "confidence" in fact:
if not isinstance(fact["confidence"], (int, float)) or not (0 <= fact["confidence"] <= 1):
errs.append(f"{src}: confidence must be 0.0-1.0")
if "id" in fact:
parts = fact["id"].split(":")
if len(parts) != 3: errs.append(f"{src}: id must be domain:category:sequence")
return errs
def main():
idx = Path(__file__).parent.parent / "knowledge" / "index.json"
if not idx.exists(): print(f"FAILED: {idx} not found"); sys.exit(1)
data = json.load(open(idx))
errs = []
seen = set()
for i, f in enumerate(data.get("facts", [])):
errs.extend(validate_fact(f, f"[{i}]"))
if "id" in f:
if f["id"] in seen: errs.append(f"duplicate id '{f['id']}'")
seen.add(f["id"])
if errs:
print(f"FAILED - {len(errs)} errors:"); [print(f" x {e}") for e in errs]; sys.exit(1)
print(f"PASSED - {len(data.get('facts', []))} facts")
if __name__ == "__main__": main()

View File

@@ -1,18 +0,0 @@
{"role": "user", "content": "Deploy the application to production"}
{"role": "assistant", "content": "I'll deploy the application to production.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@prod-server 'cd /app && git pull && docker-compose up -d'"}}]}
{"role": "tool", "content": "ssh: connect to host prod-server port 22: Connection refused"}
{"role": "assistant", "content": "Connection refused. Let me check the server status.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ping -c 3 prod-server"}}]}
{"role": "tool", "content": "PING prod-server (192.168.1.100): 56 data bytes\nRequest timeout for icmp_seq 0\nRequest timeout for icmp_seq 1\nRequest timeout for icmp_seq 2\n--- prod-server ping statistics ---\n3 packets transmitted, 0 packets received, 100.0% packet loss"}
{"role": "assistant", "content": "The server appears to be down. Let me check if there's an alternative deployment method."}
{"role": "user", "content": "Try using the VPS instead"}
{"role": "assistant", "content": "I'll try deploying to the VPS.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@vps-server 'cd /app && git pull && docker-compose up -d'"}}]}
{"role": "tool", "content": "Permission denied (publickey)."}
{"role": "assistant", "content": "Permission denied. The SSH key might not be set up correctly. Let me check the SSH configuration.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat ~/.ssh/config"}}]}
{"role": "tool", "content": "Host vps-server\n HostName 167.99.126.228\n User deploy\n IdentityFile ~/.ssh/id_rsa_vps"}
{"role": "assistant", "content": "The SSH config looks correct. Let me check if the key exists.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la ~/.ssh/id_rsa_vps"}}]}
{"role": "tool", "content": "ls: /Users/user/.ssh/id_rsa_vps: No such file or directory"}
{"role": "assistant", "content": "The SSH key doesn't exist. This is a pitfall - we need to generate the SSH key first."}
{"role": "user", "content": "Generate a new SSH key for the VPS"}
{"role": "assistant", "content": "I'll generate a new SSH key.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_vps -N ''"}}]}
{"role": "tool", "content": "Generating public/private rsa key pair.\nYour identification has been saved in /Users/user/.ssh/id_rsa_vps\nYour public key has been saved in /Users/user/.ssh/id_rsa_vps.pub\nThe key fingerprint is:\nSHA256:abc123... user@machine"}
{"role": "assistant", "content": "SSH key generated. Now I need to add it to the VPS."}

View File

@@ -1,14 +0,0 @@
{"role": "user", "content": "Check the Gitea API for open issues"}
{"role": "assistant", "content": "I'll check the Gitea API for open issues.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://forge.alexanderwhitestone.com/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "curl: (6) Could not resolve host: forge.alexanderwhitestone.com"}
{"role": "assistant", "content": "DNS resolution failed. Let me check the network connectivity.", "tool_calls": [{"function": "terminal", "arguments": {"command": "nslookup forge.alexanderwhitestone.com"}}]}
{"role": "tool", "content": "Server: 8.8.8.8\nAddress: 8.8.8.8#53\n\n** server can't find forge.alexanderwhitestone.com: NXDOMAIN"}
{"role": "assistant", "content": "The domain doesn't resolve. This might be a DNS issue. Let me try using the IP address directly.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "curl: (60) SSL certificate problem: self signed certificate"}
{"role": "assistant", "content": "SSL certificate issue. The server is using a self-signed certificate. Let me try with -k flag to skip SSL verification.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
{"role": "tool", "content": "[]"}
{"role": "assistant", "content": "No open issues found. The API returned an empty array."}
{"role": "user", "content": "Check if there are any pull requests"}
{"role": "assistant", "content": "Let me check for pull requests.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/pulls?state=open"}}]}
{"role": "tool", "content": "[{"id": 20, "title": "feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)", "state": "open", "user": {"login": "Rockachopa"}, "created_at": "2026-04-14T17:28:20Z"}]"}
{"role": "assistant", "content": "Found 1 open pull request: PR #20 - "feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)" by Rockachopa."}

View File

@@ -1,10 +0,0 @@
{"role": "user", "content": "How do I deploy a new version of the application?"}
{"role": "assistant", "content": "Here's the deployment process:\n1. Run tests: pytest\n2. Build Docker image: docker build -t app:latest .\n3. Push to registry: docker push registry.example.com/app:latest\n4. Update production: ssh prod 'cd /app && docker-compose pull && docker-compose up -d'\n5. Verify: curl https://app.example.com/health"}
{"role": "user", "content": "What if tests fail?"}
{"role": "assistant", "content": "If tests fail:\n1. Check the error logs: pytest -v\n2. Fix the failing tests\n3. Re-run tests: pytest\n4. If still failing, check recent changes: git log --oneline -10\n5. Consider reverting: git revert HEAD"}
{"role": "user", "content": "What if the Docker build fails?"}
{"role": "assistant", "content": "If Docker build fails:\n1. Check Dockerfile syntax: docker build --no-cache -t app:latest .\n2. Check for missing dependencies: cat requirements.txt\n3. Check disk space: df -h\n4. Clean up Docker: docker system prune -f\n5. Try building with verbose output: docker build --progress=plain -t app:latest ."}
{"role": "user", "content": "What if the push to registry fails?"}
{"role": "assistant", "content": "If push to registry fails:\n1. Check registry credentials: docker login registry.example.com\n2. Check registry space: curl https://registry.example.com/v2/_catalog\n3. Check image size: docker images app:latest\n4. Try tagging with version: docker tag app:latest registry.example.com/app:v1.0.0\n5. Push with version tag: docker push registry.example.com/app:v1.0.0"}
{"role": "user", "content": "What if the production update fails?"}
{"role": "assistant", "content": "If production update fails:\n1. Check SSH connection: ssh prod 'echo connected'\n2. Check disk space on prod: ssh prod 'df -h'\n3. Check Docker status: ssh prod 'docker ps -a'\n4. Check logs: ssh prod 'docker-compose logs --tail=100'\n5. Rollback: ssh prod 'cd /app && git checkout HEAD~1 && docker-compose up -d'"}

View File

@@ -1,12 +0,0 @@
{"role": "user", "content": "I need to implement a new feature for the application"}
{"role": "assistant", "content": "I can help you implement a new feature. What feature would you like to add?"}
{"role": "user", "content": "I want to add user authentication"}
{"role": "assistant", "content": "User authentication is a good feature. I have a few questions:\n1. What authentication method do you prefer? (JWT, OAuth, Session-based)\n2. Do you need social login? (Google, GitHub, etc.)\n3. What user data do you need to store?\n4. Do you need role-based access control?\n5. What's your database? (PostgreSQL, MySQL, SQLite)"}
{"role": "user", "content": "I prefer JWT, no social login, just email and password, no roles, PostgreSQL"}
{"role": "assistant", "content": "Great choices. Here's the implementation plan:\n1. Create user model with email and password hash\n2. Create JWT token generation and validation\n3. Create login endpoint: POST /api/auth/login\n4. Create register endpoint: POST /api/auth/register\n5. Create middleware for protected routes\n6. Add password hashing with bcrypt\n7. Add rate limiting for login attempts\n8. Add password reset functionality\n9. Add email verification\n10. Test all endpoints"}
{"role": "user", "content": "What about password reset?"}
{"role": "assistant", "content": "For password reset:\n1. Create reset token table\n2. Create forgot password endpoint: POST /api/auth/forgot-password\n3. Send reset email with token\n4. Create reset password endpoint: POST /api/auth/reset-password\n5. Validate reset token and expiration\n6. Update password and delete token\n7. Send confirmation email"}
{"role": "user", "content": "What about email verification?"}
{"role": "assistant", "content": "For email verification:\n1. Add email_verified field to user model\n2. Create verification token table\n3. Send verification email on registration\n4. Create verify email endpoint: GET /api/auth/verify-email?token=...\n5. Validate token and update email_verified\n6. Add middleware to check email_verified for certain routes\n7. Add resend verification email endpoint"}
{"role": "user", "content": "What about rate limiting?"}
{"role": "assistant", "content": "For rate limiting:\n1. Use Redis for rate limit storage\n2. Create rate limit middleware\n3. Set limits: 5 login attempts per minute, 3 password resets per hour\n4. Return 429 Too Many Requests when limit exceeded\n5. Add X-RateLimit headers to responses\n6. Log rate limit violations\n7. Add whitelist for trusted IPs"}

View File

@@ -1,13 +0,0 @@
{"role": "user", "content": "Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence"}
{"role": "assistant", "content": "I'll clone the repository for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence.git"}}]}
{"role": "tool", "content": "Cloning into 'compounding-intelligence'...\nremote: Enumerating objects: 15, done.\nremote: Counting objects: 100% (15/15), done.\nremote: Compressing objects: 100% (12/12), done.\nremote: Total 15 (delta 2), reused 0 (delta 0), pack-reused 0\nReceiving objects: 100% (15/15), done.\nResolving deltas: 100% (2/2), done."}
{"role": "assistant", "content": "Repository cloned successfully. Let me check the structure.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/"}}]}
{"role": "tool", "content": "total 8\ndrwxr-xr-x 5 user staff 160 Apr 14 17:21 .\ndrwxr-xr-x 12 user staff 384 Apr 14 17:21 ..\ndrwxr-xr-x 13 user staff 416 Apr 14 17:21 .git\n-rw-r--r-- 1 user staff 1234 Apr 14 17:21 README.md\ndrwxr-xr-x 3 user staff 96 Apr 14 17:21 knowledge\n-rw-r--r-- 1 user staff 456 Apr 14 17:21 metrics\ndrwxr-xr-x 4 user staff 128 Apr 14 17:21 scripts\ndrwxr-xr-x 3 user staff 96 Apr 14 17:21 templates"}
{"role": "user", "content": "What files are in the templates directory?"}
{"role": "assistant", "content": "Let me check the templates directory.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/templates/"}}]}
{"role": "tool", "content": "total 0\ndrwxr-xr-x 3 user staff 96 Apr 14 17:21 .\ndrwxr-xr-x 5 user staff 160 Apr 14 17:21 ..\n-rw-r--r-- 1 user staff 0 Apr 14 17:21 .gitkeep"}
{"role": "assistant", "content": "The templates directory currently only contains a .gitkeep file. No templates have been added yet."}
{"role": "user", "content": "Create a harvest-prompt.md template in the templates directory."}
{"role": "assistant", "content": "I'll create the harvest-prompt.md template for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat > compounding-intelligence/templates/harvest-prompt.md << 'EOF'\n# Knowledge Extraction Prompt\n\nExtract knowledge from session transcripts.\nEOF"}}]}
{"role": "tool", "content": ""}
{"role": "assistant", "content": "Template created successfully. The harvest-prompt.md file is now in the templates directory."}

View File

@@ -1,186 +0,0 @@
#!/usr/bin/env python3
"""Tests for license_checker.py — Pipeline 5.4"""
import json
import os
import sys
import tempfile
import unittest
# Add scripts dir to path
sys.path.insert(0, os.path.dirname(__file__))
from license_checker import (
normalize_license,
check_compatibility,
parse_requirements_txt,
parse_package_json,
parse_pyproject_toml,
parse_go_mod,
detect_project_license,
scan_dep_files,
generate_report,
format_text,
Severity,
DepLicense,
)
class TestNormalizeLicense(unittest.TestCase):
def test_mit_aliases(self):
for alias in ["mit", "MIT License", "The MIT License", "MIT license"]:
self.assertEqual(normalize_license(alias), "MIT")
def test_apache_aliases(self):
for alias in ["Apache 2.0", "Apache-2.0", "apache software license"]:
self.assertEqual(normalize_license(alias), "Apache-2.0")
def test_gpl_aliases(self):
self.assertEqual(normalize_license("GPL-3.0"), "GPL-3.0")
self.assertEqual(normalize_license("gplv3"), "GPL-3.0")
def test_unknown(self):
self.assertEqual(normalize_license(""), "UNKNOWN")
self.assertEqual(normalize_license("UNKNOWN"), "UNKNOWN")
def test_already_spdx(self):
self.assertEqual(normalize_license("BSD-3-Clause"), "BSD-3-Clause")
class TestCheckCompatibility(unittest.TestCase):
def test_permissive_ok(self):
sev, msg = check_compatibility("MIT", "MIT")
self.assertEqual(sev, Severity.OK)
def test_gpl_in_mit_error(self):
sev, msg = check_compatibility("GPL-3.0", "MIT")
self.assertEqual(sev, Severity.ERROR)
def test_unknown_warning(self):
sev, msg = check_compatibility("UNKNOWN", "MIT")
self.assertEqual(sev, Severity.WARNING)
def test_apache_in_mit_ok(self):
sev, msg = check_compatibility("Apache-2.0", "MIT")
self.assertEqual(sev, Severity.OK)
def test_lgpl_in_mit_error(self):
sev, msg = check_compatibility("LGPL-3.0", "MIT")
self.assertEqual(sev, Severity.ERROR)
class TestParseRequirements(unittest.TestCase):
def test_basic(self):
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
f.write("requests>=2.28.0\nflask==2.3.0\n# comment\npytest\n")
f.flush()
deps = parse_requirements_txt(f.name)
os.unlink(f.name)
names = [d.name for d in deps]
self.assertIn("requests", names)
self.assertIn("flask", names)
self.assertIn("pytest", names)
self.assertEqual(len(deps), 3)
def test_skip_flags(self):
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
f.write("-r other.txt\n--index-url https://pypi.org\nreal-dep\n")
f.flush()
deps = parse_requirements_txt(f.name)
os.unlink(f.name)
self.assertEqual(len(deps), 1)
self.assertEqual(deps[0].name, "real-dep")
class TestParsePackageJson(unittest.TestCase):
def test_basic(self):
data = {
"dependencies": {"express": "^4.18.0", "lodash": "^4.17.21"},
"devDependencies": {"jest": "^29.0.0"},
}
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
json.dump(data, f)
f.flush()
deps = parse_package_json(f.name)
os.unlink(f.name)
names = [d.name for d in deps]
self.assertIn("express", names)
self.assertIn("jest", names)
self.assertEqual(len(deps), 3)
class TestParseGoMod(unittest.TestCase):
def test_basic(self):
content = """module example.com/mymod
go 1.21
require (
github.com/gin-gonic/gin v1.9.1
github.com/stretchr/testify v1.8.4
)
"""
with tempfile.NamedTemporaryFile(mode="w", suffix=".mod", delete=False) as f:
f.write(content)
f.flush()
deps = parse_go_mod(f.name)
os.unlink(f.name)
self.assertEqual(len(deps), 2)
self.assertEqual(deps[0].name, "github.com/gin-gonic/gin")
class TestDetectProjectLicense(unittest.TestCase):
def test_mit_file(self):
with tempfile.TemporaryDirectory() as d:
with open(os.path.join(d, "LICENSE"), "w") as f:
f.write("MIT License\n\nCopyright (c) 2024...\n")
self.assertEqual(detect_project_license(d), "MIT")
def test_apache_file(self):
with tempfile.TemporaryDirectory() as d:
with open(os.path.join(d, "LICENSE"), "w") as f:
f.write("Apache License Version 2.0...")
self.assertEqual(detect_project_license(d), "Apache-2.0")
def test_no_license(self):
with tempfile.TemporaryDirectory() as d:
self.assertEqual(detect_project_license(d), "UNKNOWN")
class TestScanDeps(unittest.TestCase):
def test_multi_ecosystem(self):
with tempfile.TemporaryDirectory() as d:
with open(os.path.join(d, "requirements.txt"), "w") as f:
f.write("flask\nrequests\n")
with open(os.path.join(d, "package.json"), "w") as f:
json.dump({"dependencies": {"express": "^4.0.0"}}, f)
deps = scan_dep_files(d)
names = [d.name for d in deps]
self.assertIn("flask", names)
self.assertIn("express", names)
class TestGenerateReport(unittest.TestCase):
def test_basic(self):
deps = [
DepLicense(name="flask", license="BSD-3-Clause", source="requirements.txt"),
DepLicense(name="gpl-pkg", license="GPL-3.0", source="requirements.txt"),
DepLicense(name="unknown-pkg", license="UNKNOWN", source="requirements.txt"),
]
report = generate_report(deps, "MIT")
self.assertEqual(report.summary["ok"], 1)
self.assertEqual(report.summary["error"], 1)
self.assertEqual(report.summary["warning"], 1)
self.assertEqual(len(report.errors), 1)
self.assertIn("gpl-pkg", report.errors[0])
def test_format_text(self):
deps = [DepLicense(name="flask", license="BSD-3-Clause", source="requirements.txt")]
report = generate_report(deps, "MIT")
text = format_text(report)
self.assertIn("LICENSE COMPATIBILITY REPORT", text)
self.assertIn("flask", text)
if __name__ == "__main__":
unittest.main()