cc215e3ed7
|
feat: knowledge deduplication — content hash + token similarity (#196)
Test / pytest (pull_request) Failing after 21s
Dedup module for knowledge entries with:
- SHA256 content hashing for exact duplicates
- Token Jaccard similarity for near-duplicates (default threshold 0.95)
- Quality-based merge: keeps the entry with the higher confidence/source_count
- Metadata merging: tags, related, source_count
- Dry-run mode
- 30 tests passing
- Built-in --test mode with generated duplicates
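
The two detection steps listed above could look roughly like the sketch below. It is not the code in scripts/dedup.py: the function names, the whitespace/case normalization before hashing, and the `\w+` tokenizer are all assumptions made for illustration. Only SHA256, token Jaccard, and the 0.95 default come from the commit message.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """SHA256 hex digest of the text.

    Assumption: text is lowercased and whitespace-collapsed first, so
    trivially reformatted copies hash to the same value.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def tokenize(text: str) -> set[str]:
    """Set of lowercase word tokens (hypothetical tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token Jaccard similarity: |A & B| / |A | B|."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_duplicate(a: str, b: str, threshold: float = 0.95) -> bool:
    """Exact match by hash first, then near-match by Jaccard."""
    if content_hash(a) == content_hash(b):
        return True
    return jaccard(a, b) >= threshold
```

Checking the hash first keeps the exact-duplicate case cheap; the token comparison only runs when the hashes differ.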
Usage:
python scripts/dedup.py --input knowledge/index.json
python scripts/dedup.py --input knowledge/index.json --dry-run
python scripts/dedup.py --test
Closes #196.
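
A minimal sketch of the quality-based merge and metadata merging described in this commit, under stated assumptions. The field names `confidence`, `source_count`, `tags`, and `related` come from the commit message; the function name `merge_entries`, the tuple ordering used to pick the winner, the default values, and summing `source_count` are hypothetical choices, not confirmed behavior of scripts/dedup.py.

```python
def merge_entries(a: dict, b: dict) -> dict:
    """Merge two duplicate entries into one.

    Assumptions: the kept entry is the one with the higher
    (confidence, source_count) pair; tags and related are unioned;
    source_count values are added together.
    """

    def quality(entry: dict) -> tuple:
        return (entry.get("confidence", 0.0), entry.get("source_count", 1))

    keep, drop = (a, b) if quality(a) >= quality(b) else (b, a)

    merged = dict(keep)  # copy so neither input is mutated
    merged["tags"] = sorted(set(keep.get("tags", [])) | set(drop.get("tags", [])))
    merged["related"] = sorted(
        set(keep.get("related", [])) | set(drop.get("related", []))
    )
    merged["source_count"] = keep.get("source_count", 1) + drop.get("source_count", 1)
    return merged
```

Sorting the merged lists is only there to make the output deterministic, which helps when comparing a --dry-run against a real run.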
|
2026-04-21 07:58:09 -04:00 |
|
Alexander Whitestone
|
e1e42c3f8e
|
feat: quality gate — score and filter knowledge entries (#198)
Test / pytest (pull_request) Failing after 34s
quality_gate.py:
  4-dimension scoring (0.0-1.0):
    specificity (0.3): concrete examples vs vague
    actionability (0.3): can this be used?
    freshness (0.2): exponential decay over time
    source_quality (0.2): model reliability score
  filter_entries(entries, threshold=0.5)
  quality_report(): distribution + pass rate
  CLI: --threshold, --json, --filter
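
One way the weighted score and the threshold filter above might fit together, as a sketch rather than the actual quality_gate.py. The four weights and the `filter_entries(entries, threshold=0.5)` signature come from the commit message. The 30-day half-life, the 0.5 baseline for a missing date, the `dimensions` key on each entry, and the helper names `freshness` and `overall_score` are assumptions.

```python
from typing import Optional

# Weights as listed in the commit message; they sum to 1.0.
WEIGHTS = {
    "specificity": 0.3,
    "actionability": 0.3,
    "freshness": 0.2,
    "source_quality": 0.2,
}


def freshness(age_days: Optional[float], half_life_days: float = 30.0) -> float:
    """Exponential decay from 1.0 toward 0.0 as the entry ages.

    Assumptions: a 30-day half-life, and a neutral 0.5 baseline
    when the entry has no date.
    """
    if age_days is None:
        return 0.5
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def overall_score(dimensions: dict) -> float:
    """Weighted sum of the four dimension scores, each in 0.0-1.0."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


def filter_entries(entries: list, threshold: float = 0.5) -> list:
    """Keep entries whose overall score meets the threshold.

    Assumption: each entry carries its per-dimension scores under a
    hypothetical "dimensions" key.
    """
    return [e for e in entries if overall_score(e["dimensions"]) >= threshold]
```

Because the weights sum to 1.0, the overall score stays in the same 0.0-1.0 range as the individual dimensions, so one threshold applies directly.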
tests/test_quality_gate.py: 14 tests
  specificity: specific high, vague low, empty baseline
  actionability: actionable high, abstract low
  freshness: recent high, old low, none baseline
  source: claude high, ollama low, unknown default
  entry: good high, poor low
  filter: removes low quality
|
2026-04-20 20:31:04 -04:00 |
|