Files
compounding-intelligence/tests/test_dedup.py
Timmy cc215e3ed7
Some checks failed
Test / pytest (pull_request) Failing after 21s
feat: knowledge deduplication — content hash + token similarity (#196)
Dedup module for knowledge entries with:
- SHA256 content hashing for exact duplicates
- Token Jaccard similarity for near-duplicates (default 0.95)
- Quality-based merge: keeps higher confidence/source_count
- Metadata merging: tags, related, source_count
- Dry-run mode
- 30 tests passing
- Built-in --test mode with generated duplicates

Usage:
  python scripts/dedup.py --input knowledge/index.json
  python scripts/dedup.py --input knowledge/index.json --dry-run
  python scripts/dedup.py --test

Closes #196.
2026-04-21 07:58:09 -04:00

6.7 KiB