Timmy
cc215e3ed7
Test / pytest (pull_request) Failing after 21s
feat: knowledge deduplication — content hash + token similarity (#196)
Dedup module for knowledge entries with:
- SHA256 content hashing for exact duplicates
- Token Jaccard similarity for near-duplicates (default threshold 0.95)
- Quality-based merge: keeps higher confidence/source_count
- Metadata merging: tags, related, source_count
- Dry-run mode
- 30 tests passing
- Built-in --test mode with generated duplicates
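
The two duplicate checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/dedup.py: the function names (`content_hash`, `tokenize`, `jaccard`, `is_duplicate`), the lowercase/whitespace normalization, and the regex tokenizer are all assumptions. Only SHA256 hashing, token Jaccard similarity, and the 0.95 default come from the commit message.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """SHA256 of the text after lowercasing and collapsing whitespace.

    The normalization step is an assumption; the real module may hash
    the raw content instead.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def tokenize(text: str) -> set:
    """Split text into a set of lowercase word tokens (hypothetical tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over the two token sets."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_duplicate(a: str, b: str, threshold: float = 0.95) -> bool:
    """Exact match by hash first, then near-duplicate match by Jaccard."""
    if content_hash(a) == content_hash(b):
        return True
    return jaccard(a, b) >= threshold
```

Checking the hash first keeps the common exact-duplicate case cheap, since the token comparison only runs when the hashes differ.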
Usage:
python scripts/dedup.py --input knowledge/index.json
python scripts/dedup.py --input knowledge/index.json --dry-run
python scripts/dedup.py --test
Closes #196.
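
For illustration, the quality-based merge and metadata merging described above might look like the sketch below. The field names `confidence`, `source_count`, `tags`, and `related` appear in the commit message; the function name `merge_entries`, the tie-breaking order (confidence first, then source_count), the union of list fields, and the summing of source counts are assumptions rather than the module's confirmed behavior.

```python
def merge_entries(a: dict, b: dict) -> dict:
    """Merge two duplicate entries, keeping the higher-quality one as the base."""

    def quality(entry: dict) -> tuple:
        # Assumed ordering: compare confidence first, then source_count.
        return (entry.get("confidence", 0.0), entry.get("source_count", 1))

    keep, drop = (a, b) if quality(a) >= quality(b) else (b, a)

    merged = dict(keep)  # shallow copy so neither input is modified
    for field in ("tags", "related"):
        # Assumed behavior: union the list fields and sort for stable output.
        merged[field] = sorted(set(keep.get(field, [])) | set(drop.get(field, [])))
    # Assumed behavior: the merged entry counts the sources of both inputs.
    merged["source_count"] = keep.get("source_count", 1) + drop.get("source_count", 1)
    return merged
```

In a dry run, a caller could compute the merged entry this way and report it without writing the index back to disk.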
2026-04-21 07:58:09 -04:00