Alexander Whitestone
931d9f3e10
feat: bounded hash dedup with daily rotation (#628)
scripts/hash_dedup.py:
- HashDedup class with SHA-256 deduplication
- Date-stamped files: .hashes/YYYY-MM-DD.json
- Daily rotation prevents unbounded growth
- cleanup(keep_days=N) removes old files
- max_hashes_per_file cap (default 100K)
- is_duplicate(), is_duplicate_any(lookback_days)
- add(), add_batch(), stats()
- CLI: --check, --add, --file, --cleanup, --stats

tests/test_hash_dedup.py: 17 tests
- hash computation, dedup logic, batch add,
  file format, cleanup rotation, stats, max cap
Fixes unbounded growth by:
1. Daily file rotation
2. Configurable retention (cleanup keep_days)
3. Per-file hash cap (max_hashes_per_file)
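
The three mechanisms above can be illustrated with a minimal sketch. This is not the code from scripts/hash_dedup.py: the class name HashDedupSketch, the `today` parameter, the JSON-list file layout, the return values, and the exact lookback semantics are all assumptions made for illustration. Only the directory naming (.hashes/YYYY-MM-DD.json), SHA-256 hashing, and the 100K default cap come from the commit message.

```python
import hashlib
import json
from datetime import date, timedelta
from pathlib import Path


class HashDedupSketch:
    """Illustrative only; hypothetical stand-in for the real HashDedup."""

    def __init__(self, root=".hashes", max_hashes_per_file=100_000):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.max_hashes_per_file = max_hashes_per_file

    @staticmethod
    def compute_hash(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def _path(self, day):
        # One file per calendar day: <root>/YYYY-MM-DD.json
        return self.root / f"{day.isoformat()}.json"

    def _load(self, day):
        # Assumed layout: each file holds a JSON list of hex digests.
        path = self._path(day)
        if not path.exists():
            return []
        return json.loads(path.read_text())

    def is_duplicate_any(self, text, lookback_days=7, today=None):
        # Checks today's file plus the previous lookback_days - 1 days.
        today = today or date.today()
        digest = self.compute_hash(text)
        return any(
            digest in self._load(today - timedelta(days=n))
            for n in range(lookback_days)
        )

    def add(self, text, today=None):
        """Return True if stored, False if already present or the cap is hit."""
        today = today or date.today()
        hashes = self._load(today)
        digest = self.compute_hash(text)
        if digest in hashes or len(hashes) >= self.max_hashes_per_file:
            return False
        hashes.append(digest)
        self._path(today).write_text(json.dumps(hashes))
        return True

    def cleanup(self, keep_days, today=None):
        """Delete day files older than keep_days; return how many were removed."""
        today = today or date.today()
        cutoff = today - timedelta(days=keep_days)
        removed = 0
        for path in self.root.glob("*.json"):
            try:
                day = date.fromisoformat(path.stem)
            except ValueError:
                continue  # skip files that are not date-stamped
            if day < cutoff:
                path.unlink()
                removed += 1
        return removed
```

In this sketch, storage is bounded by roughly keep_days files times max_hashes_per_file entries, provided cleanup() is run regularly (for example from a daily cron job).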
2026-04-15 20:54:46 -04:00