Commit Graph

1 Commit

Author SHA1 Message Date
Alexander Whitestone
931d9f3e10 feat: bounded hash dedup with daily rotation (#628)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 19s
PR Checklist / pr-checklist (pull_request) Failing after 3m23s
Smoke Test / smoke (pull_request) Failing after 16s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 21s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m24s
Validate Config / Shell Script Lint (pull_request) Failing after 1m14s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 19s
Validate Config / Playbook Schema Validation (pull_request) Successful in 23s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
scripts/hash_dedup.py:
  HashDedup class with SHA-256 deduplication
  Date-stamped files: .hashes/YYYY-MM-DD.json
  Daily rotation prevents unbounded growth
  cleanup(keep_days=N) removes old files
  max_hashes_per_file cap (default 100K)
  is_duplicate(), is_duplicate_any(lookback_days)
  add(), add_batch(), stats()
  CLI: --check, --add, --file, --cleanup, --stats
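For illustration, the sketch below shows one way date-stamped SHA-256 dedup with a per-file cap could be put together. It is not the code in scripts/hash_dedup.py. The class name SketchHashDedup, the `today` parameter, the `{"hashes": [...]}` JSON layout, the boolean return value of add(), the choice to reject new hashes once the cap is reached, and the exact lookback window are all assumptions; only the method names, the .hashes/YYYY-MM-DD.json naming, and the cap parameter come from the commit message, and the default of 100_000 is one reading of "100K".

```python
import hashlib
import json
from datetime import date, timedelta
from pathlib import Path


class SketchHashDedup:
    """Illustrative sketch only; not the implementation in scripts/hash_dedup.py."""

    def __init__(self, root=".hashes", max_hashes_per_file=100_000, today=None):
        self.root = Path(root)
        self.max_hashes_per_file = max_hashes_per_file
        # `today` is a hypothetical injection point so the date can be pinned in tests.
        self.today = today or date.today()

    def _path(self, day):
        # One file per calendar day: <root>/YYYY-MM-DD.json
        return self.root / f"{day.isoformat()}.json"

    def _load(self, day):
        path = self._path(day)
        if not path.exists():
            return set()
        # Assumed layout: a JSON object holding a "hashes" list.
        return set(json.loads(path.read_text())["hashes"])

    def _save(self, day, hashes):
        self.root.mkdir(parents=True, exist_ok=True)
        self._path(day).write_text(json.dumps({"hashes": sorted(hashes)}))

    @staticmethod
    def compute_hash(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def is_duplicate(self, content):
        # Checks today's file only.
        return self.compute_hash(content) in self._load(self.today)

    def is_duplicate_any(self, content, lookback_days=7):
        digest = self.compute_hash(content)
        # Assumption: the window is today plus the previous lookback_days - 1 days.
        return any(
            digest in self._load(self.today - timedelta(days=offset))
            for offset in range(lookback_days)
        )

    def add(self, content):
        """Return True if stored, False if already present or the cap is reached."""
        hashes = self._load(self.today)
        digest = self.compute_hash(content)
        if digest in hashes or len(hashes) >= self.max_hashes_per_file:
            return False
        hashes.add(digest)
        self._save(self.today, hashes)
        return True
```

Because each day writes to its own file, no single file grows past max_hashes_per_file entries, and older days can be dropped by deleting whole files without rewriting the current one.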

tests/test_hash_dedup.py: 17 tests
  hash computation, dedup logic, batch add,
  file format, cleanup rotation, stats, max cap

Fixes unbounded growth by:
  1. Daily file rotation
  2. Configurable retention (cleanup keep_days)
  3. Per-file hash cap (max_hashes_per_file)
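The retention step in item 2 can be sketched as a small standalone function. This is a hedged illustration, not the cleanup() method from the commit: the function name cleanup_old_files, its `root` and `today` parameters, the returned list of removed names, and the boundary rule (a file is removed only when its date is more than keep_days days before today) are assumptions.

```python
from datetime import date, timedelta
from pathlib import Path


def cleanup_old_files(root, keep_days, today=None):
    """Delete YYYY-MM-DD.json files dated more than keep_days before today.

    Hypothetical helper for illustration; returns the names it removed.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for path in sorted(Path(root).glob("*.json")):
        try:
            day = date.fromisoformat(path.stem)
        except ValueError:
            # Not a date-stamped hash file; leave it alone.
            continue
        if day < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Taken together with the per-file cap, this would bound total storage at roughly the number of retained daily files times max_hashes_per_file entries, provided cleanup runs regularly.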
2026-04-15 20:54:46 -04:00