feat: session garbage collection (#315) #383

Merged
Rockachopa merged 1 commits from feat/315-session-gc into main 2026-04-14 00:33:17 +00:00
Owner

What

Session garbage collection for empty/trivial sessions.

Changes

hermes_state.py — new garbage_collect() method on SessionDB:

  • Empty sessions (0 messages) older than 24 hours → deleted
  • Trivial sessions (1–5 messages) older than 7 days → deleted
  • Sessions with >5 messages → kept indefinitely
  • Active (not ended) sessions → never touched
  • Handles child session FK constraints
  • Supports dry_run=True for preview, source= filtering

hermes_cli/main.py — new hermes sessions gc subcommand:

  • --empty-hours (default: 24)
  • --trivial-days (default: 7)
  • --trivial-max (default: 5)
  • --source filter
  • --dry-run preview mode
  • --yes skip confirmation

tests/test_hermes_state.py — 7 new tests covering all scenarios.

Why

Empirical audit found 67.3% of sessions have <=5 messages. 3,564 are completely empty. This reduces state.db noise and file size.

Testing

pytest tests/test_hermes_state.py::TestGarbageCollect -v
→ 7 passed
pytest tests/test_hermes_state.py::TestPruneSessions -v
→ 3 passed (no regression)

Closes #315

## What Session garbage collection for empty/trivial sessions. ## Changes **`hermes_state.py`** — new `garbage_collect()` method on `SessionDB`: - Empty sessions (0 messages) older than 24 hours → deleted - Trivial sessions (1–5 messages) older than 7 days → deleted - Sessions with >5 messages → kept indefinitely - Active (not ended) sessions → never touched - Handles child session FK constraints - Supports `dry_run=True` for preview, `source=` filtering **`hermes_cli/main.py`** — new `hermes sessions gc` subcommand: - `--empty-hours` (default: 24) - `--trivial-days` (default: 7) - `--trivial-max` (default: 5) - `--source` filter - `--dry-run` preview mode - `--yes` skip confirmation **`tests/test_hermes_state.py`** — 7 new tests covering all scenarios. ## Why Empirical audit found 67.3% of sessions have <=5 messages. 3,564 are completely empty. This reduces state.db noise and file size. ## Testing ``` pytest tests/test_hermes_state.py::TestGarbageCollect -v → 7 passed pytest tests/test_hermes_state.py::TestPruneSessions -v → 3 passed (no regression) ``` Closes #315
Rockachopa added 4 commits 2026-04-13 21:31:26 +00:00
fix(poka-yoke): add circuit breaker for error cascading (#309)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 28s
110642d86a
After 3 consecutive tool errors, inject a warning into the tool result
advising the agent to switch strategies. Escalates at 6 and 9+ errors.

Empirical data from audit:
- P(error | prev error) = 58.6% vs P(error | prev success) = 25.2%
- 2.33x cascade amplification factor
- Max observed streak: 31 consecutive errors

Intervention tiers:
- 3 errors: advisory warning (try different tool, use terminal, simplify)
- 6 errors: urgent stop (halt retries, investigate or switch)
- 9+ errors: terminal-only recovery path

Tracks errors in both sequential and concurrent execution paths.
fix(poka-yoke): add tool fixation detection (#310)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 26s
ec3cd2081b
Detect when the same tool is called 5+ times consecutively and inject
a nudge advising the agent to diversify its approach.

Evidence from empirical audit:
- Top marathon session (qwen, 1643 msgs): execute_code streak of 20
- Opus session (1472 msgs): terminal streak of 10

The nudge fires every 5 consecutive calls (5, 10, 15...) so it
persists without being spammy. Tracks independently in both
sequential and concurrent execution paths.
fix: gateway config debt - validation, defaults, fallback chain checks (#328)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 1m32s
992498463e
- Expand validate_config_structure() to catch:
  - fallback_providers format errors (non-list, missing provider/model)
  - session_reset.idle_minutes <= 0 (causes immediate resets)
  - session_reset.at_hour out of 0-23 range
  - API_SERVER enabled without API_SERVER_KEY
  - Unknown root-level keys that look like misplaced custom_providers fields
- Add _validate_fallback_providers() in gateway/config.py to validate
  fallback chain at gateway startup (logs warnings for malformed entries)
- Add API_SERVER_KEY check in gateway config loader (warns on unauthenticated endpoint)
- Expand _KNOWN_ROOT_KEYS to include all valid top-level config sections
  (session_reset, browser, checkpoints, voice, stt, tts, etc.)
- Add 13 new tests for fallback_providers and session_reset validation
- All existing tests pass (47/47)

Closes #328
feat: session garbage collection (#315)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 14s
69e10967bd
Add garbage_collect() method to SessionDB that cleans up empty and
trivial sessions based on age:
- Empty sessions (0 messages) older than 24h
- Trivial sessions (1-5 messages) older than 7 days
- Sessions with >5 messages kept indefinitely

Add `hermes sessions gc` CLI command with:
- --empty-hours (default: 24)
- --trivial-days (default: 7)
- --trivial-max (default: 5)
- --source filter
- --dry-run preview mode
- --yes skip confirmation

The dry-run flow: preview what would be deleted, ask for confirmation,
then execute. Handles child session FK constraints properly.

7 tests covering: empty/trivial deletion, active session protection,
substantial session preservation, dry-run, source filtering, and child
session handling.

Closes #315
Timmy approved these changes 2026-04-13 22:08:55 +00:00
Dismissed
Timmy left a comment
Owner

Review: Session GC + Config Validation + Circuit Breaker (#315, #328, #309, #310)

This PR bundles four distinct features. Each is reasonable individually, but bundling them makes review and rollback harder. Consider splitting in the future.

Session Garbage Collection (#315)

Clean implementation. The GC policy (empty >24h, trivial >7d, substantial kept) is sensible. Good that active sessions are never touched.

Minor concern: The child session deletion logic deletes children of GC targets even if those children are substantial sessions (>5 messages). If a parent was empty but a child has 50 messages, the child gets deleted anyway. This may be intentional but is worth calling out -- consider checking child message counts before cascading.

Config Validation (#328)

Good additions:

  • fallback_providers structure validation catches a real class of user errors.
  • session_reset.idle_minutes positive check catches the =0 immediate-reset bug.
  • _KNOWN_ROOT_KEYS expansion reduces false-positive "unknown key" warnings.
  • API_SERVER_KEY check is a nice security nudge.

Nit: The warning message for API_SERVER_KEY references ~/.hermes/.env literally rather than using display_hermes_home(). This is the exact pattern PR #384 is trying to eliminate.

Circuit Breaker (#309) and Tool Fixation (#310)

Data-driven approach is good. The escalating warning levels (3/6/9+ errors) are well thought out.

Issue: The circuit breaker and tool fixation tracking happen in both the concurrent (_run_tool threaded) and sequential tool execution paths. In the threaded path, self._consecutive_tool_errors, self._error_streak_tool_names, self._same_tool_streak, and self._last_tool_name are all modified from multiple threads with no synchronization. This means:

  • Counters may increment incorrectly under concurrency
  • The _error_streak_tool_names list could have concurrent append/clear races

Since these are advisory nudges this is probably acceptable, but the duplicated code across both paths should ideally be extracted into helper methods.

Tests

Good coverage for GC (including child sessions, source filtering, dry-run) and config validation. No tests for the circuit breaker or tool fixation features, which would be valuable.

Verdict

Overall solid work. The main concerns are the child-cascade GC behavior and the thread-safety of circuit breaker counters, neither of which are blocking.

## Review: Session GC + Config Validation + Circuit Breaker (#315, #328, #309, #310) This PR bundles four distinct features. Each is reasonable individually, but bundling them makes review and rollback harder. Consider splitting in the future. ### Session Garbage Collection (#315) Clean implementation. The GC policy (empty >24h, trivial >7d, substantial kept) is sensible. Good that active sessions are never touched. **Minor concern**: The child session deletion logic deletes children of GC targets even if those children are *substantial* sessions (>5 messages). If a parent was empty but a child has 50 messages, the child gets deleted anyway. This may be intentional but is worth calling out -- consider checking child message counts before cascading. ### Config Validation (#328) Good additions: - `fallback_providers` structure validation catches a real class of user errors. - `session_reset.idle_minutes` positive check catches the `=0` immediate-reset bug. - `_KNOWN_ROOT_KEYS` expansion reduces false-positive "unknown key" warnings. - API_SERVER_KEY check is a nice security nudge. **Nit**: The warning message for API_SERVER_KEY references `~/.hermes/.env` literally rather than using `display_hermes_home()`. This is the exact pattern PR #384 is trying to eliminate. ### Circuit Breaker (#309) and Tool Fixation (#310) Data-driven approach is good. The escalating warning levels (3/6/9+ errors) are well thought out. **Issue**: The circuit breaker and tool fixation tracking happen in both the concurrent (`_run_tool` threaded) and sequential tool execution paths. In the threaded path, `self._consecutive_tool_errors`, `self._error_streak_tool_names`, `self._same_tool_streak`, and `self._last_tool_name` are all modified from multiple threads with no synchronization. This means: - Counters may increment incorrectly under concurrency - The `_error_streak_tool_names` list could have concurrent append/clear races Since these are advisory nudges this is probably acceptable, but the duplicated code across both paths should ideally be extracted into helper methods. ### Tests Good coverage for GC (including child sessions, source filtering, dry-run) and config validation. No tests for the circuit breaker or tool fixation features, which would be valuable. ### Verdict Overall solid work. The main concerns are the child-cascade GC behavior and the thread-safety of circuit breaker counters, neither of which are blocking.
Timmy approved these changes 2026-04-14 00:11:53 +00:00
Timmy left a comment
Owner

Good config validation additions with solid test coverage.

The new _validate_fallback_providers() function in gateway/config.py properly validates structure at startup, and the CLI-side validation in hermes_cli/config.py mirrors it with user-friendly ConfigIssue messages. The expanded _KNOWN_ROOT_KEYS set is a welcome improvement to reduce false-positive "unknown key" warnings.

The session GC feature in hermes_state.py is well-designed: the ended_at IS NOT NULL guard prevents deleting active sessions, and the dry-run preview before actual deletion is a nice UX touch. However, the child-session deletion loop (lines ~308-317) iterates one-by-one with individual DELETE statements -- for large databases this could be slow. Consider using DELETE FROM messages WHERE session_id IN (...) with a batched approach.

The circuit breaker and tool fixation detection in run_agent.py are clever poka-yoke mechanisms. One concern: _same_tool_streak is incremented in both the concurrent _run_tool worker and the sequential path without any thread synchronization. With concurrent tool execution, this counter could have race conditions. Consider using threading.Lock or threading.atomic for the streak counter.

The bare except Exception: pass in _validate_fallback_providers() silently swallows all errors including YAML parse failures. At minimum, log at DEBUG level so users can diagnose config loading issues.

Tests are thorough and well-structured. Overall a solid PR.

**Good config validation additions with solid test coverage.** The new `_validate_fallback_providers()` function in `gateway/config.py` properly validates structure at startup, and the CLI-side validation in `hermes_cli/config.py` mirrors it with user-friendly ConfigIssue messages. The expanded `_KNOWN_ROOT_KEYS` set is a welcome improvement to reduce false-positive "unknown key" warnings. The session GC feature in `hermes_state.py` is well-designed: the `ended_at IS NOT NULL` guard prevents deleting active sessions, and the dry-run preview before actual deletion is a nice UX touch. However, the child-session deletion loop (lines ~308-317) iterates one-by-one with individual DELETE statements -- for large databases this could be slow. Consider using `DELETE FROM messages WHERE session_id IN (...)` with a batched approach. The circuit breaker and tool fixation detection in `run_agent.py` are clever poka-yoke mechanisms. One concern: `_same_tool_streak` is incremented in both the concurrent `_run_tool` worker and the sequential path without any thread synchronization. With concurrent tool execution, this counter could have race conditions. Consider using `threading.Lock` or `threading.atomic` for the streak counter. The bare `except Exception: pass` in `_validate_fallback_providers()` silently swallows all errors including YAML parse failures. At minimum, log at DEBUG level so users can diagnose config loading issues. Tests are thorough and well-structured. Overall a solid PR.
Rockachopa merged commit 37af40a38e into main 2026-04-14 00:33:17 +00:00
Author
Owner

Closing: stale — session GC, no clear issue or superseded. Triage by KETER — fleet-ops#143.

Closing: stale — session GC, no clear issue or superseded. Triage by KETER — fleet-ops#143.
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/hermes-agent#383