- Expand validate_config_structure() to catch:
  - fallback_providers format errors (non-list, missing provider/model)
  - session_reset.idle_minutes <= 0 (causes immediate resets)
  - session_reset.at_hour out of 0-23 range
  - API_SERVER enabled without API_SERVER_KEY
  - Unknown root-level keys that look like misplaced custom_providers fields
- Add _validate_fallback_providers() in gateway/config.py to validate the
  fallback chain at gateway startup (logs warnings for malformed entries)
- Add API_SERVER_KEY check in gateway config loader (warns on unauthenticated endpoint)
- Expand _KNOWN_ROOT_KEYS to include all valid top-level config sections
  (session_reset, browser, checkpoints, voice, stt, tts, etc.)
- Add 13 new tests for fallback_providers and session_reset validation
- All existing tests pass (47/47)
Closes #328
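
The checks listed above could look roughly like the following. This is a minimal sketch, not the actual validate_config_structure() code: the function name validate_config_sketch, its parameters api_server_enabled and api_server_key, and the message wording are all hypothetical, and how the real loader decides that the API server is enabled is not stated in this message.

```python
from typing import Any, Optional


def validate_config_sketch(
    config: dict[str, Any],
    api_server_enabled: bool,       # hypothetical: how "enabled" is detected is an assumption
    api_server_key: Optional[str],  # hypothetical stand-in for API_SERVER_KEY
) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: list[str] = []

    # fallback_providers must be a list of entries with provider and model.
    fallbacks = config.get("fallback_providers")
    if fallbacks is not None:
        if not isinstance(fallbacks, list):
            problems.append("fallback_providers must be a list")
        else:
            for i, entry in enumerate(fallbacks):
                ok = isinstance(entry, dict) and entry.get("provider") and entry.get("model")
                if not ok:
                    problems.append(f"fallback_providers[{i}] needs 'provider' and 'model'")

    # session_reset bounds: idle_minutes > 0, at_hour within 0-23.
    reset = config.get("session_reset")
    if isinstance(reset, dict):
        idle = reset.get("idle_minutes")
        if isinstance(idle, (int, float)) and idle <= 0:
            problems.append("session_reset.idle_minutes must be > 0")
        hour = reset.get("at_hour")
        if isinstance(hour, int) and not 0 <= hour <= 23:
            problems.append("session_reset.at_hour must be in 0-23")

    # An enabled API server without a key is an unauthenticated endpoint.
    if api_server_enabled and not api_server_key:
        problems.append("API_SERVER enabled without API_SERVER_KEY")

    return problems
```

Returning a list of problems rather than raising on the first one lets the caller log every issue in a single pass, which matches the "logs warnings" behaviour described above.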
Detect when the same tool is called 5+ times consecutively and inject
a nudge advising the agent to diversify its approach.
Evidence from empirical audit:
- Top marathon session (qwen, 1643 msgs): execute_code streak of 20
- Opus session (1472 msgs): terminal streak of 10
The nudge fires every 5 consecutive calls (5, 10, 15...) so it
persists without being spammy. The streak is tracked independently
in the sequential and concurrent execution paths.
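
The firing rule described above (a nudge at 5, 10, 15... consecutive calls) can be sketched as follows. The class name ToolStreakTracker, the constant NUDGE_INTERVAL, and the nudge text are hypothetical and chosen for illustration; the real implementation may track the streak differently.

```python
from typing import Optional

NUDGE_INTERVAL = 5  # nudge at every multiple of 5 consecutive calls


class ToolStreakTracker:
    """Counts consecutive calls to the same tool (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_tool: Optional[str] = None
        self.streak = 0

    def record(self, tool_name: str) -> Optional[str]:
        """Record one tool call; return nudge text when the streak hits 5, 10, 15..."""
        if tool_name == self.last_tool:
            self.streak += 1
        else:
            # A different tool resets the streak.
            self.last_tool = tool_name
            self.streak = 1
        if self.streak % NUDGE_INTERVAL == 0:
            return (
                f"You have called '{tool_name}' {self.streak} times in a row. "
                "Consider a different approach."
            )
        return None
```

One tracker instance per execution path would keep the sequential and concurrent counts independent.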
After 3 consecutive tool errors, inject a warning into the tool result
advising the agent to switch strategies. Escalates at 6 and 9+ errors.
Empirical data from audit:
- P(error | prev error) = 58.6% vs P(error | prev success) = 25.2%
- 2.33x cascade amplification factor
- Max observed streak: 31 consecutive errors
Intervention tiers:
- 3 errors: advisory warning (try different tool, use terminal, simplify)
- 6 errors: urgent stop (halt retries, investigate or switch)
- 9+ errors: terminal-only recovery path
Tracks errors in both sequential and concurrent execution paths.
Fixes #352
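
The intervention tiers above map naturally onto a threshold lookup. The sketch below is illustrative only: the function name error_streak_warning and the message wording are hypothetical, and it assumes the warning keeps firing for every error at or above a tier's threshold, which this message does not specify.

```python
from typing import Optional

# Ratio of the two audit probabilities quoted above: 58.6% / 25.2%, about 2.33.
CASCADE_FACTOR = 0.586 / 0.252


def error_streak_warning(consecutive_errors: int) -> Optional[str]:
    """Return the warning to append to a tool result, or None below 3 errors."""
    if consecutive_errors >= 9:
        return "RECOVERY: use the terminal only until a call succeeds."
    if consecutive_errors >= 6:
        return "URGENT: stop retrying; investigate the failure or switch tools."
    if consecutive_errors >= 3:
        return "ADVISORY: try a different tool, use the terminal, or simplify."
    return None
```

Checking the highest threshold first keeps the tiers mutually exclusive without needing explicit upper bounds.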
Problem: When the gateway restarts, the Python interpreter enters its
shutdown phase while the last cron tick is still processing jobs.
ThreadPoolExecutor.submit() then raises RuntimeError("cannot schedule
new futures after interpreter shutdown") for every remaining job,
so the failure cascades through the rest of the tick queue.
Fix (two-part):
1. run_job(): Wrap ThreadPoolExecutor creation + submit in try/except.
   On RuntimeError, fall back to synchronous execution (same thread)
   so the job at least runs instead of failing silently.
2. tick(): Check sys.is_finalizing() before each job. If the
   interpreter is shutting down, stop processing immediately
   instead of wasting time on doomed ThreadPoolExecutor.submit() calls.
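
A minimal sketch of the two-part fix, under stated assumptions: the real run_job() and tick() take job objects whose shape is not shown here, so this version treats a job as a zero-argument callable, and the executor_factory parameter is a hypothetical hook added only so the fallback can be exercised in a test.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(job, executor_factory=ThreadPoolExecutor):
    """Run job in a worker thread; fall back to the current thread on RuntimeError."""
    pool = None
    try:
        pool = executor_factory(max_workers=1)
        future = pool.submit(job)
    except RuntimeError:
        # Raised when the interpreter is shutting down: run synchronously instead.
        if pool is not None:
            pool.shutdown(wait=False)
        return job()
    try:
        # Errors raised by the job itself propagate from here and are not retried.
        return future.result()
    finally:
        pool.shutdown(wait=True)


def tick(jobs):
    """Process jobs in order, stopping as soon as the interpreter is finalizing."""
    results = []
    for job in jobs:
        if sys.is_finalizing():
            break
        results.append(run_job(job))
    return results
```

Only the creation and submit steps sit inside the try block, so a RuntimeError raised by the job itself is not mistaken for a shutdown error and the job is never run twice.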
Issue #342: Cron ticker thread not starting in gateway
Root cause: asyncio.get_running_loop() can raise RuntimeError in edge cases,
and the ticker thread can die silently without being restarted.
Fix:
1. Wrap get_running_loop() in try/except with fallback
2. Add explicit logger.info when ticker starts
3. Add async monitor that restarts ticker if it dies
4. Log PID and thread name for debugging
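
The four steps above could be combined along these lines. This is a sketch under assumptions, not the gateway's code: get_loop_or_none, start_ticker, and monitor_ticker are hypothetical names, returning None is one possible fallback for the missing loop (the actual fallback is not specified here), and max_checks exists only so the monitor can terminate in a demo.

```python
import asyncio
import logging
import os
import threading

logger = logging.getLogger(__name__)


def get_loop_or_none():
    """Return the running event loop, or None when there is none (assumed fallback)."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        return None


def start_ticker(target) -> threading.Thread:
    """Start the ticker thread and log PID and thread name for debugging."""
    thread = threading.Thread(target=target, name="cron-ticker", daemon=True)
    thread.start()
    logger.info("cron ticker started (pid=%s, thread=%s)", os.getpid(), thread.name)
    return thread


async def monitor_ticker(target, check_interval=30.0, max_checks=None) -> int:
    """Restart the ticker whenever it is found dead; return the restart count."""
    thread = start_ticker(target)
    checks = restarts = 0
    while max_checks is None or checks < max_checks:
        await asyncio.sleep(check_interval)
        checks += 1
        if not thread.is_alive():
            logger.warning("cron ticker died; restarting")
            thread = start_ticker(target)
            restarts += 1
    return restarts
```

In a long-running gateway the monitor would be scheduled as a background task with max_checks left as None.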
Validates tool names against valid_tool_names before execution.
Both sequential and concurrent paths are checked.
When the model hallucinates a non-existent tool:
- Logs warning with tool name
- Returns error listing available tools
- Does NOT make API call (saves budget)
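
The guard described above can be sketched as a small pre-execution check. The function name check_tool_name and the error text are hypothetical; only valid_tool_names comes from this message, and it is assumed here to be a set of strings.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def check_tool_name(name: str, valid_tool_names: set[str]) -> Optional[str]:
    """Return None for a known tool, otherwise an error message for the model."""
    if name in valid_tool_names:
        return None
    logger.warning("Model requested unknown tool %r", name)
    available = ", ".join(sorted(valid_tool_names))
    return f"Unknown tool '{name}'. Available tools: {available}"
```

The caller would return this message as the tool result and skip execution whenever it is not None.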