When cloud provider fails during tool calling (timeout, 429, 503),
fall back to local Ollama to keep the agent working.
New agent/tool_fallback.py:
- ToolFallbackHandler: manages fallback execution
- should_fallback(error): detects provider failures (429, 503,
timeout, rate limit, quota exceeded, connection errors)
- call_with_fallback(): makes API call via local Ollama when
primary provider fails
- FallbackEvent: records each fallback for fleet reporting
- format_report(): human-readable fallback summary
- Singleton handler via get_tool_fallback_handler()
Config via env vars:
- TOOL_FALLBACK_PROVIDER (default: ollama)
- TOOL_FALLBACK_MODEL (default: qwen2.5:7b)
- TOOL_FALLBACK_BASE_URL (default: http://localhost:11434/v1)
Tests: tests/test_tool_fallback.py
Closes#746