Some checks failed
Smoke Test / smoke (pull_request) Failing after 19s
Closes #749
2.5 KiB
2.5 KiB
Predictive Resource Allocation
Forecasts near-term fleet demand from historical telemetry so the operator can pre-provision resources before a surge hits.
How It Works
The predictor reads two data sources:
- Metric logs (
metrics/local_*.jsonl) — request cadence, token volume, caller mix, success/failure rates - Heartbeat logs (
heartbeat/ticks_*.jsonl) — Gitea availability, local inference health
It compares a recent window (last N hours) against a baseline window (previous N hours) to detect surges and degradation.
Output Contract
{
"resource_mode": "steady|surge",
"dispatch_posture": "normal|degraded",
"horizon_hours": 6,
"recent_request_rate": 12.5,
"baseline_request_rate": 8.0,
"predicted_request_rate": 15.0,
"surge_factor": 1.56,
"demand_level": "elevated|normal|low|critical",
"gitea_outages": 0,
"inference_failures": 2,
"top_callers": [...],
"recommended_actions": ["..."]
}
Demand Levels
| Surge Factor | Level | Meaning |
|---|---|---|
| > 3.0 | critical | Extreme surge, immediate action needed |
| > 1.5 | elevated | Notable increase, pre-warm recommended |
| > 1.0 | normal | Slight increase, monitor |
| <= 1.0 | low | Flat or declining |
Posture Signals
| Signal | Effect |
|---|---|
| Surge factor > 1.5 | resource_mode: surge + pre-warm recommendation |
| Gitea outages >= 1 | dispatch_posture: degraded + cache recommendation |
| Inference failures >= 2 | resource_mode: surge + reliability investigation |
| Heavy batch callers | Throttle recommendation |
| High caller failure rates | Investigation recommendation |
Usage
# Markdown report
python3 scripts/predictive_resource_allocator.py
# JSON output
python3 scripts/predictive_resource_allocator.py --json
# Custom paths and horizon
python3 scripts/predictive_resource_allocator.py \
--metrics metrics/local_20260329.jsonl \
--heartbeat heartbeat/ticks_20260329.jsonl \
--horizon 12
Tests
python3 -m pytest tests/test_predictive_resource_allocator.py -v
Recommended Actions
The predictor generates contextual recommendations:
- Pre-warm local inference — surge detected, warm up before next window
- Throttle background jobs — heavy batch work consuming capacity
- Investigate failure rates — specific callers failing at high rates
- Investigate model reliability — inference health degraded
- Cache forge state — Gitea availability issues
- Maintain current allocation — no issues detected