docs/PREDICTIVE_RESOURCE_ALLOCATION.md

# Predictive Resource Allocation

Forecasts near-term fleet demand from historical telemetry so the operator can
pre-provision resources before a surge hits.

## How It Works

The predictor reads two data sources:

1. **Metric logs** (`metrics/local_*.jsonl`) — request cadence, token volume,
   caller mix, success/failure rates
2. **Heartbeat logs** (`heartbeat/ticks_*.jsonl`) — Gitea availability,
   local inference health

It compares a **recent window** (last N hours of activity) against the **previous active window**
(previous N hours ending at the most recent event before the current window) so sparse telemetry still yields a meaningful baseline.

## Output Contract

```json
{
  "resource_mode": "steady|surge",
  "dispatch_posture": "normal|degraded",
  "horizon_hours": 6,
  "recent_request_rate": 12.5,
  "baseline_request_rate": 8.0,
  "predicted_request_rate": 15.0,
  "surge_factor": 1.56,
  "demand_level": "elevated|normal|low|critical",
  "gitea_outages": 0,
  "inference_failures": 2,
  "top_callers": [...],
  "recommended_actions": ["..."]
}
```

### Demand Levels

| Surge Factor | Level | Meaning |
|-------------|-------|---------|
| > 3.0 | critical | Extreme surge, immediate action needed |
| > 1.5 | elevated | Notable increase, pre-warm recommended |
| > 1.0 | normal | Slight increase, monitor |
| <= 1.0 | low | Flat or declining |

### Posture Signals

| Signal | Effect |
|--------|--------|
| Surge factor > 1.5 | `resource_mode: surge` + pre-warm recommendation |
| Gitea outages >= 1 | `dispatch_posture: degraded` + cache recommendation |
| Inference failures >= 2 | `resource_mode: surge` + reliability investigation |
| Heavy batch callers | Throttle recommendation |
| High caller failure rates | Investigation recommendation |

## Usage

```bash
# Markdown report
python3 scripts/predictive_resource_allocator.py

# JSON output
python3 scripts/predictive_resource_allocator.py --json

# Custom paths and horizon
python3 scripts/predictive_resource_allocator.py \
    --metrics metrics/local_20260329.jsonl \
    --heartbeat heartbeat/ticks_20260329.jsonl \
    --horizon 12
```

## Tests

```bash
python3 -m pytest tests/test_predictive_resource_allocator.py -v
```

## Recommended Actions

The predictor generates contextual recommendations:

- **Pre-warm local inference** — surge detected, warm up before next window
- **Throttle background jobs** — heavy batch work consuming capacity
- **Investigate failure rates** — specific callers failing at high rates
- **Investigate model reliability** — inference health degraded
- **Cache forge state** — Gitea availability issues
- **Maintain current allocation** — no issues detected
docs: predictive resource allocation — operator guide (#749) Closes #749 2026-04-16 01:51:19 +00:00			`# Predictive Resource Allocation`

			`Forecasts near-term fleet demand from historical telemetry so the operator can`
			`pre-provision resources before a surge hits.`

			`## How It Works`

			`The predictor reads two data sources:`

			1. Metric logs (`metrics/local_*.jsonl`) — request cadence, token volume,
			`caller mix, success/failure rates`
			2. Heartbeat logs (`heartbeat/ticks_*.jsonl`) — Gitea availability,
			`local inference health`

fix: use prior active window baseline for #749 2026-04-17 00:19:50 -04:00			`It compares a recent window (last N hours of activity) against the previous active window`
			`(previous N hours ending at the most recent event before the current window) so sparse telemetry still yields a meaningful baseline.`
docs: predictive resource allocation — operator guide (#749) Closes #749 2026-04-16 01:51:19 +00:00
			`## Output Contract`

			```json
			`{`
			`"resource_mode": "steady\|surge",`
			`"dispatch_posture": "normal\|degraded",`
			`"horizon_hours": 6,`
			`"recent_request_rate": 12.5,`
			`"baseline_request_rate": 8.0,`
			`"predicted_request_rate": 15.0,`
			`"surge_factor": 1.56,`
			`"demand_level": "elevated\|normal\|low\|critical",`
			`"gitea_outages": 0,`
			`"inference_failures": 2,`
			`"top_callers": [...],`
			`"recommended_actions": ["..."]`
			`}`
			```

			`### Demand Levels`

			`\| Surge Factor \| Level \| Meaning \|`
			`\|-------------\|-------\|---------\|`
			`\| > 3.0 \| critical \| Extreme surge, immediate action needed \|`
			`\| > 1.5 \| elevated \| Notable increase, pre-warm recommended \|`
			`\| > 1.0 \| normal \| Slight increase, monitor \|`
			`\| <= 1.0 \| low \| Flat or declining \|`

			`### Posture Signals`

			`\| Signal \| Effect \|`
			`\|--------\|--------\|`
			\| Surge factor > 1.5 \| `resource_mode: surge` + pre-warm recommendation \|
			\| Gitea outages >= 1 \| `dispatch_posture: degraded` + cache recommendation \|
			\| Inference failures >= 2 \| `resource_mode: surge` + reliability investigation \|
			`\| Heavy batch callers \| Throttle recommendation \|`
			`\| High caller failure rates \| Investigation recommendation \|`

			`## Usage`

			```bash
			`# Markdown report`
			`python3 scripts/predictive_resource_allocator.py`

			`# JSON output`
			`python3 scripts/predictive_resource_allocator.py --json`

			`# Custom paths and horizon`
			`python3 scripts/predictive_resource_allocator.py \`
			`--metrics metrics/local_20260329.jsonl \`
			`--heartbeat heartbeat/ticks_20260329.jsonl \`
			`--horizon 12`
			```

			`## Tests`

			```bash
			`python3 -m pytest tests/test_predictive_resource_allocator.py -v`
			```

			`## Recommended Actions`

			`The predictor generates contextual recommendations:`

			`- Pre-warm local inference — surge detected, warm up before next window`
			`- Throttle background jobs — heavy batch work consuming capacity`
			`- Investigate failure rates — specific callers failing at high rates`
			`- Investigate model reliability — inference health degraded`
			`- Cache forge state — Gitea availability issues`
			`- Maintain current allocation — no issues detected`