Alexander Whitestone
ebf69d155b
feat: GPU Inference Scheduler — Multi-Model Resource Management
Fixes #645
Queue-based model loading with priority lanes and VRAM budget tracking.
Prevents GPU OOM crashes when multiple projects compete for VRAM.
## Features
### Priority Lanes
- REALTIME (1): LPM, live video, interactive sessions
- INTERACTIVE (2): Playground, chat, user-facing
- BATCH (3): Harvester, overnight jobs, background
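As a rough illustration of how the lanes above could order work, here is a minimal sketch built on a heap. The `Priority` values mirror the numbers listed; `LaneQueue` and the job ids are hypothetical and are not taken from `tools/gpu_scheduler.py`.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Values mirror the lanes listed above; lower value is served first.
    REALTIME = 1
    INTERACTIVE = 2
    BATCH = 3


class LaneQueue:
    """Hypothetical priority queue: lowest Priority value is served first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so jobs in the same lane stay FIFO.
        self._counter = itertools.count()

    def push(self, job_id, priority):
        heapq.heappush(self._heap, (int(priority), next(self._counter), job_id))

    def pop(self):
        """Return the next job id, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = LaneQueue()
queue.push("harvest-1", Priority.BATCH)
queue.push("chat-1", Priority.INTERACTIVE)
queue.push("lpm-1", Priority.REALTIME)
queue.push("chat-2", Priority.INTERACTIVE)

# Drain the queue: REALTIME first, then INTERACTIVE in arrival order, then BATCH.
order = [queue.pop() for _ in range(4)]
```

The counter in each heap entry keeps arrival order within a lane, so two INTERACTIVE jobs are never reordered relative to each other.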
### VRAM Management
- Tracks total/used/available VRAM
- Reserves VRAM when a job starts
- Releases VRAM when a job completes
- Falls back to CPU when the GPU is full
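The reserve/release bookkeeping described above can be sketched as follows. `VramBudget` and its method names are assumptions made for illustration, and returning the strings `"gpu"` or `"cpu"` is just one possible way to signal the fallback; the real scheduler may do this differently.

```python
class VramBudget:
    """Hypothetical VRAM ledger tracking total, used, and available megabytes."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.used_mb = 0

    @property
    def available_mb(self):
        return self.total_mb - self.used_mb

    def reserve(self, size_mb):
        """Reserve VRAM if it fits; return 'gpu' on success, 'cpu' as fallback."""
        if size_mb <= self.available_mb:
            self.used_mb += size_mb
            return "gpu"
        return "cpu"

    def release(self, size_mb):
        """Give VRAM back when a job completes; never drop below zero."""
        self.used_mb = max(0, self.used_mb - size_mb)


budget = VramBudget(total_mb=49152)     # 48 GB budget, as in the usage example
first = budget.reserve(40 * 1024)       # 40 GB model fits on the GPU
second = budget.reserve(16 * 1024)      # only 8 GB left, so this falls back to CPU
budget.release(40 * 1024)               # first job completes
third = budget.reserve(16 * 1024)       # now the 16 GB model fits
```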
### Model Registry
Pre-registered models:
- Video Forge: SD XL (8GB), HeartMuLa (4GB), Wan2.1 (12GB)
- LPM: Video Gen (16GB), A2A (8GB)
- Local: Llama 3 70B (40GB), Llama 3 8B (8GB), MiMo v2 Pro (16GB)
- Playground: SDXL Turbo (6GB)
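To make the numbers above concrete, the sketch below stores the listed sizes in a plain dictionary and checks whether a combination of models fits a 48 GB budget. The dictionary layout and the helper names `total_vram_gb` and `fits` are hypothetical; the actual registry keys (for example `llama3_8b` in the usage section) may differ.

```python
# VRAM sizes in GB, copied from the pre-registered model list above.
MODEL_VRAM_GB = {
    "Video Forge": {"SD XL": 8, "HeartMuLa": 4, "Wan2.1": 12},
    "LPM": {"Video Gen": 16, "A2A": 8},
    "Local": {"Llama 3 70B": 40, "Llama 3 8B": 8, "MiMo v2 Pro": 16},
    "Playground": {"SDXL Turbo": 6},
}


def total_vram_gb(models):
    """Sum the VRAM needed by an iterable of (project, model) pairs."""
    return sum(MODEL_VRAM_GB[project][model] for project, model in models)


def fits(models, budget_gb=48):
    """Return True if all given models can be resident at once within the budget."""
    return total_vram_gb(models) <= budget_gb


all_models = [(p, m) for p, sizes in MODEL_VRAM_GB.items() for m in sizes]
everything = total_vram_gb(all_models)

ok = fits([("Local", "Llama 3 70B"), ("Playground", "SDXL Turbo")])   # 46 GB
too_big = fits([("Local", "Llama 3 70B"), ("LPM", "Video Gen")])      # 56 GB
```

Loading every registered model at once would need 118 GB, well over a 48 GB budget, which is why the models have to be scheduled rather than all kept resident.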
### Cross-Project Scenarios Handled
1. Video Forge batch + LPM live → LPM gets priority
2. 3 Video Forge jobs → Sequential with shared cache
3. Night harvester + playground → Batch runs on idle cycles
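Scenario 2 can be read as "load the model once and reuse it for each queued job". The sketch below illustrates that reading only; `ModelCache` is a hypothetical stand-in, and the commit does not say how the shared cache is actually implemented.

```python
class ModelCache:
    """Hypothetical cache: a model is loaded once and reused by later jobs."""

    def __init__(self):
        self.loaded = {}
        self.load_count = 0

    def get(self, model_name):
        if model_name not in self.loaded:
            # Stand-in for an expensive load of model weights into VRAM.
            self.load_count += 1
            self.loaded[model_name] = object()
        return self.loaded[model_name]


cache = ModelCache()
jobs = ["vf-1", "vf-2", "vf-3"]

# Run three Video Forge jobs one after another against the same model.
handles = [cache.get("Wan2.1") for _ in jobs]
```

Under this assumption, three sequential Wan2.1 jobs hold 12 GB at a time and pay the load cost once, instead of competing for 36 GB if run concurrently with separate copies.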
## Files
- tools/gpu_scheduler.py: InferenceScheduler class, CLI interface
- tests/tools/test_gpu_scheduler.py: 19 tests, all passing
## Usage
```python
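from tools.gpu_scheduler import InferenceScheduler, Priority
# Create a scheduler with a 48GB VRAM budget (49152 MB)
scheduler = InferenceScheduler(vram_budget_mb=49152)
# Queue a job in the REALTIME lane
scheduler.submit_job("job-1", "lpm", "llama3_8b", Priority.REALTIME)
# Take the next job from the queue
job = scheduler.get_next_job()
# Starting the job reserves its VRAM
scheduler.start_job(job)
# ... do inference ...
# Completing the job releases its VRAM
scheduler.complete_job(job)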
```
2026-04-14 21:15:58 -04:00