The DFlash benchmark with --draft-sliding-window-size 4096 on the 9B model
causes a Metal GPU timeout on Apple Silicon (kIOGPUCommandBufferCallbackErrorTimeout).
Root cause: the 9B model's larger compute workload combined with a 4096-size
draft sliding window produces GPU command buffers that exceed the watchdog
timeout. The 4B model does not exhibit this problem.
Mitigation: lower the default draft sliding window for the 9B pair from 4096
to 2048. This avoids the timeout while still providing meaningful speedup.
Changes:
- Add benchmarks/dflash_apple_silicon.py (DFlash benchmark planner)
- 9B pair now uses draft_sliding_window_size=2048
- 4B pair retains draft_sliding_window_size=4096
- Add tests/test_dflash_apple_silicon.py with a regression test specific to #154
- Add docs/DFLASH_APPLE_SILICON.md documenting the mitigation
- Add benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md recording failure
Verification: pytest -q tests/test_dflash_apple_silicon.py
The test explicitly asserts that the 9B pair uses window=2048, guarding against
a timeout regression.
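For reviewers, the per-pair default and its regression check can be sketched
roughly as follows. This is an illustration only: the mapping name, the helper
name, and the pair keys "4b" and "9b" are hypothetical and may not match the
identifiers used in benchmarks/dflash_apple_silicon.py.

```python
# Illustrative sketch; names and pair keys are assumptions, not the real planner API.
DRAFT_SLIDING_WINDOW_BY_PAIR = {
    "4b": 4096,  # 4B pair does not hit the Metal watchdog timeout
    "9b": 2048,  # lowered from 4096 to avoid the GPU timeout (#154)
}


def draft_sliding_window_size(pair: str) -> int:
    """Return the default draft sliding window size for a model pair."""
    try:
        return DRAFT_SLIDING_WINDOW_BY_PAIR[pair]
    except KeyError:
        raise ValueError(f"unknown model pair: {pair!r}") from None


def test_9b_pair_uses_reduced_window():
    # Guards against silently restoring the 4096 default for the 9B pair.
    assert draft_sliding_window_size("9b") == 2048
    assert draft_sliding_window_size("4b") == 4096
```

Keeping the window size in a single per-pair table means a future change to the
9B default has to touch the value the test pins, which makes the regression
visible in review.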
Closes #154