turboquant

Timmy_Foundation/turboquant

Commit Graph

Author

SHA1

Message

Date

Alexander Payne

704d284d14

fix: mitigate MLX Metal GPU timeout for qwen35-9b (issue #154)

Smoke Test / smoke (pull_request) Successful in 10s

The DFlash benchmark with --draft-sliding-window-size 4096 on the 9B model
causes a Metal GPU timeout on Apple Silicon (kIOGPUCommandBufferCallbackErrorTimeout).

Root cause: the 9B model's larger compute workload combined with a 4096-size
draft sliding window produces GPU command buffers that exceed the watchdog
timeout. The 4B model does not exhibit this problem.

Mitigation: lower the default draft sliding window for the 9B pair from 4096
to 2048. This avoids the timeout while still providing meaningful speedup.

Changes:
- Add benchmarks/dflash_apple_silicon.py (DFlash benchmark planner)
  - 9B pair now uses draft_sliding_window_size=2048
  - 4B pair retains draft_sliding_window_size=4096
- Add tests/test_dflash_apple_silicon.py with #154-specific test
- Add docs/DFLASH_APPLE_SILICON.md documenting the mitigation
- Add benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md recording failure
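
The per-pair defaults listed above could be expressed roughly as follows. This is a minimal sketch, not the contents of benchmarks/dflash_apple_silicon.py: the names `WINDOW_BY_PAIR`, `BenchmarkPlan`, and `plan_for`, and the pair key "qwen35-4b", are hypothetical. Only the window values (2048 for 9B, 4096 for 4B) and the `--draft-sliding-window-size` flag come from the commit message.

```python
from dataclasses import dataclass

# Hypothetical per-pair table; the key "qwen35-4b" is an assumed name.
# The values are the ones stated in the commit message.
WINDOW_BY_PAIR = {
    "qwen35-4b": 4096,  # 4B pair retains the original window
    "qwen35-9b": 2048,  # 9B pair lowered to avoid the Metal GPU timeout (#154)
}


@dataclass(frozen=True)
class BenchmarkPlan:
    """Hypothetical container for one planned benchmark run."""

    pair: str
    draft_sliding_window_size: int

    def cli_args(self) -> list[str]:
        # Render only the flag named in the commit message.
        return ["--draft-sliding-window-size", str(self.draft_sliding_window_size)]


def plan_for(pair: str) -> BenchmarkPlan:
    """Look up the window for a model pair; unknown pairs raise KeyError."""
    return BenchmarkPlan(pair, WINDOW_BY_PAIR[pair])


if __name__ == "__main__":
    print(plan_for("qwen35-9b").cli_args())
```

Keeping the window in a lookup table rather than a single global default means a future pair can be tuned without touching the 4B or 9B settings.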

Verification: pytest -q tests/test_dflash_apple_silicon.py
Test explicitly asserts 9B uses window=2048 to prevent timeout regression.
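
A regression test of this kind might look like the sketch below. It is illustrative only and assumes a hypothetical `draft_window_for` helper and pair keys. The actual test in tests/test_dflash_apple_silicon.py is not shown in this commit message and may be structured differently.

```python
# Hypothetical stand-in for the planner's per-pair defaults; in the real
# repository the values would be imported from the benchmark planner module.
DRAFT_WINDOW = {"qwen35-4b": 4096, "qwen35-9b": 2048}


def draft_window_for(pair: str) -> int:
    """Return the draft sliding window size configured for a model pair."""
    return DRAFT_WINDOW[pair]


def test_9b_uses_reduced_window_issue_154():
    # Guards against reverting the 9B pair to 4096, which triggered the
    # Metal GPU timeout described in issue #154.
    assert draft_window_for("qwen35-9b") == 2048


def test_4b_retains_full_window():
    # The 4B pair was not affected, so its window stays at 4096.
    assert draft_window_for("qwen35-4b") == 4096
```

pytest collects both functions by their `test_` prefix; pinning the exact value makes an accidental revert to 4096 fail immediately.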

Closes #154

2026-04-25 20:04:55 -04:00

1 Commit