docs: record Qwen3.5-9B DFlash Metal timeout (refs #152, #154)
All checks were successful
Smoke Test / smoke (pull_request) Successful in 19s
New file: `benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md` (59 lines)

# DFlash on Apple Silicon Failure Report — Qwen3.5-9B on M3 Max 36GB

- Date: 2026-04-21
- Machine: Apple M3 Max, 36 GB unified memory
- Repo issue: #152

## Command

```bash
source /tmp/dflash-venv/bin/activate
cd /tmp/dflash-upstream
python -m dflash.benchmark --backend mlx \
  --model Qwen/Qwen3.5-9B \
  --draft-model z-lab/Qwen3.5-9B-DFlash \
  --dataset gsm8k \
  --max-samples 1 \
  --enable-thinking \
  --draft-sliding-window-size 4096
```
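
A full log and the exit status make a failed run easier to attach to the issue. The wrapper below saves combined stdout and stderr to a log file and keeps the exit status. It is a minimal sketch: the helper name `run_logged` and the log path are hypothetical. It adds no flags to `dflash.benchmark` and only wraps the command you pass to it.

```bash
# Hypothetical helper: run a command, tee its combined stdout/stderr into a
# log file, append the exit status, and return the command's own status.
run_logged() {
  local log_file="$1"
  shift
  "$@" 2>&1 | tee "$log_file"
  # PIPESTATUS[0] is the status of the wrapped command, not of tee.
  local status=${PIPESTATUS[0]}
  echo "exit status: ${status}" >> "$log_file"
  return "$status"
}
```

For example, call `run_logged /tmp/dflash-9b.log python -m dflash.benchmark --backend mlx` followed by the remaining flags from the command above. The log path is only an example. When an uncaught C++ exception aborts the process, as in this report, bash typically reports status 134 (128 + `SIGABRT`).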

## Outcome

The benchmark did **not** complete successfully on this machine.

### Failure signature

```text
libc++abi: terminating due to uncaught exception of type std::runtime_error:
[METAL] Command buffer execution failed:
Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)
```

Additional shutdown noise:

```text
bash: [11285: 1] tcsetattr: Inappropriate ioctl for device
resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
```
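
The real failure and the shutdown noise appear in the same output, so a harness that retries or compares runs has to tell them apart. The function below is a minimal triage sketch. It assumes the run's combined output was saved to a log file. The name `classify_dflash_log` and its three labels are hypothetical; the match strings are copied from the logs above.

```bash
# Hypothetical triage helper: classify a captured benchmark log.
# Prints one of: metal_timeout, crashed, no_known_failure.
classify_dflash_log() {
  local log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read log: $log_file" >&2
    return 2
  fi
  if grep -q 'kIOGPUCommandBufferCallbackErrorTimeout' "$log_file"; then
    # A Metal command buffer hit the GPU timeout (the failure in this report).
    echo "metal_timeout"
  elif grep -q 'terminating due to uncaught exception' "$log_file"; then
    # Some other uncaught C++ exception aborted the process.
    echo "crashed"
  else
    # Shutdown noise such as the tcsetattr or leaked-semaphore warnings
    # is deliberately not treated as a failure on its own.
    echo "no_known_failure"
  fi
}
```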

## Interpretation

This is strong evidence that the `Qwen/Qwen3.5-9B` + `z-lab/Qwen3.5-9B-DFlash` pair is **not currently stable** on an M3 Max 36GB Mac under the upstream MLX benchmark path. That holds at least for the settings in the command above, including `--enable-thinking` and `--draft-sliding-window-size 4096`.

It may still be salvageable with:

- a smaller block size or other benchmark settings
- a shorter generation target
- a different prompt sample
- upstream MLX / Metal fixes
- newer Apple Silicon hardware

As of this run, however, it should be treated as **experimental / failing** on this exact machine.

## Recommendation

For this Mac, the working local proof path is still:

- `Qwen/Qwen3.5-4B`
- `z-lab/Qwen3.5-4B-DFlash`

Use the 4B pair for reproducible local validation while the 9B Metal timeout is investigated separately.
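
The 4B and 9B runs differ only in the model pair, so a small wrapper can keep the two invocations otherwise identical. This is a sketch that assumes the 4B pair accepts the same flags as the 9B command above; this report does not state which flags the 4B validation used. The function name `dflash_benchmark` and the `DRY_RUN` switch are hypothetical. It expects the venv and working directory from the Command section to be active.

```bash
# Hypothetical wrapper: run (or, with DRY_RUN=1, only print) the benchmark for
# one Qwen3.5 size. Only the model pair changes; the other flags are copied
# from the 9B command in this report and are assumed to apply to 4B too.
dflash_benchmark() {
  local size="$1"
  case "$size" in
    4B|9B) ;;
    *) echo "unsupported size: $size (expected 4B or 9B)" >&2; return 1 ;;
  esac

  local -a cmd
  cmd=(python -m dflash.benchmark --backend mlx
       --model "Qwen/Qwen3.5-${size}"
       --draft-model "z-lab/Qwen3.5-${size}-DFlash"
       --dataset gsm8k
       --max-samples 1
       --enable-thinking
       --draft-sliding-window-size 4096)

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[@]}"   # print the command line instead of running it
  else
    "${cmd[@]}"        # needs the venv and working directory from "Command"
  fi
}
```

For example, `DRY_RUN=1 dflash_benchmark 9B` prints the command from this report, and `dflash_benchmark 4B` runs the same benchmark against the 4B pair.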

@@ -77,6 +77,29 @@ Pilot outcome on this Mac:

Treat that as a **directional proof**, not a final decision benchmark. The next step is the fuller comparison slice against plain MLX or llama.cpp speculative decoding.

## Known 9B failure on this machine

A follow-up live run with the `Qwen/Qwen3.5-9B` + `z-lab/Qwen3.5-9B-DFlash` pair failed on this same M3 Max 36GB Mac with:

```text
[METAL] Command buffer execution failed:
Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)
```

That failure is recorded in:

- `benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md`

So the current guidance is:

- treat `qwen35-9b` as **experimental** on this machine
- treat `qwen35-4b` as the current **known-working local proof path**
- keep the issue open until we either stabilize the 9B path or clearly rule it out for this hardware tier

## Upstream benchmark command

The harness uses the upstream MLX benchmark syntax from `z-lab/dflash`: