diff --git a/benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md b/benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md new file mode 100644 index 0000000..0c5dc5e --- /dev/null +++ b/benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md @@ -0,0 +1,59 @@ +# DFlash on Apple Silicon Failure Report — Qwen3.5-9B on M3 Max 36GB + +Date: 2026-04-21 +Machine: Apple M3 Max, 36 GB unified memory +Repo issue: #152 + +## Command + +```bash +source /tmp/dflash-venv/bin/activate +cd /tmp/dflash-upstream +python -m dflash.benchmark --backend mlx \ + --model Qwen/Qwen3.5-9B \ + --draft-model z-lab/Qwen3.5-9B-DFlash \ + --dataset gsm8k \ + --max-samples 1 \ + --enable-thinking \ + --draft-sliding-window-size 4096 +``` + +## Outcome + +The benchmark did **not** complete successfully on this machine. + +### Failure signature + +```text +libc++abi: terminating due to uncaught exception of type std::runtime_error: +[METAL] Command buffer execution failed: +Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout) +``` + +Additional shutdown noise: + +```text +bash: [11285: 1] tcsetattr: Inappropriate ioctl for device +resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown +``` + +## Interpretation + +This is strong evidence that the `Qwen/Qwen3.5-9B + z-lab/Qwen3.5-9B-DFlash` pair is **not currently stable** on an M3 Max 36GB Mac under the upstream MLX benchmark path, at least with the default settings used here. + +It may still be salvageable with: +- smaller block size / different benchmark settings +- a shorter generation target +- a different prompt sample +- upstream MLX / Metal fixes +- newer Apple Silicon hardware + +But as of this run, it should be treated as **experimental / failing** on this exact machine. + +## Recommendation + +For this Mac, the working local proof path is still: +- `Qwen/Qwen3.5-4B` +- `z-lab/Qwen3.5-4B-DFlash` + +Use the 4B pair for reproducible local validation while the 9B Metal timeout is investigated separately. diff --git a/docs/DFLASH_APPLE_SILICON.md b/docs/DFLASH_APPLE_SILICON.md index d866cc4..2accad9 100644 --- a/docs/DFLASH_APPLE_SILICON.md +++ b/docs/DFLASH_APPLE_SILICON.md @@ -77,6 +77,29 @@ Pilot outcome on this Mac: Treat that as a **directional proof**, not a final decision benchmark. The next step is the fuller comparison slice against plain MLX or llama.cpp speculative decoding. +## Known 9B failure on this machine + +A follow-up live run with: + +- `Qwen/Qwen3.5-9B` +- `z-lab/Qwen3.5-9B-DFlash` + +failed on this same M3 Max 36GB Mac with: + +```text +[METAL] Command buffer execution failed: +Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout) +``` + +That failure is recorded in: + +- `benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md` + +So the current guidance is: +- treat `qwen35-9b` as **experimental** on this machine +- treat `qwen35-4b` as the current **known-working local proof path** +- keep the issue open until we either stabilize the 9B path or clearly rule it out for this hardware tier + ## Upstream benchmark command The harness uses the upstream MLX benchmark syntax from `z-lab/dflash`: