docs: record Qwen3.5-9B DFlash Metal timeout (refs #152, #154)
All checks were successful
Smoke Test / smoke (pull_request) Successful in 19s

This commit is contained in:
Alexander Whitestone
2026-04-21 22:25:25 -04:00
parent 69cef8a90f
commit dabb96d315
2 changed files with 82 additions and 0 deletions

View File

@@ -77,6 +77,29 @@ Pilot outcome on this Mac:
Treat that as a **directional proof**, not a final decision benchmark. The next step is the fuller comparison slice against plain MLX or llama.cpp speculative decoding.
## Known 9B failure on this machine
A follow-up live run with:
- `Qwen/Qwen3.5-9B`
- `z-lab/Qwen3.5-9B-DFlash`
failed on this same M3 Max 36GB Mac with:
```text
[METAL] Command buffer execution failed:
Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)
```
That failure is recorded in:
- `benchmarks/reports/dflash_m3max_36gb_qwen35_9b_timeout.md`
So the current guidance is:
- treat `qwen35-9b` as **experimental** on this machine
- treat `qwen35-4b` as the current **known-working local proof path**
- keep the issue open until we either stabilize the 9B path or clearly rule it out for this hardware tier
## Upstream benchmark command
The harness uses the upstream MLX benchmark syntax from `z-lab/dflash`: