DFlash Apple Silicon Benchmark Report
Machine
- Label: M3 Max 36GB
- Selected pair: qwen35-9b
- Base model: Qwen/Qwen3.5-9B
- Draft model: z-lab/Qwen3.5-9B-DFlash
- Estimated total weight footprint: 19.93 GB
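A rough sketch of how a weight footprint estimate can be sanity-checked against the machine's memory. It assumes weights are stored at a fixed width (2 bytes per parameter, i.e. 16-bit) and uses decimal gigabytes. The function names, the parameter count in the usage lines, and the 25% headroom are illustrative assumptions, not values taken from this report or from the dflash tooling.

```python
def weight_footprint_gb(param_counts, bytes_per_param=2.0):
    """Estimate weight memory in GB (10^9 bytes) for one or more models.

    param_counts: iterable of parameter counts (hypothetical inputs).
    bytes_per_param: storage width per parameter; 2.0 assumes 16-bit weights.
    """
    total_bytes = sum(n * bytes_per_param for n in param_counts)
    return total_bytes / 1e9


def fits_in_memory(footprint_gb, total_memory_gb, headroom_fraction=0.25):
    """Return True if the weights leave the requested share of memory free.

    headroom_fraction is an assumed safety margin for the KV cache,
    activations, and the OS; tune it for the actual workload.
    """
    return footprint_gb <= total_memory_gb * (1.0 - headroom_fraction)


if __name__ == "__main__":
    # Illustrative only: a 9e9-parameter model at 16-bit width.
    print(weight_footprint_gb([9e9]))
    # The report's stated estimate checked against a 36 GB machine.
    print(fits_in_memory(19.93, 36))
```

This only counts weights. KV cache and activation memory grow with context length and are not covered by the estimate.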
Setup
python3 -m venv .venv-dflash
source .venv-dflash/bin/activate
git clone https://github.com/z-lab/dflash.git
cd dflash
pip install -e ".[mlx]"
python -m dflash.benchmark --backend mlx \
--model Qwen/Qwen3.5-9B \
--draft-model z-lab/Qwen3.5-9B-DFlash \
--dataset gsm8k \
--max-samples 128 \
--enable-thinking \
--draft-sliding-window-size 4096
Baseline comparison
Compare against plain MLX generation (no draft model) or llama.cpp speculative decoding, using the same prompt set.
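One way to make the comparison concrete is to compute per-prompt speedup and refuse to compare runs that did not cover the same prompts. This is a minimal sketch: the function name and the dict-of-throughputs input shape are assumptions for illustration, not something the dflash benchmark is known to emit.

```python
from statistics import mean


def compare_runs(baseline, candidate):
    """Summarize speedup of candidate over baseline.

    baseline, candidate: dict mapping prompt id -> throughput in tok/s
    (hypothetical format; adapt to however the runs are logged).
    """
    if baseline.keys() != candidate.keys():
        raise ValueError("runs must cover the same prompt set")
    # Ratio > 1.0 means the candidate was faster on that prompt.
    ratios = [candidate[p] / baseline[p] for p in baseline]
    return {
        "mean_speedup": mean(ratios),
        "min_speedup": min(ratios),
        "max_speedup": max(ratios),
    }


if __name__ == "__main__":
    # Made-up numbers purely to show the call shape.
    base = {"p1": 10.0, "p2": 20.0}
    cand = {"p1": 20.0, "p2": 30.0}
    print(compare_runs(base, cand))
```

Reporting the minimum alongside the mean helps show whether any prompt regressed relative to the baseline.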
Results
- Throughput (tok/s):
- Peak memory (GB):
- Notes on acceptance / behavior:
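The throughput and acceptance fields above can be filled in from raw counts. The sketch below assumes the run logs the number of generated tokens, wall-clock time, and how many draft tokens were accepted at each verification step. These helper names and inputs are hypothetical and not part of the dflash CLI.

```python
def throughput_tok_s(generated_tokens, elapsed_seconds):
    """Generated tokens per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds


def mean_accepted_per_step(accepted_counts):
    """Average number of draft tokens accepted per verification step.

    accepted_counts: list with one entry per verification step
    (assumed log format). Higher values mean the draft model
    agrees with the base model more often.
    """
    if not accepted_counts:
        raise ValueError("accepted_counts must not be empty")
    return sum(accepted_counts) / len(accepted_counts)


if __name__ == "__main__":
    # Made-up values to show usage.
    print(throughput_tok_s(512, 16.0))
    print(mean_accepted_per_step([3, 1, 4, 0]))
```

Recording the acceptance figure next to throughput makes it easier to tell whether a slow run comes from poor draft agreement or from something else, such as memory pressure.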
Verdict
Worth operationalizing locally?
- Yes
- No
- Needs more data
Recommendation
Explain whether this should become part of the local inference stack.