# DFlash Apple Silicon Benchmark Report

## Machine

- Label: M3 Max 36GB
- Selected pair: qwen35-9b
- Base model: Qwen/Qwen3.5-9B
- Draft model: z-lab/Qwen3.5-9B-DFlash
- Estimated total weight footprint: 19.93 GB

## Setup

```bash
# Isolated environment for the benchmark
python3 -m venv .venv-dflash
source .venv-dflash/bin/activate

# Install DFlash with the MLX extra (quoted so zsh does not glob the brackets)
git clone https://github.com/z-lab/dflash.git
cd dflash
pip install -e ".[mlx]"

# GSM8K benchmark run: base model verifies tokens drafted by the DFlash model
python -m dflash.benchmark --backend mlx \
  --model Qwen/Qwen3.5-9B \
  --draft-model z-lab/Qwen3.5-9B-DFlash \
  --dataset gsm8k \
  --max-samples 128 \
  --enable-thinking \
  --draft-sliding-window-size 4096
```

## Baseline comparison

Compare against two baselines on the same prompt set and with the same sampling settings: **plain MLX decoding (no draft model)** and **llama.cpp speculative decoding**. A command sketch is in Appendix A.

## Results

- Throughput (tok/s):
- Peak memory (GB):
- Notes on draft acceptance rate / behavior:

(Appendix B sketches one way to measure peak memory on macOS.)

## Verdict

Worth operationalizing locally?

- [ ] Yes
- [ ] No
- [ ] Needs more data

## Recommendation

Explain whether this should become part of the local inference stack.
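
## Appendix A: baseline command sketch

A minimal sketch for the two baseline runs, not part of the DFlash tooling. It assumes `mlx-lm` is installed (`pip install mlx-lm`) for the plain-MLX baseline, and a local llama.cpp build plus GGUF conversions for the speculative baseline. The GGUF paths and the prompt placeholder are hypothetical, and the llama.cpp draft-token flag is `--draft` in older builds and `--draft-max` in newer ones.

```bash
# Baseline 1: plain MLX decoding, no draft model. mlx_lm.generate prints
# prompt and generation tokens-per-sec when the run finishes.
python -m mlx_lm.generate \
  --model Qwen/Qwen3.5-9B \
  --prompt "<same GSM8K prompt as the DFlash run>" \
  --max-tokens 512

# Baseline 2: llama.cpp speculative decoding. Both .gguf paths are
# placeholders; the draft is any small model with a compatible tokenizer.
./llama-speculative \
  -m ./models/qwen3.5-9b-q8_0.gguf \
  -md ./models/qwen-draft-q8_0.gguf \
  -p "<same GSM8K prompt as the DFlash run>" \
  -n 512 \
  --draft 16
```

Keep prompts, `--max-tokens`/`-n`, and sampling settings identical across all three runs so the tok/s rows are directly comparable.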
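
## Appendix B: peak memory sketch

One way to fill the "Peak memory (GB)" row, assuming only the stock macOS BSD `time` utility: `/usr/bin/time -l` reports the maximum resident set size of the process in bytes. This captures process RSS rather than Metal/unified-memory pressure, so treat the number as an approximation.

```bash
# time(1) statistics (plus anything the benchmark writes to stderr) land in
# peak-mem.log; benchmark stdout still reaches the terminal.
/usr/bin/time -l python -m dflash.benchmark --backend mlx \
  --model Qwen/Qwen3.5-9B \
  --draft-model z-lab/Qwen3.5-9B-DFlash \
  --dataset gsm8k \
  --max-samples 128 \
  --enable-thinking \
  --draft-sliding-window-size 4096 \
  2> peak-mem.log

# "maximum resident set size" is reported in bytes on macOS; convert to GB.
grep "maximum resident set size" peak-mem.log \
  | awk '{printf "%.2f GB\n", $1 / (1024 ^ 3)}'
```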