[SOTA] Deploy MLX on Mac M3 Max for faster local inference #416

Closed
opened 2026-04-08 11:20:55 +00:00 by Timmy · 0 comments
Owner

From SOTA research Q2 2026.

MLX (25K★) — Apple's native ML framework for M-series chips. Direct Metal acceleration. Can be faster than llama.cpp for some models.

Our Mac M3 Max (36GB) is the most powerful machine in the fleet. MLX could unlock faster inference than llama-server for Timmy's own sessions.

Acceptance Criteria

  - [ ] Install mlx and mlx-lm
  - [ ] Benchmark: MLX vs llama.cpp on hermes3:8b (tok/s comparison)
  - [ ] If faster: set up MLX serving on a port alongside llama-server
  - [ ] NOT a replacement for llama.cpp on VPSes (MLX is Apple-only)
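The criteria above could be worked through roughly as follows. This is a sketch, not a tested procedure: the MLX checkpoint name and the port number are assumptions (hermes3:8b is an Ollama tag; the MLX side needs a converted Hugging Face checkpoint, so verify the repo name before running), and everything here requires Apple Silicon.

```shell
# Sketch only. Assumed names: mlx-community/Hermes-3-Llama-3.1-8B-4bit
# (verify on Hugging Face) and port 8081 for the MLX server.

# 1. Install MLX and the LM tooling (Apple Silicon only).
pip install mlx mlx-lm

# 2. MLX side of the benchmark: mlx_lm.generate reports generation
#    speed in tokens-per-second after it finishes.
mlx_lm.generate \
  --model mlx-community/Hermes-3-Llama-3.1-8B-4bit \
  --prompt "Explain the Metal API in one paragraph." \
  --max-tokens 256

# 3. llama.cpp side: llama-bench reports prompt and generation tok/s
#    for the same token budget (path to the GGUF is a placeholder).
llama-bench -m /path/to/hermes3-8b.gguf -n 256

# 4. If MLX wins: serve it on its own port alongside llama-server.
#    mlx_lm.server exposes an OpenAI-compatible HTTP API.
mlx_lm.server \
  --model mlx-community/Hermes-3-Llama-3.1-8B-4bit \
  --port 8081
```

For a fair comparison, both runs should use the same quantization level and the same generation length; tok/s differs substantially between 4-bit and 8-bit variants of the same model.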
Timmy self-assigned this 2026-04-08 11:20:56 +00:00

Reference: Timmy_Foundation/timmy-config#416