MLX

4 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/11/2026

DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

A native DFlash implementation on MLX for Apple Silicon significantly accelerates token generation in Qwen models. The speculative decoding technique reaches 85 tok/s on Qwen3.5-9B on an M5 Max, a speedup of up to 3.3x, while maintaining identical output quality.

apple-silicon MLX Qwen LLM performance

NEWS · ↑ trending · Reddit r/LocalLLaMA · 18d ago

New Release of ROCm based MLX LLM Engine - lemon-mlx-engine

The lemon-mlx-engine now integrates TheRock / ROCm 7.13, enabling users to try the latest ROCm with the MLX engine on their local hardware. This update also includes various bug and kernel fixes for Qwen3, 3.5, and 3.6 MoE and dense models.

ROCm Software release MLX AI development

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

Gemma 4 - MLX doesn't seem better than GGUF

A user compares the Gemma 4-26b-a4b model in its MLX and GGUF versions on an M1 Max with 32GB RAM. In tests with a 3k-token prompt, GGUF is slightly faster in both prompt processing and token generation.

model performance apple-silicon Gemma MLX

NEWS · DEV.to AI · 4/26/2026

DeepSeek-V4 Ported to MLX for Apple Silicon Inference

DeepSeek-V4 has been ported to Apple's MLX framework, enabling the large language model to run on Apple Silicon Macs. The functional port, a community effort by @Prince_Canuma, still requires optimization for improved performance.

apple-silicon local inference MLX large language models