apple-silicon

9 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/11/2026

DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

A native DFlash speculative decoding implementation on MLX for Apple Silicon accelerates token generation in Qwen models, reaching 85 tok/s on Qwen3.5-9B on an M5 Max. The technique achieves speedups of up to 3.3x while maintaining identical output quality.

apple-silicon MLX Qwen LLM performance
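
For readers new to the technique named in the item above, here is a minimal sketch of greedy speculative decoding. It is not the DFlash or MLX implementation: `target_model` and `draft_model` are hypothetical toy functions over integer tokens, invented purely for illustration. A cheap draft model proposes `k` tokens and the expensive target model verifies them. Because only tokens the target agrees with are kept, the output matches plain greedy decoding with the target alone.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the next token


def target_model(prefix: List[int]) -> int:
    # Hypothetical "large" model: a deterministic toy function.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except
    # when the prefix length is a multiple of 4.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position. A real system
        #    scores all k positions in one batched forward pass; this
        #    toy version calls the target once per position instead.
        verify_rounds += 1
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] != expected:
                # Keep the agreed prefix, then the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposal)  # every proposed token was accepted
    return tokens[:goal], verify_rounds


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
fast, rounds = speculative_decode(target_model, draft_model, prompt, 20)
print(fast == baseline, rounds)
```

In this sketch the speedup comes from needing fewer verification rounds than generated tokens. How large the gain is in practice depends on how often the draft model agrees with the target.
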

NEWS · ↑ trending · Reddit r/MachineLearning · 5/1/2026

Phosphene local video and audio generation for Apple Silicon open source (LTX 2.3) [P]

Phosphene is a free, open-source desktop panel for Apple Silicon Macs that generates video with synchronized audio using Lightricks' LTX 2.3 model. Its key differentiator is generating video and audio in a single diffusion pass, which keeps the visual and audio tracks in sync.

Open Source AI models apple-silicon video generation

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

Gemma 4 - MLX doesn't seem better than GGUF

A user compares the performance of the Gemma 4-26b-a4b model in MLX and GGUF versions on an M1 Max with 32GB RAM. Tests with a 3k token prompt indicate that GGUF is slightly faster in both prompt processing and tokens per second.

model performance apple-silicon Gemma MLX

RESEARCH · arXiv CS.LG · 21d ago

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra

This research systematically optimizes real-time diffusion model inference on Apple M3 Ultra, exploring techniques such as CoreML conversion, quantization, and model distillation. The study achieved 22.7 FPS for 512x512 img2img transformation by combining CoreML conversion of SDXS-512 with a 3-thread camera pipeline.

Diffusion Models Optimization apple-silicon image generation

ARTICLE · DEV.to AI · 4/13/2026

I Built a Free Local AI Art Pipeline on My Mac — Here's What Broke

The author builds a free, complete AI art generation and evaluation pipeline that runs entirely locally on an Apple Silicon MacBook. It uses tools such as ComfyUI/SDXL and Vulca, eliminating the need for cloud APIs or GPU servers.

apple-silicon AI art Local AI SDXL

ARTICLE · DEV.to AI · 9d ago

Best Local AI Models for Apple Silicon in 2026

The article discusses how running local AI models, a task that previously required dedicated NVIDIA GPUs, has become practical on Apple Silicon Macs. The shift is driven by Apple Silicon's unified memory architecture, in which the CPU and GPU share a single pool of RAM.

mac apple-silicon Local AI hardware

RESEARCH · arXiv CS.CL · 4/21/2026

Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM

This research evaluates cross-family speculative decoding for Polish LLMs on Apple Silicon, extending the MLX-LM framework with Universal Assisted Generation (UAG) for cross-tokenizer compatibility. Experiments show that context-aware token translation significantly improves acceptance rates for Bielik 11B on Polish language datasets.

apple-silicon Natural Language Processing Inference Optimization Speculative Decoding

ARTICLE · DEV.to AI · 4/20/2026

What 19 GB of Memory Compression Taught Me About MLX on M1 Max

The author describes encountering 19 GB of memory compression while running a large LLM with MLX on an M1 Max, initially mistaking it for a leak. The fix involved a single MLX API call to properly manage macOS unified memory for large models idling between inferences.

LLMs apple-silicon memory management Performance optimization

NEWS · DEV.to AI · 4/26/2026

DeepSeek-V4 Ported to MLX for Apple Silicon Inference

DeepSeek-V4 has been ported to Apple's MLX framework, enabling the large language model to run on Apple Silicon Macs. The functional port, a community effort by @Prince_Canuma, still requires optimization for improved performance.

apple-silicon local inference MLX large language models