Hardware Acceleration

7 items

NEWS ↑ trending · Reddit r/MachineLearning · 4/22/2026

INT3 compression+fused metal kernels [R]

A solo founder developed INT3 model compression and a 2-bit KV cache, with custom fused Metal kernels for M-series Macs. Qwen 7B is available as a preview; further optimizations and GPU support are planned.

Hardware Acceleration · LLMs · quantization · model optimization
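
For readers new to INT3 compression, here is a minimal sketch of one generic approach: group-wise symmetric quantization to signed 3-bit codes, packed so that eight weights fit in three bytes. The function names, the group size of 4, and the scaling rule are assumptions for illustration only; the post does not describe its actual scheme, and its fused Metal kernels are not represented here.

```python
# Hypothetical sketch: group-wise symmetric INT3 quantization with bit packing.
# Not the post author's method; group size and rounding rule are assumptions.

def quantize_int3(weights, group_size=4):
    """Map floats to signed 3-bit codes in [-4, 3] with one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric scale: the largest magnitude in the group maps to code +/-3.
        scale = max_abs / 3 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-4, min(3, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int3(codes, scales, group_size=4):
    """Recover approximate floats by multiplying each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def pack_int3(codes):
    """Pack 3-bit two's-complement codes into bytes, least-significant bits first."""
    buf, nbits, out = 0, 0, bytearray()
    for c in codes:
        buf |= (c & 0b111) << nbits
        nbits += 3
        while nbits >= 8:
            out.append(buf & 0xFF)
            buf >>= 8
            nbits -= 8
    if nbits:
        out.append(buf & 0xFF)
    return bytes(out)


def unpack_int3(data, count):
    """Inverse of pack_int3: read `count` signed 3-bit codes back out."""
    buf, nbits, out, stream = 0, 0, [], iter(data)
    while len(out) < count:
        while nbits < 3:
            buf |= next(stream) << nbits
            nbits += 8
        code = buf & 0b111
        out.append(code - 8 if code >= 4 else code)  # sign-extend 3 bits
        buf >>= 3
        nbits -= 3
    return out


w = [0.9, -0.3, 0.1, -0.6, 2.0, 0.5, -1.0, 0.0]
codes, scales = quantize_int3(w)
packed = pack_int3(codes)
print(len(w) * 4, "bytes as float32 ->", len(packed), "bytes packed +", len(scales), "scales")
```

Here eight float32 weights (32 bytes) shrink to 3 packed bytes plus two per-group scales; each dequantized weight is within half a scale step of the original. Smaller groups cost more scale storage but track local weight ranges more closely.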

DOC ↑ trending · Reddit r/LocalLLaMA · 4/15/2026

Qwen3.5-35B running well on RTX4060 Ti 16GB at 60 tok/s

The author shares a working setup for running the Qwen3.5-35B-A3B-UD-Q4_K_L model on an RTX 4060 Ti 16GB with llama.cpp, reaching 40 to 60 tokens/s with a 64k context. The post includes the full `models.ini` configuration and the server start command needed to reproduce this performance.

Hardware Acceleration · AI Model Optimization · llama.cpp · local inference

ARTICLE ↑ trending · Reddit r/LocalLLaMA · 28d ago

I got a real transformer language model running locally on a stock Game Boy Color!

A transformer language model (TinyStories-260K) runs locally on a stock Game Boy Color, using INT8 weights and fixed-point math. The feat relies on a custom ROM and on-device tokenization, though inference is extremely slow and the output is gibberish.

Hardware Acceleration · Edge AI · quantization · AI inference
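
Because the Game Boy Color's CPU has no floating-point hardware, "INT8 weights and fixed-point math" means every multiply and rescale stays in integers. The sketch below shows the general pattern under stated assumptions: int8 inputs, a wide integer accumulator, and a Q8 fixed-point scale applied with a shift. `FRAC_BITS`, the function names, and the round-half-up rule are hypothetical choices, not code from the project.

```python
# Hypothetical sketch of integer-only inference arithmetic: int8 weights and
# activations, a wide integer accumulator, and a fixed-point (Q-format) rescale.
# Not code from the Game Boy project; FRAC_BITS and the rounding rule are assumptions.

FRAC_BITS = 8  # Q8 fixed point: real value = integer / 2**8


def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a real number as a fixed-point integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def int8_dot_requant(weights, acts, scale_fixed, frac_bits=FRAC_BITS):
    """Integer dot product of int8 vectors, rescaled to int8 with no floating point."""
    acc = 0  # on real hardware this would be a fixed-width (e.g. 32-bit) accumulator
    for w, a in zip(weights, acts):
        if not (-128 <= w <= 127 and -128 <= a <= 127):
            raise ValueError("inputs must be int8")
        acc += w * a
    # Multiply by the fixed-point scale, add half an LSB, then shift: round half up.
    rescaled = (acc * scale_fixed + (1 << (frac_bits - 1))) >> frac_bits
    return max(-128, min(127, rescaled))  # saturate back to int8


scale = to_fixed(0.05)  # 0.05 is approximated as 13 / 256 in Q8
print(int8_dot_requant([10, -20, 30], [5, 5, -2], scale))  # prints -6
```

The approximated scale (13/256, about 0.0508 rather than 0.05) shows the precision cost of fixed point: more fractional bits reduce that error but need a wider accumulator.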

NEWS ↑ trending · Reddit r/LocalLLaMA · 18d ago

OpenBMB presents BitCPM-CANN, a 1.58-bit model

OpenBMB presented BitCPM-CANN, a 1.58-bit model. New models are being tested on the Huawei Ascend 910B.

Hardware Acceleration · AI models · Huawei · AI development
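
"1.58-bit" conventionally refers to ternary weights: a weight restricted to {-1, 0, +1} carries log2(3), about 1.585, bits of information. The sketch below assumes absmean scaling in the style of BitNet b1.58; BitCPM-CANN's actual quantization scheme is not described in the post, and the function names are hypothetical.

```python
import math

# Hypothetical sketch of ternary ("1.58-bit") weights using absmean scaling in the
# style of BitNet b1.58. BitCPM-CANN's actual scheme is not described in the post.

BITS_PER_WEIGHT = math.log2(3)  # about 1.585 bits for three levels {-1, 0, +1}


def ternary_quantize(weights, eps=1e-8):
    """Scale by the mean absolute value, then round and clip to {-1, 0, +1}."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    codes = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return codes, gamma


def ternary_dot(codes, x, gamma):
    """With ternary weights, a dot product needs only additions and subtractions."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return acc * gamma  # one multiply for the shared scale


codes, gamma = ternary_quantize([0.4, -0.05, -0.9, 0.2])
print(codes, round(BITS_PER_WEIGHT, 3))  # prints [1, 0, -1, 1] 1.585
```

The appeal for accelerators is visible in `ternary_dot`: per-weight multiplies disappear, leaving additions, subtractions, and skips, with a single multiply for the shared scale.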

RESEARCH · Hugging Face Blog · 4/22/2026

Gemma 4 VLA Demo on Jetson Orin Nano Super

A demonstration of the Gemma 4 VLA model running on the Jetson Orin Nano Super device.

Hardware Acceleration · NVIDIA Jetson · Edge AI · vision-language model

ARTICLE · DEV.to AI · 4/11/2026

Deep Learning on FPGAs: Past, Present, and Future

This article reviews how deep learning implementations on FPGAs have evolved, covering their historical development, current state, and future directions. It also argues that hardware acceleration is critical to the advancement of artificial intelligence.

Hardware Acceleration · FPGAs · deep learning · machine learning

RESEARCH · DEV.to AI · 5/2/2026

Accelerating CNN inference on FPGAs: A Survey

This survey examines techniques for accelerating Convolutional Neural Network (CNN) inference on Field-Programmable Gate Arrays (FPGAs). It reviews existing research and architectural approaches for improving the performance and efficiency of CNN deployments on FPGA hardware.

Hardware Acceleration · deep learning · FPGA · computer vision