Gemma 4 31B — 4bit is all you need
This comparison of Gemma 4 31B's 4-bit and 8-bit quantized versions on an M5 Max MacBook Pro finds, surprisingly, that the 4-bit version scored higher (91.3% vs 88.4%, a gap of 2.9 percentage points). It also notes an issue with Gemma 4 26B-A4B: the model entered a regression loop, so its responses ran to the max token limit of 16,384 and were truncated.
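As a rough illustration of the two checks described above, here is a minimal Python sketch. It assumes each benchmark response is recorded with its completion token count. The `RunResult` record and the helper names `hit_token_limit` and `score_gap` are hypothetical and not part of the original test setup. Only the 16,384 limit and the two scores come from the text.

```python
from dataclasses import dataclass

MAX_TOKENS = 16_384  # max token limit quoted in the text


@dataclass
class RunResult:
    """Hypothetical record of one benchmark response (not from the source)."""
    model: str
    completion_tokens: int


def hit_token_limit(result: RunResult, max_tokens: int = MAX_TOKENS) -> bool:
    """Return True when a response used the whole token budget.

    A response that reaches the limit was cut off rather than finished,
    which is the symptom a looping model would produce.
    """
    return result.completion_tokens >= max_tokens


def score_gap(score_a: float, score_b: float) -> float:
    """Difference between two scores in percentage points, to one decimal."""
    return round(score_a - score_b, 1)


if __name__ == "__main__":
    # Scores reported in the text: 4-bit 91.3%, 8-bit 88.4%.
    print(score_gap(91.3, 88.4))

    # Token counts below are made-up placeholders for illustration only.
    runs = [
        RunResult("gemma-4-26b-a4b", 16_384),
        RunResult("gemma-4-31b-4bit", 2_100),
    ]
    for run in runs:
        status = "truncated" if hit_token_limit(run) else "complete"
        print(f"{run.model}: {status}")
```

Flagging truncated responses separately from wrong answers keeps a looping model's failures from being counted as ordinary scoring errors.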






