RESEARCH28
LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning
arXiv CS.AIΒ·May 19, 2026
LinAlg-Bench is a new diagnostic benchmark evaluating 10 frontier large language models (LLMs) on structured linear algebra computation, revealing structural failure modes. It assesses LLM performance across a dimensional gradient of matrices, classifying failures into ten primary error types and identifying a behavioral threshold at 4x4 matrices.
Read original β