
LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

arXiv cs.AI · May 19, 2026

LinAlg-Bench is a new diagnostic benchmark that evaluates 10 frontier large language models (LLMs) on structured linear algebra computation. It measures performance across a dimensional gradient of matrix sizes, classifies failures into ten primary error types, and identifies a behavioral threshold at 4×4 matrices.
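To make the idea of a dimensional gradient concrete, here is a minimal sketch of how test items with exact reference answers could be generated at increasing matrix sizes. It is an illustration only, not the benchmark's actual pipeline: the function names `exact_det` and `make_items`, the choice of determinants as the task, the size range, and the entry range of -9 to 9 are all assumptions.

```python
from fractions import Fraction
import random


def exact_det(matrix):
    """Exact determinant via Gaussian elimination over rationals (no float error)."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def make_items(sizes=(2, 3, 4, 5), per_size=3, seed=0):
    """Hypothetical item generator: random integer matrices per size with reference answers."""
    rng = random.Random(seed)
    items = []
    for n in sizes:
        for _ in range(per_size):
            m = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
            # Integer matrices have integer determinants, so int() is lossless here.
            items.append({"n": n, "matrix": m, "det": int(exact_det(m))})
    return items
```

Grouping model accuracy by the `n` field of each item is one way a size-dependent threshold, such as the one reported at 4×4, could show up in results.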

mathematical reasoning, benchmarking, linear algebra, AI evaluation, LLM
