RESEARCH · arXiv CS.CL · 5/1/2026
CL-bench Life: Can Language Models Learn from Real-Life Context?
CL-bench Life is a new human-curated benchmark designed to assess whether frontier language models can effectively learn from complex, messy real-life contexts. It comprises 405 context-task pairs to test models' ability to reason over personal and social experiences.