
CanLegalRAGBench: Evaluating Retrieval-Augmented Generation on Canadian Case Law

arXiv cs.CL · June 1, 2026

This paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark for evaluating Retrieval-Augmented Generation (RAG) systems using realistic queries and expert-annotated answers grounded in case law. It highlights the sensitivity of retrieval performance, the competitiveness of open-source embedding models, the limitations of automatic evaluation, and the presence of hallucinations in LLM-generated responses.

Retrieval Augmented Generation, LLMs evaluation, Legal AI, Benchmarks
