RESEARCH · arXiv CS.CL · 5/4/2026
NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus
NorBERTo is a new ModernBERT model trained on Aurora-PT, a 331-billion-token Brazilian Portuguese corpus, and designed for long-context support and efficient attention. It achieves state-of-the-art results among the evaluated encoder models on semantic similarity, textual entailment, and classification tasks, using datasets such as ASSIN 2 and PLUE.