Portuguese — AI articles, news & research

RESEARCH · arXiv CS.CL · 5/4/2026

NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

NorBERTo is a new ModernBERT model trained on Aurora-PT, a 331-billion-token Brazilian Portuguese corpus, and designed to support long contexts and use efficient attention mechanisms. Among the encoder models evaluated, it achieves state-of-the-art results on semantic similarity, textual entailment, and classification tasks, measured on datasets such as ASSIN 2 and PLUE.

AI models BERT Portuguese NLP