
Soro: A Lightweight Foundation Model and Chatbot for Tajik

arXiv CS.AI · May 28, 2026

Soro is a family of Tajik-specialized conversational large language models (LLMs) designed for deployment in Tajikistan under tight compute constraints. Developed from open-weight Gemma 3 checkpoints and continually pretrained on a 1.9-billion-token Tajik corpus, it substantially outperforms baselines on new Tajik benchmarks.
