RESEARCH · arXiv CS.AI · 12d ago
Soro: A Lightweight Foundation Model and Chatbot for Tajik
Soro is a family of Tajik-specialized conversational large language models (LLMs) designed for deployment in Tajikistan under tight compute constraints. The models are built from open-weight Gemma 3 checkpoints and continually pretrained on a 1.9-billion-token Tajik corpus. On newly introduced Tajik benchmarks, they substantially outperform the baseline models.