TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP
arXiv cs.CL · May 11, 2026
This research introduces TajPersLexon, a Tajik-Persian parallel lexical resource of 40,112 word pairs for cross-script NLP in low-resource settings. It evaluates hybrid, neural, and retrieval models: the neural and retrieval baselines reach 98-99% accuracy, while the hybrid model (96.4%) offers a favorable accuracy-efficiency trade-off in OCR post-correction.