
NLP

124 items

RESEARCH · arXiv CS.CL · 29d ago

TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP

This research introduces TajPersLexon, a Tajik-Persian parallel lexical resource with 40,112 word pairs for cross-script NLP in low-resource environments. It evaluates hybrid, neural, and retrieval models, demonstrating high accuracy for neural and retrieval baselines (98-99%) and a favorable accuracy-efficiency trade-off for the hybrid model (96.4%) in OCR post-correction.

RESEARCH · arXiv CS.CL · 26d ago

Differences in Text Generated by Diffusion and Autoregressive Language Models

This research explores the intrinsic differences in text generated by Diffusion Language Models (DLMs) and Autoregressive Language Models (ARMs), finding that DLMs show lower n-gram entropy but higher semantic coherence and diversity. Controlled experiments reveal that DLM training objectives enhance coherence and diversity through bidirectional context, while decoding algorithms are responsible for entropy reduction.
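For readers unfamiliar with the metric, n-gram entropy can be sketched as the Shannon entropy of the empirical n-gram distribution of a token sequence. This is one common definition and an assumption here; the summary does not state how the paper computes it, and the function name `ngram_entropy` is illustrative.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n=2):
    """Shannon entropy (in bits) of the empirical n-gram distribution.

    Assumption: a generic definition, not necessarily the paper's metric.
    """
    # Slide a window of size n over the token sequence.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    # H = -sum(p * log2(p)) over the observed n-grams.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    repetitive = ["a"] * 5
    varied = "a b c d e".split()
    print(ngram_entropy(repetitive), ngram_entropy(varied))
```

Under this definition, repetitive text scores lower than varied text, which is the sense in which lower n-gram entropy indicates more predictable surface patterns.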

RESEARCH · arXiv CS.CL · 13d ago

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

This work introduces CroCo, a method for cross-lingual contrastive preference tuning on self-generated responses from LLMs, demonstrating effective transfer across 14 languages without language-specific preference annotations. An English-trained reward model yields useful rankings across most languages, improving existing models and preventing catastrophic forgetting, provided on-policy data is used.

DOC · DEV.to AI · 16d ago

Building RAG Systems in Practice (v18)

This document details the practical implementation of RAG (Retrieval-Augmented Generation) systems, explaining their core concepts and operational loop. It covers the retrieval, augmentation, and generation stages to enhance LLM responses, including semantic document chunking.
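The retrieve, augment, generate loop described above can be sketched roughly as follows. Everything here is an illustrative assumption rather than the linked document's implementation: the helper names (`chunk`, `embed`, `retrieve`, `build_prompt`, `generate`) are hypothetical, sentence-based splitting stands in for semantic chunking, a bag-of-words vector stands in for a real embedding model, and `generate` is a placeholder for an LLM call.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Group sentences into chunks of roughly max_words words.

    Assumption: a simple stand-in for semantic chunking.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate.split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context):
    """Augmentation: place the retrieved chunks in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def generate(prompt):
    """Generation: placeholder for an LLM call (hypothetical)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    docs = chunk("Paris is the capital of France. Bananas are yellow.", max_words=6)
    question = "What is the capital of France?"
    print(generate(build_prompt(question, retrieve(question, docs, k=1))))
```

Swapping `embed` for a learned embedding model and `generate` for a real model call would leave the overall loop unchanged.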

ARTICLE · DEV.to AI · 8d ago

AI debt sales reshape global corporate bond markets

The integration of AI in debt sales is poised to significantly alter global corporate bond markets, driven by AI systems' ability to analyze vast data and make accurate predictions. AI debt sales platforms leverage machine learning algorithms and natural language processing to assess creditworthiness and identify risks and opportunities.

ARTICLE · DEV.to AI · 4/27/2026

Epismo Agent Package

The Epismo Agent Package Technical Analysis details an innovative solution for creating AI-powered digital humans for customer service, entertainment, and education. Its microservices architecture integrates natural language processing, machine learning, and computer vision, managed by an Agent Core and a Knowledge Graph.

DOC · DEV.to AI · 20d ago

92. BERT: The Model That Reads in Both Directions

BERT differs from GPT in that it reads text bidirectionally: it is pre-trained to predict masked words from both left and right context rather than the next word in a left-to-right sequence. This contextual understanding made it dominant on NLP benchmarks and a cornerstone for language-understanding tasks. The article details BERT's pre-training mechanisms and fine-tuning techniques.

ARTICLE · DEV.to AI · 26d ago

NLP Video Editing Copilot

Cutting Room AI is a standalone Windows desktop app that enables DaVinci Resolve Studio users to control their timeline with plain English commands. It translates natural language instructions into scripting API calls, allowing users to modify clip properties, perform track operations, and manage markers without needing scripting knowledge.

DOC · DEV.to AI · 26d ago

Spellar 3.0

Spellar 3.0 is an AI-driven language learning platform that provides personalized instruction and feedback. Its technical architecture features a React frontend, a Node.js backend with PostgreSQL, and an NLP engine capable of analyzing multi-language user input.
