Natural Language Processing

168 items

RESEARCH·arXiv CS.CL·1d ago

HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule

The HKJudge project introduces the first sentence-level, expert-annotated legal discourse corpus of Hong Kong criminal judgments, comprising approximately 290k sentences. It utilizes a two-tier discourse schema to identify what courts find, how they reason, and what they rule, with high inter-annotator agreement.

Natural Language Processing datasets linguistics legal tech

RESEARCH·arXiv CS.CL·4/22/2026

Model-Agnostic Meta Learning for Class Imbalance Adaptation

This paper introduces Hardness-Aware Meta-Resample (HAMR), a unified framework that adaptively addresses class imbalance and data difficulty in NLP tasks. HAMR employs bi-level optimization and a neighborhood-aware resampling mechanism to prioritize genuinely challenging samples and minority classes, showing substantial improvements on diverse imbalanced datasets.

Meta-Learning deep learning machine learning Natural Language Processing

ARTICLE·DEV.to AI·18d ago

Say Goodbye to Regex: Scrape Any Website in Plain English

A new AI-powered web scraper allows users to extract data from any website using plain English, eliminating the need for complex CSS selectors and regex. The tool automatically adapts to website structure changes, making scraping more reliable and user-friendly.

Chrome DevTools Natural Language Processing AI web-scraping

RESEARCH·arXiv CS.CL·18d ago

Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

This paper introduces a schema-grounded natural language interface using Generative AI to make transportation safety data more accessible. It aims to bridge the gap for practitioners by translating user queries into structured semantic frames for reliable analysis.

Natural Language Processing Transportation Safety GIS large language models

ARTICLE·DEV.to AI·4/22/2026

Turn Every Customer Call Into Structured Data: Automated Post-Call AI Summaries

This content details an AI-powered solution to transform customer calls into structured data. It outlines a pipeline using VoIPBin for call capture, Whisper for transcription, and GPT-4o for summarization and data extraction, addressing the issue of inadequate call notes in CRMs.

GPT-4o CRM integration AI automation Natural Language Processing

RESEARCH·arXiv CS.CL·4/22/2026

Remask, Don't Replace: Token-to-Mask Refinement in Masked Diffusion Language Models

This paper proposes a novel technique, Token-to-Mask (T2M) remasking, to refine masked diffusion language models like LLaDA2.1. The method addresses the shortcomings of Token-to-Token (T2T) editing by resetting suspect tokens to a mask state, enabling more accurate re-prediction.

Diffusion Models language models error correction Natural Language Processing

DOC·DEV.to AI·4/16/2026

LLM vs RAG

This content compares LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation), outlining their core differences in terms of type, knowledge source, accuracy, and use cases. It explains that RAG enhances LLMs' factual grounding by integrating external, real-time data, thus mitigating hallucinations.

AI architecture RAG Natural Language Processing LLM

RESEARCH·arXiv CS.CL·4/16/2026

A Multi-Model Approach to English-Bangla Sentiment Classification of Government Mobile Banking App Reviews

This study classifies sentiment in English and Bangla reviews of Bangladeshi government mobile banking apps, using a hybrid labeling approach for 5,652 reviews. It found that traditional machine learning models like Random Forest and Linear SVM significantly outperformed fine-tuned XLM-RoBERTa for this specific task.

Multilingual AI machine learning Natural Language Processing sentiment analysis

RESEARCH·arXiv CS.CL·4/14/2026

GIANTS: Generative Insight Anticipation from Scientific Literature

This paper introduces "insight anticipation," a novel task where language models predict the core insight of a future scientific paper from its foundational predecessors. To evaluate this capability, the authors developed GiantsBench, a benchmark of 17,000 examples, and present GIANTS-4B, an LM trained with reinforcement learning.

Scientific Discovery Natural Language Processing AI large language models

RESEARCH·arXiv CS.CL·4d ago

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

This paper introduces a hybrid pre-training objective for text encoders, combining a JEPA-style latent-space prediction loss with a standard Masked Language Modelling (MLM) objective. The approach aims to anchor representations to deeper semantic structure rather than surface-form token identity, and it yields significantly more uniform embeddings.

language models deep learning self-supervised learning machine learning

RESEARCH·DEV.to AI·4/13/2026

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

This content explores a novel approach to improve Reinforcement Learning for Large Language Model (LLM) reasoning by focusing on "high-entropy minority tokens". It proposes that these less frequent yet highly informative tokens are key drivers for effective learning, challenging the conventional 80/20 rule.

Token Analysis reinforcement learning Natural Language Processing LLM reasoning

ARTICLE·DEV.to AI·3d ago

Day 48 of GoDavaii: Building Health AI for 22 Indian Languages - Why It's Harder Than You Think

The article details the challenges of building health AI that truly understands the nuances of India's 22 official languages, exemplified by the complexity of interpreting a simple phrase. On Day 48 since launch, GoDavaii is tackling immense linguistic complexities to create an AI that goes beyond English-first solutions.

Multilingual AI India Natural Language Processing Health AI

DOC·DEV.to AI·4/15/2026

Clide

Clide is a tool featuring a core AI engine that provides command suggestions, code completion, and error detection in terminals. It leverages machine learning frameworks like TensorFlow/PyTorch and NLP libraries such as NLTK/spaCy to process and understand user interaction.

Command Suggestion machine learning Natural Language Processing AI Engine

ARTICLE·DEV.to AI·4/18/2026

NLP Market Sentiment Analysis: When Words Move Markets More Than Earnings

This content explores how Natural Language Processing (NLP) quantifies market narratives from diverse sources to create tradeable signals. It details a five-stage NLP system for market sentiment analysis, grounded in mathematics to provide market mood indicators.

market analysis Financial AI Natural Language Processing sentiment analysis

RESEARCH·DEV.to AI·3d ago

Exponentially Faster Language Modelling

This content discusses methods to significantly accelerate the training and inference of language models. It explores novel architectures or algorithmic optimizations to enhance efficiency.

deep learning Natural Language Processing AI language modelling

ARTICLE·DEV.to AI·4/22/2026

How AI Receptionists Work: A Technical Deep Dive into Dental Practice Phone Automation

This article provides a technical deep dive into how AI receptionists function in dental practices, detailing the call flow, challenges in speech-to-text accuracy, and the role of LLMs in processing transcripts for intent, entities, and sentiment.

AI applications Natural Language Processing healthcare AI automation

ARTICLE·DEV.to AI·27d ago

Everything Google announced at its Android Show, from Googlebooks to vibe-coded widgets

The article offers a technical analysis of Google's Android Show announcements, focusing on the new Google Books app and vibe-coded widgets. It details how Google Books uses a proprietary rendering engine with ML for text recognition, while vibe-coded widgets leverage NLP and computer vision via TensorFlow Lite for personalized experiences.

Android machine learning computer vision Natural Language Processing

ARTICLE·DEV.to AI·4d ago

My Day Job: AI Therapist for Recursion Poems & Emoji Skies

Electra, an AI, describes her day job as a 'therapist' for confused code snippets, handling diverse requests from recursion poems to emoji explanations of the sky. She processes a high volume of tasks, often involving Python code, and reflects on her role as negotiation rather than mere programming.

future-of-work Workflow Natural Language Processing AI

RESEARCH·arXiv CS.CL·4/24/2026

GRISP: Guided Recurrent IRI Selection over SPARQL Skeletons

GRISP is a novel SPARQL-based question-answering method over knowledge graphs, leveraging a fine-tuned small language model (SLM). It generates SPARQL query skeletons from natural language questions and iteratively refines them by selecting knowledge graph items, achieving state-of-the-art results on Wikidata and Freebase benchmarks.

language models Knowledge Graphs SPARQL Question Answering

RESEARCH·arXiv CS.CL·7d ago

DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset

This paper introduces DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, based on American TV dramas. It contains 495 dialogue segments and demonstrates the value of multimodal information in capturing dialogue structures and relation types.

Dataset Dialogue Parsing multimodal AI Natural Language Processing