Natural Language Processing

168 items

ARTICLE · DEV.to AI · 4/24/2026

GoDavaii's Day 5: When 22 Indian Languages Redefine 'Hard' in Health AI

GoDavaii is tackling the deep-tech challenge of health AI in India, focusing on understanding health descriptions in 22 local languages with their cultural nuances. The company emphasizes that interpreting culturally specific health expressions is far more complex than direct translation, a challenge often missed by global health AIs.

Multilingual AI India AI in healthcare Natural Language Processing

RESEARCH · DEV.to AI · 4/26/2026

Multi-Perspective Context Matching for Machine Comprehension

This work presents a multi-perspective context matching method for machine comprehension. The technique aims to improve how AI systems understand complex texts by analyzing information from multiple perspectives.

Context Matching Natural Language Processing Machine Comprehension

ARTICLE · DEV.to AI · 4/27/2026

Building Smart Student Engagement Detector: An AI-Powered Early Learning Issue Detection System using ML, NLP & Multimodal Analytics

This project describes an AI-powered student engagement detection system that uses ML, NLP, and multimodal analytics to identify early signs of learning difficulties. The goal is to intervene before academic, attendance, or behavioral issues escalate and are reflected in grades.

Multimodal Analytics education machine learning Natural Language Processing

RESEARCH · arXiv CS.AI · 21d ago

From Prompts to Protocols: An AI Agent for Laboratory Automation

This paper introduces an AI agent architecture that integrates large language models with laboratory orchestration. It enables scientists to interactively create and monitor automated lab protocols using natural language.

Experiment Orchestration AI agent Natural Language Processing large language models

RESEARCH · arXiv CS.AI · 4/7/2026

Towards the AI Historian: Agentic Information Extraction from Primary Sources

This technical report presents the first module of Chronos, an AI Historian under development. It lets historians convert scanned images of primary sources into data through natural-language interactions, adapting and refining workflows.

Open Source Information Extraction Natural Language Processing AI

RESEARCH · arXiv CS.CL · 4/7/2026

Text Summarization With Graph Attention Networks

This study explored the use of graph information (RST and coreference) for text summarization, finding that Graph Attention Networks did not improve performance, while a Multi-Layer Perceptron did. In addition, a new benchmark for graph-based summarization was created by annotating the XSum dataset with RST information.

Graph Attention Networks Rhetorical Structure Theory machine learning Natural Language Processing

RESEARCH · arXiv CS.LG · 4/6/2026

SIEVE: Sample-Efficient Parametric Learning from Natural Language

SIEVE proposes a method for sample-efficient parametric learning from natural language context, requiring only three query examples. It employs a synthetic data generation pipeline, SIEVE-GEN, which decomposes the context to generate higher-quality results and distill the context into the model.

language models Sample Efficiency contextual learning machine learning

RESEARCH · arXiv CS.CL · 4/6/2026

Speaking of Language: Reflections on Metalanguage Research in NLP

This work defines metalanguage and explores its connection to NLP and LLMs, discussing research efforts and the dimensions of metalinguistic tasks. It also proposes a list of understudied directions for future research.

LLMs research Metalanguage NLP

RESEARCH · arXiv CS.AI · 4/23/2026

Algorithm Selection with Zero Domain Knowledge via Text Embeddings

The paper proposes ZeroFolio, a feature-free algorithm selection method that uses pretrained text embeddings of raw instance files. This approach, requiring zero domain knowledge, outperforms traditional methods with hand-crafted features in most evaluated scenarios across diverse problem domains.

machine learning Natural Language Processing algorithm selection zero-shot learning

RESEARCH · arXiv CS.AI · 4/23/2026

Automated Detection of Dosing Errors in Clinical Trial Narratives: A Multi-Modal Feature Engineering Approach with LightGBM

This research presents an automated system for detecting dosing errors in clinical trial narratives, leveraging LightGBM with comprehensive multi-modal feature engineering. It combines traditional NLP, semantic embeddings, medical patterns, and transformer scores to achieve high ROC-AUC on an imbalanced dataset.

machine learning Natural Language Processing healthcare AI

RESEARCH · arXiv CS.AI · 4/23/2026

Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom

This study explores data augmentation strategies to enhance transformer-based models for automated scoring of student scientific explanations, specifically addressing class imbalance. It evaluates methods like GPT-4 generated responses, EASE, and ALP against a SciBERT baseline, using a dataset of 1,466 high school responses.

machine learning Natural Language Processing education technology Data Augmentation

RESEARCH · arXiv CS.CL · 5/6/2026

MedStruct-S: A Benchmark for Key Discovery, Key-Conditioned QA and Semi-Structured Extraction from OCR Clinical Reports

MedStruct-S is a new benchmark for semi-structured information extraction from OCR-derived clinical reports, addressing challenges like heterogeneous key representations and OCR noise. It aims to evaluate model robustness in real-world settings for key discovery, key-conditioned QA, and key-value pair extraction.

Information Extraction clinical reports Benchmarking Natural Language Processing

RESEARCH · arXiv CS.CL · 5/6/2026

Geometric Deviation as an Unsupervised Pre-Generation Reliability Signal: Probing LLM Representations for Answerability

This research explores using geometric deviation of LLM hidden states as a pre-generation signal to determine if a query is outside the model's knowledge, requiring no labeled failure data. Across various models and prompt forms, it finds that this signal effectively predicts unanswerable math prompts but not factual ones.

LLMs research Model Evaluation Reliability

RESEARCH · arXiv CS.CL · 5/6/2026

How Language Models Process Negation

This study investigates how Large Language Models (LLMs) mechanistically process negation, revealing that even open-weight models possess internal components for correct negation processing despite often providing wrong answers. Their poor accuracy is attributed to late-layer attention promoting simple shortcuts. The models implement two mechanisms: attending to negated phrases and directly constructing negative phrase representations.

LLMs Mechanistic Interpretability attention mechanisms Natural Language Processing

RESEARCH · arXiv CS.CL · 5/6/2026

S^2tory: Story Spine Distillation for Movie Script Summarization

S^2tory is a narratology-grounded AI framework designed for movie script summarization, addressing the complexity of non-linear narratives by identifying "plot nuclei" through character development trajectories. It employs a Narrative Expert Agent to distill knowledge, which then conditions a model to identify essential plot points for summary generation.

machine learning narrative AI Natural Language Processing summarization

DOC · Andrej Karpathy (YouTube) · 2/20/2024

Let's build the GPT Tokenizer

A practical guide to building a GPT tokenizer, covering the fundamental steps and concepts involved. It explores how GPT models process text by first converting it into smaller units (tokens).

GPT learning Natural Language Processing tokenizer

DOC · fast.ai Blog · 1/20/2026

How To Use AI for the Ancient Art of Close Reading

This post explores how to use Large Language Models (LLMs) for the ancient practice of close reading, detailing experiments and approaches for applying AI to this traditional art form.

text analysis LLMs learning Natural Language Processing

RESEARCH · arXiv CS.AI · 4/9/2026

BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

BDI-Kit is a toolkit for data harmonization that addresses heterogeneity in schemas and values. It offers a Python API for programmatic pipelines and an AI-assisted chat interface for domain experts, letting users explore, validate, and refine data matches iteratively.

Data Harmonization Natural Language Processing AI

ARTICLE · DEV.to AI · 5/2/2026

The Aunty Test - what Bengali-speaking patients see when they ask Health AI in their own language

Existing English-first Health AI handles medical queries in languages like Bengali poorly, often suggesting that users rephrase in English. In contrast, GoDavaii reasons natively in 22 Indian languages, providing accurate and culturally relevant medical advice.

Multilingual AI Healthcare technology AI bias Natural Language Processing

ARTICLE · DEV.to AI · 5/2/2026

Advances in Multimodal AI: Researchers Develop New Framework for Fusion of Vision and Language

Multimodal AI, integrating multiple data sources like vision and language, is gaining traction due to increasing digitization and diverse applications across sectors. Despite its promise, a key challenge remains the effective fusion of disparate data types with different processing requirements.

multimodal AI computer vision Natural Language Processing