large language models

262 items

RESEARCH·arXiv CS.CL·5/4/2026

NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

NorBERTo is a new ModernBERT model trained on Aurora-PT, a 331-billion-token Brazilian Portuguese corpus, with long-context support and efficient attention mechanisms. It achieves state-of-the-art results among the evaluated encoder models on semantic similarity, textual entailment, and classification tasks, using datasets such as ASSIN 2 and PLUE.

AI models BERT Portuguese NLP

RESEARCH·arXiv CS.AI·5/11/2026

When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic-Actor Loop for Agentic Reasoning

This paper introduces SCALAR (Structured Critic-Actor Loop for Agentic Reasoning), an Actor-Critic-Judge pipeline applied to theoretical physics problems. It investigates how the interaction between researchers and AI agents affects results in physics reasoning tasks, and reports that multi-turn dialogue significantly improves on single-shot attempts.

theoretical physics AI Reasoning Agentic AI large language models

RESEARCH·arXiv CS.LG·4/23/2026

Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models

This paper evaluates speculative decoding with EAGLE3 as an inference-time optimization for PayPal's Commerce Agent, which is powered by fine-tuned Nemotron models. The study reports a 22-49% throughput increase and an 18-33% latency reduction at zero additional hardware cost.

Performance benchmarking LLM optimization Inference acceleration large language models

RESEARCH·arXiv CS.CL·4/23/2026

CoAuthorAI: A Human in the Loop System For Scientific Book Writing

CoAuthorAI is a human-in-the-loop system designed for scientific book writing, tackling LLM challenges like inconsistency and unreliable citations. It combines retrieval-augmented generation, expert outlines, and automatic reference linking, demonstrated by a high satisfaction rate and a published book.

human-in-the-loop Content Generation AI tools Scientific Writing

RESEARCH·arXiv CS.LG·4/23/2026

Rethinking Reinforcement Fine-Tuning in LVLM: Convergence, Reward Decomposition, and Generalization

This research introduces the Tool-Augmented Markov Decision Process (TA-MDP) to formally model multimodal agentic decision-making, addressing theoretical gaps in reinforcement fine-tuning for Large Vision-Language Models (LVLMs). It specifically investigates how composite verifiable rewards affect GRPO convergence and why training on small datasets generalizes to out-of-distribution domains for agentic LVLMs.

Theoretical AI reinforcement learning vision models large language models

RESEARCH·arXiv CS.LG·4/23/2026

Super Apriel: One Checkpoint, Many Speeds

Super Apriel, a 15B-parameter supernet, has been released, offering four trained mixer choices per decoder layer to enable multiple speed/quality presets from a single checkpoint. This allows for 2.9x to 10.7x decode throughput gains with 96% to 77% quality retention, and also facilitates speculative decoding without a separate draft model.

neural network architecture Performance optimization attention mechanisms large language models

RESEARCH·arXiv CS.CL·26d ago

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

This replication study evaluates how effectively DExperts, an inference-time mitigation technique, reduces toxicity in large language models. The research establishes baseline toxicity measurements, implements DExperts to mitigate explicit toxicity, and stress-tests the method against implicit hate speech.

DExperts security Toxicity large language models

RESEARCH·arXiv CS.CL·20d ago

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

Large language models struggle with complex long-context reasoning tasks despite supporting extensive inputs. ProxyCoT is a novel training framework designed to transfer reasoning capabilities from short proxy contexts to full long contexts, outperforming strong baselines.

machine learning Natural Language Processing Reasoning large language models

RESEARCH·arXiv CS.CL·13d ago

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

FLUID is a new framework designed to efficiently adapt Autoregressive (AR) backbones to the diffusion paradigm for parallel text generation. It enables initialization from GPT-style models and introduces an entropy-driven mechanism called Elastic Horizons, achieving state-of-the-art performance with significantly reduced training costs.

Diffusion Models text generation large language models Autoregressive Models

ARTICLE·DEV.to AI·4/14/2026

Best Qwen Models in 2026 — Alibaba's Open-Source AI Powerhouse

This article highlights Alibaba's Qwen model family as the largest and most complete open-source AI offering in 2026, detailing the Qwen3 series and the advanced Qwen3.5. It emphasizes the flagship Qwen3-235B-A22B's competitive performance against Gemini 2.5 Pro and discusses Alibaba's broader AI strategy.

AI models Alibaba open-source AI large language models

NEWS·DEV.to AI·4/17/2026

GPT-Rosalind for life sciences research

GPT-Rosalind, a new OpenAI tool based on GPT-4 and fine-tuned on scientific data, has been launched to accelerate life sciences research. It addresses the data bottleneck by optimizing hypothesis generation, literature analysis, and experimental design, with the potential to reduce drug development costs and timelines.

Scientific Discovery Life Sciences AI large language models

RESEARCH·arXiv CS.CL·4/16/2026

Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage

Dental-TriageBench introduces the first expert-annotated benchmark for multimodal reasoning in hierarchical dental triage, comprising 246 authentic, de-identified cases. The research highlights a substantial performance gap between 19 MLLMs and junior dentists, particularly in treatment-level triage tasks requiring multiple referral domains.

multimodal AI Healthcare Benchmarking large language models

RESEARCH·arXiv CS.AI·5/1/2026

End-to-end autonomous scientific discovery on a real optical platform

This paper introduces the Qiushi Discovery Engine, an LLM-based agentic system for autonomous scientific discovery on a real optical platform. It demonstrates end-to-end discovery by combining nonlinear research phases, Meta-Trace memory, and a dual-layer architecture, and it successfully reproduces a published experiment.

Autonomous systems Scientific Automation large language models robotics

RESEARCH·arXiv CS.CL·23d ago

Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models

This research explores how humans communicate with limited vocabularies, comparing their strategies to computational sampling algorithms powered by large language models. The study reveals that human language production under constraint often mirrors greedy sampling, although more skilled individuals exhibit non-greedy revision behaviors.

cognitive science human behavior language production Natural Language Processing

RESEARCH·arXiv CS.CL·23d ago

Fluency and Faithfulness in Human and Machine Literary Translation

This research investigates the balance between fluency and faithfulness in literary translation, comparing human translations with output from Google Translate and TranslateGemma across 106 novels in 16 source languages. It finds a consistent negative correlation between fluency and faithfulness, particularly for human translations and Google Translate output, and indicates that segment length significantly affects automatic evaluation.

Literary Translation Translation Evaluation Natural Language Processing machine translation

RESEARCH·arXiv CS.CL·6d ago

When Retrieval Doesn't Help: A Large-Scale Study of Biomedical RAG

A large-scale study re-evaluates Retrieval-Augmented Generation (RAG) in medical question answering, finding only small and inconsistent improvements over no-retrieval baselines. It suggests that the choice of the backbone model is more critical than retrieval methods, and the main bottleneck is the model's ability to effectively use retrieved evidence.

RAG Medical Question Answering Biomedical AI large language models

RESEARCH·arXiv CS.LG·6d ago

Unlocking Feature Learning in Gated Delta Networks at Scale

This paper derives scaling rules for Gated Delta Networks to address the computational demands of training and scaling Large Language Models. Experiments validate that these configurations enable stable learning-rate transfer across various model widths, unlike standard parametrization.

neural networks learning Hyperparameter Tuning machine learning

RESEARCH·arXiv CS.AI·6d ago

Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research

This commentary introduces PEEL, a working scaffolding combining deterministic distant reading with LLM interpretation, grounded in Peircean semiotics and abductive reasoning. Applied to AI-generated condensations, PEEL reveals systematic distortions invisible without non-AI measurement, implying deterministic instruments must accompany AI tools to ensure fidelity and epistemic authority.

Research methodology AI in research Epistemic accountability large language models

DOC·DEV.to AI·8d ago

The Developer's Guide to Slashing Your AI API Bill by 95%

This guide shows developers how to slash AI API costs by up to 95%, advocating for cheaper alternatives like DeepSeek V4 Flash over GPT-4o. It emphasizes a 40x price difference for similar output quality, helping developers manage project budgets effectively.

DeepSeek-V4-Flash AI API costs Cost Optimization developer guide

NEWS·DEV.to AI·20d ago

Google Sparks AI Race with Gemini 3.5 Flash’s Breakthrough Speed

Google's Gemini 3.5 Flash is presented as a markedly faster model for coding and complex reasoning tasks. According to the article, it outperforms previous versions and challenges rival models.

Google AI AI Speed Gemini large language models