large language models

262 items

RESEARCHarXiv CS.LG·4/16/2026

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

This paper introduces STOMP, a novel offline reinforcement learning algorithm for multi-objective optimization using smooth Tchebysheff scalarization. It addresses the limitation of linear scalarization in recovering non-convex Pareto fronts, crucial for aligning large language models and other real-world applications with conflicting rewards.

reinforcement learning Multi-objective Optimization AI alignment machine learning

RESEARCHarXiv CS.AI·5d ago

Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research

This commentary introduces PEEL, a working scaffolding combining deterministic distant reading with LLM interpretation, grounded in Peircean semiotics and abductive reasoning. Applied to AI-generated condensations, PEEL reveals systematic distortions invisible without non-AI measurement, implying deterministic instruments must accompany AI tools to ensure fidelity and epistemic authority.

Research methodology AI in research Epistemic accountability large language models

RESEARCHarXiv CS.AI·4/21/2026

Agentic Risk-Aware Set-Based Engineering Design

This paper proposes an LLM-guided multi-agent framework for early-stage engineering design, integrating a human-in-the-loop approach and formal risk management. It uses specialized agents to explore and prune design candidates, demonstrated on aerodynamic airfoil design.

Engineering Design multi-agent systems large language models risk management

ARTICLEDEV.to AI·3d ago

<think>

This content focuses on comparing the costs of various AI models, highlighting cheaper alternatives to GPT-4o. It explores significant savings by using models like GPT-4o-mini, DeepSeek V4 Flash, and Qwen3-32B, which can be up to 40 times more cost-effective.

AI models GPT-4o large language models Cost Efficiency

ARTICLEDEV.to AI·3d ago

<think>

This article details an indie hacker's discovery of substantial cost savings by leveraging alternative AI models via the Global API, comparing their pricing against GPT-4o. It highlights how developers can reduce expenses for large language model inference using a wide range of available options.

AI models Cost Optimization large language models developer tools

ARTICLEDEV.to AI·4/13/2026

Everyone thinks ChatGPT is an AI agent. It's not.

This article delves into the crucial distinction between a chatbot with tools and a true AI agent, arguing that the confusion between the two is why many "AI agent" startups fail. It explores what truly makes a language model an agent, capable of taking real actions and chaining them together autonomously.

AI architecture chatbots large language models AI development

RESEARCHDEV.to AI·4/18/2026

ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using LargeLanguage Models

ChatCAD is an interactive computer-aided diagnosis system that leverages Large Language Models to analyze medical images. It aims to enhance the accuracy and efficiency of medical diagnosis through artificial intelligence.

computer-aided diagnosis Healthcare large language models Medical Imaging

RESEARCHarXiv CS.CL·4/14/2026

GIANTS: Generative Insight Anticipation from Scientific Literature

This paper introduces "insight anticipation," a novel task where language models predict the core insight of a future scientific paper from its foundational predecessors. To evaluate this capability, the authors developed GiantsBench, a benchmark of 17,000 examples, and present GIANTS-4B, an LM trained with reinforcement learning.

Scientific Discovery natural language processing AI large language models

ARTICLEDEV.to AI·5d ago

Context Window Management: Tactics That Survive Real Sessions

Large language models often have a significantly smaller practical context window than their advertised nominal limit due to overhead and attention degradation. This discrepancy affects prompt design and leads to quality drops and truncation long before the hard token limit is reached.

prompt engineering Technical limitations AI performance large language models

RESEARCHarXiv CS.CL·5d ago

Cross-Prompt Generalization in Detecting AI-Generated Fake News Using Interpretable Linguistic Features

This study investigates cross-prompt generalization in detecting AI-generated fake news using interpretable linguistic features like lexical diversity and readability. A random forest classifier achieved consistently high performance (AUC 0.988-1.000) across various train-test combinations, demonstrating robustness against different prompting strategies.

Generalization AI detection fake news large language models

RESEARCHarXiv CS.AI·13d ago

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

This paper proposes POLAR, a multimodal memory-augmented framework for personalized embodied agents over long-term user interactions. POLAR organizes prior interactions into a multimodal knowledge graph, capturing semantic and episodic memory to guide embodied task execution.

personalization multimodal AI memory large language models

ARTICLEDEV.to AI·4/11/2026

Why Your pip Install Output Doesn't Belong in Claude's Context

Este artigo discute como o output detalhado do comando `pip install` é desnecessário e prejudicial para o contexto de modelos de IA como o Claude, que precisam apenas saber se a instalação de pacotes Python foi bem-sucedida ou falhou. Detalhes verbosos como barras de progresso e logs de compilação são considerados ruído que não auxilia a IA na depuração.

prompt engineering AI Context pip Python

RESEARCHarXiv CS.CL·4/20/2026

Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch

This research introduces a data-efficient fine-tuning framework to teach large language models to effectively code-switch for reasoning tasks. It identifies beneficial code-switched behaviors, moving beyond treating code-switching as an error, through systematic analysis of diverse reasoning traces.

Multilingual AI Code-Switching Reasoning large language models

RESEARCHarXiv CS.LG·4/16/2026

Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

This paper presents a necessary condition for intra-group learning algorithm design in Reinforcement Learning, requiring objectives to maintain gradient exchangeability across token updates to prevent reward-irrelevant drift. It proposes minimal transformations to restore this cancellation structure, which stabilizes training and improves sample efficiency.

reinforcement learning large language models gradient dynamics model optimization

RESEARCHarXiv CS.LG·5/7/2026

Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search

This paper introduces Structured Progressive Knowledge Activation (SPARK) to address the challenge of integrating architectural knowledge in LLM-driven Neural Architecture Search (NAS). SPARK mitigates "functional entanglement" by enabling factor-conditioned editing, leading to more targeted and reliable architecture modifications.

Neural Architecture Search machine learning Knowledge Integration large language models

RESEARCHarXiv CS.CL·4/22/2026

Mango: Multi-Agent Web Navigation via Global-View Optimization

Mango is a multi-agent web navigation method that optimizes complex website exploration by leveraging a global view. It dynamically determines optimal starting points and adaptively allocates navigation budget, achieving a 63.6% success rate with GPT-5-mini, outperforming the best baseline by 7.3%.

Optimization web navigation large language models AI agents

RESEARCHarXiv CS.LG·4/22/2026

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

This work addresses the challenge of missing modalities in multimodal clinical data for diagnosis by reframing it as an autoregressive sequence modeling task. It leverages causal decoders from LLMs and a missingness-aware contrastive pre-training to outperform baselines on benchmarks like MIMIC-IV and eICU.

multimodal AI machine learning large language models healthcare AI

RESEARCHarXiv CS.LG·4/28/2026

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

This work addresses the significant memory footprint of Key-Value (KV) caching in transformer language models, proposing optimization through the depth dimension. It introduces a method for cross-layer cache sharing, demonstrating that dropping a layer's cache can be efficient without information loss, and suggests a training approach with random cross-layer attention.

deep learning Memory Optimization large language models Transformers

RESEARCHarXiv CS.CL·4/13/2026

Drift and selection in LLM text ecosystems

This paper introduces a mathematical framework to analyze the recursive process where AI-generated text re-enters and shapes the public record from which LLMs learn. It distinguishes between "drift," which removes rare forms through unfiltered reuse, and "selection," which filters content based on criteria like quality, showing normative selection preserves deeper linguistic structures.

Text Ecosystems data drift model collapse large language models

RESEARCHarXiv CS.LG·19d ago

Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry

Geometry-Lite is a novel prompt-level probe designed to interpret how safety evidence develops across layers in large language models. It analyzes layer-wise margin geometry using various readouts to understand boundary formation, improving safety detection over single-layer probes.

deep learning Probing interpretability large language models