Large Language Models

264 items

DOC · DEV.to AI · 8d ago

The Developer's Guide to Slashing Your AI API Bill by 95%

This guide shows developers how to slash AI API costs by up to 95%, advocating for cheaper alternatives like DeepSeek V4 Flash over GPT-4o. It emphasizes a 40x price difference for similar output quality, helping developers manage project budgets effectively.

DeepSeek-V4-Flash AI API costs Cost Optimization developer guide

NEWS · DEV.to AI · 20d ago

Google Sparks AI Race with Gemini 3.5 Flash’s Breakthrough Speed

Google's Gemini 3.5 Flash is pitched as a leap in AI speed, offering near-instant, top-tier intelligence for coding and complex reasoning tasks. The model sets a new performance standard, outperforming previous versions and challenging rivals.

Google AI AI Speed Gemini Large Language Models

RESEARCH · arXiv CS.CL · 4/20/2026

Why Fine-Tuning Encourages Hallucinations and How to Fix It

Large language models often hallucinate facts, a problem exacerbated by supervised fine-tuning (SFT), which degrades pre-trained knowledge. This research proposes a self-distillation SFT method, inspired by continual learning, that mitigates hallucinations by regularizing output-distribution drift while still acquiring new factual information effectively.

hallucinations Large Language Models fine-tuning Continual Learning

RESEARCH · arXiv CS.AI · 4/16/2026

ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold

ReSS is a framework that bridges symbolic and neural reasoning models for tabular data prediction, aiming for both high accuracy and understandable reasoning. It leverages decision trees to extract symbolic scaffolds that guide an LLM to generate natural-language reasoning, which is then used to fine-tune specialized tabular reasoning LLMs.

Machine Learning Explainable AI tabular data Large Language Models

ARTICLE · DEV.to AI · 4/21/2026

The Physics Wall in 2026: 3 Papers That Show Why Node Shrinks Won't Save Us

This article contends that simple semiconductor node shrinks no longer guarantee significant performance or power efficiency gains, challenging industry buzzwords like "2nm." It analyzes recent research papers and real-world LLM inference benchmarks to measure the current "physics wall" and predict future trends.

technology limitations AI hardware semiconductors Performance optimization

NEWS · DEV.to AI · 18d ago

Google: Recaps Dialogues Stage at I/O 2026

Google has released a recap of the Dialogues stage sessions from its I/O 2026 developer conference, featuring conversations with Sundar Pichai and other AI leaders. The recap highlights the company's advancements in artificial intelligence, its integration across products, responsible AI development, and future applications including LLMs.

AI applications Google AI Large Language Models AI development

RESEARCH · arXiv CS.CL · 4/13/2026

Medical Reasoning with Large Language Models: A Survey and MR-Bench

This paper presents a comprehensive review of medical reasoning with Large Language Models (LLMs), conceptualizing it as an iterative process of abduction, deduction, and induction. It organizes existing methods into seven technical routes and conducts a unified cross-benchmark evaluation of representative models.

Medical Reasoning LLMs in Medicine Large Language Models healthcare AI

ARTICLE · DEV.to AI · 21d ago

Airflow to the Rescue: How AI Powers Better DAG Failures

This article presents an approach, already deployed in production, to improving Apache Airflow failure detection and diagnosis. It combines large language models, statistical methods, and traditional machine learning to analyze extensive logs and classify messages.

data engineering Machine Learning AI Large Language Models

RESEARCH · arXiv CS.AI · 4/13/2026

StaRPO: Stability-Augmented Reinforcement Policy Optimization

StaRPO is a novel reinforcement learning framework designed to improve the logical consistency and structural coherence of large language models in complex reasoning tasks. It explicitly incorporates stability metrics, such as Autocorrelation Function and Path Efficiency, to evaluate local step-to-step coherence and global goal-directedness of the reasoning process.

Policy optimization LLMs reinforcement learning Reasoning

RESEARCH · arXiv CS.CL · 5/11/2026

MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

MELD is a new deployable AI-generated text detector that enhances binary detection with auxiliary multi-task supervision. It aims for robustness against attacks, transferability to unseen generators, and low false-positive rates.

security Large Language Models AI-generated text detection

RESEARCH · arXiv CS.AI · 5/11/2026

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

New research indicates that position bias in reasoning models, such as those using chain-of-thought, scales with the length of the reasoning trajectory. This effect was observed across various model configurations and benchmarks, suggesting that "more thinking" can exacerbate certain biases.

AI bias Natural Language Processing reasoning models Machine learning research

RESEARCH · arXiv CS.CL · 4/7/2026

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

SoLA is a new training-free compression method for LLMs that uses soft activation sparsity and low-rank decomposition. It identifies the components that are crucial for inference and compresses most of the rest, aiming to reduce the parameter count of large language models efficiently and affordably.

Sparsity Low-Rank Decomposition LLM compression Large Language Models

RESEARCH · arXiv CS.AI · 5/1/2026

Think it, Run it: Autonomous ML pipeline generation via self-healing multi-agent AI

This paper proposes a unified multi-agent AI architecture to automate end-to-end machine learning (ML) pipeline generation from datasets and natural-language goals. The five-agent system integrates RAG, an explainable hybrid recommender, and an LLM-based self-healing mechanism, achieving an 84.7% success rate and improved robustness.

Retrieval Augmented Generation multi-agent AI Large Language Models ML Automation

RESEARCH · arXiv CS.AI · 7d ago

Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models

This paper evaluates "harmful overthinking" in Large Reasoning Models, where continuing to reason after reaching a correct answer can destabilize an otherwise correct trajectory. It introduces a protocol to distinguish verbose from harmful overthinking, finding issues in multimodal benchmarks.

multimodal AI Overthinking Model Evaluation AI Reasoning

RESEARCH · arXiv CS.CL · 21d ago

MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

MMoA is a recurrent Mixture-of-Agents (MoA) framework that integrates LSTM-based gating for adaptive agent selection. It enhances LLM performance by dynamically activating fewer agents while achieving comparable accuracy on benchmarks like AlpacaEval 2.0.

benchmarking Recurrence Mixture-of-Agents Large Language Models

RESEARCH · arXiv CS.CL · 22d ago

Language Acquisition Device in Large Language Models

This paper proposes LAD-inspired pre-pretraining on MP-STRUCT, a formal language reflecting natural language structures, to improve Large Language Models' data efficiency. A brief pre-pretraining with MP-STRUCT matches strong formal-language baselines in token efficiency and imparts human-like resistance to structurally implausible languages.

Formal Languages Pre-pretraining Language Acquisition MP-STRUCT

DOC · DEV.to AI · 6d ago

One API Key to Rule All AI Models: A Developer's Guide to TokenEase

TokenEase is an AI API aggregation gateway that provides a single OpenAI-compatible API key to access multiple leading language models. It simplifies AI application development by eliminating the need to manage various keys, endpoints, and rate limits for different models.

AI integration API Management Large Language Models developer tools

NEWS · DEV.to AI · 18d ago

Qwen3-Coder-Next: 80B total, 3B active, 70.6 on SWE-Bench

Qwen3-Coder-Next is a sparse Mixture-of-Experts (MoE) model with 80B total and 3B active parameters, scoring 70.6 on SWE-Bench Verified. A coding-tuned variant of the Qwen3-Next-80B-A3B base, it features a hybrid attention mechanism and ships its weights under Apache 2.0.

benchmarking code generation Mixture of Experts Large Language Models

ARTICLE · DEV.to AI · 4/26/2026

DeepSeek V4: Million-Token Context That Actually Works

DeepSeek V4 delivers a 1 million-token context that is actually usable, solving the GPU memory issue with a hybrid attention architecture that compresses the KV cache by nearly 9x. This makes it a practical solution for long-context inference, unlike many other models.

DeepSeek AI models Model Architecture Large Language Models

RESEARCH · DEV.to AI · 26d ago

Large Language Models are Few-Shot Health Learners

This piece explores the ability of Large Language Models (LLMs) to learn health-related tasks from only a few examples. It discusses how few-shot learning can be applied effectively in the healthcare domain using LLMs.

learning AI Few-Shot Learning Large Language Models