large language models

265 items

RESEARCH·arXiv CS.CL·9d ago

When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models

This research paper investigates global narrative dominance in Large Language Models (LLMs), where local cultural knowledge is often overshadowed by global narratives. It introduces the CulturalNB dataset for Bengali cultural contexts and demonstrates that questions asked in English tend to increase global substitution and institutional framing, reducing local perspective coverage.

Dataset Cross-lingual Cultural Bias Natural Language Processing

RESEARCH·arXiv CS.CL·16d ago

Evaluating Large Language Models in a Complex Hidden Role Game

This research quantifies the deceptive potential of Large Language Models (LLMs) in the social deduction game Secret Hitler, introducing novel metrics and an open-source framework. The study benchmarks LLMs against rule-based algorithms and human games, revealing a gap between conversational ability and strategic depth, and showing that reasoning-enhancement techniques can worsen performance for fascist roles.

Game AI benchmarking deception large language models

RESEARCH·arXiv CS.CL·13d ago

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

EvoSpec introduces a framework for real-time evolution of draft models in speculative decoding for Large Language Models, addressing the bottleneck of large vocabulary sizes. It uses dynamic vocabulary and parameter adaptation, employing a context-aware mechanism and a lightweight online alignment strategy to improve acceptance rates and minimize distributional gaps.

Optimization machine learning large language models AI inference

RESEARCH·arXiv CS.CL·14d ago

Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study

This research introduces EnterpriseMem-Bench, a novel multi-turn Text-to-SQL benchmark with 300 sessions and 1,400 turns from enterprise domains. It empirically evaluates five frontier models, including GPT and Claude variants, revealing that stateless multi-turn Text-to-SQL models achieve zero execution accuracy by Turn 3.

memory architectures Text-to-SQL enterprise analytics benchmarking

ARTICLE·DEV.to AI·4/25/2026

GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro: The Frontier Model Showdown

This article compares the latest flagship AI models—GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro—for production workflows, agent building, and coding tools. It argues that no single model is universally superior, with choice depending on specific tasks, price, and infrastructure, particularly for high-stakes agentic coding.

AI models benchmarking coding tools large language models

ARTICLE·DEV.to AI·4/16/2026

Prof. Alois Knoll im Interview: Ohne Körper keine echte KI

Prof. Alois Knoll, a robotics and AI researcher, argues that true intelligence requires a body, as large language models are confined to the digital space and lack physical experience. He emphasizes the need for humanoid robots to collect real-world data, providing a level of understanding that pure text analysis cannot replace.

humanoid robots embodied AI AI large language models

NEWS·DEV.to AI·4/25/2026

OpenAI Just Released GPT-5.5. Here's What It Actually Does (and What It Costs You)

OpenAI released GPT-5.5, a model designed to handle complex, multi-part tasks through sustained multi-step reasoning. This iteration aims to reduce the need for constant supervision, so developers can rely on it for planning and for navigating ambiguity.

AI models OpenAI GPT-5.5 large language models

ARTICLE·DEV.to AI·4/25/2026

I Audited a Business's AI Visibility Across Four Platforms. The Results Were Worse Than Expected.

This article describes an AI visibility audit conducted for a business across platforms like ChatGPT, Claude, Gemini, and Perplexity, revealing that traditional SEO optimization for Google is insufficient. The audit tested how AI models represent a business through both general category and specific brand queries, indicating a significant gap in current optimization strategies for AI platforms.

digital-marketing SEO for AI large language models AI visibility

RESEARCH·DEV.to AI·4/18/2026

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

LlamaFactory is introduced as a unified and efficient framework designed for fine-tuning over 100 different language models. It aims to streamline and optimize the process of adapting a diverse range of large language models.

LLMs AI frameworks machine learning large language models

DOC·DEV.to AI·4/26/2026

GPT-5.5 System Card

The GPT-5.5 System Card from OpenAI details a transformer-based language model, building upon GPT-3 with an emphasis on scaling and fine-tuning. Its architecture is primarily decoder-only, utilizing self-attention mechanisms and feed-forward networks.

AI architecture Natural Language Processing large language models

ARTICLE·DEV.to AI·4/25/2026

DeepSeek V4 Pro Just Dropped — Here's What Changed for AI Agents

DeepSeek V4 Pro, an MoE model with 1.6T parameters and a 1M token context, has launched, bringing significant improvements for AI agents, including dual Think/Non-Think modes and more reliable function calling. It positions itself as a cost-effective and high-performance alternative, surpassing models like Claude Sonnet and GPT-4o for agent workloads.

DeepSeek AI Model large language models performance

DOC·DEV.to AI·29d ago

The $30/Month AI Coding Stack That Replaces $200 Subscriptions: A 2026 Setup Guide

A $30/month AI coding stack using pay-per-token APIs (like Claude Opus 4.7) can replace $200/month subscriptions by focusing on routing strategy over individual model choice. This method avoids the usage caps often found in fixed-fee structures, providing more predictable per-task costs.

developer productivity Subscription models AI tools Cost Optimization

NEWS·DEV.to AI·4/15/2026

AI Weekly: Agents, Models, and Chips — April 9–15, 2026

This week, AI coding tools like Cursor, Claude Code, and OpenAI Codex are converging into unified development environments, and new language models are raising the multimodal baseline. Hardware designed for agentic workloads also became generally available. The roundup reports that 84% of developers already use AI coding tools daily.

AI coding tools large language models AI agents

ARTICLE·DEV.to AI·4/14/2026

MiniMax M2 on OpenClaw: Setup, Pricing, and Performance...

The article describes MiniMax's M2 family of large language models, utilizing a Mixture of Experts architecture for high performance at low inference cost. The M2.7 model achieves 90% of frontier model quality at 7% of the cost, with benchmark results comparable to Claude Sonnet 4.

OpenClaw AI performance Mixture of Experts MiniMax M2

ARTICLE·DEV.to AI·4/9/2026

Meta's New Model Has 16 Tools. Here's What They Do.

Meta's new Muse Spark model, competitive with GPT-5.4 and Gemini 3.1 Pro, stands out for its catalog of 16 built-in tools. It offers a Python 3.9 sandbox with OpenCV and can generate and analyze images instantly in the same environment, incorporating capabilities such as Segment Anything.

Muse Spark Meta AI image generation AI tools

CASE·DEV.to AI·4/21/2026

How we built real-time deposition analysis with Claude's streaming API

This case study describes building a real-time AI tool that helps medical-malpractice attorneys analyze depositions. The system uses Deepgram for live transcription and Claude to analyze 30-second segments, identifying admissions and inconsistencies.

application development streaming-api large language models real-time AI

ARTICLE·Hugging Face Blog·4/24/2026

DeepSeek-V4: a million-token context that agents can actually use

DeepSeek-V4 is a new large language model with a million-token context window designed for practical use by AI agents. The longer context aims to enhance agents' memory and reasoning capabilities.

AI models Context window large language models AI agents

RESEARCH·arXiv CS.CL·4/6/2026

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

This paper presents SWAY, a new unsupervised computational linguistic metric for measuring sycophancy in Large Language Models (LLMs), the tendency to align responses with the user's stance. The research uses a counterfactual prompting mechanism and proposes a mitigation strategy based on considering opposing premises to reduce this bias.

counterfactual prompting computational linguistic sycophancy large language models

RESEARCH·arXiv CS.LG·4/30/2026

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

This work rethinks KV cache eviction for LLMs using an information-theoretic objective derived from the Information Bottleneck principle. It introduces CapKV, a new capacity-aware method that preserves information, outperforming existing heuristic strategies.

Memory Optimization machine learning large language models AI inference

RESEARCH·arXiv CS.CL·4/30/2026

Generative AI-Based Virtual Assistant using Retrieval-Augmented Generation: An evaluation study for bachelor projects

This paper evaluates a Generative AI-based virtual assistant utilizing Retrieval-Augmented Generation (RAG) to support Maastricht University students with project regulations. The system aims to address challenges like hallucinations and provide accurate, context-specific responses by integrating domain-specific knowledge.

Retrieval Augmented Generation education Virtual Assistants large language models