large language models

262 items

ARTICLETogether AI Blog·8d ago

Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets

Together achieved efficient inference for MiniMax-M3, unlocking 1M-token context and multimodality. This was accomplished through KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.

System Design Optimization Multimodality large language models

RESEARCHarXiv CS.AI·4/7/2026

Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach

This work presents an automated framework for comparing AI safety policy documents using LLMs and a shared taxonomy, and evaluates the stability and validity of the analysis.

Policy Analysis Crosswalk Framework Automated Analysis large language models

RESEARCHarXiv CS.AI·4/8/2026

ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback

ReVEL proposes a hybrid framework that integrates LLMs as multi-turn reasoners within evolutionary algorithms to evolve effective heuristics for NP-hard optimization problems. The method uses performance-profile clustering and feedback-guided reflection so that the LLM can analyze behaviors and generate targeted refinements.

Combinatorial Optimization Artificial Intelligence Evolutionary Algorithms Heuristics

RESEARCHarXiv CS.AI·4/8/2026

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya

Large language models (LLMs) fail at systematic reasoning and frequently hallucinate, exposing an epistemic gap. Pramana is a new approach that teaches LLMs explicit epistemological methodology by fine-tuning on Navya-Nyaya logic, a millennia-old Indian reasoning framework.

Epistemic Reasoning hallucination large language models Fine-tuning

RESEARCHarXiv CS.AI·4/7/2026

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

This paper introduces a new energy-based governance framework for LLMs that connects transformer inference dynamics to constraint-satisfaction models, challenging current AI safety methods. The research identifies a 57-token pre-commitment window in Phi-3-mini-4k-instruct, showing that such signals exist but are specific to the model, task, and configuration, and proposes a taxonomy of inference behavior.

Transformer Architecture Inference Dynamics energy-based models Pre-commitment Signals

RESEARCHarXiv CS.CL·4/6/2026

Revealing the Learning Dynamics of Long-Context Continual Pre-training

This paper systematically investigates the learning dynamics of Long-Context Continual Pre-training (LCCP) using the industrial model Hunyuan-A13B, tracking its evolution over 200 billion tokens. It proposes a hierarchical framework for analyzing LCCP at the behavioral, probabilistic, and mechanistic levels, addressing the limitations of current evaluation and pre-training methodologies.

Long-Context Continual Pre-training Model Evaluation Pre-training Dynamics large language models

RESEARCHarXiv CS.CL·4/6/2026

An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages

This empirical study investigates many-shot in-context learning (ICL) for machine translation from English into ten low-resource languages. The findings show that ICL becomes more effective as the number of examples increases, and that BM25-based retrieval substantially improves data efficiency.

LLMs Many-Shot Learning NLP machine translation

ARTICLEOpenAI Blog·4/29/2026

Where the goblins came from

This article analyzes how 'goblin outputs' or personality-driven quirks spread in AI models like GPT-5. It details the timeline, root cause, and fixes for these behaviors.

model debugging AI behavior large language models

RESEARCHarXiv CS.AI·4/23/2026

ThermoQA: A Three-Tier Benchmark for Evaluating Thermodynamic Reasoning in Large Language Models

ThermoQA is a new three-tier benchmark of 293 open-ended engineering thermodynamics problems for evaluating thermodynamic reasoning in LLMs. Leading LLMs like Claude Opus 4.6 and GPT-5.4 achieve high scores, but cross-tier degradation confirms that property memorization does not imply thermodynamic reasoning. The dataset and code are open-source.

Dataset Benchmarking large language models AI evaluation

RESEARCHarXiv CS.CL·28d ago

Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models

This work investigates the use of large language models (LLMs) for smart city tasks, leveraging remote sensing imagery to characterize the built environment across multiple spatial scales. The findings highlight the potential of integrating remote sensing with LLMs to assist smart cities and decision-making.

Built Environment Urban Planning Remote sensing large language models

RESEARCHarXiv CS.CL·28d ago

Effective Explanations Support Planning Under Uncertainty

This research proposes a computational model that uses a large language model and a planning agent to convert explanations into action plans for navigation under partial observability. Experiments confirm that higher-scored explanations significantly improve human navigation and are judged more helpful.

Planning Explanation Generation human-AI interaction AI

RESEARCHarXiv CS.CL·28d ago

Sanity Checks for Long-Form Hallucination Detection

This research paper introduces a controlled-invariance methodology for hallucination detection in large language models. Using oracle tests like Force and Remove, it investigates whether detection methods evaluate reasoning or merely surface correlates of the final answer.

hallucination detection Chain-of-Thought large language models LLM evaluation

RESEARCHarXiv CS.CL·28d ago

Change My View? The Dynamics of Persuasion and Polarization in Online Discourse

This study uses large language models to analyze debates on Reddit's r/ChangeMyView, where belief revision is publicly signaled. The research reveals that rhetorical strategies such as concession and empathetic alignment significantly increase the prospect of belief change.

social media online discourse rhetoric large language models

NEWSGoogle DeepMind Blog·23d ago

Introducing Gemini Omni

This is the announcement of Gemini Omni, a new iteration within Google's family of AI models.

New Product Google AI Gemini AI

NEWSHugging Face Blog·4/28/2026

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

NVIDIA introduces Nemotron 3 Nano Omni, a new long-context multimodal AI model. It provides intelligence for documents, audio, and video agents.

multimodal AI large language models NVIDIA AI agents

NEWSTwo Minute Papers (YouTube)·5/6/2026

DeepSeek V4 AI Beats Billion Dollar Systems…For Free

DeepSeek V4 AI has reportedly surpassed expensive, established AI systems, and is available at no cost. This development highlights advancements in accessible and high-performing artificial intelligence.

DeepSeek AI models open-source AI large language models

RESEARCHarXiv CS.AI·4/7/2026

IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

This research introduces IC3-Evolve, a novel method for hardware model checking. It leverages proof- and witness-gated offline LLM-driven heuristic evolution to enhance the efficiency of the IC3 algorithm.

Heuristics formal methods large language models model checking

RESEARCHarXiv CS.CL·4/7/2026

Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection

"Knowledge Packs" introduces a zero-token knowledge delivery method for large language models (LLMs) by directly injecting information into the KV cache. This technique aims to enhance LLM performance and reduce inference costs by efficiently integrating external knowledge without consuming context tokens.

Knowledge Injection machine learning AI large language models

RESEARCHHugging Face (YouTube)·4/16/2026

Hugging Face Journal Club: Embarrassingly Simple Self-Distillation Improves Code Generation

This content from the Hugging Face Journal Club discusses an "embarrassingly simple" self-distillation method that significantly improves code generation. It highlights advancements in leveraging large language models for programming tasks.

machine learning code generation Self-Distillation large language models

RESEARCHQwen Blog·3/5/2025

QwQ-32B: Embracing the Power of Reinforcement Learning

This post addresses the potential of Reinforcement Learning (RL) at scale to improve the performance and reasoning capabilities of AI models beyond conventional methods. The research specifically explores the impact of RL on the intelligence of Large Language Models (LLMs), citing examples such as DeepSeek R1.

model performance deep learning reinforcement learning large language models