ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling
When language models use test-time sampling and majority vote, reasoning trajectories concentrate into non-independent
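The failure mode named in this item can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the paper's method: the "basin" sampler, the function names, and all parameters below are hypothetical, chosen only to show that majority vote gains less from correlated samples than from independent ones.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer in a list of sampled answers."""
    return Counter(answers).most_common(1)[0][0]


def sample_independent(rng, n, p_correct):
    """Draw n answers independently; each is right with probability p_correct."""
    return ["right" if rng.random() < p_correct else "wrong" for _ in range(n)]


def sample_basin(rng, n, p_correct, p_basin):
    """Hypothetical correlated sampler (an assumption for illustration).

    With probability p_basin, all n samples fall into one shared trajectory,
    so a single draw decides every answer. Otherwise samples are independent.
    """
    if rng.random() < p_basin:
        shared = "right" if rng.random() < p_correct else "wrong"
        return [shared] * n
    return sample_independent(rng, n, p_correct)


def vote_accuracy(sampler, trials, seed=0, **kwargs):
    """Estimate how often the majority vote is right over repeated trials."""
    rng = random.Random(seed)
    hits = sum(
        majority_vote(sampler(rng, **kwargs)) == "right" for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Odd n with two possible answers avoids ties.
    print(vote_accuracy(sample_independent, 4000, n=15, p_correct=0.6))
    print(vote_accuracy(sample_basin, 4000, n=15, p_correct=0.6, p_basin=0.8))
```

In this toy setup both samplers have the same per-sample accuracy, but the vote over correlated samples stays close to single-sample accuracy, because the votes add little independent evidence.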
This research addresses the Stability-Expressivity Gap in Spoken Language Models (SLMs) for low-resource languages, caused by the extensive use of synthetic data. While synthetic data improves phonetic accuracy, it degrades prosodic expressivity, a phenomenon termed Synthetic Erosion. The paper introduces self-alignment frameworks to recover expressivity.
This research focuses on developing more efficient methods for sampling from Diffusion Probabilistic Models, aiming to reduce the computational cost and time associated with generating high-quality samples. It explores novel algorithms to accelerate the sampling process while maintaining the fidelity of the generated data.
This content delves into Andrej Karpathy's
Tian AI features a self-evolution engine that analyzes and modifies its own Python code based on operational experience. This innovative system aims to achieve the "holy grail" of AI research by enabling AI to continuously improve itself.
This work explores methods for neural models to learn cause-and-effect relationships, even in scenarios where data-generating interventions are unknown. The research aims to enhance artificial intelligence's ability to infer causality from complex data.
This study argues, based on the Data Processing Inequality, that single-agent LLMs are more information-efficient than multi-agent systems under equal reasoning-token budgets. The research empirically tests this prediction, which suggests that multi-agent systems become competitive when a single agent's context utilization is degraded or when more compute is spent.
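For reference, the Data Processing Inequality in its standard textbook form is given below. The symbols X, Y, and Z are generic random variables; how the study maps agents and messages onto this chain is not stated in this summary and is not assumed here.

```latex
X \to Y \to Z \ \text{(Markov chain)} \quad\Longrightarrow\quad I(X;Z) \le I(X;Y)
```

Informally, no processing of Y can increase the information it carries about X.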
This survey provides an optimizer-agnostic view of rollout strategies for RL-based post-training of reasoning LLMs. It formalizes rollout pipelines with a unified notation and introduces the Generate-Filter-Control-Replay (GFCR) lifecycle taxonomy, decomposing pipelines into four modular stages.
NVIDIA engineers and researchers leverage Codex with GPT-5.5 to build production systems. They also use these tools to turn research ideas into runnable experiments.
This paper explores the use of LLM-driven evolutionary search to automatically develop unsupervised Uncertainty Quantification (UQ) methods. The evolved methods outperform hand-crafted baselines on claim verification, demonstrating robust generalization and distinct strategies across different LLMs.
This work introduces GELATO, a novel approach to multimodal embedding models that extends VLM-style architectures. It results in the jina-embeddings-v5-omni suite, which efficiently encodes text, image, audio, and video into a single semantic embedding space by freezing backbone text models and training only connecting components.
This paper introduces OSCToM, an approach for modeling nested belief conflicts in LLM-based Theory of Mind tasks. It combines reinforcement learning and compositional surrogate models to generate these conflicts, with OSCToM-8B showing the best results in experiments.
This paper introduces novel approaches for creating high-quality embeddings for logical statements, crucial for training neural networks to efficiently rank choices made by logical reasoners. These methods involve generating anchors with repeated terms, balancing easy, medium, and hard examples for triplet loss training, and periodically emphasizing the hardest examples.
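The balancing idea described in this item can be sketched as follows. This is a minimal illustration, not the paper's implementation: the easy/medium/hard thresholds follow the common margin-based convention for triplet loss, and the names `difficulty`, `balanced_batch`, `per_bucket`, and `hard_every` are hypothetical.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def difficulty(anchor, positive, negative, margin=1.0):
    """Bucket a triplet by how badly it violates the margin (assumed convention)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    if d_neg <= d_pos:
        return "hard"    # negative is at least as close as the positive
    if d_neg < d_pos + margin:
        return "medium"  # correctly ordered, but inside the margin
    return "easy"        # margin already satisfied, loss is zero


def balanced_batch(triplets, per_bucket, step, hard_every=5, margin=1.0, seed=0):
    """Pick a batch with equal shares per bucket; every hard_every steps,
    return only the highest-loss hard triplets instead."""
    rng = random.Random(seed + step)
    buckets = {"easy": [], "medium": [], "hard": []}
    for t in triplets:
        buckets[difficulty(*t, margin=margin)].append(t)
    if step % hard_every == 0 and buckets["hard"]:
        ranked = sorted(
            buckets["hard"],
            key=lambda t: triplet_loss(*t, margin=margin),
            reverse=True,
        )
        return ranked[: 3 * per_bucket]
    batch = []
    for name in ("easy", "medium", "hard"):
        pool = buckets[name]
        batch.extend(rng.sample(pool, min(per_bucket, len(pool))))
    return batch
```

Each triplet is an `(anchor, positive, negative)` tuple of vectors. In a real training loop the buckets would need to be recomputed as the embeddings change, since a triplet's difficulty depends on the current model.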
This research proposes a modular framework to address scalable uncertainty reasoning in Knowledge Graphs, where real-world data often inherently contains uncertainty. It tackles three levels of uncertainty—imprecise attributes, probabilistic triple existence, and incomplete schema knowledge—through tailored techniques like probabilistic literals, probabilistic circuits, and geometric embeddings.
AgentCo-op is a retrieval-based synthesis framework that composes interoperable multi-agent workflows from reusable skills, tools, and external agents. It applies bounded self-guided local repair to components upon execution failure and has been demonstrated in genomics case studies to coordinate specialized agents for collaborative discovery.
This content explores the evolution of AI methodologies, discussing the decline of traditional scaling approaches and the emergence of new strategies, exemplified by the birth of Adaption Labs. Presented by Sara Hooker, the HF ML Club India episode delves into significant shifts within the field of artificial intelligence.

LangChain Labs is a new applied research effort focused on continual learning for agents. It aims, with partners, to advance open research on self-improving AI systems.

This article describes findings from 500 AI agent memory experiments, indicating that the primary challenge is not recall but rather the binding problem. The research suggests that improving how AI agents connect disparate pieces of information is crucial for advancing their cognitive abilities.
This content explores the concept of multi-agent autoresearch, detailing how multiple AI agents can collaborate to conduct research tasks. It specifically focuses on leveraging open source models to facilitate and enhance these automated research processes.

This content from the Hugging Face Journal Club discusses an "embarrassingly simple" self-distillation method that significantly improves code generation. It highlights advancements in leveraging large language models for programming tasks.
