LLMs

722 items

RESEARCH · arXiv CS.CL · 19d ago

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

This paper introduces OGCaReBench, a retrieval-focused benchmark for evaluating how well LLMs answer clinical questions that fall outside typical medical guidelines. It targets a gap: most medical LLMs are trained on common, guideline-focused knowledge, whereas real-world care often involves rare cases that guidelines do not cover.

28
RESEARCH · arXiv CS.CL · 16d ago

When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance

Large language models (LLMs) exhibit consistent asymmetries when advising on religious conversions, favoring some traditions such as Catholicism, the Baháʼí Faith, and Sikhism while subtly discouraging others such as atheism and the Jehovah's Witnesses. The asymmetries, identified through an LLM-as-a-judge framework, vary by model and provider, with Grok 4.20 showing the strongest.

28
RESEARCH · arXiv CS.AI · 6d ago

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

This paper introduces SMAC-Talk, a natural language extension of the StarCraft Multi-Agent Challenge designed to evaluate LLM-based agents in cooperative multi-agent environments. It adds a natural language communication channel to probe agent coordination and trust, including in scenarios with deceptive communicators.

28
RESEARCH · arXiv CS.LG · 12d ago

Molecular Lead Optimization via Agentic Tool Planning

This paper introduces TRACE, a trajectory-aware LLM reasoning agent for molecular lead optimization, the step that turns early hit compounds into viable drug candidates. To address the limitations of one-step molecular optimization, it formulates tool selection as a sequential decision-making problem over action trajectories. TRACE aims to improve ADMET-related properties through subtle structural refinement while preserving key molecular substructures.

28
ARTICLE · DEV.to AI · 4/8/2026

Why Skillware is the Next Evolution for Autonomous Agents

Skillware is introduced as a Python framework for AI agents that aims to overcome the limitations of prompt-based approaches when executing complex business logic. It packages intelligence and capabilities as installable units, defining complex behaviors modularly for greater enterprise reliability.

27