LLM

611 items

ARTICLE · DEV.to AI · 4/18/2026

Opus 4.7 Uses 35% More Tokens Than 4.6. Here's What I'm Doing About It.

Claude Opus 4.7's new tokenizer consumes 35% more tokens than version 4.6 for the same work, which amounts to an effective 35% price increase. The reasoning improvements are real for complex tasks, but the author plans to use 4.7 selectively and stay on 4.6 where token efficiency matters most.

AI cost Claude Tokenization LLM

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

llama.cpp is the linux of llm

The post argues that llama.cpp plays a role for large language models akin to that of Linux: a foundational, open-source platform. It asks whether the analogy accurately captures llama.cpp's significance in the LLM ecosystem.

open-source AI inference LLM

ARTICLE · DEV.to AI · 4/8/2026

I Built a Personal Second Brain with Markdown Files and Claude Code — Here's How

The article describes building a 'Personal Second Brain' that uses an LLM (Claude Code) to construct a knowledge base in Markdown files. It processes a variety of sources, such as PDFs and YouTube transcripts, and generates structured, interlinked wiki pages in Obsidian.

Obsidian knowledge management AI Personal Second Brain

ARTICLE · DEV.to AI · 7d ago

I Plugged the Same Site Into 7 AI-Citation Trackers. They Reported 7 Different Numbers.

The author conducted an experiment using seven different AI-citation trackers to determine their website's visibility to AI search. The trackers yielded vastly different citation counts, with the author highlighting the inconsistency and preferring a tool for its transparency rather than its accuracy.

tool comparison citation tracking website analytics AI Search

ARTICLE · ML Mastery · 29d ago

Implementing Prompt Compression to Reduce Agentic Loop Costs

Agentic loops in production can lead to high costs, particularly with LLM and external API usage, where billing is often tied to token consumption. Implementing prompt compression offers an effective strategy to reduce these expenses.

prompt-engineering API usage Cost Optimization AI agents

RESEARCH · arXiv CS.CL · 4/8/2026

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

This paper proposes OmniScore, a family of deterministic metrics built on small models, for evaluating generated text more efficiently and reproducibly than LLM judges. It approximates the behavior of LLM judges, preserves low latency and consistency, and supports multidimensional evaluation in 107 languages.

OmniScore AI metrics multilingual text evaluation

RESEARCH · arXiv CS.AI · 4/30/2026

Persuadability and LLMs as Legal Decision Tools

This research explores how Large Language Models (LLMs) respond to legal arguments, examining factors that lead them to decide difficult questions. It focuses on their persuadability as potential legal decision-makers, emphasizing the need for decisions based on merit rather than advocacy skills.

ethics Decision-making persuasion legal tech

RESEARCH · arXiv CS.AI · 5/6/2026

Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy

This paper introduces the Virtual Speech Therapist (VST), an intelligent agent-based platform that streamlines stuttering assessment and delivers customized therapy through automated and adaptive AI-driven workflows. VST integrates deep learning for stuttering classification and multi-agent LLM reasoning to generate and refine individualized therapy plans, with a critic agent ensuring clinical safety and adherence to guidelines.

deep learning AI in healthcare speech therapy stuttering

RESEARCH · arXiv CS.AI · 5/6/2026

Effect-Transparent Governance for AI Workflow Architectures: Semantic Preservation, Expressive Minimality, and Decidability Boundaries

This research presents a machine-checked formalization of AI workflow architectures with effect-transparent governance, demonstrating that governance can be imposed without losing computational expressivity. It defines a governance operator G for mediating effectful directives like memory access and LLM queries, proving seven key properties including governed Turing completeness and a decidability boundary.

AI architecture Workflow formal methods AI Governance

RESEARCH · arXiv CS.AI · 5/6/2026

Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

A new algorithm is presented that learns correct sequential behavior from just 2-10 execution traces to validate new executions in autonomous agents. It combines compiler theory with multimodal LLM-powered semantic understanding to construct a generalized ground truth model, achieving high accuracy in detecting product bugs.

validation learning autonomous agents Algorithms

RESEARCH · arXiv CS.LG · 5/6/2026

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

This survey provides an optimizer-agnostic view of rollout strategies for RL-based post-training of reasoning LLMs. It formalizes rollout pipelines with a unified notation and introduces the Generate-Filter-Control-Replay (GFCR) lifecycle taxonomy, decomposing pipelines into four modular stages.

Rollout Strategies reinforcement learning machine learning AI research

RESEARCH · arXiv CS.AI · 5/6/2026

A Knowledge-Driven LLM-Based Decision-Support System for Explainable Defect Analysis and Mitigation Guidance in Laser Powder Bed Fusion

This work introduces a knowledge-driven, LLM-based decision-support system for explainable defect diagnosis and mitigation guidance in manufacturing, using Laser Powder Bed Fusion (LPBF) as a case study. The system integrates an ontological knowledge base of 27 LPBF defect types, supporting natural language queries and literature-backed explanations. It also features a multimodal module for interpreting microscopic defect images.

knowledge-driven AI defect analysis decision support Manufacturing

RESEARCH · arXiv CS.CL · 4/7/2026

VIGIL: An Extensible System for Real-Time Detection and Mitigation of Cognitive Bias Triggers

VIGIL is a new browser extension that detects and mitigates cognitive bias triggers in online information in real time. Developed to counter the disinformation risks of generative AI, it offers LLM-powered reformulations and focuses on the integrity of civic discourse.

disinformation cognitive bias browser extension Generative AI

RESEARCH · arXiv CS.AI · 4/8/2026

Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis

This paper addresses the challenge of sequential clinical diagnosis under uncertainty, where most LLM-based systems do not model the progressive acquisition of evidence. The authors propose the Latent Diagnostic Trajectory Learning (LDTL) framework, which uses LLM agents for planning and diagnosis and treats action sequences as latent paths.

Clinical Diagnosis Sequential Learning Latent Trajectory Uncertainty

RESEARCH · arXiv CS.CL · 4/7/2026

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

This research addresses the loss of diversity in LLM co-evolution systems, in which one model generates problems and another solves them; that loss undermines autonomous curriculum learning. To counter it, the authors introduce 'vocabulary dropout', a random mask that maintains diversity, which improves solver performance on mathematical reasoning.

mathematical reasoning diversity Co-evolution self-play

RESEARCH · arXiv CS.CL · 4/7/2026

Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations

This paper introduces a human-centered framework for evaluating how well LLMs' cultural representations align with the expectations of native populations. It derives cultural importance vectors from global surveys and uses them to compute and compare representation vectors for models such as Gemini 2.5 Pro, GPT-4o, and Claude 3.5 Haiku.

Cultural Representation AI Evaluation Human Study Diversity

RESEARCH · arXiv CS.CL · 4/8/2026

This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA

This study evaluates the sensitivity of large language models (LLMs) to how patient questions are phrased in medical QA settings. Using a controlled RAG environment, it investigates how framing (positive vs. negative) and language style affect the consistency of LLM answers.

prompt-engineering RAG natural language medical QA

RESEARCH · arXiv CS.CL · 4/6/2026

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

This study addresses the risks of LLMs in mental health support, focusing on users with psychosis, for whom the models may reinforce delusions and hallucinations. It proposes a scalable safety evaluation method that uses clinical criteria and LLMs as evaluators (LLM-as-a-Judge/Jury), and demonstrates alignment with human consensus.

LLM-as-a-judge psychosis Mental Health automated evaluation

RESEARCH · arXiv CS.LG · 4/6/2026

LLM Reasoning with Process Rewards for Outcome-Guided Steps

This paper presents PROGRS, a framework for improving mathematical reasoning in LLMs that combines process reward models (PRMs) with prioritization of final-answer correctness. It aims to solve the problem of PRMs rewarding fluent intermediate reasoning that leads to incorrect answers, optimizing learning with better-aligned feedback.

mathematical reasoning Process Rewards reinforcement learning AI

RESEARCH · arXiv CS.CL · 4/6/2026

Train Yourself as an LLM: Exploring Effects of AI Literacy on Persuasion via Role-playing LLM Training

This study presents LLMimic, a gamified, interactive tutorial that lets participants simulate training an LLM in order to increase AI literacy. The research evaluates how this proactive intervention mitigates AI persuasion in realistic scenarios, such as donations or recommendations, compared with a control group.

human-computer interaction role-playing gamification AI Training