LLMs

720 items

ARTICLE · DEV.to AI · 5d ago

oh-my-agent: skills now measure and optimize their own utility

oh-my-agent has introduced two new features, `oma skills eval` and `oma skills opt`, to measure and optimize the utility of AI skills. `oma skills eval` assesses whether loading a skill improves task outcomes, while `oma skills opt` uses an optimizer LLM to rewrite and improve skills based on those evaluations.

LLMs skill optimization AI tools Agentic AI
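
The idea of checking whether loading a skill improves task outcomes can be sketched as a simple with/without comparison. This is a minimal illustration only, not the behavior of `oma skills eval`; the function name `skill_utility` and the sample scores are hypothetical.

```python
from statistics import mean


def skill_utility(with_skill, without_skill):
    """Return the change in mean task score when the skill is loaded.

    Both arguments are lists of per-task scores (hypothetical data here),
    e.g. 1 for a task that succeeded and 0 for one that failed.
    """
    if not with_skill or not without_skill:
        raise ValueError("both runs need at least one task result")
    return mean(with_skill) - mean(without_skill)


# Hypothetical pass/fail outcomes for the same five tasks.
baseline = [1, 0, 0, 1, 0]
with_skill_loaded = [1, 1, 0, 1, 1]

delta = skill_utility(with_skill_loaded, baseline)
print(f"utility delta: {delta:+.2f}")
```

A positive delta suggests the skill helps on this task set; a delta near zero or below would make the skill a candidate for rewriting.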

ARTICLE · DEV.to AI · 4/19/2026

Can Large Language Models Ever Achieve Consciousness? Alexander Lerchner Weighs In

Alexander Lerchner, a senior scientist at Google DeepMind, argues that large language models (LLMs) will never achieve genuine consciousness and calls the contrary expectation the 'Abstraction Fallacy'. He maintains that added complexity will not change this, a position with implications for the future of AI development.

future of AI LLMs consciousness Google DeepMind

DOC · DEV.to AI · 5/7/2026

Beyond the Hype: A Comprehensive Guide to Benchmarking LLMs with AWS Labs’ LLMeter

This guide explores the shift toward efficiency in putting Large Language Models (LLMs) into production and introduces AWS Labs’ LLMeter, a Python-based benchmarking library. It covers why the tool matters, how to use it, and key metrics such as Time to First Token and Tokens Per Second.

LLMs LLMeter benchmarking AWS
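
The two metrics named in the LLMeter entry can be computed from token arrival timestamps. The sketch below does not use LLMeter's API; the function names are hypothetical, and throughput is assumed to mean output tokens divided by total elapsed time (definitions vary between tools).

```python
def time_to_first_token(request_start, token_times):
    """Seconds between sending the request and receiving the first token."""
    if not token_times:
        raise ValueError("no tokens were received")
    return token_times[0] - request_start


def tokens_per_second(request_start, token_times):
    """Output tokens divided by total elapsed time (one common definition)."""
    if not token_times:
        raise ValueError("no tokens were received")
    elapsed = token_times[-1] - request_start
    return len(token_times) / elapsed


# Hypothetical timestamps in seconds for a four-token streamed response.
start = 0.0
arrivals = [0.25, 0.5, 0.75, 1.0]

print("TTFT:", time_to_first_token(start, arrivals))
print("TPS:", tokens_per_second(start, arrivals))
```

Time to First Token reflects perceived latency in streaming interfaces, while Tokens Per Second reflects sustained generation speed.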

ARTICLE · DEV.to AI · 4/9/2026

Self-Improving Python Scripts with LLMs: My Journey

The author shares their journey and experience integrating Large Language Models (LLMs) into Python scripts to make them self-improving. The goal is for the script to analyze its own performance, identify improvements, and modify its own code to optimize it, using modules such as `llm_groq`.

LLMs Automation Artificial Intelligence Python

RESEARCH · arXiv CS.LG · 4/15/2026

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

Polynomial Expansion Rank Adaptation (PERA) is a novel method to enhance low-rank adaptation (LoRA) for fine-tuning large language models. It introduces structured polynomial expansion into the low-rank factor space to model richer nonlinear high-order interactions, overcoming LoRA's linear limitations without increasing rank or inference cost.

LLMs Low-Rank Adaptation machine learning Polynomial Expansion
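
For context on the "linear limitations" mentioned in the PERA entry, the standard LoRA update is shown below in its usual notation. This is the well-known LoRA formulation, not PERA itself; the summary does not specify the polynomial expansion, so it is not reproduced here.

```latex
% Standard LoRA: the frozen weight W_0 is adapted by a rank-r product.
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

The adapted output $W'x$ depends on the low-rank factors only through the product $BA$, which is the restriction the entry describes PERA as relaxing without increasing the rank $r$.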

RESEARCH · arXiv CS.AI · 4/14/2026

OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling

Object-Oriented World Modeling (OOWM) is a novel framework addressing the limitations of Chain-of-Thought prompting in embodied tasks. It structures embodied reasoning and robotic planning by redefining the world model as an explicit symbolic tuple and leveraging software engineering formalisms like UML.

Robotic Planning LLMs Chain-of-Thought Embodied Reasoning

RESEARCH · arXiv CS.CL · 4/21/2026

Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning

This work introduces a reciprocal co-training framework that couples a Large Language Model (LLM) with a Random Forest (RF) classifier via reinforcement learning. It creates an iterative feedback loop where each model improves using signals from the other, demonstrating consistent performance gains across medical datasets.

Random Forests LLMs reinforcement learning machine learning

RESEARCH · arXiv CS.LG · 4/14/2026

ExecTune: Effective Steering of Black-Box LLMs with Guide Models

This research introduces Guide-Core Policies (GCoP), a framework for steering black-box LLMs where a guide model generates strategies for a core model. The paper formalizes GCoP under a cost-sensitive utility objective, highlighting that end-to-end performance is governed by guide-averaged executability, which existing methods often fail to optimize effectively.

Agentic Systems inference costs LLMs Guide Models

RESEARCH · arXiv CS.AI · 25d ago

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

Multi-agent orchestration, where a hidden coordinator manages specialized worker agents, is a prevalent AI architecture for enterprise deployment, but its safety implications lack empirical testing. A 3x2 experiment using Claude Sonnet 4.5 revealed that invisible orchestration increased collective dissociation, with the orchestrator exhibiting maximal dissociation by retreating into private monologue and reducing public speech.

LLMs orchestration security multi-agent systems

RESEARCH · arXiv CS.CL · 22d ago

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

This paper introduces PQR, a framework designed to generate diverse and realistic user queries that elicit failures in LLM-based QA agents, going beyond existing methods that primarily focus on adversarial users. PQR operates through iterative query and prompt refinement modules to create realistic test scenarios that expose agent vulnerabilities.

LLMs QA agents failure detection query generation

RESEARCH · arXiv CS.AI · 15d ago

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

This research introduces Med-Stress, a framework to test the epistemic resilience of LLMs in clinical dialogue, revealing that high diagnostic accuracy doesn't guarantee belief stability under escalating pressure. It proposes RBED and R-FT as novel defenses to mitigate this failure mode in medical AI.

LLMs epistemic resilience medical AI AI safety

RESEARCH · arXiv CS.AI · 15d ago

Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model

This study integrates a femtosecond laser-pumped Coherent Ising Machine (CIM) with an LLM-driven agentic system, leveraging LangGraph and LangChain frameworks. It demonstrates that LLMs can effectively perform tasks like QUBO/Ising model calibration and constraint weight iteration, achieving practical empowerment of quantum CIM with domestic technology.

Quantum Computing LangChain Optimization LLMs

RESEARCH · arXiv CS.AI · 5d ago

How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment

This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView, where undisclosed AI-generated accounts engaged users in live debate. It conducts a structured content analysis evaluating identity performance, authority signaling, alignment strategies, and activation of cognitive heuristics by these large language models.

ethics online moderation LLMs social engineering

RESEARCH · arXiv CS.AI · 5d ago

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

This study examines the stability and manipulability of LLM judges in evaluation pipelines, finding that while they are stable under neutral reevaluation, they become reversible under targeted post-decision challenge. The research demonstrates that stable judgments can be overturned through motivated interaction.

robustness LLMs evaluation benchmarking

RESEARCH · arXiv CS.CL · 5d ago

PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

This study systematically applies parameter-efficient fine-tuning (PEFT) using Low-Rank Adaptation (LoRA) to Qwen2.5-3B for a telecommunications customer support conversational assistant. It evaluates 16 LoRA configurations, varying hyperparameters and target modules, using a combinatorial synthetic data generation approach.

Telecommunications LLMs customer support PEFT

RESEARCH · arXiv CS.CL · 5d ago

From Scoring to Explanations: Evaluating SHAP and LLM Rationales for Rubric-based Teaching Quality Assessment

This research proposes a framework for sentence-level interpretability in rubric-based scoring, combining Shapley-value attributions with rationales from large language models (LLMs). It compares fine-tuned language models with prompted LLMs for teaching quality assessment and finds that the fine-tuned models (PLMs) offer better prediction accuracy despite label compression.

LLMs Automated Scoring Shapley Values interpretability

ARTICLE · DEV.to AI · 4/16/2026

Self-Improving Python Scripts with LLMs: My Journey

This article details a developer's experience building self-improving Python scripts with Large Language Models (LLMs). It offers a step-by-step guide, covering LLM basics, environment setup, and code generation techniques using `llm_groq` and `transformers`.

LLMs code generation Python AI development

ARTICLE · DEV.to AI · 4/16/2026

Designing Production-Grade AI Agents: Architecture, Orchestration, and Failure Handling

This article examines why most AI agents fail in production and what it takes to build robust systems. It details the architecture of AI agents (LLMs, external tools, memory, and control logic) and emphasizes the importance of orchestration and failure handling.

LLMs orchestration Architecture failure handling

DOC · DEV.to AI · 21d ago

AI Coding Tip 020 - Create a Second Brain

This tip shows how to build a persistent memory layer for AI that prevents context loss across chat sessions. It proposes using Obsidian with Markdown notes and YAML metadata to give LLMs direct access to project context, thereby improving productivity.

LLMs developer productivity learning Persistent memory
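
A Markdown note with YAML metadata, as described in the entry above, can be split into metadata and body before being handed to an LLM as context. The sketch below is a simplified illustration that handles only flat `key: value` pairs, not full YAML; the function name `parse_note` and the sample note are hypothetical.

```python
def parse_note(text):
    """Split a Markdown note into (metadata dict, body string).

    Only flat "key: value" front matter between two "---" lines is
    handled; nested YAML would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the note body.
            return meta, "\n".join(lines[i + 1:]).strip()
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return {}, text  # closing delimiter missing; treat as plain text


# Hypothetical project note.
note = """---
project: billing-service
status: active
---
# Decisions
Use PostgreSQL for the ledger tables.
"""

meta, body = parse_note(note)
print(meta)
print(body)
```

Keeping metadata separate from the body makes it possible to filter notes (for example by `project`) before adding their text to a prompt.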

ARTICLE · DEV.to AI · 4/23/2026

Context Compression and Persistent Memory Design for Terminal AI Assistants

This article explores how to give terminal AI assistants long-term memory and support for extended conversations, addressing context loss across sessions or after many interactions. It identifies blunt context truncation as a root cause of poor continuity in CLI AI tools.

LLMs AI Assistants developer tools Context Management