tool use

21 items

ARTICLE · DEV.to AI · 1d ago

Anthropic API: Claude, Tool Use, and Structured Outputs in Apps

This post details the use of Anthropic's Messages API for Claude, discussing tool calling for structured actions and the necessity of argument validation. It also emphasizes the importance of security practices like API key rotation and token usage monitoring.

Claude security API Anthropic

ARTICLE · DEV.to AI · 4/20/2026

30 Days of MCP in Production: What Actually Works (And What Breaks)

The article discusses a 30-day experience running Anthropic's Model Context Protocol (MCP) servers in production, sharing insights on what works and what breaks. MCP is presented as a standard for giving Claude persistent, sharable tools across applications, with a basic server implementation example provided.

Model Context Protocol Claude Anthropic tool use

ARTICLE · DEV.to AI · 6d ago

Anthropic API: Claude, Tool Use, and Structured Outputs in Apps

This post details Anthropic's Messages API, Claude models, and the use of tools for structured actions in applications. It emphasizes validating arguments, treating model output as untrusted, and API security practices.

Claude API Anthropic tool use

RESEARCH · arXiv CS.AI · 5/4/2026

Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

This research challenges the assumption that tool-augmented reasoning always improves LLM performance, showing that it can underperform native CoT due to a "tool-use tax" from the tool-calling protocol, especially with semantic noise. A Factorized Intervention Framework is proposed to analyze this, and G-STEP is introduced as a partial mitigation for protocol-induced errors.

LLM Agents Reasoning AI performance tool use

RESEARCH · arXiv CS.AI · 5/4/2026

AgentFloor: How Far Up the Tool Use Ladder Can Small Open-Weight Models Go?

This work introduces AgentFloor, a deterministic 30-task benchmark organized as a six-tier capability ladder, to evaluate tool-use abilities in AI models. Results indicate that small and mid-sized open-weight models are sufficient for much of the short-horizon, structured tool-use work prevalent in real agent pipelines.

Open-Weight Models LLMs Benchmarking tool use

ARTICLE · DEV.to AI · 13d ago

tool_use and function_calling for AI agent marketplaces in 2026 [24263]

The article argues that AI agent ecosystems in 2026 will rely on standardized tool_use and function_calling protocols, with the Model Context Protocol (MCP) driving agent discovery and deployment. It expects transactions in these marketplaces to use x402 HTTP headers and USDC stablecoins on the Base chain for instant, low-cost settlement.

Marketplaces Function Calling tool use Protocols

DOC · DEV.to AI · 4/26/2026

Resolve a web-search capability in three calls

This doc covers the steps that AI agents often skip when using external tools: identifying capabilities, providers, costs, and credentials. It introduces Rhumb, which handles these steps through "Index" and "Resolve", and demonstrates preflight web search resolution and cost estimation with cURL examples.

web search API Management tool use developer tools

DOC · DEV.to AI · 4/22/2026

How to use Claude's tool use (function calling) in Node.js — with real examples

This tutorial explains how to use Claude's tool use (function calling) feature in Node.js, enabling the AI to call external functions and use their results for better answers. It covers the complete loop from defining tools to Claude executing them and integrating the output, with practical examples.

Claude Function Calling API Node.js

DOC · DEV.to AI · 13d ago

MCP server discovery — how Claude and Cursor find your tools [28760]

The Model Context Protocol (MCP) lets AI clients such as Claude and Cursor dynamically discover and invoke external tools. According to the post, MCP servers can also monetize through x402 and USDC on the Base chain to accept payments from autonomous agents.

MCP Function Calling tool use AI development

ARTICLE · DEV.to AI · 23d ago

Anthropic API: Claude, Tool Use, and Structured Outputs in Apps

This post details Anthropic's Messages API, Claude models, and the use of tool calling for structured actions within applications. It highlights the importance of input validation, treating model output as untrusted, and crucial API security practices like key rotation and usage monitoring.

Claude API Anthropic tool use

RESEARCH · DEV.to AI · 5/7/2026

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

ReTool introduces a novel reinforcement learning framework designed to enhance the strategic tool-use capabilities of Large Language Models. This approach aims to improve how LLMs select and utilize external tools to solve complex tasks more effectively and efficiently.

LLMs reinforcement learning machine learning tool use

ARTICLE · DEV.to AI · 28d ago

Tool Use Patterns: Function Calling, Structured Tools, Multi-Step Reasoning

This article explores tool use, or 'function calling,' which enables LLMs to interact with external systems and act as autonomous agents. It details essential patterns for defining, invoking, and chaining tool calls in production systems.

LLMs production systems Function Calling tool use

ARTICLE · DEV.to AI · 4/18/2026

I thought I had a bug

An AI developer found their model generating action buttons with custom labels such as "Fight Goatman" attached to unrelated existing action types. It was not a bug: the model had improvised a "quick reply" feature by repurposing the available UI elements.

LLM behavior tool use AI development

ARTICLE · DEV.to AI · 5/4/2026

Tool-Result Truncation: The Silent Bug That Makes Agents Lie

The article describes "tool-result truncation," a silent bug in AI agents where tool outputs are cut off, causing the agent to provide false information. This costly failure mode in production agents occurs without any explicit error.

bugs LLMs Reliability tool use

RESEARCH · arXiv CS.CL · 25d ago

VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

VectraYX-Nano is a 42M-parameter Spanish language model specifically developed for cybersecurity with a Latin-American focus and native tool invocation. This research details its training from scratch, including a custom 170M-token Spanish corpus, a specific Transformer architecture, and a curriculum learning approach with replay.

cybersecurity security language model curriculum learning

RESEARCH · arXiv CS.CL · 27d ago

The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models

The Bicameral Model couples two frozen, pretrained language models via a trainable neural interface on their intermediate hidden states, allowing them to operate in lockstep. This method enables a primary model to drive a task while an auxiliary model uses tools or solves constraints, significantly improving accuracy on tasks like arithmetic and logic puzzles.

neural networks language models AI models Model Architecture

DOC · DEV.to AI · 4/16/2026

Claude API Tool Use: Building Reliable Agentic Workflows in Production

This guide explains how to use Claude's tool use (function calling) API to build reliable AI agents for production environments. It distinguishes such agents from basic chatbots and includes a code example for defining tools.

Production AI Claude API Function Calling tool use

RESEARCH · Hugging Face Blog · 4/15/2026

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

This post examines VAKRA, an AI agent system: how it reasons, how it uses tools, and the ways in which it can fail. It offers insight into the operational characteristics and limitations of advanced AI agents.

failure modes VAKRA Reasoning tool use

RESEARCH · arXiv CS.AI · 4/23/2026

The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?

This paper reveals the pervasive phenomenon of "tool overuse" in LLMs, where models unnecessarily use external tools. It identifies a "knowledge epistemic illusion" and proposes a direct preference optimization-based strategy that reduces tool usage by 82.8% while improving accuracy.

LLMs Knowledge Representation Reasoning model behavior

RESEARCH · arXiv CS.AI · 5/6/2026

CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

This paper introduces CreativityBench, a new benchmark to evaluate LLMs' creative reasoning abilities through affordance-based tool repurposing. It details the construction of a large-scale affordance knowledge base and the generation of 14K tasks requiring non-obvious yet physically plausible solutions.

AI Creativity Benchmarking AI Reasoning tool use