Tool Use: How Agents Take Action

Tool use is how an agent moves beyond pure text generation and interacts with external systems. Reliable tool use depends on more than choosing a function name. It depends on arguments, execution control, permissions, and verification.

Tool use is how an agent turns reasoning into action.

That is the short answer.

If you want the more operational version, use this:

Tool use is the part of an agent system that lets the model request capabilities outside itself, and lets the runtime execute those requests against real data, software, and environments.

Without tools, an agent can still explain, summarize, plan, or simulate.

But it cannot reliably:

  - inspect fresh system state
  - retrieve protected data
  - take external actions

That is why tool use is one of the core components of agent engineering.

It is the boundary where the system moves from language generation into controlled interaction with the world.

Why Agents Need Tools

Planning alone does not let a system act.

As Planning and Task Decomposition explains, an agent can break a goal into sensible next steps. But a plan remains theory until the system can inspect real state or cause a real change.

An agent can produce a perfect plan in text and still be useless if it has no way to inspect live state or carry out the next step.

Suppose an agent decides:

  1. check the status of the failed billing job
  2. inspect logs
  3. look for a recent config change
  4. retry the job if the dependency recovered

That is a reasonable plan.

But the plan only matters if the system can actually:

  - query the status of the billing job
  - read the relevant logs
  - inspect recent config changes
  - trigger a retry

This is the difference between talking about work and doing work.

So tool use is not an optional extra attached to agent systems.

It is how the act step becomes real.

What Counts as a Tool

In agent systems, a tool is any external capability the runtime makes available to the model.

That capability might be:

  - an API call into another service
  - a database or log query
  - a file or shell operation
  - a search over documents or the web

The important point is not whether the implementation is called a function, tool, action, command, skill, or capability.

The important point is that the model cannot perform that operation by itself. It has to request it through a defined interface.

That interface tells the model:

  - what the capability does
  - what inputs it expects
  - what kind of result it returns
  - when it is appropriate to use

So tool use is not magic.

It is an interface contract between the model and the application runtime.
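
As a concrete sketch of that contract, here is what a definition for the retry_job tool used later in this article might look like, written in the JSON-Schema style most function-calling APIs use. The exact field names are illustrative assumptions, not any vendor's spec.

    # A hypothetical tool definition in the JSON-Schema style.
    # Field names and descriptions are illustrative only.
    retry_job_tool = {
        "name": "retry_job",
        "description": "Retry a failed scheduled job after its blocking "
                       "dependency has recovered.",
        "parameters": {
            "type": "object",
            "properties": {
                "job_id": {
                    "type": "string",
                    "description": "Identifier of the failed job.",
                },
                "reason": {
                    "type": "string",
                    "description": "Why the retry is expected to succeed now.",
                },
            },
            "required": ["job_id", "reason"],
        },
    }

Everything the model needs to reason about the capability lives in that one structure: what it does, what inputs it takes, and which of them are required.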

What Actually Happens During a Tool Call

Many people imagine tool use as one event:

the model names a function, and the function runs.

In practice, it is a small protocol.

The usual flow looks like this:

  1. the application provides the model with a set of tools and their schemas
  2. the model decides whether one of those tools is needed
  3. the model emits a tool call with arguments
  4. the application runtime validates and executes the call
  5. the tool result is returned to the model as new context
  6. the model continues reasoning, asks for another tool, or produces a final answer
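
Here is a minimal sketch of that flow in Python. The call_model function is a stand-in for whatever model client you use, and the tool registry is hand-rolled; real SDKs differ in message shapes, but the control flow is the same.

    import json

    # Hypothetical registry mapping tool names to real functions.
    TOOL_REGISTRY = {
        "get_job_status": lambda job_id: {"job_id": job_id, "state": "failed"},
    }

    def run_agent(user_message, call_model, tools, max_steps=10):
        """Drive the model/tool loop until the model gives a final answer.

        call_model is assumed to return either a tool-call request or a
        final text answer.
        """
        messages = [{"role": "user", "content": user_message}]
        for _ in range(max_steps):
            response = call_model(messages=messages, tools=tools)  # steps 1-3
            if response["type"] != "tool_call":
                return response["content"]                         # final answer
            name, args = response["name"], response["arguments"]
            if name not in TOOL_REGISTRY:                          # step 4: validate
                result = {"error": f"unknown tool: {name}"}
            else:
                result = TOOL_REGISTRY[name](**args)               # step 4: execute
            messages.append(                                       # step 5: feed back
                {"role": "tool", "name": name, "content": json.dumps(result)}
            )                                                      # step 6: loop again
        raise RuntimeError("agent exceeded max_steps without finishing")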

That matters because the model is not directly reaching into your systems.

Your application is still the thing doing the real work.

That is the trust boundary:

the model can propose an action, but the runtime must decide whether that action is valid, allowed, executable, and complete.

The runtime decides:

  - whether the call is valid
  - whether it is allowed
  - whether it should execute now
  - how to judge whether it completed

This is why tool use belongs to both model design and systems design.

A Running Example: Resolving a Failed Nightly Job

Suppose a user asks an operations agent:

Figure out why the nightly billing job failed and take the safest next action.

The agent may need tools such as:

  - a tool that checks the status of a scheduled job
  - a tool that fetches recent logs
  - a tool that lists recent config changes
  - a retry_job tool that reruns the failed job

Now look at what has to go right.

The system does not only need the model to say use retry_job.

It also needs:

  - the model to form correct arguments: the right job, the right run
  - the runtime to confirm the action is allowed before executing it
  - a way to verify that the retry actually worked

If the model chooses the correct tool but forms the wrong arguments, or if the runtime executes the action without checking whether it is allowed, the system still fails.

That is the key lesson.

Tool selection is only the beginning.

The Hard Part Is Not Just Choosing a Tool

When teams first add tool use, they often focus on the visible moment where the model names a function.

That is too narrow.

A large share of real failures happen after the broadly correct tool has already been chosen.

For example:

  - the arguments name the wrong job or omit a required value
  - the call executes without the permission checks it needed
  - an ambiguous result gets misread as success
  - nothing verifies that the intended outcome actually happened

This is why tool reliability is not mainly a model-intelligence question.

It is an interface-quality question.

Good tool use depends on:

  - clear tool definitions
  - correct argument formation
  - validated, bounded execution
  - explicit permissions
  - verification of results

The Action Contract

The simplest way to evaluate a tool step is to treat it as a contract.

Use this test before trusting an agent action.

1. What Capability Is Being Requested?

The tool should describe one clear capability.

If the tool definition is vague, overlapping, or does too many things, the model has to guess when to use it.

2. What Exact Inputs Does Execution Require?

The system should know which inputs are required, which are optional, and which values should never be guessed.

If the runtime already knows a value, it should usually inject it itself instead of making the model reconstruct it.

3. What Limits or Approvals Apply Before Execution?

Not every tool call should execute just because the model emitted it.

Some actions need:

  - human approval before execution
  - permission or scope checks
  - rate limits or blast-radius limits

4. What Signals Prove the Action Worked or Failed?

The system should know how success is measured.

A tool call returning without an exception is not always proof that the intended outcome happened.
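
As an illustration, here is one way the runtime could verify the retry_job action from the running example. The get_job_status function is a hypothetical stand-in for a scheduler API that returns a dict with a state field.

    import time

    def verify_retry(job_id, get_job_status, timeout_s=300, poll_s=15):
        """Confirm a retried job reached a terminal success state."""
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            state = get_job_status(job_id)["state"]
            if state == "succeeded":
                return True          # the outcome happened, not just the call
            if state == "failed":
                return False         # the retry ran and failed again
            time.sleep(poll_s)       # still queued or running: keep waiting
        return False                 # timed out: treat the outcome as unverified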

That is the Action Contract.

A tool step is much stronger when the capability, inputs, limits, and verification signals are all explicit.

Apply that to retry_job in the nightly-job example:

  - capability: rerun the failed nightly billing job
  - inputs: which job and which run, never guessed
  - limits: whether a production retry needs approval first
  - verification: the job's status after the retry, not just the API response

If one of those four pieces is unclear, the call is weaker than it looks.

Why Argument Formation Matters So Much

A tool call is not useful if the arguments are weak.

This is one of the easiest places to underestimate the engineering work.

Go back to the nightly-job example.

A poor retry_job interface might ask the model to provide:

  - the job identifier
  - the environment
  - the exact run that failed
  - retry parameters
  - notification settings

all in one call, even though the application may already know some of those values or be able to derive them deterministically.

That design forces the model to synthesize too much.

A stronger system often moves some of that burden out of the model and into code.

For example:

  - the runtime injects the job identifier from the incident context
  - the environment is derived from configuration, not from model output
  - the model supplies only the parts that require judgment, such as why the retry should succeed now

This is a general rule.

Do not make the model guess inputs that the system already knows or can derive more reliably.
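
A sketch of that rule in code, assuming a hypothetical runtime object that holds what the application already knows:

    def execute_retry_job(model_args, runtime):
        """Merge model-supplied arguments with runtime-known values."""
        args = {
            "job_id": runtime.incident.job_id,       # known: injected, never guessed
            "environment": runtime.environment,      # known: derived from config
            "run_date": runtime.incident.run_date,   # known: from the failure record
            "reason": model_args["reason"],          # judgment: supplied by the model
        }
        return runtime.scheduler.retry(**args)

The model-facing schema then only needs to expose the reason field, because everything else is resolved deterministically in code.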

Validation, Permissions, and Bounded Execution

This is where tool use stops being a product demo and becomes engineering.

If the model emits a call, the runtime should still be able to say:

  - no
  - not now
  - not with these arguments
  - not without approval

That means safe tool use usually needs more than a tool schema.

It needs execution boundaries.

Common boundaries include:

  - allowlisting which tools are available for a given task
  - validating arguments against the tool schema before execution
  - scoping permissions to the minimum the task requires
  - rate limits and budgets on expensive or repeated calls
  - approval gates for irreversible or high-impact actions
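
A minimal sketch of such a gate, assuming hypothetical validation and approval hooks. The point is only that these checks run in code, after the model has emitted the call:

    IRREVERSIBLE_TOOLS = {"retry_job", "issue_refund"}  # illustrative policy

    def gate_tool_call(name, args, allowed_tools, validate_args, request_approval):
        """Decide whether an emitted tool call may execute.

        validate_args and request_approval are hypothetical hooks: schema
        validation and a human-approval channel, respectively.
        """
        if name not in allowed_tools:
            return False, f"tool '{name}' is not allowed for this task"
        ok, why = validate_args(name, args)
        if not ok:
            return False, f"invalid arguments: {why}"
        if name in IRREVERSIBLE_TOOLS and not request_approval(name, args):
            return False, "approval denied"
        return True, "approved"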

This does not make the system less agentic.

It makes the system governable.

Tool Results Are New Observations, Not Final Truth

A tool result should feed the loop back into sensing and reasoning.

That means the agent should treat the result as new state to interpret.

For example:

  - a retry_job call that returns queued does not mean the job succeeded
  - a log query that returns nothing may mean no errors, or the wrong time window
  - a status check may already be stale by the time the next step runs

So a good agent does not think:

I called the tool, therefore the task is done.

It thinks:

I have new evidence. What does this change about the next step?
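
That habit can even be made mechanical. A toy sketch, assuming the same result shape as the verification example earlier:

    def next_step_after_retry(result):
        """Map a raw tool result to the agent's next move."""
        state = result.get("state")
        if state == "succeeded":
            return "report success and record what changed"
        if state in ("queued", "running"):
            return "wait, then re-check status before claiming completion"
        if state == "failed":
            return "stop retrying and escalate with the collected evidence"
        return "treat the result as ambiguous and gather more state"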

This is one reason tool use connects directly to The Sense-Think-Act Loop. Tool results become new sensed state. They also connect to memory, because the system may need to preserve what was tried, what changed, and what still remains unresolved.

Tool Use Is Not the Same as a Workflow Step

This matters because both involve actions, but they solve different problems.

A workflow step is usually predefined by the designer. That is the same core distinction made in LLMs, Workflows, and Agents: What Actually Changes? Action capability is not the same thing as control structure.

Tool use is the action capability available to the runtime.

A workflow might say:

  1. check account status
  2. check refund eligibility
  3. ask for approval if needed
  4. issue refund

Inside that workflow, the actual operations may still happen through tools.
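
To make the contrast concrete, here is that workflow sketched as code. Every method on the tools object is a hypothetical stand-in for a tool invocation; the control structure is fixed by the designer, and only the individual operations go through tools.

    def refund_workflow(account_id, amount, tools, approver):
        """A designer-defined workflow whose steps execute through tools."""
        if tools.check_account_status(account_id) != "active":       # step 1
            return "account not active"
        if not tools.check_refund_eligibility(account_id, amount):   # step 2
            return "not eligible"
        if amount > 100 and not approver.approve(account_id, amount):
            return "approval denied"            # step 3 (threshold illustrative)
        return tools.issue_refund(account_id, amount)                # step 4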

In a more agentic system, those decisions move to runtime. The model may decide:

  - which tool to call next
  - in what order
  - when the evidence is strong enough to stop

So tool use is part of the action layer.

Workflows and plans decide how that layer is used.

What Strong Tool Use Looks Like

A well-engineered tool layer usually has these properties:

The Tool Set Is Small Enough to Be Legible

Too many overlapping tools make selection worse.

Each Tool Has a Clear Job

Tools should not hide several unrelated capabilities behind one vague name.

Inputs Are Structured Around Real Needs

The schema should match what execution actually requires.

The Runtime Enforces Limits

Policies and approvals should live in code, not only in prompts.

Results Are Returned in a Way the Agent Can Reason About

The agent needs outputs that make success, failure, and ambiguity visible.
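
One common convention, sketched here as an assumption rather than a standard, is to separate "the call completed" from "the outcome is verified" in the result shape itself:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class ToolResult:
        """An illustrative result shape that keeps success, failure, and
        ambiguity visible to the agent."""
        ok: bool                     # did the call itself complete?
        verified: Optional[bool]     # did the intended outcome demonstrably happen?
        data: dict = field(default_factory=dict)  # raw payload to interpret
        note: str = ""               # caveat, e.g. "status still queued"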

Every Action Has a Verification Path

If the system cannot check whether the action worked, it has not really closed the loop.

Why This Matters for the Next Articles

Once tool use is in place, several later topics become easier to understand.

Memory matters because the agent needs to remember prior actions, results, and unresolved state.

ReAct matters because tool use is one of the clearest cases of a think-act-observe loop.

Guardrails matter because the runtime needs bounded execution, typed outputs, and permission control.

Evaluation matters because a final answer can look good even when the trajectory used the wrong tools, passed weak arguments, or failed to verify success.

So tool use is not just another feature on the list.

It is the point where agent design starts touching reality.

FAQ

Can an agent still be useful without tools?

Yes, but only within narrower limits. A tool-less agent can still explain, brainstorm, summarize, or help structure work. What it cannot do reliably is inspect fresh system state, retrieve protected data, or take external actions.

Is tool use just function calling?

Function calling is one common interface mechanism for tool use, but the larger concept is broader. Tool use includes the whole runtime loop around capability exposure, argument formation, execution, result handling, and continued reasoning.

If every action goes through predefined tools, is the system still an agent?

Yes. Agents do not stop being agents because their action space is bounded. In practice, bounded action spaces are often what make agent behavior usable, auditable, and safe.

Why do tool calls fail even when the model picked the right tool?

Because the failure often happens at the next layer: missing arguments, invalid values, ambiguous results, unsafe execution, missing permissions, or weak post-action checks.

Should the model supply every tool argument?

No. If the runtime already knows a value or can derive it deterministically, it should usually do that itself. The model should be responsible mainly for the parts that actually require interpretation or choice.

How many tools should an agent have at once?

There is no universal number, but smaller and clearer tool sets are generally easier for the model to use well. Too many overlapping tools increase confusion and misfires.

Where should permissions and approvals live?

In the runtime layer, not only in the prompt. Prompts can express intent, but policy enforcement has to happen in code and system controls.

Are tool results always trustworthy?

No. A tool result is new evidence, not guaranteed truth. It may be partial, stale, ambiguous, or operationally incomplete, so the agent still needs to reason about what the result means.

How does tool use connect to memory?

Tool use creates state that may need to persist across steps or sessions: what was queried, what changed, what failed, and what still needs follow-up.

How does tool use connect to evaluation?

A strong evaluation does not only ask whether the final answer looked correct. It asks whether the system chose the right tools, formed good arguments, stayed within bounds, and verified the outcome properly.