ReAct and the Basic Reasoning Loop | AgentEngineering.org

ReAct is a way of running an agent loop where the system reasons about the next move, takes an action, looks at what happened, and then reasons again.

That is the short answer.

If you want the more practical version, use this:

ReAct is a repeated thought-action-observation cycle for tasks where the next step depends on what the previous step discovered.

That is why it matters.

A broad loop like The Sense-Think-Act Loop tells you that agents observe, reason, and act. ReAct is more specific. It gives the agent a concrete pattern for doing that work over multiple turns.

This article builds on Goals, Constraints, and Success Conditions, Planning and Task Decomposition, and Tool Use: How Agents Take Action. Those articles explain what the agent is trying to achieve, how it breaks work down, and how it acts. This one explains the first practical reasoning pattern that ties those pieces together during execution.

ReAct Is More Specific Than the General Agent Loop

One source of confusion is that people use "ReAct" as if it meant any agent that loops.

That is too broad.

The general agent loop says:

inspect state
think about the task
take an action
use feedback

That is an architecture pattern.

ReAct is a more specific execution pattern inside that architecture.

It says the agent should:

state or form the next local reasoning step
take one concrete action or tool step
inspect the observation that comes back
use that observation to decide the next move

So the important difference is not that ReAct loops.

Many systems loop.

The important difference is that ReAct makes the next step depend explicitly on the evidence returned from the previous one.

That is why ReAct sits between the very broad control loop and more advanced patterns like planner-first architectures, reflection systems, or search-based reasoning.
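
The four steps listed above can be sketched as a short loop. This is a minimal illustration, not a reference implementation: Step, react_loop, toy_decide, and the status tool are hypothetical names, and the decide callable stands in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str]   # name of the tool to call, or None to finish
    argument: str = ""


def react_loop(decide: Callable[[list[str]], Step],
               tools: dict[str, Callable[[str], str]],
               max_turns: int = 5) -> list[str]:
    """Run thought -> action -> observation until decide() returns no action."""
    history: list[str] = []
    for _ in range(max_turns):
        step = decide(history)              # reasoning sees all prior evidence
        history.append(f"thought: {step.thought}")
        if step.action is None:             # the policy decided it is done
            break
        observation = tools[step.action](step.argument)
        history.append(f"action: {step.action}({step.argument})")
        history.append(f"observation: {observation}")
    return history


# Toy policy (an assumption): ask for the status once, then stop.
def toy_decide(history: list[str]) -> Step:
    if not any(h.startswith("observation:") for h in history):
        return Step("I need the job status first", "status", "billing")
    return Step("the status is known, nothing more to check", None)


trace = react_loop(toy_decide, {"status": lambda job: f"{job} job failed"})
```

The point of the sketch is the data flow: decide receives the full history, so every thought after the first is conditioned on an observation.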

The Micro-Hypothesis Loop

The simplest way to make ReAct legible is to treat each turn as a tiny test.

Call this The Micro-Hypothesis Loop.

It has three parts.

1. Hypothesis

The agent forms the next local theory about what would move the task forward.

That might sound like:

the billing job probably failed because a dependency timed out
the answer is probably in the account history
I likely need one more search before I can finalize the recommendation

This is the thought part of ReAct.

The purpose is not to create a perfect master plan.

The purpose is to make the next move explicit enough to test.

2. Probe

The agent uses a tool or action to test that local theory.

That might mean:

query logs
call a search tool
inspect a database record
ask a follow-up question
run a validation step

This is the action part.

The important thing is that the action is chosen because it is expected to reduce uncertainty or move the task forward.

3. Evidence

The tool or environment returns an observation.

The agent then has to decide:

did the result support the hypothesis?
did it weaken the hypothesis?
did it expose a different problem?
is the task complete now?

This is the observation part.

That observation becomes the input to the next loop.

So ReAct is not just "think, then act."

It is:

form a local hypothesis, run a probe, absorb the evidence, and choose the next move from there.

That is the real job of the loop.
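
One way to make the three parts concrete is to record each turn as a small data structure. The sketch below is only an assumption about how such a record could look. Turn, Verdict, and next_move are illustrative names, and the four verdicts mirror the four questions the agent asks about the evidence.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"                  # evidence supports the hypothesis
    WEAKENED = "weakened"                    # evidence weakens the hypothesis
    DIFFERENT_PROBLEM = "different problem"  # evidence exposes something else
    COMPLETE = "complete"                    # the task is done


@dataclass(frozen=True)
class Turn:
    hypothesis: str   # the local theory being tested
    probe: str        # the action or tool call used to test it
    evidence: str     # the observation that came back
    verdict: Verdict  # how the evidence bears on the hypothesis


def next_move(turn: Turn) -> str:
    """Map the verdict on the last turn to the kind of move that follows."""
    if turn.verdict is Verdict.COMPLETE:
        return "stop"
    if turn.verdict is Verdict.SUPPORTED:
        return "refine: build the next hypothesis on this one"
    return "revise: form a new hypothesis from the evidence"
```

In a real agent the verdict would come from the model reading the observation. Making it an explicit field is a design choice that keeps each turn inspectable after the run.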

A Running Example: Investigating a Failed Billing Job

Suppose an operations agent is given this goal:

Figure out why the nightly billing job failed and take the safest next step.

That goal already assumes the boundaries described in Goals, Constraints, and Success Conditions: the system should act toward recovery, stay inside its permissions, and stop when the evidence is good enough.

A ReAct-style run might look like this.

Turn 1

Hypothesis: the job may have failed because the payment provider timed out
Probe: read the job logs and dependency health
Evidence: logs show repeated timeout errors from the provider API

Turn 2

Hypothesis: this may be a transient upstream outage rather than a local code issue
Probe: check the provider status page and recent internal deploys
Evidence: provider status degraded, no recent internal deploy affecting billing

Turn 3

Hypothesis: one retry may be safe because the likely cause is transient and a single retry is within the retry policy
Probe: retry the job once
Evidence: retry succeeds, backlog clears

Turn 4

Hypothesis: the run can close if the recovery is confirmed and logged
Probe: verify downstream records and write the incident summary
Evidence: records are healthy, summary logged

This is why ReAct is useful.

The agent did not know the exact path in advance.

Each move depended on what the previous move discovered.

That is the natural shape of exploratory or uncertain work.
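
The four turns above can be replayed as a toy simulation. Everything here is a stand-in: the canned strings in PROBES paraphrase the evidence from the example, and the keyword matching in choose_probe is a crude placeholder for the model's reasoning, used only so the sketch runs without a model.

```python
from typing import Optional

# Canned observations standing in for real tools (all hypothetical).
PROBES = {
    "read_logs": "timeout errors from provider API",
    "check_status": "provider degraded, no recent deploy",
    "retry_job": "retry succeeded",
    "verify_records": "records healthy",
}


def choose_probe(last_evidence: Optional[str]) -> Optional[str]:
    """Pick the next probe from the previous evidence; None means stop."""
    if last_evidence is None:
        return "read_logs"            # nothing known yet, start with the logs
    if "timeout" in last_evidence:
        return "check_status"         # is the outage upstream?
    if "degraded" in last_evidence:
        return "retry_job"            # transient cause, one retry
    if "succeeded" in last_evidence:
        return "verify_records"       # confirm the recovery
    return None                       # nothing left to test


def run() -> list[tuple[str, str]]:
    """Replay the investigation, one probe per turn."""
    evidence: Optional[str] = None
    path: list[tuple[str, str]] = []
    while (probe := choose_probe(evidence)) is not None:
        evidence = PROBES[probe]
        path.append((probe, evidence))
    return path
```

No probe after the first is chosen until the previous evidence is available, which is the dependency the example is meant to show.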

ReAct vs Chain-of-Thought

Another common confusion is to treat ReAct as if it were just chain-of-thought with a fancier name.

That is not quite right.

Chain-of-thought means the model reasons step by step in text.

ReAct keeps that explicit intermediate reasoning, but it adds something important:

an external action
an observation from the environment
a new reasoning step updated by that observation

So chain-of-thought stays inside the model’s own generated text.

ReAct pushes that process out into the world and lets the world push back.

That is why ReAct is often more grounded on dynamic tasks. The model does not only continue its own reasoning trace. It has to react to evidence returned by tools, searches, APIs, or users.

ReAct vs One-Shot Tool Calling

This is the comparison many readers need most.

A one-shot function-calling system does something like this:

read the user request
choose a tool
emit the arguments
return the tool result, or continue for at most one more step

That can be very useful. It is often the right design.

But it is not automatically ReAct.

ReAct keeps deciding what to do after each result.

That means the difference is not merely "uses tools" versus "does not use tools."

The difference is whether the system is running a repeated reasoning loop around the tool results.

The contrast is easier in table form.

Dimension	One-shot tool calling	ReAct
Main pattern	Request a tool and execute it	Repeated thought-action-observation loop
Typical horizon	One step or a small fixed sequence	Multi-step and adaptive
Role of observations	Often just the result of a call	Feeds the next reasoning step
Best for	Clear, well-scoped actions	Dynamic tasks where the next step is unclear upfront
Main cost	Tool interface complexity	Repeated model calls, latency, and context growth

This is why Tool Use: How Agents Take Action is necessary but not sufficient context.

Tool use explains how the agent can act.

ReAct explains how it decides what to do next after acting.
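
For contrast, the one-shot steps listed earlier in this section fit in a few lines. This is a sketch under stated assumptions: one_shot, choose_invoice_tool, and lookup_invoice are hypothetical names, and the choose callable stands in for a single model decision.

```python
from typing import Any, Callable

Tool = Callable[..., Any]


def one_shot(request: str,
             choose: Callable[[str], tuple[str, dict[str, Any]]],
             tools: dict[str, Tool]) -> Any:
    """Pick one tool, call it once, and return the raw result."""
    name, args = choose(request)   # the only decision point
    return tools[name](**args)     # no reasoning happens after the result


# Hypothetical stand-ins for a model decision and a real tool.
def choose_invoice_tool(request: str) -> tuple[str, dict[str, Any]]:
    return "lookup_invoice", {"invoice_id": request.split()[-1]}


tools = {"lookup_invoice": lambda invoice_id: {"id": invoice_id, "status": "paid"}}

result = one_shot("show invoice INV-7", choose_invoice_tool, tools)
```

There is no loop and no second decision here. A ReAct-style system would pass the result back into the reasoning step and choose again.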

Where ReAct Helps

ReAct is strongest when the environment pushes back.

In other words, it helps when the task cannot be solved from one static prompt and one obvious tool call.

Good cases include:

live investigation
research with changing evidence
support tasks that need follow-up questions
debugging tasks where each observation changes the next move
document or database exploration where the useful path is not known upfront

The common property is not just that the work is multi-step.

The common property is:

the next step depends on what the last step revealed.

That is when the micro-hypothesis loop pays for itself.

It is also why ReAct often feels like the first real reasoning pattern in agent engineering. It gives the agent a way to navigate uncertainty without requiring a full upfront plan.

Where ReAct Breaks

ReAct is not free.

It buys adaptability by paying in tokens, latency, and control complexity.

The most common failure modes are these.

Infinite or Useless Loops

The agent can keep retrying, re-searching, or rechecking without making meaningful progress.

If the system does not have good stop conditions, it can spin forever.
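
A minimal guard against this failure might look like the sketch below. It assumes each action is recorded as a string such as "search:refund policy". The function name and both thresholds are illustrative choices, not recommended values.

```python
from typing import Optional


def should_stop(actions: list[str],
                max_turns: int = 8,
                max_repeats: int = 2) -> Optional[str]:
    """Return a reason to stop, or None to keep going."""
    if len(actions) >= max_turns:
        return "turn budget exhausted"
    # The same action issued too many times suggests the loop is not progressing.
    if actions and actions.count(actions[-1]) > max_repeats:
        return f"repeated action: {actions[-1]}"
    return None
```

Exact string matching only catches literal repeats. An agent that rephrases the same search slightly would slip past this check, so the turn budget is still needed as a backstop.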

Context Growth

Every new thought, action, and observation adds more context to the run.

Over longer trajectories, that raises token costs and can weaken reasoning as relevant evidence gets buried in a long context, an effect sometimes called context rot.

This is one reason Short-Term Context, Retrieval, and Long-Term Memory becomes important after ReAct. The loop only stays coherent if the runtime manages what gets carried forward.
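
One simple way a runtime can manage what gets carried forward is to keep recent entries verbatim and shorten older ones. The sketch below uses plain truncation as a stand-in for real summarization. The name compact and both defaults are assumptions for illustration.

```python
def compact(history: list[str], keep_last: int = 4, max_chars: int = 80) -> list[str]:
    """Shorten older entries; keep the most recent ones verbatim."""
    cutoff = max(len(history) - keep_last, 0)
    older = [
        entry if len(entry) <= max_chars else entry[:max_chars] + " [truncated]"
        for entry in history[:cutoff]
    ]
    return older + history[cutoff:]
```

Truncation can discard the one detail a later turn needs, so a production runtime would more likely summarize or retrieve selectively. The sketch shows only the shape of the trade.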

Cascading Bad Assumptions

If the agent forms the wrong local hypothesis early, later turns can inherit that mistake.

Now the loop is not adapting intelligently. It is digging deeper into the wrong branch.

Premature Stop Conditions

The agent may stop because it found a plausible answer, not because it found enough evidence.

That is why clear success conditions and verification matter so much.

Latency and Cost

ReAct often makes many model and tool calls in series.

That means it can be much slower and more expensive than a fixed workflow or a planner-first architecture for predictable tasks.
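
A rough back-of-the-envelope model shows how the cost grows. It assumes the runtime resends the whole history on every turn, with a fixed prompt and a fixed number of new tokens per turn. The function names and all numbers are invented for illustration.

```python
def total_input_tokens(turns: int, prompt_tokens: int, tokens_per_turn: int) -> int:
    """Input tokens summed over all model calls when history is resent each turn."""
    return sum(prompt_tokens + k * tokens_per_turn for k in range(turns))


def serial_latency(turns: int, model_seconds: float, tool_seconds: float) -> float:
    """Wall-clock time when each turn waits for the previous one to finish."""
    return turns * (model_seconds + tool_seconds)


react_tokens = total_input_tokens(turns=10, prompt_tokens=1000, tokens_per_turn=500)
single_call_tokens = total_input_tokens(turns=1, prompt_tokens=1000, tokens_per_turn=500)
react_seconds = serial_latency(turns=10, model_seconds=2.0, tool_seconds=0.5)
```

Under these assumed numbers the ten-turn run reads 32,500 input tokens against 1,000 for a single call, and takes 25 seconds of serial waiting. Input tokens grow roughly with the square of the number of turns, because each turn rereads everything before it.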

So the practical lesson is simple:

ReAct is powerful for uncertain paths, but wasteful for obvious ones.

ReAct vs Planner-First Patterns

ReAct is not the only way to structure multi-step reasoning.

Some systems try to reduce repeated model calls by planning more of the task upfront, then executing that plan with less back-and-forth.

That is the basic attraction of planner-first patterns.

The tradeoff looks like this:

ReAct adapts more easily in the middle of the run
planner-first patterns can be cheaper and faster when the task structure is more predictable

So if the work is highly dynamic, ReAct usually has the better shape.

If the work is long, expensive, or mostly predictable, a more explicit planning architecture may be better.

That does not make ReAct obsolete.

It just means ReAct should be treated as one architectural choice, not as the default answer to every agent problem.
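
The planner-first side of the tradeoff can be sketched in the same style. This is an illustration under assumptions, not a description of any particular framework: plan_then_execute, fixed_plan, and the two tools are hypothetical, and the plan callable stands in for a single planning call to a model.

```python
from typing import Callable


def plan_then_execute(plan: Callable[[], list[tuple[str, str]]],
                      tools: dict[str, Callable[[str], str]]) -> list[str]:
    """Get the whole plan in one call, then run each step without re-planning."""
    steps = plan()                                     # one planning call
    return [tools[name](arg) for name, arg in steps]   # no further decisions


# Hypothetical plan and tools for a predictable reporting task.
def fixed_plan() -> list[tuple[str, str]]:
    return [("fetch", "march"), ("summarize", "march")]


tools = {
    "fetch": lambda month: f"fetched {month} invoices",
    "summarize": lambda month: f"summary for {month}",
}

results = plan_then_execute(fixed_plan, tools)
```

The executor never looks at a result when choosing the next step. That is what makes this shape cheaper on predictable work, and also what makes it brittle when a step returns something the plan did not anticipate.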

What ReAct Points To Next

ReAct is an important threshold concept because it is where a reader first sees the difference between:

an agent that can act once
and an agent that can keep updating its next move from feedback

Once that idea is clear, the next questions become more interesting.

For example:

how should the runtime manage context across long ReAct trajectories?
how do you evaluate whether each turn was good, not just whether the final answer looked fine?
when should the agent reflect on its own failures instead of just looping again?
when should the loop be replaced or augmented by stronger planning patterns?

Those questions lead naturally into memory, evaluation, reflection, and orchestration design.

FAQ

Is ReAct the same thing as the sense-think-act loop?

No. The sense-think-act loop is the broader control model. ReAct is a more specific pattern for repeatedly running thought, action, and observation, with each step grounded in what the previous one returned.

Is ReAct the same thing as chain-of-thought?

No. Chain-of-thought is reasoning in text. ReAct combines reasoning with external actions and observations, so the next reasoning step can be updated by real evidence.

Is ReAct the same thing as function calling?

No. Function calling is a capability for structured tool requests. ReAct is a multi-step reasoning pattern that may use function calling inside a longer loop.

Do all agents need ReAct?

No. Many tasks are better handled by a fixed workflow or a simple tool call. ReAct is useful when the next step depends on what the previous step discovered.

Why does ReAct get expensive?

Because every loop usually means another model call, more accumulated context, and more serial latency.

What kinds of tasks are good for ReAct?

Tasks with uncertain paths, live feedback, exploratory search, debugging, investigation, and multi-step evidence gathering.

What kinds of tasks are bad for ReAct?

Predictable tasks with known paths, strict latency requirements, simple one-step actions, or work that can be handled more safely by a deterministic workflow.

How do you stop a ReAct agent from looping forever?

Use explicit success conditions, iteration limits, time or cost budgets, escalation rules, and verification checks that distinguish progress from repetition.

Why can ReAct fail even when the agent has the right tools?

Because the problem may be in the reasoning loop rather than the tool interface. The agent may form the wrong hypothesis, misread the observation, or stop too early.

What comes after ReAct in agent design?

Usually better context management, stronger memory design, evaluation of trajectories instead of only outputs, reflection loops, and more structured planning or orchestration patterns.