ReAct and the Basic Reasoning Loop

ReAct is a reasoning pattern where an agent thinks about the next move, takes an action, inspects the observation, and repeats. It is useful when the next step depends on what the last step discovered.

ReAct is a way of running an agent loop where the system reasons about the next move, takes an action, looks at what happened, and then reasons again.

That is the short answer.

If you want the more practical version, use this:

ReAct is a repeated thought-action-observation cycle for tasks where the next step depends on what the previous step discovered.

That is why it matters.

A broad loop like The Sense-Think-Act Loop tells you that agents observe, reason, and act. ReAct is more specific. It gives the agent a concrete pattern for doing that work over multiple turns.

This article builds on Goals, Constraints, and Success Conditions, Planning and Task Decomposition, and Tool Use: How Agents Take Action. Those articles explain what the agent is trying to achieve, how it breaks work down, and how it acts. This one explains the first practical reasoning pattern that ties those pieces together during execution.

ReAct Is More Specific Than the General Agent Loop

One source of confusion is that people use ReAct as if it means any agent that loops.

That is too broad.

The general agent loop says: observe, reason, act.

That is an architecture pattern.

ReAct is a more specific execution pattern inside that architecture.

It says the agent should:

  1. state or form the next local reasoning step
  2. take one concrete action or tool step
  3. inspect the observation that comes back
  4. use that observation to decide the next move
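The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `policy` function stands in for a model call, and the `lookup` tool, the trace format, and the `max_turns` default are all assumptions made for the example.

```python
def react_loop(policy, tools, goal, max_turns=5):
    """Run a thought-action-observation loop.

    policy(goal, trace) stands in for a model call (an assumption here).
    It returns either {"thought", "action", "arg"} or {"thought", "final"}.
    """
    trace = []
    for _ in range(max_turns):
        step = policy(goal, trace)                          # 1. form the next local reasoning step
        if "final" in step:
            return step["final"], trace
        observation = tools[step["action"]](step["arg"])    # 2. take one concrete action
        trace.append({**step, "observation": observation})  # 3. record the observation
        # 4. the next call to policy sees the updated trace
    return None, trace


def scripted_policy(goal, trace):
    """Hypothetical stand-in for the model: decides from the latest observation."""
    if not trace:
        return {"thought": "I need the current value first.", "action": "lookup", "arg": "x"}
    return {"thought": "The lookup answered the question.",
            "final": f"x is {trace[-1]['observation']}"}


# Hypothetical tool registry with a single lookup tool.
tools = {"lookup": lambda key: {"x": "42"}.get(key, "unknown")}
answer, trace = react_loop(scripted_policy, tools, "What is x?")
print(answer)  # x is 42
```

The point of the sketch is the data flow: the policy is consulted again on every turn, and the only new information it receives is the observation from the previous action.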

So the important difference is not that ReAct loops.

Many systems loop.

The important difference is that ReAct makes the next step depend explicitly on the evidence returned from the previous one.

That is why ReAct sits between the very broad control loop and more advanced patterns like planner-first architectures, reflection systems, or search-based reasoning.

The Micro-Hypothesis Loop

The simplest way to make ReAct legible is to treat each turn as a tiny test.

Call this The Micro-Hypothesis Loop.

It has three parts.

1. Hypothesis

The agent forms the next local theory about what would move the task forward.

That might sound like:

This is the thought part of ReAct.

The purpose is not to create a perfect master plan.

The purpose is to make the next move explicit enough to test.

2. Probe

The agent uses a tool or action to test that local theory.

That might mean:

This is the action part.

The important thing is that the action is chosen because it is expected to reduce uncertainty or move the task forward.

3. Evidence

The tool or environment returns an observation.

The agent then has to decide:

This is the observation part.

That observation becomes the input to the next loop.

So ReAct is not just "think, then act."

It is:

form a local hypothesis, run a probe, absorb the evidence, and choose the next move from there.

That is the real job of the loop.
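One way to make the three parts concrete is to record each turn as a small data structure. The sketch below assumes a `Turn` record and a `render_trace` helper; both names are invented for illustration, and the text format is only one possible way to feed earlier turns into the next reasoning step.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    hypothesis: str  # the thought: a local theory about what would move the task forward
    probe: str       # the action: the tool call chosen to test that theory
    evidence: str    # the observation: what the tool or environment returned


def render_trace(turns):
    """Format past turns as text so they can be given to the next reasoning step."""
    lines = []
    for i, turn in enumerate(turns, start=1):
        lines.append(f"Turn {i}")
        lines.append(f"  Hypothesis: {turn.hypothesis}")
        lines.append(f"  Probe: {turn.probe}")
        lines.append(f"  Evidence: {turn.evidence}")
    return "\n".join(lines)
```

Keeping the three fields separate makes it easier to see, after the fact, whether a bad outcome came from a wrong hypothesis, a poorly chosen probe, or misread evidence.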

A Running Example: Investigating a Failed Billing Job

Suppose an operations agent is given this goal:

Figure out why the nightly billing job failed and take the safest next step.

That goal already assumes the boundaries described in Goals, Constraints, and Success Conditions: the system should act toward recovery, stay inside its permissions, and stop when the evidence is good enough.

A ReAct-style run might look like this.

Turn 1

Turn 2

Turn 3

Turn 4

This is why ReAct is useful.

The agent did not know the exact path in advance.

Each move depended on what the previous move discovered.

That is the natural shape of exploratory or uncertain work.
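A toy version of such a run can be written as code. Everything specific below is invented for illustration and is not the article's own trace: the tool names `read_job_log` and `check_db_status`, their canned outputs, and the rule-based `choose_next` function that stands in for the model.

```python
# All tool names and outputs are hypothetical, chosen only to show the branching.
def read_job_log(job):
    return "ERROR: connection to payments-db timed out"


def check_db_status(name):
    return "payments-db: maintenance window in progress"


TOOLS = {"read_job_log": read_job_log, "check_db_status": check_db_status}


def choose_next(trace):
    """Stand-in for the model: pick the next move from the latest evidence."""
    if not trace:
        return ("The job log should show where it failed.",
                "read_job_log", "nightly-billing")
    last = trace[-1]["evidence"]
    if "timed out" in last:
        return ("A timeout suggests the database was unavailable.",
                "check_db_status", "payments-db")
    if "maintenance" in last:
        return ("Maintenance explains the failure; retrying now is unsafe.",
                "stop", "escalate: rerun after maintenance window")
    return ("The evidence is inconclusive.", "stop", "escalate: needs human review")


def investigate(max_turns=4):
    trace = []
    for _ in range(max_turns):
        thought, action, arg = choose_next(trace)
        if action == "stop":
            return arg, trace
        trace.append({"thought": thought, "action": action,
                      "evidence": TOOLS[action](arg)})
    return "escalate: turn limit reached", trace


decision, trace = investigate()
print(decision)  # escalate: rerun after maintenance window
```

The second tool call only happens because the first one returned a timeout. With a different log line, the same loop would take a different branch or escalate immediately.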

ReAct vs Chain-of-Thought

Another common confusion is to treat ReAct as if it were just chain-of-thought with a fancier name.

That is not quite right.

Chain-of-thought means the model reasons step by step in text.

ReAct keeps that explicit intermediate reasoning, but it adds something important: external actions, and the observations those actions return.

So chain-of-thought stays inside the model’s internal text process.

ReAct pushes that process out into the world and lets the world push back.

That is why ReAct is often more grounded on dynamic tasks. The model does not only continue its own reasoning trace. It has to react to evidence returned by tools, searches, APIs, or users.

ReAct vs One-Shot Tool Calling

This is the comparison many readers need most.

A one-shot function-calling system does something like this:

  1. read the user request
  2. choose a tool
  3. emit the arguments
  4. return the tool result or continue once

That can be very useful. It is often the right design.

But it is not automatically ReAct.

ReAct keeps deciding what to do after each result.

That means the difference is not merely "uses tools" versus "does not use tools."

The difference is whether the system is running a repeated reasoning loop around the tool results.

The contrast is easier in table form.

Dimension            | One-shot tool calling              | ReAct
Main pattern         | Request a tool and execute it      | Repeated thought-action-observation loop
Typical horizon      | One step or a small fixed sequence | Multi-step and adaptive
Role of observations | Often just the result of a call    | Feeds the next reasoning step
Best for             | Clear, well-scoped actions         | Dynamic tasks where the next step is unclear upfront
Main cost            | Tool interface complexity          | Repeated model calls, latency, and context growth

This is why Tool Use: How Agents Take Action is necessary but not sufficient context.

Tool use explains how the agent can act.

ReAct explains how it decides what to do next after acting.
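The contrast can also be shown side by side in code. In this sketch the decision functions stand in for model calls, and the tools `find_user` and `get_invoices` are hypothetical. The only structural difference is whether the decision function is consulted again after a result comes back.

```python
def one_shot(choose_tool, tools, request):
    """One-shot tool calling: one decision, one call, result returned as-is."""
    name, args = choose_tool(request)  # read the request, choose a tool, emit arguments
    return tools[name](**args)         # return the tool result


def react(decide, tools, request, max_turns=5):
    """ReAct: the decision function runs again after every observation."""
    observations = []
    for _ in range(max_turns):
        name, args = decide(request, observations)
        if name == "finish":
            return args["answer"]
        observations.append(tools[name](**args))
    return None


# Hypothetical tools with canned results.
tools = {
    "find_user": lambda email: "user_17",
    "get_invoices": lambda user_id: ["inv_3"],
}


def decide(request, observations):
    """Stand-in for the model: the second call depends on the first result."""
    if not observations:
        return "find_user", {"email": request}
    if len(observations) == 1:
        return "get_invoices", {"user_id": observations[0]}
    return "finish", {"answer": observations[1]}


single = one_shot(lambda req: ("find_user", {"email": req}), tools, "a@example.com")
looped = react(decide, tools, "a@example.com")
```

Here `single` holds the raw result of one call, while `looped` holds the result of a second call whose argument was only known after the first one returned.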

Where ReAct Helps

ReAct is strongest when the environment pushes back.

In other words, it helps when the task cannot be solved from one static prompt and one obvious tool call.

Good cases include debugging, investigation, exploratory search, and multi-step evidence gathering.

The common property is not just "multi-step."

The common property is:

the next step depends on what the last step revealed.

That is when the micro-hypothesis loop pays for itself.

It is also why ReAct often feels like the first real reasoning pattern in agent engineering. It gives the agent a way to navigate uncertainty without requiring a full upfront plan.

Where ReAct Breaks

ReAct is not free.

It buys adaptability by paying in tokens, latency, and control complexity.

The most common failure modes are these.

Infinite or Useless Loops

The agent can keep retrying, re-searching, or rechecking without making meaningful progress.

If the system does not have good stop conditions, it can spin forever.
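A simple guard against this failure mode combines a hard turn limit with a check for repeated actions. The sketch below is one possible shape, with assumed thresholds; `should_stop` and its parameters are illustrative names, and real systems would likely add time and cost budgets as well.

```python
def should_stop(history, max_turns=8, max_repeats=2):
    """Return a reason to stop, or None to continue.

    history is a list of (action, arg) pairs in the order they were taken.
    """
    if len(history) >= max_turns:
        return "turn limit reached"
    # The same action with the same argument, issued more than max_repeats
    # times, is treated as repetition rather than progress.
    if history and history.count(history[-1]) > max_repeats:
        return "repeating the same action"
    return None
```

Called once per turn before the next model call, a check like this turns a silent spin into an explicit stop reason that the runtime can log or escalate.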

Context Growth

Every new thought, action, and observation adds more context to the run.

Over longer trajectories, that can create context rot, higher token costs, and weaker reasoning.

This is one reason Short-Term Context, Retrieval, and Long-Term Memory becomes important after ReAct. The loop only stays coherent if the runtime manages what gets carried forward.
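As a rough illustration of what managing what gets carried forward can mean, the sketch below keeps the most recent turns in full, reduces older turns to a one-line list of actions, and truncates long observations. The function name, the defaults, and the text layout are assumptions; dropping detail this way is a crude stand-in for real summarization or retrieval.

```python
def build_context(goal, turns, keep_last=3, max_obs_chars=200):
    """Build the text carried into the next model call.

    turns is a list of (thought, action, observation) strings.
    keep_last must be at least 1.
    """
    older, recent = turns[:-keep_last], turns[-keep_last:]
    lines = [f"Goal: {goal}"]
    if older:
        # Older turns are reduced to the actions taken; their details are dropped.
        actions = ", ".join(action for _, action, _ in older)
        lines.append(f"Earlier turns (details dropped): {actions}")
    for thought, action, observation in recent:
        lines.append(f"Thought: {thought}")
        lines.append(f"Action: {action}")
        lines.append(f"Observation: {observation[:max_obs_chars]}")
    return "\n".join(lines)
```

The tradeoff is visible in the code: the context stops growing linearly with the run, but anything discovered in an older turn is lost unless it was restated later.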

Cascading Bad Assumptions

If the agent forms the wrong local hypothesis early, later turns can inherit that mistake.

Now the loop is not adapting intelligently. It is digging deeper into the wrong branch.

Premature Stop Conditions

The agent may stop because it found a plausible answer, not because it found enough evidence.

That is why clear success conditions and verification matter so much.

Latency and Cost

ReAct often makes many serial calls.

That means it can be much slower and more expensive than a fixed workflow or a planner-first architecture for predictable tasks.
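A back-of-the-envelope calculation shows why the cost grows faster than the turn count. The sketch assumes that every model call re-sends the full history, that each turn adds a fixed number of tokens, and that nothing is cached or trimmed; the numbers are made up for illustration.

```python
def total_prompt_tokens(turns, base_tokens, tokens_per_turn):
    """Total prompt tokens across a run where every call re-sends the full history.

    Call i (counting from 0) sends the base prompt plus i earlier turns.
    """
    return sum(base_tokens + i * tokens_per_turn for i in range(turns))


one_call = total_prompt_tokens(1, base_tokens=1000, tokens_per_turn=500)
ten_turns = total_prompt_tokens(10, base_tokens=1000, tokens_per_turn=500)
print(one_call, ten_turns)  # 1000 32500
```

Under these assumptions, ten turns cost about 32 times the prompt tokens of a single call, not 10 times, because the accumulated history is paid for again on every turn.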

So the practical lesson is simple:

ReAct is powerful for uncertain paths, but wasteful for obvious ones.

ReAct vs Planner-First Patterns

ReAct is not the only way to structure multi-step reasoning.

Some systems try to reduce repeated model calls by planning more of the task upfront, then executing that plan with less back-and-forth.

That is the basic attraction of planner-first patterns.

The tradeoff looks like this:

So if the work is highly dynamic, ReAct usually has the better shape.

If the work is long, expensive, or mostly predictable, a more explicit planning architecture may be better.
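For comparison, a planner-first run can be sketched as one planning call followed by plain execution. This is a deliberately simplified illustration: the `planner` function stands in for a single model call, the tools are toy string functions, and the sketch has no replanning step, which real planner-first systems usually need when a step fails.

```python
def plan_then_execute(planner, tools, goal):
    """Planner-first: one planning call up front, then execution with no model calls."""
    plan = planner(goal)  # a list of (tool_name, arg) pairs, fixed before execution
    return [tools[name](arg) for name, arg in plan]


planner_calls = []


def toy_planner(goal):
    """Hypothetical planner that records how often it is consulted."""
    planner_calls.append(goal)
    return [("upper", "abc"), ("reverse", "abc")]


tools = {"upper": str.upper, "reverse": lambda s: s[::-1]}
results = plan_then_execute(toy_planner, tools, "transform abc")
print(results, len(planner_calls))  # ['ABC', 'cba'] 1
```

The planner is consulted once regardless of how many steps the plan contains, which is where the savings come from. The price is that no step can change course based on what an earlier step returned.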

That does not make ReAct obsolete.

It just means ReAct should be treated as one architectural choice, not as the default answer to every agent problem.

What ReAct Points To Next

ReAct is an important threshold concept because it is where a reader first sees the difference between:

Once that idea is clear, the next questions become more interesting.

For example:

Those questions lead naturally into memory, evaluation, reflection, and orchestration design.

FAQ

Is ReAct the same thing as the sense-think-act loop?

No. The sense-think-act loop is the broader control model. ReAct is a more specific pattern for repeatedly running thought, action, and observation in a grounded way.

Is ReAct the same thing as chain-of-thought?

No. Chain-of-thought is reasoning in text. ReAct combines reasoning with external actions and observations, so the next reasoning step can be updated by real evidence.

Is ReAct the same thing as function calling?

No. Function calling is a capability for structured tool requests. ReAct is a multi-step reasoning pattern that may use function calling inside a longer loop.

Do all agents need ReAct?

No. Many tasks are better handled by a fixed workflow or a simple tool call. ReAct is useful when the next step depends on what the previous step discovered.

Why does ReAct get expensive?

Because every loop usually means another model call, more accumulated context, and more serial latency.

What kinds of tasks are good for ReAct?

Tasks with uncertain paths, live feedback, exploratory search, debugging, investigation, and multi-step evidence gathering.

What kinds of tasks are bad for ReAct?

Predictable tasks with known paths, strict latency requirements, simple one-step actions, or work that can be handled more safely by a deterministic workflow.

How do you stop a ReAct agent from looping forever?

Use explicit success conditions, iteration limits, time or cost budgets, escalation rules, and verification checks that distinguish progress from repetition.

Why can ReAct fail even when the agent has the right tools?

Because the problem may be in the reasoning loop rather than the tool interface. The agent may form the wrong hypothesis, misread the observation, or stop too early.

What comes after ReAct in agent design?

Usually better context management, stronger memory design, evaluation of trajectories instead of only outputs, reflection loops, and more structured planning or orchestration patterns.