The phrase AI agent frameworks is now doing too much work.
People use it to describe:
- orchestration frameworks
- agent SDKs
- workflow engines
- managed agent platforms
- even productized agent builders
That is why so many framework comparisons feel noisy.
They are often comparing unlike things.
A team asks:
Which framework should we use?
but the more useful question is:
Which layer of control do we actually need?
That is the real category problem.
This article is not a leaderboard.
It is a map.
It explains what an AI agent framework actually is, what does not belong in the same bucket, and how to choose among the current options without letting marketing language do your architecture for you.
This article connects naturally to When to Use a Workflow Instead of an Agent, Supervisor, Router, and Planner-Executor Patterns, What Is Agent Engineering?, How Good Agent Memory Actually Works in Production, and OpenAI Codex as a Coding-Agent Platform. Those pieces explain when autonomy helps, how orchestration patterns differ, what the broader discipline is, how memory surfaces shape architecture, and how one concrete platform should be understood. This one focuses on the category between those ideas: the frameworks and build surfaces teams reach for when they want more structure than raw API calls.
What Counts as an AI Agent Framework
At a useful level, an AI agent framework is a developer layer that gives you reusable structure for building agent systems.
That usually means it helps with some combination of:
- state
- tool use
- control flow
- agent coordination
- execution boundaries
- tracing, evaluation, or deployment hooks
The important part is not the branding.
It is the added structure.
A plain model API client is not usually enough on its own.
A full managed platform is often more than a framework.
So the right definition is somewhere in the middle:
an AI agent framework is a reusable software layer for building and operating agent behavior beyond direct model calls
That still leaves a lot of room.
Which is exactly why the category gets messy fast.
Why the Category Is Confusing
Current documentation surfaces are making the confusion worse, not better.
Some tools present themselves as:
- low-level orchestration frameworks
- open-source multi-agent frameworks
- model-native agent SDKs
- production platforms for agents
and many of those claims are directionally true.
The problem is that they are not claims about the same layer.
For example:
- LangGraph describes itself as a low-level orchestration framework and runtime for long-running, stateful agents
- CrewAI describes itself as an open-source framework for orchestrating autonomous agents and building workflows
- the OpenAI Agents SDK positions itself as a code-first SDK for orchestration, tools, approvals, state, and specialist handoffs
- Google ADK presents itself as a framework that starts with prompts and tool calls, then grows into multi-agent orchestration, graph workflows, evaluation, and deployment
- Microsoft Agent Framework explicitly treats agents and workflows as two primary capability categories
Those are not fake distinctions.
They are real.
But they are also distinctions between different kinds of build surfaces.
So when someone compares LangGraph vs CrewAI vs OpenAI Agents SDK as if they were the same kind of tool, the comparison is already partially wrong.
They overlap.
They do not occupy the same architectural layer.
The S.T.A.C.K. Lens
A better way to compare AI agent frameworks is the S.T.A.C.K. lens:
State, Tools, Abstraction, Control, Kernel.
This is not a feature checklist.
It is a way to ask what kind of system layer a framework is really offering you.
State
How does the framework think about state?
Does it give you:
- simple conversation history
- durable workflow state
- resumable execution
- memory primitives
- checkpointing across long-running jobs
This matters because state is where a lot of the real platform difference begins.
Some tools are lightweight around state.
Others are built around persistent, resumable runs.
That is not a cosmetic difference.
It changes what kinds of systems they fit well.
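To make that difference concrete, here is a minimal sketch of durable, resumable state: step results are checkpointed so a crashed run can resume without redoing finished work. All names here are hypothetical; real frameworks expose this through their own checkpointer or persistence APIs.

```python
import json
import tempfile
from pathlib import Path

class DurableRun:
    """Persists step results so a long-running job can resume after a crash.
    Illustrative sketch only, not any framework's actual API."""

    def __init__(self, checkpoint_path: Path):
        self.checkpoint_path = checkpoint_path
        self.state = {"completed": [], "results": {}}
        if checkpoint_path.exists():
            self.state = json.loads(checkpoint_path.read_text())

    def run_step(self, name, fn):
        if name in self.state["completed"]:
            return self.state["results"][name]  # resume: skip finished work
        result = fn()
        self.state["completed"].append(name)
        self.state["results"][name] = result
        self.checkpoint_path.write_text(json.dumps(self.state))  # checkpoint
        return result

path = Path(tempfile.mkdtemp()) / "run.json"
run = DurableRun(path)
run.run_step("fetch", lambda: "raw data")

# A new process pointed at the same checkpoint resumes instead of re-running:
resumed = DurableRun(path)
print(resumed.run_step("fetch", lambda: "never called"))  # → raw data
```

A framework that is "lightweight around state" gives you roughly the conversation-history half of this. A framework built around persistent, resumable runs gives you the checkpointing half, and that is the part that shapes long-running systems.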
Tools
What tool model does the framework assume?
Does it mainly expose:
- function tools
- MCP integration
- workflow steps
- graph nodes
- agent-to-agent delegation
Different frameworks are opinionated here.
Some treat tools as the center of the loop.
Some treat them as one component inside a broader graph or workflow model.
If your system is mostly about tool execution and bounded agent turns, one kind of framework fits better.
If your system is really about a larger event-driven workflow, another category fits better.
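The function-tool model at the center of many agent loops can be sketched in a few lines: a registry of plain functions, and a runtime that executes whichever call the model emits. Everything here is illustrative, including the stubbed tool; no real framework's API is implied.

```python
from typing import Callable

# Hypothetical function-tool registry. The framework owns the registry,
# the model picks a tool by name, the runtime executes it.
TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a plain function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

def execute_tool_call(name: str, arguments: dict) -> str:
    """What the agent loop does when the model emits a tool call."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)

print(execute_tool_call("get_weather", {"city": "Oslo"}))  # → Sunny in Oslo
```

A tool-centric framework makes this loop the whole product. A graph- or workflow-centric framework treats the same call as one node among many.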
Abstraction
How much architecture is already decided for you?
This is where framework selection gets real.
Some tools give you:
- high-level agents
- crews
- prebuilt patterns
- developer-friendly defaults
Others stay deliberately low-level and expect you to compose the system yourself.
Neither is automatically better.
High abstraction can speed up early work.
Low abstraction can preserve control when the system becomes weird, expensive, or reliability-sensitive.
Control
Where does explicit control live?
This is one of the most important questions.
Does the framework make it easy to express:
- routing
- retries
- approvals
- interrupts
- branching
- deterministic workflow edges
or does it encourage looser autonomous behavior and hide the control plane behind easier abstractions?
If your team needs exact execution semantics, the answer matters more than almost any benchmark table.
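Here is a rough sketch of what explicit control looks like when it lives in your code rather than behind an abstraction: a deterministic routing edge, a retry wrapper, and a human-approval gate. Every name is illustrative, not any framework's API.

```python
# Hypothetical sketch: routing, retries, and an approval interrupt
# expressed as ordinary code the team fully owns.

def with_retries(fn, attempts: int = 3):
    """Retry a step a bounded number of times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
    raise last_error

def route(task: dict) -> str:
    # Deterministic edge: the code, not the model, decides the branch.
    return "refund_flow" if task["type"] == "refund" else "default_flow"

def requires_approval(task: dict) -> bool:
    return task.get("amount", 0) > 500  # explicit interrupt condition

task = {"type": "refund", "amount": 900}
branch = route(task)
if requires_approval(task):
    branch = "pending_human_approval"
print(branch)  # → pending_human_approval
```

The framework question is simply whether primitives like these are first-class and visible, or smoothed over by an autonomous loop you cannot easily open up.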
Kernel
What is the real runtime kernel of the framework?
In other words:
what is the deepest thing it is actually built around?
That kernel might be:
- a graph runtime
- an agent-turn loop
- a workflow engine
- a model-native orchestration SDK
- a broader hosted platform
This is the layer many comparisons miss.
Frameworks that look similar on the surface can feel very different because their kernel is different.
That is usually the real reason one tool fits and another does not.
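Two kernels that can look alike from the outside are sketched below: a graph runtime that follows explicit edges, and an agent-turn loop where the model decides when to stop. Both are illustrative toys, not real framework internals.

```python
# Kernel 1: a graph runtime - nodes plus edges, run until a terminal node.
def run_graph(nodes: dict, edges: dict, state: dict, start: str) -> dict:
    current = start
    while current is not None:
        state = nodes[current](state)          # execute the node
        current = edges.get(current)           # deterministic edge
    return state

nodes = {
    "plan": lambda s: {**s, "plan": f"handle {s['task']}"},
    "act": lambda s: {**s, "done": True},
}
edges = {"plan": "act", "act": None}
print(run_graph(nodes, edges, {"task": "report"}, "plan"))

# Kernel 2: an agent-turn loop - a (stubbed) model decides when to stop.
def run_agent(model_step, state: dict, max_turns: int = 5) -> dict:
    for _ in range(max_turns):
        state, finished = model_step(state)
        if finished:
            break
    return state
```

Both can call tools, both can hold state, and both can be marketed as agent frameworks. But debugging, extending, and operating them are very different jobs, which is why the kernel question matters more than the feature list.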
The Main Categories That Matter Right Now
The market is crowded, but the useful categories are not endless.
They cluster into a few main groups.
1. Low-Level Orchestration Frameworks
This is the category LangGraph represents most clearly.
Its own docs position it as a low-level orchestration framework and runtime for long-running, stateful agents, with emphasis on:
- durable execution
- interrupts
- memory
- stateful workflows

- production deployment
That is not the same thing as a high-level multi-agent toolkit.
It is closer to an orchestration substrate.
This category fits teams that want:
- strong control over execution paths
- explicit state handling
- durable long-running behavior
- room to build their own architecture rather than inherit one
The tradeoff is predictable:
you get more control, but you also own more design work.
2. Higher-Level Multi-Agent Abstraction Frameworks
CrewAI is the clearest example here.
Its current docs are built around the split between:
Flows and Crews.
That is already a strong category signal.
CrewAI is not only giving you primitives.
It is giving you a more opinionated model for how autonomous teams of agents fit inside a larger workflow.
That is useful for teams that want:
- agent teams quickly
- role-based delegation
- higher-level collaboration patterns
- a clearer out-of-the-box multi-agent story
The tradeoff is also clear:
the more abstraction you inherit, the more carefully you need to inspect whether that abstraction still fits once the system gets more complex.
3. Model-Native Agent SDKs and Runtimes
The OpenAI Agents SDK sits here most clearly.
Its docs are unusually explicit about the category boundary:
- use the plain client libraries for direct model requests
- use the Agents SDK when your application owns orchestration, tool execution, approvals, and state
That is an important distinction.
This is less of a generic framework for every architecture and more of a model-native runtime/SDK layer for agent systems.
It fits teams that want:
- code-first orchestration
- tool use and handoffs
- human review and guardrails
- model-native runtime patterns
without necessarily adopting a graph-first or crew-first abstraction.
This category is often misunderstood because people compare it directly to orchestration frameworks as if they were solving the exact same layer.
They are not.
4. Platform-Backed Agent Frameworks
Google ADK and Microsoft Agent Framework belong more here.
They are frameworks.
But they are also clearly tied to broader platform stories.
Google ADK presents itself as a framework that can start simply, then expand into:
- multi-agent orchestration
- graph-based workflows
- evaluation
- deployment to Google services
Microsoft Agent Framework is similarly explicit that it combines:
- agents
- workflows
- state management
- middleware
- MCP clients
and positions itself as the successor to AutoGen and Semantic Kernel.
These are not just lightweight libraries.
They are broader ecosystem-backed development surfaces.
That can be a strength if your team wants:
- tighter platform alignment
- more built-in production paths
- a larger integrated operating story
It can also be a constraint if you wanted a thinner, less ecosystem-shaped layer.
5. Framework-Adjacent Application and Agent Builders
This is where tools like Mastra, Pydantic AI, and LlamaIndex become interesting.
They are real framework surfaces.
But they also carry stronger identity around a specific developer motion.
Mastra frames itself as a modern TypeScript framework and platform for AI-powered applications and agents, with strong emphasis on:
- evals
- observability
- deployment
- application integration
Pydantic AI frames itself as a Python agent framework focused on type-safe, production-grade agent development with broad model support and tight observability integration.
LlamaIndex still reads most clearly as a framework for building agentic systems over your data, with workflows and context augmentation as first-class concerns.
These are not all the same kind of product.
But they share a trait:
they are framework surfaces shaped around a particular developer center of gravity:
- TypeScript application builders
- Python typed-agent builders
- data-centric agent builders
That matters because sometimes the best framework choice is not about a general category at all.
It is about where your team already lives.
What Most Teams Get Wrong
The most common mistake is to ask:
Which framework is winning?
That is usually the wrong first question.
The better first questions are:
- how much control do we need?
- how explicit should execution be?
- what state model do we need?
- are we building a workflow-heavy system, a tool-heavy system, or a data-heavy one?
- do we need a framework at all?
A lot of framework pain comes from abstraction mismatch.
Teams choose:
- a high-level framework when they really need explicit orchestration
- a low-level orchestration tool when they really wanted a faster app-builder surface
- a platform-backed framework without realizing they are also choosing ecosystem gravity
That is why the category feels harder than it should.
People are often choosing a story, not a layer.
How to Choose Without Fooling Yourself
A simple selection rule is:
Start Lower When Reliability and Control Matter Most
If your system is:
- long-running
- stateful
- approval-sensitive
- reliability-critical
then lower-level orchestration and explicit workflow control usually age better.
You will do more upfront work.
You will also understand your system better.
Start Higher When Speed and Team Throughput Matter Most
If your team wants:
- faster iteration
- clearer defaults
- easier multi-agent composition
- simpler onboarding
then higher-level abstraction can be worth it.
Just do not confuse speed of first demo with long-term control quality.
Prefer Platform Alignment Only When It Actually Helps
A platform-backed framework can help if:
- you already live in that ecosystem
- you want its deployment path
- you want its observability or control primitives
But platform alignment is not automatically architectural clarity.
Sometimes it is just gravity.
Avoid the Framework If Plain Workflows Are Enough
This point matters more than most framework vendors would like.
If the job is mostly:
- deterministic
- bounded
- easy to express as workflow steps
- not truly agentic
then When to Use a Workflow Instead of an Agent still applies.
You do not get engineering points for introducing a framework you do not need.
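For comparison, here is a bounded, deterministic job expressed as plain workflow steps: functions called in order, no framework, no agent loop. This is a generic sketch with made-up step names, not any framework's pattern.

```python
# A deterministic pipeline: each step is a plain function,
# and the control flow is just function composition.

def extract(source: str) -> list:
    return source.split(",")

def transform(rows: list) -> list:
    return [r.strip().upper() for r in rows]

def load(rows: list) -> str:
    return "|".join(rows)  # stand-in for writing to a store

def pipeline(source: str) -> str:
    return load(transform(extract(source)))

print(pipeline("a, b, c"))  # → A|B|C
```

If your system mostly looks like this, a framework adds indirection without adding control.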
My View
AI agent frameworks is a real category.
It is also a messy one.
The way to make it cleaner is not to hunt for the one best framework.
It is to stop pretending every framework is solving the same problem.
Some are:
- orchestration substrates
- multi-agent abstraction layers
- model-native SDK runtimes
- platform-backed development surfaces
- application-centered builder frameworks
That is a healthier map.
And once you use that map, a lot of the noise disappears.
The right framework choice is usually not about popularity.
It is about what layer of control your team actually needs.
FAQ
Do most teams need an AI agent framework?
No.
Some teams need one.
Many teams first need a smaller amount of explicit workflow code, better tool design, and clearer control boundaries before a framework adds real value.
What is the difference between a framework and an SDK here?
An SDK usually gives you programmatic access to a model-native or product-native runtime surface.
A framework usually gives you broader reusable structure for building agent behavior, state, control flow, or orchestration.
The problem is that many current tools blur the line.
Which framework is best?
There is no stable single answer.
The better question is:
which control surface fits the system you are actually building?
That is why low-level orchestration tools, higher-level multi-agent frameworks, model-native SDKs, and platform-backed frameworks should not all be compared as if they were interchangeable.
Do I need a framework before I can build a real agent system?
No.
A lot of real systems start with:
- plain model calls
- explicit workflows
- tool functions
- tracing and evaluation
and only adopt a framework when the control surface gets too repetitive to manage cleanly by hand.
How should I think about frameworks versus workflows?
A workflow is usually an execution shape.
A framework is a reusable software layer.
Some frameworks are built around workflows.
Some are built around agent loops.
Some combine both.
That is one reason the category is so easy to flatten by mistake.
The better question is:
which framework category fits your needed state model, tool model, abstraction level, and control surface?