Updated 2025 Edition

The AI Agent Index

Transparency gaps across 30 agents and 45 fields

Agentic AI systems are increasingly capable of performing complex tasks with limited human involvement. The 2025 AI Agent Index documents the origins, design, capabilities, ecosystem, and safety features of 30 prominent AI agents based on publicly available information and correspondence with developers.

At a Glance

Key Findings

24 / 30

Rapid Deployment

Agents launched or received major agentic updates in 2024-2025, with releases accelerating sharply. Autonomy levels are rising in parallel.

Level 1 → Level 5

Autonomy Split

Chat agents maintain lower autonomy (Level 1-3), browser agents operate at Level 4-5 with limited intervention, and enterprise agents move from Level 1-2 in design to Level 3-5 when deployed.

4 / 13

Transparency Gap

Of the 13 agents exhibiting frontier levels of autonomy, only 4 disclose any agentic safety evaluations. Developers share far more information about capabilities than safety practices.

3 Models

Foundation Model Concentration

Almost all agents depend on GPT, Claude, or Gemini model families, creating structural dependencies across the ecosystem.

No Standards

Web Conduct

There are no established standards for how agents should behave on the web. Some agents are explicitly designed to bypass anti-bot protections and mimic human browsing.

US & China

Geographic Divergence

Agent development concentrates in the US (21/30) and China (5/30), with markedly different approaches to safety frameworks and compliance documentation.

From the Paper

Figures & Analysis

2025 marked a substantial rise in attention to AI agents

24/30 agents were released or received major agentic updates in 2024–2025. In 2025, papers in Google Scholar mentioning “AI agent” or “agentic AI” exceeded the total from all prior years combined. Enterprise platforms emerged more recently than chat agents, reflecting a second wave targeting business automation.

For 198 out of 1,350 fields, no public information was found

Missing information concentrates in Ecosystem Interaction and Safety categories. Only 4 agents provide agent-specific system cards. 25/30 disclose no internal safety results, and 23/30 have no third-party testing. Meanwhile, 9/30 agents report capability benchmarks but often lack corresponding safety disclosure.

Autonomy levels differ systematically by agent category

Chat agents maintain Level 1–3 autonomy with turn-based interaction. Browser agents operate at Level 4–5 with limited mid-execution intervention. Enterprise platforms show a design/deployment split: users configure agents at Level 1–2, but deployed agents often run at Level 3–5 triggered by events without human involvement.

US and Chinese developers take markedly different approaches to safety disclosures

21/30 agents are US-incorporated, 5/30 Chinese. Chinese agents typically lack documented safety frameworks (1/5) and compliance standards (1/5), though this may reflect documentation practices rather than absence. Only half of all developers (15/30) publish AI safety frameworks. Enterprise assurance standards (SOC 2, ISO 27001) are more widely adopted than agent-specific safety frameworks.

Most agents use a small set of closed-source frontier models

Only frontier labs and Chinese developers run their own models; the majority rely on GPT, Claude, or Gemini families, creating structural dependencies across the ecosystem. 20/30 agents support MCP for tool integration, with enterprise agents leading at 13/13. 23/30 agents are fully closed source at the product level.

The multi-layered agent ecosystem makes evaluation of agentic risks difficult

Individual developers often control only a subset of inputs and processes. Agentic evaluations depend on downstream context, including tools and autonomy level, making model-level evaluation insufficient. This layered architecture spreads responsibility across multiple actors, reducing clarity over who is accountable for agentic risks.

How We Constructed the Index

Methodology

Thirty AI agents were systematically selected based on three criteria:

  • Agency: autonomy, goal complexity, environmental interaction, generality
  • Impact: public interest, market significance, developer significance
  • Practicality: public availability, deployability, general purpose

Each agent was annotated by seven subject-matter experts across 45 information fields, organised into six categories, using only publicly available information and correspondence with developers.
Agents span three types: Chat (12), Browser (5), and Enterprise (13).
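The annotation scheme above can be sketched as a minimal data model. This is an illustrative sketch only: the `AgentEntry` class, its attribute names, and the `coverage` helper are assumptions for exposition, not artifacts of the index itself.

```python
from dataclasses import dataclass, field

# The three agent types from the index.
AGENT_TYPES = {"Chat", "Browser", "Enterprise"}

@dataclass
class AgentEntry:
    """One row of the index: an agent annotated across 45 fields.

    Field names and structure are hypothetical; a value of None marks
    a field for which no public information was found.
    """
    name: str
    agent_type: str                 # one of AGENT_TYPES
    autonomy_level: int             # 1 (lowest) to 5 (highest)
    fields: dict = field(default_factory=dict)  # field name -> value or None

    def coverage(self) -> float:
        """Fraction of the 45 fields with public information."""
        documented = sum(1 for v in self.fields.values() if v is not None)
        return documented / 45
```

With 30 agents annotated across 45 fields each, the index has 1,350 cells; the 198 with no public information correspond to an overall coverage of roughly 85%.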