Updated 2025 Edition

The AI Agent Index

Transparency gaps across 30 agents and 45 fields

Agentic AI systems are increasingly capable of performing complex tasks with limited human involvement. The 2025 AI Agent Index documents the origins, design, capabilities, ecosystem, and safety features of 30 prominent AI agents based on publicly available information and correspondence with developers.

At a Glance

Key Findings

24 / 30

Rapid Deployment

Agents launched or received major agentic updates in 2024-2025, with releases accelerating sharply. Autonomy levels are rising in parallel.

L1 → L5

Autonomy Split

Chat agents maintain lower autonomy (Level 1-3), browser agents operate at Level 4-5 with limited intervention, and enterprise agents move from Level 1-2 in design to Level 3-5 when deployed.

4 / 13

Transparency Gap

Of the 13 agents exhibiting frontier levels of autonomy, only 4 disclose any agentic safety evaluations. Developers share far more information about capabilities than safety practices.

3 Models

Foundation Model Concentration

Almost all agents depend on GPT, Claude, or Gemini model families, creating structural dependencies across the ecosystem.

No Standards

Web Conduct

There are no established standards for how agents should behave on the web. Some agents are explicitly designed to bypass anti-bot protections and mimic human browsing.

US & China

Geographic Divergence

Agent development concentrates in the US (21/30) and China (5/30), with markedly different approaches to safety frameworks and compliance documentation.

From the Paper

Figures & Analysis

2025 marked a substantial rise in attention to AI agents

24/30 agents were released or received major agentic updates in 2024-2025. In 2025, papers in Google Scholar mentioning “AI agent” or “agentic AI” exceeded the total from all prior years combined. Enterprise platforms emerged more recently than chat agents, reflecting a second wave targeting business automation.

For 198 out of 1,350 fields, no public information was found

Missing information concentrates in the Ecosystem Interaction and Safety categories. Only 4 agents provide agent-specific system cards. 25/30 disclose no internal safety results, and 23/30 have no third-party testing. Meanwhile, 9/30 agents report capability benchmarks, often without corresponding safety disclosures.

Autonomy levels differ systematically by agent category

Chat agents maintain Level 1–3 autonomy with turn-based interaction. Browser agents operate at Level 4–5 with limited mid-execution intervention. Enterprise platforms show a design/deployment split: users configure agents at Level 1–2, but deployed agents often run at Level 3–5 triggered by events without human involvement.

US and Chinese developers take markedly different approaches to safety disclosures

21/30 agents are US-incorporated, 5/30 Chinese. Only 1 of 5 Chinese agents documents a safety framework, and only 1 of 5 documents compliance standards, though this may reflect documentation practices rather than absence. Only half of all developers (15/30) publish AI safety frameworks. Enterprise assurance standards (SOC 2, ISO 27001) are more widely adopted than agent-specific safety frameworks.

Most agents use a small set of closed-source frontier models

Only frontier labs and Chinese developers run their own models; the majority rely on GPT, Claude, or Gemini families, creating structural dependencies across the ecosystem. 20/30 agents support MCP for tool integration, with enterprise agents leading at 13/13. 23/30 agents are fully closed source at the product level.
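
Concretely, MCP support means an agent can act as a client to third-party tool servers. Below is a minimal sketch of such a server using the Model Context Protocol Python SDK; the server name and tool are hypothetical stand-ins for whatever integration a deployment exposes.

    # Minimal MCP tool server sketch (requires the `mcp` Python SDK).
    # The server name and tool below are hypothetical illustrations.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("ticket-lookup")

    @mcp.tool()
    def lookup_ticket(ticket_id: str) -> str:
        """Return the status of a support ticket (stubbed for illustration)."""
        # A real server would query a ticketing system here.
        return f"Ticket {ticket_id}: open"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio; an MCP-capable agent connects as a client

An agent with MCP support can discover and call a tool like this mid-task, which is why the 20/30 adoption figure is a signal of ecosystem-level interoperability rather than a per-product feature.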

The multi-layered agent ecosystem makes evaluation of agentic risks difficult

Individual developers often control only a subset of inputs and processes. Agentic evaluations depend on downstream context, including tools and autonomy level, making model-level evaluation insufficient. This distributed architecture spreads responsibility across multiple actors, reducing clarity over who is accountable for agentic risks.
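
To make the point concrete, here is a toy sketch (all names hypothetical) of why the same model can pose different risks in different deployments, so evaluations must be parameterised by tools and autonomy rather than run once at the model level.

    # Toy illustration (hypothetical names): the same base model in two
    # deployment contexts. A model-level eval cannot distinguish them.
    from dataclasses import dataclass, field

    @dataclass
    class DeploymentContext:
        model: str
        tools: set[str] = field(default_factory=set)
        autonomy_level: int = 1  # 1 = turn-based ... 5 = event-triggered, no human in loop

    HIGH_IMPACT_TOOLS = {"shell", "browser", "email"}  # assumption for illustration

    def needs_agentic_eval(ctx: DeploymentContext) -> bool:
        # Risk depends on downstream context, not on the model identifier alone.
        return ctx.autonomy_level >= 4 and bool(ctx.tools & HIGH_IMPACT_TOOLS)

    chat = DeploymentContext("frontier-model-x", {"search"}, autonomy_level=2)
    agent = DeploymentContext("frontier-model-x", {"browser", "shell"}, autonomy_level=5)

    assert needs_agentic_eval(chat) is False
    assert needs_agentic_eval(agent) is True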

How We Constructed the Index

Methodology

Thirty AI agents were systematically selected based on three criteria:

  • Agency: autonomy, goal complexity, environmental interaction, generality
  • Impact: public interest, market significance, developer significance
  • Practicality: public availability, deployability, general purpose

Seven subject-matter experts annotated each agent across 45 fields, organised into six categories, using only publicly available information and developer correspondence.
Agents span three types: Chat (12), Browser (5), and Enterprise (13).
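
As a rough sketch of the resulting dataset (the record type and field layout are assumptions for illustration, not the paper's schema), each agent is a 45-field record, and the completeness figure above falls out of simple arithmetic.

    # Sketch of an index entry; None marks "no public information found".
    # The record layout is an assumption, not the paper's actual schema.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AgentEntry:
        name: str
        agent_type: str                    # "Chat" | "Browser" | "Enterprise"
        developer: str
        fields: dict[str, Optional[str]]   # the 45 annotated fields

    N_AGENTS, N_FIELDS = 30, 45
    total_cells = N_AGENTS * N_FIELDS      # 1,350 annotated cells
    missing = 198                          # cells with no public information
    print(f"{missing}/{total_cells} missing = {missing / total_cells:.1%}")  # 14.7%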

All 30 Agents

Agent Overview

Agent                 Type        Developer
Mobile-Agent          Browser     Alibaba
Claude                Chat        Anthropic
Claude Code           Chat        Anthropic
Browser Use           Enterprise  Browser Use
UI-TARS-desktop       Browser     ByteDance
Glean Agents          Enterprise  Glean
Gemini                Chat        Google
Gemini CLI            Chat        Google
Gemini Enterprise     Enterprise  Google
Breeze Agents         Enterprise  HubSpot
watsonx Orchestrate   Enterprise  IBM
Kimi OK Computer      Chat        Moonshot AI
Manus                 Chat        Butterfly Effect
Copilot Agents        Enterprise  Microsoft
MiniMax Agent         Chat        MiniMax
n8n AI agent builder  Enterprise  n8n
Agent Builder         Enterprise  OpenAI
ChatGPT               Chat        OpenAI
ChatGPT Agent         Chat        OpenAI
ChatGPT Atlas         Browser     OpenAI
Codex                 Chat        OpenAI
Opera Neon            Browser     Opera
Perplexity            Chat        Perplexity
Comet                 Browser     Perplexity
Agentforce            Enterprise  Salesforce
Joule Agents          Enterprise  SAP
AI Agents             Enterprise  ServiceNow
Action Agent          Enterprise  Writer
AutoGLM               Chat        Z.ai
Zapier Agents         Enterprise  Zapier