The AI Agent Index

Documenting the technical and safety features of deployed agentic AI systems

MetaGPT


Basic information

Website: https://web.archive.org/web/20241221061253/https://www.deepwisdom.ai/

Short description: A meta-programming framework that incorporates efficient human workflows into LLM-based multi-agent collaborations. It acts as an AI software company developing a diverse range of software solutions.

Intended uses: What does the developer say it’s for? MetaGPT adopts an assembly line paradigm, assigning different roles to GPTs to form a collaborative software entity that solves complex tasks by breaking them down into subtasks handled by many agents working together [source]

Date(s) deployed: June 30, 2023 [source]


Developer

Website: https://www.deepwisdom.ai/metagpt

Legal name: MetaGPT LLC [source] (see Terms of Service)

Entity type: Corporation

Country (location of developer or first author’s first affiliation): Incorporation: Delaware, USA (7606285) [source]

Safety policies: What safety and/or responsibility policies are in place? Unknown


System components

Backend model: What model(s) are used to power the system? Supports a variety of backend models [source]

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? MetaGPT assigns a specific role (e.g., Engineer, Tester) to each agent and initializes specialized skills and context for each role. Agents collaborate through a structured communication protocol: all agents publish messages to a shared pool, and each agent subscribes to the messages it needs during the task-solving process. The system employs iterative programming with self-correcting executable feedback (e.g., a Unit Test Generator). Memory is role-specific and consists of a list of messages containing the necessary information, context, and observations. Everything is modeled as roles: new roles can be added for agents, and humans can take on one of the roles [source]
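The publish-subscribe pattern with role-specific memory described above can be illustrated with a minimal, self-contained sketch. This is not MetaGPT's actual API: the classes Message, Role, and MessagePool and the fields sent_from, cause_by, and watches are hypothetical names chosen for illustration, and the assumption that subscriptions are keyed on the action that produced a message is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    content: str
    sent_from: str  # name of the role that published the message (hypothetical field)
    cause_by: str   # name of the action that produced the message (hypothetical field)


@dataclass
class Role:
    name: str
    watches: set[str]  # action names whose messages this role subscribes to
    memory: list[Message] = field(default_factory=list)  # role-specific message memory

    def observe(self, msg: Message) -> None:
        # Only messages produced by subscribed actions enter this role's memory.
        if msg.cause_by in self.watches:
            self.memory.append(msg)


class MessagePool:
    """Shared pool: keeps every published message and routes it to subscribers."""

    def __init__(self) -> None:
        self.history: list[Message] = []
        self.roles: list[Role] = []

    def add_role(self, role: Role) -> None:
        self.roles.append(role)

    def publish(self, msg: Message) -> None:
        self.history.append(msg)
        for role in self.roles:
            if role.name != msg.sent_from:  # a role does not re-observe its own output
                role.observe(msg)


# Hypothetical usage: the Architect subscribes to PRDs, the Engineer to designs.
pool = MessagePool()
architect = Role("Architect", watches={"WritePRD"})
engineer = Role("Engineer", watches={"WriteDesign"})
pool.add_role(architect)
pool.add_role(engineer)
pool.publish(Message("PRD for a 2048 game", sent_from="ProductManager", cause_by="WritePRD"))
print(len(architect.memory), len(engineer.memory))  # 1 0
```

Routing by the producing action rather than by an explicit recipient keeps publishers decoupled from consumers in this sketch: adding a new role only requires declaring which actions it watches.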

Observation space: What is the system able to observe while ‘thinking’? In MetaGPT, each agent observes messages from other agents, role-specific context about the task it is solving, feedback from previous projects, and iterative feedback from the current project [source]

Action space/tools: What direct actions can the system take? Each agent can directly execute actions specific to its role, e.g., writeDesign, writeCode, writeCodeReview, writePRD, and writeTasks, as well as publish and subscribe to messages. New role-specific actions can be added.

User interface: How do users interact with the system? MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, etc.
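As a rough illustration of how a one-line requirement can flow through role-specific actions, the sketch below chains four stub actions in an assembly line. The role ordering (ProductManager, Architect, ProjectManager, Engineer) roughly follows the software-company setup described for MetaGPT, but the function names, the string outputs, and the strictly linear hand-off are simplifying assumptions: in MetaGPT each step is an LLM call, roles coordinate through the shared message pool, and code review and testing steps are omitted here.

```python
from typing import Callable

# Hypothetical stand-ins for the role actions (writePRD, writeDesign, writeTasks,
# writeCode); a real system would call an LLM in each step instead of formatting strings.
Action = Callable[[str], str]


def write_prd(requirement: str) -> str:
    return f"PRD: user stories and requirements for '{requirement}'"


def write_design(prd: str) -> str:
    return f"Design: data structures and APIs based on [{prd}]"


def write_tasks(design: str) -> str:
    return f"Tasks: file-level task list derived from [{design}]"


def write_code(tasks: str) -> str:
    return f"Code: implementation following [{tasks}]"


# Assembly line: each role runs one action on the previous role's output.
PIPELINE: list[tuple[str, Action]] = [
    ("ProductManager", write_prd),
    ("Architect", write_design),
    ("ProjectManager", write_tasks),
    ("Engineer", write_code),
]


def run_pipeline(requirement: str) -> dict[str, str]:
    """Turn a one-line requirement into one artifact per role, in order."""
    artifacts: dict[str, str] = {}
    current = requirement
    for role, action in PIPELINE:
        current = action(current)
        artifacts[role] = current
    return artifacts


if __name__ == "__main__":
    for role, artifact in run_pipeline("Create a 2048 game").items():
        print(f"{role}: {artifact[:60]}")
```

Because each artifact embeds its predecessor, the final Engineer output in this toy version can be traced back to the original requirement; a real pipeline would pass structured documents rather than nested strings.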

Development cost and compute: What is known about the development costs? Unknown


Guardrails and oversight

Accessibility of components:

  • Weights: Are model parameters available? N/A; supports various backend models
  • Data: Is data available? N/A; supports various backend models
  • Code: Is code available? Available [source]
  • Scaffolding: Is system scaffolding available? On GitHub [source] and in the publication [source]
  • Documentation: Is documentation available? Documentation page [source] on GitHub [source], publication [source], and pre-prints [source] [source]

Controls and guardrails: What notable methods are used to protect against harmful actions? None

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? None

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? None


Evaluation

Notable benchmark evaluations: 87.7% on MBPP and 85.9% on HumanEval (both Pass@1) [source]; 46.67% on SWE-bench Lite [source]; additional benchmarks are reported for specific components, namely Data Interpreter [source] and AFlow [source].
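Pass@1 belongs to the pass@k family of code-generation metrics: the probability that at least one of k generated solutions to a problem passes its unit tests. A widely used unbiased estimator, popularized with the HumanEval benchmark, draws n samples per problem, counts the c that pass, and computes pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below implements that estimator; it is not taken from MetaGPT's evaluation code, and the per-problem counts in the example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of which pass."""
    if n - c < k:
        return 1.0  # every size-k subset of samples contains at least one passing one
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: three problems, one generation each (n=1), two of which pass.
example = [(1, 1), (1, 0), (1, 1)]
print(f"{benchmark_pass_at_k(example, k=1):.4f}")  # 0.6667
```

With a single sample per problem, Pass@1 reduces to the fraction of problems solved, so a reported 85.9% would correspond to that fraction under this reading.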

Bespoke testing: Demo [source]

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

  • Personnel: Who were the red-teamers/auditors? None
  • Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
  • Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? MetaGPT allows users to create their own tools and supports tool use by specific roles (e.g., web search tools), but does not support UI or front-end based tools [source] [source]

Usage statistics and patterns: Are there any notable observations about usage? MetaGPT's GitHub repository has 46.7k stars and 5.45k forks [source]; the publication has ~500 citations


Additional notes

None