The AI Agent Index

Documenting the technical and safety features of deployed agentic AI systems

Pythagora-v1 (GPT-Pilot)


Basic information

Website: https://web.archive.org/web/20241001171050/https://www.pythagora.ai/v1

Short description: Pythagora-v1 is an application that helps users interactively design entire apps. “With Pythagora, people can build apps with up to 5000 lines of code ONLY by writing in natural language.” [source] GPT-Pilot, an open-source agentic system, serves as the backend “brain” for Pythagora-v1 [source].

Intended uses: What does the developer say it’s for? Application development. Pythagora-v1 and GPT-Pilot are marketed as specialized for building entire apps rather than assisting with narrower coding tasks.

Date(s) deployed: GPT-Pilot [source] (the backend “brain” for Pythagora-v1) was announced August 23, 2023 [source]. Pythagora-v1 was announced October 1, 2024 [source].


Developer

Website: https://www.pythagora.ai/

Legal name: Pythagora Inc [source]

Entity type: Corporation

Country (location of developer or first author’s first affiliation): Incorporation: Delaware, USA. HQ: Berkeley, CA, USA [source]

Safety policies: What safety and/or responsibility policies are in place? Unknown


System components

Backend model: What model(s) are used to power the system? Variable; options include GPT and Claude models [source]
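
As a purely hypothetical illustration of what a swappable, API-served backend looks like, the sketch below routes a prompt to either an OpenAI or Anthropic model based on a config value. The configuration keys, function names, and model identifiers are assumptions made for illustration and are not taken from GPT-Pilot’s actual configuration schema.

```python
# Hypothetical sketch of a swappable LLM backend. The config schema, provider
# names, and model identifiers are assumptions, not GPT-Pilot's actual code.
from typing import Callable

CONFIG = {"provider": "openai", "model": "gpt-4o"}  # e.g. "anthropic" plus a Claude model


def _call_openai(model: str, prompt: str) -> str:
    # A real implementation would call the OpenAI API here.
    return f"[openai:{model}] response to: {prompt}"


def _call_anthropic(model: str, prompt: str) -> str:
    # A real implementation would call the Anthropic API here.
    return f"[anthropic:{model}] response to: {prompt}"


PROVIDERS: dict[str, Callable[[str, str], str]] = {
    "openai": _call_openai,
    "anthropic": _call_anthropic,
}


def complete(prompt: str) -> str:
    """Route a prompt to whichever API-served model the config selects."""
    return PROVIDERS[CONFIG["provider"]](CONFIG["model"], prompt)
```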

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? There are differently configured specialist agents (“specification writer”, “architect”, “tech lead”, “developer”, “code monkey”, “troubleshooter”, “debugger”, and “technical writer”), each of which accomplishes a different part of the task. Orchestrated by the architect, they iteratively pass documents and instructions between one another, and the system is able to ask questions of the human overseeing it [source] [source]. A minimal sketch of this hand-off pattern appears below.
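
The following is a minimal, assumption-laden sketch of the hand-off pattern described above, not GPT-Pilot’s actual implementation: specialist agents take turns transforming a shared project state, and any agent can pause to ask the overseeing human a question. The class names, fields, and control flow are invented for illustration; only the roles come from the entry.

```python
# Illustrative sketch only: specialist agents passing a shared "document"
# between one another, with the ability to ask the human a question.
# Class names, fields, and control flow are assumptions, not GPT-Pilot code.
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    """Shared state handed from agent to agent."""
    spec: str = ""
    architecture: str = ""
    tasks: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


class SpecialistAgent:
    def __init__(self, role: str):
        self.role = role

    def ask_user(self, question: str) -> str:
        # The system can pause and query the human overseeing it.
        return input(f"[{self.role}] {question} ")

    def run(self, state: ProjectState) -> ProjectState:
        # A real agent would call the backend LLM with a role-specific
        # prompt plus the relevant parts of `state`.
        raise NotImplementedError


class SpecificationWriter(SpecialistAgent):
    def run(self, state: ProjectState) -> ProjectState:
        answer = self.ask_user("What should the app do?")
        state.spec = f"Specification based on: {answer}"
        return state


class Architect(SpecialistAgent):
    def run(self, state: ProjectState) -> ProjectState:
        state.architecture = f"Architecture derived from: {state.spec}"
        state.tasks = ["scaffold project", "implement features", "write docs"]
        return state


def orchestrate(agents: list[SpecialistAgent]) -> ProjectState:
    """Run each specialist in turn, passing the shared state along."""
    state = ProjectState()
    for agent in agents:
        state = agent.run(state)
        state.log.append(f"{agent.role} finished")
    return state


if __name__ == "__main__":
    # The remaining roles ("tech lead", "developer", "code monkey",
    # "troubleshooter", "debugger", "technical writer") would follow
    # the same pattern.
    final = orchestrate([SpecificationWriter("specification writer"), Architect("architect")])
    print(final.log)
```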

Observation space: What is the system able to observe while ‘thinking’? By default, GPT-Pilot is limited to observations within the folder it is working in. It cannot access the web, although it could be modified to do so [source]. It is unclear whether Pythagora-v1 has the same observation space.

Action space/tools: What direct actions can the system take? GPT-Pilot can create and work in files and can run consequential commands with user approval [source] (see the sketch below). It is unclear whether Pythagora-v1 has the same action space.
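
Below is a hedged sketch of the two constraints described in the preceding fields: file operations confined to the working folder, and commands gated behind explicit user approval. The helper names and the exact approval flow are assumptions made for illustration, not GPT-Pilot’s actual API.

```python
# Illustrative sketch only: workspace-scoped file writes plus a user-approval
# gate for commands. Helper names and the approval flow are assumptions,
# not GPT-Pilot's actual implementation.
import subprocess
from pathlib import Path

WORKSPACE = Path("./workspace").resolve()
WORKSPACE.mkdir(parents=True, exist_ok=True)


def write_file(relative_path: str, content: str) -> Path:
    """Create or overwrite a file, refusing any path outside the workspace."""
    target = (WORKSPACE / relative_path).resolve()
    if target != WORKSPACE and WORKSPACE not in target.parents:
        raise PermissionError(f"{target} is outside the agent's workspace")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target


def run_command(command: list[str]) -> subprocess.CompletedProcess:
    """Execute a command only after the overseeing user explicitly approves it."""
    answer = input(f"Agent wants to run {command!r} in {WORKSPACE}. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        raise RuntimeError("User declined the command")
    return subprocess.run(command, cwd=WORKSPACE, capture_output=True, text=True)
```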

User interface: How do users interact with the system? Either through a console or through Pythagora, a VS Code extension [source]

Development cost and compute: What is known about the development costs? Unknown


Guardrails and oversight

Accessibility of components:

  • Weights: Are model parameters available? N/A; the system relies on external model(s) accessed via API
  • Data: Is data available? N/A; the system relies on external model(s) accessed via API
  • Code: Is code available? Available for GPT-Pilot [source]
  • Scaffolding: Is system scaffolding available? Available for GPT-Pilot [source]
  • Documentation: Is documentation available? Available [source]

Controls and guardrails: What notable methods are used to protect against harmful actions? By default, GPT-Pilot is very limited in what it can do directly, and it is designed to seek user approval before taking certain actions [source]. It is unknown exactly how this design feature of GPT-Pilot carries over to Pythagora-v1 (which uses GPT-Pilot as its backend).

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? Pythagora-v1 is in a private beta [source]. GPT-Pilot (its backend “brain”) is open source [source]

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? GPT-Pilot is designed to ask users for approval before running certain commands [source]. It is unknown exactly how this design feature of GPT-Pilot carries over to Pythagora-v1 (which uses GPT-Pilot as its backend).


Evaluation

Notable benchmark evaluations: None

Bespoke testing: Demos [source] [source]

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

  • Personnel: Who were the red-teamers/auditors? None
  • Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
  • Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? None

Usage statistics and patterns: Are there any notable observations about usage? GPT-Pilot has 3.2k forks and 32k stars on GitHub [source]


Additional notes

None