Pythagora-v1 (GPT-Pilot)

Basic Information

Short description: Pythagora-v1 is an application that helps users interactively design entire apps. "With Pythagora, people can build apps with up to 5000 lines of code ONLY by writing in natural language." [source] GPT-Pilot is the open-source backend "brain" for Pythagora-v1 and is itself agentic [source].
Intended uses: What does the developer state that the system is intended for?: Application development. Pythagora-v1 and GPT-Pilot are marketed as being specialized for making entire apps as opposed to helping with narrower coding tasks.
Date(s) deployed: GPT-Pilot [source] (the backend "brain" for Pythagora-v1) was announced August 23, 2023 [source]. Pythagora-v1 was announced October 1, 2024 [source].

Developer

Legal name: Pythagora Inc [source]
Entity type: Corporation
Country (location of developer or first author's first affiliation): Incorporation: Delaware, USA. HQ: Berkeley, CA, USA [source]
Safety policies: What safety and/or responsibility policies are in place?: Unknown

System Components

Backend model(s): What model(s) are used to power the system?: Varies; supported backends include GPT and Claude models [source]
Public model specification: Is there formal documentation on the system’s intend...: None
Description of reasoning, planning, and memory implementation: How does the syst...: The system uses differently configured specialist agents ("specification writer", "architect", "tech lead", "developer", "code monkey", "troubleshooter", "debugger", and "technical writer"), each of which handles a different part of the task. Orchestrated by the architect, the agents iteratively pass documents and instructions to one another. The system can also ask questions of the human overseeing it [source] [source]
Observation space: What is the system able to observe while 'thinking'?: By default, GPT-Pilot is limited to observations in the folder that it is working in. It cannot access the web but could be modified to do so [source]. It is unclear whether Pythagora-v1 has the same observation space.
Action space/tools: What direct actions can the system take?: GPT-Pilot is able to create and work in files and run consequential commands with user approval [source]. It is unclear whether Pythagora-v1 has the same action space.
User interface: How do users interact with the system?: Either through a console or through Pythagora, a VS Code extension [source]
Development cost and compute: What is known about the development costs?: Unknown
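The hand-off between specialist agents described under "Description of reasoning, planning, and memory implementation" can be pictured with a minimal sketch. Only the role names come from this entry. The `Workspace` class, the stub agents, and `run_pipeline` are hypothetical names introduced for illustration and are not GPT-Pilot's actual code. The sketch also simplifies the design to a single linear pass, whereas the source describes iterative exchange orchestrated by the architect.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Role names are taken from the entry above; everything else is hypothetical.
ROLES: List[str] = [
    "specification writer", "architect", "tech lead", "developer",
    "code monkey", "troubleshooter", "debugger", "technical writer",
]


@dataclass
class Workspace:
    """Shared documents that the agents pass to one another."""
    documents: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Agent = Callable[[Workspace], None]


def make_stub_agent(role: str) -> Agent:
    """Return a placeholder agent that records a note instead of calling an LLM."""
    def agent(ws: Workspace) -> None:
        ws.documents[role] = f"output of {role}"
        ws.log.append(role)
    return agent


def run_pipeline(agents: Dict[str, Agent], order: List[str], ws: Workspace) -> Workspace:
    """Invoke each agent once, in order, on the shared workspace."""
    for role in order:
        agents[role](ws)
    return ws


agents = {role: make_stub_agent(role) for role in ROLES}
workspace = run_pipeline(agents, ROLES, Workspace())
```

In a real system each stub would be replaced by a call to the configured backend model, and the loop would repeat until the task is judged complete.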

Guardrails & Oversight

Accessibility of components
Weights: Are model parameters available?: N/A; the system calls external model(s) via API
Data: Is data available?: N/A; the system calls external model(s) via API
Code: Is code available?: Available for GPT-Pilot [source]
Documentation: Is documentation available?: Available [source]
Scaffolding: Is system scaffolding available?: Available for GPT-Pilot [source]
Controls and guardrails: What notable methods are used to protect against harmfu...: GPT-Pilot is very limited in what it can do directly by default, and is designed to seek user approval before taking certain actions [source]. It is unknown exactly how this design feature in GPT-Pilot carries over to Pythagora-v1 (which uses GPT-Pilot as its backend).
Monitoring and shutdown procedures: Are there any notable methods or protocols t...: GPT-Pilot is designed to ask users for approval before running certain commands [source]. It is unknown exactly how this design feature in GPT-Pilot carries over to Pythagora-v1 (which uses GPT-Pilot as its backend).
Customer and usage restrictions: Are there know-your-customer measures or other ...: Pythagora-v1 is in a private beta [source]. GPT-Pilot (its backend "brain") is open source [source]
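The approval-before-execution behaviour reported for GPT-Pilot above can be illustrated with a minimal sketch of a human-in-the-loop gate. This is not GPT-Pilot's implementation: `run_with_approval` and its `ask` callback are hypothetical names, and the sketch assumes that a command is refused unless the user explicitly answers yes.

```python
import shlex
import subprocess
from typing import Callable, Optional, Sequence


def run_with_approval(
    argv: Sequence[str],
    ask: Callable[[str], str] = input,
    timeout: float = 30.0,
) -> Optional[subprocess.CompletedProcess]:
    """Run `argv` only if the human explicitly approves it.

    Hypothetical helper for illustration; not part of GPT-Pilot.
    Returns None when the user declines.
    """
    shown = shlex.join(argv)  # human-readable rendering of the command
    answer = ask(f"Agent wants to run: {shown}\nAllow? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        return None  # anything other than an explicit yes is a refusal
    # The command is passed as an argument list, so no shell is involved
    # and shell metacharacters are not interpreted.
    return subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Defaulting to refusal means an empty or mistyped answer never results in execution, which is one common way to keep the human decision explicit.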

Evaluation

Notable benchmark evaluations (e.g., on SWE-Bench Verified): None
Bespoke testing (e.g., demos): Demos [source] [source]
Safety: Have safety evaluations been conducted by the developers? What were the ...: None
Publicly reported external red-teaming or comparable auditing
Personnel: Who were the red-teamers/auditors?: None
Scope, scale, access, and methods: What access did red-teamers/auditors have and...: None
Findings: What did the red-teamers/auditors conclude?: None

Ecosystem

Interoperability with other systems: What tools or integrations are available?: None
Usage statistics and patterns: Are there any notable observations about usage?: GPT-Pilot has 3.2k forks and 32k stars on GitHub [source]
Other notes (if any): --