The AI Agent Index

Documenting the technical and safety features of deployed agentic AI systems

Genie


Basic information

Website: https://web.archive.org/web/20240925010719/https://cosine.sh/genie

Short description: An AI software engineer

Intended uses: What does the developer say it’s for? Emulation of human software engineers [source]

Date(s) deployed: August 12, 2024 [source]


Developer

Website: https://web.archive.org/web/20241123055437/https://cosine.sh/

Legal name: BUILDT AI LIMITED (UK) [source]. Buildt Inc. (US) [source]

Entity type: Private limited company (UK) [source]. Corporation (US) [source]. The US corporation controls over 75 percent of the shares and voting rights of the UK private limited company [source]

Country (location of developer or first author’s first affiliation): Incorporation: Delaware, USA (BUILDT INC. 7156765) [source]

Safety policies: What safety and/or responsibility policies are in place? Unknown


System components

Backend model: What model(s) are used to power the system? Various models, which the developer fine-tuned on its own data [source]

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? Proprietary [source]; however, the system is trained on a dataset that resembles a software engineer’s workflow [source].

Observation space: What is the system able to observe while ‘thinking’? GitHub access, workspace in which it can plan, write code, and run tests.

Action space/tools: What direct actions can the system take? Write and execute code, use debugging tools, access GitHub [source]

User interface: How do users interact with the system? Users can submit a freeform prompt, a ticket, or a link to a GitHub issue, and can monitor the agent as it finds information, plans, writes code, and runs tests [source]

Development cost and compute: What is known about the development costs? Unknown


Guardrails and oversight

Accessibility of components:

  • Weights: Are model parameters available? Closed source
  • Data: Is data available? Closed source; however, Cosine claims that the data “perfectly emulates the cognitive processes, logic and workflow of human engineers. Our proprietary techniques generates data that represents perfect information lineage, incremental knowledge discovery, and step by step decision making” [source]
  • Code: Is code available? Closed source
  • Scaffolding: Is system scaffolding available? Closed source
  • Documentation: Is documentation available? Available [source]

Controls and guardrails: What notable methods are used to protect against harmful actions? Unknown

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? Currently limited to select users, who can sign up via a waitlist [source]

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Unknown


Evaluation

Notable benchmark evaluations: 30.08% on SWE-Bench [source]

Bespoke testing: Demos [source]

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

  • Personnel: Who were the red-teamers/auditors? None
  • Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
  • Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? GitHub [source]

Usage statistics and patterns: Are there any notable observations about usage? Available to select users; anyone can register for the waitlist [source]


Additional notes

None