Grit Agent

Basic information

Short description: https://perma.cc/9M7E-QE6L

Intended uses: What does the developer say it’s for? Autonomously making large-scale changes to codebases, such as code migrations (from one API or framework to another) and refactors [source] [source]

Date(s) deployed: Unknown

Developer

Website: https://perma.cc/9M7E-QE6L

Legal name: Iuvo AI, Inc. [source]

Entity type: Corporation [source] [source]

Country (location of developer or first author’s first affiliation): Incorporation: Delaware, USA (IUVO AI, INC. 6427074) [source] [source]

Safety policies: What safety and/or responsibility policies are in place? Unknown

System components

Backend model: What model(s) are used to power the system? Unknown

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? The agent works in three stages: 1. Planning. Combines static analysis with LLMs to produce an index of the codebase, then drafts a rough plan for how to change the code. 2. Generation. Applies a series of transformations to the code using a combination of AI and the Grit DSL, querying online documentation to inform this step. 3. Refinement. Opens a pull request, collects feedback from unit tests, CI tools, an automated reviewer, and human reviewers, then rewrites the code based on that feedback [source]
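
The three stages can be pictured as a plan, generate, refine loop. The Python sketch below is illustrative only: Grit's code is closed source, so every name here (`run_agent`, `Feedback`, `MigrationRun`, the callables passed in, and the `max_rounds` limit) is a hypothetical stand-in rather than Grit's actual API, and the toy string replacement stands in for the LLM and Grit DSL transformations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Feedback:
    source: str        # hypothetical label, e.g. "unit tests", "CI", "reviewer"
    passed: bool
    message: str = ""


@dataclass
class MigrationRun:
    plan: List[str]
    code: str
    history: List[Feedback] = field(default_factory=list)


def run_agent(
    code: str,
    make_plan: Callable[[str], List[str]],
    apply_step: Callable[[str, str], str],
    checks: List[Callable[[str], Feedback]],
    revise: Callable[[str, List[Feedback]], str],
    max_rounds: int = 3,  # assumed cap; the source does not state one
) -> MigrationRun:
    # Stage 1 (planning): derive a rough plan from the code.
    run = MigrationRun(plan=make_plan(code), code=code)

    # Stage 2 (generation): apply each planned transformation in order.
    for step in run.plan:
        run.code = apply_step(run.code, step)

    # Stage 3 (refinement): gather feedback, rewrite, and repeat until clean.
    for _ in range(max_rounds):
        feedback = [check(run.code) for check in checks]
        run.history.extend(feedback)
        failures = [f for f in feedback if not f.passed]
        if not failures:
            break
        run.code = revise(run.code, failures)
    return run


def demo() -> MigrationRun:
    """Toy migration: rename calls from old_api to new_api."""
    def make_plan(code: str) -> List[str]:
        return ["rename old_api to new_api"]

    def apply_step(code: str, step: str) -> str:
        # Deliberately incomplete: only the first call site is rewritten.
        return code.replace("old_api(", "new_api(", 1)

    def unit_tests(code: str) -> Feedback:
        ok = "old_api(" not in code
        return Feedback("unit tests", ok, "" if ok else "old_api still referenced")

    def revise(code: str, failures: List[Feedback]) -> str:
        return code.replace("old_api(", "new_api(")

    return run_agent("old_api(1)\nold_api(2)\n", make_plan, apply_step,
                     [unit_tests], revise)
```

In the toy run, the generation stage leaves one stale call site, the first round of checks reports it, and the refinement stage fixes it before the second round passes.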

Observation space: What is the system able to observe while ‘thinking’? The raw code, an index of the code (detailing dependencies, the functionality of each file, and class hierarchies), online documentation, and feedback from unit tests and reviewers [source]
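
To make the idea of a code index concrete, the sketch below builds a per-file record of dependencies, a functionality summary, and class hierarchies using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). This is an assumption-laden illustration, not Grit's implementation: the `FileIndex` type and `index_source` function are hypothetical, and the module docstring is used as a crude stand-in for the LLM-generated description of each file's functionality.

```python
import ast
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileIndex:
    summary: str                                   # stand-in for "functionality of each file"
    imports: List[str] = field(default_factory=list)            # dependencies
    classes: Dict[str, List[str]] = field(default_factory=dict)  # class name -> base classes


def index_source(source: str) -> FileIndex:
    """Statically analyze one Python file and return its index entry."""
    tree = ast.parse(source)
    entry = FileIndex(summary=ast.get_docstring(tree) or "")
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            entry.imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Preserve leading dots for relative imports such as "from . import x".
            entry.imports.append("." * node.level + (node.module or ""))
        elif isinstance(node, ast.ClassDef):
            entry.classes[node.name] = [ast.unparse(base) for base in node.bases]
    return entry


SAMPLE = '''"""Payment helpers."""
import os
from typing import List

class Base:
    pass

class Card(Base):
    pass
'''

sample_index = index_source(SAMPLE)
```

For the sample file, the entry records `os` and `typing` as dependencies and notes that `Card` derives from `Base`.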

Action space/tools: What direct actions can the system take? Rewrite codebases in a sandbox, search for and read online documentation, submit pull requests, and run unit tests [source]

User interface: How do users interact with the system? Through a web interface called Grit Studio [source]

Development cost and compute: What is known about the development costs? Unknown

Guardrails and oversight

Accessibility of components:

Weights: Are model parameters available? Closed source
Data: Is data available? Closed source
Code: Is code available? Closed source
Scaffolding: Is system scaffolding available? Closed source
Documentation: Is documentation available? Available [source]

Controls and guardrails: What notable methods are used to protect against harmful actions? Unknown

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? Users must schedule a call with Grit to obtain access to the paid tier [source]

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Unknown

Evaluation

Notable benchmark evaluations: None

Bespoke testing: Minimal; some demo videos of predefined workflows being executed are available online [source]

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

Personnel: Who were the red-teamers/auditors? None
Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? Integrates with GitHub, the GritQL code transformation language, an internet search tool for documentation, and a Python interpreter [source]

Usage statistics and patterns: Are there any notable observations about usage? Few; a small number of customers (such as LangChain) have discussed the Grit system on Twitter [source] [source]

Additional notes

None