Neo

Basic Information

Short description: Neo is an autonomous AI engineer. It is a multi-agent system that solves complex ML engineering problems by automating the machine learning workflow. The developers write "It's like having a kaggle master/expert in your team" [source].
Intended uses: What does the developer state that the system is intended for?: "Automating the entire machine learning workflow." [source]
Date(s) deployed: Neo is used and tested internally and was introduced on the company's blog on November 15, 2024 [source]. It is not yet generally available; there is currently a waitlist for a private beta.

Developer

Legal name: HeyNeo [source]
Entity type: Unknown
Country (location of developer or first author's first affiliation): USA [source]
Safety policies: What safety and/or responsibility policies are in place?: Unknown

System Components

Backend model(s): What model(s) are used to power the system?: Unknown
Public model specification: Is there formal documentation on the system’s intend...: None
Description of reasoning, planning, and memory implementation: How does the syst...: "Given a specific objective, NEO initiates a comprehensive workflow to reach its goal. NEO utilizes a structured, multi-step approach to achieve its objectives by breaking down complex problems into manageable components. This approach involves a continuous loop of planning, coding, executing, and debugging — ensuring thorough refinement at each stage. As NEO progresses through these steps, it adapts and iterates until optimal results are achieved. Once developers approve NEO's output, the workflow deploys in seconds. NEO simplifies all the intricacies discussed above for Machine Learning Engineers." [source]
Observation space: What is the system able to observe while 'thinking'?: Neo can see chat, the filesystem, and can browse the web [source]
Action space/tools: What direct actions can the system take?: Neo can run commands in a terminal and produce output for the user to view [source]. Neo executes code in containerized GPU/CPU sandboxes.
User interface: How do users interact with the system?: Chat. The user can also use an artifact viewer, terminal, monitor, browser, and file explorer to oversee Neo's actions [source]
Development cost and compute: What is known about the development costs?: Unknown
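
Illustrative sketch (not Neo's implementation, which is closed source): the loop of planning, coding, executing, and debugging quoted above can be pictured roughly as follows. Every name here (run_workflow, execute, plan, write_code, debug, StepResult) is hypothetical, the callables stand in for whatever agents or models the real system uses, and a plain Python subprocess stands in for the containerized sandboxes.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class StepResult:
    """Outcome of running one candidate script (hypothetical structure)."""
    code: str
    returncode: int
    stdout: str
    stderr: str


def execute(code: str, timeout: float = 30.0) -> StepResult:
    """Run a script in a separate Python process.

    Assumption: a subprocess in a temporary directory is only a stand-in
    for a real containerized sandbox and provides no real isolation.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=workdir,
        )
    return StepResult(code, proc.returncode, proc.stdout, proc.stderr)


def run_workflow(
    objective: str,
    plan: Callable[[str], List[str]],
    write_code: Callable[[str], str],
    debug: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Dict[str, StepResult]:
    """Plan the objective into steps, then code, execute, and debug each one."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    results: Dict[str, StepResult] = {}
    for step in plan(objective):           # planning
        code = write_code(step)            # coding
        for _ in range(max_attempts):
            result = execute(code)         # executing
            if result.returncode == 0:
                break
            code = debug(code, result.stderr)  # debugging, then retry
        results[step] = result
    return results


if __name__ == "__main__":
    # Toy callables: the first draft fails with a NameError, the "debugger"
    # prepends the missing definition, and the second run succeeds.
    outcome = run_workflow(
        "say hello",
        plan=lambda objective: ["greet"],
        write_code=lambda step: "print(greeting)",
        debug=lambda code, stderr: 'greeting = "hello"\n' + code,
    )
    print(outcome["greet"].stdout.strip())
```

In this sketch a step counts as finished when its script exits with status 0; the human approval described in the quote would sit after run_workflow returns and is not modeled here.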

Guardrails & Oversight

Accessibility of components
Weights: Are model parameters available?: Unknown
Data: Is data available?: Unknown
Code: Is code available?: Closed source
Documentation: Is documentation available?: Unavailable
Scaffolding: Is system scaffolding available?: Closed source
Controls and guardrails: What notable methods are used to protect against harmfu...: Unknown
Monitoring and shutdown procedures: Are there any notable methods or protocols t...: Based on video demos, the user can monitor Neo's activity and intervene if needed [source].
Customer and usage restrictions: Are there know-your-customer measures or other ...: There is currently a waitlist for a private beta [source]

Evaluation

Notable benchmark evaluations (e.g., on SWE-Bench Verified): MLE-bench (26%) [source]
Bespoke testing (e.g., demos): Demos [source]
Safety: Have safety evaluations been conducted by the developers? What were the ...: None
Publicly reported external red-teaming or comparable auditing
Personnel: Who were the red-teamers/auditors?: None
Scope, scale, access, and methods: What access did red-teamers/auditors have and...: None
Findings: What did the red-teamers/auditors conclude?: None

Ecosystem

Interoperability with other systems: What tools or integrations are available?: None
Usage statistics and patterns: Are there any notable observations about usage?: Unknown
Other notes (if any): --