The AI Agent Index

Documenting the technical and safety features of deployed agentic AI systems

Neo


Basic information

Website: https://web.archive.org/web/20241120161541/https://heyneo.so/

Short description: Neo is an autonomous AI engineer. It’s a multi-agent system capable of solving complex ML engineering problems by automating the machine learning workflow. The developers write “It’s like having a kaggle master/expert in your team” [source].

Intended uses: What does the developer say it’s for? “Automating the entire machine learning workflow.” [source]

Date(s) deployed: Neo is used and tested internally and was introduced on the company’s blog on November 15, 2024 [source]. However, it is not generally available: there is currently a waitlist for a private beta.


Developer

Website: https://web.archive.org/web/20241120161541/https://heyneo.so/

Legal name: HeyNeo [source]

Entity type: Unknown

Country (location of developer or first author’s first affiliation): USA [source]

Safety policies: What safety and/or responsibility policies are in place? Unknown


System components

Backend model: What model(s) are used to power the system? Unknown

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? “Given a specific objective, NEO initiates a comprehensive workflow to reach its goal. NEO utilizes a structured, multi-step approach to achieve its objectives by breaking down complex problems into manageable components. This approach involves a continuous loop of planning, coding, executing, and debugging — ensuring thorough refinement at each stage. As NEO progresses through these steps, it adapts and iterates until optimal results are achieved. Once developers approve NEO’s output, the workflow deploys in seconds. NEO simplifies all the intricacies discussed above for Machine Learning Engineers.” [source]
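The loop described in the quote above (plan, code, execute, debug, then deploy on approval) can be sketched roughly as follows. This is an illustration only, not Neo’s implementation, which is closed source. Every name here (`run_workflow`, `StepResult`, the `toy_*` stand-ins, `max_attempts`) is hypothetical, and the toy executor uses `eval` in place of the containerized sandboxes the developers describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    step: str
    code: str
    ok: bool
    output: str
    attempts: int


def run_workflow(
    objective: str,
    plan: Callable[[str], List[str]],
    write_code: Callable[[str], str],
    execute: Callable[[str], Tuple[bool, str]],
    debug: Callable[[str, str], str],
    approve: Callable[[List[StepResult]], bool],
    max_attempts: int = 3,  # hypothetical retry budget per step
) -> Tuple[List[StepResult], bool]:
    """Plan the objective, then code/execute/debug each step until it passes."""
    results: List[StepResult] = []
    for step in plan(objective):
        code = write_code(step)
        ok, output = execute(code)
        attempts = 1
        # Debug loop: revise the code using the error output and re-run it.
        while not ok and attempts < max_attempts:
            code = debug(code, output)
            ok, output = execute(code)
            attempts += 1
        results.append(StepResult(step, code, ok, output, attempts))
        if not ok:
            break  # stop at the first step that cannot be repaired
    # Deployment is gated on every step passing AND explicit human approval.
    deployed = bool(results) and all(r.ok for r in results) and bool(approve(results))
    return results, deployed


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_plan(objective: str) -> List[str]:
    return ["load data", "train model"]


def toy_write_code(step: str) -> str:
    # The second step is deliberately buggy to exercise the debug loop.
    return "1 / 0" if step == "train model" else "2 + 2"


def toy_execute(code: str) -> Tuple[bool, str]:
    # NOT a real sandbox: eval with empty builtins only stands in for one.
    try:
        return True, str(eval(code, {"__builtins__": {}}))
    except Exception as exc:
        return False, repr(exc)


def toy_debug(code: str, error: str) -> str:
    return code.replace("/ 0", "/ 1")


if __name__ == "__main__":
    results, deployed = run_workflow(
        "demo objective", toy_plan, toy_write_code, toy_execute,
        toy_debug, approve=lambda rs: True,
    )
    for r in results:
        print(r.step, r.ok, r.attempts, r.output)
    print("deployed:", deployed)
```

Passing the planner, coder, executor, debugger, and approver in as callables keeps the control flow visible and makes the human approval gate an explicit step rather than an implicit one.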

Observation space: What is the system able to observe while ‘thinking’? Neo can see the chat and the filesystem, and can browse the web [source]

Action space/tools: What direct actions can the system take? Neo can run commands in a terminal and produce output for the user to view [source]. Neo executes code in containerized GPU/CPU sandboxes.

User interface: How do users interact with the system? Chat. To oversee Neo’s actions, the user also has access to an artifact viewer, terminal, monitor, browser, and file explorer [source]

Development cost and compute: What is known about the development costs? Unknown


Guardrails and oversight

Accessibility of components:

  • Weights: Are model parameters available? Unknown
  • Data: Is data available? Unknown
  • Code: Is code available? Closed source
  • Scaffolding: Is system scaffolding available? Closed source
  • Documentation: Is documentation available? Unavailable

Controls and guardrails: What notable methods are used to protect against harmful actions? Unknown

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? There is currently a waitlist for a private beta [source]

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Based on video demos, the user can monitor Neo’s activity and intervene if needed [source].


Evaluation

Notable benchmark evaluations: MLE-bench (26%) [source]

Bespoke testing: Demos [source]

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

  • Personnel: Who were the red-teamers/auditors? None
  • Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
  • Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? None

Usage statistics and patterns: Are there any notable observations about usage? Unknown


Additional notes

None