Weco
Basic information
Website: https://web.archive.org/web/20241127010507/https://www.weco.ai/
Short description: An AI data science agent. AIDE designs data analysis pipelines by generating code and producing models [source]
Intended uses: What does the developer say it’s for? Weco “generates code for data preprocessing as well as model training, inference, and evaluation… The current alpha version of AIDE primarily targets tabular data tasks that can be solved with CPUs.” [source]
Date(s) deployed: April 4, 2024 [source]
Developer
Website: https://web.archive.org/web/20241127010507/https://www.weco.ai/
Legal name: WECO AI LTD [source] [source]
Entity type: Private limited company (UK) [source]
Country (location of developer or first author’s first affiliation): Incorporation: UK [source]. HQ: London [source]
Safety policies: What safety and/or responsibility policies are in place? Unknown
System components
Backend model: What model(s) are used to power the system? Configurable; options include OpenAI and Anthropic models [source]
Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None
Reasoning, planning, and memory implementation: How does the system ‘think’? “Solution Space Tree Search”: the agent (1) proposes new solutions or modifies existing ones, (2) evaluates each solution by running it and assessing the results, and (3) selects the most promising solution and begins another round of refinement [source]. It uses a ‘journal’ structure that stores the generated code samples, their tree structure, the results of code execution, and evaluation metrics [source].
Observation space: What is the system able to observe while ‘thinking’? Maintains a workspace with all of the files and data generated by the AI agent [source]
Action space/tools: What direct actions can the system take? Writes code and executes it in a Python interpreter; stores logs in a dedicated directory [source]
User interface: How do users interact with the system? The user can monitor the agent’s logs and the forming solution tree [source]
Development cost and compute: What is known about the development costs? Unknown
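Illustrative sketch (not Weco's code): the propose, evaluate, select loop and the 'journal' described under "Reasoning, planning, and memory implementation" can be approximated as below. All names here (Node, Journal, tree_search, toy_propose, toy_evaluate) are hypothetical. In the real system an LLM proposes code and the evaluation step actually runs it; here both are replaced by deterministic stand-ins, and greedy selection with "higher metric is better" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One journal entry: a code sample plus what happened when it ran."""
    code: str
    parent: Optional[int]            # index of the parent node; None for a first draft
    output: str = ""                 # captured execution output
    metric: Optional[float] = None   # evaluation score; None if the run failed


@dataclass
class Journal:
    """Stores every generated sample; parent indices encode the solution tree."""
    nodes: List[Node] = field(default_factory=list)

    def add(self, node: Node) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1

    def best(self) -> Optional[int]:
        # Assumption: a higher metric means a better solution.
        scored = [(n.metric, i) for i, n in enumerate(self.nodes) if n.metric is not None]
        return max(scored, key=lambda t: t[0])[1] if scored else None


def tree_search(
    propose: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Tuple[str, Optional[float]]],
    steps: int,
) -> Journal:
    journal = Journal()
    for _ in range(steps):
        parent = journal.best()                        # (3) select the most promising node
        parent_code = journal.nodes[parent].code if parent is not None else None
        code = propose(parent_code)                    # (1) draft anew or modify the parent
        output, metric = evaluate(code)                # (2) run it and score the result
        journal.add(Node(code, parent, output, metric))
    return journal


# Stand-ins for the LLM and the interpreter: the "code" is just a number as text,
# and its score peaks when that number equals 3.
def toy_propose(parent_code: Optional[str]) -> str:
    return "0" if parent_code is None else str(int(parent_code) + 1)


def toy_evaluate(code: str) -> Tuple[str, Optional[float]]:
    x = int(code)
    return f"ran candidate {x}", -float((x - 3) ** 2)


journal = tree_search(toy_propose, toy_evaluate, steps=5)
best = journal.nodes[journal.best()]
print(best.code, best.metric)
```

A purely greedy selection rule like this one always refines the current best node; a fuller implementation would also need a policy for retrying failed runs or starting fresh drafts, which this sketch leaves out.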
Guardrails and oversight
Accessibility of components:
- Weights: Are model parameters available? N/A; the system runs on various third-party backend models
- Data: Is data available? N/A; the system runs on various third-party backend models
- Code: Is code available? Available [source]
- Scaffolding: Is system scaffolding available? Open source [source]
- Documentation: Is documentation available? Available [source]
Controls and guardrails: What notable methods are used to protect against harmful actions? Depends on what guardrails are implemented in a specific configuration
Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? None
Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Depends on what is implemented in a specific configuration
Evaluation
Notable benchmark evaluations: On MLE-Bench, “OpenAI’s o1-preview with AIDE scaffolding — achieves at least the level of a Kaggle bronze medal in 16.9% of competitions” [source], which was the best reported score; OpenAI used Weco AI’s open source scaffolding for their benchmarking
Bespoke testing: Several sample results are available [source]
Safety: Have safety evaluations been conducted by the developers? What were the results? None
Publicly reported external red-teaming or comparable auditing:
- Personnel: Who were the red-teamers/auditors? None
- Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
- Findings: What did the red-teamers/auditors conclude? None
Ecosystem information
Interoperability with other systems: What tools or integrations are available? None
Usage statistics and patterns: Are there any notable observations about usage? Unknown
Additional notes
None