Sibyl System
Basic information
Website: https://web.archive.org/web/20240717072810/https://github.com/Ag2S1/Sibyl-System
Short description: Sibyl is a framework that transforms existing language models (e.g., GPT-4o) into agents that can complete tasks by using a web browser and a Python interpreter [source]
Intended uses: What does the developer say it’s for? Augment existing language models (e.g., GPT-4o), helping them to solve complex reasoning tasks [source]
Date(s) deployed: Earliest GitHub commits from July 16, 2024 [source]
Developer
Website: https://web.archive.org/web/20241204153959/http://www.baichuan-ai.com/home
Legal name: Beijing Baichuan Intelligent Technology Co., Ltd. (北京百川智能科技有限公司 [source])
Entity type: Unknown
Country (location of developer or first author’s first affiliation): Beijing, China [source]
Safety policies: What safety and/or responsibility policies are in place? Available [source]
System components
Backend model: What model(s) are used to power the system? The default backend models are GPT-4 and GPT-4o [source]
Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None
Reasoning, planning, and memory implementation: How does the system ‘think’? Planning: Sibyl’s tool planner processes a user’s query and any associated step history to select appropriate tools. Reasoning: Sibyl’s jury mechanism uses a “multiagent debate format for self-critique and correction.” Memory: Sibyl’s global workspace compresses and shares information between the agent’s modules [source]
Observation space: What is the system able to observe while ‘thinking’? Sibyl operates in a workspace where it can observe outputs from a web browser and Python interpreter, along with its task memory [source]
Action space/tools: What direct actions can the system take? Sibyl can execute code and search the internet. For a full breakdown of Sibyl’s action space, see Appendix A of the technical report [source]
User interface: How do users interact with the system? Code released in a GitHub repository without a user interface [source]
Development cost and compute: What is known about the development costs? Unknown
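Illustrative sketch: the planning, reasoning, and memory entries above describe a loop in which a planner picks a tool, the tool output is compressed into a shared workspace, and a jury settles on the final answer. The Python below is a minimal toy version of that loop under stated assumptions, not Sibyl's actual code. Every name in it (Workspace, plan_tool, jury, run) is hypothetical, truncation stands in for LLM-based compression, and a majority vote stands in for the multi-agent debate.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical; none are taken from the Sibyl repository.


@dataclass
class Workspace:
    """Shared memory that keeps only a short note from each step."""
    notes: List[str] = field(default_factory=list)
    max_chars: int = 80

    def add(self, raw: str) -> None:
        # Assumption: plain truncation stands in for LLM-based compression.
        self.notes.append(raw[: self.max_chars])


def plan_tool(query: str, history: List[str]) -> str:
    """Toy planner: pick a tool from the query and the step history."""
    if any(ch.isdigit() for ch in query) and "python" not in history:
        return "python"
    return "browser"


def jury(candidates: List[str]) -> str:
    """Toy jury: a majority vote stands in for debate and self-critique."""
    return Counter(candidates).most_common(1)[0][0]


def run(
    query: str,
    tools: Dict[str, Callable[[str], str]],
    jurors: List[Callable[[str], str]],
    max_steps: int = 2,
) -> str:
    """Plan a tool, act, store a compressed note, then let the jury decide."""
    workspace = Workspace()
    history: List[str] = []
    for _ in range(max_steps):
        tool = plan_tool(query, history)
        history.append(tool)
        workspace.add(f"{tool}: {tools[tool](query)}")
    context = " | ".join(workspace.notes)
    return jury([juror(context) for juror in jurors])


# Stub tools and jurors so the sketch runs without any model or network access.
demo_tools = {"python": lambda q: "4", "browser": lambda q: "page text"}
demo_jurors = [lambda c: "4", lambda c: "4", lambda c: "5"]
answer = run("2+2", demo_tools, demo_jurors)
```

In a real configuration the stub tools would wrap a browser and a Python interpreter, and each juror would be a call to the backend model; the sketch only shows how the three pieces could fit together.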
Guardrails and oversight
Accessibility of components:
- Weights: Are model parameters available? N/A; Sibyl can be backed by various models
- Data: Is data available? N/A; Sibyl can be backed by various models
- Code: Is code available? Available [source]
- Scaffolding: Is system scaffolding available? Available [source]
- Documentation: Is documentation available? Documentation on GitHub [source] and pre-print [source]
Controls and guardrails: What notable methods are used to protect against harmful actions? None
Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? None
Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Depends on what is implemented in a specific configuration [source]
Evaluation
Notable benchmark evaluations: 34.55% average score on GAIA Benchmark [source]
Bespoke testing: None
Safety: Have safety evaluations been conducted by the developers? What were the results? None
Publicly reported external red-teaming or comparable auditing:
- Personnel: Who were the red-teamers/auditors? None
- Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
- Findings: What did the red-teamers/auditors conclude? None
Ecosystem information
Interoperability with other systems: What tools or integrations are available? By default, Sibyl interacts with only two external systems: a web browser and a Python interpreter. However, Sibyl is open-source and can be modified to integrate with other systems. According to its developers, Sibyl “can be seamlessly integrated as a low-cost enhancement into existing frameworks, easily replacing the vanilla GPT-4 API.” [source]
Usage statistics and patterns: Are there any notable observations about usage? The GitHub repository has 1 fork and 34 stars [source]
Additional notes
None