Virtual Lab
Basic information
Website: https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1
Short description: The Virtual Lab consists of an LLM principal investigator agent guiding a team of LLM agents with different scientific backgrounds (e.g., a chemist agent, a computer scientist agent, a critic agent), with a human researcher providing high-level feedback [source]
Intended uses: What does the developer say it’s for? The Virtual Lab is intended to facilitate interdisciplinary scientific research by combining AI agents and human collaboration. It uses LLMs to simulate a research team with a principal investigator, scientist agents, and a critic agent to address complex, open-ended problems. Its capabilities were demonstrated in designing nanobody binders for SARS-CoV-2 variants, integrating tools like AlphaFold and Rosetta. The goal is to enable impactful, real-world discoveries by bridging expertise across fields [source]
Date(s) deployed: November 26, 2024 [source]
Developer
Website: https://web.archive.org/web/20241126222051/https://github.com/zou-group/virtual-lab
Legal name: Stanford University (et al.) [source]
Entity type: Academic Institution(s)
Country (location of developer or first author’s first affiliation): California, USA [source]
Safety policies: What safety and/or responsibility policies are in place? None
System components
Backend model: What model(s) are used to power the system? The system is powered by large language models (LLMs), specifically GPT-4o, which drives the reasoning of its agents. It also incorporates computational tools such as ESM (a protein language model), AlphaFold-Multimer (a protein structure prediction model), and Rosetta (computational biology software) for specific tasks within its nanobody design pipeline [source]
Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None
Reasoning, planning, and memory implementation: How does the system ‘think’? It uses a multi-agent architecture to simulate reasoning, planning, and memory through structured interactions. A principal investigator (PI) agent leads the process, setting research directions and summarizing team discussions, while specialist agents contribute expertise from their defined scientific domains. Reasoning is achieved via iterative team and individual meetings, where agents collaborate, critique each other’s inputs, and refine their outputs across multiple rounds. Memory is implemented through agent-written summaries of previous meetings, which inform ongoing decisions, ensuring continuity and context throughout the research project [source]
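The meeting-and-summary loop described above can be sketched as follows. This is a minimal illustration only: the `ask_llm` stub and all agent names are hypothetical placeholders, not the repository's actual API, and a real implementation would call a chat-completion model such as GPT-4o.

```python
# Illustrative sketch of the Virtual Lab meeting loop; `ask_llm` is a
# hypothetical placeholder for a chat-completion call, not the real API.

def ask_llm(agent_prompt, transcript):
    # Placeholder for an LLM call (e.g., GPT-4o). Here it returns a canned
    # reply tagged with the agent's role so the sketch is runnable.
    role = agent_prompt.split(":")[0]
    return f"[{role}] contribution after {len(transcript)} transcript entries"

def team_meeting(agenda, agents, critic, pi, rounds=3, memory=None):
    """Run a multi-round team meeting chaired by the PI agent."""
    transcript = [f"Agenda: {agenda}"]
    if memory:  # summaries of prior meetings provide continuity
        transcript.append(f"Prior summary: {memory}")
    for _ in range(rounds):
        for agent in agents:  # specialist agents contribute in turn
            transcript.append(ask_llm(agent, transcript))
        # the critic reviews each round's contributions
        transcript.append(ask_llm(critic, transcript))
    # the PI synthesizes a summary, which becomes memory for later meetings
    return ask_llm(pi, transcript)
```

The returned summary is what gets fed back as `memory` into subsequent meetings, which is how continuity across the project is maintained.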
Observation space: What is the system able to observe while ‘thinking’? The observation space includes inputs provided by the human researcher, such as agendas, scientific contexts, rules, and relevant datasets. During meetings, agents can observe each other’s contributions, critiques, and outputs, enabling collaborative reasoning. For specific tasks, agents can access computational tools like ESM, AlphaFold-Multimer, and Rosetta to analyze data and generate results. The system also uses memory elements, such as summaries of prior meetings, to maintain awareness of past discussions and decisions [source]
Action space/tools: What direct actions can the system take? The Virtual Lab is designed to handle a wide range of tasks through its AI agents. It can generate and refine scientific ideas during team and one-on-one meetings, promoting collaboration across different fields. The system can also design workflows for research projects (nanobody design being the workflow demonstrated in the paper), using tools like ESM, AlphaFold-Multimer, and Rosetta. In addition, it can write and run code to analyze data and refine results, with feedback from the Scientific Critic agent to ensure outputs meet scientific standards [source]
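The tool-driven workflow described above can be sketched as a scoring pipeline. All three tool functions below are hypothetical stand-ins, not the actual ESM, AlphaFold-Multimer, or Rosetta interfaces, and the score combination is an arbitrary illustrative choice.

```python
# Illustrative sketch of a nanobody-candidate scoring pipeline; the three
# tool functions are placeholders, not the real ESM/AlphaFold/Rosetta APIs.

def esm_log_likelihood(sequence):
    # Placeholder: a protein language model would score mutation plausibility.
    return -0.1 * len(sequence)

def alphafold_iptm(sequence, antigen):
    # Placeholder: AlphaFold-Multimer would assess predicted complex quality.
    return 0.75

def rosetta_binding_energy(sequence, antigen):
    # Placeholder: Rosetta would estimate binding energy (lower is better).
    return -30.0

def score_candidates(candidates, antigen):
    """Rank candidate sequences by an illustrative combined tool score."""
    scored = []
    for seq in candidates:
        score = (esm_log_likelihood(seq)
                 + alphafold_iptm(seq, antigen)
                 - rosetta_binding_energy(seq, antigen))
        scored.append((score, seq))
    return [seq for _, seq in sorted(scored, reverse=True)]
```

In the paper's pipeline, outputs of such tools feed back into agent meetings, where the Scientific Critic agent reviews the results before the next design iteration.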
User interface: How do users interact with the system? Users interact with the Virtual Lab primarily by providing high-level guidance: defining the research agenda, setting goals, and specifying rules for the agents. The system also allows users to write summaries and provide feedback. Interactions are structured around meetings, where users supply specific agendas and questions that guide the agents in their collaborative problem-solving process [source]
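The high-level inputs a user supplies might look like the following. The field names here are illustrative, not the repository's actual configuration schema; the agenda questions are paraphrased from the paper's nanobody design project.

```python
# Hypothetical example of the user-supplied inputs for one meeting; the
# key names are illustrative, not the actual Virtual Lab schema.
meeting_spec = {
    "agenda": "Design nanobody binders for recent SARS-CoV-2 variants.",
    "agenda_questions": [
        "Should we design antibodies or nanobodies?",
        "Should we design them de novo or modify existing ones?",
    ],
    "rules": [
        "Justify every design choice.",
        "Support conclusions with computational evidence.",
    ],
}
```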
Development cost and compute: What is known about the development costs? Unknown
Guardrails and oversight
Accessibility of components:
- Weights: Are model parameters available? N/A; the system relies on external model(s) accessed via API
- Data: Is data available? N/A; the system relies on external model(s) accessed via API
- Code: Is code available? Available [source]
- Scaffolding: Is system scaffolding available? Available [source]
- Documentation: Is documentation available? Available [source]
Controls and guardrails: What notable methods are used to protect against harmful actions? The system employs iterative refinement, in which outputs are improved over multiple rounds of discussion, with the Scientific Critic agent providing feedback. Additionally, predefined rules and constraints guide the system’s actions, helping to prevent harmful decisions [source]
Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? None
Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Not formally defined, but the control and guardrail mechanisms described above could in principle be used to halt the system if needed [source]
Evaluation
Notable benchmark evaluations: Unknown
Bespoke testing: The paper presenting the Virtual Lab demonstrates the design of nanobody binders for recent SARS-CoV-2 variants [source]
Safety: Have safety evaluations been conducted by the developers? What were the results? None
Publicly reported external red-teaming or comparable auditing:
- Personnel: Who were the red-teamers/auditors? None
- Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
- Findings: What did the red-teamers/auditors conclude? None
Ecosystem information
Interoperability with other systems: What tools or integrations are available? GPT-4o (accessed via API), ESM, AlphaFold-Multimer, and Rosetta support tasks such as protein design, structure prediction, and AI-driven scientific research [source]
Usage statistics and patterns: Are there any notable observations about usage? Unknown
Additional notes
None