The AI Agent Index

Documenting the technical and safety features of deployed agentic AI systems

XBOW


Basic information

Website: https://web.archive.org/web/20241231073112/https://xbow.com/

Short description: XBOW autonomously identifies vulnerabilities/exploits in web settings and produces patches [source]

Intended uses: What does the developer say it’s for? Improving offensive security on the web [source]

Date(s) deployed: Announced in a blog post on July 15, 2024 [source], but has not yet publicly launched [source]


Developer

Website: https://web.archive.org/web/20241231073112/https://xbow.com/

Legal name: XBOW USA, Inc. [source]

Entity type: Corporation [source]

Country (location of developer or first author’s first affiliation): Incorporation: Delaware, USA (XBOW USA Inc. 3350735) [source]

Safety policies: What safety and/or responsibility policies are in place? Unknown


System components

Backend model: What model(s) are used to power the system? Unknown

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? The system is given access to source code on a local machine and prompted to find an exploit; it identifies strategies, then writes and executes code to test them, e.g., [source]

Observation space: What is the system able to observe while ‘thinking’? XBOW can observe the outputs of its code execution and observe files on the local machine [source]

Action space/tools: What direct actions can the system take? XBOW can write and execute code and navigate on the local machine [source]

User interface: How do users interact with the system? Users provide prompts to the system and can observe the system’s outputs and code execution [source]

Development cost and compute: What is known about the development costs? Unknown
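
Illustrative sketch (not XBOW's implementation, which is closed source): the reasoning, observation, and action entries above describe a generic propose, execute, observe loop. The Python below is a minimal sketch of that pattern under loud assumptions. Every name in it (`Observation`, `AgentState`, `run_code`, `propose_code`, `agent_loop`) is hypothetical, and `propose_code` is a stand-in that walks a fixed candidate list where a real system would presumably query a model, since the backend model is unknown.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Observation:
    """What the agent sees after one action: captured output and exit status."""
    stdout: str
    stderr: str
    returncode: int


@dataclass
class AgentState:
    """The goal plus a history of (code, observation) pairs acting as memory."""
    goal: str
    history: List[Tuple[str, Observation]] = field(default_factory=list)


def run_code(code: str, timeout: float = 10.0) -> Observation:
    """Action: execute a Python snippet in a subprocess and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return Observation(proc.stdout, proc.stderr, proc.returncode)


def propose_code(state: AgentState, candidates: List[str]) -> Optional[str]:
    """Hypothetical stand-in for the model: return the next untried candidate."""
    tried = {code for code, _ in state.history}
    for code in candidates:
        if code not in tried:
            return code
    return None


def agent_loop(
    goal: str,
    candidates: List[str],
    is_success: Callable[[Observation], bool],
    max_steps: int = 5,
) -> Tuple[AgentState, bool]:
    """Propose code, execute it, observe the result; stop on success or exhaustion."""
    state = AgentState(goal)
    for _ in range(max_steps):
        code = propose_code(state, candidates)
        if code is None:
            break  # no strategies left to try
        obs = run_code(code)
        state.history.append((code, obs))
        if is_success(obs):
            return state, True
    return state, False
```

For example, calling `agent_loop("find the flag", ["print('no luck')", "print('FLAG{demo}')"], lambda o: "FLAG{" in o.stdout)` tries the first snippet, observes that its output does not satisfy the success check, then tries the second and stops. The step cap and the subprocess timeout are illustrative design choices for bounding the loop, not documented XBOW controls.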


Guardrails and oversight

Accessibility of components:

  • Weights: Are model parameters available? Unknown
  • Data: Is data available? Unknown
  • Code: Is code available? Closed source
  • Scaffolding: Is system scaffolding available? Closed source
  • Documentation: Is documentation available? Unavailable

Controls and guardrails: What notable methods are used to protect against harmful actions? “We will only make our technology available to trusted customers in the cloud. It is not possible to run XBOW as a standalone application outside our control.” [source]

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? XBOW is not currently available to external users, and will only be made available to ‘trusted customers’ [source]

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? Unknown


Evaluation

Notable benchmark evaluations: Passes 75 percent of assorted web benchmarks including PortSwigger, PentesterLab, and novel ones [source]; list of all benchmarks available [source]

Bespoke testing: Various demos/example outputs, e.g., [source]

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

  • Personnel: Who were the red-teamers/auditors? None
  • Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
  • Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? None

Usage statistics and patterns: Are there any notable observations about usage? Not available to external users [source]


Additional notes

None