Octo

Basic information

Website: https://arxiv.org/abs/2405.12213

Short description: Octo is an open-source generalist robot policy designed for robotic manipulation tasks. Two Octo models have been released, Octo-small and Octo-base; both are transformer models, with 27M and 93M parameters respectively.

Intended uses: What does the developer say it’s for? Robotic control.

Date(s) deployed: First paper released on May 20, 2024 [source]

Developer

Website: https://web.archive.org/web/20250105113842/https://octo-models.github.io/

Legal name: University of California, Berkeley (et al.) [source]

Entity type: Academic Institution(s)

Country (location of developer or first author’s first affiliation): California, USA [source]

Safety policies: What safety and/or responsibility policies are in place? None

System components

Backend model: What model(s) are used to power the system? The main Octo model is trained from scratch. However, language inputs are first processed by a pretrained t5-base (111M parameter) language model, and the resulting embeddings are then processed by the Octo model.

Publicly available model specification: Is there formal documentation on the system’s intended uses and how it is designed to behave in them? None

Reasoning, planning, and memory implementation: How does the system ‘think’? Octo maps natural-language instructions or goal images describing the desired state, together with image observations of the current state, to robot actions. No explicit planning is used beyond what is learned internally from the training data.

Observation space: What is the system able to observe while ‘thinking’? Textual or image inputs describing goal states, and image inputs describing the current world state.

Action space/tools: What direct actions can the system take? The action space is flexible. The model outputs action embeddings that are converted into concrete actions by task-specific, diffusion-based action heads.
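To illustrate what a diffusion-based action head does, here is a minimal sketch of DDPM-style sampling in Python with NumPy: starting from Gaussian noise, a noise-prediction function conditioned on the model's action embedding is applied repeatedly to denoise a vector into an action. The linear noise schedule, the 20 denoising steps, the 7-dimensional action, and the hypothetical `make_oracle_denoiser` stand-in (which replaces a trained network) are all assumptions for illustration, not Octo's actual head, schedule, or API.

```python
import numpy as np


def make_schedule(num_steps=20, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumption; Octo's schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def sample_actions(predict_noise, embedding, action_dim, schedule, rng):
    """Reverse diffusion: start from Gaussian noise and iteratively denoise
    it into an action vector, conditioned on the action embedding."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(action_dim)
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t, embedding)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise on every step except the last (sigma_t^2 = beta_t).
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(action_dim)
        else:
            x = mean
    return x


def make_oracle_denoiser(target, alpha_bars):
    """Hypothetical stand-in for a trained noise-prediction network: it returns
    the exact noise separating x_t from a fixed target action, and ignores
    the embedding."""
    def predict_noise(x, t, embedding):
        return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])
    return predict_noise


# Usage: an illustrative 7-D action (e.g. end-effector deltas plus a gripper command).
rng = np.random.default_rng(0)
schedule = make_schedule()
target = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.3, 1.0])
denoiser = make_oracle_denoiser(target, alpha_bars=schedule[2])
action = sample_actions(denoiser, embedding=None, action_dim=7, schedule=schedule, rng=rng)
print(np.round(action, 3))
```

Because the stand-in denoiser predicts the exact noise for a fixed target, the sampler recovers that target on the final step; a trained head would instead predict the noise from the action embedding and return a sample from its learned action distribution.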

User interface: How do users interact with the system? N/A; an engineering project

Development cost and compute: What is known about the development costs? Octo-base “was trained for 300k steps with a batch size of 2048 using a TPU v4-128 pod, which took 14 hours. A finetuning run of the same model on a single NVIDIA A5000 GPU with 24GB of VRAM takes approximately 5 hours and can be sped up with multi-GPU training.”
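The quoted figures imply a rough pretraining throughput. The arithmetic below is a back-of-the-envelope sketch, assuming each batch element counts as one training example and that the 14 hours is wall-clock time for all 300k steps; the variable names are illustrative.

```python
# Back-of-the-envelope throughput for Octo-base pretraining, using only the
# quoted figures (300k steps, batch size 2048, 14 hours on a TPU v4-128 pod).
steps = 300_000
batch_size = 2048
wall_clock_hours = 14

examples_seen = steps * batch_size            # 614,400,000 training examples
wall_clock_seconds = wall_clock_hours * 3600  # 50,400 seconds

# Averages over the whole run, so they include any input-pipeline or
# checkpointing overhead, not just pure accelerator time.
steps_per_second = steps / wall_clock_seconds
examples_per_second = examples_seen / wall_clock_seconds

print(f"{examples_seen:,} examples seen")
print(f"{steps_per_second:.2f} steps/s, {examples_per_second:,.0f} examples/s")
```

This prints roughly 5.95 steps/s and about 12,190 examples/s. The finetuning figure (about 5 hours on one A5000) cannot be converted the same way, because no step count or batch size is quoted for it.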

Guardrails and oversight

Accessibility of components:

Weights: Are model parameters available? Open source [source].
Data: Is data available? Octo is trained on a curated subset of the Open X-Embodiment dataset.
Code: Is code available? Available [source].
Scaffolding: Is system scaffolding available? Available [source].
Documentation: Is documentation available? No formal documentation is available, but the developers have published a technical report [source].

Controls and guardrails: What notable methods are used to protect against harmful actions? None

Customer and usage restrictions: Are there know-your-customer measures or other restrictions on customers? None

Monitoring and shutdown procedures: Are there any notable methods or protocols that allow for the system to be shut down if it is observed to behave harmfully? None. The model has no built-in shutdown procedures; however, it is a base model intended to be finetuned and deployed by downstream users.

Evaluation

Notable benchmark evaluations: The authors evaluate Octo across 9 robot learning tasks, testing both zero-shot and task-specific finetuning performance. They find performance comparable to or exceeding that of RT-1-X and RT-2-X [source].

Bespoke testing: See notable benchmark evaluations above; the same author-run evaluation across 9 robot learning tasks, covering zero-shot and task-specific finetuning, applies here [source].

Safety: Have safety evaluations been conducted by the developers? What were the results? None

Publicly reported external red-teaming or comparable auditing:

Personnel: Who were the red-teamers/auditors? None
Scope, scale, access, and methods: What access did red-teamers/auditors have and what actions did they take? None
Findings: What did the red-teamers/auditors conclude? None

Ecosystem information

Interoperability with other systems: What tools or integrations are available? Octo can be finetuned to control different manipulation robots, in particular by finetuning new action heads for their action spaces.

Usage statistics and patterns: Are there any notable observations about usage? The GitHub repository for Octo has 160 forks and 839 stars [source].

Additional notes

None