Octo
Basic Information
Website: https://arxiv.org/abs/2405.12213
Short description: Octo is an open-source generalist robot policy designed for robotic manipulation tasks. Two models have been released, Octo-small and Octo-base; both are transformers, with 27M and 93M parameters respectively.
Intended uses: What does the developer state that the system is intended for?: Robotic control.
Date(s) deployed: First paper released on May 20, 2024 [source]
Developer
Website: https://octo-models.github.io/
Legal name: University of California Berkeley (et al.) [source]
Entity type: Academic Institution(s)
Country (location of developer or first author's first affiliation): California, USA [source]
Safety policies: What safety and/or responsibility policies are in place?: None
System Components
Backend model(s): What model(s) are used to power the system?: The main Octo model is trained from scratch. However, language inputs are first encoded by a pretrained 111M-parameter t5-base model, and the resulting embeddings are then processed by the Octo model.
Public model specification: Is there formal documentation on the system’s intend...: None
Description of reasoning, planning, and memory implementation: How does the syst...: Octo maps a goal specification (a natural-language instruction or a goal image) and image observations of the current state to robot actions. No explicit planning is used beyond what is learned internally from the training data.
Observation space: What is the system able to observe while 'thinking'?: Textual or image inputs describing goal states, and image inputs describing the current world state.
Action space/tools: What direct actions can the system take?: The action space is flexible. The model outputs action embeddings that are converted to specific actions by task-specific, diffusion-based action heads.
User interface: How do users interact with the system?: N/A; an engineering project
Development cost and compute: What is known about the development costs?: Octo-base "was trained for 300k steps with a batch size of 2048 using a TPU v4-128 pod, which took 14 hours. A finetuning run of the same model on a single NVIDIA A5000 GPU with 24GB of VRAM takes approximately 5 hours and can be sped up with multi-GPU training."
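To put the quoted training figures in perspective, the short calculation below derives the number of training samples processed and the implied throughput. It is only a back-of-the-envelope sketch: it uses the step count, batch size, and wall-clock time from the quote, and it counts samples drawn during training (which may repeat), not unique examples in the dataset.

```python
# Figures taken from the quoted description of the Octo-base training run.
steps = 300_000
batch_size = 2048
hours = 14

# Total samples drawn over the run (repeats across epochs are counted).
total_samples = steps * batch_size

# Implied average throughput for the whole pod.
samples_per_second = total_samples / (hours * 3600)

print(f"samples processed: {total_samples:,}")
print(f"throughput: {samples_per_second:,.0f} samples/s")
```

This gives roughly 614 million samples processed at about 12,000 samples per second across the pod.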
Guardrails & Oversight
Accessibility of components
Weights: Are model parameters available?: Open source [source].
Data: Is data available?: Octo is trained on a curated subset of the Open X-Embodiment dataset.
Code: Is code available?: Available [source].
Documentation: Is documentation available?: No formal documentation is available, but the authors have published a technical report [source].
Scaffolding: Is system scaffolding available?: Available [source].
Controls and guardrails: What notable methods are used to protect against harmfu...: None
Monitoring and shutdown procedures: Are there any notable methods or protocols t...: The model has no shutdown procedures; however, it is a base model.
Customer and usage restrictions: Are there know-your-customer measures or other ...: None
Evaluation
Notable benchmark evaluations (e.g., on SWE-Bench Verified): The authors evaluate Octo across 9 robot learning tasks, testing both zero-shot and task-specific finetuning performance. They find performance comparable to or exceeding that of RT-1-X and RT-2-X [source].
Bespoke testing (e.g., demos): The authors evaluate Octo across 9 robot learning tasks, testing both zero-shot and task-specific finetuning performance. They find performance comparable to or exceeding that of RT-1-X and RT-2-X [source].
Safety: Have safety evaluations been conducted by the developers? What were the ...: None
Publicly reported external red-teaming or comparable auditing
Personnel: Who were the red-teamers/auditors?: None
Scope, scale, access, and methods: What access did red-teamers/auditors have and...: None
Findings: What did the red-teamers/auditors conclude?: None
Ecosystem
Interoperability with other systems: What tools or integrations are available?: Octo can be finetuned to control different manipulation robots, specifically by finetuning new action heads.
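The idea of one shared model with swappable, robot-specific action heads can be illustrated with a toy sketch. Everything in it is hypothetical and is not Octo's API: the names `shared_trunk`, `LinearActionHead`, `arm_7dof`, and `bimanual_14dof`, the embedding width, and the random weights are all assumptions. A plain linear layer also stands in for Octo's diffusion-based heads, purely to show how a single embedding can feed heads with different action dimensions.

```python
import random

EMBED_DIM = 8  # hypothetical embedding width, not Octo's real size


class LinearActionHead:
    """Hypothetical stand-in for a task-specific action head.

    Octo's real heads are diffusion-based; a linear map is used here
    only to illustrate the embedding -> action interface.
    """

    def __init__(self, embed_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for _ in range(action_dim)
        ]
        self.bias = [0.0] * action_dim

    def __call__(self, embedding):
        # One output value per action dimension.
        return [
            sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]


def shared_trunk(task_description):
    """Placeholder for the shared model: returns a fixed-size embedding."""
    rng = random.Random(task_description)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


# Two hypothetical robots with different action dimensions share one trunk.
heads = {
    "arm_7dof": LinearActionHead(EMBED_DIM, action_dim=7, seed=1),
    "bimanual_14dof": LinearActionHead(EMBED_DIM, action_dim=14, seed=2),
}

embedding = shared_trunk("pick up the cup")
actions = {name: head(embedding) for name, head in heads.items()}

for name, action in actions.items():
    print(name, len(action))
```

Under these assumptions, supporting a new robot amounts to adding another entry to `heads` with the matching action dimension and training it, while the trunk interface stays unchanged.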
Usage statistics and patterns: Are there any notable observations about usage?: The GitHub repository for Octo has 160 forks and 839 stars [source].
Other notes (if any): --