Codex

Chat

OpenAI

Product overview

Name of Agent: Codex
Short description of agent: "Codex CLI is a coding agent that you can run locally from your terminal and that can read, modify, and run code on your machine, in the chosen directory" (link, archived)
Date of release: 16/05/2025 (link, archived)
Advertised use: "navigates your repo to edit files, run commands, and execute tests. Ship new features, fix bugs, brainstorm solutions, or tackle whatever’s next" (link, archived)
Monetisation/Usage price: $20/month (ChatGPT Plus) or $200/month (ChatGPT Pro); greater rate limits on Business plans
Who is using it?: End users and enterprise customers, for coding and prototyping
Category: Chat

Company & accountability

Developer: OpenAI
Name of legal entity: OpenAI, L.L.C. (link, archived)
Place of legal incorporation: Delaware
For profit company?: Yes
Parent company?: The for-profit LLC falls within OpenAI Group PBC, which is controlled by the OpenAI Foundation (which holds 26% vs Microsoft's 27%, with the rest going to staff)
Governance documents analysis: Terms and Policies (link, archived) (general to OpenAI, not product-specific)
AI safety/trust framework: Preparedness Framework (link, archived)
Compliance with existing standards: Unclear whether this is distinct from OpenAI's regular, company-wide compliance

Technical capabilities & system architecture

Model specifications: "With the Codex CLI and IDE extension we recommend using GPT‑5, which is the default model. You can choose the reasoning level. You can also authenticate via API key to use our older models." (link, archived)
Observation space: File system, command line, web search, MCP
Action space: File system, command line, MCP
Memory architecture: Hierarchical memory through markdown documents (link)
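
To illustrate what "hierarchical memory through markdown documents" could mean in practice, here is a minimal sketch that gathers markdown notes from the repository root down to the current directory and concatenates them, most general first. The file name `AGENTS.md`, the function name `collect_memory`, and the merge order are assumptions made for illustration; this is not taken from the Codex source.

```python
from pathlib import Path

MEMORY_FILE = "AGENTS.md"  # assumed file name, for illustration only


def collect_memory(repo_root: Path, cwd: Path) -> str:
    """Concatenate memory files from repo_root down to cwd, general first.

    Raises ValueError if cwd is not inside repo_root.
    """
    repo_root, cwd = repo_root.resolve(), cwd.resolve()
    # Build the chain of directories from the root to cwd, outermost first.
    dirs = [repo_root]
    for part in cwd.relative_to(repo_root).parts:
        dirs.append(dirs[-1] / part)
    sections = []
    for directory in dirs:
        memory_path = directory / MEMORY_FILE
        if memory_path.is_file():
            sections.append(memory_path.read_text(encoding="utf-8").strip())
    # Later (more specific) sections come last so they can refine earlier ones.
    return "\n\n".join(sections)
```

Placing the more specific notes last is one plausible design choice: instructions closer to the working directory are read after the general ones and can refine them.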
User interface and interaction design: Chatbot interface in the CLI
User roles: Operator (issues queries, which the agent responds to); Executor (user may take actions/make decisions based on outputs)
Component accessibility: The agentic harness is open source (link); the models are closed source

Autonomy & control

Autonomy level and planning depth: L1-L3: Tasks that users assign to the agent are often narrow in scope. More complex tasks need multi-turn conversations in which the user is in charge of planning, and the agent always comes back to the user and awaits further instructions. The agent can also create a plan and execute it across multiple steps. "Codex only interrupts you when it needs to leave the workspace or rerun something outside the sandbox" (link)
User approval requirements for different decision types: "Codex starts conservatively. Until you explicitly tell it a working directory is trusted, the CLI defaults to read-only. Codex can inspect files and answer questions, but every edit or command requires approval. When you mark a working directory as trusted (for example via the onboarding prompt or /approvals → “Trust this directory”), Codex upgrades the default preset to Agent, which allows writes inside the workspace. Codex only interrupts you when it needs to leave the workspace or rerun something outside the sandbox." source
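
The approval behaviour quoted above can be modelled as a small decision function. The sketch below is a simplified illustration, not the Codex implementation: the `Workspace` class, the `needs_approval` function, and the three action names are hypothetical, and "leaving the workspace" is approximated by a path check.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Workspace:
    root: Path
    trusted: bool = False  # untrusted until the user explicitly opts in


def needs_approval(ws: Workspace, action: str, target: Path) -> bool:
    """Return True if this hypothetical model would prompt the user.

    action is one of "read", "write", or "run"; for "run", target is the
    directory the command would operate in.
    """
    if action == "read":
        return False  # inspecting files never prompts in this model
    if not ws.trusted:
        return True  # read-only default: every edit or command prompts
    # Trusted directory: prompt only when the target is outside the workspace.
    # Path.is_relative_to requires Python 3.9 or later.
    return not target.resolve().is_relative_to(ws.root.resolve())
```

For example, with `Workspace(Path("/tmp/repo"))` a write to `/tmp/repo/a.py` requires approval until `trusted` is set to `True`, after which only targets outside `/tmp/repo` would prompt.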
Execution monitoring, traces, and transparency: Chain of thought is visible, albeit summarized, alongside summaries of actions; a "transcript mode" (link, archived) offers more detail
Emergency stop and shut down mechanisms and user control: User can pause/stop the agent at any time
Usage monitoring and statistics and patterns: User can see how much context is used

Ecosystem interaction

Identifies to humans?: None found
Identifies technically?: None found
Interoperability standards and integrations: MCP support (link, archived)
Web conduct: None found

Safety, evaluation & impact

Technical guardrails and safety measures: Safeguards "including both the model safety training described above and scaling up our monitoring and enforcement pipeline to disrupt potential misuse" (page 10 of the addendum to the system card; link, archived)
Sandboxing and containment approaches: Yes, using an OS-specific sandbox by default (link, archived)
What types of risks were evaluated?: Cyber, data destruction, CBRN (link, archived)
(Internal) safety evaluations and results: Addendum to system card for Codex (link, archived)
Third-party testing, audits, and red-teaming: External evaluations by Irregular on "Vulnerability Research and Exploitation, Network Attack Simulation, Evasion" (pages 15-16 of the addendum; link, archived)
Benchmark performance and demonstrated capabilities: See the addendum and system card
Bug bounty programmes and vulnerability disclosure: None found
Any known incidents?: None found