Mobile-Agent
Product overview
Name of Agent: Mobile-Agent
Short description of agent: Mobile-Agent-v3 is a multimodal, multi-platform GUI agent built on the GUI-Owl series of models. (link)
Date of release:
- 10/03/2024: Initial release of Mobile-Agent
- 20/08/2025: Mobile-Agent-v3 and GUI-Owl release (link)
Monetisation/Usage price: Free for a limited time (link) (accessed on 2025-12-10)
Who is using it?: Open‑source users and researchers
Website: GitHub repository – https://github.com/X-PLUG/MobileAgent
Category: Browser
Company & accountability
Developer: Alibaba
Parent company?: Not applicable
Governance documents analysis: No dedicated terms of service or privacy policy for the agent could be found.
AI safety/trust framework: Alibaba’s AI Governance and Sustainable Development Research Center (AAIG) link
Compliance with existing standards: None found
Technical capabilities & system architecture
Model specifications: "Based on GUI-Owl, a native end-to-end multimodal agent designed as a foundational model for GUI automation." (link)
Documentation: API Documentation (archived), GitHub README
Observation space: User instructions, interaction history and screenshots (Section 6.3 Problem Definition of Trajectory Correctness Judgment (link))
Action space: "key, click, long_press, swipe, type, answer, system_button, open, wait, terminate" (Table 6: Action Space of GUI-Owl on Mobile (link))
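The mobile action space listed above can be sketched as a simple schema with a validator; the action names come from Table 6, but the argument fields and dict structure here are assumptions for illustration, not GUI-Owl's actual message format.

```python
# Hypothetical sketch of GUI-Owl's mobile action space (Table 6).
# Action names are from the table; argument schemas are assumptions.
MOBILE_ACTIONS = {
    "key":           ["keycode"],              # press a hardware/system key
    "click":         ["x", "y"],               # tap at screen coordinates
    "long_press":    ["x", "y"],
    "swipe":         ["x1", "y1", "x2", "y2"],
    "type":          ["text"],                 # type into the focused field
    "answer":        ["text"],                 # reply to the user
    "system_button": ["button"],               # e.g. back / home
    "open":          ["app_name"],             # launch an app by name
    "wait":          [],
    "terminate":     ["status"],               # end the episode
}

def validate_action(action: dict) -> bool:
    """Check that an action names a known type and supplies its arguments."""
    name = action.get("name")
    if name not in MOBILE_ACTIONS:
        return False
    return all(arg in action.get("args", {}) for arg in MOBILE_ACTIONS[name])
```

A harness around the agent could use such a validator to reject malformed model outputs before they reach the device.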
Memory architecture: - Long-term memory retains history across interactions; a Notetaker agent maintains persistent contextual memory (Figure 7 in link)
- No evidence of episodic memory across different conversations
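A minimal sketch of a Notetaker-style persistent memory, assuming it appends salient notes during a task and prepends them to later prompts; the class, method names, and example notes are hypothetical, not the project's implementation.

```python
class Notetaker:
    """Hypothetical sketch of persistent contextual memory across steps."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def record(self, note: str) -> None:
        # Keep salient facts observed during execution for later steps.
        self.notes.append(note)

    def as_context(self) -> str:
        # Render accumulated notes for inclusion in the next model prompt.
        return "\n".join(f"- {n}" for n in self.notes)

memory = Notetaker()
memory.record("Wi-Fi settings page already open")
memory.record("user account: test@example.com")
prompt_context = memory.as_context()
```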
User interface and interaction design: Judging from the demo videos, a chatbot occupies the left panel, with the GUI of the virtual PC/browser/phone in the right panel. There does not yet appear to be a customer-facing UI, as this is still an open-source project. The GUI-Owl API is available on the Developer Platform (link)
User roles: Operator (directing the agent to complete tasks)
Component accessibility: Open sourced code and model (link)
Autonomy & control
Autonomy level and planning depth: L4-L5. Little information on the autonomy level is available; it can only be judged from the demo.
User approval requirements for different decision types: None found
Execution monitoring, traces, and transparency: Visible CoT and an action trace document all activity. Each request to the GUI models is recorded and displayed on the left panel, including the request ID, screenshot, reasoning CoT, and actions (based on demo video (link))
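The per-request trace shown in the demo panel could be modelled as a simple record; the field names below mirror what is visible in the demo (request ID, screenshot, reasoning CoT, actions) but the structure itself is an assumption, not a documented format.

```python
from dataclasses import dataclass, field

@dataclass
class TraceRecord:
    """Hypothetical shape of one logged GUI-model request, per the demo panel."""
    request_id: str
    screenshot_path: str
    reasoning_cot: str                      # the model's visible chain of thought
    actions: list = field(default_factory=list)

# Example trace entry (illustrative values only).
trace = [
    TraceRecord(
        request_id="req-001",
        screenshot_path="step1.png",
        reasoning_cot="The settings icon is in the top-right corner.",
        actions=[{"name": "click", "args": {"x": 980, "y": 40}}],
    )
]
```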
Emergency stop and shut down mechanisms and user control: The UI does not appear to offer an option to stop the agent once it starts running
Usage monitoring and statistics and patterns: None found
Ecosystem interaction
Identifies to humans?: None found
Identifies technically?: None found
Interoperability standards and integrations: No mention of AGNTCY, the Agent Connect Protocol (ACP), the Model Context Protocol (MCP), or the Agent2Agent (A2A) protocol anywhere.
Web conduct: None found
Safety, evaluation & impact
Technical guardrails and safety measures: None found
What types of risks were evaluated?: None found
(Internal) safety evaluations and results: None found
Third-party testing, audits, and red-teaming: None found
Benchmark performance and demonstrated capabilities: GUI automation benchmarks, including ScreenSpot-v2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. (link)
Bug bounty programmes and vulnerability disclosure: None found
Any known incidents?: None found