Mobile-Agent
Product overview
Name of Agent: Mobile-Agent
Short description of agent: Mobile-Agent-v3 is a multimodal, multi-platform GUI agent built on the GUI-Owl series of models. (link)
Date of release:
- 10/03/2024: Initial release of Mobile-Agent
- 20/08/2025: Mobile-Agent-v3 and GUI-Owl release (link)
Monetisation/Usage price: Free for a limited time (link) (accessed on 2025-12-10)
Who is using it?: Open‑source users and researchers
Website: GitHub repository – https://github.com/X-PLUG/MobileAgent
Category: Browser
Company & accountability
Developer: Alibaba
Parent company?: Not applicable
Governance documents analysis: No dedicated terms of service or privacy policy for the agent could be found.
AI safety/trust framework: Alibaba’s AI Governance and Sustainable Development Research Center (AAIG) link
Compliance with existing standards: None found
Technical capabilities & system architecture
Model specifications: "Based on GUI-Owl, a native end-to-end multimodal agent designed as a foundational model for GUI automation." (link)
Documentation: API Documentation (archived), GitHub README
Observation space: User instructions, interaction history and screenshots (Section 6.3 Problem Definition of Trajectory Correctness Judgment (link))
Action space: "key, click, long_press, swipe, type, answer, system_button, open, wait, terminate" (Table 6: Action Space of GUI-Owl on Mobile (link))
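The mobile action space listed above can be sketched as a simple schema with a validator; the action names come from Table 6, but the argument fields and dict structure here are assumptions for illustration, not GUI-Owl's actual message format.

```python
# Hypothetical sketch of GUI-Owl's mobile action space (Table 6).
# Action names are from the table; argument schemas are assumptions.
MOBILE_ACTIONS = {
    "key":           ["keycode"],              # press a hardware/system key
    "click":         ["x", "y"],               # tap at screen coordinates
    "long_press":    ["x", "y"],
    "swipe":         ["x1", "y1", "x2", "y2"],
    "type":          ["text"],                 # type into the focused field
    "answer":        ["text"],                 # reply to the user
    "system_button": ["button"],               # e.g. back / home
    "open":          ["app_name"],             # launch an app by name
    "wait":          [],
    "terminate":     ["status"],               # end the episode
}

def validate_action(action: dict) -> bool:
    """Check that an action names a known type and supplies its arguments."""
    name = action.get("name")
    if name not in MOBILE_ACTIONS:
        return False
    return all(arg in action.get("args", {}) for arg in MOBILE_ACTIONS[name])
```

A harness around the agent could use such a validator to reject malformed model outputs before they reach the device.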
Memory architecture: - Long-term memory retains history across interactions; a Notetaker agent maintains persistent contextual memory (Figure 7 in link)
- No evidence of episodic memory across different conversations
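A minimal sketch of a Notetaker-style persistent memory, assuming it appends salient notes during a task and prepends them to later prompts; the class, method names, and example notes are hypothetical, not the project's implementation.

```python
class Notetaker:
    """Hypothetical sketch of persistent contextual memory across steps."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def record(self, note: str) -> None:
        # Keep salient facts observed during execution for later steps.
        self.notes.append(note)

    def as_context(self) -> str:
        # Render accumulated notes for inclusion in the next model prompt.
        return "\n".join(f"- {n}" for n in self.notes)

memory = Notetaker()
memory.record("Wi-Fi settings page already open")
memory.record("user account: test@example.com")
prompt_context = memory.as_context()
```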
User interface and interaction design: Judging from the demo videos, a chatbot occupies the left panel, with the GUI of the virtual PC/browser/phone in the right panel. There does not yet appear to be a customer-facing UI, as this is still an open-source project. The GUI-Owl API is available on the Developer Platform (link)
User roles: Operator (directing the agent to complete tasks)
Component accessibility: Open sourced code and model (link)
Autonomy & control
Autonomy level and planning depth: L4-L5. Little information on the autonomy level is available; it can only be judged from the demo.
User approval requirements for different decision types: None found
Execution monitoring, traces, and transparency: Visible CoT and an action trace document all activity. Each request to the GUI models is recorded and displayed on the left panel, including the request ID, screenshot, reasoning CoT, and actions (based on demo video (link))
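The per-request trace shown in the demo panel could be modelled as a simple record; the field names below mirror what is visible in the demo (request ID, screenshot, reasoning CoT, actions) but the structure itself is an assumption, not a documented format.

```python
from dataclasses import dataclass, field

@dataclass
class TraceRecord:
    """Hypothetical shape of one logged GUI-model request, per the demo panel."""
    request_id: str
    screenshot_path: str
    reasoning_cot: str                      # the model's visible chain of thought
    actions: list = field(default_factory=list)

# Example trace entry (illustrative values only).
trace = [
    TraceRecord(
        request_id="req-001",
        screenshot_path="step1.png",
        reasoning_cot="The settings icon is in the top-right corner.",
        actions=[{"name": "click", "args": {"x": 980, "y": 40}}],
    )
]
```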
Emergency stop and shut down mechanisms and user control: The UI does not appear to offer an option to stop the agent once it starts running
Usage monitoring and statistics and patterns: None found
Ecosystem interaction
Identifies to humans?: None found
Identifies technically?: None found
Interoperability standards and integrations: No mention of AGNTCY, the Agent Connect Protocol (ACP), the Model Context Protocol (MCP), or the Agent2Agent (A2A) protocol anywhere.
Web conduct: None found
Safety, evaluation & impact
Technical guardrails and safety measures: None found
What types of risks were evaluated?: None found
(Internal) safety evaluations and results: None found
Third-party testing, audits, and red-teaming: None found
Benchmark performance and demonstrated capabilities: GUI automation benchmarks, including ScreenSpot-v2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. (link)
Bug bounty programmes and vulnerability disclosure: None found
Any known incidents?: None found