OpenGUI

Project Url: Core-Mate/OpenGUI
Introduction: OpenGUI is an Android GUI agent framework for phone-use AI that can see, plan, and operate real mobile apps through the GUI.

  • Bootstrap with Claude or Codex
  • Multi-role operator system
  • Tasks up to 12 hours
  • Recommended model profiles
  • Manual setup docs

A mobile GUI agent framework for Android.

OpenGUI helps AI agents see, understand, and operate Android app interfaces on real devices.

Demo video coming soon.

The first demo will show OpenGUI operating a real Android app on an Android device, including screen understanding, tapping, typing, and returning a structured result.

Quick Start

The fastest way to try OpenGUI is to let Claude Code or Codex bootstrap it for you.

Read ./skills/open-gui-bootstrap/SKILL.md and help me run OpenGUI. Only ask me for phone-side actions.

You will need:

  • an Android phone or emulator
  • USB debugging enabled
  • AccessibilityService enabled
  • model API keys for real task execution

The bootstrap flow uses the repository scripts to start the backend and to build and install the Android client:

cd server
./start.sh
cd ../client
./start.sh

After the backend and Android client are running, send a first task:

cd server
pnpm opengui -- devices --json
pnpm opengui -- do "Observe the current Android screen and summarize what you see" --json

Manual setup guide: docs/get-started.md.

Recent Updates

  • [2026.5.16] Added Codex / Claude Code remote control with a local REST API, a pnpm opengui CLI, and the open-gui-remote-control Skill for dispatching Android app tasks from coding agents.
  • [2026.5.9] Added a Discord IM channel for remote Android task dispatch, including prefix commands, slash commands, allowlists, and guild-scoped command registration.
  • [2026.5.7] Hardened local startup to avoid common PostgreSQL and Redis port conflicts during Docker-based backend setup.
  • [2026.5.1] Improved backend onboarding with .env.example, startup checks, and graph-agent VLM environment configuration.

What You Can Do with OpenGUI

OpenGUI provides an Android GUI agent stack for screen understanding, task planning, action execution, review, and recovery.

You can use the same repository in five practical ways:

  • Operate mainstream Android apps: let AI handle mobile tasks inside X, Reddit, Hacker News, Telegram, WeChat, Weibo, Xiaohongshu, and other Android apps on a real phone.

  • Run shipped workflows: the repository already includes a runnable backend, Android client, standby dispatch path, and a set of built-in task capabilities.

  • Let Claude or Codex bootstrap it for you: point the model at skills/open-gui-bootstrap/SKILL.md, describe the goal in plain language, and let it handle setup, build, install, and local debugging.

  • Let Codex control Android apps: after OpenGUI is running, point Codex or Claude Code at skills/open-gui-remote-control/SKILL.md to list devices, dispatch tasks, and track executions through the local CLI.

  • Operate phones as remote workers: dispatch tasks through Feishu, Telegram, Discord, or REST API, keep devices on standby, and get structured results back from the backend.

  • Join the Discord community

Highlights

  • Built for long-running tasks: OpenGUI is shaped for mobile workflows that may run for hours, with progress, review, and recovery kept inside the system.
  • The task can keep moving: Plan Supervisor maintains task state and continuation, Executor Graph runs screenshot, vision, action, and call-user loops on top of live device state, and Summarizer closes the run with a structured result.
  • Phones can stay on standby: the standby dispatch path lets devices receive remote work through Feishu, Telegram, Discord, or REST entry points.
  • Models can be assigned by role: model routing separates planning from VLM execution so teams can choose providers by job.
  • The system is organized around real mobile workflows: the graph, device execution path, and model split already exist in the source tree.
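To make the Plan Supervisor / Executor Graph / Summarizer split more concrete, here is a minimal TypeScript sketch of the outer loop. It is not taken from mobile-agent.graph.ts: the Subtask and TaskState shapes, the three outcome values, and the retry limit are hypothetical, and the executor is injected as a plain function so the control flow can be read on its own.

```typescript
// Hypothetical shapes for illustration; OpenGUI's real graph state lives in
// server/apps/backend/src/modules/graph-agent/graph/ and may look different.
type StepOutcome = "done" | "failed" | "needs_user";

interface Subtask {
  goal: string;
  attempts: number;
}

interface TaskState {
  goal: string;
  pending: Subtask[];
  log: string[];
}

// Stand-in for the Executor Graph: runs one subtask on the device.
type Executor = (subtask: Subtask) => StepOutcome;

// Plan Supervisor: owns task state and decides what runs next.
function superviseTask(state: TaskState, execute: Executor, maxAttempts = 3): string {
  while (state.pending.length > 0) {
    const current = state.pending[0];
    current.attempts += 1;
    const outcome = execute(current);
    state.log.push(`${current.goal}: ${outcome} (attempt ${current.attempts})`);

    if (outcome === "done") {
      state.pending.shift(); // continue with the next subtask
    } else if (outcome === "needs_user") {
      break; // pause; state is kept so the run can be resumed later
    } else if (current.attempts >= maxAttempts) {
      state.pending.shift(); // give up on this subtask but keep its log
    }
    // otherwise the same subtask is retried on the next iteration
  }
  return summarize(state);
}

// Summarizer: closes the run with a structured result.
function summarize(state: TaskState): string {
  return JSON.stringify({
    goal: state.goal,
    finished: state.pending.length === 0,
    steps: state.log,
  });
}
```

Because the state object is passed in rather than created inside the loop, a run that stops on needs_user can be resumed by calling superviseTask again with the same state, which is one way to read the "task state and continuation" point above.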

Why OpenGUI Is Different

OpenGUI is built as a mobile operator system with explicit orchestration layers.

The source code currently exposes these pieces:

  • server/apps/backend/src/modules/graph-agent/graph/mobile-agent.graph.ts for the main graph
  • server/apps/backend/src/modules/graph-agent/graph/executor.graph.ts for the device-side execution loop
  • server/apps/backend/src/common/ws/standby.gateway.ts for standby device dispatch
  • client/core_network/.../StandbySocketManager.kt for persistent device standby connections
  • client/core_accessibility/.../GestureService.kt for Android-side action execution

| Dimension | Typical phone-agent demo | OpenGUI |
| --- | --- | --- |
| Execution model | Short interactive loop | Main graph plus executor subgraph |
| Task state | Usually local and session-bound | Task state managed in the backend graph |
| Device path | Often laptop-driven control | Android client with standby and execution sockets |
| Model usage | One model does most of the work | Planning and VLM paths can be split across providers |
| Remote operation | Optional add-on | Feishu, Telegram, Discord, REST API, and standby dispatch are built into the backend |

Typical Use Cases

  • Open X and collect recent posts for a topic
  • Read and summarize Reddit or Hacker News threads on a live phone
  • Trigger Android tasks remotely from Feishu, Telegram, Discord, or REST API
  • Execute repetitive mobile workflows on Android devices
  • Run long mobile workflows that need state, review, and recovery over many hours

Current Limitations

  • Requires an Android device or emulator.
  • Requires USB debugging and AccessibilityService permissions.
  • Execution quality depends on the model, app UI, network state, and task length.
  • Not an always-on OS-level assistant yet; tasks are currently triggered manually or through configured dispatch channels.
  • Long-running tasks are supported by the system design, but reliability still needs more real-world testing.
  • More ready-to-run task examples and benchmarks are still needed.

Roadmap

  • Add a short demo video and more real app examples.
  • Improve one-command local setup.
  • Add more ready-to-run phone-use task templates.
  • Improve execution recovery and failure reporting.
  • Add benchmark tasks for Android GUI agent reliability.
  • Expand docs for model configuration and cost-saving profiles.

How to Use OpenGUI

1. With Claude or Codex

Start with skills/open-gui-bootstrap/SKILL.md.

The intended flow is simple:

  1. point Claude or Codex at the skill
  2. describe the task in plain language
  3. let the model handle backend bootstrap, APK build, install, and local debugging

The model should only stop and ask you for:

  • connecting a phone or starting an emulator
  • approving USB debugging
  • enabling AccessibilityService
  • granting overlay or battery permissions
  • providing API keys or bot credentials

After the backend and Android client are running, use skills/open-gui-remote-control/SKILL.md to let Codex or Claude Code control the phone through the local CLI:

cd server
pnpm opengui -- devices --json
pnpm opengui -- do "Observe the current Android screen and summarize what you see" --json
pnpm opengui -- status <executionId> --json

Recommended profiles:

High-performance profile

Use the latest Claude Opus model family across planning, supervision, review, and vision when you want the strongest overall quality.

This is the easiest way to get the best execution quality, but it is also the most expensive path.

Cost-saving mixed profile

Use Qwen 3.6 Plus for text-side roles such as Planner and Supervisor, and use Doubao Pro for the VLM side.

This usually preserves the overall system shape while lowering model cost by roughly 10x to 15x compared with an all-Opus setup, depending on task length, screenshot volume, and token mix.
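One way to picture the two profiles is a table keyed by role, where each role resolves to a model independently. The sketch below is hypothetical: the role names, the model identifier strings, and putting the reviewer on Qwen in the mixed profile are assumptions, not OpenGUI's actual configuration keys.

```typescript
// Hypothetical role names and model identifiers, for illustration only.
type Role = "planner" | "supervisor" | "reviewer" | "vlm";
type Profile = Record<Role, string>;

// High-performance profile: one model family for every role.
const highPerformance: Profile = {
  planner: "claude-opus",
  supervisor: "claude-opus",
  reviewer: "claude-opus",
  vlm: "claude-opus",
};

// Cost-saving mixed profile: text-side roles on Qwen, screenshots on Doubao.
// (Assumption: the reviewer is treated as a text-side role here.)
const costSaving: Profile = {
  planner: "qwen-3.6-plus",
  supervisor: "qwen-3.6-plus",
  reviewer: "qwen-3.6-plus",
  vlm: "doubao-pro",
};

// Each role resolves independently, so swapping the VLM provider
// does not change which model plans or supervises.
function routeModel(profile: Profile, role: Role): string {
  return profile[role];
}

// Distinct model identifiers a profile depends on, sorted for stable output.
function modelsInUse(profile: Profile): string[] {
  return Array.from(new Set(Object.values(profile))).sort();
}
```

Keeping the split at the role level is what lets the mixed profile change the cost mix without changing the graph itself.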

Recommended prompts:

Run it

Read ./skills/open-gui-bootstrap/SKILL.md and help me run OpenGUI. Only ask me for phone-side actions.

Use Claude Opus everywhere

Read ./skills/open-gui-bootstrap/SKILL.md and bootstrap OpenGUI with the latest Claude Opus model family for planning, supervision, review, and vision.

Use Qwen + Doubao to save cost

Read ./skills/open-gui-bootstrap/SKILL.md and set up OpenGUI with Qwen 3.6 Plus for Planner and Supervisor, and Doubao Pro for VLM execution.

Use my own APIs

Read ./skills/open-gui-bootstrap/SKILL.md and use my existing model APIs to get OpenGUI working.

2. Manual setup

Use the repository scripts directly:

cd server
./start.sh
cd ../client
./start.sh

Reference docs: docs/get-started.md (manual setup guide) and docs/DISCORD.md (Discord remote control).

3. Optional Discord remote control

Discord can be enabled as an optional IM channel. A Discord bot receives commands such as !opengui devices or !opengui do ...; the backend then dispatches the task to a standby Android phone and posts progress back to the same channel.

This is not required for local use. If DISCORD_BOT_TOKEN is empty, the backend starts normally and skips Discord.

Full setup guide: docs/DISCORD.md.
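The prefix-command path and the empty-token behavior described above can be sketched as a small parser plus an enable check. This is not the im-channel module: the allowlist keyed by author ID, the Command shape, and the function names are hypothetical; only the !opengui prefix and the DISCORD_BOT_TOKEN behavior come from the text.

```typescript
// Minimal sketch; not OpenGUI's im-channel implementation.
// Hypothetical: the Command shape, author-ID allowlist, and function names.
type Command =
  | { kind: "devices" }
  | { kind: "do"; instruction: string }
  | { kind: "ignored"; reason: string };

const PREFIX = "!opengui";

// Discord is optional: an empty or missing token means the channel is skipped.
function discordEnabled(env: Record<string, string | undefined>): boolean {
  return (env["DISCORD_BOT_TOKEN"] ?? "").trim().length > 0;
}

function parseCommand(content: string, authorId: string, allowlist: Set<string>): Command {
  const text = content.trim();
  if (text !== PREFIX && !text.startsWith(PREFIX + " ")) {
    return { kind: "ignored", reason: "no prefix" };
  }
  // Check the allowlist before interpreting anything else.
  if (!allowlist.has(authorId)) {
    return { kind: "ignored", reason: "author not allowlisted" };
  }
  const rest = text.slice(PREFIX.length).trim();
  if (rest === "devices") {
    return { kind: "devices" };
  }
  if (rest.startsWith("do ")) {
    const instruction = rest.slice(3).trim();
    if (instruction.length > 0) return { kind: "do", instruction };
  }
  return { kind: "ignored", reason: "unknown command" };
}
```

A real bot would also need the slash-command path and guild-scoped command registration, which this sketch leaves out.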

The System

flowchart LR
    U["User or IM command"] --> BS["Bootstrap Skill / API / IM entry"]
    BS --> SP["Plan Supervisor"]

    SP --> EX["Executor Graph"]
    EX --> AC["Android Client"]
    AC --> GX["AccessibilityService + screenshots + actions"]
    EX --> RV["Execution review and retry"]
    RV --> SP

    SP --> SM["Summarizer"]
    SM --> SR["Structured Results"]

    RD["Feishu / Telegram / Discord / REST API"] --> ST["Standby Gateway"]
    ST --> AC

    SP --> MR["Model Routing"]
    MR --> MA["Claude / GPT / Gemini / Kimi / MiniMax / compatible"]
    EX --> MR
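The Executor Graph box above runs a screenshot, vision, and action loop with review and a call-user exit. A minimal TypeScript sketch of one executor run follows, assuming hypothetical Device, Vision, and Review interfaces; it does not reproduce executor.graph.ts or GestureService.kt.

```typescript
// Minimal sketch; every interface here is hypothetical.
interface Screen {
  description: string;
}

type Decision =
  | { type: "tap"; target: string }
  | { type: "type"; text: string }
  | { type: "call_user"; question: string }
  | { type: "finish" };

interface Device {
  screenshot(): Screen;            // stand-in for the Android client's capture
  perform(decision: Decision): void; // stand-in for gesture execution
}

type Vision = (screen: Screen, goal: string) => Decision;
type Review = (before: Screen, after: Screen, decision: Decision) => boolean;

type ExecResult =
  | { status: "finished"; steps: number }
  | { status: "needs_user"; question: string }
  | { status: "gave_up"; steps: number };

function runExecutor(
  device: Device, goal: string, vision: Vision, review: Review,
  maxSteps = 20, maxRetries = 2,
): ExecResult {
  let retries = 0;
  for (let step = 1; step <= maxSteps; step++) {
    const before = device.screenshot();    // observe live device state
    const decision = vision(before, goal); // VLM picks the next action
    if (decision.type === "finish") return { status: "finished", steps: step };
    if (decision.type === "call_user") return { status: "needs_user", question: decision.question };
    device.perform(decision);
    const after = device.screenshot();
    if (review(before, after, decision)) {
      retries = 0; // the action made progress
    } else if (++retries > maxRetries) {
      return { status: "gave_up", steps: step }; // hand back to the supervisor
    }
  }
  return { status: "gave_up", steps: maxSteps };
}
```

Returning needs_user or gave_up instead of looping indefinitely hands control back to the Plan Supervisor, matching the review-and-retry arrow back to SP in the diagram.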

Core Runtime Pieces

  • Backend graph: server/apps/backend/src/modules/graph-agent/graph/
  • Task APIs: server/apps/backend/src/modules/task/task.controller.ts
  • Standby dispatch: server/apps/backend/src/common/ws/standby.gateway.ts
  • IM channel dispatch: server/apps/backend/src/modules/im-channel/
  • Android standby connection: client/core_network/src/main/java/com/coremate/opengui/network/websocket/StandbySocketManager.kt
  • Android execution path: client/core_accessibility/src/main/java/com/coremate/opengui/accessibility/GestureService.kt
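The standby idea (phones hold a persistent connection and wait for work from IM or REST entry points) can be sketched as a small in-memory registry. Everything here is hypothetical: the device IDs, the idle/busy states, and the first-idle selection policy are assumptions for illustration, and the real gateway and Android client work over WebSocket connections, which this leaves out.

```typescript
// Minimal sketch; not the code in standby.gateway.ts or StandbySocketManager.kt.
type DeviceState = "idle" | "busy";

class StandbyRegistry {
  private devices = new Map<string, DeviceState>();

  // A phone opened its standby connection and is ready for work.
  connect(deviceId: string): void {
    this.devices.set(deviceId, "idle");
  }

  // The connection dropped; the device can no longer take tasks.
  disconnect(deviceId: string): void {
    this.devices.delete(deviceId);
  }

  // An entry point asks for a device; returns its ID, or null if none is free.
  dispatch(preferred?: string): string | null {
    if (preferred !== undefined) {
      if (this.devices.get(preferred) !== "idle") return null;
      this.devices.set(preferred, "busy");
      return preferred;
    }
    for (const [id, state] of this.devices) {
      if (state === "idle") {
        this.devices.set(id, "busy");
        return id;
      }
    }
    return null;
  }

  // The device reported its task finished and returns to standby.
  complete(deviceId: string): void {
    if (this.devices.has(deviceId)) this.devices.set(deviceId, "idle");
  }
}
```

Marking a device busy at dispatch time keeps two entry points from handing the same phone two tasks at once, at least within this single-process sketch.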


Community / Support

The most useful ways to help the project are:

  • open issues for bugs and feature requests
  • share real use cases and deployment feedback
  • contribute docs, integrations, and fixes

License

OpenGUI is source-available under the Business Source License 1.1 (BUSL-1.1).

You may copy, modify, distribute, and use the source for non-production purposes. Production use, commercial use, hosted services, and integration into commercial products require a separate commercial license from Core-Mate.

For this version:

  • Change Date: 2030-04-29
  • Change License: Apache License, Version 2.0

This is public source, but it is not OSI-approved open source until the Change Date.

See LICENSE.
