An Intelligent Phone That Never Sleeps
Let AI handle your phone-use tasks: locally, privately, autonomously.
Website · Documentation · Quickstart · Discord · Reddit · Contributing
Imagine having a second phone that works for you around the clock, replying to messages, handling payments, playing games, and posting on social media, all while your real phone stays safely in your pocket, untouched.
OpenPocket is an open-source framework that lets AI agents operate an Android phone on your behalf. Tell it what you want in plain language, and it figures out the rest (tapping, scrolling, typing, navigating between apps), just like a real person would.
- Local-first: everything runs on your machine; your data never leaves your computer.
- Privacy by default: the agent phone and your personal phone are completely isolated. Sensitive actions require explicit human approval.
- Emulator + real device: connect to Android emulators or physical phones over ADB. Run multiple agents against multiple targets to build your own local phone farm.
- Extensible: add new capabilities through a single SKILL.md file, or build your own agent workflows.
| Category | Demo |
|---|---|
| Social | Auto-manage social media |
| Gaming | Play mobile games autonomously |
| Utility (payment) | Handle bills and payments |
| Studying | Assist with learning tasks |
| Social | Autopilot your X |
| Utility | Find best price/travel option |
- Multi-model: works with OpenAI GPT-5.x, Claude 4.6, Gemini 3.x, DeepSeek, Qwen, GLM, Kimi, MiniMax, Doubao, and more.
- Multi-agent: run multiple isolated agents, each with its own config, workspace, target device, and session state.
- Scheduled jobs: create cron tasks from chat or CLI in natural language (e.g. "Every day at 8am open Slack and check in").
- Human-auth relay: sensitive actions (camera, payments, location) escalate to you for approval through a private local relay.
- Channel integrations: receive tasks and results through Telegram, Discord, WhatsApp, or CLI.
- Skills framework: extend agent capabilities by dropping a SKILL.md into the skills directory; no code changes needed.
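
The scheduled-jobs item above gives "Every day at 8am" as an example; in standard five-field cron syntax that schedule is written `0 8 * * *`. The sketch below shows how such an expression can be checked against a point in time. It is an illustration only, not OpenPocket's scheduler: the function names are hypothetical, only `*`, single numbers, and comma lists are handled, and the time is read in UTC rather than in a configured time zone.

```typescript
// Illustrative matcher for five-field cron expressions (hypothetical helper,
// not part of OpenPocket). Real schedulers also support ranges, steps,
// names, time zones, and the day-of-month/day-of-week OR rule.

interface CronFields {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
}

function parseCron(expr: string): CronFields {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${parts.length}`);
  }
  return {
    minute: parts[0]!,
    hour: parts[1]!,
    dayOfMonth: parts[2]!,
    month: parts[3]!,
    dayOfWeek: parts[4]!,
  };
}

// A field matches when it is "*" or when one of its comma-separated
// numbers equals the value.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field
    .split(",")
    .some((item) => /^\d+$/.test(item) && Number(item) === value);
}

// Simplification: all five fields must match, evaluated in UTC.
function cronMatches(expr: string, when: Date): boolean {
  const f = parseCron(expr);
  return (
    fieldMatches(f.minute, when.getUTCMinutes()) &&
    fieldMatches(f.hour, when.getUTCHours()) &&
    fieldMatches(f.dayOfMonth, when.getUTCDate()) &&
    fieldMatches(f.month, when.getUTCMonth() + 1) &&
    fieldMatches(f.dayOfWeek, when.getUTCDay())
  );
}
```

For instance, `cronMatches("0 8 * * *", new Date(Date.UTC(2025, 0, 6, 8, 0)))` evaluates to `true`, while the same call one minute later evaluates to `false`.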
npm install -g openpocket
openpocket onboard

Or install from source:
git clone git@github.com:pockebot/openpocket.git
cd openpocket
npm install
npm run build
./openpocket onboard

Then start the agent gateway:
openpocket gateway start

Or run a one-off task directly:
openpocket agent --model gpt-5.2-codex "Open Chrome and search weather"

For full setup details, see the Quickstart guide, Device targets, and Configuration.

openpocket create agent review-bot --type physical-phone --device R5CX123456A
openpocket create agent ops-bot --type emulator
openpocket agents list

Target a specific agent with --agent:
openpocket --agent review-bot gateway start
openpocket --agent review-bot config-show
openpocket --agent review-bot target show
openpocket --agent review-bot channels login --channel discord

From chat or CLI, describe a schedule in natural language. OpenPocket asks for confirmation before persisting the job. Jobs run in isolated cron:<jobId> sessions.
openpocket cron list
openpocket cron add --id daily-slack-checkin \
--name "Daily Slack Check-in" \
--cron "0 8 * * *" --tz Asia/Shanghai \
--task "Open Slack and complete check-in" \
--channel telegram --to 12345
openpocket cron disable --id daily-slack-checkin

openpocket dashboard manager
openpocket human-auth-relay start
- dashboard manager: overview of all agents, targets, channels, and gateway status.
- human-auth-relay start: shared relay hub for human-auth approval flows, with an optional ngrok public URL.
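
To make the human-auth approval flow more concrete, here is a minimal sketch of the gating idea: actions that touch a sensitive capability (camera, payments, location) run only after a human approves them, and everything else runs directly. All names below are hypothetical and do not come from OpenPocket's code, and the approver is a synchronous callback for simplicity, whereas a real relay would presumably wait asynchronously for a response.

```typescript
// Hypothetical sketch of approval gating; not OpenPocket's implementation.

type Capability = "tap" | "scroll" | "type" | "camera" | "payments" | "location";

// The capabilities this README names as sensitive.
const SENSITIVE_CAPABILITIES: ReadonlySet<Capability> = new Set<Capability>([
  "camera",
  "payments",
  "location",
]);

interface ActionRequest {
  capability: Capability;
  description: string;
}

// Stand-in for the human on the other side of the relay.
type Approver = (request: ActionRequest) => boolean;

type Outcome = "executed" | "denied";

function runWithHumanAuth(
  request: ActionRequest,
  approve: Approver,
  execute: () => void,
): Outcome {
  // Only sensitive capabilities are escalated; the approver is never
  // consulted for ordinary actions.
  if (SENSITIVE_CAPABILITIES.has(request.capability) && !approve(request)) {
    return "denied";
  }
  execute();
  return "executed";
}
```

Keeping the decision in one function means every action passes through the same check, so a new sensitive capability only needs to be added to the set.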
openpocket target show
openpocket target set --type emulator
openpocket target set --type physical-phone
openpocket target pair --host <device-ip> --pair-port <pair-port> --code <pairing-code> --type physical-phone

Model configuration is per-agent. New agents inherit from the onboard template; each can diverge independently.
openpocket model show
openpocket model list
openpocket model set --name gpt-5.4
openpocket --agent review-bot model set --provider google --model gemini-3.1-pro-preview

Tune log level, payload redaction, and per-module output in your agent config:
{
  "gatewayLogging": {
    "level": "info",
    "includePayloads": false,
    "maxPayloadChars": 160,
    "modules": {
      "core": true,
      "access": true,
      "task": true,
      "channel": true,
      "cron": true,
      "heartbeat": false,
      "humanAuth": true,
      "chat": false
    }
  }
}

Full CLI reference: CLI and Gateway | Filesystem layout
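
One way to read the gatewayLogging settings is sketched below. The field names and values are taken from the config above, but the semantics are assumptions made for illustration: that a module set to false emits nothing, that payloads are dropped unless includePayloads is true, and that longer payloads are cut to maxPayloadChars. The `formatLogLine` helper is hypothetical and is not OpenPocket's logger.

```typescript
// Assumed interpretation of the gatewayLogging block; illustrative only.

interface GatewayLogging {
  level: string;
  includePayloads: boolean;
  maxPayloadChars: number;
  modules: Record<string, boolean>;
}

// Same values as the README's example config.
const exampleLogging: GatewayLogging = {
  level: "info",
  includePayloads: false,
  maxPayloadChars: 160,
  modules: {
    core: true,
    access: true,
    task: true,
    channel: true,
    cron: true,
    heartbeat: false,
    humanAuth: true,
    chat: false,
  },
};

// Returns the line to print, or null when the module is switched off.
function formatLogLine(
  cfg: GatewayLogging,
  moduleName: string,
  message: string,
  payload?: string,
): string | null {
  if (!cfg.modules[moduleName]) return null;
  const head = `[${moduleName}] ${message}`;
  if (!cfg.includePayloads || payload === undefined) return head;
  // Truncate long payloads so logs stay compact.
  const body =
    payload.length > cfg.maxPayloadChars
      ? payload.slice(0, cfg.maxPayloadChars) + "..."
      : payload;
  return `${head} ${body}`;
}
```

Under these assumptions the example config silences the heartbeat and chat modules and never writes payloads, which keeps message contents out of the logs by default.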
flowchart LR
U["User Surfaces\nCLI / Telegram / Dashboard"] --> M["Manager Layer\nagent registry + ports + target locks"]
M --> D0["Default Agent\nconfig + workspace + state"]
M --> D1["Managed Agent A\nconfig + workspace + state"]
M --> D2["Managed Agent B\nconfig + workspace + state"]
D0 --> G0["Gateway + Dashboard"]
D1 --> G1["Gateway + Dashboard"]
D2 --> G2["Gateway + Dashboard"]
G0 --> A0["Agent Runtime"]
G1 --> A1["Agent Runtime"]
G2 --> A2["Agent Runtime"]
A0 --> T0["Target A"]
A1 --> T1["Target B"]
A2 --> T2["Target C"]
A0 --> H0["Private Local Relay"]
A1 --> H1["Private Local Relay"]
A2 --> H2["Private Local Relay"]
H0 --> RH["Shared Relay Hub\noptional ngrok"]
H1 --> RH
H2 --> RH
| # | Component | What it does | Docs |
|---|---|---|---|
| 1 | Multi-agent manager | Registry, port allocation, and target locks for one default + N managed agents | Multi-agent, Filesystem |
| 2 | Gateway orchestration | Per-agent gateway, dashboard, session store, channel credentials, task queue | CLI & Gateway, Runbook |
| 3 | Prompting & model loop | System/user prompt composition, context budgeting, model-driven step execution | Prompting, Prompt templates |
| 4 | Tool execution | ADB phone actions, coding tools, memory tools, and user-defined scripts | Action schema, Scripts, Skills |
| 5 | Human authorization | Capability probe + escalation for sensitive actions, shared relay hub with per-agent state | Remote human auth |
| 6 | Device target abstraction | One target per agent, mutual exclusion enforced by the manager | Device targets |
| 7 | Persistence & audit | Sessions, memory, screenshots, relay state, and artifacts, all inside the agent workspace | Filesystem, Session & memory |
| 8 | Runtime operations | Manager dashboard, per-agent dashboards, keep-awake heartbeat, troubleshooting playbooks | Runbook, Troubleshooting, Screen awake |
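
The "one target per agent, mutual exclusion enforced by the manager" rule in row 6 can be pictured as a small lock registry. This is a sketch under assumptions, not the manager's actual code: the `TargetLocks` class and its methods are hypothetical, and state lives in memory only.

```typescript
// Hypothetical in-memory lock registry: one agent per target and
// one target per agent.

class TargetLocks {
  private owners = new Map<string, string>(); // targetId -> agentId

  // Returns true when the agent now holds the target.
  acquire(targetId: string, agentId: string): boolean {
    const owner = this.owners.get(targetId);
    if (owner !== undefined) {
      // Re-acquiring your own target is a no-op; anyone else is refused.
      return owner === agentId;
    }
    // Refuse if this agent already holds a different target.
    for (const heldBy of this.owners.values()) {
      if (heldBy === agentId) return false;
    }
    this.owners.set(targetId, agentId);
    return true;
  }

  // Only the current owner may release a target.
  release(targetId: string, agentId: string): boolean {
    if (this.owners.get(targetId) !== agentId) return false;
    this.owners.delete(targetId);
    return true;
  }

  ownerOf(targetId: string): string | undefined {
    return this.owners.get(targetId);
  }
}
```

With the agents from the earlier examples, `review-bot` acquiring device `R5CX123456A` would succeed, and a later attempt by `ops-bot` to acquire the same device would be refused until `review-bot` releases it.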
We'd love your help making OpenPocket better. Whether it's fixing a typo, improving docs, adding a new skill, or building a major feature, every contribution matters.
- Fork the repo and create a feature branch.
- Install dependencies and run the dev checks:
npm install
npm run check
npm run test
npm run smoke:dual-side
npm run docs:build
- Keep changes focused: one concern per PR.
- Add or update tests for any behavioral change.
- Open a PR with context on what changed, why, and how you verified it.
- Good first issues: look for issues labeled "good first issue" in the issue tracker.
- Skills: write a new SKILL.md to teach the agent a new phone workflow. See the Skills guide.
- Model adapters: add or improve support for new LLM providers.
- Channel integrations: improve Telegram, Discord, or WhatsApp support, or add new channels.
- Docs: fix gaps, add examples, or translate content.
Read the full guidelines before opening a PR: CONTRIBUTING.md | contribution.md
This project is licensed under the MIT License; see LICENSE for details.
Special thanks to the open-source projects that make OpenPocket possible:
- pi-mono ecosystem by Mario Zechner: @mariozechner/pi-agent-core, @mariozechner/pi-ai, @mariozechner/pi-coding-agent
- Messaging SDKs: node-telegram-bot-api, discord.js, baileys
- Core runtime: openai, @modelcontextprotocol/sdk, zod, @sinclair/typebox, sharp, qrcode
- Docs & tooling: vitepress, mermaid, typescript, tsx
And thanks to all contributors and community members who report issues, share real-world scenarios, and help shape the project. We're building this together.
