🚀 UIPro: Unleashing Superior Interaction Capability For GUI Agents

🎯 ICCV 2025 • Next-Generation AI GUI Automation

Paper

📢 News

  • [2025-06-28] 🎉 UIPro has been accepted to ICCV 2025!
  • [2025-11-23] Uploaded UIPro models.
  • [2025-11-23] Uploaded data processing scripts and systematic denoising procedures for AITW, AITZ, MobileViews, WAE, WebUI, MultiUI, AndroidControl, GUIOdyssey, AMEX, and GUIAct.
  • [2025-11-27] Uploaded data processing scripts and systematic denoising procedures for SeeClick-Web and RefExp.
  • [2025-12-10] Uploaded data processing scripts and systematic denoising procedures for MOTIF, RefExp, and GUIEnv.
  • [TODO] Upload the full datasets.

🌟 Revolutionary GUI Agent Technology

UIPro represents a major advance in GUI automation, achieving strong interaction capabilities across multiple platforms through large-scale multi-task training.

🎨 What Makes UIPro Special

🏗️ Architecture & Training Pipeline

A revolutionary two-stage approach to GUI agent development

UIPro Methodology Diagram

🔄 Two-Stage Training Process

The training pipeline proceeds in two stages: the first builds GUI understanding (grounding, functionality recognition, and intent mapping), and the second fine-tunes agentic task execution.


🎯 Core Capabilities

🧠 GUI Understanding Capabilities

| Capability | Description | Performance |
|---|---|---|
| 🎯 Element Grounding | Accurately locates UI elements based on descriptions | ⭐⭐⭐⭐⭐ |
| 🔍 Functionality Recognition | Understands purpose and function of interface components | ⭐⭐⭐⭐⭐ |
| 🧭 Intent Mapping | Connects user intentions to appropriate UI interactions | ⭐⭐⭐⭐⭐ |

🤖 GUI Agent Task Execution

| Capability | Description | Performance |
|---|---|---|
| 📋 Task Planning | Breaks down complex requests into actionable steps | ⭐⭐⭐⭐⭐ |
| ⚡ Action Execution | Performs clicks, typing, and scrolling with high precision | ⭐⭐⭐⭐⭐ |
| 🌐 Cross-Platform Navigation | Seamless operation across different device types | ⭐⭐⭐⭐⭐ |

📊 Performance Benchmarks

Industry-leading results across major GUI benchmarks

🏆 GUI Agent Task Evaluation

| 🎯 Benchmark | 🤖 UIPro-SLiME (3B) | 🚀 UIPro-Qwen2VL (7B) | 📊 Metric |
|---|---|---|---|
| AITW | 68.0% | 70.4% | Step SR |
| AndroidControl | 61.1% | 85.5% | Step SR |
| GUIAct-Web | 68.2% | 69.1% | Step SR |
| Mind2Web | 28.7% | 48.4% | Step SR |

Step Success Rate (Step SR) - higher is better
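Step SR is the step-level success rate: the fraction of steps whose predicted action matches the reference action. A minimal sketch of the aggregate metric is below; the exact matching rules here are assumptions for illustration (real benchmarks typically add tolerances such as click-distance thresholds):

```python
def step_success_rate(predictions, references):
    """Fraction of steps whose predicted action matches the reference.

    `predictions` and `references` are aligned lists of (action_type, target)
    pairs. Exact-match comparison is used here purely for illustration.
    """
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

preds = [("tap", (120, 300)), ("input_text", "hello"), ("swipe", "up")]
refs  = [("tap", (120, 300)), ("input_text", "hi"),    ("swipe", "up")]
print(step_success_rate(preds, refs))  # → 0.6666666666666666
```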


🚀 Quick Start Guide

Get up and running with UIPro in minutes

📦 Installation

🔧 Setup Instructions

1️⃣ Clone Repository

    git clone https://github.com/ZJULiHongxin/UIPro.git
    cd UIPro

2️⃣ Install Dependencies

    pip install -r requirements.txt

📚 Dataset: The Foundation of Excellence

The world's largest and most comprehensive GUI understanding collection

📊 Dataset Statistics

| Metric | Value | Description |
|---|---|---|
| 📝 Task Samples | 20.6M | GUI understanding tasks |
| 🖼️ Screenshots | 2.5M | Unique GUI screenshots |
| 🎯 Elements | 3.3M | Clean GUI elements |
| 🔢 Task Types | 13 | Different task categories |

๐Ÿ—๏ธ Data Compilation Pipeline

We provide comprehensive scripts to process various GUI datasets. Please follow the instructions below for each dataset.

Note: We also implemented a systematic denoising procedure to ensure data quality, removing up to 29% of noise from some data sources.
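As an illustration of the kind of checks such a denoising pass can apply — these specific rules are assumptions for the sketch, not the exact procedure used here:

```python
def is_clean_element(elem, img_w, img_h):
    """Heuristic sanity checks of the kind a denoising pass might apply.

    `elem` is a dict with a pixel-space bbox [x1, y1, x2, y2] and a text
    label. The thresholds below are illustrative only.
    """
    x1, y1, x2, y2 = elem["bbox"]
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        return False  # degenerate or out-of-bounds box
    if (x2 - x1) * (y2 - y1) > 0.9 * img_w * img_h:
        return False  # near-fullscreen boxes are rarely real elements
    if not elem.get("text", "").strip():
        return False  # drop unlabeled elements
    return True

elements = [
    {"bbox": [10, 10, 100, 40], "text": "Login"},
    {"bbox": [0, 0, 1080, 1920], "text": "root"},   # covers the whole screen
    {"bbox": [50, 80, 30, 120], "text": "Cancel"},  # inverted x-coordinates
]
clean = [e for e in elements if is_clean_element(e, 1080, 1920)]
print(len(clean))  # → 1
```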

GUI Understanding SFT Data Processing

📱 MobileViews
  1. Download the MobileViews raw data from HuggingFace via:

    hf download mllmTeam/MobileViews --repo-type dataset --local-dir ./MobileViews

  2. Unzip and organize the data as follows:

    root/
    ├── MobileViews/
    │   ├── MobileViews_0-150000/
    │   ├── MobileViews_0-150000.csv
    │   ├── MobileViews_150001-291197/
    │   ├── MobileViews_150001-291197.csv
    │   └── ...

  3. Modify MOBILEVIEWS_DIR, ROOT, SCALE (coordinate scale), and PROB_BOX (proportion of the box-prediction samples) in utils/data_utils/make_mobileviews_data/extract_and_generate_mobilebiews_data.py.

  4. Run the processing script (this may take ~48 hours due to the large number of screenshots):

    python utils/data_utils/make_mobileviews_data/extract_and_generate_mobilebiews_data.py

    Processed training samples will be saved in ROOT/mobileviews_processed.

  5. Finally, run utils/data_utils/make_mobileviews_data/run_generate_symlinks.sh to create a unified folder for screenshots.
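Across these processing scripts, SCALE maps pixel coordinates into a fixed integer range, and PROB_BOX controls what fraction of samples ask the model to predict a box rather than a point. A hedged sketch of what such a conversion might look like — the helper names and the center-point fallback are assumptions, not the repo's code:

```python
import random

SCALE = 1000      # assumed target integer coordinate range, e.g. [0, 1000]
PROB_BOX = 0.3    # assumed fraction of samples phrased as box prediction

def to_scaled(x_px, y_px, img_w, img_h, scale=SCALE):
    """Map a pixel coordinate into the model's integer coordinate space."""
    return round(x_px / img_w * scale), round(y_px / img_h * scale)

def pick_target(bbox, img_w, img_h, rng=random):
    """Emit either a scaled box or its scaled center point."""
    x1, y1, x2, y2 = bbox
    if rng.random() < PROB_BOX:
        return to_scaled(x1, y1, img_w, img_h) + to_scaled(x2, y2, img_w, img_h)
    return to_scaled((x1 + x2) / 2, (y1 + y2) / 2, img_w, img_h)

print(to_scaled(540, 960, 1080, 1920))  # → (500, 500)
```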

📱 WAE
  1. Download the WAE raw data from WAE DropBox.

  2. Merge, unzip, and organize the data as follows:

    root/
    ├── WAE/
    │   ├── COM.HSBFREE_25-output
    │   ├── Com.sktelecom.minit_52-output
    │   ├── Draziw.Button.Mines_71-output
    │   ├── HBVerbrauchszaehler.lite_119-output
    │   └── ...

  3. Modify WAE_DIR, ROOT_DIR, SCALE (coordinate scale), and PROB_BOX (proportion of the box-prediction samples) in utils/data_utils/make_WAE_data/make_WAE_data.py.

  4. Run the processing script (this may take ~24 hours due to the large number of screenshots):

    python utils/data_utils/make_WAE_data/make_WAE_data.py

    Processed training samples will be saved in ROOT/WAE_processed.

📱 WebUI
  1. Download the WebUI raw data from biglab/webui-all.

  2. Merge, unzip, and organize the data as follows:

    root/
    ├── WebUI/
    │   ├── dh2 (GUI metadata resulting from unzipping)
    │   └── WebUI_screenshots (a folder used to save processed GUI screenshots)

  3. Modify WEBUI_DIR, WEBUI_PROCESSED_IMG_DIR, ROOT, SCALE (coordinate scale), and PROB_BOX (proportion of the box-prediction samples) in utils/data_utils/make_webui_data/make_webui_data.py.

  4. Run the processing script (this may take ~24 hours due to the large number of screenshots):

    python utils/data_utils/make_webui_data/make_webui_data.py

    Processed training samples will be saved in ROOT/WebUI_processed.

📱 MultiUI
  1. Download the MultiUI raw data from neulab/MultiUI.

  2. Merge, unzip, and organize the data as follows:

    root/
    ├── MultiUI/
    │   ├── v0.6_5M
    │   ├── v0.7_exclude_v0.6
    │   ├── v0.8_exclude_v0.7
    │   ├── stage1_data.json
    │   ├── stage1_data_10k.json
    │   └── stage2_data_to_be_combined_with_general_data.json

  3. Modify MULTIUI_SAMPLE_FILE, IMG_DIR, SAVE_ROOT, and SCALE (coordinate scale) in utils/data_utils/make_multiui_data/make_multiui_data.py.

  4. Run the processing script (this may take ~2 hours due to the large number of screenshots):

    python utils/data_utils/make_multiui_data/make_multiui_data.py

    Processed training samples will be saved in ROOT/MultiUI_processed.

📱 SeeClick-Web
  1. Download the SeeClick-Web raw data from the SeeClick-Web Annotation File and SeeClick-Web Images.

  2. Unzip and organize the data as follows:

    root/
    ├── SeeClick-Web/
    │   ├── 0a5c8a5b7d73de574f2a21f27dbc9a53.png
    │   ├── 0a6dcd3f9e1907af232e2c038a866f74.png
    │   └── ...

  3. Modify IMG_DIR, ANNO_FILE, SAVE_ROOT, and SCALE (coordinate scale) in utils/data_utils/make_seeclickweb_data/make_seeclickweb_data.py.

  4. Run the processing script (this may take ~24 hours due to the large number of screenshots):

    python utils/data_utils/make_seeclickweb_data/make_seeclickweb_data.py

    Processed training samples will be saved in ROOT/SeeClick-Web_processed.

📱 GUIEnv
  1. Download the GUIEnv raw data from yiye2023/GUIEnv.

  2. Unzip and organize the data as follows:

    root/
    └── GUICourse/
        └── GUIEnv/
            ├── imgs
            ├── ocr_grounding_train_stage2_images.parquet
            ├── ocr_grounding_train_stage2_data.json
            ├── ocr_grounding_train_stage1_images.parquet
            ├── ocr_grounding_train_stage1_data.json
            ├── ocr_grounding_test_images.parquet
            └── ocr_grounding_test_data.json

  3. Modify SUBTASK, CURRENT_SPLIT, DATA_ROOT, SAVE_DIR, ENABLE_TEXTLOC, ENABLE_OCR, ENABLE_INTENT_GND, and SCALE (coordinate scale) in utils/data_utils/make_refexp_data/make_refexp_data.py.

  4. Run the processing script:

    python utils/data_utils/make_refexp_data/make_refexp_data.py

    Processed training samples will be saved in ROOT/RefExp_processed.

📱 RefExp
  1. Download the RICO image data from SeeClick RICO Data.

  2. Unzip and organize the data as follows:

    root/
    └── rico/
        ├── 72197.jpg
        ├── ...
        ├── 71949.json
        └── ...

  3. Modify IMG_DIR, SAVE_ROOT, and SCALE (coordinate scale) in utils/data_utils/make_refexp_data/make_refexp_data.py.

  4. Run the processing script:

    python utils/data_utils/make_refexp_data/make_refexp_data.py

    Processed training samples will be saved in ROOT/RefExp_processed.

📱 MOTIF
  1. Download the MOTIF image data from HongxinLi/MOTIF.

  2. Unzip and organize the data as follows:

    root/
    └── motif/

  3. Modify IMG_DIR, SAVE_ROOT_DIR, and SCALE (coordinate scale) in utils/data_utils/make_motif_data/make_motif_data.py.

  4. Run the processing script:

    python utils/data_utils/make_motif_data/make_motif_data.py

    Processed training samples will be saved in ROOT/MOTIF_processed.

📱 OmniAct
  1. Download the OmniAct raw data from Writer/omniact.

  2. Unzip and organize the data as follows:

    root/
    ├── OmniAct/
    │   └── data/
    │       ├── tasks/
    │       ├── metadata/
    │       └── data/

  3. Modify ROOT_DIR, SAVE_ROOT_DIR, SPLIT, and SCALE (coordinate scale) in utils/data_utils/make_omniact_data/make_omniact_data.py.

  4. Run the processing script:

    python utils/data_utils/make_omniact_data/make_omniact_data.py

    Processed training samples will be saved in ROOT/OmniAct_processed.

Agentic SFT Data Processing

🤖 Android in the Wild (AiTW)
  1. Download AiTW screenshots from here and annotations from here.
  2. Organize the data:

    root/
    ├── AITW/
    │   ├── aitw_data_test.json
    │   ├── aitw_data_train.json
    │   ├── aitw_data_val.json
    │   └── aitw_images/
    │       ├── general/
    │       ├── googleapps/
    │       ├── install/
    │       ├── single/
    │       └── webshopping/

  3. Modify ROOT, SAVE_DIR, SPLIT, and POINT_FORMAT in utils/data_utils/make_aitw_data/make_aitw_data.py.
  4. Run the script:

    python utils/data_utils/make_aitw_data/make_aitw_data.py

    Processed samples will be saved in SAVE_DIR/AITW_processed.
🦓 Android in the Zoo (AitZ)
  1. Download raw data following the instructions in the AitZ Github Repo.
  2. Organize the data:

    root/
    ├── AITZ/
    │   ├── train/
    │   │   ├── general/
    │   │   ├── googleapps/
    │   │   ├── install/
    │   │   ├── single/
    │   │   └── webshopping/
    │   └── test/
    │       ├── general/
    │       ├── googleapps/
    │       ├── install/
    │       └── webshopping/

  3. Modify ROOT, SAVE_DIR, SCALE, SPLIT, and USE_ACTION_REFEXP in utils/data_utils/make_aitz_data/make_aitz_data.py.
  4. Run the script:

    python utils/data_utils/make_aitz_data/make_aitz_data.py

    Processed samples will be saved in SAVE_DIR/AITZ_processed.
🎮 AndroidControl
  1. Download raw data following the instructions in the AndroidControl Github Repo.
  2. Organize the data:

    root/
    ├── AndroidControl/
    │   └── raw/
    │       ├── android_control-00000-of-00020
    │       ├── android_control-00001-of-00020
    │       ├── ...
    │       ├── android_control-00019-of-00020
    │       └── splits.json

  3. Modify ANDROIDCONTROL_ROOT, SAVE_DIR, SPLIT, and POINT_FORMAT in utils/data_utils/make_androidcontrol_data/make_androidcontrol_data.py.
  4. Run the script:

    python utils/data_utils/make_androidcontrol_data/make_androidcontrol_data.py

    Processed samples will be saved in SAVE_DIR/AndroidControl_processed.
🌊 GUIOdyssey
  1. Download raw data from the GUIOdyssey HF Repo.
  2. Organize the data:

    root/
    ├── GUIOdyssey_raw/
    │   ├── screenshots/
    │   │   ├── 2386365564178401_9.png
    │   │   ├── 5022534067657028_12.png
    │   │   ├── 7287738713744873_13.png
    │   │   └── ...
    │   ├── splits/
    │   └── annotations/

  3. Move all images from the data_* subfolders in screenshots directly into screenshots.
  4. Modify DATA_ROOT, SAVE_ROOT, and SPLIT in utils/data_utils/make_guiodyssey_data/make_guiodyssey_data.py.
  5. Run the script:

    python utils/data_utils/make_guiodyssey_data/make_guiodyssey_data.py

    Processed samples will be saved in SAVE_ROOT/GUIOdyssey_processed.
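Step 3 above can be scripted. A sketch, assuming the data_* subfolders sit directly under screenshots/ (the helper name is hypothetical, not part of the repo):

```python
import shutil
from pathlib import Path

def flatten_screenshots(screenshots_dir):
    """Move images out of data_* subfolders into screenshots_dir (step 3).

    Returns the number of files moved; empty subfolders are removed.
    """
    screenshots = Path(screenshots_dir)
    moved = 0
    for sub in sorted(screenshots.glob("data_*")):
        if not sub.is_dir():
            continue
        for img in list(sub.iterdir()):
            shutil.move(str(img), str(screenshots / img.name))
            moved += 1
        sub.rmdir()  # remove the now-empty subfolder
    return moved
```

Run it once as `flatten_screenshots("GUIOdyssey_raw/screenshots")` before step 4.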
💳 AMEX
  1. Download raw data from the AMEX HF Repo.
  2. Organize the data:

    root/
    ├── AMEX/
    │   ├── element_anno/
    │   ├── screenshot/
    │   └── metadata/

  3. Modify DATA_ROOT, SAVE_ROOT, and SPLIT in utils/data_utils/make_amex_data/make_amex_data.py.
  4. Run the script:

    python utils/data_utils/make_amex_data/make_amex_data.py

    Processed samples will be saved in SAVE_ROOT/AMEX_processed.
🎭 GUIAct
  1. Download raw data from the GUIAct HF Repo by running hf download yiye2023/GUIAct --repo-type dataset --local-dir path/to/GUICourse/GUIAct.
  2. Organize the data:

    root/
    ├── GUICourse/
    │   ├── GUIAct/
    │   │   ├── smartphone_test_data.json
    │   │   ├── smartphone_test_images.parquet
    │   │   ├── smartphone_train_data.json
    │   │   └── ...

  3. Modify DATA_ROOT, SAVE_DIR, CURRENT_SPLIT, and CURRENT_DEVICE_TYPE in the DatasetConfig class within utils/data_utils/make_guicourse_data/make_guicourse_data.py.
  4. Run the script:

    python utils/data_utils/make_guicourse_data/make_guicourse_data.py

    Processed samples will be saved in SAVE_DIR.

🔬 Technical Deep Dive

Advanced technical details for researchers and developers

🎮 Unified Action Space Design

📱 Mobile Action Framework

    {
      "mobile_actions": [
        "tap", "long_press", "drag", "input_text",
        "navigate_home", "navigate_back", "navigate_recent",
        "press_enter", "swipe", "wait", "status_complete"
      ]
    }

⚡ Unified Swipe Action

    {
      "action": "swipe",
      "start": [x, y],          // starting coordinates
      "direction": "up",        // movement direction
      "distance": 200           // swipe distance in pixels
    }
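The two fragments above can be combined into a small action builder. A sketch assuming the field names shown; the direction vocabulary and validation rules are illustrative, not taken from the repo:

```python
# Action names from the mobile action framework above.
MOBILE_ACTIONS = {
    "tap", "long_press", "drag", "input_text",
    "navigate_home", "navigate_back", "navigate_recent",
    "press_enter", "swipe", "wait", "status_complete",
}

def make_swipe(start, direction, distance):
    """Build a swipe action dict matching the schema above.

    The allowed directions and the positivity check are assumptions
    made for this sketch.
    """
    if direction not in {"up", "down", "left", "right"}:
        raise ValueError(f"unknown direction: {direction}")
    if distance <= 0:
        raise ValueError("distance must be positive (pixels)")
    x, y = start
    return {"action": "swipe", "start": [x, y],
            "direction": direction, "distance": distance}

print(make_swipe((500, 1200), "up", 200))
# → {'action': 'swipe', 'start': [500, 1200], 'direction': 'up', 'distance': 200}
```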

๐Ÿ“ Citation

If you use UIPro in your research, please cite our paper

@inproceedings{li2025uipro,
  title={UIPro: Unleashing Superior Interaction Capability For GUI Agents},
  author={Li, Hongxin and Su, Jingran and Chen, Jingfan and Ju, Zheng and Chen, Yuntao and Li, Qing and Zhang, Zhaoxiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

👥 Team & Acknowledgments

Special thanks to our research team and the open-source community

This work was supported in part by the National Key R&D Program of China and the National Natural Science Foundation of China. We extend our gratitude to the open-source community for providing foundational datasets and tools that made this research possible.




โญ Star this repository if you find UIPro helpful! โญ




🚀 Revolutionizing GUI automation, one interaction at a time
