- [2025-06-28] 🎉 UIPro has been accepted to ICCV 2025!
- [2025-11-23] Uploaded UIPro models.
- [2025-11-23] Uploaded data processing scripts and systematic denoising procedures for AITW, AITZ, MobileViews, WAE, WebUI, MultiUI, AndroidControl, GUIOdyssey, AMEX, and GUIAct.
- [2025-11-27] Uploaded data processing scripts and systematic denoising procedures for SeeClick-Web and RefExp.
- [2025-12-10] Uploaded data processing scripts and systematic denoising procedures for MOTIF, RefExp, and GUIEnv.
- [TODO] Upload the complete datasets.
UIPro is a GUI automation agent that delivers strong interaction capabilities across multiple platforms. Its training pipeline is designed to strengthen both GUI understanding and task execution.
| Capability | Description | Performance |
|---|---|---|
| 🎯 Element Grounding | Accurately locates UI elements based on descriptions | ⭐⭐⭐⭐⭐ |
| 🔍 Functionality Recognition | Understands the purpose and function of interface components | ⭐⭐⭐⭐⭐ |
| 🧭 Intent Mapping | Connects user intentions to appropriate UI interactions | ⭐⭐⭐⭐⭐ |
| Capability | Description | Performance |
|---|---|---|
| 📋 Task Planning | Breaks down complex requests into actionable steps | ⭐⭐⭐⭐⭐ |
| ⚡ Action Execution | Performs clicks, typing, and scrolling with high precision | ⭐⭐⭐⭐⭐ |
| 🔄 Cross-Platform Navigation | Operates seamlessly across different device types | ⭐⭐⭐⭐⭐ |
Strong results across major GUI agent benchmarks
| 🎯 Benchmark | 🤖 UIPro-SLiME (3B) | 🚀 UIPro-Qwen2VL (7B) | 📊 Metric |
|---|---|---|---|
| AITW | 68.0% | 70.4% | Step SR |
| AndroidControl | 61.1% | 85.5% | Step SR |
| GUIAct-Web | 68.2% | 69.1% | Step SR |
| Mind2Web | 28.7% | 48.4% | Step SR |
Step Success Rate (Step SR) - Higher is better
## 🔧 Setup Instructions

```shell
git clone https://github.com/ZJULiHongxin/UIPro.git
cd UIPro
pip install -r requirements.txt
```

The world's largest and most comprehensive GUI understanding collection
| Metric | Value | Description |
|---|---|---|
| 📊 Task Samples | 20.6M | GUI understanding tasks |
| 🖼️ Screenshots | 2.5M | Unique GUI screenshots |
| 🎯 Elements | 3.3M | Clean GUI elements |
| 🔢 Task Types | 13 | Different task categories |
We provide comprehensive scripts to process various GUI datasets. Please follow the instructions below for each dataset.
Note: We also apply a systematic denoising procedure to ensure data quality, removing up to 29% of the samples in some data sources as noise.
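The full denoising logic lives in the per-dataset scripts. Purely as an illustration of the kind of element-level checks such a procedure involves (the `is_valid_element` helper, field names, and rules below are hypothetical, not the actual pipeline):

```python
def is_valid_element(elem, img_w, img_h):
    """Hypothetical sketch of element-level denoising checks."""
    x1, y1, x2, y2 = elem["bbox"]
    # Drop degenerate or out-of-screen boxes.
    if x1 >= x2 or y1 >= y2:
        return False
    if x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
        return False
    # Drop elements with no usable text or description.
    if not (elem.get("text") or "").strip():
        return False
    return True

elements = [
    {"bbox": [10, 10, 200, 60], "text": "Sign in"},
    {"bbox": [300, 50, 120, 90], "text": "bad box"},  # x1 > x2: degenerate
    {"bbox": [0, 0, 50, 50], "text": "   "},          # empty text
]
clean = [e for e in elements if is_valid_element(e, 1080, 1920)]
```

The real procedure is dataset-specific; this only conveys the flavor of the filtering.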
### 📱 MobileViews

- Download the MobileViews raw data from HuggingFace via:

  ```shell
  hf download mllmTeam/MobileViews --repo-type dataset --local-dir ./MobileViews
  ```

- Unzip and organize the data as follows:

  ```
  root/
  ├── MobileViews/
  │   ├── MobileViews_0-150000/
  │   ├── MobileViews_0-150000.csv
  │   ├── MobileViews_150001-291197/
  │   ├── MobileViews_150001-291197.csv
  │   └── ...
  ```

- Modify `MOBILEVIEWS_DIR`, `ROOT`, `SCALE` (coordinate scale), and `PROB_BOX` (proportion of box-prediction samples) in `utils/data_utils/make_mobileviews_data/extract_and_generate_mobilebiews_data.py`.

- Run the processing script (this may take ~48 hours due to the large number of screenshots):

  ```shell
  python utils/data_utils/make_mobileviews_data/extract_and_generate_mobilebiews_data.py
  ```

  Processed training samples will be saved in `ROOT/mobileviews_processed`.

- Finally, run `utils/data_utils/make_mobileviews_data/run_generate_symlinks.sh` to create a unified folder for screenshots.
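`run_generate_symlinks.sh` builds the unified screenshot folder; a rough Python equivalent of what such a script does (paths are placeholders, and name collisions are simply skipped here):

```python
import os

def link_screenshots(src_root, dst_dir):
    """Symlink every PNG found under src_root into one flat folder."""
    os.makedirs(dst_dir, exist_ok=True)
    for dirpath, _, filenames in os.walk(src_root):
        for name in filenames:
            if not name.endswith(".png"):
                continue
            link = os.path.join(dst_dir, name)
            # Skip names that already exist to avoid clobbering links.
            if not os.path.lexists(link):
                os.symlink(os.path.join(dirpath, name), link)
```

The real shell script may handle collisions differently; consult it for the authoritative behavior.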
### 📱 WAE

- Download the WAE raw data from WAE DropBox.

- Merge, unzip, and organize the data as follows:

  ```
  root/
  ├── WAE/
  │   ├── COM.HSBFREE_25-output
  │   ├── Com.sktelecom.minit_52-output
  │   ├── Draziw.Button.Mines_71-output
  │   ├── HBVerbrauchszaehler.lite_119-output
  │   └── ...
  ```

- Modify `WAE_DIR`, `ROOT_DIR`, `SCALE` (coordinate scale), and `PROB_BOX` (proportion of box-prediction samples) in `utils/data_utils/make_WAE_data/make_WAE_data.py`.

- Run the processing script (this may take ~24 hours due to the large number of screenshots):

  ```shell
  python utils/data_utils/make_WAE_data/make_WAE_data.py
  ```

  Processed training samples will be saved in `ROOT/WAE_processed`.
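`SCALE` and `PROB_BOX` recur across these scripts: pixel coordinates are mapped into a fixed integer range, and a fraction of samples are framed as box prediction rather than point prediction. A simplified, hypothetical illustration of that idea (not the scripts' actual code):

```python
import random

SCALE = 1000      # coordinates are mapped into [0, SCALE)
PROB_BOX = 0.3    # fraction of samples framed as box prediction

def normalize(v, size, scale=SCALE):
    """Map a pixel coordinate into the [0, scale) integer range."""
    return min(scale - 1, int(v / size * scale))

def make_target(bbox, img_w, img_h, rng=random):
    """Emit either a scaled box or its scaled center point."""
    x1, y1, x2, y2 = bbox
    if rng.random() < PROB_BOX:
        # Box-prediction sample: keep the full rectangle.
        return [normalize(x1, img_w), normalize(y1, img_h),
                normalize(x2, img_w), normalize(y2, img_h)]
    # Point-prediction sample: use the box center.
    return [normalize((x1 + x2) / 2, img_w),
            normalize((y1 + y2) / 2, img_h)]
```

Check the individual scripts for the exact coordinate conventions each dataset uses.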
### 📱 WebUI

- Download the WebUI raw data from biglab/webui-all.

- Merge, unzip, and organize the data as follows:

  ```
  root/
  ├── WebUI/
  │   ├── dh2 (GUI metadata resulting from unzipping)
  │   └── WebUI_screenshots (a folder used to save processed GUI screenshots)
  ```

- Modify `WEBUI_DIR`, `WEBUI_PROCESSED_IMG_DIR`, `ROOT`, `SCALE` (coordinate scale), and `PROB_BOX` (proportion of box-prediction samples) in `utils/data_utils/make_webui_data/make_webui_data.py`.

- Run the processing script (this may take ~24 hours due to the large number of screenshots):

  ```shell
  python utils/data_utils/make_webui_data/make_webui_data.py
  ```

  Processed training samples will be saved in `ROOT/WebUI_processed`.
### 📱 MultiUI

- Download the MultiUI raw data from neulab/MultiUI.

- Merge, unzip, and organize the data as follows:

  ```
  root/
  ├── MultiUI/
  │   ├── v0.6_5M
  │   ├── v0.7_exclude_v0.6
  │   ├── v0.8_exclude_v0.7
  │   ├── stage1_data.json
  │   ├── stage1_data_10k.json
  │   └── stage2_data_to_be_combined_with_general_data.json
  ```

- Modify `MULTIUI_SAMPLE_FILE`, `IMG_DIR`, `SAVE_ROOT`, and `SCALE` (coordinate scale) in `utils/data_utils/make_multiui_data/make_multiui_data.py`.

- Run the processing script (this may take ~2 hours due to the large number of screenshots):

  ```shell
  python utils/data_utils/make_multiui_data/make_multiui_data.py
  ```

  Processed training samples will be saved in `ROOT/MultiUI_processed`.
### 📱 SeeClick-Web

- Download the SeeClick-Web raw data from SeeClick-Web Annotation File and SeeClick-Web Images.

- Unzip and organize the data as follows:

  ```
  root/
  ├── SeeClick-Web/
  │   ├── 0a5c8a5b7d73de574f2a21f27dbc9a53.png
  │   ├── 0a6dcd3f9e1907af232e2c038a866f74.png
  │   └── ...
  ```

- Modify `IMG_DIR`, `ANNO_FILE`, `SAVE_ROOT`, and `SCALE` (coordinate scale) in `utils/data_utils/make_seeclickweb_data/make_seeclickweb_data.py`.

- Run the processing script (this may take ~24 hours due to the large number of screenshots):

  ```shell
  python utils/data_utils/make_seeclickweb_data/make_seeclickweb_data.py
  ```

  Processed training samples will be saved in `ROOT/SeeClick-Web_processed`.
### 📱 GUIEnv

- Download the GUIEnv raw data from yiye2023/GUIEnv.

- Unzip and organize the data as follows:

  ```
  root/
  ├── GUICourse/
  │   └── GUIEnv/
  │       ├── imgs
  │       ├── ocr_grounding_train_stage2_images.parquet
  │       ├── ocr_grounding_train_stage2_data.json
  │       ├── ocr_grounding_train_stage1_images.parquet
  │       ├── ocr_grounding_train_stage1_data.json
  │       ├── ocr_grounding_test_images.parquet
  │       └── ocr_grounding_test_data.json
  ```

- Modify `SUBTASK`, `CURRENT_SPLIT`, `DATA_ROOT`, `SAVE_DIR`, `ENABLE_TEXTLOC`, `ENABLE_OCR`, `ENABLE_INTENT_GND`, and `SCALE` (coordinate scale) in `utils/data_utils/make_refexp_data/make_refexp_data.py`.

- Run the processing script:

  ```shell
  python utils/data_utils/make_refexp_data/make_refexp_data.py
  ```

  Processed training samples will be saved in `ROOT/RefExp_processed`.
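The `*_images.parquet` files pack screenshots as byte blobs alongside the JSON annotations. A minimal sketch of unpacking such a frame to image files (the `image_id`/`image` column names are assumptions about the schema, not documented fields):

```python
import os
import pandas as pd

def dump_images(df: pd.DataFrame, out_dir: str) -> int:
    """Write each image byte blob in the frame to a file; return the count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for _, row in df.iterrows():
        with open(os.path.join(out_dir, f"{row['image_id']}.png"), "wb") as f:
            f.write(row["image"])
        count += 1
    return count

# Usage under the assumed schema:
# dump_images(pd.read_parquet("ocr_grounding_test_images.parquet"), "imgs")
```

The processing script handles this internally; this sketch is only for poking at the raw files.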
### 📱 RefExp

- Download the RICO image data from SeeClick RICO Data.

- Unzip and organize the data as follows:

  ```
  root/
  ├── rico/
  │   ├── 72197.jpg
  │   ├── ...
  │   ├── 71949.json
  │   └── ...
  ```

- Modify `IMG_DIR`, `SAVE_ROOT`, and `SCALE` (coordinate scale) in `utils/data_utils/make_refexp_data/make_refexp_data.py`.

- Run the processing script:

  ```shell
  python utils/data_utils/make_refexp_data/make_refexp_data.py
  ```

  Processed training samples will be saved in `ROOT/RefExp_processed`.
### 📱 MOTIF

- Download the MOTIF image data from HongxinLi/MOTIF.

- Unzip and organize the data as follows:

  ```
  root/
  ├── motif/
  ```

- Modify `IMG_DIR`, `SAVE_ROOT_DIR`, and `SCALE` (coordinate scale) in `utils/data_utils/make_motif_data/make_motif_data.py`.

- Run the processing script:

  ```shell
  python utils/data_utils/make_motif_data/make_motif_data.py
  ```

  Processed training samples will be saved in `ROOT/MOTIF_processed`.
### 📱 OmniAct

- Download the OmniAct raw data from Writer/omniact.

- Unzip and organize the data as follows:

  ```
  root/
  ├── OmniAct/
  │   └── data/
  │       ├── tasks/
  │       ├── metadata/
  │       └── data/
  ```

- Modify `ROOT_DIR`, `SAVE_ROOT_DIR`, `SPLIT`, and `SCALE` (coordinate scale) in `utils/data_utils/make_omniact_data/make_omniact_data.py`.

- Run the processing script:

  ```shell
  python utils/data_utils/make_omniact_data/make_omniact_data.py
  ```

  Processed training samples will be saved in `ROOT/OmniAct_processed`.
### 🤖 Android in the Wild (AiTW)

- Download AiTW screenshots from here and annotations from here.

- Organize the data:

  ```
  root/
  ├── AITW/
  │   ├── aitw_data_test.json
  │   ├── aitw_data_train.json
  │   ├── aitw_data_val.json
  │   └── aitw_images/
  │       ├── general/
  │       ├── googleapps/
  │       ├── install/
  │       ├── single/
  │       └── webshopping/
  ```

- Modify `ROOT`, `SAVE_DIR`, `SPLIT`, and `POINT_FORMAT` in `utils/data_utils/make_aitw_data/make_aitw_data.py`.

- Run the script:

  ```shell
  python utils/data_utils/make_aitw_data/make_aitw_data.py
  ```

  Processed samples will be saved in `SAVE_DIR/AITW_processed`.
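AiTW annotates tap positions as normalized (y, x) pairs in [0, 1]. Converting one such pair to the integer coordinate scale used elsewhere in this pipeline could look like the sketch below (a hypothetical helper, not the script's actual code, and independent of whatever `POINT_FORMAT` selects):

```python
def aitw_point_to_scaled_xy(touch_yx, scale=1000):
    """Convert a normalized AiTW (y, x) pair to integer (x, y) in [0, scale)."""
    y, x = touch_yx
    clamp = lambda v: min(scale - 1, max(0, int(v * scale)))
    # Note the axis swap: AiTW stores (y, x), the output is (x, y).
    return [clamp(x), clamp(y)]
```

For example, `aitw_point_to_scaled_xy([0.25, 0.5])` yields `[500, 250]`.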
### 🦓 Android in the Zoo (AitZ)

- Download raw data following instructions in the AitZ Github Repo.

- Organize the data:

  ```
  root/
  ├── AITZ/
  │   ├── train/
  │   │   ├── general/
  │   │   ├── googleapps/
  │   │   ├── install/
  │   │   ├── single/
  │   │   └── webshopping/
  │   └── test/
  │       ├── general/
  │       ├── googleapps/
  │       ├── install/
  │       └── webshopping/
  ```

- Modify `ROOT`, `SAVE_DIR`, `SCALE`, `SPLIT`, and `USE_ACTION_REFEXP` in `utils/data_utils/make_aitz_data/make_aitz_data.py`.

- Run the script:

  ```shell
  python utils/data_utils/make_aitz_data/make_aitz_data.py
  ```

  Processed samples will be saved in `SAVE_DIR/AITZ_processed`.
### 🎮 AndroidControl

- Download raw data following instructions in the AndroidControl Github Repo.

- Organize the data:

  ```
  root/
  ├── AndroidControl/
  │   └── raw/
  │       ├── android_control-00000-of-00020
  │       ├── android_control-00001-of-00020
  │       ├── ...
  │       ├── android_control-00019-of-00020
  │       └── splits.json
  ```

- Modify `ANDROIDCONTROL_ROOT`, `SAVE_DIR`, `SPLIT`, and `POINT_FORMAT` in `utils/data_utils/make_androidcontrol_data/make_androidcontrol_data.py`.

- Run the script:

  ```shell
  python utils/data_utils/make_androidcontrol_data/make_androidcontrol_data.py
  ```

  Processed samples will be saved in `SAVE_DIR/AndroidControl_processed`.
### 🚀 GUIOdyssey

- Download raw data from the GUIOdyssey HF Repo.

- Organize the data:

  ```
  root/
  ├── GUIOdyssey_raw/
  │   ├── screenshots/
  │   │   ├── 2386365564178401_9.png
  │   │   ├── 5022534067657028_12.png
  │   │   ├── 7287738713744873_13.png
  │   │   └── ...
  │   ├── splits/
  │   └── annotations/
  ```

- Move all images from the `data_*` subfolders in `screenshots` directly into `screenshots`.

- Modify `DATA_ROOT`, `SAVE_ROOT`, and `SPLIT` in `utils/data_utils/make_guiodyssey_data/make_guiodyssey_data.py`.

- Run the script:

  ```shell
  python utils/data_utils/make_guiodyssey_data/make_guiodyssey_data.py
  ```

  Processed samples will be saved in `SAVE_ROOT/GUIOdyssey_processed`.
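The step that moves images out of the `data_*` subfolders can be scripted; a small sketch, assuming the subfolders sit directly under `screenshots/`:

```python
import os
import shutil

def flatten_screenshots(screens_dir):
    """Move files from data_* subfolders of screens_dir up into screens_dir."""
    for entry in os.listdir(screens_dir):
        sub = os.path.join(screens_dir, entry)
        if not (os.path.isdir(sub) and entry.startswith("data_")):
            continue
        for name in os.listdir(sub):
            shutil.move(os.path.join(sub, name), os.path.join(screens_dir, name))
        os.rmdir(sub)  # remove the now-empty subfolder
```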
### 💳 AMEX

- Download raw data from the AMEX HF Repo.

- Organize the data:

  ```
  root/
  ├── AMEX/
  │   ├── element_anno/
  │   ├── screenshot/
  │   └── metadata/
  ```

- Modify `DATA_ROOT`, `SAVE_ROOT`, and `SPLIT` in `utils/data_utils/make_amex_data/make_amex_data.py`.

- Run the script:

  ```shell
  python utils/data_utils/make_amex_data/make_amex_data.py
  ```

  Processed samples will be saved in `SAVE_ROOT/AMEX_processed`.
### 🎭 GUIAct

- Download raw data from the GUIAct HF Repo by running:

  ```shell
  hf download yiye2023/GUIAct --repo-type dataset --local-dir path/to/GUICourse/GUIAct
  ```

- Organize the data:

  ```
  root/
  ├── GUICourse/
  │   └── GUIAct/
  │       ├── smartphone_test_data.json
  │       ├── smartphone_test_images.parquet
  │       ├── smartphone_train_data.json
  │       └── ...
  ```

- Modify `DATA_ROOT`, `SAVE_DIR`, `CURRENT_SPLIT`, and `CURRENT_DEVICE_TYPE` in the `DatasetConfig` class within `utils/data_utils/make_guicourse_data/make_guicourse_data.py`.

- Run the script:

  ```shell
  python utils/data_utils/make_guicourse_data/make_guicourse_data.py
  ```

  Processed samples will be saved in `SAVE_DIR`.
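Once the individual datasets are processed, their sample files can be concatenated into a single training set. The layout assumed below (one or more JSON lists per `*_processed` folder) is a guess about the scripts' outputs, so treat this as a sketch rather than the project's actual merging code:

```python
import glob
import json
import os

def merge_processed(root, out_file):
    """Concatenate JSON sample lists from every *_processed folder under root."""
    merged = []
    for path in sorted(glob.glob(os.path.join(root, "*_processed", "*.json"))):
        with open(path) as f:
            merged.extend(json.load(f))
    with open(out_file, "w") as f:
        json.dump(merged, f)
    return len(merged)
```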
### 📱 Mobile Action Framework

```json
{
  "mobile_actions": [
    "tap", "long_press", "drag", "input_text",
    "navigate_home", "navigate_back", "navigate_recent",
    "press_enter", "swipe", "wait", "status_complete"
  ]
}
```

### ⚡ Unified Swipe Action

```jsonc
{
  "action": "swipe",
  "start": [x, y],      // starting coordinates
  "direction": "up",    // movement direction
  "distance": 200       // swipe distance in pixels
}
```

### 📖 Citation

```bibtex
@inproceedings{li2025uipro,
  title={UIPro: Unleashing Superior Interaction Capability For GUI Agents},
  author={Li, Hongxin and Su, Jingran and Chen, Jingfan and Ju, Zheng and Chen, Yuntao and Li, Qing and Zhang, Zhaoxiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```

Special thanks to our research team and the open-source community.
This work was supported in part by the National Key R&D Program of China and the National Natural Science Foundation of China. We extend our gratitude to the open-source community for providing foundational datasets and tools that made this research possible.
🚀 Revolutionizing GUI automation, one interaction at a time

