🚀 UIPro: Unleashing Superior Interaction Capability For GUI Agents

🎯 ICCV 2025 • Next-Generation AI GUI Automation

Paper

📢 News

  • [2025-06-28] 🎉 UIPro has been accepted to ICCV 2025!
  • [2025-11-23] Uploaded UIPro models.
  • [2025-11-23] Uploaded data processing scripts and systematic denoising procedures for AITW, AITZ, MobileViews, WAE, WebUI, MultiUI, AndroidControl, GUIOdyssey, AMEX, and GUIAct.
  • [2025-11-27] Uploaded data processing scripts and systematic denoising procedures for SeeClick-Web and RefExp.
  • [2025-12-10] Uploaded data processing scripts and systematic denoising procedures for MOTIF, RefExp, and GUIEnv.
  • [TODO] Upload the full datasets.

🌟 Revolutionary GUI Agent Technology

UIPro represents a major advance in GUI automation, achieving strong interaction capabilities across multiple platforms through large-scale multi-task training.

🎨 What Makes UIPro Special

🏗️ Architecture & Training Pipeline

A revolutionary two-stage approach to GUI agent development

UIPro Methodology Diagram

🔄 Two-Stage Training Process

The training pipeline proceeds in two stages: the first builds GUI understanding (grounding, functionality recognition, and intent mapping), and the second fine-tunes agentic task execution.


🎯 Core Capabilities

🧠 GUI Understanding Capabilities

| Capability | Description | Performance |
|---|---|---|
| 🎯 Element Grounding | Accurately locates UI elements based on descriptions | ⭐⭐⭐⭐⭐ |
| 🔍 Functionality Recognition | Understands purpose and function of interface components | ⭐⭐⭐⭐⭐ |
| 🧭 Intent Mapping | Connects user intentions to appropriate UI interactions | ⭐⭐⭐⭐⭐ |

🤖 GUI Agent Task Execution

| Capability | Description | Performance |
|---|---|---|
| 📋 Task Planning | Breaks down complex requests into actionable steps | ⭐⭐⭐⭐⭐ |
| ⚡ Action Execution | Performs clicks, typing, and scrolling with high precision | ⭐⭐⭐⭐⭐ |
| 🌐 Cross-Platform Navigation | Seamless operation across different device types | ⭐⭐⭐⭐⭐ |

📊 Performance Benchmarks

Industry-leading results across major GUI benchmarks

🏆 GUI Agent Task Evaluation

| 🎯 Benchmark | 🤖 UIPro-SLiME (3B) | 🚀 UIPro-Qwen2VL (7B) | 📊 Metric |
|---|---|---|---|
| AITW | 68.0% | 70.4% | Step SR |
| AndroidControl | 61.1% | 85.5% | Step SR |
| GUIAct-Web | 68.2% | 69.1% | Step SR |
| Mind2Web | 28.7% | 48.4% | Step SR |

Step Success Rate (Step SR) - higher is better
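Step SR is the step-level success rate: the fraction of steps whose predicted action matches the reference action. A minimal sketch of the aggregate metric is below; the exact matching rules here are assumptions for illustration (real benchmarks typically add tolerances such as click-distance thresholds):

```python
def step_success_rate(predictions, references):
    """Fraction of steps whose predicted action matches the reference.

    `predictions` and `references` are aligned lists of (action_type, target)
    pairs. Exact-match comparison is used here purely for illustration.
    """
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

preds = [("tap", (120, 300)), ("input_text", "hello"), ("swipe", "up")]
refs  = [("tap", (120, 300)), ("input_text", "hi"),    ("swipe", "up")]
print(step_success_rate(preds, refs))  # → 0.6666666666666666
```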


🚀 Quick Start Guide

Get up and running with UIPro in minutes

📦 Installation

🔧 Setup Instructions

1️⃣ Clone Repository

    git clone https://github.com/ZJULiHongxin/UIPro.git
    cd UIPro

2️⃣ Install Dependencies

    pip install -r requirements.txt

📚 Dataset: The Foundation of Excellence

The world's largest and most comprehensive GUI understanding collection

📊 Dataset Statistics

| Metric | Value | Description |
|---|---|---|
| 📝 Task Samples | 20.6M | GUI understanding tasks |
| 🖼️ Screenshots | 2.5M | Unique GUI screenshots |
| 🎯 Elements | 3.3M | Clean GUI elements |
| 🔢 Task Types | 13 | Different task categories |

๐Ÿ—๏ธ Data Compilation Pipeline

We provide comprehensive scripts to process various GUI datasets. Please follow the instructions below for each dataset.

Note: We also implemented a systematic denoising procedure to ensure data quality, removing up to 29% of noise from some data sources.
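As an illustration of the kind of checks such a denoising pass can apply — these specific rules are assumptions for the sketch, not the exact procedure used here:

```python
def is_clean_element(elem, img_w, img_h):
    """Heuristic sanity checks of the kind a denoising pass might apply.

    `elem` is a dict with a pixel-space bbox [x1, y1, x2, y2] and a text
    label. The thresholds below are illustrative only.
    """
    x1, y1, x2, y2 = elem["bbox"]
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        return False  # degenerate or out-of-bounds box
    if (x2 - x1) * (y2 - y1) > 0.9 * img_w * img_h:
        return False  # near-fullscreen boxes are rarely real elements
    if not elem.get("text", "").strip():
        return False  # drop unlabeled elements
    return True

elements = [
    {"bbox": [10, 10, 100, 40], "text": "Login"},
    {"bbox": [0, 0, 1080, 1920], "text": "root"},   # covers the whole screen
    {"bbox": [50, 80, 30, 120], "text": "Cancel"},  # inverted x-coordinates
]
clean = [e for e in elements if is_clean_element(e, 1080, 1920)]
print(len(clean))  # → 1
```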

GUI Understanding SFT Data Processing

📱 MobileViews
  1. Download the MobileViews raw data from HuggingFace via:

    hf download mllmTeam/MobileViews --repo-type dataset --local-dir ./MobileViews

  2. Unzip and organize the data as follows:

    root/
    ├── MobileViews/
    │   ├── MobileViews_0-150000/
    │   ├── MobileViews_0-150000.csv
    │   ├── MobileViews_150001-291197/
    │   ├── MobileViews_150001-291197.csv
    │   └── ...

  3. Modify MOBILEVIEWS_DIR, ROOT, SCALE (coordinate scale), and PROB_BOX (proportion of the box-prediction samples) in utils/data_utils/make_mobileviews_data/extract_and_generate_mobilebiews_data.py.

  4. Run the processing script (this may take ~48 hours due to the large number of screenshots):

    python utils/data_utils/make_mobileviews_data/extract_and_generate_mobilebiews_data.py

    Processed training samples will be saved in ROOT/mobileviews_processed.

  5. Finally, run utils/data_utils/make_mobileviews_data/run_generate_symlinks.sh to create a unified folder for screenshots.
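Across these processing scripts, SCALE maps pixel coordinates into a fixed integer range, and PROB_BOX controls what fraction of samples ask the model to predict a box rather than a point. A hedged sketch of what such a conversion might look like — the helper names and the center-point fallback are assumptions, not the repo's code:

```python
import random

SCALE = 1000      # assumed target integer coordinate range, e.g. [0, 1000]
PROB_BOX = 0.3    # assumed fraction of samples phrased as box prediction

def to_scaled(x_px, y_px, img_w, img_h, scale=SCALE):
    """Map a pixel coordinate into the model's integer coordinate space."""
    return round(x_px / img_w * scale), round(y_px / img_h * scale)

def pick_target(bbox, img_w, img_h, rng=random):
    """Emit either a scaled box or its scaled center point."""
    x1, y1, x2, y2 = bbox
    if rng.random() < PROB_BOX:
        return to_scaled(x1, y1, img_w, img_h) + to_scaled(x2, y2, img_w, img_h)
    return to_scaled((x1 + x2) / 2, (y1 + y2) / 2, img_w, img_h)

print(to_scaled(540, 960, 1080, 1920))  # → (500, 500)
```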

📱 WAE
  1. Download the WAE raw data from WAE DropBox.

  2. Merge, unzip, and organize the data as follows:

    root/
    ├── WAE/
    │   ├── COM.HSBFREE_25-output
    │   ├── Com.sktelecom.minit_52-output
    │   ├── Draziw.Button.Mines_71-output
    │   ├── HBVerbrauchszaehler.lite_119-output
    │   └── ...

  3. Modify WAE_DIR, ROOT_DIR, SCALE (coordinate scale), and PROB_BOX (proportion of the box-prediction samples) in utils/data_utils/make_WAE_data/make_WAE_data.py.

  4. Run the processing script (this may take ~24 hours due to the large number of screenshots):

    python utils/data_utils/make_WAE_data/make_WAE_data.py

    Processed training samples will be saved in ROOT/WAE_processed.

📱 WebUI
  1. Download the WebUI raw data from biglab/webui-all.

  2. Merge, unzip, and organize the data as follows:

    root/
    ├── WebUI/
    │   ├── dh2 (GUI metadata resulting from unzipping)
    │   └── WebUI_screenshots (a folder used to save processed GUI screenshots)

  3. Modify WEBUI_DIR, WEBUI_PROCESSED_IMG_DIR, ROOT, SCALE (coordinate scale), and PROB_BOX (proportion of the box-prediction samples) in utils/data_utils/make_webui_data/make_webui_data.py.

  4. Run the processing script (this may take ~24 hours due to the large number of screenshots):

    python utils/data_utils/make_webui_data/make_webui_data.py

    Processed training samples will be saved in ROOT/WebUI_processed.

📱 MultiUI
  1. Download the MultiUI raw data from neulab/MultiUI.

  2. Merge, unzip, and organize the data as follows:

    root/
    ├── MultiUI/
    │   ├── v0.6_5M
    │   ├── v0.7_exclude_v0.6
    │   ├── v0.8_exclude_v0.7
    │   ├── stage1_data.json
    │   ├── stage1_data_10k.json
    │   └── stage2_data_to_be_combined_with_general_data.json

  3. Modify MULTIUI_SAMPLE_FILE, IMG_DIR, SAVE_ROOT, and SCALE (coordinate scale) in utils/data_utils/make_multiui_data/make_multiui_data.py.

  4. Run the processing script (this may take ~2 hours due to the large number of screenshots):

    python utils/data_utils/make_multiui_data/make_multiui_data.py

    Processed training samples will be saved in ROOT/MultiUI_processed.

📱 SeeClick-Web
  1. Download the SeeClick-Web raw data from the SeeClick-Web Annotation File and SeeClick-Web Images.

  2. Unzip and organize the data as follows:

    root/
    ├── SeeClick-Web/
    │   ├── 0a5c8a5b7d73de574f2a21f27dbc9a53.png
    │   ├── 0a6dcd3f9e1907af232e2c038a866f74.png
    │   └── ...

  3. Modify IMG_DIR, ANNO_FILE, SAVE_ROOT, and SCALE (coordinate scale) in utils/data_utils/make_seeclickweb_data/make_seeclickweb_data.py.

  4. Run the processing script (this may take ~24 hours due to the large number of screenshots):

    python utils/data_utils/make_seeclickweb_data/make_seeclickweb_data.py

    Processed training samples will be saved in ROOT/SeeClick-Web_processed.

📱 GUIEnv
  1. Download the GUIEnv raw data from yiye2023/GUIEnv.

  2. Unzip and organize the data as follows:

    root/
    └── GUICourse/
        └── GUIEnv/
            ├── imgs
            ├── ocr_grounding_train_stage2_images.parquet
            ├── ocr_grounding_train_stage2_data.json
            ├── ocr_grounding_train_stage1_images.parquet
            ├── ocr_grounding_train_stage1_data.json
            ├── ocr_grounding_test_images.parquet
            └── ocr_grounding_test_data.json

  3. Modify SUBTASK, CURRENT_SPLIT, DATA_ROOT, SAVE_DIR, ENABLE_TEXTLOC, ENABLE_OCR, ENABLE_INTENT_GND, and SCALE (coordinate scale) in utils/data_utils/make_refexp_data/make_refexp_data.py.

  4. Run the processing script:

    python utils/data_utils/make_refexp_data/make_refexp_data.py

    Processed training samples will be saved in ROOT/RefExp_processed.

📱 RefExp
  1. Download the RICO image data from SeeClick RICO Data.

  2. Unzip and organize the data as follows:

    root/
    └── rico/
        ├── 72197.jpg
        ├── ...
        ├── 71949.json
        └── ...

  3. Modify IMG_DIR, SAVE_ROOT, and SCALE (coordinate scale) in utils/data_utils/make_refexp_data/make_refexp_data.py.

  4. Run the processing script:

    python utils/data_utils/make_refexp_data/make_refexp_data.py

    Processed training samples will be saved in ROOT/RefExp_processed.

📱 MOTIF
  1. Download the MOTIF image data from HongxinLi/MOTIF.

  2. Unzip and organize the data as follows:

    root/
    └── motif/

  3. Modify IMG_DIR, SAVE_ROOT_DIR, and SCALE (coordinate scale) in utils/data_utils/make_motif_data/make_motif_data.py.

  4. Run the processing script:

    python utils/data_utils/make_motif_data/make_motif_data.py

    Processed training samples will be saved in ROOT/MOTIF_processed.

📱 OmniAct
  1. Download the OmniAct raw data from Writer/omniact.

  2. Unzip and organize the data as follows:

    root/
    ├── OmniAct/
    │   └── data/
    │       ├── tasks/
    │       ├── metadata/
    │       └── data/

  3. Modify ROOT_DIR, SAVE_ROOT_DIR, SPLIT, and SCALE (coordinate scale) in utils/data_utils/make_omniact_data/make_omniact_data.py.

  4. Run the processing script:

    python utils/data_utils/make_omniact_data/make_omniact_data.py

    Processed training samples will be saved in ROOT/OmniAct_processed.

Agentic SFT Data Processing

🤖 Android in the Wild (AiTW)
  1. Download AiTW screenshots from here and annotations from here.
  2. Organize the data:

    root/
    ├── AITW/
    │   ├── aitw_data_test.json
    │   ├── aitw_data_train.json
    │   ├── aitw_data_val.json
    │   └── aitw_images/
    │       ├── general/
    │       ├── googleapps/
    │       ├── install/
    │       ├── single/
    │       └── webshopping/

  3. Modify ROOT, SAVE_DIR, SPLIT, and POINT_FORMAT in utils/data_utils/make_aitw_data/make_aitw_data.py.
  4. Run the script:

    python utils/data_utils/make_aitw_data/make_aitw_data.py

    Processed samples will be saved in SAVE_DIR/AITW_processed.
🦓 Android in the Zoo (AitZ)
  1. Download raw data following the instructions in the AitZ Github Repo.
  2. Organize the data:

    root/
    ├── AITZ/
    │   ├── train/
    │   │   ├── general/
    │   │   ├── googleapps/
    │   │   ├── install/
    │   │   ├── single/
    │   │   └── webshopping/
    │   └── test/
    │       ├── general/
    │       ├── googleapps/
    │       ├── install/
    │       └── webshopping/

  3. Modify ROOT, SAVE_DIR, SCALE, SPLIT, and USE_ACTION_REFEXP in utils/data_utils/make_aitz_data/make_aitz_data.py.
  4. Run the script:

    python utils/data_utils/make_aitz_data/make_aitz_data.py

    Processed samples will be saved in SAVE_DIR/AITZ_processed.
🎮 AndroidControl
  1. Download raw data following the instructions in the AndroidControl Github Repo.
  2. Organize the data:

    root/
    ├── AndroidControl/
    │   └── raw/
    │       ├── android_control-00000-of-00020
    │       ├── android_control-00001-of-00020
    │       ├── ...
    │       ├── android_control-00019-of-00020
    │       └── splits.json

  3. Modify ANDROIDCONTROL_ROOT, SAVE_DIR, SPLIT, and POINT_FORMAT in utils/data_utils/make_androidcontrol_data/make_androidcontrol_data.py.
  4. Run the script:

    python utils/data_utils/make_androidcontrol_data/make_androidcontrol_data.py

    Processed samples will be saved in SAVE_DIR/AndroidControl_processed.
🌊 GUIOdyssey
  1. Download raw data from the GUIOdyssey HF Repo.
  2. Organize the data:

    root/
    ├── GUIOdyssey_raw/
    │   ├── screenshots/
    │   │   ├── 2386365564178401_9.png
    │   │   ├── 5022534067657028_12.png
    │   │   ├── 7287738713744873_13.png
    │   │   └── ...
    │   ├── splits/
    │   └── annotations/

  3. Move all images from the data_* subfolders in screenshots directly into screenshots.
  4. Modify DATA_ROOT, SAVE_ROOT, and SPLIT in utils/data_utils/make_guiodyssey_data/make_guiodyssey_data.py.
  5. Run the script:

    python utils/data_utils/make_guiodyssey_data/make_guiodyssey_data.py

    Processed samples will be saved in SAVE_ROOT/GUIOdyssey_processed.
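Step 3 above can be scripted. A sketch, assuming the data_* subfolders sit directly under screenshots/ (the helper name is hypothetical, not part of the repo):

```python
import shutil
from pathlib import Path

def flatten_screenshots(screenshots_dir):
    """Move images out of data_* subfolders into screenshots_dir (step 3).

    Returns the number of files moved; empty subfolders are removed.
    """
    screenshots = Path(screenshots_dir)
    moved = 0
    for sub in sorted(screenshots.glob("data_*")):
        if not sub.is_dir():
            continue
        for img in list(sub.iterdir()):
            shutil.move(str(img), str(screenshots / img.name))
            moved += 1
        sub.rmdir()  # remove the now-empty subfolder
    return moved
```

Run it once as `flatten_screenshots("GUIOdyssey_raw/screenshots")` before step 4.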
💳 AMEX
  1. Download raw data from the AMEX HF Repo.
  2. Organize the data:

    root/
    ├── AMEX/
    │   ├── element_anno/
    │   ├── screenshot/
    │   └── metadata/

  3. Modify DATA_ROOT, SAVE_ROOT, and SPLIT in utils/data_utils/make_amex_data/make_amex_data.py.
  4. Run the script:

    python utils/data_utils/make_amex_data/make_amex_data.py

    Processed samples will be saved in SAVE_ROOT/AMEX_processed.
🎭 GUIAct
  1. Download raw data from the GUIAct HF Repo by running hf download yiye2023/GUIAct --repo-type dataset --local-dir path/to/GUICourse/GUIAct.
  2. Organize the data:

    root/
    ├── GUICourse/
    │   ├── GUIAct/
    │   │   ├── smartphone_test_data.json
    │   │   ├── smartphone_test_images.parquet
    │   │   ├── smartphone_train_data.json
    │   │   └── ...

  3. Modify DATA_ROOT, SAVE_DIR, CURRENT_SPLIT, and CURRENT_DEVICE_TYPE in the DatasetConfig class within utils/data_utils/make_guicourse_data/make_guicourse_data.py.
  4. Run the script:

    python utils/data_utils/make_guicourse_data/make_guicourse_data.py

    Processed samples will be saved in SAVE_DIR.

🔬 Technical Deep Dive

Advanced technical details for researchers and developers

🎮 Unified Action Space Design

📱 Mobile Action Framework

    {
      "mobile_actions": [
        "tap", "long_press", "drag", "input_text",
        "navigate_home", "navigate_back", "navigate_recent",
        "press_enter", "swipe", "wait", "status_complete"
      ]
    }

⚡ Unified Swipe Action

    {
      "action": "swipe",
      "start": [x, y],          // starting coordinates
      "direction": "up",        // movement direction
      "distance": 200           // swipe distance in pixels
    }
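The two fragments above can be combined into a small action builder. A sketch assuming the field names shown; the direction vocabulary and validation rules are illustrative, not taken from the repo:

```python
# Action names from the mobile action framework above.
MOBILE_ACTIONS = {
    "tap", "long_press", "drag", "input_text",
    "navigate_home", "navigate_back", "navigate_recent",
    "press_enter", "swipe", "wait", "status_complete",
}

def make_swipe(start, direction, distance):
    """Build a swipe action dict matching the schema above.

    The allowed directions and the positivity check are assumptions
    made for this sketch.
    """
    if direction not in {"up", "down", "left", "right"}:
        raise ValueError(f"unknown direction: {direction}")
    if distance <= 0:
        raise ValueError("distance must be positive (pixels)")
    x, y = start
    return {"action": "swipe", "start": [x, y],
            "direction": direction, "distance": distance}

print(make_swipe((500, 1200), "up", 200))
# → {'action': 'swipe', 'start': [500, 1200], 'direction': 'up', 'distance': 200}
```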

๐Ÿ“ Citation

If you use UIPro in your research, please cite our paper

@inproceedings{li2025uipro,
  title={UIPro: Unleashing Superior Interaction Capability For GUI Agents},
  author={Li, Hongxin and Su, Jingran and Chen, Jingfan and Ju, Zheng and Chen, Yuntao and Li, Qing and Zhang, Zhaoxiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

👥 Team & Acknowledgments

Special thanks to our research team and the open-source community

This work was supported in part by the National Key R&D Program of China and the National Natural Science Foundation of China. We extend our gratitude to the open-source community for providing foundational datasets and tools that made this research possible.




โญ Star this repository if you find UIPro helpful! โญ




🚀 Revolutionizing GUI automation, one interaction at a time
