Commit ec84383

committed: update citation and example usage.

1 parent a6e77c0

File tree

1 file changed: +70 −4 lines changed


README.md

Lines changed: 70 additions & 4 deletions
@@ -27,7 +27,7 @@
 <sup>*</sup> Equal Contribution&nbsp;&nbsp;&nbsp;&nbsp;<sup>†</sup> Leadership

 <h4>
-<a href="https://www.arxiv.org/pdf/2502.13130">📄 arXiv Paper</a> &nbsp;
+<a href="https://www.arxiv.org/pdf/2506.03143">📄 arXiv Paper</a> &nbsp;
 <a href="https://aka.ms/GUI-Actor/">🌐 Project Page</a> &nbsp;
 <a href="https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2-VL">🤗 Hugging Face Models</a>
 </h4>
@@ -134,6 +134,72 @@ For evaluation on ScreenSpot-Pro, you first need to download the data from [here
 python eval/screenSpot_pro.py --save_path <path_to_save_results> --data_path <path_to_data_dir>
 ```

+Example usage:
+```python
+import torch
+
+from qwen_vl_utils import process_vision_info
+from datasets import load_dataset
+from transformers import Qwen2VLProcessor
+from gui_actor.constants import chat_template
+from gui_actor.modeling import Qwen2VLForConditionalGenerationWithPointer
+from gui_actor.inference import inference
+
+# load model
+model_name_or_path = "microsoft/GUI-Actor-7B-Qwen2-VL"
+data_processor = Qwen2VLProcessor.from_pretrained(model_name_or_path)
+tokenizer = data_processor.tokenizer
+model = Qwen2VLForConditionalGenerationWithPointer.from_pretrained(
+    model_name_or_path,
+    torch_dtype=torch.bfloat16,
+    device_map="cuda:0",
+    attn_implementation="flash_attention_2"
+).eval()
+
+# prepare example
+dataset = load_dataset("rootsautomation/ScreenSpot")["test"]
+example = dataset[0]
+print(f"Instruction: {example['instruction']}")
+print(f"Ground-truth action region (x1, y1, x2, y2): {[round(i, 4) for i in example['bbox']]}")
+
+conversation = [
+    {
+        "role": "system",
+        "content": [
+            {
+                "type": "text",
+                "text": "You are a GUI agent. You are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.",
+            }
+        ]
+    },
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": example["image"],  # PIL.Image.Image or str path to an image file
+                # Alternatively, pass "image_url": "https://xxxxx.png", "file://xxxxx.png",
+                # or "data:image/png;base64,xxxxxxxx" (the latter is split on "base64,")
+            },
+            {
+                "type": "text",
+                "text": example["instruction"]
+            },
+        ],
+    },
+]
+
+# inference
+pred = inference(conversation, model, tokenizer, data_processor, use_placeholder=True, topk=3)
+px, py = pred["topk_points"][0]
+print(f"Predicted click point: [{round(px, 4)}, {round(py, 4)}]")
+
+# >> Model Response
+# Instruction: close this window
+# Ground-truth action region (x1, y1, x2, y2): [0.9479, 0.1444, 0.9938, 0.2074]
+# Predicted click point: [0.9709, 0.1548]
+```
+
 ## :+1: Acknowledgements

 This project is built upon the following projects. Thanks for their great work!
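Since the model's `topk_points` and the ScreenSpot `bbox` above are both normalized to [0, 1], a natural way to check a prediction is to test whether the point falls inside the ground-truth box, and to rescale it to pixels before issuing a real click. A minimal sketch under those assumptions; `point_in_bbox` and `to_pixels` are illustrative helpers, not part of the `gui_actor` API, and the values are taken from the example output in the diff:

```python
# Illustrative helpers (not part of gui_actor): score a normalized click
# point against a normalized ground-truth box, and rescale it to pixels.

def point_in_bbox(point, bbox):
    """point = (x, y); bbox = (x1, y1, x2, y2); all normalized to [0, 1]."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2

def to_pixels(point, width, height):
    """Rescale a normalized point to integer pixel coordinates."""
    x, y = point
    return round(x * width), round(y * height)

# Values from the example output above.
pred = (0.9709, 0.1548)
gt_bbox = (0.9479, 0.1444, 0.9938, 0.2074)
print(point_in_bbox(pred, gt_bbox))   # -> True: the click hits the target
print(to_pixels(pred, 1920, 1080))    # -> (1864, 167) on a 1920x1080 screenshot
```

This hit-or-miss test is how ScreenSpot-style grounding accuracy is usually computed: a prediction counts as correct if the click point lands anywhere inside the ground-truth region.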
@@ -150,13 +216,13 @@ We also thank the authors of the following projects for their insightful work, a
 ## :memo: Citation
 If you find this work useful in your research, please consider citing:
 ```bibtex
-@article{wu2025guiactor,
+@misc{wu2025guiactor,
   title={GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents},
   author={Qianhui Wu and Kanzhi Cheng and Rui Yang and Chaoyun Zhang and Jianwei Yang and Huiqiang Jiang and Jian Mu and Baolin Peng and Bo Qiao and Reuben Tan and Si Qin and Lars Liden and Qingwei Lin and Huan Zhang and Tong Zhang and Jianbing Zhang and Dongmei Zhang and Jianfeng Gao},
   year={2025},
-  eprint={},
+  eprint={2506.03143},
   archivePrefix={arXiv},
   primaryClass={cs.CV},
-  url={},
+  url={https://arxiv.org/abs/2506.03143},
 }
 ```
