print(f"ground-truth action region (x1, y1, x2, y2): {[round(i, 2) for i in example['bbox']]}")
+conversation = [
+    {
+        "role": "system",
+        "content": [
+            {
+                "type": "text",
+                "text": "You are a GUI agent. You are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.",
+            }
+        ]
+    },
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": example["image"], # PIL.Image.Image or str to path
+                # "image_url": "https://xxxxx.png" or "https://xxxxx.jpg" or "file://xxxxx.png" or "data:image/png;base64,xxxxxxxx", will be split by "base64,"
+            },
+            {
+                "type": "text",
+                "text": example["instruction"]
+            },
+        ],
+    },
+]
+
+# inference
+pred = inference(conversation, model, tokenizer, data_processor, use_placeholder=True, topk=3)
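The snippet above prints a ground-truth action region as `(x1, y1, x2, y2)` and requests `topk=3` candidates, so a natural follow-up is to check whether any candidate lands inside that region. The sketch below is illustrative only: the structure of `pred` is not shown here, so it assumes the candidates have already been extracted as `(x, y)` points in the same coordinate system as the bounding box. The helper names `point_in_bbox` and `hit_at_k` and the sample values are hypothetical and not part of the repository.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def point_in_bbox(point: Point, bbox: BBox) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def hit_at_k(points: Sequence[Point], bbox: BBox, k: int) -> bool:
    """Return True if any of the first k candidate points falls inside the box."""
    return any(point_in_bbox(p, bbox) for p in points[:k])


# Hypothetical values for illustration; they are not model output.
bbox = (0.25, 0.40, 0.35, 0.48)
candidates = [(0.60, 0.10), (0.30, 0.45), (0.90, 0.90)]

print("hit@1:", hit_at_k(candidates, bbox, k=1))
print("hit@3:", hit_at_k(candidates, bbox, k=3))
```

With these sample values only the second candidate falls inside the box, so the check fails at `k=1` and succeeds at `k=3`.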
This project is built upon the following projects. Thanks for their great work!
@@ -150,13 +216,13 @@ We also thank the authors of the following projects for their insightful work, a
 ## :memo: Citation
 If you find this work useful in your research, please consider citing:
 ```bibtex
-@article{wu2025guiactor,
+@misc{wu2025guiactor,
   title={GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents},
   author={Qianhui Wu and Kanzhi Cheng and Rui Yang and Chaoyun Zhang and Jianwei Yang and Huiqiang Jiang and Jian Mu and Baolin Peng and Bo Qiao and Reuben Tan and Si Qin and Lars Liden and Qingwei Lin and Huan Zhang and Tong Zhang and Jianbing Zhang and Dongmei Zhang and Jianfeng Gao},