Commit a95170a (1 parent: 10c8590)

Update README.md

- Fix "peform" → "perform" in Highlights section
- Fix "Intruction" → "Instruction" in Example usage section (2 occurrences)

1 file changed: README.md (3 additions, 3 deletions)
````diff
@@ -46,7 +46,7 @@ Figure 1. **Left**: Model performance vs. training data scale on the ScreenSpot
 
 💡 **Rethink how humans interact with digital interfaces**: humans do NOT calculate precise screen coordinates before acting—they perceive the target element and interact with it directly.
 
-🚀 **We propose _GUI-Actor_, a VLM enhanced by an action head, to mitigate the above limitations.** The attention-based action head not only enables GUI-Actor to peform coordinate-free GUI grounding that more closely aligns with human behavior, but also can generate multiple candidate regions in a single forward pass, offering flexibility for downstream modules such as search strategies.
+🚀 **We propose _GUI-Actor_, a VLM enhanced by an action head, to mitigate the above limitations.** The attention-based action head not only enables GUI-Actor to perform coordinate-free GUI grounding that more closely aligns with human behavior, but also can generate multiple candidate regions in a single forward pass, offering flexibility for downstream modules such as search strategies.
 
 **We design a _grounding verifier_ to evaluate and select the most plausible action region** among the candidates proposed from the action attention map. We show that this verifier can be easily integrated with other grounding methods for a further performance boost.
 
````
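The hunk above describes GUI-Actor proposing multiple candidate regions in one forward pass, with a grounding verifier selecting the most plausible one. As a minimal illustrative sketch only (the function name and scores below are hypothetical, not the repository's actual API), verifier-based selection amounts to taking the argmax over candidate scores:

```python
# Hypothetical sketch of verifier-based candidate selection; in GUI-Actor
# the score would come from a learned grounding verifier, not a lookup table.
def select_action_region(candidates, score_fn):
    """Return the candidate region (x1, y1, x2, y2) with the highest score."""
    return max(candidates, key=score_fn)

# Toy candidate regions with made-up verifier scores.
regions = [(0.10, 0.10, 0.20, 0.20), (0.94, 0.14, 0.99, 0.21)]
toy_scores = {regions[0]: 0.3, regions[1]: 0.9}
best = select_action_region(regions, toy_scores.get)
```

Because selection is decoupled from proposal, the same pattern lets the verifier plug into other grounding methods, as the README notes.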
````diff
@@ -161,7 +161,7 @@ model = Qwen2VLForConditionalGenerationWithPointer.from_pretrained(
 # prepare example
 dataset = load_dataset("rootsautomation/ScreenSpot")["test"]
 example = dataset[0]
-print(f"Intruction: {example['instruction']}")
+print(f"Instruction: {example['instruction']}")
 print(f"ground-truth action region (x1, y1, x2, y2): {[round(i, 2) for i in example['bbox']]}")
 
 conversation = [
````
````diff
@@ -196,7 +196,7 @@ px, py = pred["topk_points"][0]
 print(f"Predicted click point: [{round(px, 4)}, {round(py, 4)}]")
 
 # >> Model Response
-# Intruction: close this window
+# Instruction: close this window
 # ground-truth action region (x1, y1, x2, y2): [0.9479, 0.1444, 0.9938, 0.2074]
 # Predicted click point: [0.9709, 0.1548]
 ```
````
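In the example output above, both the predicted click point and the ground-truth action region use normalized (0 to 1) screen coordinates. A quick self-contained check that the prediction lands inside the ground-truth box (the helper name is ours, not from the repository):

```python
def point_in_bbox(px, py, bbox):
    """True if the normalized point (px, py) lies inside bbox = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    return x1 <= px <= x2 and y1 <= py <= y2

# Values taken from the README example output above.
hit = point_in_bbox(0.9709, 0.1548, (0.9479, 0.1444, 0.9938, 0.2074))
# hit is True: 0.9479 <= 0.9709 <= 0.9938 and 0.1444 <= 0.1548 <= 0.2074
```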
