
Commit 235eccc

Update (kz)
1 parent 0af2d22 commit 235eccc

1 file changed: +8 −1 lines changed


index.html

Lines changed: 8 additions & 1 deletion
@@ -209,12 +209,19 @@ <h3 class="title is-3" style="padding-bottom: 20px;"><span>Key Takeaways</span><
 </ul>
 
 <p>💡 <strong>Rethink how humans interact with digital interfaces: <span style="color: rgb(182, 30, 30);">humans do NOT calculate precise screen coordinates before acting—they perceive the target element and interact with it directly.</span></strong></p>
-<p>🚀 <strong>We propose <span style="color: rgb(182, 30, 30);">GUI-Actor</span>, a VLM-based method for <span style="color: rgb(182, 30, 30);">coordinate-free</span> GUI grounding that <span style="color: rgb(182, 30, 30);">more closely aligns with human behavior</span> while <span style="color: rgb(182, 30, 30);">addressing the above limitations</span>:</strong></p>
+<!-- <p>🚀 <strong>We propose <span style="color: rgb(182, 30, 30);">GUI-Actor</span>, a VLM-based method for <span style="color: rgb(182, 30, 30);">coordinate-free</span> GUI grounding that <span style="color: rgb(182, 30, 30);">more closely aligns with human behavior</span> while <span style="color: rgb(182, 30, 30);">addressing the above limitations</span>:</strong></p>
 <ul>
 <li>We introduce a dedicated <i>&lt;ACTOR&gt;</i> token as the contextual anchor to encode the grounding context by jointly processing visual input and NL instructions, and adopt an <i>attention-based action head</i> to align the <i>&lt;ACTOR&gt;</i> token with the most relevant GUI regions by attending over visual patch tokens from the screenshot. ✅ Explicit spatial-semantic alignment. ✅ The resulting attention map naturally identifies (multiple) actionable regions in a single forward pass, offering flexibility for downstream modules such as search strategies.</li>
 <li>GUI-Actor is trained with <i>multi-patch supervision</i>: all visual patches overlapping the ground-truth bounding box are labeled as positives, while the rest are labeled as negatives. ✅ Reduces supervision-signal ambiguity and over-penalization of valid action variations.</li>
 <li>GUI-Actor <i>grounds actions directly at the vision module's native spatial resolution</i>. ✅ Avoids the granularity mismatch and generalizes more robustly to unseen screen resolutions and layouts.</li>
 <li>We design a <i>grounding verifier</i> to evaluate the candidate action regions and select the most plausible one for execution. ✅ Integrates easily with other grounding methods for a further performance boost.</li>
+</ul> -->
+<p>🚀 <strong>We propose <span style="color: rgb(182, 30, 30);">GUI-Actor</span>, a VLM-based method for <span style="color: rgb(182, 30, 30);">coordinate-free</span> GUI grounding that more closely aligns with human behavior while addressing the above limitations:</strong></p>
+<ul>
+<li>We introduce a dedicated <i>&lt;ACTOR&gt;</i> token as the contextual anchor and adopt an <i>attention-based action head</i> that aligns the <i>&lt;ACTOR&gt;</i> token with the most relevant GUI regions by directly attending over visual patch tokens from the screenshot. ✅ Explicit spatial-semantic alignment.</li>
+<li>GUI-Actor is trained with <i>multi-patch supervision</i>: all visual patches overlapping the ground-truth bounding box are labeled as positives, while the rest are labeled as negatives. ✅ Reduces supervision-signal ambiguity and over-penalization of valid action variations.</li>
+<li>GUI-Actor <i>grounds actions directly at the vision module's native spatial resolution</i>. ✅ Avoids the granularity mismatch and generalizes more robustly to unseen screen resolutions and layouts.</li>
+<li>We design a <i>grounding verifier</i> to evaluate the candidate action regions and select the most plausible one for execution. ✅ Integrates easily with other grounding methods for a further performance boost.</li>
 </ul>
 <p>🎯 <strong>Results:</strong></p>
 <ul>
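The attention-based action head and multi-patch supervision described in the takeaways can be sketched in plain Python. This is a minimal illustration, not GUI-Actor's released implementation: the 28-pixel patch size, the dot-product scoring, and all function names here are assumptions for the sketch.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def actor_attention_map(actor_hidden, patch_hiddens):
    # Scaled dot-product attention of the <ACTOR> token over the visual
    # patch tokens; the softmax output is a distribution over patches.
    d = len(actor_hidden)
    scores = [sum(a * p for a, p in zip(actor_hidden, ph)) / math.sqrt(d)
              for ph in patch_hiddens]
    return softmax(scores)

def multi_patch_target(grid_w, grid_h, bbox, patch_px=28):
    # Multi-patch supervision: every patch cell that overlaps the
    # ground-truth bounding box (pixel coords x0, y0, x1, y1) is a
    # positive; normalize positives into a target distribution.
    x0, y0, x1, y1 = bbox
    target = [0.0] * (grid_h * grid_w)
    for r in range(grid_h):
        for c in range(grid_w):
            if (c * patch_px < x1 and (c + 1) * patch_px > x0 and
                    r * patch_px < y1 and (r + 1) * patch_px > y0):
                target[r * grid_w + c] = 1.0
    n = sum(target)
    return [t / n for t in target]

def grounding_loss(attn_map, target, eps=1e-9):
    # Cross-entropy between the predicted attention map and the
    # multi-patch target distribution.
    return -sum(t * math.log(a + eps) for a, t in zip(attn_map, target))
```

At inference, the highest-attention patch (or any downstream search over the map) yields an action point directly on the vision encoder's patch grid, with no coordinate string to generate.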
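Downstream, the grounding verifier acts as a re-ranker over candidate regions drawn from the attention map. A minimal sketch, assuming patch-center candidates and a caller-supplied scoring function; in the paper the verifier is itself a model judging whether a point hits the instructed element, and `top_candidates`, `select_action`, and the 28-pixel patch size are hypothetical names and values:

```python
def top_candidates(attn_map, grid_w, patch_px=28, k=3):
    # Take the k highest-attention patches and return their centers
    # in pixel coordinates as candidate action points.
    ranked = sorted(range(len(attn_map)), key=lambda i: attn_map[i],
                    reverse=True)[:k]
    return [((i % grid_w) * patch_px + patch_px // 2,
             (i // grid_w) * patch_px + patch_px // 2) for i in ranked]

def select_action(candidates, verifier_score):
    # Grounding verifier as re-ranker: score each candidate point and
    # keep the most plausible one for execution.
    return max(candidates, key=verifier_score)
```

Because the verifier only consumes candidate points and returns the best-scored one, the same re-ranking step can sit on top of any grounding method that proposes candidates.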
