Update (kz)

njucckevin · njucckevin · commit 235eccc01987 · 2025-05-31T09:16:58.000Z
diff --git a/index.html b/index.html
@@ -209,12 +209,19 @@ <h3 class="title is-3" style="padding-bottom: 20px;"><span>Key Takeaways</span><
                         </ul>
                     
                         <p>💡 <strong>Rethink how humans interact with digital interfaces: <span style="color: rgb(182, 30, 30);">humans do NOT calculate precise screen coordinates before acting—they perceive the target element and interact with it directly.</strong></p>
-                        <p>🚀 <strong>We propose <span style="color: rgb(182, 30, 30);">GUI-Actor</span>, a VLM-based method for <span style="color: rgb(182, 30, 30);">coordinate-free</span> GUI grounding that <span style="color: rgb(182, 30, 30);">more closely aligns with human behavior</span> while <span style="color: rgb(182, 30, 30);">addressing the above limitations</span>:</strong></p>
+                        <!-- <p>🚀 <strong>We propose <span style="color: rgb(182, 30, 30);">GUI-Actor</span>, a VLM-based method for <span style="color: rgb(182, 30, 30);">coordinate-free</span> GUI grounding that <span style="color: rgb(182, 30, 30);">more closely aligns with human behavior</span> while <span style="color: rgb(182, 30, 30);">addressing the above limitations</span>:</strong></p>
                         <ul>
                             <li>We introduce a dedicated <i>&lt;ACTOR&gt;</i> token as the contextual anchor to encode the grounding context by jointly processing visual input and NL instructions, and adopt an <i>attention-based action head</i> to align the <i>&lt;ACTOR&gt;</i> token with most relevant GUI regions by attending over visual patch tokens from the screenshot. ✅ Explicit spatial-semantic alignment. ✅ The resulting attention map naturally identifies (multiple) actionable regions in a single forward pass, offering flexibility for downstream modules such as search strategies.</li>
                             <li>GUI-Actor is trained using <i>multi-patch supervision</i>. All visual patches overlapping with ground-truth bounding boxes are labeled as positives, while others are labeled as negatives. ✅ Reduce supervision signal ambiguity and over-penalization of valid action variations.</li>
                             <li>GUI-Actor <i>grounds actions directly at the vision module's native spatial resolution</i>. ✅ Avoid the granularity mismatch and generalize more robustly to unseen screen resolutions and layouts.</li>
                             <li>We design a <i>grounding verifier</i> to evaluate and select the most plausible action region from the candidates proposed for action execution. ✅ Can be easily integrated with other grounding methods for further performance boost.</li>
+                        </ul> -->
+                        <p>🚀 <strong>We propose <span style="color: rgb(182, 30, 30);">GUI-Actor</span>, a VLM-based method for <span style="color: rgb(182, 30, 30);">coordinate-free</span> GUI grounding that more closely aligns with human behavior while addressing the above limitations:</strong></p>
+                        <ul>
+                            <li>We introduce a dedicated <i>&lt;ACTOR&gt;</i> token as the contextual anchor and adopt an <i>attention-based action head</i> to align the <i>&lt;ACTOR&gt;</i> token with the most relevant GUI regions by directly attending over visual patch tokens from the screenshot. ✅ Explicit spatial-semantic alignment.</li>
+                            <li>GUI-Actor is trained using <i>multi-patch supervision</i>. All visual patches overlapping with ground-truth bounding boxes are labeled as positives, while others are labeled as negatives. ✅ Reduces supervision signal ambiguity and over-penalization of valid action variations.</li>
+                            <li>GUI-Actor <i>grounds actions directly at the vision module's native spatial resolution</i>. ✅ Avoids granularity mismatch and generalizes more robustly to unseen screen resolutions and layouts.</li>
+                            <li>We design a <i>grounding verifier</i> to evaluate and select the most plausible action region from the candidates proposed for action execution. ✅ Can be easily integrated with other grounding methods for a further performance boost.</li>
                         </ul>
                         <p>🎯 <strong>Results:</strong></p>
                         <ul>