Feature Request: 3D Object Rotation / Mouse Drag Support
Summary
Nova Act currently lacks support for continuous mouse drag interactions, which are essential for rotating 3D objects, drag-and-drop functionality, and other complex pointer-based interactions on modern web applications.
Problem Statement
Many modern web applications include:
- 3D product viewers (e-commerce)
- CAD/design tools
- Map interactions
- Image editors
- Game interfaces
- Slider controls
These require continuous mouse drag operations that involve:
- mousedown at starting position
- mousemove across a path (with coordinates)
- mouseup at ending position
Currently, Nova Act cannot perform these chained pointer actions, limiting its usefulness for automation on 3D-enabled websites.
Proposed Solution
New Action Type: drag or mouse_drag
    from nova_act import NovaAct

    with NovaAct(starting_page="https://example.com/3d-viewer") as nova:
        # Option 1 (Preferred): Natural language instruction
        # This aligns with Nova Act's core strength of understanding natural language
        nova.act("Rotate the 3D Amazon Echo Dot to view the back side of the device")

        # Option 2: Explicit drag action
        nova.drag(
            start=(500, 400),
            end=(300, 400),
            duration=1.0,  # seconds
            steps=20,      # interpolation points
        )

        # Option 3: Path-based drag for complex movements
        nova.drag_path(
            points=[(500, 400), (400, 350), (300, 400)],
            duration=1.5,
        )

Why Option 1 is Preferred
- Maintains consistency with Nova Act's natural language-first approach
- No need for users to calculate pixel coordinates
- AI can intelligently determine drag direction and distance based on intent
- More accessible for non-technical users
- Adapts to different screen sizes and element positions automatically
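The `steps` and `duration` parameters in Options 2 and 3 above imply some interpolation along the drag path. As a rough sketch of one way that could work, the hypothetical helper `resample_path` below spaces points evenly by arc length along a multi-point path. It is not part of Nova Act or Playwright, and the eventual implementation may interpolate differently.

```python
import math


def resample_path(points, steps):
    """Return steps + 1 points spaced evenly by distance along a polyline.

    Hypothetical helper for illustration; not a Nova Act or Playwright API.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    if steps < 1:
        raise ValueError("steps must be >= 1")

    # Cumulative distance travelled at each waypoint.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]

    result = []
    segment = 0
    for i in range(steps + 1):
        target = total * i / steps
        # Advance to the segment that contains the target distance.
        while segment < len(points) - 2 and cumulative[segment + 1] < target:
            segment += 1
        seg_len = cumulative[segment + 1] - cumulative[segment]
        t = 0.0 if seg_len == 0 else (target - cumulative[segment]) / seg_len
        (x0, y0), (x1, y1) = points[segment], points[segment + 1]
        result.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return result


# The Option 3 path from above, resampled into 20 moves.
moves = resample_path([(500, 400), (400, 350), (300, 400)], steps=20)
```

Spacing by distance rather than by waypoint keeps the pointer speed roughly constant even when the waypoints are unevenly spread.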
Example: Rotating Amazon Echo Dot
    from nova_act import NovaAct

    # Real-world example: Inspecting Amazon Echo Dot from all angles
    with NovaAct(starting_page="https://www.amazon.com/dp/B09B8V1LZ3") as nova:
        # View the front of the device
        nova.act("Rotate the 3D Amazon Echo Dot to show the front with the LED ring")

        # View the back to see the ports
        nova.act("Rotate the 3D Amazon Echo Dot to view the power port on the back")

        # View from the top
        nova.act("Rotate the 3D Amazon Echo Dot to view it from the top")

        # Spin it around completely
        nova.act("Slowly rotate the 3D Amazon Echo Dot 360 degrees")

Technical Implementation Suggestion
Leverage existing Playwright low-level mouse API:
    import asyncio

    async def perform_drag(page, start_x, start_y, end_x, end_y, steps=10):
        # Move to start position
        await page.mouse.move(start_x, start_y)
        # Press mouse button
        await page.mouse.down()
        # Interpolate movement in equal increments toward the end position
        for i in range(1, steps + 1):
            x = start_x + (end_x - start_x) * i / steps
            y = start_y + (end_y - start_y) * i / steps
            await page.mouse.move(x, y)
            # Short pause so the page can process each mousemove
            await asyncio.sleep(0.05)
        # Release mouse button
        await page.mouse.up()

Use Cases
| Industry | Application | Interaction Needed |
|---|---|---|
| E-commerce | 3D product viewers | Rotate product |
| Real Estate | Virtual tours | Pan/rotate view |
| Automotive | Car configurators | Rotate vehicle |
| Gaming | Web games | Drag controls |
| Design | Figma, Canva | Move objects |
| Maps | Google Maps | Pan navigation |
Expected Behavior
Input (Natural Language)
"Rotate the 3D Amazon Echo device 90 degrees to the right"
Nova Act Should
- Identify the 3D viewer element on Amazon product page
- Calculate center point of viewer
- Execute horizontal drag from center-left to center-right
- Verify rotation occurred (optional)
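The center-point and horizontal-drag steps above could be sketched roughly as follows. This assumes the viewer's bounding box is available as a dict with `x`, `y`, `width`, and `height` keys (the shape Playwright's `locator.bounding_box()` returns); `horizontal_drag` and its `fraction` parameter are hypothetical names, not Nova Act APIs. How many pixels correspond to a given rotation angle depends on the viewer and is not modeled here.

```python
def horizontal_drag(box, fraction=0.5, to_right=True):
    """Return (start, end) points for a drag along the box's vertical center.

    box: dict with "x", "y", "width", "height" in page pixels.
    fraction: share of the box width the drag should span (0 < fraction <= 1).
    Hypothetical helper for illustration only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    center_x = box["x"] + box["width"] / 2
    center_y = box["y"] + box["height"] / 2
    half_span = box["width"] * fraction / 2

    left = (center_x - half_span, center_y)
    right = (center_x + half_span, center_y)
    # Dragging left-to-right or right-to-left is the caller's choice.
    return (left, right) if to_right else (right, left)


# Example with a made-up 400x300 viewer located at (100, 200).
start, end = horizontal_drag({"x": 100, "y": 200, "width": 400, "height": 300})
```

Deriving the points from the element's box, rather than hard-coding pixel coordinates, is what lets the same instruction work across screen sizes.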
Priority
High — 3D web experiences are increasingly common in e-commerce and enterprise applications. This gap significantly limits Nova Act's automation capabilities.
Additional Context
- Similar tools like Selenium and Puppeteer support low-level mouse actions
- Playwright already has mouse.down(), mouse.move(), mouse.up() APIs
- This is a matter of exposing/integrating these into Nova Act's action space
- Amazon's own product pages feature 3D viewers that Nova Act currently cannot interact with
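To illustrate how thin the integration layer might be, here is a minimal sketch of a drag routine written against any object exposing `move(x, y)`, `down()`, and `up()`, which is the interface of Playwright's synchronous `page.mouse`. The names `drag_along` and `RecordingMouse` are hypothetical; the recording stand-in only exists so the sequence can be checked without a browser.

```python
import time


def drag_along(mouse, points, duration=1.0, sleep=time.sleep):
    """Press at the first point, move through the rest, then release.

    mouse: any object with move(x, y), down() and up(), for example
    Playwright's sync page.mouse. Hypothetical helper, not a Nova Act API.
    """
    if len(points) < 2:
        raise ValueError("need a start point and at least one more point")

    delay = duration / (len(points) - 1)
    mouse.move(*points[0])
    mouse.down()
    try:
        for x, y in points[1:]:
            mouse.move(x, y)
            sleep(delay)
    finally:
        # Always release so a failed move does not leave the button held.
        mouse.up()


class RecordingMouse:
    """Stand-in for page.mouse that records calls for a dry run."""

    def __init__(self):
        self.calls = []

    def move(self, x, y):
        self.calls.append(("move", x, y))

    def down(self):
        self.calls.append(("down",))

    def up(self):
        self.calls.append(("up",))


# Dry run: no browser needed, and sleeping is disabled.
recorder = RecordingMouse()
drag_along(recorder, [(500, 400), (400, 400), (300, 400)], sleep=lambda s: None)
```

With a real Playwright `Page`, the same function could be called as `drag_along(page.mouse, points)`, assuming the page object is reachable from the Nova Act session.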