FEATURE REQUEST - Add support for mouse drag operations (mousedown → mousemove → mouseup) for 3D object manipulation #86

@yezimichelle

Feature Request: 3D Object Rotation / Mouse Drag Support

Summary

Nova Act currently lacks support for continuous mouse drag interactions, which are essential for rotating 3D objects, drag-and-drop functionality, and other complex pointer-based interactions on modern web applications.

Problem Statement

Many modern web applications include:

  • 3D product viewers (e-commerce)
  • CAD/design tools
  • Map interactions
  • Image editors
  • Game interfaces
  • Slider controls

These require continuous mouse drag operations that involve:

  1. mousedown at starting position
  2. mousemove across a path (with coordinates)
  3. mouseup at ending position

Currently, Nova Act cannot perform these chained pointer actions, limiting its usefulness for automation on 3D-enabled websites.

Proposed Solution

New Action Type: drag or mouse_drag

from nova_act import NovaAct

with NovaAct(starting_page="https://example.com/3d-viewer") as nova:
    
    # Option 1 (Preferred): Natural language instruction
    # This aligns with Nova Act's core strength of understanding natural language
    nova.act("Rotate the 3D Amazon Echo Dot to view the back side of the device")
    
    # Option 2: Explicit drag action
    nova.drag(
        start=(500, 400),
        end=(300, 400),
        duration=1.0,  # seconds
        steps=20       # interpolation points
    )
    
    # Option 3: Path-based drag for complex movements
    nova.drag_path(
        points=[(500, 400), (400, 350), (300, 400)],
        duration=1.5
    )
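One possible reading of the `duration` and `steps` parameters in Options 2 and 3 is that the path is split into evenly spaced waypoints and the duration is divided evenly among them. The sketch below illustrates that reading only; `plan_drag` is a hypothetical helper, not an existing Nova Act function, and spacing waypoints evenly per segment (rather than by distance travelled) is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def plan_drag(
    points: Sequence[Point], duration: float, steps: int
) -> Tuple[List[Point], float]:
    """Return (waypoints, delay_per_step) for a drag through `points`.

    Hypothetical helper: waypoints are spread evenly across the segments
    by index, and `duration` (seconds) is split evenly between them.
    """
    if len(points) < 2:
        raise ValueError("need at least a start and an end point")
    if steps < 1:
        raise ValueError("steps must be >= 1")

    segments = len(points) - 1
    waypoints: List[Point] = []
    for i in range(1, steps + 1):
        # Position along the whole path, measured in segments (0..segments).
        t = segments * i / steps
        # Clamp so the final waypoint (t == segments) uses the last segment.
        seg = min(int(t), segments - 1)
        local = t - seg  # 0..1 within the chosen segment
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        waypoints.append((x0 + (x1 - x0) * local, y0 + (y1 - y0) * local))
    return waypoints, duration / steps
```

With two points this reduces to the straight-line case of Option 2; with three or more it covers the path-based case of Option 3.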

Why Option 1 is Preferred

  • Maintains consistency with Nova Act's natural language-first approach
  • No need for users to calculate pixel coordinates
  • AI can intelligently determine drag direction and distance based on intent
  • More accessible for non-technical users
  • Adapts to different screen sizes and element positions automatically

Example: Rotating Amazon Echo Dot

from nova_act import NovaAct

# Real-world example: Inspecting Amazon Echo Dot from all angles
with NovaAct(starting_page="https://www.amazon.com/dp/B09B8V1LZ3") as nova:
    
    # View the front of the device
    nova.act("Rotate the 3D Amazon Echo Dot to show the front with the LED ring")
    
    # View the back to see the ports
    nova.act("Rotate the 3D Amazon Echo Dot to view the power port on the back")
    
    # View from the top
    nova.act("Rotate the 3D Amazon Echo Dot to view it from the top")
    
    # Spin it around completely
    nova.act("Slowly rotate the 3D Amazon Echo Dot 360 degrees")

Technical Implementation Suggestion

Leverage existing Playwright low-level mouse API:

import asyncio

async def perform_drag(page, start_x, start_y, end_x, end_y, steps=10):
    """Drag from (start_x, start_y) to (end_x, end_y) on a Playwright async Page."""

    # Move to start position
    await page.mouse.move(start_x, start_y)

    # Press mouse button
    await page.mouse.down()

    # Interpolate movement linearly, pausing briefly between moves
    for i in range(1, steps + 1):
        x = start_x + (end_x - start_x) * i / steps
        y = start_y + (end_y - start_y) * i / steps
        await page.mouse.move(x, y)
        await asyncio.sleep(0.05)

    # Release mouse button
    await page.mouse.up()
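Playwright's `mouse.move` also accepts a `steps` argument that makes it send the intermediate `mousemove` events itself, so the manual loop may be unnecessary when no per-step delay is required. A minimal sketch using the sync API follows; `drag_with_builtin_steps` is a hypothetical name, and `page` is assumed to be a Playwright `Page` (or any object exposing the same `mouse` interface).

```python
def drag_with_builtin_steps(page, start, end, steps=10):
    """Drag from `start` to `end`, letting Playwright interpolate the moves.

    `start` and `end` are (x, y) tuples in page coordinates.
    """
    page.mouse.move(*start)
    page.mouse.down()
    try:
        # `steps` asks Playwright to emit that many intermediate mousemove events.
        page.mouse.move(*end, steps=steps)
    finally:
        # Always release the button, even if the move raises.
        page.mouse.up()
```

The `try`/`finally` is a design choice for this sketch: it avoids leaving the button held down if the move fails partway through.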

Use Cases

| Industry | Application | Interaction Needed |
| --- | --- | --- |
| E-commerce | 3D product viewers | Rotate product |
| Real Estate | Virtual tours | Pan/rotate view |
| Automotive | Car configurators | Rotate vehicle |
| Gaming | Web games | Drag controls |
| Design | Figma, Canva | Move objects |
| Maps | Google Maps | Pan navigation |

Expected Behavior

Input (Natural Language)

"Rotate the 3D Amazon Echo device 90 degrees to the right"

Nova Act Should

  1. Identify the 3D viewer element on the Amazon product page
  2. Calculate the center point of the viewer
  3. Execute a horizontal drag from center-left to center-right
  4. Verify rotation occurred (optional)
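Steps 2 and 3 above could be sketched as follows, assuming the viewer's bounding box is available as a dict with `x`, `y`, `width`, and `height` keys (the shape Playwright's `locator.bounding_box()` returns). `horizontal_drag_points` and its `travel` parameter are hypothetical; how many pixels correspond to 90 degrees depends on the particular viewer and is not modeled here.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def horizontal_drag_points(box: Dict[str, float], travel: float = 0.5) -> Tuple[Point, Point]:
    """Return (start, end) for a left-to-right drag through the box center.

    `travel` is the fraction of the box width the drag should cover,
    centered horizontally; it is an illustrative parameter.
    """
    if not 0 < travel <= 1:
        raise ValueError("travel must be in (0, 1]")

    center_y = box["y"] + box["height"] / 2
    left_x = box["x"] + box["width"] * (0.5 - travel / 2)
    right_x = box["x"] + box["width"] * (0.5 + travel / 2)
    return (left_x, center_y), (right_x, center_y)
```

Deriving the points from the element's box rather than fixed coordinates keeps the drag inside the viewer regardless of screen size or layout.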

Priority

High — 3D web experiences are increasingly common in e-commerce and enterprise applications. This gap significantly limits Nova Act's automation capabilities.

Additional Context

  • Similar tools like Selenium and Puppeteer support low-level mouse actions
  • Playwright already has mouse.down(), mouse.move(), mouse.up() APIs
  • This is a matter of exposing/integrating these into Nova Act's action space
  • Amazon's own product pages feature 3D viewers that Nova Act currently cannot interact with
