Releases: askui/vision-agent

v0.22.12

07 Jan 10:13
5c5d846

What's Changed

🚀 New Features

  • Background Scheduler for Executing Run
  • Create SBOM

🐛 Bug Fixes

  • Fixed unauthorized errors in long-running instances

Full Changelog: v0.22.11...v0.22.12

v0.22.11

18 Dec 09:27

What's Changed

🚀 New Features

  • io_publisher for communicating events via stdio
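The release note does not say what the messages sent over stdio look like. Purely as an illustration, the sketch below assumes events are written as newline-delimited JSON (one object per line) that a parent process can read from the pipe. `publish_event`, `read_events`, and the event fields are hypothetical names, not the io_publisher API.

```python
import io
import json
import sys
from typing import Any, Dict, Iterator, TextIO


def publish_event(event_type: str, payload: Dict[str, Any], stream: TextIO = sys.stdout) -> None:
    """Write one event as a single JSON line (assumed format) and flush it."""
    stream.write(json.dumps({"type": event_type, "payload": payload}) + "\n")
    # Flush so a process reading the other end of the pipe sees the event immediately.
    stream.flush()


def read_events(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Parse events back from a stream, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    buffer = io.StringIO()
    publish_event("run.started", {"run_id": "run_1"}, stream=buffer)
    publish_event("run.completed", {"run_id": "run_1"}, stream=buffer)
    buffer.seek(0)
    for event in read_events(buffer):
        print(event["type"])
```

One JSON object per line keeps parsing trivial on the reading side, because each line can be decoded independently as soon as it arrives.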

🐛 Bug Fixes

  • Added encoding='utf-8' to all file operations
  • Handle messages without a tool_result block correctly
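Without an explicit encoding, Python's `open()` falls back to the locale's preferred encoding (for example cp1252 on many Windows systems), so text containing non-ASCII characters may fail to round-trip. The minimal standard-library sketch below shows the pattern the UTF-8 fix applies. The file name and helper are arbitrary.

```python
import tempfile
from pathlib import Path


def write_and_read(text: str) -> str:
    """Round-trip text through a file using an explicit UTF-8 encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.txt"
        # Explicit encoding: the result no longer depends on the OS locale.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()


print(write_and_read("Grüße 👋"))
```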

Full Changelog: v0.22.10...v0.22.11

v0.22.10

15 Dec 09:39
cd1dae3

What's Changed

🐛 Bug Fixes

Pip Install

  • Support for fastmcp 2.14.* installed via pip

Full Changelog: v0.22.9...v0.22.10

v0.22.9

11 Dec 09:14
49e41c5

Release Notes: v0.22.9

🚀 New Features

Android Agent

  • Added device parameter to AndroidVisionAgent constructor for device selection by serial number or index
  • Added act_tools parameter to AndroidVisionAgent (matching VisionAgent functionality)
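According to the release note, the new device parameter accepts either a serial number or an index. The helper below is a hypothetical sketch of how such a value could be resolved against a list of connected device serials (for example, the serials printed by `adb devices`). It is not the askui implementation, and the serials are made up.

```python
from __future__ import annotations


def resolve_device(serials: list[str], device: str | int | None = None) -> str:
    """Pick a device serial by serial number (str) or list position (int).

    Hypothetical helper: with device=None, the first connected device is used.
    """
    if not serials:
        raise RuntimeError("No Android devices connected")
    if device is None:
        return serials[0]
    if isinstance(device, int):
        if not 0 <= device < len(serials):
            raise IndexError(f"Device index {device} out of range (0..{len(serials) - 1})")
        return serials[device]
    if device not in serials:
        raise ValueError(f"Device with serial {device!r} not found")
    return device


connected = ["emulator-5554", "R58M123ABC"]
print(resolve_device(connected, 1))                # R58M123ABC
print(resolve_device(connected, "emulator-5554"))  # emulator-5554
```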

Reporting

  • Added theme toggle (light/dark) to HTML reports
  • Updated HTML report styling with CSS variables and improved color scheme
  • Added reporting messages for key_combination(), shell(), drag_and_drop(), and swipe() methods
  • Moved reporting from AndroidAgentOsFacade to PpadbAgentOs to eliminate duplicate reporting

🐛 Bug Fixes

Android Agent

  • Added UnknownAndroidDisplay class for handling cases where display information cannot be determined
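A fallback class such as UnknownAndroidDisplay follows the null-object pattern: callers always receive a display object and never have to check for `None`. The sketch below is hypothetical. Only the UnknownAndroidDisplay name comes from the release note. The fields, the `parse_display` helper, and the choice to parse `adb shell wm size` output (for example, `Physical size: 1080x2400`) are assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AndroidDisplay:
    """A display whose size is known (hypothetical fields)."""
    display_id: int
    width: int
    height: int


@dataclass(frozen=True)
class UnknownAndroidDisplay(AndroidDisplay):
    """Fallback used when display information cannot be determined."""
    display_id: int = 0
    width: int = 0
    height: int = 0


def parse_display(wm_size_output: str, display_id: int = 0) -> AndroidDisplay:
    """Parse text like 'Physical size: 1080x2400'; fall back to an unknown display."""
    match = re.search(r"(\d+)x(\d+)", wm_size_output)
    if match is None:
        return UnknownAndroidDisplay(display_id=display_id)
    return AndroidDisplay(display_id, int(match.group(1)), int(match.group(2)))


print(parse_display("Physical size: 1080x2400"))
print(parse_display("error: no devices/emulators found"))
```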

Full Changelog: v0.22.8...v0.22.9

v0.22.8

09 Dec 20:26
c1a4fb3

What's Changed

Full Changelog: v0.22.7...v0.22.8

v0.22.7

04 Dec 09:45
41583f7

What's Changed

🚀 Features

  • OpenTelemetry Tracing Docs
  • Introduce branching for conversations, with support for rerunning and editing user messages.
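Conversation branching is commonly modeled as a tree of messages in which each message points to its parent. Editing a user message adds a sibling instead of overwriting history, and the active conversation is the path from the root to the selected leaf. The sketch below illustrates that idea with hypothetical names (`Message`, `ConversationTree`, `edit`, `path_to`). The release note does not describe askui's actual data model.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Message:
    id: int
    role: str
    content: str
    parent_id: int | None


@dataclass
class ConversationTree:
    messages: dict[int, Message] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def add(self, role: str, content: str, parent_id: int | None) -> Message:
        """Append a message under parent_id (None for a root message)."""
        msg = Message(next(self._ids), role, content, parent_id)
        self.messages[msg.id] = msg
        return msg

    def edit(self, message_id: int, new_content: str) -> Message:
        """Create a sibling branch with edited content; the original is kept."""
        original = self.messages[message_id]
        return self.add(original.role, new_content, original.parent_id)

    def path_to(self, leaf_id: int) -> list[Message]:
        """Return the active branch from the root down to leaf_id."""
        path: list[Message] = []
        current: int | None = leaf_id
        while current is not None:
            msg = self.messages[current]
            path.append(msg)
            current = msg.parent_id
        return list(reversed(path))

    def children(self, parent_id: int | None) -> list[Message]:
        return [m for m in self.messages.values() if m.parent_id == parent_id]


tree = ConversationTree()
q = tree.add("user", "Open the settings", None)
a = tree.add("assistant", "Opened settings", q.id)
q2 = tree.edit(q.id, "Open the Wi-Fi settings")
print([m.content for m in tree.path_to(q2.id)])  # ['Open the Wi-Fi settings']
print(len(tree.children(None)))  # 2
```

Under this model, rerunning a user message adds a new assistant reply under the same parent, for example `tree.add("assistant", ..., q.id)`, so earlier replies remain reachable.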

🐛 Bug Fixes

  • Use computer-use-2025-11-24 beta flag for claude-opus-4-5-20251101
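In practice, this fix means selecting the computer-use beta flag based on the model. The minimal lookup sketch below takes only the pairing of claude-opus-4-5-20251101 with computer-use-2025-11-24 from the release note. The function name and the caller-supplied default are hypothetical.

```python
from __future__ import annotations

COMPUTER_USE_BETA_FLAGS: dict[str, str] = {
    # Pairing taken from the v0.22.7 release note.
    "claude-opus-4-5-20251101": "computer-use-2025-11-24",
}


def computer_use_beta_flag(model: str, default: str) -> str:
    """Return the model-specific beta flag, or the caller-supplied default."""
    return COMPUTER_USE_BETA_FLAGS.get(model, default)


print(computer_use_beta_flag("claude-opus-4-5-20251101", default="<default-flag>"))
```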

Full Changelog: v0.22.6...v0.22.7

v0.22.6

27 Nov 13:02
7bf0189

What's Changed

  • feat: enhance Agent Android tap tool to allow N taps in a row with optional delay by @mlikasam-askui in #199

🚀 Features

  • Enhanced Agent Tap Actions
    • The Android Vision Agent can now perform multiple consecutive taps on Android devices with a configurable delay between taps, enabling more complex instructions.
      • Supports commands like:
        agent.act("increase the temperature in the climate control by 20 degrees with 0 milliseconds delay between taps")
      • Handles multi-tap instructions precisely, which is useful for automated, agent-driven tasks
      • Delay between taps can be specified in milliseconds
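Conceptually, an N-tap action repeats a single tap and sleeps between taps. The minimal sketch below assumes a hypothetical `tap(x, y)` callback. On a real device, this callback could wrap something like `adb shell input tap x y`. The function name and parameters are illustrative and do not reflect the agent's tool signature.

```python
import time
from typing import Callable


def tap_n_times(
    tap: Callable[[int, int], None],
    x: int,
    y: int,
    count: int = 1,
    delay_ms: int = 0,
) -> None:
    """Tap (x, y) `count` times, sleeping `delay_ms` milliseconds between taps."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    for i in range(count):
        tap(x, y)
        # Sleep only between taps, not after the last one.
        if i < count - 1 and delay_ms > 0:
            time.sleep(delay_ms / 1000)


taps = []
tap_n_times(lambda x, y: taps.append((x, y)), 540, 1200, count=20, delay_ms=0)
print(len(taps))  # 20
```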

Full Changelog: v0.22.5...v0.22.6

v0.22.5

27 Nov 08:42
9babb58

What's Changed

🚀 Features

  • Allure Test Reporting Integration

    • AllureReporter: Seamlessly integrate Vision Agent test results into the Allure reporting framework.
      • Records each agent interaction as a structured Allure test step
      • Automatically attaches screenshots to improve debugging and test insights
      • Performs eager dependency checking: if Allure is not installed, an ImportError is raised during initialization
    from askui import VisionAgent
    from askui.reporting import AllureReporter
    
    with VisionAgent(reporters=[AllureReporter()]) as agent:
        agent.act("Click the login button")
        # Actions become Allure steps with screenshots in the report

    Requirements
    Install at least one of the following packages:

    • allure-python-commons
    • allure-pytest
    • allure-behave

    👉 Install with:

    pip install allure-python-commons
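Eager dependency checking means that the optional dependency is imported in the constructor, so a missing package fails immediately rather than at the first reported step. The sketch below shows the pattern with a hypothetical class name and message. AllureReporter's real code may differ. The default module name `allure` is the module provided by allure-python-commons.

```python
import importlib
from types import ModuleType


class EagerDependencyReporter:
    """Hypothetical reporter that checks its optional dependency at construction time."""

    def __init__(
        self,
        module_name: str = "allure",
        install_hint: str = "pip install allure-python-commons",
    ) -> None:
        try:
            self._module: ModuleType = importlib.import_module(module_name)
        except ImportError as exc:
            # Fail fast with an actionable message instead of failing mid-run.
            raise ImportError(
                f"{type(self).__name__} requires the '{module_name}' module ({install_hint})"
            ) from exc


try:
    EagerDependencyReporter("surely_missing_module_xyz")
except ImportError as err:
    print(err)
```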

🛠️ Fixes

  • Android Multi-Screen Support
    • Improved handling of multiple displays on Android devices

Full Changelog: v0.22.4...v0.22.5

v0.22.4

24 Nov 10:51
b17eaf2

What's Changed

🚀 Features

  • Element Annotation:

    • annotate() method: Generate interactive HTML files that visualize detected UI elements on screenshots. The generated HTML allows users to:
      • View bounding boxes around all detected elements
      • Hover over elements to see their names and text values
      • Click on elements to copy their text values to the clipboard
    from askui import VisionAgent
    
    with VisionAgent() as agent:
        # Annotate current screen and save to default 'annotations' directory
        agent.annotate()
    
        # Or specify custom screenshot and output directory
        agent.annotate(screenshot="screenshot.png", annotation_dir="htmls")

    Also works with AndroidVisionAgent:

    from askui import AndroidVisionAgent
    
    with AndroidVisionAgent() as agent:
        agent.annotate()
    • locate_all_elements() method: Retrieve all detected elements programmatically as a list of DetectedElement objects:
    from askui import VisionAgent
    
    with VisionAgent() as agent:
        detected_elements = agent.locate_all_elements()
        print(f"Found {len(detected_elements)} elements: {detected_elements}")
    
        # Access element properties
        for element in detected_elements:
            print(f"Name: {element.name}, Text: {element.text}")
            print(f"Position: {element.center}, Size: {element.width}x{element.height}")
    • New Data Models:
      • DetectedElement: Represents a detected UI element with name, text, and bounding_box properties, plus convenience properties for center, width, and height
      • BoundingBox: Represents element coordinates with xmin, ymin, xmax, ymax, plus convenience properties for width, height, and center
  • Chat API Model Selection: Chat runs can now specify which model to use via the model parameter in the run creation request, allowing dynamic model selection per run instead of using only the configured default model.
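To make the DetectedElement and BoundingBox descriptions above concrete, the sketch below mirrors their listed properties with plain dataclasses. It then renders the elements as outlined boxes over a screenshot, roughly the kind of overlay that annotate() produces. These are stand-in classes, not askui's own models. The tuple type for `center`, the HTML structure, and `render_overlay_html` are assumptions, and the click-to-copy behavior is omitted.

```python
from __future__ import annotations

import html
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def width(self) -> float:
        return self.xmax - self.xmin

    @property
    def height(self) -> float:
        return self.ymax - self.ymin

    @property
    def center(self) -> tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


@dataclass(frozen=True)
class DetectedElement:
    name: str
    text: str
    bounding_box: BoundingBox

    # Convenience properties delegate to the bounding box.
    @property
    def center(self) -> tuple[float, float]:
        return self.bounding_box.center

    @property
    def width(self) -> float:
        return self.bounding_box.width

    @property
    def height(self) -> float:
        return self.bounding_box.height


def render_overlay_html(screenshot_src: str, elements: list[DetectedElement]) -> str:
    """Draw one outlined <div> per element over the screenshot; hovering shows name and text."""
    boxes = []
    for el in elements:
        bb = el.bounding_box
        title = html.escape(f"{el.name}: {el.text}", quote=True)
        boxes.append(
            f'<div class="box" title="{title}" style="position:absolute;'
            f"left:{bb.xmin}px;top:{bb.ymin}px;width:{bb.width}px;height:{bb.height}px;"
            f'border:2px solid red;"></div>'
        )
    return (
        '<div style="position:relative;display:inline-block;">'
        f'<img src="{html.escape(screenshot_src, quote=True)}">'
        + "".join(boxes)
        + "</div>"
    )


button = DetectedElement("button", "Login", BoundingBox(10, 20, 110, 60))
print(button.center, button.width, button.height)  # (60.0, 40.0) 100 40
```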

📜 Documentation

  • AndroidVisionAgent class: Added comprehensive docstring with detailed parameter descriptions and usage examples for the AndroidVisionAgent class (src/askui/android_agent.py:41-62).

Full Changelog: v0.22.3...v0.22.4

v0.22.3

20 Nov 08:45

What's Changed

New Contributors

Full Changelog: v0.22.2...v0.22.3