Releases: askui/vision-agent
v0.22.12
What's Changed
- feat: add SBOM generation and release workflow by @mlikasam-askui in #214
- Add SBOM generator by @mlikasam-askui in #215
- ci: ensure SBOM generation runs after PyPI publish by @mlikasam-askui in #216
- refactor: remove functools.cache decorator from create_api_client fun… by @danyalxahid-askui in #218
- Cl 1935 scheduling workflows which run on a specific time by @danyalxahid-askui in #217
🚀 New Features
- Background Scheduler for Executing Run
- Create SBOM
🐛 Bug Fixes
- Unauthorized errors with long-running instances
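The notes do not say why the `functools.cache` decorator was removed from `create_api_client`. One plausible reading, offered here only as an assumption, is that a cached client keeps the credentials it was built with, so a long-running process keeps sending an expired token. The sketch below uses hypothetical names (`FakeClient`, `current_token`, `create_client_cached`, `create_client_fresh`) and is not the library's code:

```python
from functools import cache

# Hypothetical token source; in a real process this value would rotate over time.
current_token = {"value": "token-1"}


class FakeClient:
    """Stand-in for an API client that captures a token at construction."""

    def __init__(self, token: str) -> None:
        self.token = token


@cache
def create_client_cached() -> FakeClient:
    # Built once; later calls return the same object with the original token.
    return FakeClient(current_token["value"])


def create_client_fresh() -> FakeClient:
    # Built on every call, so it always sees the current token.
    return FakeClient(current_token["value"])


first = create_client_cached()
current_token["value"] = "token-2"  # simulate a token refresh

stale = create_client_cached()  # same object as `first`, still holds the old token
fresh = create_client_fresh()   # new object, holds the refreshed token
```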
Full Changelog: v0.22.11...v0.22.12
v0.22.11
What's Changed
- fix: add encoding='utf-8' to all file operations to prevent UnicodeEn… by @programminx-askui in #213
- feat(messages): add support for injecting cancelled tool results in m… by @danyalxahid-askui in #212
- Introduce io publisher for communicating events via stdio by @onur-askui in #211
🚀 New Features
- `io_publisher` for communicating events via stdio
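The notes do not describe the wire format. As a rough illustration of what publishing events over stdio can look like, the sketch below writes one JSON object per line to a stream. The function name `publish_event` and the event fields are assumptions for illustration, not the `io_publisher` API:

```python
import io
import json
import sys
from typing import Any, TextIO


def publish_event(event_type: str, payload: dict[str, Any], stream: TextIO = sys.stdout) -> None:
    """Write a single event as one JSON line and flush immediately."""
    line = json.dumps({"type": event_type, "payload": payload})
    stream.write(line + "\n")
    stream.flush()  # a consumer reading the pipe should not wait on buffering


# Usage: capture the output in memory instead of the real stdout.
buffer = io.StringIO()
publish_event("run.started", {"run_id": "example"}, stream=buffer)
publish_event("run.finished", {"run_id": "example"}, stream=buffer)

# A consumer can parse the stream line by line.
events = [json.loads(raw) for raw in buffer.getvalue().splitlines()]
```

Line-delimited JSON is used here because it lets a parent process parse events incrementally; the actual publisher may use a different framing.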
🐛 Bug Fixes
- Added encoding='utf-8' to all file operations
- Handle messages without `tool_result` block correctly
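For context on the encoding fix: `open()` without an explicit `encoding` falls back to a platform-dependent default, which can cause encoding errors on systems where that default is not UTF-8. A minimal standard-library illustration (the file name and text are arbitrary):

```python
import tempfile
from pathlib import Path

text = "Größe: 10 µm ✓"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.txt"

    # Explicit encoding on write: independent of the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Explicit encoding on read: must match what was written.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
```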
Full Changelog: v0.22.10...v0.22.11
v0.22.10
What's Changed
- refactor(mcp_clients): improve argument passing in call_tool method by @danyalxahid-askui in #209
🐛 Bug Fixes
Pip Install
- Support for fastmcp 2.14.* installed via pip
Full Changelog: v0.22.9...v0.22.10
v0.22.9
Release Notes: v0.22.9
🚀 New Features
Android Agent
- Added `device` parameter to `AndroidVisionAgent` constructor for device selection by serial number or index
- Added `act_tools` parameter to `AndroidVisionAgent` (matching `VisionAgent` functionality)
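To illustrate the two selection modes of the `device` parameter (serial number or index), here is a standalone sketch. The helper `resolve_device`, its default of `0`, and the serial numbers are hypothetical; the notes do not show how `AndroidVisionAgent` resolves the value internally:

```python
from typing import Sequence, Union


def resolve_device(serials: Sequence[str], device: Union[str, int] = 0) -> str:
    """Pick a device by list index (int) or by serial number (str)."""
    if isinstance(device, int):
        if not 0 <= device < len(serials):
            raise ValueError(f"No device at index {device}")
        return serials[device]
    if device not in serials:
        raise ValueError(f"No device with serial {device!r}")
    return device


# Made-up serial numbers standing in for the list of connected devices.
connected = ["emulator-5554", "R58M123ABC"]
by_index = resolve_device(connected, 1)
by_serial = resolve_device(connected, "emulator-5554")
```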
Reporting
- Added theme toggle (light/dark) to HTML reports
- Updated HTML report styling with CSS variables and improved color scheme
- Added reporting messages for `key_combination()`, `shell()`, `drag_and_drop()`, and `swipe()` methods
- Moved reporting from `AndroidAgentOsFacade` to `PpadbAgentOs` to eliminate duplicate reporting
🐛 Bug Fixes
Android Agent
- Added `UnknownAndroidDisplay` class for handling cases where display information cannot be determined
Full Changelog: v0.22.8...v0.22.9
v0.22.8
What's Changed
- fix inference of correct computer use beta flag by @philipph-askui in #204
- Add Caching to VisionAgent by @philipph-askui in #201
Full Changelog: v0.22.7...v0.22.8
v0.22.7
What's Changed
- Automatic inference of correct Computer-use Flag by @philipph-askui in #195
- Introduce otel tracing by @onur-askui in #196
- feat(chat): add parent_id to messages for tree navigation by @danyalxahid-askui in #186
🚀 Features
- OpenTelemetry Tracing Docs
- Introduce branching for conversations. Support rerunning and editing user messages.
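Conversation branching via `parent_id` can be pictured as a tree in which an edited message becomes a sibling of the original. The sketch below is an assumption-based illustration: the `Message` dataclass and `path_to_root` helper are hypothetical and do not reflect the chat API's actual schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: str
    parent_id: Optional[str]
    text: str


def path_to_root(messages: dict[str, Message], leaf_id: str) -> list[str]:
    """Follow parent_id links from a leaf up to the root; return texts root-first."""
    path: list[str] = []
    current: Optional[str] = leaf_id
    while current is not None:
        message = messages[current]
        path.append(message.text)
        current = message.parent_id
    return list(reversed(path))


# Two branches share the same first message: the second message was edited.
messages = {
    m.id: m
    for m in [
        Message("m1", None, "Open the settings"),
        Message("m2", "m1", "Enable dark mode"),
        Message("m3", "m1", "Enable light mode"),  # edited sibling of m2
    ]
}
original_branch = path_to_root(messages, "m2")
edited_branch = path_to_root(messages, "m3")
```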
🐛 Bug Fixes
- Use `computer-use-2025-11-24` beta flag for `claude-opus-4-5-20251101`
Full Changelog: v0.22.6...v0.22.7
v0.22.6
What's Changed
- feat: enhance Agent Android tap tool to allow N taps in a row with optional delay by @mlikasam-askui in #199
🚀 Features
- Enhanced Agent Tap Actions
  - The Android Vision Agent can now perform multiple consecutive taps on Android devices with a configurable delay between taps, enabling more complex instructions.
  - Supports commands like: `agent.act("increase the temperature in the climate control by 20 degrees with 0 milliseconds delay between taps")`
  - Handles multi-tap instructions precisely, useful for automated agent-driven tasks
  - Delay between taps can be specified in milliseconds
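The underlying idea of "N taps in a row with an optional delay" can be sketched independently of the agent. The function `tap_repeatedly` and its `tap` callback are hypothetical stand-ins; the real tap tool's signature is not shown in these notes:

```python
import time
from typing import Callable


def tap_repeatedly(
    tap: Callable[[int, int], None],
    x: int,
    y: int,
    count: int,
    delay_ms: int = 0,
) -> None:
    """Call tap(x, y) `count` times, sleeping `delay_ms` between consecutive taps."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    for i in range(count):
        tap(x, y)
        # Sleep only between taps, not after the last one.
        if delay_ms > 0 and i < count - 1:
            time.sleep(delay_ms / 1000)


# Usage with a recording callback instead of a real device.
recorded: list[tuple[int, int]] = []
tap_repeatedly(lambda x, y: recorded.append((x, y)), x=540, y=1200, count=20, delay_ms=0)
```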
Full Changelog: v0.22.5...v0.22.6
v0.22.5
What's Changed
- feat: add allure reporter by @programminx-askui in #191
- fix: android multi-screen support by @mlikasam-askui in #197
🚀 Features
- Allure Test Reporting Integration
  `AllureReporter`: Seamlessly integrate Vision Agent test results into the Allure reporting framework.
  - Records each agent interaction as a structured Allure test step
  - Automatically attaches screenshots to improve debugging and test insights
  - Performs eager dependency checking: if Allure is not installed, an `ImportError` is raised during initialization

      from askui import VisionAgent
      from askui.reporting import AllureReporter

      with VisionAgent(reporter=[AllureReporter()]) as agent:
          agent.act("Click the login button")
          # Actions become Allure steps with screenshots in the report

Requirements
Install at least one of the following packages:
- allure-python-commons
- allure-pytest
- allure-behave
👉 Install with:
pip install allure-python-commons
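The eager dependency check mentioned above can be sketched with the standard library. This is a generic illustration of the pattern, assuming a hypothetical class `OptionalDependencyReporter`; it is not the `AllureReporter` implementation:

```python
import importlib.util


class OptionalDependencyReporter:
    """Hypothetical reporter that requires an optional package."""

    def __init__(self, module_name: str) -> None:
        # Check at construction time rather than on first use,
        # so a missing package surfaces before any test step runs.
        if importlib.util.find_spec(module_name) is None:
            raise ImportError(f"{module_name} is required; install it to use this reporter")
        self.module_name = module_name


ok = OptionalDependencyReporter("json")  # stdlib module, always present

try:
    OptionalDependencyReporter("surely_not_an_installed_package_xyz")
    failed_early = False
except ImportError:
    failed_early = True
```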
🛠️ Fixes
- Android Multi-Screen Support
- Improved handling of multiple displays on Android devices
Full Changelog: v0.22.4...v0.22.5
v0.22.4
What's Changed
- Feat/Add annotation function by @mlikasam-askui in #188
- docs: add docstring to AndroidVisionAgent class by @mlikasam-askui in #192
- feat(chat): select model of run with request params by @adi-wan-askui in #193
🚀 Features
- Element Annotation:
  - `annotate()` method: Generate interactive HTML files that visualize detected UI elements on screenshots. The generated HTML allows users to:
    - View bounding boxes around all detected elements
    - Hover over elements to see their names and text values
    - Click on elements to copy their text values to the clipboard

        from askui import VisionAgent

        with VisionAgent() as agent:
            # Annotate current screen and save to default 'annotations' directory
            agent.annotate()

            # Or specify custom screenshot and output directory
            agent.annotate(screenshot="screenshot.png", annotation_dir="htmls")

    Also works with `AndroidVisionAgent`:

        from askui import AndroidVisionAgent

        with AndroidVisionAgent() as agent:
            agent.annotate()

  - `locate_all_elements()` method: Retrieve all detected elements programmatically as a list of `DetectedElement` objects:

        from askui import VisionAgent

        with VisionAgent() as agent:
            detected_elements = agent.locate_all_elements()
            print(f"Found {len(detected_elements)} elements: {detected_elements}")

            # Access element properties
            for element in detected_elements:
                print(f"Name: {element.name}, Text: {element.text}")
                print(f"Position: {element.center}, Size: {element.width}x{element.height}")

  - New Data Models:
    - `DetectedElement`: Represents a detected UI element with `name`, `text`, and `bounding_box` properties, plus convenience properties for `center`, `width`, and `height`
    - `BoundingBox`: Represents element coordinates with `xmin`, `ymin`, `xmax`, `ymax`, plus convenience properties for `width`, `height`, and `center`
- Chat API Model Selection: Chat runs can now specify which model to use via the `model` parameter in the run creation request, allowing dynamic model selection per run instead of using only the configured default model.
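The `BoundingBox` and `DetectedElement` models listed under New Data Models can be pictured roughly as follows. Only the field and property names come from the notes; the use of dataclasses, integer coordinates, and integer division for `center` are assumptions, so treat this as a sketch rather than the library's definitions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    @property
    def width(self) -> int:
        return self.xmax - self.xmin

    @property
    def height(self) -> int:
        return self.ymax - self.ymin

    @property
    def center(self) -> tuple[int, int]:
        # Assumption: integer pixel coordinates, rounded down.
        return ((self.xmin + self.xmax) // 2, (self.ymin + self.ymax) // 2)


@dataclass(frozen=True)
class DetectedElement:
    name: str
    text: str
    bounding_box: BoundingBox

    # Convenience properties delegate to the bounding box.
    @property
    def center(self) -> tuple[int, int]:
        return self.bounding_box.center

    @property
    def width(self) -> int:
        return self.bounding_box.width

    @property
    def height(self) -> int:
        return self.bounding_box.height


button = DetectedElement("button", "Login", BoundingBox(100, 200, 300, 260))
```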
📜 Documentation
- AndroidVisionAgent class: Added comprehensive docstring with detailed parameter descriptions and usage examples for the `AndroidVisionAgent` class (src/askui/android_agent.py:41-62).
Full Changelog: v0.22.3...v0.22.4
v0.22.3
What's Changed
- Remove roman themed tone and voice by @onur-askui in #189
- Thinking is disabled and temperature is set to 0.0 for the Android Agent by @philipph-askui in #190
New Contributors
- @philipph-askui made their first contribution in #190
- @MichaelMayer-askui made their first contribution in #182
Full Changelog: v0.22.2...v0.22.3