Before you can set up and run your agent, make sure you have the following:
- 🔄 AskUI Shell - The command line tool for AskUI Agents.
- 🖊️ A code editor of your choice (e.g., VSCode, PyCharm).
- 🔑 An Anthropic API key - Required for the VisionAgent to function properly.

Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system.

Linux (AMD64)

    curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
    bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run

Linux (ARM64)

    curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
    bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run

MacOS

    curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
    bash /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run

Then install the `askui` Python package:

    pip install askui

Note: Requires Python version >=3.10.

| | AskUI | Anthropic |
|---|---|---|
| ENV Variables | `ASKUI_WORKSPACE_ID`, `ASKUI_TOKEN` | `ANTHROPIC_API_KEY` |
| Supported Commands | `click()`, `get()`, `locate()`, `mouse_move()` | `act()`, `click()`, `get()`, `locate()`, `mouse_move()` |
| Description | Faster Inference, European Server, Enterprise Ready | Supports complex actions |

To get started, set the environment variables required to authenticate with your chosen model provider.
- Create a `.env` file similar to `.env.template` and update the required credentials.
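
Before starting the agent, it can help to confirm that the credentials for your chosen provider are actually visible to the process. The following is a minimal sketch, not part of AskUI: the helper names `missing_credentials` and `configured_providers` are hypothetical, the variable names come from the table above, and it assumes the values from `.env` have already been exported into the environment (for example by your shell or a dotenv loader).

```python
import os

# Credentials each provider needs, as listed in the table above.
REQUIRED_ENV = {
    "askui": ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN"),
    "anthropic": ("ANTHROPIC_API_KEY",),
}


def missing_credentials(provider, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


def configured_providers(env=None):
    """Return the providers whose credentials are all present."""
    return [p for p in REQUIRED_ENV if not missing_credentials(p, env)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        missing = missing_credentials(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.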

You can test the Vision Agent with Hugging Face models via their Spaces API. Please note that the API is rate-limited, so for production use cases it is recommended to choose step 3a.

Note: Hugging Face Spaces host model demos provided by individuals not associated with Hugging Face or AskUI. Don't use these models on screens with sensitive information.

Supported Models:
- `AskUI/PTA-1`
- `OS-Copilot/OS-Atlas-Base-7B`
- `showlab/ShowUI-2B`
- `Qwen/Qwen2-VL-2B-Instruct`
- `Qwen/Qwen2-VL-7B-Instruct`
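
Because the model is selected with a plain string, a typo in the name would only surface when the call is made. One possible guard is sketched below; `checked_model` is a hypothetical helper, not an AskUI function, and the set simply mirrors the supported models listed above.

```python
# Model identifiers copied from the supported list above.
SUPPORTED_HF_MODELS = frozenset({
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
    "showlab/ShowUI-2B",
    "Qwen/Qwen2-VL-2B-Instruct",
    "Qwen/Qwen2-VL-7B-Instruct",
})


def checked_model(name: str) -> str:
    """Return `name` unchanged if it is a supported model, else raise ValueError."""
    if name not in SUPPORTED_HF_MODELS:
        options = ", ".join(sorted(SUPPORTED_HF_MODELS))
        raise ValueError(f"Unsupported model {name!r}; expected one of: {options}")
    return name
```

The return value can then be passed as the `model` argument, for instance `model=checked_model("OS-Copilot/OS-Atlas-Base-7B")`.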

Example Code:

    agent.click("search field", model="OS-Copilot/OS-Atlas-Base-7B")

You can use Vision Agent with UI-TARS if you provide your own UI-TARS API endpoint.

- Step 1: Host the model locally or in the cloud. See the UI-TARS documentation for hosting instructions.
- Step 2: Provide the `TARS_URL` and `TARS_API_KEY` environment variables to Vision Agent.
- Step 3: Use the `model="tars"` parameter in your `click()`, `get()`, `act()`, etc. commands or when initializing the `VisionAgent`.
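
The steps above can be combined so that `model="tars"` is only passed when the endpoint is configured. This is a rough sketch under stated assumptions: `tars_kwargs` is a hypothetical helper, and it assumes `TARS_URL` is an HTTP(S) URL, which the steps above do not state explicitly.

```python
import os
from urllib.parse import urlparse


def tars_kwargs(env=None):
    """Return {"model": "tars"} when TARS_URL and TARS_API_KEY look usable, else {}."""
    env = os.environ if env is None else env
    url = env.get("TARS_URL", "")
    api_key = env.get("TARS_API_KEY", "")
    parsed = urlparse(url)
    # Assumption: the endpoint is reachable over http or https.
    url_ok = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    if url_ok and api_key:
        return {"model": "tars"}
    return {}
```

A command could then be written as `agent.click("search field", **tars_kwargs())`, which falls back to the default model when UI-TARS is not configured.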
This agent runs automated UI tests using Pytest and AskUI's VisionAgent. When run, it:
- Initializes the VisionAgent
- Executes test cases defined in the `tests/` directory
- Performs UI interactions like clicks and assertions
- Reports test results

To run your agent locally:

    rm -rf venv && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt
    pytest --verbose

Happy Coding! 🚀
