Our agent, DroidSync-Vision-Agent, is a vision-based AI that performs tasks on an Android phone autonomously. Instead of relying on traditional APIs, it "sees" the screen like a human and switches between apps to complete workflows.
Click the link below to watch the full 3-minute demo of the Vision Agent in action:
[Watch the video](https://youtu.be/37aNowmgseE?si=0E_oWtX5i6ppjbd1)
Most automation tools, such as Zapier or Selenium, require backend access or fixed element IDs. DroidSync-Vision-Agent differs in the following ways:
- Zero API Dependency: Works on any app without needing official API access.
- Vision Over Code: Resilient to UI changes; identifies elements visually.
- Complex Data Reasoning: Extracts dates and times from unstructured email text.
- B2B Impact: Automates manual scheduling work, freeing up employee time.
The agent follows this natural-language instruction to complete the task: "Open the Gmail app and find the latest message containing the word 'Zoom Meeting'. Read the date and time mentioned in the email. Close the app, open the calendar app and create a new event with that specific date and time with the title 'Work Sync'. Save the event and close the calendar view."
- Gmail Data Extraction: The agent identifies the latest message containing 'Zoom Meeting'.
- Contextual Reasoning: It extracts the Date and Time and stores it in memory.
- Calendar Integration: It opens the System Calendar and creates the event.
- Smart Notification: The native Calendar app triggers a notification once saved.
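The Contextual Reasoning step amounts to turning the model's reading of the email into a structured value that the Calendar step can use. The sketch below is a hypothetical illustration, not the project's code: it assumes Gemini is asked to reply with JSON such as {"date": "2025-03-14", "time": "15:30"}, and the names `MeetingSlot` and `parse_meeting_reply` are invented for this example.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MeetingSlot:
    """Hypothetical in-memory record of the meeting found in Gmail."""
    title: str
    start: datetime


def parse_meeting_reply(reply: str, title: str = "Work Sync") -> MeetingSlot:
    """Parse an assumed JSON reply like {"date": "YYYY-MM-DD", "time": "HH:MM"}.

    Raises ValueError if the reply is not valid JSON, lacks a field, or holds
    an impossible date/time, so the agent can re-read the email instead of guessing.
    """
    try:
        data = json.loads(reply)
        date_str, time_str = data["date"], data["time"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError(f"Unusable model reply: {reply!r}") from exc
    # strptime also rejects impossible values such as 2025-02-30 or 25:00.
    start = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    return MeetingSlot(title=title, start=start)
```

Validating the reply before touching the Calendar app means a misread email fails loudly instead of producing an event at the wrong time.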
Follow these steps to get the agent running on your local machine:
- Python 3.13+: Make sure Python 3.13 or later is installed.
- ADB Tools: Install Android Debug Bridge (ADB) and add it to your system PATH.
- Mobile Device: Enable USB Debugging on your Android phone.
Create a .env file in your project folder and add your Gemini API key:
```
GOOGLE_API_KEY=your_actual_api_key_here
```
The agent communicates with your device via the Android Debug Bridge (ADB):
- Step A: Enable Developer Options and USB Debugging on your phone.
- Step B: Connect your phone to your PC via USB.
- Step C: Verify the connection by running:
```
adb devices
```
Your phone should appear in the list with the state "device".
Execution Flow
Run the main script to start the automation:
```
python main.py
```
- Terminal Logs: Monitor the logs for entries such as "Action: Clicking Gmail".
- Visual Confirmation: Watch your phone screen navigate autonomously.
- Final Output: You will see a system notification from your Calendar app.
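The connection check from Step C can also be scripted as a pre-flight step before the agent starts. This is a hypothetical sketch (the source does not show how main.py handles it), and `parse_adb_devices` and `connected_devices` are illustrative names. It relies only on the standard `adb devices` output, where each line after the header holds a device serial and a state such as `device`, `unauthorized`, or `offline`.

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return the serials whose state is 'device' (ready) in `adb devices` output."""
    ready = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and daemon start-up messages ("* daemon ...").
        if not line or line.startswith(("List of devices", "*")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            ready.append(parts[0])
    return ready


def connected_devices() -> list[str]:
    """Run `adb devices` (adb must be on PATH) and return the ready serials."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

Exiting early when `connected_devices()` returns an empty list gives a clearer error than a failed screenshot later. A state of `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.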
- Language: Python 3.13+
- AI Model: Google Gemini (Vision Capabilities)
- Device Control: ADB (Android Debug Bridge)
- Environment Management: Python-dotenv (for secure API keys)
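With python-dotenv, `load_dotenv()` reads the .env file into the process environment, after which the key is available through `os.getenv`. A minimal sketch, assuming the GOOGLE_API_KEY name from the setup step; `get_api_key` is a hypothetical helper, and the ImportError fallback only exists so the snippet still runs before python-dotenv is installed (`pip install python-dotenv`).

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fallback: rely on variables already set in the environment
    load_dotenv = None


def get_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Load .env (if python-dotenv is available) and return the API key."""
    if load_dotenv is not None:
        # Searches for a .env file and loads it; variables that are already
        # set in the environment are not overridden by default.
        load_dotenv()
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file.")
    return key
```

Failing early with a clear message is friendlier than letting the first Gemini request fail with an authentication error, and keeping the key out of the source keeps it out of version control.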
Our Vision Agent follows a 4-step autonomous loop to complete the task without any human intervention:
- 👀 Vision Perception: The agent captures a screenshot of the Android screen.
- 🧠 Reasoning: It analyzes the screenshot with the Gemini model to find the Gmail icon or the meeting details.
- 🎯 Action Planning: It decides the next click or scroll based on the prompt in prompt.txt.
- 📱 Execution: It sends an ADB command to the device to perform the touch or swipe action.
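The four steps can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the project's main.py: the action schema (`{"type": "tap", "x": ..., "y": ...}`), the helper names, and the `decide` callable (standing in for the Gemini call that reads the screenshot and the goal) are all hypothetical. The ADB commands themselves (`exec-out screencap -p`, `shell input tap`, `swipe`, and `keyevent`) are standard.

```python
import subprocess
from typing import Callable, Optional

Action = dict  # hypothetical schema, e.g. {"type": "tap", "x": 540, "y": 1200}


def capture_screen() -> bytes:
    """1. Vision Perception: grab a PNG screenshot from the connected device."""
    return subprocess.run(
        ["adb", "exec-out", "screencap", "-p"], capture_output=True, check=True
    ).stdout


def to_adb_args(action: Action) -> Optional[list[str]]:
    """3. Action Planning -> 4. Execution: map an action to an adb command line.

    Returns None for the hypothetical "done" action that ends the task.
    """
    kind = action["type"]
    if kind == "tap":
        return ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
        duration = str(action.get("duration_ms", 300))  # swipe duration in ms
        return ["adb", "shell", "input", "swipe", *coords, duration]
    if kind == "back":
        return ["adb", "shell", "input", "keyevent", "KEYCODE_BACK"]
    if kind == "done":
        return None
    raise ValueError(f"Unknown action type: {kind!r}")


def run_agent(
    goal: str,
    decide: Callable[[str, bytes, list], Action],
    capture: Callable[[], bytes] = capture_screen,
    execute: Callable[[list[str]], object] = lambda args: subprocess.run(args, check=True),
    max_steps: int = 25,
) -> list[Action]:
    """Run the perceive -> reason -> plan -> act loop until "done" or max_steps."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                      # 1. Vision Perception
        action = decide(goal, screenshot, history)  # 2-3. Reasoning + Action Planning (model call)
        history.append(action)
        args = to_adb_args(action)
        if args is None:                            # the model reports the task is complete
            return history
        execute(args)                               # 4. Execution via ADB
    raise RuntimeError(f"Task not finished after {max_steps} steps")
```

In practice `goal` would be the text of prompt.txt, and `decide` would send the goal and screenshot to Gemini and parse its reply into an action. Injecting `decide`, `capture`, and `execute` keeps the loop testable without a phone or an API key, and the `max_steps` cap stops a confused model from looping forever.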
#MobilerunCloud #DroidSync #B2B #AI #Python #Automation #EmployeeEfficiency