AI-powered automation framework for Web and Android with natural language-driven UI operations - Java version
Midscene Java is an AI-powered framework for UI automation on Web and Android platforms. It is the Java implementation of Midscene Python and inherits its core philosophy: making automation as simple as speaking.
- Natural Language Operations - Describe operation intentions in everyday language, and AI will automatically understand and execute them
- Intelligent Element Locating - Combines multiple locating strategies, automatically selects the most suitable one, and adapts to page changes
- Structured Data Extraction - Use natural language to extract complex structured data
- Intelligent Assertion Verification - Describe verification conditions in natural language, and the AI evaluates them automatically
- Multi-Platform Support - Unified interface supports Web and Android platforms
- Visual Debugging - Detailed execution screenshots and decision process recording
- Code Optimization and Refactoring - Systematically refactored for more modular and maintainable code
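
To make the structured data extraction feature more concrete, here is a minimal sketch of turning an extraction result into typed records. It assumes the result arrives as a nested `Map<String, Object>` with an `articles` list, as in the extraction example later in this README. `ArticleMapper` and `Article` are hypothetical helper names, not part of Midscene Java, and the sample data in `main` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Midscene Java.
final class ArticleMapper {

    /** Typed view of one extracted article; field names mirror the schema keys. */
    record Article(String title, String time, String summary) {}

    /** Converts a raw extraction result into typed records, skipping entries that are not maps. */
    static List<Article> toArticles(Map<String, Object> extracted) {
        List<Article> result = new ArrayList<>();
        if (!(extracted.get("articles") instanceof List<?> items)) {
            return result; // key is missing or does not hold a list
        }
        for (Object item : items) {
            if (item instanceof Map<?, ?> fields) {
                result.add(new Article(
                        text(fields, "title"),
                        text(fields, "time"),
                        text(fields, "summary")));
            }
        }
        return result;
    }

    /** Returns the field as text, or an empty string when the field is absent. */
    private static String text(Map<?, ?> fields, String key) {
        Object value = fields.get(key);
        return value == null ? "" : value.toString();
    }

    public static void main(String[] args) {
        // Made-up sample shaped like the result of the extraction example.
        Map<String, Object> sample = Map.of("articles", List.of(
                Map.of("title", "Sample headline", "time", "2024-01-01", "summary", "Sample summary")));
        for (Article article : toArticles(sample)) {
            System.out.println(article.title() + " | " + article.time() + " | " + article.summary());
        }
    }
}
```

Checking types with `instanceof` instead of an unchecked cast means an unexpected response shape yields an empty list rather than a `ClassCastException` deep inside the loop.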
midscene-java/
├── packages/
│   ├── core/            # Core module, providing Agent and AI engine
│   ├── web/             # Web automation module
│   │   ├── playwright/  # Playwright implementation
│   │   └── selenium/    # Selenium implementation
│   ├── android/         # Android automation module
│   ├── cli/             # Command line tool
│   ├── examples/        # Example code
│   ├── playground/      # Development testing environment
│   └── tests/           # Test cases
├── apps/                # Application examples
├── docs/                # Project documentation and optimization plans
└── wiki/                # Project wiki documentation
- Java 17+
- Maven 3.6+ or Gradle 7.0+
- Browser (Chrome/Firefox/Edge, for Web automation)
- AI model API Key (Choose one from OpenAI, Claude, Qwen, or Gemini)
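
As an optional sanity check for the requirements above, the following sketch verifies the running Java version and a provider name before any automation starts. `Preflight` is a hypothetical helper, not part of Midscene Java, and the lowercase provider identifiers are assumptions derived from the list above; consult the project documentation for the values it actually accepts.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-flight check, not part of Midscene Java.
final class Preflight {

    // Assumed identifiers for the providers named in the requirements list.
    static final Set<String> PROVIDERS = Set.of("openai", "claude", "qwen", "gemini");

    // Minimum Java feature version stated in the requirements.
    static final int MIN_JAVA_FEATURE_VERSION = 17;

    /** True when the given Java feature version meets the stated minimum. */
    static boolean javaVersionOk(int featureVersion) {
        return featureVersion >= MIN_JAVA_FEATURE_VERSION;
    }

    /** True when the provider name, ignoring case and surrounding spaces, is in the assumed set. */
    static boolean providerOk(String provider) {
        return provider != null
                && PROVIDERS.contains(provider.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println("Java " + feature
                + (javaVersionOk(feature) ? " is supported" : " is too old, 17+ is required"));

        String provider = args.length > 0 ? args[0] : "openai";
        System.out.println("Provider '" + provider + "' is "
                + (providerOk(provider) ? "recognized" : "not recognized"));
    }
}
```

Running it with a provider name as the first argument reports whether that name is in the assumed set; without arguments it checks `openai`.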
Add Midscene Java dependencies to your pom.xml file:
<dependencies>
    <!-- Core module -->
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-core</artifactId>
        <version>0.1.1</version>
    </dependency>
    <!-- Web automation modules (choose as needed) -->
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-web-playwright</artifactId>
        <version>0.1.1</version>
    </dependency>
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-web-selenium</artifactId>
        <version>0.1.1</version>
    </dependency>
    <!-- Android automation module (choose as needed) -->
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-android</artifactId>
        <version>0.1.1</version>
    </dependency>
</dependencies>

Create an application.properties or application.yml file to configure the AI model:
# application.properties
midscene.ai.provider=openai
midscene.ai.model=gpt-4-vision-preview
midscene.ai.api-key=your_openai_api_key_here

package com.example;
import com.midscene.core.Agent;
import com.midscene.web.playwright.PlaywrightPage;
import com.midscene.web.playwright.PlaywrightUIContextProvider;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;

public class SearchExample {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            // Create browser instance
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            // Create PlaywrightPage wrapper
            PlaywrightPage playwrightPage = new PlaywrightPage(page);
            // Create Agent
            Agent agent = new Agent(new PlaywrightUIContextProvider(playwrightPage));
            // Navigate to website
            page.navigate("https://www.baidu.com");
            // Use natural language for search
            agent.aiAction("Type 'Java tutorial' in the search box");
            agent.aiAction("Click the search button");
            // Verify search results
            agent.aiAssert("The page displays search results for Java tutorials");
            System.out.println("✅ Search operation completed!");
            // Close browser
            browser.close();
        }
    }
}

package com.example;
import com.midscene.core.Agent;
import com.midscene.web.playwright.PlaywrightPage;
import com.midscene.web.playwright.PlaywrightUIContextProvider;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExtractExample {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            PlaywrightPage playwrightPage = new PlaywrightPage(page);
            Agent agent = new Agent(new PlaywrightUIContextProvider(playwrightPage));
            // Visit news website
            page.navigate("https://news.example.com");
            // Extract structured data
            Map<String, Object> schema = new HashMap<>();
            schema.put("articles", List.of(
                Map.of(
                    "title", "News title",
                    "time", "Publish time",
                    "summary", "News summary"
                )
            ));
            Map<String, Object> newsData = agent.aiExtract(schema);
            // Output results
            List<Map<String, String>> articles = (List<Map<String, String>>) newsData.get("articles");
            for (Map<String, String> article : articles) {
                System.out.println("📰 " + article.get("title"));
                System.out.println("⏰ " + article.get("time"));
                System.out.println("📝 " + article.get("summary") + "\n");
            }
            browser.close();
        }
    }
}

package com.example;
import com.midscene.core.Agent;
import com.midscene.android.AndroidDevice;
import com.midscene.android.AndroidUIContextProvider;
import java.util.concurrent.CompletableFuture;

public class AndroidExample {
    public static void main(String[] args) {
        // Connect to Android device
        AndroidDevice device = new AndroidDevice();
        CompletableFuture<Void> connectFuture = device.connect();
        connectFuture.join(); // Wait for connection to complete
        try {
            // Create Agent
            Agent agent = new Agent(new AndroidUIContextProvider(device));
            // Launch application
            agent.aiAction("Launch the settings app");
            // Perform operations
            agent.aiAction("Tap on the Wi-Fi option");
            agent.aiAssert("The Wi-Fi settings page is open");
            System.out.println("✅ Android automation operation completed!");
        } finally {
            device.disconnect();
        }
    }
}

- Project Overview - Chinese only
- Installation and Configuration - Chinese only
- Quick Start - Chinese only
- API Reference
- Example Code
- Frequently Asked Questions - Chinese only
- Core Concepts - Chinese only
- Platform Integration
| Feature | Traditional Automation Tools | Midscene Java |
|---|---|---|
| Learning Curve | Steep, requires learning complex APIs | Gentle, natural language driven |
| Code Readability | Obscure and hard to understand | Intuitive and easy to understand |
| Maintenance Cost | High, requires extensive modifications for page changes | Low, AI automatically adapts to changes |
| Element Locating | Manual selector writing | AI intelligent locating |
| Error Handling | Manual handling of various exceptions | AI automatic retry and recovery |
| Cross-Platform | Requires learning different tools | Unified interface |
| Code Quality | Varies by project | Systematically refactored, modular design |
We welcome all forms of contribution: bug reports, feature requests, documentation improvements, and code.
- Fork this repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Create a Pull Request
# Clone the repository
git clone https://github.com/Master-Frank/midscene-java.git
cd midscene-java
# Build the project
mvn clean install
# Run tests
mvn test

- Follow commit message conventions from Conventional Commits
- Add corresponding test cases for new features
- Add JavaDoc documentation for public APIs
- Keep code modular, avoid overly long methods
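
As an illustration of the commit message guideline above, here is a minimal sketch of a check for the Conventional Commits header shape, `type(scope)!: description`. `CommitMessageCheck` is a hypothetical helper, not part of this repository, and the list of accepted types is an assumption: the specification itself only defines `feat` and `fix` and permits others.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of this repository.
final class CommitMessageCheck {

    // Header shape: type, optional (scope), optional "!", then ": " and a description.
    // The set of allowed types is an assumption; adjust it to the project's conventions.
    private static final Pattern HEADER = Pattern.compile(
            "^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
                    + "(\\([^()\\s]+\\))?!?: \\S.*$");

    /** Returns true when the first line of the message matches the header shape. */
    static boolean isConventional(String message) {
        if (message == null || message.isBlank()) {
            return false;
        }
        String header = message.lines().findFirst().orElse("");
        return HEADER.matcher(header).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "feat(web): add Selenium page wrapper",
            "fix!: handle missing screenshot",
            "updated some stuff"
        };
        for (String sample : samples) {
            System.out.println(isConventional(sample) + "  " + sample);
        }
    }
}
```

Only the first line is checked, so a message body and footers after a blank line do not affect the result.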
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to the Midscene project (https://github.com/web-infra-dev/midscene) for inspiration and technical references.
- GitHub: Master-Frank/midscene-java
- Issue Reporting: GitHub Issues
- Discussions: GitHub Discussions
⭐ If this project helps you, please give us a star!