87 changes: 63 additions & 24 deletions README.md
@@ -1,6 +1,6 @@
# Google Search Tool
# Search Tool

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches and extract results. It can be used directly as a command-line tool or as a Model Context Protocol (MCP) server to provide real-time search capabilities to AI assistants like Claude.
A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google and Bing searches and extract results. It can be used directly as a command-line tool or as a Model Context Protocol (MCP) server to provide real-time search capabilities to AI assistants like Claude.

[![Star History Chart](https://api.star-history.com/svg?repos=web-agent-master/google-search&type=Date)](https://star-history.com/#web-agent-master/google-search&Date)

@@ -9,6 +9,8 @@ A Playwright-based Node.js tool that bypasses search engine anti-scraping mechan
## Key Features

- **Local SERP API Alternative**: No need to rely on paid search engine results API services, all searches are executed locally
- **Multiple Search Engines Support**: Currently supports Google and Bing search engines
- **URL Content Crawler**: Extract content from any web page with customizable selectors and metadata extraction
- **Advanced Anti-Bot Detection Bypass Techniques**:
- Intelligent browser fingerprint management that simulates real user behavior
- Automatic saving and restoration of browser state to reduce verification frequency
@@ -24,6 +26,7 @@ A Playwright-based Node.js tool that bypasses search engine anti-scraping mechan
- Command-line parameter support for search keywords
- MCP server support for AI assistant integration
- Returns search results with title, link, and snippet
- URL crawler with customizable content extraction and metadata support
- JSON format output
- Support for both headless and headed modes (for debugging)
- Detailed logging output
@@ -69,35 +72,52 @@ This tool has been specially adapted for Windows environments:

## Usage

### Command Line Tool
### Command Line

```bash
# Direct command line usage
google-search "search keywords"

# Using command line options
google-search --limit 5 --timeout 60000 --no-headless "search keywords"
# Google search
npx google-search "your search query"
# Or with options
npx google-search --limit 5 "your search query"

# Bing search
npx bing-search "your search query"
# Or with options
npx bing-search --limit 5 "your search query"

# URL crawler
npx url-crawler "https://example.com"
# Or with options
npx url-crawler -s "article.main-content" -w "div.loaded-content" -t 30000 "https://example.com"
```

# Or using npx
npx google-search-cli "search keywords"
You can also use the subcommands:

# Run in development mode
pnpm dev "search keywords"
```bash
# Google search
npx google-search google "your search query"

# Run in debug mode (showing browser interface)
pnpm debug "search keywords"
# Bing search
npx google-search bing "your search query"
```

#### Command Line Options
### Options

- `-l, --limit <number>`: Result count limit (default: 10)
- `-t, --timeout <number>`: Timeout in milliseconds (default: 60000)
- `--no-headless`: Show browser interface (for debugging)
- `--remote-debugging-port <number>`: Enable remote debugging port (default: 9222)
- `--state-file <path>`: Browser state file path (default: ./browser-state.json)
#### Search Options
- `--limit <number>`: Limit the number of results (default: 10)
- `--timeout <number>`: Set timeout in milliseconds (default: 30000)
- `--state-file <path>`: Specify browser state file path (default: ./browser-state.json)
- `--no-save-state`: Don't save browser state
- `--locale <locale>`: Specify search result language (default: zh-CN)

#### URL Crawler Options
- `-s, --selector <selector>`: CSS selector to extract specific content
- `-w, --wait-for <selector>`: Wait for specified element to appear before extracting content
- `-t, --timeout <ms>`: Timeout in milliseconds (default: 30000)
- `--no-metadata`: Don't extract metadata
- `--no-headless`: Run browser in headed mode
- `--no-save-state`: Don't save browser state
- `-V, --version`: Display version number
- `-h, --help`: Display help information
- `--state-file <path>`: Specify browser state file path (default: ~/.url-crawler-browser-state.json)

#### Output Example

@@ -125,15 +145,34 @@ pnpm debug "search keywords"
}
```

#### URL Crawler Output Example

```json
{
"url": "https://example.com",
"title": "Example Domain",
"content": "Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\nMore information...",
"metadata": {
"viewport": "width=device-width, initial-scale=1"
},
"timestamp": "2025-03-06T07:44:05.698Z"
}
```
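Since the crawler emits a single JSON object, its output is straightforward to consume programmatically. The sketch below parses the sample output shown above in Node.js; the field names (`url`, `title`, `content`, `metadata`, `timestamp`) are taken from that sample and are an assumption about the tool's stable output shape.

```javascript
// Parse the crawler's JSON output and pick out the useful fields.
// The raw string mirrors the sample output shown above.
const raw = `{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "Example Domain\\n\\nThis domain is for use in illustrative examples in documents.",
  "metadata": { "viewport": "width=device-width, initial-scale=1" },
  "timestamp": "2025-03-06T07:44:05.698Z"
}`;

const page = JSON.parse(raw);
console.log(page.title);                  // → "Example Domain"
console.log(page.content.split("\n")[0]); // first line of the extracted text
console.log(Object.keys(page.metadata));  // which metadata keys were extracted
```

In a shell pipeline the same result could be obtained by piping the CLI's stdout into such a script, assuming the tool writes the JSON document to stdout.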

### MCP Server

This project provides Model Context Protocol (MCP) server functionality, allowing AI assistants like Claude to directly use Google search capabilities. MCP is an open protocol that enables AI assistants to safely access external tools and data.

```bash
# Build the project
pnpm build
# Start the MCP server
npx google-search-mcp
```

The MCP server provides three tools:
- `google-search`: For Google search
- `bing-search`: For Bing search
- `url-crawler`: For crawling and extracting content from URLs
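Under MCP, each of these tools is invoked with a JSON-RPC `tools/call` request sent over the server's stdio transport. The sketch below builds such a request for the `google-search` tool; the envelope shape comes from the MCP specification, while the argument names (`query`, `limit`) are illustrative assumptions, not confirmed parameter names of this server.

```javascript
// Build the JSON-RPC request an MCP client would send to call a tool.
// "tools/call" is the standard MCP method; the argument names below
// are assumptions for illustration only.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "google-search",
    arguments: { query: "playwright typescript", limit: 5 },
  },
};

// Clients send this, newline-delimited, to the server's stdin.
const wire = JSON.stringify(request);
console.log(wire);
```

In practice Claude Desktop constructs these requests itself; the sketch only shows what travels over the wire.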

#### Integration with Claude Desktop

1. Edit the Claude Desktop configuration file:
106 changes: 72 additions & 34 deletions README.zh-CN.md
@@ -1,19 +1,23 @@
# Google 搜索工具
# 搜索工具

这是一个基于 Playwright 的 Node.js 工具,能够绕过搜索引擎的反爬虫机制,执行 Google 搜索并提取结果。它可作为命令行工具直接使用,或通过 Model Context Protocol (MCP) 服务器为 Claude 等 AI 助手提供实时搜索能力。
一个基于 Playwright 的 Node.js 工具,能够绕过搜索引擎的反爬虫机制,执行 Google 和 Bing 搜索并提取结果。它可以直接作为命令行工具使用,也可以作为 Model Context Protocol (MCP) 服务器为 Claude 等 AI 助手提供实时搜索能力。

[![Star History Chart](https://api.star-history.com/svg?repos=web-agent-master/google-search&type=Date)](https://star-history.com/#web-agent-master/google-search&Date)

## 核心亮点
[English Documentation](README.md)

- **本地化 SERP API 替代方案**:无需依赖付费的搜索引擎结果 API 服务,完全在本地执行搜索操作
## 主要特点

- **本地 SERP API 替代方案**:无需依赖付费的搜索引擎结果 API 服务,所有搜索在本地执行
- **多搜索引擎支持**:目前支持 Google 和 Bing 搜索引擎
- **URL 内容爬取器**:可提取任何网页内容,支持自定义选择器和元数据提取
- **先进的反机器人检测绕过技术**:
- 智能浏览器指纹管理,模拟真实用户行为
- 自动保存和恢复浏览器状态,减少验证频率
- 无头/有头模式智能切换,遇到验证时自动转为有头模式让用户完成验证
- 多种设备和区域设置随机化,降低被检测风险
- **MCP 服务器集成**:为 Claude 等 AI 助手提供实时搜索能力,无需额外 API 密钥
- **完全开源免费**:所有代码开源,无使用限制,可自由定制和扩展
- 智能无头/有头模式切换,在需要验证时自动切换到有头模式
- 设备和区域设置的随机化,降低检测风险
- **MCP 服务器集成**:为 Claude 等 AI 助手提供实时搜索能力,无需额外的 API 密钥
- **完全开源和免费**:所有代码开源,无使用限制,可自由定制和扩展

## 技术特性

@@ -22,6 +26,7 @@
- 支持命令行参数输入搜索关键词
- 支持作为 MCP 服务器,为 Claude 等 AI 助手提供搜索能力
- 返回搜索结果的标题、链接和摘要
- URL 爬取器支持自定义内容提取和元数据提取
- 以 JSON 格式输出结果
- 支持无头模式和有头模式(调试用)
- 提供详细的日志输出
@@ -67,36 +72,52 @@ pnpm link

## 使用方法

### 命令行工具
### 命令行

```bash
# 直接使用命令行
google-search "搜索关键词"

# 使用命令行选项
google-search --limit 5 --timeout 60000 --no-headless "搜索关键词"

# Google 搜索
npx google-search "你的搜索查询"
# 或者带选项
npx google-search --limit 5 "你的搜索查询"

# Bing 搜索
npx bing-search "你的搜索查询"
# 或者带选项
npx bing-search --limit 5 "你的搜索查询"

# URL 爬取器
npx url-crawler "https://example.com"
# 或者带选项
npx url-crawler -s "article.main-content" -w "div.loaded-content" -t 30000 "https://example.com"
```

# 或者使用 npx
npx google-search-cli "搜索关键词"
你也可以使用子命令:

# 开发模式运行
pnpm dev "搜索关键词"
```bash
# Google 搜索
npx google-search google "你的搜索查询"

# 调试模式运行(显示浏览器界面)
pnpm debug "搜索关键词"
# Bing 搜索
npx google-search bing "你的搜索查询"
```

#### 命令行选项
### 选项

#### 搜索选项
- `--limit <number>`:限制结果数量(默认:10)
- `--timeout <number>`:设置超时时间(毫秒)(默认:30000)
- `--state-file <path>`:指定浏览器状态文件路径(默认:./browser-state.json)
- `--no-save-state`:不保存浏览器状态
- `--locale <locale>`:指定搜索结果语言(默认:zh-CN)

- `-l, --limit <number>`: 结果数量限制(默认:10)
- `-t, --timeout <number>`: 超时时间(毫秒,默认:60000)
- `--no-headless`: 显示浏览器界面(调试用)
- `--remote-debugging-port <number>`: 启用远程调试端口(默认:9222)
- `--state-file <path>`: 浏览器状态文件路径(默认:./browser-state.json)
- `--no-save-state`: 不保存浏览器状态
- `-V, --version`: 显示版本号
- `-h, --help`: 显示帮助信息
#### URL 爬取器选项
- `-s, --selector <selector>`:CSS 选择器,用于提取特定内容
- `-w, --wait-for <selector>`:等待指定元素出现后再提取内容
- `-t, --timeout <ms>`:超时时间(毫秒)(默认:30000)
- `--no-metadata`:不提取元数据
- `--no-headless`:使用有头模式运行浏览器
- `--no-save-state`:不保存浏览器状态
- `--state-file <path>`:指定浏览器状态文件路径(默认:~/.url-crawler-browser-state.json)

#### 输出示例

@@ -124,15 +145,32 @@ pnpm debug "搜索关键词"
}
```

### MCP 服务器
#### URL 爬取器输出示例

本项目提供 Model Context Protocol (MCP) 服务器功能,让 Claude 等 AI 助手直接使用 Google 搜索能力。MCP 是一个开放协议,使 AI 助手能安全访问外部工具和数据。
```json
{
"url": "https://example.com",
"title": "Example Domain",
"content": "Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\nMore information...",
"metadata": {
"viewport": "width=device-width, initial-scale=1"
},
"timestamp": "2025-03-06T07:44:05.698Z"
}
```

### MCP 服务器

```bash
# 构建项目
pnpm build
# 启动 MCP 服务器
npx google-search-mcp
```

MCP 服务器提供三个工具:
- `google-search`:用于 Google 搜索
- `bing-search`:用于 Bing 搜索
- `url-crawler`:用于爬取和提取 URL 内容

#### 与 Claude Desktop 集成

1. 编辑 Claude Desktop 配置文件
3 changes: 3 additions & 0 deletions bin/bing-search
@@ -0,0 +1,3 @@
#!/usr/bin/env node

import '../dist/src/bing-index.js';
3 changes: 3 additions & 0 deletions bin/bing-search-mcp
@@ -0,0 +1,3 @@
#!/usr/bin/env node

import '../dist/src/mcp-server.js';
3 changes: 3 additions & 0 deletions bin/bing-search-mcp.cmd
@@ -0,0 +1,3 @@
@IF EXIST "%~dp0\node.exe" (
"%~dp0\node.exe" "%~dp0\bing-search-mcp" %*
)
3 changes: 3 additions & 0 deletions bin/bing-search.cmd
@@ -0,0 +1,3 @@
@IF EXIST "%~dp0\node.exe" (
"%~dp0\node.exe" "%~dp0\bing-search" %*
)
3 changes: 3 additions & 0 deletions bin/search-mcp
@@ -0,0 +1,3 @@
#!/usr/bin/env node

import '../dist/src/mcp-server.js';
3 changes: 3 additions & 0 deletions bin/search-mcp.cmd
@@ -0,0 +1,3 @@
@IF EXIST "%~dp0\node.exe" (
"%~dp0\node.exe" "%~dp0\search-mcp" %*
)
21 changes: 21 additions & 0 deletions bin/url-crawler
@@ -0,0 +1,21 @@
#!/usr/bin/env node

// Check whether we are running in a development environment
import { fileURLToPath } from 'url';
import { dirname, resolve } from 'path';
import { existsSync } from 'fs';
import { createRequire } from 'module';

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const isDevMode = process.env.NODE_ENV === 'development' || !existsSync(resolve(__dirname, '../dist'));

if (isDevMode) {
// Development mode: run the TypeScript source with ts-node
const require = createRequire(import.meta.url);
require('ts-node').register();
await import('../src/url-crawler-test.js');
} else {
// Production mode: run the compiled JavaScript
await import('../dist/src/url-crawler-test.js');
}
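The entry-point selection in this launcher boils down to a pure decision: run the TypeScript source when `NODE_ENV` is `development` or no `dist` build exists, otherwise run the compiled output. A minimal sketch of that rule, with the environment probing factored out so the logic can be exercised directly:

```javascript
// Decide which entry point to load, mirroring bin/url-crawler's logic.
// nodeEnv and distExists stand in for process.env.NODE_ENV and the
// existsSync() check on ../dist in the real script.
function pickEntry({ nodeEnv, distExists }) {
  const isDevMode = nodeEnv === "development" || !distExists;
  return isDevMode
    ? "../src/url-crawler-test.js"       // dev: ts-node handles the source
    : "../dist/src/url-crawler-test.js"; // prod: compiled output
}

console.log(pickEntry({ nodeEnv: "production", distExists: true }));
// → "../dist/src/url-crawler-test.js"
```

Factoring the check into a function like this also makes the fallback behavior easy to unit-test without touching the filesystem.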
21 changes: 16 additions & 5 deletions package.json
@@ -1,13 +1,17 @@
{
"name": "google-search-cli",
"name": "search-cli",
"version": "1.0.0",
"description": "基于 Playwright 的 Google 搜索 CLI 工具",
"description": "基于 Playwright 的 Google 和 Bing 搜索 CLI 工具",
"type": "module",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"bin": {
"google-search": "./bin/google-search",
"google-search-mcp": "./bin/google-search-mcp"
"google-search-mcp": "./bin/google-search-mcp",
"bing-search": "./bin/bing-search",
"bing-search-mcp": "./bin/bing-search-mcp",
"search-mcp": "./bin/search-mcp",
"url-crawler": "./bin/url-crawler"
},
"scripts": {
"build": "tsc",
@@ -21,7 +25,13 @@
"link": "npm link",
"clean": "node -e \"const fs = require('fs'); const path = require('path'); if (fs.existsSync('dist')) fs.rmSync('dist', { recursive: true, force: true });\"",
"mcp": "ts-node src/mcp-server.ts",
"mcp:build": "npm run build && node dist/src/mcp-server.js"
"mcp:build": "npm run build && node dist/src/mcp-server.js",
"bing": "ts-node src/bing-index.ts",
"bing:build": "npm run build && node dist/src/bing-index.js",
"bing:test": "ts-node src/bing-index.ts \"playwright typescript\"",
"bing:debug": "ts-node src/bing-index.ts --no-headless \"playwright typescript\"",
"url-crawler": "ts-node src/url-crawler-test.ts",
"url-crawler:build": "npm run build && node dist/src/url-crawler-test.js"
},
"repository": {
"type": "git",
@@ -57,5 +67,6 @@
},
"engines": {
"node": ">=16.0.0"
}
},
"packageManager": "yarn@1.22.22+sha1.ac34549e6aa8e7ead463a7407e1c7390f61a6610"
}