A Python script to scrape publicly available novel and chapter data from jjwxc.net, a popular Chinese literature platform.
- Novel Metadata: Scrape titles, authors, genres, latest update date, and status.
- Chapter Details: Extract chapter titles, word counts, and content.
- Structured Output: Save data to JSON/TXT.
- Python
- Libraries: requests, beautifulsoup4
Install dependencies:
pip install requests beautifulsoup4
Clone the repository:
git clone https://github.com/chenxing-dev/jjwxc-scraper.git
cd jjwxc-scraper
Run the scraper:
python jjwxc_scraper.py \
  --url "https://www.jjwxc.net/bookbase.php?xx=3&sortType=1" \
  --pages 5
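The command above passes two flags, `--url` and `--pages`. As a rough illustration of how such flags could be read, here is a minimal `argparse` sketch. It is not taken from `jjwxc_scraper.py`: the function names, the help texts, the default of one page, and the validation rule are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the run command (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="jjwxc_scraper.py")
    parser.add_argument(
        "--url",
        required=True,
        help="Listing page to start scraping from",
    )
    parser.add_argument(
        "--pages",
        type=int,
        default=1,  # assumed default; the real script may differ
        help="Number of listing pages to scrape",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None) and reject non-positive page counts."""
    args = build_parser().parse_args(argv)
    if args.pages < 1:
        raise SystemExit("--pages must be at least 1")
    return args


# Passing an explicit list mirrors the shell command above.
args = parse_args(
    ["--url", "https://www.jjwxc.net/bookbase.php?xx=3&sortType=1", "--pages", "5"]
)
print(args.url, args.pages)
```

Passing an explicit list to `parse_args` keeps the example easy to try in an interactive session without touching `sys.argv`.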
Output:
- Data is saved to novels.json (or scraped_novels/*.txt).
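Once `novels.json` exists, it can be post-processed with the standard library alone. The sketch below assumes the file holds either a single record or a list of records shaped like the example in this README. The helper names (`load_novels`, `parse_count`, `summarize`) are hypothetical and not part of the scraper.

```python
import json
import re
from pathlib import Path


def parse_count(text: str) -> int:
    """Pull the first run of digits out of a value such as '208523字'; 0 if none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


def load_novels(path) -> list:
    """Read the JSON file and always return a list of novel records."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file may contain one object or a list of objects.
    return data if isinstance(data, list) else [data]


def summarize(novel: dict) -> str:
    """One-line summary built from fields that appear in the example record."""
    chapters = novel.get("chapters", [])
    count = parse_count(novel.get("word_count", ""))
    return f"{novel['title']} ({novel['author']}): {count} 字, {len(chapters)} chapter(s)"


# Usage with an in-memory record; load_novels("novels.json") would supply real ones.
sample = {
    "title": "穿越后,加入合欢宗",
    "author": "七月岸",
    "word_count": "208523字",
    "chapters": [{"number": "1", "title": "第 1 章"}],
}
print(summarize(sample))
```

Counts are stored as strings in the example record (`"208523字"`, `"3078"`), so converting them to integers before sorting or aggregating avoids lexicographic surprises.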
{
"url": "https://www.jjwxc.net/onebook.php?novelid=7487846",
"id": "7487846",
"title": "穿越后,加入合欢宗",
"author": "七月岸",
"genre": "原创-百合-架空历史-仙侠",
"brief_intro": "大师姐是块玉,也很欲…",
"tags": ["灵魂转换", "穿越时空", "仙侠修真", "爽文", "沙雕"],
"characters": ["云暖", "江淮烟"],
"themes": "坚持热爱,去遇见美好。",
"summary": "云暖穿成了修真界第一宗门的少宗主…",
"status": "连载",
"word_count": "208523字",
"collected_count": "4361",
"chapters": [
{
"number": "1",
"title": "第 1 章",
"url": "http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1",
"preview": "穿越开始",
"word_count": "3078",
"update_time": "2025-04-14 18:33:16",
"is_latest": false,
"content": "云暖望着满目残垣断壁和数不清的尸首..."
},
# ... more chapters
]
}

作品: 穿越后,加入合欢宗
作者: 七月岸
类型: 原创-百合-架空历史-仙侠
进度: 连载
字数: 208523字
URL: https://www.jjwxc.net/onebook.php?novelid=7487846
============================ 文案 ============================
云暖穿成了修真界第一宗门的少宗主…
内容标签: 灵魂转换 穿越时空 仙侠修真 爽文 沙雕
主角: 云暖, 江淮烟
一句话简介: 大师姐是块玉,也很欲…
立意: 坚持热爱,去遇见美好。
========================= 1: 第 1 章 =========================
URL: http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1
字数: 3078
云暖望着满目残垣断壁和数不清的尸首...
==================================================
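The TXT sample above appears to mirror the JSON record field by field. The following sketch shows one way such a layout could be produced from a record. It is an illustration, not the scraper's actual writer: the function names are hypothetical, the 60-character banner width is inferred from the sample, and the closing separator line is left out.

```python
WIDTH = 60  # banner width inferred from the sample output (assumption)


def banner(label: str) -> str:
    """Center ' label ' inside a row of '=' characters."""
    return f" {label} ".center(WIDTH, "=")


def render_novel(novel: dict) -> str:
    """Render one novel record into the plain-text layout shown above."""
    lines = [
        f"作品: {novel['title']}",
        f"作者: {novel['author']}",
        f"类型: {novel['genre']}",
        f"进度: {novel['status']}",
        f"字数: {novel['word_count']}",
        f"URL: {novel['url']}",
        banner("文案"),
        novel["summary"],
        "内容标签: " + " ".join(novel["tags"]),
        "主角: " + ", ".join(novel["characters"]),
        f"一句话简介: {novel['brief_intro']}",
        f"立意: {novel['themes']}",
    ]
    for chapter in novel.get("chapters", []):
        lines += [
            banner(f"{chapter['number']}: {chapter['title']}"),
            f"URL: {chapter['url']}",
            f"字数: {chapter['word_count']}",
            chapter["content"],
        ]
    return "\n".join(lines)


# Usage: the two banner styles that appear in the sample.
print(banner("文案"))
print(banner("1: 第 1 章"))
```

Keeping the renderer separate from the scraping logic means the same record can be written as JSON, TXT, or both without fetching the pages again.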
This project is for educational purposes only. Use it responsibly and respect the target website's terms of service. The developers are not liable for misuse or damages.
MIT License. See LICENSE for details.
- Website structure changes may break the scraper (update CSS selectors as needed).
- Does not handle JavaScript-rendered content (static HTML only).
Contributions are welcome! Open an issue or submit a PR for improvements or bug fixes.