Web Scraper for 晋江文学城 (jjwxc.net)

A Python script to scrape publicly available novel and chapter data from jjwxc.net, a popular Chinese literature platform.

📌 Features

Novel Metadata: Scrape titles, authors, genres, latest update date, and status.
Chapter Details: Extract chapter titles, word counts, and content.
Structured Output: Save data to JSON/TXT.

📦 Requirements

Python
Libraries: requests, beautifulsoup4

Install dependencies:

pip install requests beautifulsoup4

🚀 Usage

Clone the repository:

git clone https://github.com/chenxing-dev/jjwxc-scraper.git
cd jjwxc-scraper

Run the scraper:

python jjwxc_scraper.py \
     --url "https://www.jjwxc.net/bookbase.php?xx=3&sortType=1" \
     --pages 5
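
The two flags in the command above could be parsed along the following lines. This is only a sketch: the actual parser inside jjwxc_scraper.py is not shown here, so the function name `build_parser`, the help strings, and the default of one page are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags used in the example command."""
    parser = argparse.ArgumentParser(description="Scrape novel listings from jjwxc.net")
    # --url: listing page to start from (treated as required here, by assumption).
    parser.add_argument("--url", required=True, help="listing page URL to start from")
    # --pages: number of listing pages to walk; the default of 1 is an assumption.
    parser.add_argument("--pages", type=int, default=1, help="number of listing pages to scrape")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--url", "https://www.jjwxc.net/bookbase.php?xx=3&sortType=1", "--pages", "5"]
)
print(args.url, args.pages)
```

Declaring `--pages` with `type=int` lets argparse reject non-numeric values before any request is made.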

Output:
- Data is saved to novels.json (or scraped_novels/*.txt).
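
As a rough illustration of the JSON output step, the sketch below writes and reads a list of novel dictionaries. It assumes novels.json holds a list of novel objects, which this README does not confirm, and the helper names `save_novels` and `load_novels` are hypothetical.

```python
import json
from pathlib import Path


def save_novels(novels: list, path: str = "novels.json") -> None:
    """Write novels as UTF-8 JSON (assumed layout: a list of novel objects)."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    text = json.dumps(novels, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_novels(path: str = "novels.json") -> list:
    """Read the file back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding, which matters for Chinese titles on Windows.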

🔍 Data Structure

Novel Metadata (JSON)

{
    "url": "https://www.jjwxc.net/onebook.php?novelid=7487846",
    "id": "7487846",
    "title": "穿越后，加入合欢宗",
    "author": "七月岸",
    "genre": "原创-百合-架空历史-仙侠",
    "brief_intro": "大师姐是块玉，也很欲…",
    "tags": ["灵魂转换", "穿越时空", "仙侠修真", "爽文", "沙雕"],
    "characters": ["云暖", "江淮烟"],
    "themes": "坚持热爱，去遇见美好。",
    "summary": "云暖穿成了修真界第一宗门的少宗主…",
    "status": "连载",
    "word_count": "208523字",
    "collected_count": "4361",
    "chapters": [
        {
            "number": "1",
            "title": "第 1 章",
            "url": "http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1",
            "preview": "穿越开始",
            "word_count": "3078",
            "update_time": "2025-04-14 18:33:16",
            "is_latest": false,
            "content": "云暖望着满目残垣断壁和数不清的尸首..."
        },
        # ... more chapters
    ]
}
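
Because counts are stored as strings (for example "208523字"), downstream code has to convert them before doing arithmetic. The sketch below shows one way to do that for a single novel object shaped like the example above. The helpers `parse_count` and `summarize` are hypothetical and not part of the scraper.

```python
import re


def parse_count(text: str) -> int:
    """Extract the first run of ASCII digits, e.g. "208523字" -> 208523."""
    match = re.search(r"[0-9]+", text)
    return int(match.group()) if match else 0


def summarize(novel: dict) -> dict:
    """Compare the declared word count with the sum over scraped chapters."""
    chapters = novel.get("chapters", [])
    return {
        "title": novel["title"],
        "declared_words": parse_count(novel.get("word_count", "")),
        "scraped_chapters": len(chapters),
        "scraped_words": sum(parse_count(c.get("word_count", "")) for c in chapters),
    }


# Trimmed copy of the example record, keeping only the fields used here.
sample = {
    "title": "穿越后，加入合欢宗",
    "word_count": "208523字",
    "chapters": [{"number": "1", "word_count": "3078"}],
}
print(summarize(sample))
```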

Plain Text

作品: 穿越后，加入合欢宗
作者: 七月岸
类型: 原创-百合-架空历史-仙侠
进度: 连载
字数: 208523字
URL: https://www.jjwxc.net/onebook.php?novelid=7487846

============================ 文案 ============================
云暖穿成了修真界第一宗门的少宗主…

内容标签: 灵魂转换 穿越时空 仙侠修真 爽文 沙雕
主角: 云暖, 江淮烟
一句话简介: 大师姐是块玉，也很欲…
立意: 坚持热爱，去遇见美好。

========================= 1: 第 1 章 =========================
URL: http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1
字数: 3078

云暖望着满目残垣断壁和数不清的尸首...

==================================================
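
The plain-text layout above can be approximated from a novel dictionary as sketched below. The field-to-label mapping is read off the two samples in this section. The 60-character banner width is inferred from the sample and may not match the script exactly, and the function names are hypothetical.

```python
WIDTH = 60  # assumed banner width, inferred from the sample output


def banner(label: str) -> str:
    """Center a label between '=' characters, e.g. the 文案 separator."""
    return f" {label} ".center(WIDTH, "=")


def render_header(novel: dict) -> str:
    """Render the metadata block and synopsis for one novel."""
    lines = [
        f"作品: {novel['title']}",
        f"作者: {novel['author']}",
        f"类型: {novel['genre']}",
        f"进度: {novel['status']}",
        f"字数: {novel['word_count']}",
        f"URL: {novel['url']}",
        "",
        banner("文案"),
        novel["summary"],
        "",
        "内容标签: " + " ".join(novel["tags"]),
        "主角: " + ", ".join(novel["characters"]),
        f"一句话简介: {novel['brief_intro']}",
        f"立意: {novel['themes']}",
    ]
    return "\n".join(lines)


def render_chapter(chapter: dict) -> str:
    """Render one chapter: banner, URL, word count, then the content."""
    lines = [
        banner(f"{chapter['number']}: {chapter['title']}"),
        f"URL: {chapter['url']}",
        f"字数: {chapter['word_count']}",
        "",
        chapter["content"],
    ]
    return "\n".join(lines)
```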

🛑 Disclaimer

This project is for educational purposes only. Use it responsibly and respect the target website's terms of service. The developers are not liable for misuse or damages.

📜 License

MIT License. See LICENSE for details.

🔧 Limitations

Website structure changes may break the scraper (update CSS selectors as needed).
Does not handle JavaScript-rendered content (static HTML only).
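
One way to soften the first limitation is to try several CSS selectors in order and fall back to a default when none match, so a layout change yields an empty field rather than a crash. The sketch below is a hypothetical helper, not code from the scraper. It relies only on `select_one` and `get_text`, both of which BeautifulSoup objects provide.

```python
def first_text(node, selectors, default=""):
    """Return the stripped text of the first selector that matches.

    `node` is any object exposing select_one(css), such as a BeautifulSoup
    document or tag. `default` is returned when no selector matches.
    """
    for css in selectors:
        found = node.select_one(css)
        if found is not None:
            return found.get_text(strip=True)
    return default


# Example with BeautifulSoup. The selectors are hypothetical placeholders,
# not jjwxc.net's real markup:
#   soup = BeautifulSoup(html, "html.parser")
#   title = first_text(soup, ["h1 span", "h1"], default="")
```

Listing the newest selector first and older ones after it keeps the scraper working across a site redesign while the old layout is still being served.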

🤝 Contributing

Contributions are welcome! Open an issue or submit a PR for improvements or bug fixes.
