dev-chenxing/jjwxc-scraper

Web Scraper for 晋江文学城 (jjwxc.net)

A Python script to scrape publicly available novel and chapter data from jjwxc.net, a popular Chinese literature platform.

📌 Features

  • Novel Metadata: Scrape titles, authors, genres, latest update date, and status.
  • Chapter Details: Extract chapter titles, word counts, and content.
  • Structured Output: Save data to JSON/TXT.

📦 Requirements

  • Python
  • Libraries: requests, beautifulsoup4

Install dependencies:

pip install requests beautifulsoup4

🚀 Usage

  1. Clone the repository:

    git clone https://github.com/chenxing-dev/jjwxc-scraper.git
    cd jjwxc-scraper
  2. Run the scraper:

    python jjwxc_scraper.py \
         --url "https://www.jjwxc.net/bookbase.php?xx=3&sortType=1" \
         --pages 5
  3. Output:

    • Data is saved to novels.json (JSON) or to scraped_novels/*.txt (plain text).
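The two flags used in step 2 could be parsed with the standard library's argparse. This is a minimal sketch under stated assumptions, not the script's actual code: the helper name build_parser, the help strings, and the default of one page are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Scrape novel listings from jjwxc.net"
    )
    # Listing page to start from, e.g. a bookbase.php URL.
    parser.add_argument("--url", required=True, help="listing page URL")
    # Number of listing pages to walk; the default of 1 is an assumption.
    parser.add_argument("--pages", type=int, default=1, help="pages to scrape")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run as-is.
args = build_parser().parse_args(
    ["--url", "https://www.jjwxc.net/bookbase.php?xx=3&sortType=1", "--pages", "5"]
)
print(args.url, args.pages)
```

Passing type=int makes argparse reject a non-numeric --pages value before any request is sent.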

🔍 Data Structure

Novel Metadata (JSON)

{
    "url": "https://www.jjwxc.net/onebook.php?novelid=7487846",
    "id": "7487846",
    "title": "穿越后,加入合欢宗",
    "author": "七月岸",
    "genre": "原创-百合-架空历史-仙侠",
    "brief_intro": "大师姐是块玉,也很欲…",
    "tags": ["灵魂转换", "穿越时空", "仙侠修真", "爽文", "沙雕"],
    "characters": ["云暖", "江淮烟"],
    "themes": "坚持热爱,去遇见美好。",
    "summary": "云暖穿成了修真界第一宗门的少宗主…",
    "status": "连载",
    "word_count": "208523字",
    "collected_count": "4361",
    "chapters": [
        {
            "number": "1",
            "title": "第 1 章",
            "url": "http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1",
            "preview": "穿越开始",
            "word_count": "3078",
            "update_time": "2025-04-14 18:33:16",
            "is_latest": false,
            "content": "云暖望着满目残垣断壁和数不清的尸首..."
        },
        # ... more chapters
    ]
}
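Because counts are stored as strings (for example "208523字"), downstream code will likely need to convert them before doing arithmetic. The sketch below shows one way to summarize a record shaped like the sample above. The helper names parse_count and summarize are hypothetical, and the sample dict is trimmed to the fields the sketch uses.

```python
import re


def parse_count(text: str) -> int:
    """Return the first run of digits in text as an int, or 0 if none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


def summarize(novel: dict) -> dict:
    """Reduce one novel record to a few numeric facts."""
    chapters = novel.get("chapters", [])
    return {
        "id": novel["id"],
        "title": novel["title"],
        "total_words": parse_count(novel.get("word_count", "")),
        "chapter_count": len(chapters),
        # Sum of the per-chapter word counts that were actually scraped.
        "scraped_words": sum(parse_count(c.get("word_count", "")) for c in chapters),
    }


# Trimmed copy of the sample record, with a single chapter.
sample = {
    "id": "7487846",
    "title": "穿越后,加入合欢宗",
    "word_count": "208523字",
    "chapters": [{"number": "1", "title": "第 1 章", "word_count": "3078"}],
}

print(summarize(sample))
```

To run this against real output, load the file with json.load first. Whether novels.json holds a single object or a list of them is not stated here, so check the top-level type before iterating.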

Plain Text

作品: 穿越后,加入合欢宗
作者: 七月岸
类型: 原创-百合-架空历史-仙侠
进度: 连载
字数: 208523字
URL: https://www.jjwxc.net/onebook.php?novelid=7487846

============================ 文案 ============================
云暖穿成了修真界第一宗门的少宗主…

内容标签: 灵魂转换 穿越时空 仙侠修真 爽文 沙雕
主角: 云暖, 江淮烟
一句话简介: 大师姐是块玉,也很欲…
立意: 坚持热爱,去遇见美好。

========================= 1: 第 1 章 =========================
URL: http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1
字数: 3078

云暖望着满目残垣断壁和数不清的尸首...

==================================================
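The plain-text layout above can be reproduced from a JSON record with a few f-strings. This is a rough sketch rather than the script's own formatter: the function names header and render_novel are hypothetical, the 60-column header width is inferred from the sample, and the closing separator line is left out.

```python
def header(title: str, width: int = 60) -> str:
    """Center a title between runs of '=' (width inferred from the sample)."""
    return f" {title} ".center(width, "=")


def render_novel(novel: dict) -> str:
    """Render one novel record in the plain-text layout shown above."""
    lines = [
        f"作品: {novel['title']}",
        f"作者: {novel['author']}",
        f"类型: {novel['genre']}",
        f"进度: {novel['status']}",
        f"字数: {novel['word_count']}",
        f"URL: {novel['url']}",
        "",
        header("文案"),
        novel["summary"],
        "",
        f"内容标签: {' '.join(novel['tags'])}",
        f"主角: {', '.join(novel['characters'])}",
        f"一句话简介: {novel['brief_intro']}",
        f"立意: {novel['themes']}",
    ]
    # One section per chapter: header, URL, word count, then the text.
    for ch in novel.get("chapters", []):
        lines += [
            "",
            header(f"{ch['number']}: {ch['title']}"),
            f"URL: {ch['url']}",
            f"字数: {ch['word_count']}",
            "",
            ch["content"],
        ]
    return "\n".join(lines)


sample = {
    "url": "https://www.jjwxc.net/onebook.php?novelid=7487846",
    "title": "穿越后,加入合欢宗",
    "author": "七月岸",
    "genre": "原创-百合-架空历史-仙侠",
    "status": "连载",
    "word_count": "208523字",
    "summary": "云暖穿成了修真界第一宗门的少宗主…",
    "tags": ["灵魂转换", "穿越时空", "仙侠修真", "爽文", "沙雕"],
    "characters": ["云暖", "江淮烟"],
    "brief_intro": "大师姐是块玉,也很欲…",
    "themes": "坚持热爱,去遇见美好。",
    "chapters": [
        {
            "number": "1",
            "title": "第 1 章",
            "url": "http://www.jjwxc.net/onebook.php?novelid=7487846&chapterid=1",
            "word_count": "3078",
            "content": "云暖望着满目残垣断壁和数不清的尸首...",
        }
    ],
}

print(render_novel(sample))
```

Note that str.center counts characters, not display columns, so headers containing CJK text look wider than 60 columns in a monospace terminal.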

🛑 Disclaimer

This project is for educational purposes only. Use it responsibly and respect the target website's terms of service. The developers are not liable for misuse or damages.

📜 License

MIT License. See LICENSE for details.

🔧 Limitations

  • Website structure changes may break the scraper (update CSS selectors as needed).
  • Does not handle JavaScript-rendered content (static HTML only).

🤝 Contributing

Contributions are welcome! Open an issue or submit a PR for improvements or bug fixes.
