This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
```bash
# Set up virtual environment and install Python dependencies
make install

# Run development servers
make start            # Backend server only
make watch            # Backend with auto-reload

# Build
make build            # Full build (web + wheel + exe)
make build-wheel      # Python wheel only
make build-exe        # PyInstaller executable

# Linting
make lint             # Both Python and web
make lint-py          # Python (flake8)
make lint-web         # Web frontend (eslint)

# Dependencies (uv)
make add-dep httpx    # Add runtime dependency
make add-dev black    # Add dev dependency
make rm-dep httpx     # Remove runtime dependency
make rm-dev black     # Remove dev dependency

# Docker commands
make docker-build     # Build application Docker image
make docker-up        # Start containers in background
make docker-down      # Stop containers
make docker-logs      # Tail container logs
```
```bash
# Run from source directly
uv run python -m lncrawl
```

- `lncrawl/__main__.py`: Entry point; calls `main()` from `lncrawl.app`
- `lncrawl/context.py`: `AppContext` singleton; manages all services via lazy-loaded properties
- `lncrawl/app.py`: CLI setup using Typer; registers subcommands (crawl, search, server, sources, config)
Location: `sources/` - organized by language (`en/`, `zh/`, `ja/`, etc.); English sources are further split alphabetically
Key Base Classes:
- `lncrawl/core/crawler.py`: `Crawler` class - extend this to create new source crawlers
  - Required: `read_novel_info()`, `download_chapter_body()`
  - Optional: `initialize()`, `login()`, `search_novel()`
- `lncrawl/core/scraper.py`: `Scraper` base - HTTP requests, BeautifulSoup parsing, Cloudflare handling
- `lncrawl/core/taskman.py`: `TaskManager` - ThreadPoolExecutor for concurrent operations
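The required/optional split above can be pictured with a plain abstract base class. This is an illustrative stand-in, not lncrawl's actual `Crawler` - the real class lives in `lncrawl/core/crawler.py` and carries much more (HTTP, parsing, state):

```python
from abc import ABC, abstractmethod


class CrawlerSketch(ABC):
    """Illustrative stand-in for lncrawl's Crawler base class."""

    # Optional hooks: the base class provides harmless defaults.
    def initialize(self) -> None: ...
    def login(self, email: str, password: str) -> None: ...
    def search_novel(self, query: str) -> list:
        return []

    # Required hooks: every source crawler must implement these.
    @abstractmethod
    def read_novel_info(self) -> None: ...

    @abstractmethod
    def download_chapter_body(self, chapter) -> str: ...


class MySourceCrawler(CrawlerSketch):
    base_url = ["https://example.com/"]  # hypothetical source URL

    def read_novel_info(self) -> None:
        # Real crawlers parse title, cover, authors, and chapter list here.
        pass

    def download_chapter_body(self, chapter) -> str:
        return "<p>chapter html</p>"
```

Subclasses that skip a required hook fail at instantiation time with a `TypeError`, which is how the ABC pattern enforces the contract.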
Crawler Registration: `lncrawl/services/sources/service.py`
- Crawlers are auto-discovered by importing Python files from the source directories
- Each crawler carries `__id__`, `base_url`, and `version` metadata
- A full-text search index enables fast source lookup
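Auto-discovery of this kind can be sketched with `importlib`. The file layout and the registry shape below are illustrative, not lncrawl's exact loader:

```python
import importlib.util
import tempfile
from pathlib import Path


def load_crawler_module(path: Path):
    """Import a single crawler file by path and return the module."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover(source_dir: Path) -> dict:
    """Map crawler __id__ -> module for every .py file in a directory."""
    registry = {}
    for path in sorted(source_dir.glob("*.py")):
        module = load_crawler_module(path)
        if hasattr(module, "__id__"):  # only register actual crawlers
            registry[module.__id__] = module
    return registry


# Demo with a throwaway crawler file:
with tempfile.TemporaryDirectory() as tmp:
    crawler_file = Path(tmp) / "mysite.py"
    crawler_file.write_text(
        '__id__ = "mysite"\nbase_url = ["https://mysite.example/"]\n'
    )
    registry = discover(Path(tmp))
    print(sorted(registry))  # ['mysite']
```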
FastAPI Server: `lncrawl/server/app.py`
- REST API at `/api`, frontend at `/`
- Endpoints in `lncrawl/server/api/`: novels, chapters, volumes, jobs, artifacts, auth, libraries
Binder Service: `lncrawl/services/binder/`
- `epub.py`: Native EPUB generation (primary format)
- `calibre.py`: Converts EPUB to other formats (MOBI, PDF, DOCX, etc.) via Calibre
- `json.py`, `text.py`: JSON and plain text formats
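Calibre conversions ultimately come down to invoking its `ebook-convert` CLI, which infers the output format from the output file's extension. A minimal sketch of how such an invocation might be built - the helper names are hypothetical, not lncrawl's actual `calibre.py` API:

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(epub: Path, out: Path) -> list:
    """Build an ebook-convert invocation; Calibre picks the output
    format from the extension of `out` (.mobi, .pdf, .docx, ...)."""
    return ["ebook-convert", str(epub), str(out)]


def convert(epub: Path, out: Path) -> None:
    """Run the conversion, failing early if Calibre is not installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert not found on PATH")
    subprocess.run(build_convert_command(epub, out), check=True)
```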
- `lncrawl/models/`: Chapter, Volume, Novel, SearchResult, Session
- `lncrawl/dao/`: Database access objects (SQLAlchemy/SQLModel)
lncrawl is structured around a set of services, each responsible for a major piece of application functionality. All core services are accessed via the singleton `AppContext` (usually `ctx` in code), which allows shared, on-demand initialization and simple dependency management throughout the app.
Key services available from the app context:
- `config`: Configuration manager (loads and saves user settings, environment variables, CLI flags)
- `db`: Database access (SQLModel; manages library, jobs, novels, chapters)
- `http`: HTTP client (handles web requests, sessions, caching, retries, Cloudflare support)
- `sources`: Source crawler registry/discovery (loads all available crawlers, search/index)
- `crawler`: The currently selected/active crawler instance (handles all scraping logic for a selected source)
- `binder`: Output/binding manager (handles EPUB generation, invoking Calibre for conversions, text export)
- `jobs`: Background job/task queue (for crawling, downloads, conversions)
- `novels`: Novel library management (add/remove novels, metadata)
- `chapters`: Chapters manager (fetch/save chapter data)
- `volumes`: Volumes manager (volume/chapter grouping, manipulation)
All services are lazily loaded as properties of the context to optimize performance and resource use.
In command-line tools, FastAPI server endpoints, and crawlers, always access shared services via `ctx` for consistency and compatibility.
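The lazy-loading behind the context can be sketched with `functools.cached_property`: each service is constructed on first attribute access and cached for the life of the context. The service classes below are illustrative stubs, not lncrawl's real implementations:

```python
from functools import cached_property


class ConfigService:
    """Stand-in for the real config service."""

    def __init__(self):
        print("config loaded")  # runs only on first access


class AppContext:
    """Each service is built once, on first attribute access."""

    @cached_property
    def config(self) -> ConfigService:
        return ConfigService()

    @cached_property
    def db(self):
        return {"engine": "sqlmodel"}  # stand-in for a real DB service


ctx = AppContext()        # nothing is constructed yet
cfg = ctx.config          # prints "config loaded"
assert ctx.config is cfg  # cached: the same instance on later access
```

Compared with eager construction in `__init__`, this keeps startup fast and means a CLI run that never touches the database never pays for opening it.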
Full guide: `.github/docs/CREATING_CRAWLERS.md`
Recommended: Copy `sources/_examples/_01_general_soup.py` and use `GeneralSoupTemplate` (`lncrawl.templates.soup.general`). Implement:
- Required: `parse_title(soup)`, `parse_cover(soup)`, `parse_chapter_list(soup)` (yield `Chapter`/`Volume`), `select_chapter_body(soup)` (return the chapter text `Tag`)
- Optional: `parse_authors(soup)`, `parse_summary(soup)`, `get_novel_soup()`, `initialize()`, `login()`
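The yield-based contract of `parse_chapter_list` can be sketched with plain dataclasses standing in for lncrawl's `Chapter` and `Volume` models; the 50-chapter grouping is a hypothetical example, not a fixed rule:

```python
from dataclasses import dataclass


@dataclass
class Volume:  # illustrative stand-in for lncrawl's Volume model
    id: int
    title: str = ""


@dataclass
class Chapter:  # illustrative stand-in for lncrawl's Chapter model
    id: int
    title: str
    url: str
    volume: int = 1


def parse_chapter_list(links):
    """Yield Volume/Chapter objects from (title, url) pairs,
    opening a new volume every 50 chapters as an example grouping."""
    for i, (title, url) in enumerate(links, start=1):
        vol_id = (i - 1) // 50 + 1
        if i % 50 == 1:  # first chapter of a volume: yield the volume too
            yield Volume(id=vol_id)
        yield Chapter(id=i, title=title, url=url, volume=vol_id)


items = list(parse_chapter_list([("Ch 1", "https://example.com/1"),
                                 ("Ch 2", "https://example.com/2")]))
print(type(items[0]).__name__, len(items))  # Volume 3
```

Yielding (rather than returning a list) lets the template register chapters incrementally while the table of contents is still being parsed.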
- For search, use `_02_searchable_soup.py` (`SearchableSoupTemplate`)
- For volumes, use `_05_with_volume_soup.py` or `_07_optional_volume_soup.py`
- For JS-heavy sites, use the browser examples (`_09`-`_17`)
- Alternative: subclass the base `Crawler` with `read_novel_info()` and `download_chapter_body()`, starting from `_00_basic.py`
Test:

```bash
uv run python -m lncrawl -s "URL" --first 3 -f
uv run python -m lncrawl sources list | grep mysite
```