MZKScraper is a Python API wrapper for the Moravská Zemská Knihovna Digital Library, enabling users to search, retrieve, and process publicly available documents using flexible query parameters.
The MZKScraper class provides a simple interface for discovering document UUIDs that match your criteria. Once retrieved, these UUIDs can be used to access detailed information or content via the IIIF API.
For example, the get_pages_in_document method returns UUIDs of a document’s individual pages, which can then be downloaded with the download_image method.
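To illustrate what fetching a page through IIIF involves, the sketch below builds an image URL from a page identifier using the IIIF Image API template (identifier/region/size/rotation/quality.format). The base URL and the helper name iiif_image_url are placeholders chosen for this example, not part of MZKScraper. The size keyword a server accepts depends on its IIIF version, so treat the defaults as assumptions.

```python
from urllib.parse import quote

# Hypothetical endpoint used only for illustration; replace with the real IIIF base URL.
IIIF_BASE = "https://iiif.example.org/images"


def iiif_image_url(identifier, region="full", size="max", rotation=0,
                   quality="default", fmt="jpg", base=IIIF_BASE):
    """Build an IIIF Image API URL for a single page (illustrative helper)."""
    # IIIF requires the identifier to be URI-encoded, so ":" and "/" are escaped too.
    encoded = quote(identifier, safe="")
    return f"{base}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Request the full page at the largest available size.
url = iiif_image_url("uuid:1234-abcd")
print(url)
```

In normal use the download_image method makes this step unnecessary; the sketch is only useful if you want to request a specific region or size yourself.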
- Search the MZK digital collection using multiple parameters (text, authors, keywords, access rights, etc.).
- Retrieve document UUIDs for further metadata or content queries.
- Automatically fetch citation data from the MZK API.
- Convert document UUIDs into BibTeX citations with unique tags (optionally including page UUIDs for page-specific references).
- Generate ISO 690 citations via the Citation class or directly from the API as plain text.
- Use get_pages_in_document with optional parameters like valid_labels, label_preprocessing, and label_formatting to filter or process pages before downloading.
- Quickly open any document or page in your default web browser with open_in_browser(document_id, page_id=None).
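The page-filtering options can be pictured with a small standalone sketch. It assumes one plausible reading of the three parameters: label_preprocessing normalizes each raw page label, valid_labels keeps only matching labels, and label_formatting rewrites the labels that survive. The function filter_pages and the sample data are hypothetical and do not reproduce the library's implementation.

```python
def filter_pages(pages, valid_labels=None, label_preprocessing=None,
                 label_formatting=None):
    """Filter (label, page_uuid) pairs; an assumed model, not MZKScraper code."""
    selected = []
    for label, page_uuid in pages:
        # Normalize the raw label first, e.g. strip brackets or whitespace.
        if label_preprocessing is not None:
            label = label_preprocessing(label)
        # Drop pages whose normalized label is not wanted.
        if valid_labels is not None and label not in valid_labels:
            continue
        # Reformat the kept label, e.g. zero-pad it for file names.
        if label_formatting is not None:
            label = label_formatting(label)
        selected.append((label, page_uuid))
    return selected


# Made-up page labels and identifiers.
pages = [(" 1 ", "uuid-a"), ("[2]", "uuid-b"), ("cover", "uuid-c")]
result = filter_pages(
    pages,
    valid_labels={"1", "2"},
    label_preprocessing=lambda s: s.strip(" []"),
    label_formatting=lambda s: s.zfill(3),
)
print(result)
```

Preprocessing before filtering means valid_labels can be written against clean labels rather than every raw variant that appears in the catalogue.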
Install directly from GitHub:
pip install git+https://github.com/v-dvorak/mzkscraper

Or use it as a Git submodule in your own project:
git submodule add https://github.com/v-dvorak/mzkscraper
cd mzkscraper
python -m pip install -r requirements.txt
python -m pip install -e .

For example usage, see example.ipynb.
Supported query parameters:
- text_query
- access
- keywords
- authors
- languages
- licenses
- locations
- publishers
- places
- genres
- doctypes
- published_from
- published_to
For full details, refer to the Digital Library documentation.
If no results are returned:
- Validate your query manually in the digital library. If you see the message “Attention! No results found. Please, try a different query.”, the parameters may be invalid or overly restrictive.
- Check spelling and diacritics.
Example: authors="Komensky, Jan Amos" finds nothing, whereas authors="Komenský, Jan Amos" returns a list of books.
- Try longer timeouts.
Pages with multiple filters take longer to load. Increase the timeout parameter if necessary.
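For the diacritics check above, a quick way to tell whether a query value is merely an ASCII-folded form of the catalogued name is to strip combining marks and compare. This is a minimal standard-library sketch; both helper names are illustrative and not part of MZKScraper.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks removed (e.g. 'ý' becomes 'y')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def differs_only_by_diacritics(a, b):
    """True if a and b are different strings that match once diacritics are removed."""
    same_text = unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
    return not same_text and strip_diacritics(a) == strip_diacritics(b)


typed = "Komensky, Jan Amos"
catalogued = "Komenský, Jan Amos"
print(differs_only_by_diacritics(typed, catalogued))
```

If the check is true for a name you copied from the catalogue, the query most likely needs the accented spelling.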
Interactions with MZK or IIIF may occasionally result in 4xx or 5xx errors. These usually originate from the source service, so wait a moment and retry.
For simplicity, MZKScraper does not attempt to handle or retry these automatically.
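Because retries are left to the caller, a small wrapper with exponential backoff is one way to handle transient failures. The sketch below is generic: the helper retry is not part of MZKScraper, and which exception types the library raises on a 4xx or 5xx response is an assumption you should verify before narrowing retry_on.

```python
import time


def retry(func, attempts=3, base_delay=1.0, retry_on=(Exception,),
          sleep=time.sleep):
    """Call func() until it succeeds, waiting 1x, 2x, 4x... base_delay between tries."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in call that fails twice before succeeding.
calls = []


def flaky_request():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated 503 from the source service")
    return "ok"


delays = []
outcome = retry(flaky_request, sleep=delays.append)  # record delays instead of sleeping
print(outcome, delays)
```

To use it with the scraper, wrap the failing call in a lambda or a small function and pass that as func. Retrying is only worthwhile for transient errors; a 4xx caused by an invalid query will fail the same way every time.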