MZKScraper

MZKScraper is a Python API wrapper for the Moravská Zemská Knihovna (MZK) Digital Library. It lets users search, retrieve, and process publicly available documents using flexible query parameters.

The MZKScraper class provides a simple interface for discovering document UUIDs that match your criteria. Once retrieved, these UUIDs can be used to access detailed information or content via the IIIF API. For example, the get_pages_in_document method returns UUIDs of a document’s individual pages, which can then be downloaded with the download_image method.
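The IIIF Image API defines a fixed URL pattern (identifier, region, size, rotation, quality, format), so a page UUID can be turned into an image URL roughly as follows. This is a minimal sketch, not MZKScraper's own code: the function name `build_iiif_image_url` and the base URL `https://iiif.example.org/images` are placeholders, and the real endpoint is an assumption to check against the library's documentation.

```python
from urllib.parse import quote


def build_iiif_image_url(
    base_url: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Assemble an IIIF Image API request URL.

    The path layout {identifier}/{region}/{size}/{rotation}/{quality}.{format}
    comes from the IIIF Image API; base_url is a hypothetical placeholder.
    """
    # Percent-encode characters such as "/" in the identifier, keeping ":" readable.
    encoded_id = quote(identifier, safe=":")
    return f"{base_url.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical page UUID and placeholder server, for illustration only.
url = build_iiif_image_url("https://iiif.example.org/images", "uuid:0000-page")
print(url)
```

In normal use the download_image method should make this unnecessary; the sketch only illustrates what an IIIF image request looks like.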

Features

Document Search

Search the MZK digital collection using multiple parameters (text, authors, keywords, access rights, etc.).
Retrieve document UUIDs for further metadata or content queries.

Citation Retrieval

Automatically fetch citation data from the MZK API.
Convert document UUIDs into BibTeX citations with unique tags (optionally including page UUIDs for page-specific references).
Generate ISO 690 citations via the Citation class or directly from the API as plain text.

Page Handling

Use get_pages_in_document with optional parameters like valid_labels, label_preprocessing, and label_formatting to filter or process pages before downloading.
Quickly open any document or page in your default web browser with open_in_browser(document_id, page_id=None).
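The interplay of valid_labels, label_preprocessing, and label_formatting can be pictured with the sketch below. It is an illustration of the idea only: the helper `filter_pages`, the `(page_id, label)` tuple shape, and the order in which the three options are applied are all assumptions, not the actual behaviour of get_pages_in_document.

```python
from typing import Callable, Collection, Iterable, List, Optional, Tuple

Page = Tuple[str, str]  # (page UUID, page label); this shape is an assumption


def filter_pages(
    pages: Iterable[Page],
    valid_labels: Optional[Collection[str]] = None,
    label_preprocessing: Optional[Callable[[str], str]] = None,
    label_formatting: Optional[Callable[[str], str]] = None,
) -> List[Page]:
    """Hypothetical stand-in showing one way the three options could combine."""
    selected: List[Page] = []
    for page_id, label in pages:
        # 1. Clean the raw label (for example, strip brackets and whitespace).
        cleaned = label_preprocessing(label) if label_preprocessing else label
        # 2. Keep the page only if the cleaned label is allowed.
        if valid_labels is not None and cleaned not in valid_labels:
            continue
        # 3. Format the label for output (for example, zero-pad page numbers).
        shown = label_formatting(cleaned) if label_formatting else cleaned
        selected.append((page_id, shown))
    return selected


# Made-up page data for illustration.
pages = [("uuid:p1", " [1] "), ("uuid:p2", "2"), ("uuid:p3", "cover")]
kept = filter_pages(
    pages,
    valid_labels={"1", "2"},
    label_preprocessing=lambda s: s.strip(" []"),
    label_formatting=lambda s: s.zfill(3),
)
print(kept)
```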

Installation

Install directly from GitHub:

pip install git+https://github.com/v-dvorak/mzkscraper

Or use it as a Git submodule in your own project:

git submodule add https://github.com/v-dvorak/mzkscraper
cd mzkscraper
python -m pip install -r requirements.txt
python -m pip install -e .

For example usage, see example.ipynb.

Supported Query Parameters

text_query
access
keywords
authors
languages
licenses
locations
publishers
places
genres
doctypes
published_from
published_to

For full details, refer to the Digital Library documentation.
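A misspelled parameter name is easy to miss, so it can help to check keyword arguments against the list above before building a query. The helper below is a hypothetical sketch (the name `check_query_params` is not part of MZKScraper), and it assumes only the parameter names listed in this section; value types are left unchecked because they are not specified here.

```python
SUPPORTED_PARAMS = frozenset({
    "text_query", "access", "keywords", "authors", "languages",
    "licenses", "locations", "publishers", "places", "genres",
    "doctypes", "published_from", "published_to",
})


def check_query_params(**params):
    """Reject unknown parameter names and drop entries whose value is None."""
    unknown = sorted(set(params) - SUPPORTED_PARAMS)
    if unknown:
        raise ValueError(f"Unsupported query parameter(s): {', '.join(unknown)}")
    return {name: value for name, value in params.items() if value is not None}


# Only the non-None entry survives the check.
query = check_query_params(authors="Komenský, Jan Amos", access=None)
print(query)
```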

Troubleshooting

Empty Results

If no results are returned:

- Validate your query manually in the digital library. If you see the message “Attention! No results found. Please, try a different query.”, the parameters may be invalid or overly restrictive.
- Check spelling and diacritics. For example:
  - authors="Komensky, Jan Amos" returns nothing,
  - authors="Komenský, Jan Amos" returns a list of books.
- Try a longer timeout. Pages with multiple filters take longer to load, so increase the timeout parameter if necessary.
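Because "Komensky" and "Komenský" are different strings, a quick check can flag query values that may have lost their diacritics. The sketch below uses only the standard library; the helper names are hypothetical, and the idea that an ASCII-only Czech name is suspicious is a heuristic, not a rule enforced by MZKScraper.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining accents removed (for comparison only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def looks_unaccented(value: str) -> bool:
    """Heuristic: an ASCII-only value may be missing its diacritics."""
    return value.isascii()


def normalize_query_value(value: str) -> str:
    """Compose characters (NFC) so "y" plus a combining accent becomes "ý"."""
    return unicodedata.normalize("NFC", value)


for author in ("Komensky, Jan Amos", "Komenský, Jan Amos"):
    if looks_unaccented(author):
        print(f"{author!r}: ASCII only, check for missing diacritics")
    else:
        print(f"{author!r}: contains non-ASCII characters")
```

The stripped form is useful for spotting near-duplicates in your own data; the query itself should use the accented spelling.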

Handling API Errors

Requests to MZK or IIIF may occasionally fail with 4xx or 5xx errors. These usually originate from the source service, so wait briefly and retry. For simplicity, MZKScraper does not handle or retry them automatically.
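Since retries are left to the caller, a small wrapper with exponential backoff is one way to add them. This is a generic sketch under stated assumptions: `call_with_retry` is a hypothetical helper, and which exception types MZKScraper raises on HTTP errors is not documented here, so the `retry_on` tuple must be set by the caller.

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_request, retry_on=(ConnectionError,), sleep=delays.append)
print(result, delays)
```

Any scraper call can be wrapped the same way by passing the bound method and its arguments. Narrowing `retry_on` avoids retrying on errors that a second attempt cannot fix, such as an invalid query.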
