Book Metadata Extractor

A web application that extracts metadata from book images using OCR and outputs the results in JSON format.

Features

- Extracts metadata from book cover images and title pages
- Supports multiple image uploads (batch processing)
- Extracts the following metadata:
  - Title
  - Authors
  - ISBN (10 or 13 digits)
  - Publishers
  - Publication date
  - Edition
- Responsive web interface with drag-and-drop support
- Outputs clean, structured JSON data
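
OCR output is noisy, so an ISBN candidate is worth validating before it is reported. The following is a minimal sketch of how a 10- or 13-digit ISBN could be located in OCR text and checked against its check digit. The names `find_isbn`, `is_valid_isbn10`, and `is_valid_isbn13` are hypothetical and are not taken from `metadata_extractor.py`; the project's actual extraction logic may differ.

```python
import re
from typing import Optional

# Candidate: optional 978/979 prefix, nine digits, then a final digit or X,
# with optional hyphens or spaces between characters.
_ISBN_CANDIDATE = re.compile(
    r"(?<![0-9A-Za-z])(?:97[89][\s-]?)?(?:\d[\s-]?){9}[\dXx](?![0-9A-Za-z])",
    re.ASCII,
)


def is_valid_isbn10(isbn: str) -> bool:
    """Check an ISBN-10 (no separators) using the weighted mod-11 rule."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn, re.ASCII):
        return False
    # Weights run from 10 down to 1; a trailing X stands for the value 10.
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Check an ISBN-13 (no separators) using alternating 1/3 weights mod 10."""
    if not re.fullmatch(r"\d{13}", isbn, re.ASCII):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(isbn))
    return total % 10 == 0


def find_isbn(text: str) -> Optional[str]:
    """Return the first valid ISBN in text without separators, or None."""
    for match in _ISBN_CANDIDATE.finditer(text):
        candidate = re.sub(r"[\s-]", "", match.group()).upper()
        if len(candidate) == 13 and is_valid_isbn13(candidate):
            return candidate
        if len(candidate) == 10 and is_valid_isbn10(candidate):
            return candidate
    return None
```

For example, `find_isbn("ISBN 978-0-306-40615-7 Printed in USA")` returns `"9780306406157"`, while a ten-digit order number such as `1234567890` fails the checksum and yields `None`, which maps onto the `null` value the schema allows for `isbn`.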

JSON Schema

The extracted metadata follows this schema:

```
{
  "title": "string | null",
  "authors": ["string"],
  "isbn": "string | null",
  "publishers": ["string"],
  "publication_date": "string | null",
  "edition": "string | null",
  "filename": "string"
}
```
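
To make the schema concrete, here is a small sketch that models one record as a Python dataclass and serializes it to JSON. The class name `BookMetadata`, the `SCHEMA_KEYS` constant, and the sample values are illustrative assumptions, not code from this repository. Fields that OCR could not recover default to `None` (serialized as `null`) or to an empty list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Key order as listed in the schema above.
SCHEMA_KEYS = (
    "title", "authors", "isbn", "publishers",
    "publication_date", "edition", "filename",
)


@dataclass
class BookMetadata:
    """One extracted record; only filename is always known."""
    filename: str
    title: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    isbn: Optional[str] = None
    publishers: List[str] = field(default_factory=list)
    publication_date: Optional[str] = None
    edition: Optional[str] = None

    def to_json(self) -> str:
        # Reorder the fields so the output follows the schema's key order.
        data = asdict(self)
        ordered = {key: data[key] for key in SCHEMA_KEYS}
        return json.dumps(ordered, indent=2, ensure_ascii=False)


# Hypothetical record for an image where only the title and author were readable.
record = BookMetadata(
    filename="cover.jpg",
    title="Example Title",
    authors=["A. Author"],
)
print(record.to_json())
```

Keeping list fields as empty lists rather than `null` matches the schema, where only the scalar fields are marked as nullable.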

Installation

Install Tesseract OCR
- Windows: Download and run the Tesseract installer provided by UB Mannheim
- macOS: brew install tesseract
- Linux (Debian/Ubuntu): sudo apt install tesseract-ocr

Clone the repository

```
git clone <repository-url>
cd book_metadata_extractor
```

Create a virtual environment (recommended)

```
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate
```

Install Python dependencies
```
pip install -r requirements.txt
```
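
After installation, a quick check can confirm that the Tesseract binary and the Python packages are visible to the interpreter. The script below is a sketch that uses only the standard library; the function name `check_environment` is hypothetical, and the module names are the usual import names for the packages listed under Dependencies (`cv2` for opencv-python, `PIL` for Pillow, `dateutil` for python-dateutil).

```python
import importlib.util
import shutil
from typing import Dict

# Import names, which differ from some of the pip package names.
PYTHON_MODULES = ("flask", "pytesseract", "cv2", "PIL", "dateutil")


def check_environment() -> Dict[str, bool]:
    """Report whether the tesseract binary and each module can be found."""
    # shutil.which searches PATH, the same way a shell would.
    status = {"tesseract (binary)": shutil.which("tesseract") is not None}
    for name in PYTHON_MODULES:
        # find_spec locates a module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the binary is reported missing on Windows even though Tesseract is installed, the install directory is probably not on `PATH`. In that case either add it to `PATH` or point pytesseract at the executable through `pytesseract.pytesseract.tesseract_cmd`.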

Usage

Start the application
```
python app.py
```
Open your web browser and visit http://localhost:5000
Upload book images
- Drag and drop images or click to select files
- Click "Process Images" to extract metadata

Dependencies

- Python 3.7+
- Tesseract OCR
- Flask
- pytesseract
- opencv-python
- Pillow
- python-dateutil

License

This project is open source and available under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
