Gemini Scribe

A FastAPI service for converting PDF documents to clean Markdown using Google's Gemini AI.

Features

PDF to Markdown Conversion: Advanced text extraction from PDF documents using Gemini AI
Google Cloud Integration: Native support for Google Cloud Storage and Vertex AI
High Performance: Async processing with FastAPI and optimized image handling
Production Ready: Docker containerization, error handling, and comprehensive logging
Developer Friendly: Full type hints, comprehensive tests, and development tooling

Architecture

Core Components

FastAPI Application: CORS-enabled REST API with structured endpoints
Cloud Storage Service: Google Cloud Storage integration for file operations
PDF Processing Engine: Converts PDF pages to images using pdf2image
Text Extraction: Gemini-powered image-to-markdown conversion
Configuration Management: Environment-based settings with Pydantic validation

Processing Flow

Client submits the gs:// URI of a PDF to the /extract_text endpoint
Service downloads PDF from Google Cloud Storage
PDF converted to images using pdf2image
Images processed by Gemini with structured prompts
Response parsed to extract clean Markdown
Temporary files cleaned up and results returned
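The flow above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: `run_pipeline`, `strip_code_fence`, and the three injected callables (`download`, `render_pages`, `transcribe`) are hypothetical names, and the fence-stripping step assumes the model sometimes wraps its answer in a code fence, which the README does not confirm.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Union

# The three-backtick marker, built indirectly so this listing contains no literal fence.
FENCE = "`" * 3


def strip_code_fence(text: str) -> str:
    """Drop one surrounding code fence, if the response is wrapped in one."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return "\n".join(lines).strip()


def run_pipeline(
    uri: str,
    download: Callable[[str, Path], None],      # hypothetical: fetches the PDF to a local path
    render_pages: Callable[[Path], List[Path]],  # hypothetical: one image file per PDF page
    transcribe: Callable[[Path], str],           # hypothetical: image -> raw model response
) -> Dict[str, Union[str, int]]:
    """Download, render, transcribe, parse, then clean up, in that order."""
    with tempfile.TemporaryDirectory() as workdir:
        pdf_path = Path(workdir) / "input.pdf"
        download(uri, pdf_path)
        page_images = render_pages(pdf_path)
        pages = [strip_code_fence(transcribe(image)) for image in page_images]
    # Leaving the `with` block removes workdir and every temporary file in it.
    return {"markdown": "\n\n".join(pages), "pages_processed": len(pages)}
```

Passing the download, rendering, and transcription steps in as callables keeps the sketch runnable without Google Cloud credentials; in the real service these roles would be filled by the Cloud Storage, pdf2image, and Gemini components listed under Core Components.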

Prerequisites

Python 3.12+
Poetry for dependency management
Google Cloud Platform account
Google Cloud Storage bucket
Gemini API access (Vertex AI or direct API)

Installation

Using Poetry (Recommended)

# Clone the repository
git clone https://github.com/Gal-Gilor/gemini-scribe.git
cd gemini-scribe

# Install dependencies
poetry install

# Activate virtual environment
poetry shell

Using pip

git clone https://github.com/Gal-Gilor/gemini-scribe.git
cd gemini-scribe

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt  # Generate with: poetry export -f requirements.txt --output requirements.txt

Configuration

Environment Variables

Copy the example environment file and configure:

cp .env.example .env

Set these variables in your .env file:

# Google Authentication
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1

# Gemini backend (optional): Vertex AI when True; GEMINI_API_KEY is for direct API access
GOOGLE_GENAI_USE_VERTEXAI=True
GEMINI_API_KEY=your-gemini-api-key

# Cloud Storage 
GOOGLE_CLOUD_BUCKET=your-bucket-name

# Application 
DEVELOPMENT=True
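To illustrate how these variables might be read and validated at startup, here is a rough sketch. The project itself uses Pydantic for this (see Core Components); the version below swaps in a plain dataclass so that it runs without extra dependencies. The `Settings` field names, the choice of which variables are mandatory, and the `us-central1` default are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


def as_bool(value: Optional[str]) -> bool:
    """Interpret values such as "True" or "true" as booleans; unset means False."""
    return value is not None and value.strip().lower() in TRUTHY


@dataclass(frozen=True)
class Settings:
    project: str
    location: str
    bucket: str
    use_vertexai: bool
    api_key: Optional[str]
    development: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    # Assumption: project and bucket are mandatory; the README does not say so.
    required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_BUCKET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return Settings(
        project=env["GOOGLE_CLOUD_PROJECT"],
        location=env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),  # assumed default
        bucket=env["GOOGLE_CLOUD_BUCKET"],
        use_vertexai=as_bool(env.get("GOOGLE_GENAI_USE_VERTEXAI")),
        api_key=env.get("GEMINI_API_KEY") or None,
        development=as_bool(env.get("DEVELOPMENT")),
    )
```

Note that this sketch reads variables that are already in the process environment; it does not parse the .env file itself.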

Authentication

For Google Cloud services, authenticate using:

# Service account (production)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# OR user account (development)
gcloud auth application-default login
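When debugging authentication, it can help to check which of the two options above is actually in effect. The helper below is a simplified sketch: `find_adc_source` is a hypothetical name, and it only looks at the GOOGLE_APPLICATION_CREDENTIALS variable and the file that `gcloud auth application-default login` writes on Linux and macOS. It ignores the Windows location and the metadata server that Cloud Run and other Google-hosted environments use.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_source(env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None) -> str:
    """Report which credential source a local process would most likely pick up."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # A service account key file named by the environment variable takes precedence.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        return "service-account" if Path(key_path).is_file() else "missing-key-file"

    # User credentials created by `gcloud auth application-default login` (Linux/macOS path).
    user_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if user_file.is_file():
        return "user"

    # Nothing found locally; on Google-hosted runtimes the metadata server may still apply.
    return "none-found"
```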

Usage

Start the Server

# Development
poetry run python gemini_scribe/main.py
# OR
python gemini_scribe/main.py

# Production with Uvicorn
uvicorn gemini_scribe.main:app --host 0.0.0.0 --port 8080

API Endpoints

Extract Text from PDF

POST /extract_text

Request:

{
  "uri": "gs://your-bucket/path/to/document.pdf"
}

Response:

{
  "markdown": "# Extracted Document Content\n\nDocument text converted to markdown...",
  "pages_processed": 5,
  "processing_time_seconds": 12.34
}
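The `uri` field in the request has to be split into a bucket and an object name before the PDF can be downloaded. One way this could be done is sketched below; `parse_gcs_uri` is a hypothetical helper, not necessarily how the service validates its input.

```python
from typing import Tuple

GCS_PREFIX = "gs://"


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs://bucket/object URI into (bucket, object_name)."""
    if not uri.startswith(GCS_PREFIX):
        raise ValueError("expected a gs:// URI, got: " + repr(uri))
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket, _, object_name = uri[len(GCS_PREFIX):].partition("/")
    if not bucket or not object_name:
        raise ValueError("URI must include both a bucket and an object name: " + repr(uri))
    return bucket, object_name
```

For the request shown above, `parse_gcs_uri("gs://your-bucket/path/to/document.pdf")` yields the bucket `your-bucket` and the object name `path/to/document.pdf`.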

Health Check

GET /health

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-01T12:00:00Z"
}
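A deployment script or smoke test might poll this endpoint before sending work. The following is a small sketch using only the standard library; `health_ok` and `is_healthy` are illustrative names, and the check assumes the response has the shape shown above.

```python
import json
import urllib.request


def health_ok(payload: bytes) -> bool:
    """Return True if the body is a JSON object whose status is "healthy"."""
    try:
        body = json.loads(payload)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(body, dict) and body.get("status") == "healthy"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """GET {base_url}/health and report whether the service says it is healthy."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            return resp.status == 200 and health_ok(resp.read())
    except OSError:  # connection refused, timeout, or a non-2xx HTTP status
        return False
```

For a local server started as described under Usage, the call would be `is_healthy("http://localhost:8080")`.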

Example Usage

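import requests

# Extract text from PDF
response = requests.post(
    "http://localhost:8080/extract_text",
    json={"uri": "gs://my-bucket/document.pdf"},
    timeout=300,  # conversion can take a while for long PDFs
)

if response.status_code == 200:
    result = response.json()
    print(f"Extracted {result['pages_processed']} pages")
    print(result["markdown"])
else:
    # Surface the failure instead of silently doing nothing
    print(f"Request failed with status {response.status_code}: {response.text}")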

Development

Code Quality Tools

# Format code
black .
ruff format .

# Sort imports
isort .

# Lint and auto-fix
ruff check . --fix

# Type checking (if mypy is added)
mypy gemini_scribe/

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=gemini_scribe

# Run specific test file
pytest tests/test_text_extraction.py

# Run with verbose output
pytest -v

Development Server

# Start with auto-reload
uvicorn gemini_scribe.main:app --reload --port 8080

Deployment

Docker

# Build image
docker build -t gemini-scribe .

# Run container
docker run -p 8080:8080 \
  -e GOOGLE_CLOUD_PROJECT=your-project \
  -e GOOGLE_CLOUD_BUCKET=your-bucket \
  -e GOOGLE_GENAI_USE_VERTEXAI=true \
  gemini-scribe

Google Cloud Run

# Deploy to Cloud Run
gcloud run deploy gemini-scribe-service \
    --image gcr.io/your-project/gemini-scribe \
    --platform managed \
    --region us-central1 \
    --allow-unauthenticated \
    --port 8080 \
    --memory 2Gi \
    --cpu 2 \
    --set-env-vars GOOGLE_CLOUD_PROJECT=your-project,GOOGLE_CLOUD_BUCKET=your-bucket,GOOGLE_CLOUD_LOCATION=us-central1,GOOGLE_GENAI_USE_VERTEXAI=true

Project Structure

gemini-scribe/
├── gemini_scribe/
│   ├── endpoints/      # API route handlers
│   ├── models/         # Pydantic data models
│   ├── services/       # Core business logic
│   ├── templates/      # AI prompt templates
│   ├── main.py         # FastAPI application
│   └── settings.py     # Configuration management
├── tests/              # Test suite
├── .env.example        # Environment configuration template
├── Dockerfile          # Container configuration
├── pyproject.toml      # Project dependencies
└── README.md           # This file

Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make changes following code style guidelines
Run tests: pytest
Run code quality checks: ruff check . --fix && black .
Commit changes: git commit -m "Description"
Push branch: git push origin feature-name
Create a Pull Request

Development Guidelines

Follow existing code style and patterns
Add tests for new functionality
Update documentation as needed
Use meaningful commit messages
Ensure all tests pass before submitting PR

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support

Issues: GitHub Issues
Author: Gal Gilor
