A FastAPI service for converting PDF documents to clean Markdown using Google's Gemini AI.
- PDF to Markdown Conversion: Advanced text extraction from PDF documents using Gemini AI
- Google Cloud Integration: Native support for Google Cloud Storage and Vertex AI
- High Performance: Async processing with FastAPI and optimized image handling
- Production Ready: Docker containerization, error handling, and comprehensive logging
- Developer Friendly: Full type hints, comprehensive tests, and development tooling
- FastAPI Application: CORS-enabled REST API with structured endpoints
- Cloud Storage Service: Google Cloud Storage integration for file operations
- PDF Processing Engine: Converts PDF pages to images using pdf2image
- Text Extraction: Gemini-powered image-to-markdown conversion
- Configuration Management: Environment-based settings with Pydantic validation
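As a rough illustration of the environment-based settings component, the sketch below loads and validates configuration from environment variables. The project itself uses Pydantic for this; the example swaps in a standard-library dataclass so it runs without extra dependencies. The `Settings` class, its field names, the choice of which variables are required, and the `us-central1` default are all assumptions made for illustration, not the project's actual `settings.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'True', 'true', or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container (stand-in for a Pydantic model)."""

    google_cloud_project: str
    google_cloud_bucket: str
    google_cloud_location: str = "us-central1"  # assumed default
    use_vertexai: bool = False
    development: bool = False

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        # Assumption: project and bucket are treated as required here.
        required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_BUCKET")
        missing = [name for name in required if not env.get(name)]
        if missing:
            raise ValueError(f"Missing required settings: {', '.join(missing)}")
        return cls(
            google_cloud_project=env["GOOGLE_CLOUD_PROJECT"],
            google_cloud_bucket=env["GOOGLE_CLOUD_BUCKET"],
            google_cloud_location=env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
            use_vertexai=_as_bool(env.get("GOOGLE_GENAI_USE_VERTEXAI", "false")),
            development=_as_bool(env.get("DEVELOPMENT", "false")),
        )
```

Calling `Settings.from_env()` with no argument reads from the process environment; passing a plain dict makes the loader easy to exercise in tests.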
- Client sends the PDF's gs:// URI to the /extract_text endpoint
- Service downloads PDF from Google Cloud Storage
- PDF converted to images using pdf2image
- Images processed by Gemini with structured prompts
- Response parsed to extract clean Markdown
- Temporary files cleaned up and results returned
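The flow above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the service's actual code: `parse_gcs_uri`, `strip_fence`, and `run_extraction` are hypothetical names, and the download, page-rendering, and Gemini steps are passed in as plain callables so the example stays runnable without Google Cloud, pdf2image, or Gemini credentials. The assumption that the model may wrap its reply in a Markdown code fence is also illustrative.

```python
import re
import tempfile
import time
from pathlib import Path
from typing import Callable, Sequence
from urllib.parse import urlparse

# Matches a reply wrapped in a single ``` or ```markdown fence (assumed format).
_FENCE = re.compile(r"^```(?:markdown|md)?[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)


def parse_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/file.pdf' into (bucket, object_name)."""
    parsed = urlparse(uri)
    object_name = parsed.path.lstrip("/")
    if parsed.scheme != "gs" or not parsed.netloc or not object_name:
        raise ValueError(f"Not a valid gs:// URI: {uri!r}")
    return parsed.netloc, object_name


def strip_fence(text: str) -> str:
    """Return the Markdown inside a fenced reply, or the text unchanged."""
    text = text.strip()
    match = _FENCE.match(text)
    return match.group(1).strip() if match else text


def run_extraction(
    uri: str,
    download: Callable[[str, str, Path], None],        # hypothetical GCS download step
    render_pages: Callable[[Path], Sequence[object]],  # hypothetical PDF-to-images step
    transcribe: Callable[[object], str],               # hypothetical Gemini call
) -> dict:
    """Run download, render, transcribe, and cleanup in order."""
    start = time.perf_counter()
    bucket, object_name = parse_gcs_uri(uri)
    # The temporary directory and everything in it is removed on exit.
    with tempfile.TemporaryDirectory() as workdir:
        pdf_path = Path(workdir) / Path(object_name).name
        download(bucket, object_name, pdf_path)
        pages = render_pages(pdf_path)
        markdown_pages = [strip_fence(transcribe(page)) for page in pages]
    return {
        "markdown": "\n\n".join(markdown_pages),
        "pages_processed": len(markdown_pages),
        "processing_time_seconds": round(time.perf_counter() - start, 2),
    }
```

Injecting the three steps as callables keeps the orchestration testable: real implementations would wrap the Cloud Storage client, pdf2image, and the Gemini client, while tests can pass simple fakes.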
- Python 3.12+
- Poetry for dependency management
- Google Cloud Platform account
- Google Cloud Storage bucket
- Gemini API access (Vertex AI or direct API)
# Clone the repository
git clone https://github.com/Gal-Gilor/gemini-scribe.git
cd gemini-scribe
# Install dependencies
poetry install
# Activate virtual environment (Poetry 2.x moved this command to the poetry-plugin-shell plugin)
poetry shell
# Alternative: install with pip
git clone https://github.com/Gal-Gilor/gemini-scribe.git
cd gemini-scribe
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt # Generate with: poetry export -f requirements.txt --output requirements.txt
Copy the example environment file and configure:
cp .env.example .env
Set these variables in your .env file:
# Google Authentication
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
# Vertex AI (Optional)
GOOGLE_GENAI_USE_VERTEXAI=True
GEMINI_API_KEY=your-gemini-api-key
# Cloud Storage
GOOGLE_CLOUD_BUCKET=your-bucket-name
# Application
DEVELOPMENT=True
For Google Cloud services, authenticate using:
# Service account (production)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
# OR user account (development)
gcloud auth application-default login
# Development
poetry run python gemini_scribe/main.py
# OR
python gemini_scribe/main.py
# Production with Uvicorn
uvicorn gemini_scribe.main:app --host 0.0.0.0 --port 8080
POST /extract_text
Request:
{
  "uri": "gs://your-bucket/path/to/document.pdf"
}
Response:
{
  "markdown": "# Extracted Document Content\n\nDocument text converted to markdown...",
  "pages_processed": 5,
  "processing_time_seconds": 12.34
}
GET /health
Response:
{
  "status": "healthy",
  "timestamp": "2024-01-01T12:00:00Z"
}
import requests
# Extract text from PDF
response = requests.post(
    "http://localhost:8080/extract_text",
    json={"uri": "gs://my-bucket/document.pdf"},
)
if response.status_code == 200:
    result = response.json()
    print(f"Extracted {result['pages_processed']} pages")
    print(result["markdown"])
# Format code
black .
ruff format .
# Sort imports
isort .
# Lint and auto-fix
ruff check . --fix
# Type checking (if mypy is added)
mypy gemini_scribe/
# Run all tests
pytest
# Run with coverage
pytest --cov=gemini_scribe
# Run specific test file
pytest tests/test_text_extraction.py
# Run with verbose output
pytest -v
# Start with auto-reload
uvicorn gemini_scribe.main:app --reload --port 8080
# Build image
docker build -t gemini-scribe .
# Run container
docker run -p 8080:8080 \
  -e GOOGLE_CLOUD_PROJECT=your-project \
  -e GOOGLE_CLOUD_BUCKET=your-bucket \
  -e GOOGLE_GENAI_USE_VERTEXAI=true \
  gemini-scribe
# Deploy to Cloud Run
gcloud run deploy gemini-scribe-service \
  --image gcr.io/your-project/gemini-scribe \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8080 \
  --memory 2Gi \
  --cpu 2 \
  --set-env-vars GOOGLE_CLOUD_PROJECT=your-project,GOOGLE_CLOUD_BUCKET=your-bucket,GOOGLE_CLOUD_LOCATION=us-central1,GOOGLE_GENAI_USE_VERTEXAI=true
gemini-scribe/
├── gemini_scribe/
│   ├── endpoints/      # API route handlers
│   ├── models/         # Pydantic data models
│   ├── services/       # Core business logic
│   ├── templates/      # AI prompt templates
│   ├── main.py         # FastAPI application
│   └── settings.py     # Configuration management
├── tests/              # Test suite
├── .env.example        # Environment configuration template
├── Dockerfile          # Container configuration
├── pyproject.toml      # Project dependencies
└── README.md           # This file
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make changes following code style guidelines
- Run tests: pytest
- Run code quality checks: ruff check . --fix && black .
- Commit changes: git commit -m "Description"
- Push branch: git push origin feature-name
- Create a Pull Request
- Follow existing code style and patterns
- Add tests for new functionality
- Update documentation as needed
- Use meaningful commit messages
- Ensure all tests pass before submitting PR
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Issues: GitHub Issues
- Author: Gal Gilor