A production-ready document converter application that transforms various document formats into Markdown or JSON with async processing, event queues, and webhook notifications.
- Multi-format Support: Converts PDF, DOCX, images, text files, and more
- Async Processing: Event queue-based processing using Celery and Redis
- Image Handling: Base64 encoding with lossless compression for images
- Webhook Notifications: Real-time status updates via webhooks
- REST API: FastAPI-based API with OpenAPI documentation
- Production Ready: Docker support, health checks, and proper error handling
Supported input formats:
- Documents: PDF, Microsoft Word (.docx), Rich Text Format (.rtf)
- Presentations: Microsoft PowerPoint (.pptx, .pptm, .potx, .potm)
- Spreadsheets: Microsoft Excel (.xlsx, .xlsm, .xls)
- Images: PNG, JPEG, GIF, BMP, WebP, ICO, TIFF
- Text: Plain text (.txt), Markdown (.md), Log files
- Other: HTML, CSV, JSON, XML
Output formats:
- Markdown (.md): Human-readable Markdown with embedded base64 images
- Structured JSON (.json): Hierarchical data that preserves the document's organization:
  - PDF: Page-wise content structure
  - PowerPoint: Slide-wise, with element-level parsing
  - Excel: Sheet-based hierarchical organization with table data
  - Word: Section-based structure with headings and content
  - Images: Base64-encoded image data included
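The README does not show the exact markup the converter emits for embedded images. One common way to put base64 image data inside Markdown is a data URI in an ordinary image tag. The sketch below assumes that approach; the function name image_to_markdown is hypothetical.

```python
import base64


def image_to_markdown(data: bytes, mime_type: str, alt: str = "") -> str:
    """Embed raw image bytes in Markdown as a base64 data URI (hypothetical helper)."""
    if not mime_type.startswith("image/"):
        raise ValueError(f"expected an image MIME type, got {mime_type!r}")
    # base64 turns every 3 input bytes into 4 ASCII characters.
    encoded = base64.b64encode(data).decode("ascii")
    return f"![{alt}](data:{mime_type};base64,{encoded})"
```

For a file on disk, something like image_to_markdown(Path("chart.png").read_bytes(), "image/png", "chart") would work. Base64 grows the payload by about a third, which is one reason to compress images before embedding them.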
Quick start with Docker:
- Clone the repository:
git clone <repository-url>
cd doc_converter
- Copy the environment variables:
cp .env.example .env
- Start the services:
docker-compose up -d
- The API will be available at http://localhost:8000, and the interactive API documentation at http://localhost:8000/docs.
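docker-compose up -d returns before the API is necessarily accepting requests, so scripts that talk to it may want to wait first. The README mentions health checks but not the endpoint path, so this sketch probes the documented /docs page instead; the helper names http_ok and wait_until_ready are hypothetical.

```python
import time
import urllib.request
from typing import Callable

DOCS_URL = "http://localhost:8000/docs"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, timeouts and refused connections
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no pointless sleep after the final attempt
    return False
```

Typical use is wait_until_ready(lambda: http_ok(DOCS_URL)). Passing the probe and sleep functions in keeps the retry loop testable without a running server.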
Local development (without Docker):
This project uses uv for fast Python package management.
- Install uv:
# Windows (using winget)
winget install --id=astral-sh.uv -e
# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Using pip (any platform)
pip install uv
- Clone and set up the project:
git clone <repository-url>
cd doc_converter
uv sync # Creates .venv and installs dependencies
- Start Redis:
redis-server
- Start the API server:
uv run python run_api.py
- Start the Celery worker:
uv run python run_worker.py
Traditional (pip, without uv):
- Install dependencies:
pip install -r requirements.txt
- Follow the Redis, API server, and Celery worker steps above, replacing "uv run python" with plain "python" (for example, python run_api.py).
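Once the server and worker are running, jobs can be submitted from Python as well as with the curl commands that follow. This is a standard-library sketch, assuming the same /api/v1/jobs endpoints and form fields (file, output_format, webhook_url) that the curl example uses. The function names are hypothetical, as are two further details: the status name "failed" and the assumption that the server does not depend on the file part's Content-Type.

```python
import json
import os
import time
import urllib.request
import uuid
from typing import Callable, Dict, Optional, Tuple

API_BASE = "http://localhost:8000/api/v1"


def build_multipart(
    fields: Dict[str, str], file_field: str, filename: str, content: bytes
) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit_job(path: str, output_format: str = "md", webhook_url: Optional[str] = None) -> dict:
    """POST a file to /jobs and return the JSON job record."""
    fields = {"output_format": output_format}
    if webhook_url:
        fields["webhook_url"] = webhook_url
    with open(path, "rb") as fh:
        body, content_type = build_multipart(fields, "file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{API_BASE}/jobs", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def get_job(job_id: str) -> dict:
    """GET /jobs/{job_id} and return the JSON job record."""
    with urllib.request.urlopen(f"{API_BASE}/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = get_job,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a finished status ('failed' is an assumed name)."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

A typical session would be job = submit_job("document.pdf") followed by wait_for_job(job["id"]). Polling like this is a fallback for clients that cannot expose a webhook URL.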
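API usage:
Submit a document for conversion (webhook_url is optional; DEFAULT_WEBHOOK_URL is used when it is omitted):
curl -X POST "http://localhost:8000/api/v1/jobs" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.pdf" \
-F "output_format=md" \
-F "webhook_url=https://your-webhook-url.com/callback"
Response: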
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"status": "pending",
"progress": 0,
"created_at": "2023-12-01T10:00:00Z",
"updated_at": "2023-12-01T10:00:00Z"
}
Check the job status:
curl "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000"
Download the converted result once the job has completed:
curl "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000/result" \
-o converted_document.md
When a job finishes, whether it succeeds or fails, a webhook notification is sent:
{
"job_id": "123e4567-e89b-12d3-a456-426614174000",
"status": "completed",
"progress": 100,
"created_at": "2023-12-01T10:00:00Z",
"updated_at": "2023-12-01T10:05:00Z",
"completed_at": "2023-12-01T10:05:00Z",
"result_url": "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000/result",
"metadata": {}
}
Environment variables (see .env.example):
- REDIS_URL: Redis connection URL
- MAX_FILE_SIZE: Maximum upload file size, in bytes
- DEFAULT_WEBHOOK_URL: Webhook URL used when a job is submitted without one
- IMAGE_COMPRESSION_QUALITY: Image compression quality (1-100)
- UPLOAD_DIR: Directory for uploaded files
- OUTPUT_DIR: Directory for converted files
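These variables can be read and validated once at startup. The sketch below is an illustration only: the Settings class, the load_settings function, and every default value are hypothetical placeholders, not the project's real defaults (check .env.example for those).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    redis_url: str
    max_file_size: int  # bytes
    default_webhook_url: Optional[str]
    image_compression_quality: int  # 1-100
    upload_dir: str
    output_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the README's environment variables; all defaults here are hypothetical."""
    quality = int(env.get("IMAGE_COMPRESSION_QUALITY", "85"))
    if not 1 <= quality <= 100:
        raise ValueError("IMAGE_COMPRESSION_QUALITY must be between 1 and 100")
    max_size = int(env.get("MAX_FILE_SIZE", str(50 * 1024 * 1024)))
    if max_size <= 0:
        raise ValueError("MAX_FILE_SIZE must be a positive number of bytes")
    return Settings(
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        max_file_size=max_size,
        # An empty string is treated the same as "not set".
        default_webhook_url=env.get("DEFAULT_WEBHOOK_URL") or None,
        image_compression_quality=quality,
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
        output_dir=env.get("OUTPUT_DIR", "outputs"),
    )
```

Taking the environment as a plain mapping keeps the loader easy to test with a dictionary. Failing fast on an out-of-range quality value surfaces configuration mistakes at startup rather than during a conversion.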
Architecture:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   FastAPI   │    │    Redis    │    │   Celery    │
│   Server    │◄──►│    Queue    │◄──►│   Worker    │
└─────────────┘    └─────────────┘    └─────────────┘
       │                                     │
       ▼                                     ▼
┌─────────────┐                       ┌─────────────┐
│    File     │                       │  Document   │
│   Storage   │                       │ Converters  │
└─────────────┘                       └─────────────┘
       │                                     │
       ▼                                     ▼
┌─────────────┐                       ┌─────────────┐
│   Webhook   │                       │    Image    │
│   Service   │                       │ Processing  │
└─────────────┘                       └─────────────┘
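The Webhook Service box sends the notification shown in the API usage section. This is a minimal sketch of how such a payload might be assembled and delivered, using the field names and timestamp format from that example. Everything else is assumed: the Job dataclass, the webhook_payload and send_webhook names, the "failed" status, a null result_url for failed jobs, and delivery by JSON POST.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional

API_BASE = "http://localhost:8000/api/v1"
FINISHED = ("completed", "failed")  # "failed" is an assumed status name


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Job:
    """Hypothetical job record; the real project may keep jobs elsewhere (e.g. Redis)."""
    id: str
    status: str = "pending"
    progress: int = 0
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    completed_at: Optional[datetime] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def _iso(dt: datetime) -> str:
    """Format a timestamp like the README examples: UTC with a trailing Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def webhook_payload(job: Job) -> Dict[str, Any]:
    """Build the notification body for a finished job."""
    if job.status not in FINISHED:
        raise ValueError(f"job {job.id} has not finished (status={job.status!r})")
    return {
        "job_id": job.id,
        "status": job.status,
        "progress": job.progress,
        "created_at": _iso(job.created_at),
        "updated_at": _iso(job.updated_at),
        "completed_at": _iso(job.completed_at) if job.completed_at else None,
        # Assumption: only successful jobs have a result to download.
        "result_url": f"{API_BASE}/jobs/{job.id}/result" if job.status == "completed" else None,
        "metadata": job.metadata,
    }


def send_webhook(url: str, payload: Dict[str, Any], timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Building the payload separately from sending it keeps the format checkable in isolation. It also leaves room for retries around send_webhook without touching the payload logic.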
Running tests:
With uv:
uv run pytest tests/ -v
uv run pytest --cov=src tests/ # With coverage (requires the pytest-cov plugin)
Traditional:
pytest tests/ -v
Formatting, linting, and type checking:
With uv:
# Format code
uv run black src/ tests/
# Lint code
uv run flake8 src/ tests/
# Type checking
uv run mypy src/
Traditional:
black src/ tests/
flake8 src/ tests/
mypy src/
Managing dependencies:
With uv:
# Add runtime dependency
uv add package-name
# Add development dependency
uv add --dev pytest-package
# Update dependencies
uv lock --upgrade && uv sync
Production deployment checklist:
- Set up environment variables
- Configure Redis for persistence
- Set up monitoring (health endpoints available)
- Configure webhook endpoint security
- Set up file storage with proper permissions
- Configure load balancing for multiple workers
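For the "configure webhook endpoint security" item, one concrete step on the receiving side is to validate each notification before acting on it. The most important check is to refuse to follow a result_url that points anywhere other than the converter API. The README does not say whether notifications are signed or authenticated, so this sketch only validates shape and origin; pair it with network-level controls such as TLS. The names parse_webhook and EXPECTED_API_NETLOC, the size limit, and the "failed" status are hypothetical.

```python
import json
import uuid
from typing import Any, Dict
from urllib.parse import urlparse

# Hypothetical settings for the receiving service; adjust to your deployment.
EXPECTED_API_NETLOC = "localhost:8000"
KNOWN_STATUSES = {"completed", "failed"}  # "failed" is an assumed status name
MAX_BODY_BYTES = 64 * 1024


def parse_webhook(body: bytes) -> Dict[str, Any]:
    """Validate an incoming notification; raise ValueError if it looks wrong."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("payload too large")
    payload = json.loads(body)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    # uuid.UUID raises ValueError for malformed IDs; str() gives the canonical form.
    job_id = str(uuid.UUID(str(payload.get("job_id", ""))))
    if payload.get("status") not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    result_url = payload.get("result_url")
    if result_url is not None:
        parsed = urlparse(result_url)
        # Never follow a result_url that points somewhere other than this job's result.
        if (
            parsed.scheme not in ("http", "https")
            or parsed.netloc != EXPECTED_API_NETLOC
            or parsed.path != f"/api/v1/jobs/{job_id}/result"
        ):
            raise ValueError("result_url does not point at this job's result")
    return payload
```

A web handler would call parse_webhook on the raw request body and return a 4xx response on ValueError. It should download the result only after validation passes.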
MIT License