A modern OCR application that extracts text and images from PDFs and image files using Mistral AI's OCR API.
mistral/
├── backend/ # FastAPI backend
│ ├── app/
│ │ ├── __init__.py
│ │ └── main.py
│ └── requirements.txt
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── services/ # API services
│ │ ├── styles/ # CSS styles
│ │ ├── App.jsx # Main App component
│ │ └── main.jsx # Entry point
│ ├── public/
│ ├── index.html
│ ├── package.json
│ └── vite.config.js
└── README.md
- 📄 Upload PDF, JPG, PNG, and WEBP files
- 🔍 Extract text using OCR
- 🖼️ Extract images from documents
- 📝 View extracted content in beautiful markdown format
- 🎨 Modern, responsive UI with gradient design
- ⚡ Fast and efficient processing
- Python 3.8+
- Node.js 16+ and npm
- Mistral API key
Create a .env file in the root directory:
    MISTRAL_API_KEY=your_mistral_api_key_here
Navigate to the backend directory and install dependencies:
    cd backend
    pip install -r requirements.txt
Start the backend server:
    cd app
    python main.py
Or, from the backend directory, using uvicorn directly:
    uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
The backend will be available at http://localhost:8000
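For reference, here is a minimal sketch of how a backend could pick up MISTRAL_API_KEY from the .env file described above, using only the standard library. The helper names `load_env_file` and `get_mistral_api_key` are hypothetical, and this is not necessarily how app/main.py reads the key; many projects use the python-dotenv package for the same purpose.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_mistral_api_key(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("MISTRAL_API_KEY") or load_env_file(env_path).get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set")
    return key
```

Failing fast with a clear error when the key is missing tends to be easier to debug than an authentication failure on the first OCR request.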
Navigate to the frontend directory and install dependencies:
    cd frontend
    npm install
Start the development server:
    npm run dev
The frontend will be available at http://localhost:3000
To build the frontend for production:
    cd frontend
    npm run build
The built files will be in the dist folder.
GET /api/health
POST /api/process
    Content-Type: multipart/form-data
    Body:
        file: <file>
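The two endpoints above can be exercised from a small Python client. The sketch below uses only the standard library and assumes the backend is running at http://localhost:8000; the function names are illustrative, and the response is assumed to be JSON because its schema is not documented here.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: backend running locally on port 8000


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def check_health(base_url=BASE_URL):
    """GET /api/health and return the HTTP status code."""
    with request.urlopen(f"{base_url}/api/health", timeout=10) as resp:
        return resp.status


def process_file(path, base_url=BASE_URL):
    """POST a document to /api/process in the "file" form field."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    req = request.Request(
        f"{base_url}/api/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req, timeout=120) as resp:
        # Assumption: the backend answers with JSON; the schema is not documented here.
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `check_health()` followed by `process_file("scan.pdf")` would cover both endpoints; `scan.pdf` is a placeholder file name.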
The backend is a FastAPI application. You can view the interactive API documentation at http://localhost:8000/docs when the server is running.
The frontend is built with:
- React 18 - UI library
- Vite - Build tool and dev server
- Axios - HTTP client for API calls
- Marked - Markdown parser
- CSS Modules - Component-scoped styling
- Clean API-only implementation
- CORS enabled for frontend
- File validation and size checking
- Integration with Mistral OCR API
- Error handling and proper HTTP status codes
- Component-based architecture
- Separation of concerns:
  - Components: UI elements
  - Services: API communication
  - Styles: CSS modules for each component
- State management with React hooks
- Responsive design
- Accessible UI
- Header - Application header with title
- UploadArea - Drag-and-drop file upload
- LoadingSpinner - Loading state indicator
- ErrorAlert - Error message display
- ResultsDisplay - Display OCR results
- ImageModal - Full-size image viewer
- Maximum file size: 16MB
- Supported formats: PDF, PNG, JPG, JPEG, WEBP
- Maximum images extracted: 50 per document
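The limits above could be enforced with checks along the following lines. This is a sketch under stated assumptions rather than the backend's actual code: the function names are hypothetical, 16MB is interpreted as 16 MiB, and the 413 and 415 status codes are a conventional choice, not something the project documents.

```python
from pathlib import Path

MAX_FILE_SIZE = 16 * 1024 * 1024  # 16MB, interpreted here as 16 MiB (assumption)
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}
MAX_IMAGES = 50


def validate_upload(filename, size_bytes):
    """Return (ok, http_status, message); the status codes are illustrative."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, 415, f"Unsupported file type: {ext or 'none'}"
    if size_bytes > MAX_FILE_SIZE:
        return False, 413, "File exceeds the 16MB limit"
    return True, 200, "OK"


def cap_images(images):
    """Keep at most MAX_IMAGES extracted images per document."""
    return images[:MAX_IMAGES]
```

Checking the extension before the size means an unsupported file is rejected with the more specific error even when it is also too large.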
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License