Markdown Converter UI

A streamlined web interface for converting various file formats to Markdown using the MarkItDown library from Microsoft. Transform any document into clean, LLM-ready Markdown with this powerful conversion tool.

🧠 LLM-Ready Document Conversion

Markdown Converter UI is built on Microsoft's MarkItDown, an open-source Python library that converts a wide range of document formats into clean, LLM-ready Markdown:

- Universal Format Support: Convert PDFs, PowerPoint presentations, Word documents, audio files, and even images into consistent Markdown
- Advanced Processing: Extracts EXIF data, performs OCR on images, generates transcripts from audio, and adds AI-generated image captions
- LLM Integration: Seamlessly prepare documents for local LLM applications like Cursor, Windsurf, Cline, and Claude Desktop
- AI Workflow Optimization: Instantly prepare data for fine-tuning and RAG (Retrieval-Augmented Generation) workflows without manual cleanup
- Scalable Document Processing: Batch support for processing multiple documents simultaneously

This tool effectively serves as an AI data engineer in your workflow, turning any knowledge base into prompt-ready content for AI assistants.
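To make the conversion step concrete, here is a minimal sketch of calling MarkItDown directly from Python. It assumes MarkItDown's documented `MarkItDown().convert(...)` call and the `text_content` attribute on its result; `convert_to_markdown` is a hypothetical helper name used for illustration and is not necessarily how `src/utils/markdown_converter.py` is written.

```python
from pathlib import Path


def convert_to_markdown(path: str) -> str:
    """Convert one file to Markdown text (hypothetical helper)."""
    source = Path(path)
    if not source.is_file():
        # Fail early with a clear message before loading the converter.
        raise FileNotFoundError(f"No such file: {source}")

    # Imported lazily so this module still loads when MarkItDown is absent.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(source))
    return result.text_content
```

Calling `convert_to_markdown("report.docx")` would return the Markdown as a string, provided the file exists and the matching optional dependencies are installed.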

🚀 Features

- Professional UI: Clean, modern interface with intuitive controls
- Simple Upload Interface: Drag and drop or select files for conversion
- Multiple Format Support: Convert various document formats (DOCX, PDF, HTML, etc.) to clean Markdown
- Live Preview: Instantly see the converted Markdown in the browser, with vertical scrolling for large documents
- Download Options: Save the converted Markdown to your local machine
- Batch Processing: Convert multiple files at once through a tabbed interface
- Error Handling: Clear feedback when conversion issues occur
- Large File Support: Process files up to 50 MB with progress indicators
- Automatic Cleanup: Temporary files are automatically removed after 2 hours
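The automatic cleanup described above can be pictured as a periodic sweep over a temporary directory. The following is only a sketch under that assumption; `remove_stale_files` and `MAX_AGE_SECONDS` are hypothetical names, and the project's actual logic in `src/utils/cleanup.py` may differ.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 2 * 60 * 60  # two hours, matching the feature list


def remove_stale_files(directory, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in Path(directory).iterdir():
        if not entry.is_file():
            continue  # leave subdirectories alone
        age = now - entry.stat().st_mtime
        if age > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Age is measured from each file's modification time, so a file that is rewritten stays around for another two hours.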

📋 Requirements

Python 3.8+ (recent MarkItDown releases require Python 3.10 or later)
Streamlit
MarkItDown library from Microsoft with extended dependencies

🔧 Installation

Clone this repository:

```
git clone https://github.com/ajitpal/markdown-converter-ui.git
cd markdown-converter-ui
```

Create and activate a virtual environment:

```
# Create virtual environment
python -m venv venv

# On Windows
.\venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
```

Install the required packages:

```
pip install -r requirements.txt

# Install MarkItDown with all optional dependencies (PDF, DOCX, etc. support)
pip install "markitdown[all]"
```
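After installing, a quick way to confirm that the packages are visible to the active virtual environment is to ask Python whether it can locate them. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("streamlit", "markitdown"):
        status = "ok" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
    print("python:", sys.version.split()[0])
```

If either package is reported as missing, check that the virtual environment is activated before running `pip install` again.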

💻 Usage

Start the Streamlit application:
```
streamlit run app.py
```
Open your web browser and navigate to the URL displayed in the terminal (typically http://localhost:8501)
Use the application:
- Upload one or more files using the file upload interface
- Adjust conversion settings in the sidebar if needed
- View converted files in the preview tab
- Download the converted Markdown files using the download buttons
- Use the clean, structured Markdown with your favorite LLM tools
LLM Integration:
- Feed the converted Markdown directly into LLM applications
- Use for training data preparation in fine-tuning workflows
- Build RAG systems with the consistently formatted content
- Create knowledge bases that are instantly AI-ready
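For RAG workflows, converted Markdown is often split into smaller chunks before indexing. The sketch below shows one simple approach, splitting at headings; it is an illustration rather than a feature of this app, and `split_by_headings` is a hypothetical name.

```python
import re

# An ATX heading: one to six '#' characters followed by whitespace and text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_by_headings(markdown: str):
    """Split Markdown into chunks, starting a new chunk at each heading."""
    chunks, current = [], []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        if not in_fence and HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Each chunk keeps its heading as the first line, which gives a retriever some context about where the text came from.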

🚢 Deployment Options

Local Deployment

Run the app locally as described in the Usage section above.

Streamlit Cloud Deployment

Push your code to GitHub
Visit Streamlit Cloud
Connect your GitHub repository
Deploy the app with the following settings:
- Main file path: app.py
- Python version: 3.8 or higher
- Requirements: requirements.txt
- Advanced settings > Packages: packages.txt

Important Note for Streamlit Cloud:

If you encounter PDF conversion errors like MissingDependencyException, ensure that:

Your requirements.txt includes markitdown[all]>=0.1.0 (not just markitdown>=0.1.0)
You have a packages.txt file with the necessary system dependencies:
```
poppler-utils
tesseract-ocr
libreoffice
ffmpeg
```
If issues persist, you may need to use the Streamlit secrets management to set environment variables for the PDF processing libraries.
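When debugging a deployment, it can help to check whether the system tools from packages.txt are actually on the PATH. The sketch below does this with `shutil.which`; the executable names mapped to each package are assumptions (for example, `pdftotext` for poppler-utils and `soffice` for libreoffice), and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Package name -> an executable it is assumed to provide.
EXPECTED_TOOLS = {
    "poppler-utils": "pdftotext",
    "tesseract-ocr": "tesseract",
    "libreoffice": "soffice",
    "ffmpeg": "ffmpeg",
}


def find_missing_tools(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the package names whose executable is not on PATH."""
    return sorted(pkg for pkg, exe in tools.items() if which(exe) is None)


if __name__ == "__main__":
    missing = find_missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every expected executable was found; it does not guarantee that the Python bindings for those tools are installed.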

Docker Deployment

Create a Dockerfile in the project root:

```
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies for PDF processing
RUN apt-get update && apt-get install -y \
    poppler-utils \
    tesseract-ocr \
    libreoffice \
    ffmpeg \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Explicitly install MarkItDown with all dependencies
RUN pip install "markitdown[all]"

COPY . .

EXPOSE 8501

CMD ["streamlit", "run", "app.py"]
```

Build and run the Docker container:

```
docker build -t markdown-converter-ui .
docker run -p 8501:8501 markdown-converter-ui
```

Access the application at http://localhost:8501

AWS/Azure/GCP Deployment

For cloud deployments on AWS, Azure, or GCP:

Build the Docker container as shown above
Push the container image to a container registry (for example Amazon ECR, Azure Container Registry, or Google Artifact Registry)
Deploy using a service like:
- AWS App Runner
- Azure Container Instances
- Google Cloud Run

Each service will have specific steps for deployment from a container.

🛠️ Development

Project Structure

```
markdown-converter-ui/
├── app.py                  # Main application entry point
├── requirements.txt        # Python dependencies
├── src/                    # Source code directory
│   ├── main.py            # Core application logic
│   ├── config.py          # Configuration settings
│   ├── ui/                # UI components
│   │   ├── components.py  # Reusable UI components
│   │   ├── styles.py      # CSS styles
│   │   └── layout.py      # Page layout configuration
│   └── utils/             # Utility functions
│       ├── cleanup.py     # Temporary file cleanup
│       ├── file_helpers.py # File handling utilities
│       └── markdown_converter.py # Markdown conversion logic
├── static/                # Static assets
├── tests/                 # Test files
├── docs/                  # Documentation
└── venv/                  # Virtual environment (not in git)
```

Contributing

Fork the repository
Create your feature branch: git checkout -b feature/amazing-feature
Commit your changes: git commit -m 'Add some amazing feature'
Push to the branch: git push origin feature/amazing-feature
Open a Pull Request

Customization

You can customize the application by:

Adjusting the MAX_FILE_SIZE_MB constant in src/config.py
Modifying the CSS styles in src/ui/styles.py for UI appearance
Changing the max height for preview sections by editing the .preview-container and .stCodeBlock CSS classes
Adding additional conversion options in the sidebar
Updating the header and footer in src/ui/components.py
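As an illustration of the first customization point, a size limit such as MAX_FILE_SIZE_MB is typically enforced by comparing the upload's byte count against the constant. This is a sketch under that assumption; `check_file_size` is a hypothetical function, and the real check in this project may be structured differently.

```python
MAX_FILE_SIZE_MB = 50  # default limit stated in this README


def check_file_size(size_bytes: int, limit_mb: int = MAX_FILE_SIZE_MB) -> None:
    """Raise ValueError if a file exceeds the configured limit."""
    limit_bytes = limit_mb * 1024 * 1024
    if size_bytes > limit_bytes:
        size_mb = size_bytes / (1024 * 1024)
        raise ValueError(
            f"File is {size_mb:.1f} MB; the limit is {limit_mb} MB"
        )
```

Streamlit applies its own upload cap separately through the `server.maxUploadSize` setting, so raising the app-level constant above that cap would likely require changing both.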

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Microsoft MarkItDown for the powerful conversion library that makes documents LLM-ready
Streamlit for the web application framework
The AI and LLM community for inspiring tools that bridge the gap between traditional documents and AI-ready content

📸 Screenshots

Main Interface

File Conversion

Preview Result

Troubleshooting

Common Issues

Import Errors
- If you see import errors, make sure you're running the application from the project root directory
- Ensure all dependencies are installed: pip install -r requirements.txt
- Check that your Python path includes the project root
File Conversion Issues
- Verify that the input file format is supported
- Check file size limits (default is 50MB)
- Ensure you have write permissions in the temporary directory
PDF Conversion Issues
- If you see MissingDependencyException errors, ensure you've installed MarkItDown with PDF support:
```
pip install "markitdown[all]"
# or specifically for PDF
pip install "markitdown[pdf]"
```
- Make sure you have the necessary system dependencies installed:
  - On Ubuntu/Debian: sudo apt-get install poppler-utils tesseract-ocr libreoffice ffmpeg
  - On macOS with Homebrew: brew install poppler tesseract libreoffice ffmpeg
  - On Windows: Install the appropriate binaries and ensure they're in your PATH
- For Streamlit Cloud deployment, ensure your packages.txt file includes these dependencies
- Check that your PDF files are not corrupted or password-protected
UI Issues
- Clear your browser cache if the UI is not loading properly
- Ensure you're using a modern browser (Chrome, Firefox, or Edge recommended)
- Check the browser console for any JavaScript errors
- If the "Clean All" button is not visible after uploading files, try refreshing the page
- For large documents, use the vertical scrolling in the preview and raw sections

Getting Help

If you encounter issues not covered here:

Check the application logs for detailed error messages
Review the documentation in the docs/ directory
Open an issue on the project's GitHub repository
