GitHub - tezh404/Document-categorizer: AI-Powered Document Categorizer & Organizer

📂 AI-Powered Document Categorizer & Organizer

Supports: PDF, Markdown (.md), and Images

This project uses local LLMs (via LM Studio) to categorize files (PDFs, Markdown, and Images), then organizes them into folders based on their content.

🧠 Powered by LM Studio (Local AI)

Make sure LM Studio is running and a model is served using the OpenAI-compatible API.

To enable the API in LM Studio:

Launch LM Studio.
Load a model (e.g., gemma-3, mistral, etc.).
Go to the Server tab.
Copy the model name and API URL, and add them to your config.json.

🔧 Requirements

pip install -r requirements.txt

⚙️ Configuration (`config.json`)

⚠️ change the name of the config.example.json to config.json

{
    "path": "path-to-your-files",
    "json_path": "Path/output_file_info.json",
    "pages": 3,
    "model_name": "your-model-name",
    "api_url": "http://localhost:1234/v1/chat/completions",
    "api_key": "<API_KEY>",
    "prompt": "\"{file_name}\" , \"{title}\" , \"{text}\" Based on this information, determine the category for this document. It should be a single word in English. Example: Engineering, Computer etc."
}

📁 Step 1: Categorize Files

Run one or more of the following categorizers depending on the file type:

📄 PDF Files

python pdfCategorizer.py

→ Generates pdf_file_info.json

📝 Markdown Files

python mdCategorizer.py

→ Generates md_file_info.json

🖼️ Image Files

⚠️ Requires a model that supports image input (e.g., gemma-3-12b or gemma-3-4b)

python imgCategorizer.py

→ Generates img_file_info.json

📂 Step 2: Organize Files

After categorizing, set "json_path" in your config.json to the relevant .json file created, then run:

python Organizer.py

Files will be moved into folders based on their detected category.

✅ Example Workflow

Start LM Studio and load your model.
Categorize files:
- python pdfCategorizer.py
- python mdCategorizer.py
- python imgCategorizer.py
Run the organizer for each JSON output:
- Update json_path to match (pdf_file_info.json, etc.)
- python Organizer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📂 AI-Powered Document Categorizer & Organizer

🧠 Powered by LM Studio (Local AI)

To enable the API in LM Studio:

🔧 Requirements

⚙️ Configuration (`config.json`)

📁 Step 1: Categorize Files

📄 PDF Files

📝 Markdown Files

🖼️ Image Files

📂 Step 2: Organize Files

✅ Example Workflow

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.gitignore		.gitignore
LICENSE		LICENSE
Organizer.py		Organizer.py
Readme.md		Readme.md
config.example.json		config.example.json
imgCategorizer.py		imgCategorizer.py
mdCategorizer.py		mdCategorizer.py
pdfCategorizer.py		pdfCategorizer.py
requirements.txt		requirements.txt

License

tezh404/Document-categorizer

Folders and files

Latest commit

History

Repository files navigation

📂 AI-Powered Document Categorizer & Organizer

🧠 Powered by LM Studio (Local AI)

To enable the API in LM Studio:

🔧 Requirements

⚙️ Configuration (config.json)

📁 Step 1: Categorize Files

📄 PDF Files

📝 Markdown Files

🖼️ Image Files

📂 Step 2: Organize Files

✅ Example Workflow

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

⚙️ Configuration (`config.json`)

Packages