Supports: PDF, Markdown (.md), and Images
This project uses local LLMs (via LM Studio) to categorize files (PDFs, Markdown, and Images), then organizes them into folders based on their content.
Make sure LM Studio is running and a model is served using the OpenAI-compatible API.
- Launch LM Studio.
- Load a model (e.g.,
gemma-3,mistral, etc.). - Go to the Server tab.
- Copy the model name and API URL, and add them to your
config.json.
pip install -r requirements.txt
⚠️ change the name of the config.example.json to config.json
{
"path": "path-to-your-files",
"json_path": "Path/output_file_info.json",
"pages": 3,
"model_name": "your-model-name",
"api_url": "http://localhost:1234/v1/chat/completions",
"api_key": "<API_KEY>",
"prompt": "\"{file_name}\" , \"{title}\" , \"{text}\" Based on this information, determine the category for this document. It should be a single word in English. Example: Engineering, Computer etc."
}Run one or more of the following categorizers depending on the file type:
python pdfCategorizer.py→ Generates pdf_file_info.json
python mdCategorizer.py→ Generates md_file_info.json
gemma-3-12b or gemma-3-4b)
python imgCategorizer.py→ Generates img_file_info.json
After categorizing, set "json_path" in your config.json to the relevant .json file created, then run:
python Organizer.pyFiles will be moved into folders based on their detected category.
-
Start LM Studio and load your model.
-
Categorize files:
python pdfCategorizer.pypython mdCategorizer.pypython imgCategorizer.py
-
Run the organizer for each JSON output:
- Update
json_pathto match (pdf_file_info.json, etc.) python Organizer.py
- Update