- Before loading documents into the vector DB, extract plain text (TXT) from each supported file format: PDF, Excel, PPT, Word, PNG, and JPG.
- When a file is converted to TXT, display its title and the path where the TXT file was saved.
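
The two steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `EXTRACTOR_BY_SUFFIX`, `pick_extractor`, and `save_text` are hypothetical names, and the extraction itself (PDF parsing, OCR, and so on) is left out. It only shows how a file might be routed by extension and how the title and saved path might be reported.

```python
from pathlib import Path

# Hypothetical mapping from file extension to an extractor kind.
# The extension list is an assumption based on the formats named above.
EXTRACTOR_BY_SUFFIX = {
    ".pdf": "pdf",
    ".xls": "excel", ".xlsx": "excel",
    ".ppt": "ppt", ".pptx": "ppt",
    ".doc": "word", ".docx": "word",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
}


def pick_extractor(path):
    """Return the extractor kind for a file, or None if the format is unsupported."""
    return EXTRACTOR_BY_SUFFIX.get(Path(path).suffix.lower())


def save_text(src, text, out_dir):
    """Write extracted text to <out_dir>/<source stem>.txt and report title and path."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (src.stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    # Display the file title and the saved path, as described above.
    print(f"title: {src.name}")
    print(f"saved: {out_path}")
    return out_path
```

A caller would first check `pick_extractor(path)`, run whichever extractor it names, and then pass the resulting string to `save_text`.
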
Environment: paddleocr-gpu / CUDA 12.6
- main.py
- ppt2pdf.py: converts *.ppt / *.pptx files to PDF using LibreOffice
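
For reference, a conversion like the one ppt2pdf.py performs can be done with LibreOffice's headless mode. The sketch below is an assumption about how such a script might look, not the contents of ppt2pdf.py: the function names are hypothetical, and it assumes the LibreOffice binary is available on `PATH` as `soffice` (on some systems it is `libreoffice`).

```python
import subprocess
from pathlib import Path


def build_convert_command(src, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless PDF conversion."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(src),
    ]


def ppt_to_pdf(src, out_dir, soffice="soffice"):
    """Convert a .ppt/.pptx file to PDF and return the expected output path."""
    src = Path(src)
    if src.suffix.lower() not in {".ppt", ".pptx"}:
        raise ValueError(f"not a PowerPoint file: {src}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(src, out_dir, soffice), check=True)
    # LibreOffice names the output after the input file's stem.
    return out_dir / (src.stem + ".pdf")
```

Keeping the command construction in its own function makes it easy to inspect or log the exact command without having LibreOffice installed.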