
ML2jinmyoung/preprocessing


Pre-processing for vectorDB

1. Function:

  • Before converting documents into a vectorDB, extract plain text (TXT) from multiple file formats: PDF, Excel, PPT, Word, PNG, and JPG.
  • During conversion to TXT, display each file's title and the path where its TXT output is saved.
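The routing and reporting steps above might look like the following minimal sketch. It covers only picking an extractor by file extension and deciding where the TXT file is written. The extension table, the category names, and the `<stem>.txt` naming scheme are assumptions for illustration, not taken from this repository. The actual per-format extraction (PDF, Excel, Word, OCR) is left to whichever libraries the project uses.

```python
from pathlib import Path
from typing import Dict

# Hypothetical mapping from file extension to the kind of extractor to use.
EXTRACTOR_KIND: Dict[str, str] = {
    ".pdf": "pdf",
    ".xls": "excel", ".xlsx": "excel",
    ".ppt": "ppt", ".pptx": "ppt",
    ".doc": "word", ".docx": "word",
    ".png": "ocr", ".jpg": "ocr", ".jpeg": "ocr",
}


def extractor_kind(src: Path) -> str:
    """Return the extractor category for a file, based on its extension."""
    kind = EXTRACTOR_KIND.get(src.suffix.lower())
    if kind is None:
        raise ValueError(f"unsupported file type: {src.name}")
    return kind


def txt_output_path(src: Path, out_dir: Path) -> Path:
    """Map an input file to <out_dir>/<stem>.txt (assumed naming scheme)."""
    return out_dir / f"{src.stem}.txt"


def save_txt(src: Path, text: str, out_dir: Path) -> Path:
    """Write extracted text, then display the file title and saved path."""
    out_path = txt_output_path(src, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    print(f"[{src.name}] saved to {out_path}")
    return out_path
```

Keeping the path computation in its own function makes the naming rule easy to check without touching the filesystem.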

2. OCR:

Uses PaddleOCR with GPU acceleration (paddleocr-gpu / CUDA 12.6).
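This sketch assumes the PaddleOCR 2.x Python API, where `PaddleOCR(...).ocr(img_path, cls=True)` returns one list per page of `[box, (text, confidence)]` entries, or `None` for a page with no detected text. Under that assumption, the recognized lines can be flattened into TXT roughly as follows. The helper name and the `min_score` threshold are hypothetical. PaddleOCR 3.x changed this interface, so check the installed version before relying on the result format.

```python
from typing import Any, List, Optional, Sequence

# Typical PaddleOCR 2.x call (assumed, not taken from this repository):
#   from paddleocr import PaddleOCR
#   ocr = PaddleOCR(use_angle_cls=True, lang="en")
#   result = ocr.ocr("scan.png", cls=True)


def ocr_result_to_text(
    result: Sequence[Optional[Sequence[Any]]], min_score: float = 0.5
) -> str:
    """Join recognized text lines, skipping low-confidence entries."""
    lines: List[str] = []
    for page in result:
        if not page:  # a page with no detected text may come back as None
            continue
        for _box, (text, score) in page:
            if score >= min_score:
                lines.append(text)
    return "\n".join(lines)
```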

3. Scripts:

  1. main.py
  2. ppt2pdf.py : converts *.ppt / *.pptx files to PDF using LibreOffice
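The following is a minimal sketch of the LibreOffice step. It assumes the `soffice` binary is on `PATH` and uses its headless `--convert-to pdf --outdir` options. The function names are hypothetical and may not match what ppt2pdf.py actually does.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(src: Path, out_dir: Path, soffice: str = "soffice") -> List[str]:
    """Build the LibreOffice headless command that converts one file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(src)]


def ppt_to_pdf(src: Path, out_dir: Path, soffice: str = "soffice") -> Path:
    """Convert a .ppt/.pptx file to PDF and return the expected output path."""
    if src.suffix.lower() not in {".ppt", ".pptx"}:
        raise ValueError(f"not a PowerPoint file: {src}")
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error
    subprocess.run(build_convert_command(src, out_dir, soffice), check=True)
    # LibreOffice names the output after the input stem, with a .pdf extension
    return out_dir / f"{src.stem}.pdf"
```

Because LibreOffice writes `<stem>.pdf` into the output directory, the function can return the expected path without parsing the command's output.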
