
mariosotil/text-extractor


text-extractor

Extracts text from Office and PDF files using Apache POI and PDFxStream, as a very, very tiny alternative to Apache Tika.
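The README does not document the library's public API. The class below is a hypothetical sketch of the routing a tool like this needs: it picks a backend from the file's extension. Apache POI handles Word and Excel, PDFxStream handles PDF, and RTF takes a separate path. `FormatRouter`, `Format` and `detect` are illustrative names and are not this library's API. The extension lists, including the OOXML `.docx`/`.xlsx` variants, are assumptions.

```java
import java.util.Locale;

/** Hypothetical sketch: map a file name to the backend that would extract its text. */
class FormatRouter {

    /** The four formats the README lists, plus a fallback. */
    enum Format { WORD, EXCEL, RTF, PDF, UNSUPPORTED }

    /** Chooses a format from the file extension, ignoring case. */
    static Format detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Format.UNSUPPORTED; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc":
            case "docx":   // assumption: OOXML Word files are also accepted
                return Format.WORD;   // would go to Apache POI
            case "xls":
            case "xlsx":   // assumption: OOXML Excel files are also accepted
                return Format.EXCEL;  // would go to Apache POI
            case "rtf":
                return Format.RTF;
            case "pdf":
                return Format.PDF;    // would go to PDFxStream
            default:
                return Format.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("report.PDF")); // PDF
        System.out.println(detect("budget.xls")); // EXCEL
        System.out.println(detect("notes.txt"));  // UNSUPPORTED
    }
}
```

Routing on the extension keeps the sketch dependency-free. A more robust extractor might inspect the file's leading bytes instead, because extensions can be wrong.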

This library is not a replacement for Apache Tika: it only extracts text from Word, Excel, RTF and PDF files. It is based on the code from the blog article "Extract Text From pdf, office files(.doc, .ppt, .xls), open office files, .rtf, and text/plain files in Java", updated to use the latest Apache POI and PDFxStream versions available at the time of writing (06/10/2015):

  • org.apache.poi, 3.12
  • com.snowtide.pdfxstream, 3.1.2
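Neither dependency above parses RTF. The JDK includes `javax.swing.text.rtf.RTFEditorKit`, which reads RTF into a Swing `Document` whose plain text can then be read back, so no extra dependency is needed. The code below is a minimal sketch of that approach and may not be what this library actually does. `RtfTextExtractor`, `extract` and `extractFromString` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

/** Hypothetical JDK-only RTF text extractor (not this library's API). */
class RtfTextExtractor {

    /** Parses an RTF stream and returns its plain-text content. */
    static String extract(InputStream in) throws IOException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            kit.read(in, doc, 0);                   // parse RTF into the document model
            return doc.getText(0, doc.getLength()); // copy out the text, formatting dropped
        } catch (BadLocationException e) {
            throw new IOException("Could not read RTF content", e);
        }
    }

    /** Convenience overload for RTF held in a string (RTF is normally 7-bit ASCII). */
    static String extractFromString(String rtf) {
        try {
            return extract(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\ansi Hello, {\\b RTF} world.\\par}";
        System.out.println(extractFromString(rtf).trim());
    }
}
```

Swing's RTF support is limited. Content in complex documents, such as embedded objects, may be lost.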

