v0.6.1: Integrated PDF processing workflow with extralit-hf-space and incremental Dataset building with imports
This release delivers major upgrades to document processing and import workflows, and exposes additional dataset-building functionality in the UI. Highlights include OCRmyPDF-powered PDF processing via Redis Queue jobs, a workspace selector in the dataset breadcrumb, and incremental imports with dataset mapping.
What's Changed
- [FEAT] integrate OCRmyPDF on document upload in Redis Queue jobs by @priyankeshh and @JonnyTran in #115
- [FIX] Import Files Flow by @JonnyTran in #120
- [FEAT] Workspace Pinia Store and Dataset Breadcrumb Selector in AppHeader by @JonnyTran in #121
- [FIX] Import File Parsing and Matching Flow and Refactoring by @JonnyTran in #122
- [FIX] DocumentAPI to query by params and return multiple documents & fix PDF file fetching by @JonnyTran in #123
- [FEAT] minio presigned url for pdf by @JonnyTran in #124
- [FIX] Import Analysis and Batch Refactoring, File Matching algorithm, Document Panel by @JonnyTran in #130
- [FIX] Consolidating linting configuration by @JonnyTran in #133
- [FEAT] Document workflows with rq jobs by @JonnyTran in #136
- [FEAT] Import dataset mapping by @JonnyTran in #140
Contributors
Many thanks to @priyankeshh for the work on the https://github.com/Extralit/extralit-hf-space repo for the PyMuPDF integration.
Welcome @Mr-Youssef-Sherif!
Full Changelog: v0.6.0...v0.6.1