v0.6.1: Integrated PDF processing workflow with extralit-hf-space and incremental Dataset building with imports

@JonnyTran JonnyTran released this 29 Aug 07:30
· 102 commits to develop since this release

This release delivers major upgrades to document processing and import workflows, and exposes additional dataset-building functionality in the UI. Highlights include OCRmyPDF-powered PDF processing via Redis Queue jobs, a workspace selector in the breadcrumb, and incremental import with dataset mapping.

What's Changed

  • [FEAT] Integrate OCRmyPDF on document upload in Redis Queue jobs by @priyankeshh and @JonnyTran in #115
  • [FIX] Import Files Flow by @JonnyTran in #120
  • [FEAT] Workspace Pinia Store and Dataset Breadcrumb Selector in AppHeader by @JonnyTran in #121
  • [FIX] Import File Parsing and Matching Flow and Refactoring by @JonnyTran in #122
  • [FIX] DocumentAPI to query by params and return multiple documents & fix PDF file fetching by @JonnyTran in #123
  • [FEAT] minio presigned url for pdf by @JonnyTran in #124
  • [FIX] Import Analysis and Batch Refactoring, File Matching algorithm, Document Panel by @JonnyTran in #130
  • [FIX] Consolidating linting configuration by @JonnyTran in #133
  • [FEAT] Document workflows with rq jobs by @JonnyTran in #136
  • [FEAT] Import dataset mapping by @JonnyTran in #140

Contributors

Many thanks to @priyankeshh for the work on the https://github.com/Extralit/extralit-hf-space repo for PyMuPDF integration.
Welcome @Mr-Youssef-Sherif!

Full Changelog: v0.6.0...v0.6.1