Releases: StabRise/ScaleDP
0.3.0
ScaleDP 0.3.0
What's Changed
🚀 Features
- Added TextEmbeddings transformer for computing embeddings using SentenceTransformers
- Added BaseTextSplitter and TextSplitter for semantic text splitting
- Added pandas UDF support to TextSplitter
- Added support for TextChunks as input to TextEmbeddings
📚 Documentation
- Added TextEmbedding and TextSplitter docs
📘 Jupyter Notebooks
- TextSplitterAndEmbeddings.ipynb - Read PDF documents, split text into chunks, and compute embeddings at scale
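
To make the "split text into chunks" step concrete, here is a minimal pure-Python sketch of sentence-aware chunking. It is not ScaleDP's TextSplitter and does not use its API: `split_into_chunks` and `max_chars` are hypothetical names chosen for illustration, and the greedy sentence packing shown here is a simple stand-in for semantic splitting.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper for illustration only; not part of ScaleDP.
    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "The first sentence is short. The second one follows it. A third sentence ends it."
chunks = split_into_chunks(text, max_chars=60)
for chunk in chunks:
    print(repr(chunk))
```

Each resulting chunk would then be passed to an embedding model; that step is omitted here because it depends on the model and runtime in use.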
0.2.6
ScaleDP 0.2.6
What's Changed
🚀 Features
- Enabled GPU support in YoloOnnxDetector
🔄 Updates
- Changed default values for detectors
📘 Jupyter Notebooks
- YoloOnnxDetectorBenchamrks.ipynb - Benchmarking the YOLO model with different parameter configurations on CPU and GPU

0.2.5
ScaleDP 0.2.5
What's Changed
🚀 Features
- Added 'returnEmpty' param to ImageCropBoxes to avoid exceptions when no boxes are found
- Added labels param to the YoloOnnxDetector
- Improved label display in ImageDrawBoxes
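
The behavior that the 'returnEmpty' param describes can be sketched in plain Python. This is an illustration only, assuming that cropping with no boxes raises an error unless the flag is set; `crop_boxes`, `return_empty`, and the `(x, y, width, height)` box layout are hypothetical and are not the ImageCropBoxes API.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical layout: (x, y, width, height)
Image = List[List[int]]          # toy image: rows of pixel values


def crop_boxes(image: Image, boxes: List[Box], return_empty: bool = False) -> List[Image]:
    """Crop every box out of the image.

    Hypothetical helper for illustration only; not part of ScaleDP.
    """
    if not boxes:
        if return_empty:
            # No boxes: hand back an empty result instead of failing.
            return []
        raise ValueError("no boxes found")
    # Slice rows y..y+h, then columns x..x+w, for each box.
    return [[row[x:x + w] for row in image[y:y + h]] for (x, y, w, h) in boxes]


# 3x4 toy image whose pixel value is row * 4 + column.
image = [[r * 4 + c for c in range(4)] for r in range(3)]
crops = crop_boxes(image, [(1, 0, 2, 2)])
empty = crop_boxes(image, [], return_empty=True)
print(crops)
print(empty)
```

Returning an empty result lets a batch pipeline continue past pages where the detector found nothing, at the cost of having to handle empty outputs downstream.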
🧰 Maintenance
- Updated versions of dependencies (Pandas, Numpy, OpenCV)
🐛 Bug Fixes
- Fixed color scheme conversion in YoloOnnxDetector
- Fixed Show Utils on Google Colab
- Fixed DataFrame imports

0.2.4
ScaleDP 0.2.4
What's Changed
🚀 Features
- Added FaceDetector transformer
- Added SignatureDetector transformer
- Added PdfAssembler transformer for assembling PDFs
- Updated ImageCropBoxes to support multiple boxes
- Added LineOrientation detector model to the TesseractRecognizer
- Added support for subfields in Show Utils
- Added padding option to YoloOnnxDetector
🐛 Bug Fixes
- Fixed borders in Show Utils
0.2.2
ScaleDP 0.2.2
What's Changed
- Integrated with Spark PDF DataSource
- Added object detection by @mykolamelnykml in #47
- Updated LLM extractors by @mykolamelnykml in #48
- Improved LLM extractors and other workflows by @mykolamelnykml in #50
- Added LLM Ocr by @mykolamelnykml in #52
- Added LLMNer by @mykolamelnykml in #54
- Added Black and Ruff runs in actions by @mykolamelnykml in #55
- Updated VisualLLMExtractor by @mykolamelnykml in #58
- Updated tutorials by @mykolamelnykml in #60
- Improved VisualLLMExtractor by @mykolamelnykml in #62
Full Changelog: 0.1.0rc10...0.2.2
0.1.0rc10
What's Changed
- Added EasyOcr in https://github.com/StabRise/spark-pdf/pull/39
- Added DocTR in https://github.com/StabRise/spark-pdf/pull/44
- Added Surya Ocr in https://github.com/StabRise/spark-pdf/pull/38
- Added support for the PyTesseract lib for binding to Tesseract in StabRise/spark-pdf#4
- Added line width param to the ImageDrawBoxes in StabRise/spark-pdf#5
- Added textSize param to the ImageDrawBoxes in StabRise/spark-pdf#7
- Added list of displayed data to the imagedrawregions in StabRise/spark-pdf#9
- Initialized Sphinx docs in StabRise/spark-pdf#20
- Improved test coverage in StabRise/spark-pdf#22
- Added TextToDocument transformer in StabRise/spark-pdf#27
- Refactoring in StabRise/spark-pdf#30
- Changed Ner transformer to work with raw text in https://github.com/StabRise/spark-pdf/pull/31
- Added dockerfile in https://github.com/StabRise/spark-pdf/pull/37, StabRise/spark-pdf#16
Full Changelog: https://github.com/StabRise/spark-pdf/commits/0.1.0rc10
