Draft
Centralises the resolve → branch → construct pattern for local HF embedding models (VL and non-VL) that was duplicated across batch, inprocess, fused, gpu_pool, recall, retriever, and text_embed code paths into a single `create_local_embedder` factory function. Made-with: Cursor
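A minimal sketch of the resolve → branch → construct shape described above. Only the name `create_local_embedder` comes from the commit message. The default model name, the `is_vl_model` heuristic, and both embedder classes are hypothetical stand-ins, not the repository's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical default; the real default model is not stated in this PR.
DEFAULT_MODEL = "example/text-embedder"


@dataclass
class TextEmbedder:
    """Hypothetical stand-in for the non-VL embedder class."""
    model_name: str
    device: Optional[str] = None


@dataclass
class VLEmbedder:
    """Hypothetical stand-in for the vision-language embedder class."""
    model_name: str
    device: Optional[str] = None


def is_vl_model(model_name: str) -> bool:
    # Assumed heuristic for illustration only: treat "-vl" in the name as VL.
    return "-vl" in model_name.lower()


def create_local_embedder(
    model_name: Optional[str] = None, *, device: Optional[str] = None
) -> Union[TextEmbedder, VLEmbedder]:
    # Resolve: fall back to a default when no model is given.
    resolved = model_name or DEFAULT_MODEL
    # Branch: pick the embedder family from the resolved name.
    if is_vl_model(resolved):
        # Construct: VL variant.
        return VLEmbedder(resolved, device)
    # Construct: text-only variant.
    return TextEmbedder(resolved, device)
```

With a single factory, each call site (batch, inprocess, fused, and so on) would reduce to one call instead of repeating the three steps.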
Extracts duplicated LanceDB row-building, schema definition, and table-creation logic from batch.py and inprocess.py into a shared ingest_modes/lancedb_utils.py module. Made-with: Cursor
- Remove unused Path import and unused _extract_* aliases from inprocess.py
- Remove unused pytest import from test_lancedb_utils.py
- Apply black formatting to set literal and DataFrame constructor

Made-with: Cursor
…import The ingest_modes __init__.py eagerly imports batch/fused/inprocess/online which pull in ray, torch, etc. Pre-populate sys.modules with MagicMock stubs so lancedb_utils tests can run in lightweight CI without those deps. Made-with: Cursor
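The stubbing approach described above can be sketched roughly as follows. The helper name `stub_missing_modules` and the module list are assumptions for illustration; the actual test setup in this PR may populate `sys.modules` unconditionally or with a different set of names.

```python
import importlib
import sys
from unittest.mock import MagicMock

# Hypothetical list; the real set depends on what ingest_modes imports eagerly.
HEAVY_MODULES = ["ray", "torch"]


def stub_missing_modules(names):
    """Register MagicMock placeholders for modules that cannot be imported.

    Returns the list of names that were stubbed.
    """
    stubbed = []
    for name in names:
        if name in sys.modules:
            # Already imported (or already stubbed); leave it alone.
            continue
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed: later `import name` statements get the mock.
            sys.modules[name] = MagicMock(name=name)
            stubbed.append(name)
    return stubbed
```

The call would need to run before the package under test is imported, for example at the top of the test module or in a `conftest.py`, so that the eager imports in `__init__.py` resolve to the stubs.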
Centralises gold_to_doc_page, hit_key_and_distance, estimate_processed_pages, and print_pages_per_second that were duplicated across batch, inprocess, online, and fused pipeline examples. Fixes broken imports in fused_pipeline.py that referenced non-existent functions in batch_pipeline.py. Made-with: Cursor
Extracts duplicated detection summary computation and printing into a shared utils/detection_summary.py module, replacing ~200 lines of near-identical logic in batch_pipeline.py and inprocess.py with thin wrappers around the shared implementation. Made-with: Cursor
Consolidates the duplicated _coerce_params pattern and embed parameter flattening logic from batch.py and inprocess.py into a shared params/utils.py module with coerce_params and build_embed_kwargs helpers. Made-with: Cursor
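For illustration, shared helpers of this kind might look like the sketch below. The names `coerce_params` and `build_embed_kwargs` come from the commit message, but the signatures, the `EmbedParams` dataclass, and the "drop `None` values" flattening rule are assumptions; the repository's real parameter models may differ.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional


@dataclass
class EmbedParams:
    """Hypothetical stand-in for the real embed parameter model."""
    model_name: str = "example-model"
    batch_size: int = 32
    device: Optional[str] = None


def coerce_params(params: Any, cls: type, **overrides: Any) -> Any:
    """Normalise None, a dict, or an existing instance into `cls`."""
    if params is None:
        return cls(**overrides)
    if isinstance(params, dict):
        # Explicit keyword overrides win over values in the dict.
        return cls(**{**params, **overrides})
    if isinstance(params, cls):
        return replace(params, **overrides) if overrides else params
    raise TypeError(f"cannot coerce {type(params).__name__} to {cls.__name__}")


def build_embed_kwargs(params: EmbedParams) -> Dict[str, Any]:
    """Flatten params into keyword arguments, omitting unset (None) fields."""
    return {k: v for k, v in asdict(params).items() if v is not None}
```

A caller in either `batch.py` or `inprocess.py` could then write `build_embed_kwargs(coerce_params(user_params, EmbedParams))` instead of keeping its own copy of the logic.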
…determine device for nemotron_parse
General cleanup of dead code, superseded code, and components that serve no purpose and only add confusion.
Checklist