Generalized code base cleanup #1529

Draft
jdye64 wants to merge 14 commits into NVIDIA:main from jdye64:op-mach

jdye64 (Collaborator) commented Mar 10, 2026

General cleanup of dead code, superseded code, and components that serve no purpose and only add confusion.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured they are mirrored in the Helm values.yaml file?

jdye64 added 14 commits March 6, 2026 18:38
Centralises the resolve → branch → construct pattern for local HF embedding
models (VL and non-VL) that was duplicated across batch, inprocess, fused,
gpu_pool, recall, retriever, and text_embed code paths into a single
`create_local_embedder` factory function.

Made-with: Cursor
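
For readers unfamiliar with the resolve → branch → construct pattern named in this commit, here is a minimal sketch of what such a factory could look like. Only the name `create_local_embedder` comes from the commit message; the `TextEmbedder` and `VLEmbedder` classes, the `DEFAULT_MODEL` value, the `device` parameter, and the name-token check for VL models are all assumptions made for illustration, and the real signature may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical default; the real project resolves model names its own way.
DEFAULT_MODEL = "example-org/text-embed"


@dataclass
class TextEmbedder:
    """Stand-in for a local text-only HF embedder (hypothetical)."""
    model_name: str
    device: str = "cpu"


@dataclass
class VLEmbedder:
    """Stand-in for a local vision-language HF embedder (hypothetical)."""
    model_name: str
    device: str = "cpu"


def create_local_embedder(model_name: Optional[str] = None, *, device: str = "cpu"):
    # 1. Resolve: fall back to a default when no model name is given.
    resolved = model_name or DEFAULT_MODEL
    # 2. Branch: assumed heuristic, treat names with a "vl" token as VL models.
    tokens = re.split(r"[-_/.]", resolved.lower())
    embedder_cls = VLEmbedder if "vl" in tokens else TextEmbedder
    # 3. Construct: every call site gets the same construction logic.
    return embedder_cls(model_name=resolved, device=device)
```

With a single factory, each of the code paths listed above would call `create_local_embedder(...)` instead of repeating the three steps inline.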
Extracts duplicated LanceDB row-building, schema definition, and
table-creation logic from batch.py and inprocess.py into a shared
ingest_modes/lancedb_utils.py module.

Made-with: Cursor
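
As a rough illustration of shared row-building, the sketch below turns pipeline records into flat rows that both ingest modes could pass to a table writer. The function names, the record keys, and the column names (`vector`, `text`, `metadata`) are hypothetical and are not taken from `lancedb_utils.py`; the sketch also stays in plain Python and makes no LanceDB calls.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Optional


def build_lancedb_row(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Build one table row, or return None if the record has no embedding."""
    embedding = record.get("embedding")
    if not embedding:
        return None
    return {
        "vector": [float(x) for x in embedding],
        "text": record.get("text", ""),
        # Serialise metadata so the column has a single, stable type.
        "metadata": json.dumps(record.get("metadata", {}), sort_keys=True),
    }


def build_lancedb_rows(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build rows for all records, skipping those without an embedding."""
    rows = (build_lancedb_row(r) for r in records)
    return [row for row in rows if row is not None]
```

Keeping this logic in one module means a change to the row layout only has to be made once.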
- Remove unused Path import and unused _extract_* aliases from inprocess.py
- Remove unused pytest import from test_lancedb_utils.py
- Apply black formatting to set literal and DataFrame constructor

Made-with: Cursor
…import

The ingest_modes __init__.py eagerly imports batch/fused/inprocess/online
which pull in ray, torch, etc. Pre-populate sys.modules with MagicMock
stubs so lancedb_utils tests can run in lightweight CI without those deps.

Made-with: Cursor
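
The stubbing technique described in this commit can be sketched as follows. The helper name `stub_heavy_modules` and the module list are assumptions; the commit names only ray and torch and implies there are more.

```python
import sys
from unittest.mock import MagicMock

# The commit message names ray and torch; the real list is likely longer.
HEAVY_MODULES = ("ray", "torch")


def stub_heavy_modules(names=HEAVY_MODULES):
    """Register MagicMock stubs in sys.modules for modules not yet loaded.

    Returns the names that were stubbed. Dotted submodules (for example
    "ray.data") generally need their own entries, because a MagicMock
    parent is not treated as a package by the import system.
    """
    stubbed = []
    for name in names:
        if name not in sys.modules:
            sys.modules[name] = MagicMock(name=name)
            stubbed.append(name)
    return stubbed
```

A test module would call `stub_heavy_modules()` before importing anything from the package, so that the eager imports in `__init__.py` find the stubs instead of failing.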
Centralises gold_to_doc_page, hit_key_and_distance, estimate_processed_pages,
and print_pages_per_second that were duplicated across batch, inprocess,
online, and fused pipeline examples. Fixes broken imports in fused_pipeline.py
that referenced non-existent functions in batch_pipeline.py.

Made-with: Cursor
Extracts duplicated detection summary computation and printing into a
shared utils/detection_summary.py module, replacing ~200 lines of
near-identical logic in batch_pipeline.py and inprocess.py with thin
wrappers around the shared implementation.

Made-with: Cursor
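
To illustrate the "shared implementation plus thin wrapper" shape, here is a small sketch. The function names, the `label` key, and the summary layout are hypothetical and do not reflect the actual contents of `utils/detection_summary.py`.

```python
from collections import Counter


def compute_detection_summary(detections):
    """Count detections per label (shared computation)."""
    counts = Counter(d["label"] for d in detections)
    return {"total": sum(counts.values()), "by_label": dict(sorted(counts.items()))}


def format_detection_summary(summary):
    """Render a summary as text (shared formatting)."""
    lines = [f"Detections: {summary['total']}"]
    lines += [f"  {label}: {n}" for label, n in summary["by_label"].items()]
    return "\n".join(lines)


def print_detection_summary(detections):
    """Thin wrapper a pipeline module might keep for its own call sites."""
    print(format_detection_summary(compute_detection_summary(detections)))
```

Separating computation from printing also makes the summary easy to test without capturing stdout.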
Consolidates the duplicated _coerce_params pattern and embed parameter
flattening logic from batch.py and inprocess.py into a shared
params/utils.py module with coerce_params and build_embed_kwargs helpers.

Made-with: Cursor
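
A minimal sketch of the two helpers named in this commit, assuming dataclass-based parameter objects. The `EmbedParams` class, its fields, and both signatures are hypothetical; the real `params/utils.py` may use a different parameter model.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, replace
from typing import Any, Mapping, Optional


@dataclass
class EmbedParams:
    """Hypothetical parameter object for the embed stage."""
    model_name: Optional[str] = None
    batch_size: int = 32
    device: Optional[str] = None


def coerce_params(value: Any, cls: type, **overrides: Any):
    """Accept None, a mapping, or an instance of cls and return an instance."""
    if value is None:
        return cls(**overrides)
    if isinstance(value, cls):
        return replace(value, **overrides) if overrides else value
    if isinstance(value, Mapping):
        return cls(**{**value, **overrides})
    raise TypeError(f"Cannot coerce {type(value).__name__} to {cls.__name__}")


def build_embed_kwargs(params: EmbedParams) -> dict[str, Any]:
    """Flatten params into keyword arguments, dropping unset (None) fields."""
    return {k: v for k, v in asdict(params).items() if v is not None}
```

For example, `build_embed_kwargs(coerce_params({"batch_size": 8}, EmbedParams))` yields a flat dict that can be splatted into an embedder call.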