updated per bug review by kheiss-uwzoo · Pull Request #1621 · NVIDIA/NeMo-Retriever

kheiss-uwzoo · 2026-03-13T20:08:20Z

Overview

QA review branch that brings in upstream main and applies bug-queue doc and tooling updates. The diff is ~2,300 insertions and ~1,200 deletions across 66 files, touching docs, NeMo Retriever, Helm, the test harness, and retrieval-bench.

Documentation

Extraction docs: Updates across audio, benchmarking, quickstart (guide + library mode), support matrix, content/custom metadata, FAQ, CLI reference, Python API reference, VLM embed, user-defined functions/stages, and v2 API guide.
Naming/links: Consistent use of “NeMo Retriever Library” (replacing “NVIDIA Ingest” / “nv-ingest”) and fixes for support matrix and related links (e.g. RIVA).
Helm: README and values updates; table additions for nimOperator.rerankqa and nimOperator.ocr; Nemotron rebranding.

NeMo Retriever

Markdown API: to_markdown() returns None for empty results; markdown I/O and tests adjusted (including test_io_markdown, test_html_convert, test_txt_split).
Image support: Docs for ingesting image files (batch and in-process), including extract_image_files and --input-type image.
Text chunking: .split() for token-count–based chunking (#1547).
Audio: Batched audio extraction improvements; Parakeet CTC ASR and ASR actor updates.
Build/install: Retriever installed as part of Docker build; get_hf_revision removed from code outside nemo_retriever/ (#1612).
Release: Version handling and PyPI wheel naming; NeMo Retriever LICENSE added.
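For reviewers checking the Markdown API change above, the following is a minimal sketch of the caller-side guard that a None return implies. It does not use the library itself: `to_markdown_stub` and `markdown_or_default` are hypothetical names, and the dict-with-"text" result shape is an assumption made only for illustration.

```python
from typing import Optional, Sequence


def to_markdown_stub(results: Sequence[dict]) -> Optional[str]:
    """Hypothetical stand-in for a to_markdown()-style call.

    Mirrors the behavior described in this PR: empty results yield None
    rather than an empty string.
    """
    if not results:
        return None
    # Assumed result shape: each item carries its text under a "text" key.
    return "\n\n".join(str(item.get("text", "")) for item in results)


def markdown_or_default(results: Sequence[dict], default: str = "") -> str:
    """Caller-side guard so downstream code always receives a string."""
    markdown = to_markdown_stub(results)
    return default if markdown is None else markdown


# Empty input falls back to the default instead of propagating None.
print(repr(markdown_or_default([])))
print(markdown_or_default([{"text": "# Title"}, {"text": "Body"}]))
```

Code that previously wrote the return value straight to a file or concatenated it with other strings may need a guard of this kind.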

Helm & Harness

Helm: RTX PRO 4500 override and obj-det warmup batch size override; reranker and OCR NIM table docs.
Harness: Wait for healthy reranker when needed for recall; retry for managed Helm port-forwards; docker-compose and Helm service manager improvements; JP20 recall config cleanup and readiness logging.
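The readiness wait and port-forward retry described above follow a common pattern: poll a health check and back off between attempts. The sketch below shows that pattern in general form and is not the harness's actual code; `wait_until_healthy`, its parameters, and the fake probe are all hypothetical.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or the attempts run out.

    `check` is any callable probing the service (for example an HTTP
    health request). Connection-level errors (OSError) count as
    "not ready yet" instead of aborting the wait.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except OSError:
            pass  # service or port-forward not reachable yet
        if attempt < attempts:
            sleep(delay)      # wait before the next probe
            delay *= backoff  # exponential backoff between probes
    return False


# Demo with a fake probe that becomes healthy on the third call.
calls = {"n": 0}


def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


# A no-op sleep keeps the demo instantaneous.
print(wait_until_healthy(fake_probe, sleep=lambda _seconds: None))
```

Injecting `sleep` as a parameter keeps the helper testable without real delays; a real harness would likely also log each attempt.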

Retrieval-bench

Pipeline and modality handling improvements; refactors for retriever singletons (ColEmbed, HF dense, Nemotron ColEmbed VL v2, Nemotron Embed VL dense).
New BRIGHT agentic submission (bright_agentic.md, sdg.png).

CI / Release

Perform-release: Workflow and release-helm updates; reusable PyPI build/publish and release-helm workflow changes.
Misc: Redis TTL default increased to 48h for VLM captioning; in-process extract fixes for txt and reranker; source_id in LanceDB schema; rerank and release-related fixes.

updated per bug review

050943c

kheiss-uwzoo marked this pull request as ready for review March 13, 2026 20:09

kheiss-uwzoo requested a review from a team as a code owner March 13, 2026 20:09

kheiss-uwzoo requested review from jdye64, jperez999 and sosahi and removed request for a team March 13, 2026 20:09

kheiss-uwzoo wants to merge 1 commit into NVIDIA:26.03 from kheiss-uwzoo:kheiss/qa-review3b
