Draft
Changes from all commits (56 commits)
cd3c368
Update PDF blueprint architecture diagram
kheiss-uwzoo Feb 19, 2026
70b5a80
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 24, 2026
7f0248c
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 25, 2026
0dd5f1b
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 26, 2026
dea2770
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 27, 2026
3ff2f1f
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 27, 2026
a886244
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 2, 2026
b44f7ad
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 2, 2026
addf637
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 2, 2026
5900322
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 4, 2026
d12df70
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 4, 2026
67e674b
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 10, 2026
83c3c42
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 10, 2026
ae5f3e2
Perform release 5 (#1559)
jdye64 Mar 11, 2026
79f4b4b
Add description name to 'Perform Release' Github action name (#1560)
jdye64 Mar 11, 2026
4131419
fixes (#1561)
jdye64 Mar 11, 2026
adbdc47
Numerous changes for perform-release workflow (#1562)
jdye64 Mar 11, 2026
2b93c78
Ensure pypi setup is correct for publishing wheels through ngcsdk (#1…
jdye64 Mar 11, 2026
4af706f
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
83b538b
(retriever) follow up markdown review (#1553)
jioffe502 Mar 11, 2026
7bccd8e
Kheiss/fix build (#1567)
kheiss-uwzoo Mar 11, 2026
eed1f84
(harness) Clean up JP20 recall config (#1569)
charlesbluca Mar 11, 2026
7322145
(harness) More verbose readiness logs (#1568)
charlesbluca Mar 11, 2026
c5f4d44
Refactor `get_*_model_name` to avoid caching fallback model name (#1527)
charlesbluca Mar 11, 2026
a5812fa
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
8df6398
(retriever) Add .split() for text chunking by token count (#1547)
edknv Mar 11, 2026
6ecb070
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
5110934
Kheiss/5963638 (#1570)
kheiss-uwzoo Mar 11, 2026
1152a6a
(retriever) add documentation for image file support (#1571)
edknv Mar 11, 2026
9d2091f
(harness) Retry mechanism for managed Helm port-forwards (#1573)
charlesbluca Mar 11, 2026
bbc740d
add readme warning about experiment modes fused and online (#1566)
jperez999 Mar 11, 2026
15e5406
(helm) More nemotron rebranding (#1579)
charlesbluca Mar 11, 2026
ab2d503
Kheiss/qa review2 (#1583)
kheiss-uwzoo Mar 11, 2026
82ae6fc
Increase default Redis TTL to 48h to prevent job expiry during VLM ca…
jioffe502 Mar 11, 2026
f3d44a9
Retriever release action (#1584)
jdye64 Mar 11, 2026
885978a
Add rerank (#1565)
jperez999 Mar 11, 2026
ba92f69
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
00c56d8
Add source_id row back to lancedb schema (#1587)
jdye64 Mar 12, 2026
cb773ea
fix reranker in inproc (#1588)
jperez999 Mar 12, 2026
ff042e7
fix in process extract to handle txt (#1589)
jperez999 Mar 12, 2026
f3e0c9f
Add intro page for agentic retrieval pipeline on BRIGHT (#1582)
jiaruic09 Mar 12, 2026
08ebc41
Fix resulting python wheel names (#1593)
jdye64 Mar 12, 2026
aeb5be9
(harness) Fixes/mods for performance testing (#1595)
charlesbluca Mar 12, 2026
6fea3ef
Add Helm RTX PRO 4500 override, extend obj-det warmup batch size over…
charlesbluca Mar 12, 2026
41d2b07
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 12, 2026
850595c
(retriever) update nemotron_parse extraction method (#1599)
edknv Mar 12, 2026
2e6a100
(retriever) auto-route image files in .extract() for both inprocess a…
edknv Mar 12, 2026
77a5053
(retriever) Install as part of docker build (#1594)
charlesbluca Mar 12, 2026
ba1c72d
Dump libfreetype source in release container (#1600)
charlesbluca Mar 12, 2026
5889d82
(retriever) update pre/post-processing for improved recall (#1596)
edknv Mar 12, 2026
5afb91f
(retriever) Improve batched audio extraction performance (#1598)
charlesbluca Mar 12, 2026
13f098f
(retrieval-bench) Improve pipeline performance and modality handling …
oliverholworthy Mar 13, 2026
2ae7f76
(harness) Always wait for healthy reranker if it's needed for recall …
charlesbluca Mar 13, 2026
2b79838
Remove get_hf_revision from files outside nemo_retriever/ (#1612)
jdye64 Mar 13, 2026
c00b6bf
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 13, 2026
6fb6eb7
Updated files per bug queue
kheiss-uwzoo Mar 13, 2026
389 changes: 208 additions & 181 deletions .github/workflows/perform-release.yml

Large diffs are not rendered by default.

35 changes: 10 additions & 25 deletions .github/workflows/release-helm.yml
@@ -39,9 +39,12 @@ jobs:
runs-on: ubuntu-latest
env:
NGC_CLI_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
NGC_CLI_ORG: ${{ inputs.ngc-org }}
NGC_CLI_TEAM: ${{ inputs.ngc-team }}
NGC_CLI_FORMAT_TYPE: json
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5
with:
ref: ${{ inputs.source-ref }}

@@ -54,23 +57,13 @@ jobs:
curl -sSL "https://github.com/norwoodj/helm-docs/releases/download/v${HELM_DOCS_VERSION}/helm-docs_${HELM_DOCS_VERSION}_Linux_x86_64.tar.gz" \
| tar xz -C /usr/local/bin helm-docs

- name: Install NGC CLI
run: |
curl -sSL "https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.55.0/files/ngccli_linux.zip" -o /tmp/ngccli.zip
unzip -q /tmp/ngccli.zip -d /tmp
sudo mv /tmp/ngc-cli/ngc /usr/local/bin/ngc
sudo chmod +x /usr/local/bin/ngc
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: '3.12'

- name: Configure and verify NGC CLI
run: |
ngc config set <<EOF
$NGC_CLI_API_KEY
json
${{ inputs.ngc-org }}
${{ inputs.ngc-team }}
EOF
echo "NGC CLI configured. Verifying authentication..."
ngc config current
- name: Install Python dependencies
run: pip install ngcsdk pyyaml

- name: Update Helm README
run: helm/update_helm_readme.sh
@@ -88,14 +81,6 @@ jobs:
helm dependency update helm/
helm dependency build helm/

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.12'

- name: Install Python dependencies
run: pip install pyyaml

- name: Release Helm chart
run: |
DRY_RUN_FLAG=""
10 changes: 7 additions & 3 deletions .github/workflows/retriever-unit-tests.yml
@@ -19,11 +19,15 @@ jobs:
with:
python-version: "3.12"

- name: Install uv
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> "$GITHUB_PATH"

- name: Install unit test dependencies
run: |
python -m pip install --upgrade pip
python -m pip install pytest pandas pydantic pyyaml typer scikit-learn
python -m pip install api/
uv pip install --system -e src/ -e api/ -e client/
uv pip install --system -e nemo_retriever

- name: Run retriever unit tests
env:
19 changes: 15 additions & 4 deletions .github/workflows/reusable-pypi-build.yml
@@ -18,6 +18,11 @@ on:
required: false
type: string
default: 'main'
workflow-ref:
description: 'Git ref of the workflow branch (used to overlay pyproject.toml files)'
required: false
type: string
default: ''
runner:
description: 'GitHub runner to use'
required: false
@@ -36,10 +41,16 @@ jobs:

steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5
with:
ref: ${{ inputs.source-ref }}

- name: Overlay build config from workflow branch
if: ${{ inputs.workflow-ref != '' && inputs.workflow-ref != inputs.source-ref }}
run: |
git fetch --depth=1 origin "${{ inputs.workflow-ref }}"
git checkout FETCH_HEAD -- api/pyproject.toml client/pyproject.toml src/pyproject.toml nemo_retriever/pyproject.toml

- name: Determine version
id: set-version
run: |
@@ -52,7 +63,7 @@
echo "Building version: $VERSION"

- name: Setup Python
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: '3.12'

@@ -103,12 +114,12 @@ jobs:
PY
RETRIEVER_RELEASE_TYPE=${{ inputs.release-type }} \
RETRIEVER_VERSION=${{ steps.set-version.outputs.version }} \
RETRIEVER_BUILD_NUMBER=${{ github.run_number }} \
RETRIEVER_BUILD_NUMBER=${{ inputs.release-type == 'release' && '0' || github.run_number }} \
RETRIEVER_GIT_SHA=${{ github.sha }} \
python -m build

- name: Upload wheel artifacts
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v5
with:
name: python-wheels
path: |
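The `RETRIEVER_BUILD_NUMBER` change above relies on GitHub Actions' `&&`/`||` short-circuit idiom as a ternary: release builds always embed build number `0`, while every other build gets the unique workflow run number. A minimal Python sketch of that selection logic (the function name is an illustration, not part of the workflow):

```python
# Sketch of the build-number selection from the workflow expression
#   ${{ inputs.release-type == 'release' && '0' || github.run_number }}
# In Actions, `cond && a || b` acts as a ternary (safe here because '0'
# is a non-empty string and therefore truthy in the && branch).
def resolve_build_number(release_type: str, run_number: int) -> str:
    """Return the build number the wheel should embed, as a string,
    matching how Actions expressions pass values into env vars."""
    return "0" if release_type == "release" else str(run_number)
```

This keeps release wheel filenames stable across workflow re-runs, while dev wheels stay distinguishable by run number.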
11 changes: 8 additions & 3 deletions .github/workflows/reusable-pypi-publish.yml
@@ -16,13 +16,18 @@ jobs:

steps:
- name: Download wheel artifacts
uses: actions/download-artifact@v4
uses: actions/download-artifact@v5
with:
name: python-wheels
path: ./dist

- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: '3.12'

- name: Install twine
run: pip install twine
run: pip install 'twine>=6.1'

- name: Publish wheels to Artifactory
env:
@@ -31,7 +36,7 @@
ARTIFACTORY_PASSWORD: ${{ secrets.ARTIFACTORY_PASSWORD }}
run: |
# Publish all wheels
twine upload \
twine upload --verbose \
--repository-url $ARTIFACTORY_URL \
-u $ARTIFACTORY_USERNAME \
-p $ARTIFACTORY_PASSWORD \
2 changes: 1 addition & 1 deletion Dockerfile
@@ -39,13 +39,13 @@ RUN chmod +x scripts/install_ffmpeg.sh \
# For GPL-licensed components, we provide their source code in the container
# via `apt-get source` below to satisfy GPL requirements.
ARG GPL_LIBS="\
libfreetype6 \
libltdl7 \
libhunspell-1.7-0 \
libhyphen0 \
libdbus-1-3 \
"
ARG FORCE_REMOVE_PKGS="\
libfreetype6 \
ucf \
liblangtag-common \
libjbig0 \
4 changes: 2 additions & 2 deletions api/pyproject.toml
@@ -13,10 +13,10 @@ readme = "README.md"
authors = [
{name = "Jeremy Dyer", email = "jdyer@nvidia.com"}
]
license = {file = "LICENSE"}
license = "Apache-2.0"
license-files = ["LICENSE"]
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
]
dependencies = [
@@ -16,7 +16,8 @@
import tritonclient.grpc as grpcclient

from nv_ingest_api.internal.primitives.nim import ModelInterface
from nv_ingest_api.internal.primitives.nim.model_interface.decorators import multiprocessing_cache
from nv_ingest_api.internal.primitives.nim.model_interface.decorators import global_cache
from nv_ingest_api.internal.primitives.nim.model_interface.decorators import lock
from nv_ingest_api.internal.primitives.nim.model_interface.helpers import preprocess_image_for_paddle
from nv_ingest_api.util.image_processing.transforms import base64_to_numpy

@@ -752,12 +753,11 @@ def _format_single_batch(
raise ValueError("Invalid protocol specified. Must be 'grpc' or 'http'.")


@multiprocessing_cache(max_calls=100) # Cache results first to avoid redundant retries from backoff
@backoff.on_predicate(backoff.expo, max_time=30)
def get_ocr_model_name(ocr_grpc_endpoint=None, default_model_name=DEFAULT_OCR_MODEL_NAME):
"""
Determines the OCR model name by checking the environment, querying the gRPC endpoint,
or falling back to a default.
or falling back to a default. Only caches when the repository is successfully queried.
"""
# 1. Check for an explicit override from the environment variable first.
ocr_model_name = os.getenv("OCR_MODEL_NAME", None)
@@ -769,14 +769,25 @@ def get_ocr_model_name(ocr_grpc_endpoint=None, default_model_name=DEFAULT_OCR_MO
logger.debug(f"No OCR gRPC endpoint provided. Falling back to default model name '{default_model_name}'.")
return default_model_name

# 3. Attempt to query the gRPC endpoint to discover the model name.
# 3. Check cache (only populated on successful repository query).
key = (
"get_ocr_model_name",
(ocr_grpc_endpoint,),
frozenset({"default_model_name": default_model_name}.items()),
)
with lock:
if key in global_cache:
return global_cache[key]

# 4. Attempt to query the gRPC endpoint to discover the model name.
try:
client = grpcclient.InferenceServerClient(ocr_grpc_endpoint)
model_index = client.get_model_repository_index(as_json=True)
model_names = [x["name"] for x in model_index.get("models", [])]
ocr_model_name = model_names[0]
with lock:
global_cache[key] = ocr_model_name
return ocr_model_name
except Exception:
logger.warning(f"Failed to get ocr model name after 30 seconds. Falling back to '{default_model_name}'.")
ocr_model_name = default_model_name

return ocr_model_name
return default_model_name
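The diff above replaces a blanket `@multiprocessing_cache` decorator with an explicit lock-guarded cache that stores only successful repository lookups, so a fallback returned during a transient outage never gets pinned. A minimal sketch of that "cache only on success" pattern, with a hypothetical `query_fn` standing in for the Triton gRPC query:

```python
import threading

# Module-level cache guarded by a lock, keyed by function name + args.
# Unlike a memoizing decorator, the fallback value is never stored, so
# later calls retry the remote lookup instead of pinning the default.
_lock = threading.Lock()
_global_cache: dict = {}

def resolve_model_name(endpoint, default_name, query_fn):
    key = ("resolve_model_name", (endpoint,), frozenset({"default": default_name}.items()))
    with _lock:
        if key in _global_cache:
            return _global_cache[key]
    try:
        name = query_fn(endpoint)  # e.g. query the model repository index
    except Exception:
        return default_name  # fallback is returned but NOT cached
    with _lock:
        _global_cache[key] = name  # cache only the successful lookup
    return name
```

The design choice mirrors the docstring's "Only caches when the repository is successfully queried": correctness over call-count, since the extra retries only happen while the endpoint is unhealthy.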
@@ -20,6 +20,8 @@

from nv_ingest_api.internal.primitives.nim import ModelInterface
import tritonclient.grpc as grpcclient
from nv_ingest_api.internal.primitives.nim.model_interface.decorators import global_cache
from nv_ingest_api.internal.primitives.nim.model_interface.decorators import lock
from nv_ingest_api.internal.primitives.nim.model_interface.decorators import multiprocessing_cache
from nv_ingest_api.internal.primitives.nim.model_interface.helpers import get_model_name
from nv_ingest_api.util.image_processing import scale_image_to_encoding_size
@@ -135,10 +137,36 @@ def __init__(
self.class_labels = class_labels

if endpoints:
self.model_name = get_yolox_model_name(endpoints[0], default_model_name="yolox_ensemble")
self._grpc_uses_bls = self.model_name == "pipeline"
self._yolox_grpc_endpoint = endpoints[0]
self._model_name = None
self._grpc_uses_bls_value = None # Resolved on first use
else:
self._grpc_uses_bls = False
self._yolox_grpc_endpoint = None
self._model_name = None
self._grpc_uses_bls_value = False

def _resolve_yolox_model_name_if_needed(self) -> None:
"""Resolve model name and BLS flag from the gRPC endpoint on first use. Cached on the instance."""
if self._yolox_grpc_endpoint is None:
return
if self._model_name is not None:
return
self._model_name = get_yolox_model_name(self._yolox_grpc_endpoint, default_model_name="yolox_ensemble")
self._grpc_uses_bls_value = self._model_name == "pipeline"

@property
def model_name(self) -> Optional[str]:
self._resolve_yolox_model_name_if_needed()
return self._model_name

@model_name.setter
def model_name(self, value: Optional[str]) -> None:
self._model_name = value

@property
def _grpc_uses_bls(self) -> bool:
self._resolve_yolox_model_name_if_needed()
return bool(self._grpc_uses_bls_value)

def prepare_data_for_inference(self, data: Dict[str, Any]) -> Dict[str, Any]:
"""
@@ -2117,7 +2145,6 @@ def postprocess_included_texts(boxes, confs, labels, classes):
return boxes, labels, confs


@multiprocessing_cache(max_calls=100) # Cache results first to avoid redundant retries from backoff
@backoff.on_predicate(backoff.expo, max_time=30)
def get_yolox_model_name(yolox_grpc_endpoint, default_model_name="yolox"):
# If a gRPC endpoint isn't provided (common when using HTTP-only NIM endpoints),
@@ -2131,6 +2158,15 @@ def get_yolox_model_name(yolox_grpc_endpoint, default_model_name="yolox"):
):
return default_model_name

key = (
"get_yolox_model_name",
(yolox_grpc_endpoint,),
frozenset({"default_model_name": default_model_name}.items()),
)
with lock:
if key in global_cache:
return global_cache[key]

try:
client = grpcclient.InferenceServerClient(yolox_grpc_endpoint)
model_index = client.get_model_repository_index(as_json=True)
@@ -2148,14 +2184,23 @@ def get_yolox_model_name(yolox_grpc_endpoint, default_model_name="yolox"):
"nemoretriever-page-elements-v2",
):
if preferred in model_names:
return preferred
result = preferred
with lock:
global_cache[key] = result
return result

# Otherwise pick a best-effort match for newer model names.
candidates = [m for m in model_names if isinstance(m, str) and ("yolox" in m or "page-elements" in m)]
if candidates:
return sorted(candidates)[0]

return default_model_name
result = sorted(candidates)[0]
with lock:
global_cache[key] = result
return result

result = default_model_name
with lock:
global_cache[key] = result
return result
except Exception as e:
logger.warning(
"Failed to inspect YOLOX model repository at '%s' (%s). Falling back to '%s'.",
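The `YoloxModelInterface` changes above defer the gRPC model-name query from `__init__` to first property access, memoizing the result on the instance so construction never blocks on the network. A minimal sketch of that lazy-resolution pattern, with a hypothetical `resolver` callable standing in for `get_yolox_model_name`:

```python
from typing import Optional

class LazyModel:
    """Resolve the model name from the endpoint on first access only."""

    def __init__(self, endpoint: Optional[str], resolver):
        self._endpoint = endpoint
        self._resolver = resolver
        self._model_name: Optional[str] = None  # resolved on first use

    @property
    def model_name(self) -> Optional[str]:
        if self._model_name is None and self._endpoint is not None:
            self._model_name = self._resolver(self._endpoint)
        return self._model_name

    @model_name.setter
    def model_name(self, value: Optional[str]) -> None:
        # Allow explicit override, skipping endpoint resolution entirely.
        self._model_name = value

    @property
    def grpc_uses_bls(self) -> bool:
        # BLS routing applies when the resolved name is the "pipeline" ensemble.
        return self.model_name == "pipeline"
```

Keeping `_grpc_uses_bls` derived from the resolved name (rather than computed once in `__init__`) also means the two values can never drift apart.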
8 changes: 1 addition & 7 deletions api/src/nv_ingest_api/internal/transform/split_text.py
@@ -56,14 +56,8 @@ def _get_tokenizer(
if cache_key in _tokenizer_cache:
return _tokenizer_cache[cache_key]

from nemo_retriever.utils.hf_model_registry import get_hf_revision

logger.info("Loading and caching tokenizer: %s", tokenizer_identifier)
tokenizer = AutoTokenizer.from_pretrained(
tokenizer_identifier,
revision=get_hf_revision(tokenizer_identifier),
token=token,
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_identifier, token=token)
_tokenizer_cache[cache_key] = tokenizer
return tokenizer

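After dropping the `get_hf_revision` lookup, `_get_tokenizer` reduces to a plain keyed cache around one expensive load. A minimal sketch of that shape, with a hypothetical `loader` parameter standing in for `AutoTokenizer.from_pretrained` so the sketch carries no transformers dependency:

```python
# Module-level cache: the expensive loader runs once per
# (identifier, token) pair; later calls return the cached object.
_tokenizer_cache: dict = {}

def get_tokenizer(identifier: str, token=None, loader=None):
    cache_key = (identifier, token)
    if cache_key in _tokenizer_cache:
        return _tokenizer_cache[cache_key]
    tokenizer = loader(identifier, token=token)  # expensive load
    _tokenizer_cache[cache_key] = tokenizer
    return tokenizer
```

Including the auth token in the cache key matters: the same identifier loaded with different credentials may resolve to different (gated vs. public) artifacts.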