Revert "Eval kit support (#1239)"#1294

Merged
Kipok merged 1 commit into main from revert-eval-kit
Mar 6, 2026

Conversation

@Kipok (Collaborator) commented Mar 6, 2026

This reverts commit b237e33.

That PR broke GPU tests (and likely Slurm tests as well).

Summary by CodeRabbit

  • Chores
    • Removed VLMEvalKit integration and related evaluation functionality, including documentation and configuration files.
    • Removed in-process generation mode support.
    • Simplified evaluation pipeline configuration and command assembly.

This reverts commit b237e33.

Signed-off-by: Igor Gitman <igitman@nvidia.com>
@Kipok Kipok force-pushed the revert-eval-kit branch from 269fd2c to 99f2c08 on March 6, 2026 19:48
@Kipok Kipok requested a review from gwarmstrong March 6, 2026 19:49
@Kipok Kipok enabled auto-merge (squash) March 6, 2026 19:49

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

This pull request removes the VLMEvalKit integration from NeMo Skills, eliminating documentation, module exports, dataset utilities, evaluation metrics, inference implementations, and pipeline logic that supported VLMEvalKit-based evaluation workflows.

Changes

Cohort / File(s) — Summary

  • Documentation & Requirements (docs/evaluation/eval-kit.md, docs/evaluation/index.md, requirements/eval-kit.txt)
    Removed VLMEvalKit documentation, index reference, and placeholder requirements file.
  • Dataset Module (nemo_skills/dataset/eval_kit/__init__.py, nemo_skills/dataset/utils.py)
    Removed eval_kit module exports (GENERATION_MODULE, METRICS_TYPE, constants, and the get_extra_generation_args function); removed special-case dotted dataset name handling in get_default_dataset_module.
  • Evaluation Metrics (nemo_skills/evaluation/metrics/eval_kit_metrics.py, nemo_skills/evaluation/metrics/map_metrics.py, nemo_skills/evaluation/metrics/translation_metrics.py)
    Deleted the EvalKitMetrics class; removed the eval_kit entry from METRICS_MAP; centralized the corpus_bleu import in translation_metrics.
  • Audio Evaluation (nemo_skills/evaluation/evaluator/audio.py)
    Simplified generation extraction logic; removed specialized handling for AudioBench, ST-EN-ZH, and MathQA task types; streamlined ASR-translation routing.
  • Inference Implementations (nemo_skills/inference/eval/eval_kit.py, nemo_skills/inference/mcore_skills.py)
    Deleted the EvalKitGenerationTask and MegatronMCoreGenerationTask classes with all supporting logic for VLMEvalKit integration, mcore in-process generation, distributed data handling, and metrics computation.
  • Inference Base & Factory (nemo_skills/inference/factory.py, nemo_skills/inference/generate.py)
    Removed the mcore_skills enum member and module mapping; removed CONTAINER_KEY, USE_TORCHRUN, and related classmethods (is_self_contained, get_env_prefix, get_extra_package_dirs) from the GenerationTask base class.
  • Pipeline Evaluation (nemo_skills/pipeline/eval.py)
    Removed the _apply_task_overrides helper; eliminated dynamic per-task GPU/container overrides, torchrun configuration, and extra package directory propagation; simplified container assignment and command assembly.
  • Pipeline Utils (nemo_skills/pipeline/utils/eval.py, nemo_skills/pipeline/utils/generation.py)
    Removed _resolve_generation_task_class; removed the self_contained_task, num_gpus, and generation_task_class fields from BenchmarkArgs; made input_file mandatory; simplified the venv bootstrap to always use uv; added input validation to get_generation_cmd.
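
The "made input_file mandatory; added input validation to get_generation_cmd" change in the Pipeline Utils row can be sketched roughly as below. BenchmarkArgs and get_generation_cmd are names from the summary above, but the fields, signature, and bodies here are illustrative assumptions, not the repository's actual implementation:

```python
# Hypothetical sketch: input_file is now a mandatory field with no late
# resolution, and the command builder fails fast on bad input instead of
# relying on per-task overrides. Field names and the command string are
# illustrative, not taken from the real nemo_skills code.
from dataclasses import dataclass


@dataclass
class BenchmarkArgs:
    benchmark: str
    input_file: str  # mandatory: no default value


def get_generation_cmd(args: BenchmarkArgs) -> str:
    """Build the generation command, validating input up front."""
    if not args.input_file:
        raise ValueError("input_file is required for generation")
    if not args.input_file.endswith(".jsonl"):
        raise ValueError(f"expected a .jsonl input file, got {args.input_file!r}")
    return f"python -m nemo_skills.inference.generate ++input_file={args.input_file}"
```

Validating at command-assembly time surfaces misconfiguration before any cluster job is submitted.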

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • melllinia
  • gwarmstrong
🚥 Pre-merge checks (2 passed, 1 failed)

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Revert "Eval kit support (#1239)"' accurately reflects the primary change: reverting a previous commit that added eval kit support to the NeMo Skills project.


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/pipeline/utils/eval.py`:
- Around line 115-122: The current block overwrites the previously resolved
input_file with a global data_dir, breaking datasets located outside data_dir;
instead, when data_dir is truthy only compute the unmounted check path (use
data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
and set check_path = f"{data_dir_unmounted}/{benchmark.replace('.',
'/')}/{split}.jsonl") and do not reassign input_file (leave the already-resolved
input_file/unmounted_path logic intact); update the code around the data_dir
branch to only set check_path from data_dir_unmounted and preserve the existing
input_file variable.
- Around line 97-104: The current branch sets input_file to the
container-mounted path even when cluster_config["executor"] == "none" and
local_data_path exists; change logic in the not is_on_cluster block so that when
executor == "none" you use the host/unmounted path (unmounted_path) as
input_file. Locate the block using pipeline_utils.is_mounted_filepath,
input_file, unmounted_input_file, unmounted_path and adjust: if local_data_path
is not None and executor == "none" assign input_file = unmounted_path (or
compute unmounted_path via local_data_path or pipeline_utils.get_unmounted_path)
instead of the mounted f"{data_path}/..."; keep existing get_unmounted_path
fallback for the non-local case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4420c18f-469e-4e1b-9400-2659573c997f

📥 Commits

Reviewing files that changed from the base of the PR and between b237e33 and 99f2c08.

📒 Files selected for processing (16)
  • docs/evaluation/eval-kit.md
  • docs/evaluation/index.md
  • nemo_skills/dataset/eval_kit/__init__.py
  • nemo_skills/dataset/utils.py
  • nemo_skills/evaluation/evaluator/audio.py
  • nemo_skills/evaluation/metrics/eval_kit_metrics.py
  • nemo_skills/evaluation/metrics/map_metrics.py
  • nemo_skills/evaluation/metrics/translation_metrics.py
  • nemo_skills/inference/eval/eval_kit.py
  • nemo_skills/inference/factory.py
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/mcore_skills.py
  • nemo_skills/pipeline/eval.py
  • nemo_skills/pipeline/utils/eval.py
  • nemo_skills/pipeline/utils/generation.py
  • requirements/eval-kit.txt
💤 Files with no reviewable changes (11)
  • docs/evaluation/index.md
  • docs/evaluation/eval-kit.md
  • nemo_skills/evaluation/metrics/eval_kit_metrics.py
  • nemo_skills/evaluation/metrics/map_metrics.py
  • nemo_skills/dataset/eval_kit/__init__.py
  • nemo_skills/inference/factory.py
  • nemo_skills/inference/eval/eval_kit.py
  • requirements/eval-kit.txt
  • nemo_skills/dataset/utils.py
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/mcore_skills.py

Comment on lines +97 to +104
if not is_on_cluster:
    if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none":
        input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
        unmounted_path = pipeline_utils.get_unmounted_path(cluster_config, input_file)

unmounted_path = str(unmounted_path)
# When data_dir is specified, use it for both input_file and the existence check
# data_dir is always assumed to be a mounted path
if data_dir:
    data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
    input_file = f"{data_dir}/{benchmark.replace('.', '/')}/{split}.jsonl"
    check_path = f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
else:
    check_path = unmounted_path
# checking if data file exists (can check locally as well)
if is_on_cluster:
    if not pipeline_utils.cluster_path_exists(cluster_config, check_path):
        raise ValueError(
            f"Data file {check_path} does not exist on cluster. "
            "Please check the benchmark and split parameters. "
            "Did you forget to run prepare data commands or add data_dir argument?"
        )
if local_data_path is not None:
    unmounted_path = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
else:
    unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file)
    unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", ""))

⚠️ Potential issue | 🟠 Major

Use the host path for remapped datasets when executor == "none".

_resolve_data_path() converts external roots into /nemo_run/code/..., but this branch still feeds that mounted path into input_file even when local_data_path is present. For the "none" executor the command is emitted/run outside the container, so the existence check passes on the host path and generation later tries to open a container-only path.

Suggested fix
-    if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none":
-        input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
-        if local_data_path is not None:
-            unmounted_path = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+    if cluster_config["executor"] == "none" and local_data_path is not None:
+        input_file = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+        unmounted_path = input_file
+    elif pipeline_utils.is_mounted_filepath(cluster_config, data_path):
+        input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+        if local_data_path is not None:
+            unmounted_path = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"

Comment on lines +115 to +122
# When data_dir is specified, use it for both input_file and the existence check
# data_dir is always assumed to be a mounted path
if data_dir:
data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
input_file = f"{data_dir}/{benchmark.replace('.', '/')}/{split}.jsonl"
check_path = f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
else:
check_path = unmounted_path

⚠️ Potential issue | 🟠 Major

Don't overwrite the resolved dataset path with the global data_dir.

The code above has already computed the correct input_file / unmounted_path pair for mounted paths, copied extra datasets, and local runs. Replacing both values here with data_dir throws that resolution away, so benchmarks whose files live outside data_dir — and local executor runs that need the mounted /nemo_run/code/... path — now point at the wrong file.

Suggested fix
-    # When data_dir is specified, use it for both input_file and the existence check
-    # data_dir is always assumed to be a mounted path
-    if data_dir:
-        data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
-        input_file = f"{data_dir}/{benchmark.replace('.', '/')}/{split}.jsonl"
-        check_path = f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
-    else:
-        check_path = unmounted_path
+    check_path = unmounted_path
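
The inline comment above describes an alternative to dropping the branch outright: keep data_dir, but use it only for the existence check and never reassign input_file. A hypothetical sketch of that intent (compute_check_path is an illustrative helper, not a function from the repository):

```python
# Sketch of the alternative fix: a user-supplied data_dir only moves the
# existence check; the already-resolved input_file is left untouched, so
# benchmarks whose files live outside data_dir keep working.
# Helper name and arguments are illustrative assumptions.
def compute_check_path(resolved_unmounted_path, data_dir_unmounted, benchmark, split):
    """Return the path used for the existence check only."""
    if data_dir_unmounted:
        return f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
    # No data_dir override: check the path that was already resolved.
    return resolved_unmounted_path
```

Because input_file is never reassigned here, local executor runs that depend on the mounted /nemo_run/code/... path are unaffected.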

@Kipok Kipok disabled auto-merge March 6, 2026 20:13
@Kipok Kipok merged commit a5da597 into main Mar 6, 2026
5 checks passed
@Kipok Kipok deleted the revert-eval-kit branch March 6, 2026 20:13