Revert "Eval kit support (#1239)"#1294

Merged
Kipok merged 1 commit into main from revert-eval-kit
Mar 6, 2026

Conversation

@Kipok (Collaborator) commented Mar 6, 2026

This reverts commit b237e33.

That PR broke GPU tests (and likely Slurm tests as well).

Summary by CodeRabbit

  • Chores
    • Removed VLMEvalKit integration and related evaluation functionality, including documentation and configuration files.
    • Removed in-process generation mode support.
    • Simplified evaluation pipeline configuration and command assembly.

This reverts commit b237e33.

Signed-off-by: Igor Gitman <igitman@nvidia.com>
@Kipok Kipok force-pushed the revert-eval-kit branch from 269fd2c to 99f2c08 on March 6, 2026 19:48
@Kipok Kipok requested a review from gwarmstrong March 6, 2026 19:49
@Kipok Kipok enabled auto-merge (squash) March 6, 2026 19:49

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

This pull request removes the VLMEvalKit integration from NeMo Skills, eliminating documentation, module exports, dataset utilities, evaluation metrics, inference implementations, and pipeline logic that supported VLMEvalKit-based evaluation workflows.

Changes

Cohort / File(s) — Summary

  • Documentation & Requirements (docs/evaluation/eval-kit.md, docs/evaluation/index.md, requirements/eval-kit.txt)
    Removed VLMEvalKit documentation, index reference, and placeholder requirements file.
  • Dataset Module (nemo_skills/dataset/eval_kit/__init__.py, nemo_skills/dataset/utils.py)
    Removed eval_kit module exports (GENERATION_MODULE, METRICS_TYPE, constants, and the get_extra_generation_args function); removed special-case dotted dataset name handling in get_default_dataset_module.
  • Evaluation Metrics (nemo_skills/evaluation/metrics/eval_kit_metrics.py, nemo_skills/evaluation/metrics/map_metrics.py, nemo_skills/evaluation/metrics/translation_metrics.py)
    Deleted the EvalKitMetrics class; removed the eval_kit entry from METRICS_MAP; centralized the corpus_bleu import in translation_metrics.
  • Audio Evaluation (nemo_skills/evaluation/evaluator/audio.py)
    Simplified generation extraction logic; removed specialized handling for AudioBench, ST-EN-ZH, and MathQA task types; streamlined ASR-translation routing.
  • Inference Implementations (nemo_skills/inference/eval/eval_kit.py, nemo_skills/inference/mcore_skills.py)
    Deleted the EvalKitGenerationTask and MegatronMCoreGenerationTask classes with all supporting logic for VLMEvalKit integration, mcore in-process generation, distributed data handling, and metrics computation.
  • Inference Base & Factory (nemo_skills/inference/factory.py, nemo_skills/inference/generate.py)
    Removed the mcore_skills enum member and module mapping; removed CONTAINER_KEY, USE_TORCHRUN, and related classmethods (is_self_contained, get_env_prefix, get_extra_package_dirs) from the GenerationTask base class.
  • Pipeline Evaluation (nemo_skills/pipeline/eval.py)
    Removed the _apply_task_overrides helper; eliminated dynamic per-task GPU/container overrides, torchrun configuration, and extra package directory propagation; simplified container assignment and command assembly.
  • Pipeline Utils (nemo_skills/pipeline/utils/eval.py, nemo_skills/pipeline/utils/generation.py)
    Removed _resolve_generation_task_class; removed the self_contained_task, num_gpus, and generation_task_class fields from BenchmarkArgs; made input_file mandatory; simplified the venv bootstrap to always use uv; added input validation to get_generation_cmd.
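
The "made input_file mandatory; added input validation to get_generation_cmd" change in the Pipeline Utils row can be sketched roughly as below. BenchmarkArgs and get_generation_cmd are names from the summary above, but the fields, signature, and bodies here are illustrative assumptions, not the repository's actual implementation:

```python
# Hypothetical sketch: input_file is now a mandatory field with no late
# resolution, and the command builder fails fast on bad input instead of
# relying on per-task overrides. Field names and the command string are
# illustrative, not taken from the real nemo_skills code.
from dataclasses import dataclass


@dataclass
class BenchmarkArgs:
    benchmark: str
    input_file: str  # mandatory: no default value


def get_generation_cmd(args: BenchmarkArgs) -> str:
    """Build the generation command, validating input up front."""
    if not args.input_file:
        raise ValueError("input_file is required for generation")
    if not args.input_file.endswith(".jsonl"):
        raise ValueError(f"expected a .jsonl input file, got {args.input_file!r}")
    return f"python -m nemo_skills.inference.generate ++input_file={args.input_file}"
```

Validating at command-assembly time surfaces misconfiguration before any cluster job is submitted.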

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • melllinia
  • gwarmstrong
🚥 Pre-merge checks (2 passed, 1 failed)

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Revert "Eval kit support (#1239)"' accurately reflects the primary change: reverting a previous commit that added eval kit support to the NeMo Skills project.


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/pipeline/utils/eval.py`:
- Around line 115-122: The current block overwrites the previously resolved
input_file with a global data_dir, breaking datasets located outside data_dir;
instead, when data_dir is truthy only compute the unmounted check path (use
data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
and set check_path = f"{data_dir_unmounted}/{benchmark.replace('.',
'/')}/{split}.jsonl") and do not reassign input_file (leave the already-resolved
input_file/unmounted_path logic intact); update the code around the data_dir
branch to only set check_path from data_dir_unmounted and preserve the existing
input_file variable.
- Around line 97-104: The current branch sets input_file to the
container-mounted path even when cluster_config["executor"] == "none" and
local_data_path exists; change logic in the not is_on_cluster block so that when
executor == "none" you use the host/unmounted path (unmounted_path) as
input_file. Locate the block using pipeline_utils.is_mounted_filepath,
input_file, unmounted_input_file, unmounted_path and adjust: if local_data_path
is not None and executor == "none" assign input_file = unmounted_path (or
compute unmounted_path via local_data_path or pipeline_utils.get_unmounted_path)
instead of the mounted f"{data_path}/..."; keep existing get_unmounted_path
fallback for the non-local case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4420c18f-469e-4e1b-9400-2659573c997f

📥 Commits

Reviewing files that changed from the base of the PR and between b237e33 and 99f2c08.

📒 Files selected for processing (16)
  • docs/evaluation/eval-kit.md
  • docs/evaluation/index.md
  • nemo_skills/dataset/eval_kit/__init__.py
  • nemo_skills/dataset/utils.py
  • nemo_skills/evaluation/evaluator/audio.py
  • nemo_skills/evaluation/metrics/eval_kit_metrics.py
  • nemo_skills/evaluation/metrics/map_metrics.py
  • nemo_skills/evaluation/metrics/translation_metrics.py
  • nemo_skills/inference/eval/eval_kit.py
  • nemo_skills/inference/factory.py
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/mcore_skills.py
  • nemo_skills/pipeline/eval.py
  • nemo_skills/pipeline/utils/eval.py
  • nemo_skills/pipeline/utils/generation.py
  • requirements/eval-kit.txt
💤 Files with no reviewable changes (11)
  • docs/evaluation/index.md
  • docs/evaluation/eval-kit.md
  • nemo_skills/evaluation/metrics/eval_kit_metrics.py
  • nemo_skills/evaluation/metrics/map_metrics.py
  • nemo_skills/dataset/eval_kit/__init__.py
  • nemo_skills/inference/factory.py
  • nemo_skills/inference/eval/eval_kit.py
  • requirements/eval-kit.txt
  • nemo_skills/dataset/utils.py
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/mcore_skills.py

Comment on lines +97 to +104
if not is_on_cluster:
    if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none":
        input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
        unmounted_path = pipeline_utils.get_unmounted_path(cluster_config, input_file)

unmounted_path = str(unmounted_path)
# When data_dir is specified, use it for both input_file and the existence check
# data_dir is always assumed to be a mounted path
if data_dir:
    data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
    input_file = f"{data_dir}/{benchmark.replace('.', '/')}/{split}.jsonl"
    check_path = f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
else:
    check_path = unmounted_path
# checking if data file exists (can check locally as well)
if is_on_cluster:
    if not pipeline_utils.cluster_path_exists(cluster_config, check_path):
        raise ValueError(
            f"Data file {check_path} does not exist on cluster. "
            "Please check the benchmark and split parameters. "
            "Did you forget to run prepare data commands or add data_dir argument?"
        )
if local_data_path is not None:
    unmounted_path = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
else:
    unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file)
    unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", ""))

⚠️ Potential issue | 🟠 Major

Use the host path for remapped datasets when executor == "none".

_resolve_data_path() converts external roots into /nemo_run/code/..., but this branch still feeds that mounted path into input_file even when local_data_path is present. For the "none" executor the command is emitted/run outside the container, so the existence check passes on the host path and generation later tries to open a container-only path.

Suggested fix
-    if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none":
-        input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
-        if local_data_path is not None:
-            unmounted_path = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+    if cluster_config["executor"] == "none" and local_data_path is not None:
+        input_file = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+        unmounted_path = input_file
+    elif pipeline_utils.is_mounted_filepath(cluster_config, data_path):
+        input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+        if local_data_path is not None:
+            unmounted_path = f"{local_data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"

Comment on lines +115 to +122
# When data_dir is specified, use it for both input_file and the existence check
# data_dir is always assumed to be a mounted path
if data_dir:
data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
input_file = f"{data_dir}/{benchmark.replace('.', '/')}/{split}.jsonl"
check_path = f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
else:
check_path = unmounted_path

⚠️ Potential issue | 🟠 Major

Don't overwrite the resolved dataset path with the global data_dir.

The code above has already computed the correct input_file / unmounted_path pair for mounted paths, copied extra datasets, and local runs. Replacing both values here with data_dir throws that resolution away, so benchmarks whose files live outside data_dir — and local executor runs that need the mounted /nemo_run/code/... path — now point at the wrong file.

Suggested fix
-    # When data_dir is specified, use it for both input_file and the existence check
-    # data_dir is always assumed to be a mounted path
-    if data_dir:
-        data_dir_unmounted = pipeline_utils.get_unmounted_path(cluster_config, data_dir)
-        input_file = f"{data_dir}/{benchmark.replace('.', '/')}/{split}.jsonl"
-        check_path = f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
-    else:
-        check_path = unmounted_path
+    check_path = unmounted_path
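
The inline comment above describes an alternative to dropping the branch outright: keep data_dir, but use it only for the existence check and never reassign input_file. A hypothetical sketch of that intent (compute_check_path is an illustrative helper, not a function from the repository):

```python
# Sketch of the alternative fix: a user-supplied data_dir only moves the
# existence check; the already-resolved input_file is left untouched, so
# benchmarks whose files live outside data_dir keep working.
# Helper name and arguments are illustrative assumptions.
def compute_check_path(resolved_unmounted_path, data_dir_unmounted, benchmark, split):
    """Return the path used for the existence check only."""
    if data_dir_unmounted:
        return f"{data_dir_unmounted}/{benchmark.replace('.', '/')}/{split}.jsonl"
    # No data_dir override: check the path that was already resolved.
    return resolved_unmounted_path
```

Because input_file is never reassigned here, local executor runs that depend on the mounted /nemo_run/code/... path are unaffected.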

@Kipok Kipok disabled auto-merge March 6, 2026 20:13
@Kipok Kipok merged commit a5da597 into main Mar 6, 2026
5 checks passed
@Kipok Kipok deleted the revert-eval-kit branch March 6, 2026 20:13