update super endpoints to build.nvidia#152

Merged
AjayThorve merged 10 commits into NVIDIA-AI-Blueprints:develop from AjayThorve:update-models
Mar 16, 2026

Conversation

@AjayThorve
Collaborator

No description provided.

@greptile-apps
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR migrates all Nemotron Super 120B references from a temporary NVCF invocation endpoint (5582751c-96c0-47e6-b076-af816ee781dd.invocation.api.nvcf.nvidia.com) and a temp- prefixed model name to the official integrate.api.nvidia.com endpoint with the proper model name nvidia/nemotron-3-super-120b-a12b. Additionally, because the Build API currently has limited availability for this model, the Nemotron Super LLM block is commented out across all configs and notebooks, and nemotron_nano_llm is set as the new default for the researcher, clarifier, and planner roles. A secondary behavioral change enables parallel_tool_calls=True in the clarifier agent.

Key changes:

  • All 4 main workflow configs, 3 benchmark configs, 3 notebooks, and numerous documentation pages updated to use the correct integrate.api.nvidia.com endpoint for Nemotron Super (previously a temporary per-instance NVCF URL)
  • nemotron_super_llm is commented out in every config file with an explanatory comment; nemotron_nano_llm becomes the default for clarifier_agent.llm, clarifier_agent.planner_llm, and deep_research_agent.researcher_llm
  • README.md removes the pre-release "Active Development Branch" warning and introduces a minor duplicate model entry in the Software Components list (nano-30b listed twice)
  • src/aiq_agent/agents/clarifier/agent.py enables parallel_tool_calls=True, allowing the clarifier to invoke multiple context-gathering tools concurrently
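
The config change described above might look roughly like the sketch below. This is a hedged illustration, not the repository's actual file: the `llms`/`workflow` layout and the `_type`, `base_url`, and `model_name` keys are assumptions based on the summary.

```yaml
llms:
  # nemotron_super_llm is commented out while the Build API has limited
  # availability for this model; uncomment to opt back in.
  # nemotron_super_llm:
  #   _type: nim
  #   base_url: https://integrate.api.nvidia.com/v1
  #   model_name: nvidia/nemotron-3-super-120b-a12b
  nemotron_nano_llm:
    _type: nim
    base_url: https://integrate.api.nvidia.com/v1
    model_name: nvidia/nemotron-3-nano-30b-a3b

workflow:
  clarifier_agent:
    llm: nemotron_nano_llm          # previously nemotron_super_llm
    planner_llm: nemotron_nano_llm  # previously nemotron_super_llm
  deep_research_agent:
    researcher_llm: nemotron_nano_llm
```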

Confidence Score: 4/5

  • Safe to merge — changes are primarily endpoint URL updates and config defaults; the one behavioral code change is low-risk.
  • The vast majority of changes are mechanical config/doc updates (endpoint URL swap, model name correction, commenting out a block). The only code change — enabling parallel tool calls in the clarifier — is contained within a well-structured LangGraph node and the existing ToolMessage handling logic is compatible with parallel results. The minor README duplicate entry is cosmetic and does not affect functionality.
  • README.md (duplicate model entry at lines 76–78); src/aiq_agent/agents/clarifier/agent.py (parallel_tool_calls behavioral change)

Important Files Changed

| Filename | Overview |
| --- | --- |
| `configs/config_cli_default.yml` | Replaced the temporary NVCF endpoint for nemotron_super with the public integrate.api.nvidia.com endpoint and commented out the entire block; all agent references updated to use nemotron_nano_llm as the default, with inline comments pointing to the optional Super. |
| `configs/config_frontier_models.yml` | Same Super-LLM endpoint migration and comment-out as the other configs; deep_research_agent.researcher_llm now defaults to nemotron_nano_llm. |
| `configs/config_web_default_llamaindex.yml` | Consistent update: Super LLM commented out, Nano set as the default for the clarifier and deep research roles. |
| `configs/config_web_frag.yml` | Consistent update: Super LLM commented out, Nano set as the default for the clarifier and deep research roles. |
| `src/aiq_agent/agents/clarifier/agent.py` | Changed `parallel_tool_calls=False` to `True` in `_build_graph`, enabling concurrent tool invocations in the clarifier workflow; a behavioral improvement, but worth confirming with integration tests. |
| `README.md` | Removed the develop-branch warning and updated the model table and config descriptions to reflect Nano as default and Super as optional, but introduces a duplicate nemotron-3-nano-30b-a3b entry in the Software Components list. |
| `frontends/benchmarks/deepresearch_bench/configs/config_deep_research_bench.yml` | Migrated from the temporary NVCF endpoint to integrate.api.nvidia.com for all models; Super LLM commented out, researcher defaults to Nano. |
| `frontends/benchmarks/deepsearch_qa/configs/config_deepsearch_qa.yml` | Same endpoint migration and Super LLM comment-out as the other benchmark configs. |
| `frontends/benchmarks/freshqa/configs/config_full_workflow.yml` | Same endpoint migration and Super LLM comment-out; the intent classifier and the shallow/deep research agents now all use Nano by default. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Query] --> B[clarifier_agent\nnow uses nemotron_nano_llm]
    B --> C{Tool calls\nparallel_tool_calls=True}
    C -- "parallel tools" --> D[ToolNode\nmultiple ToolMessages]
    D --> B
    C -- "clarification needed" --> E[ask_for_clarification node]
    E --> B
    C -- "complete" --> F{enable_plan_approval?}
    F -- yes --> G[plan_preview node]
    F -- no --> H[END]
    G --> H

    B2[deep_research_agent\nresearcher_llm] --> I{API Endpoint}
    I -- "default\nnemotron_nano_llm" --> J["integrate.api.nvidia.com\nnvidia/nemotron-3-nano-30b-a3b"]
    I -- "optional\nnemotron_super_llm\ncommented out" --> K["integrate.api.nvidia.com\nnvidia/nemotron-3-super-120b-a12b\n⚠️ limited availability"]
```

Comments Outside Diff (2)

  1. README.md, line 76-78 (link)

    Duplicate model entry

    nemotron-3-nano-30b-a3b appears twice in the software components list — once as (agents, researcher) and again as (intent classifier). These should be merged into a single entry to avoid confusion.

  2. src/aiq_agent/agents/clarifier/agent.py, line 521 (link)

    Parallel tool calls enabled

    This changes parallel_tool_calls from False to True, which allows the LLM to invoke multiple tools simultaneously within the clarifier agent. This is a behavioral change worth confirming: with parallel tool calls, the ToolNode will return multiple ToolMessage objects, and the agent_node check isinstance(state.messages[-1], ToolMessage) correctly handles this since the last element will still be a ToolMessage. Ensure this has been tested with the clarifier's multi-turn flow, as concurrent web searches could occasionally produce conflicting or redundant context that the LLM must reconcile before generating the JSON clarification response.
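
The reviewer's point about the last-message check can be sketched with a minimal, self-contained stand-in for the LangGraph message state. The class names mirror LangChain's `AIMessage`/`ToolMessage` but this is an illustration of the ordering argument, not the repository's code:

```python
from dataclasses import dataclass, field

@dataclass
class AIMessage:
    content: str
    tool_calls: list = field(default_factory=list)

@dataclass
class ToolMessage:
    content: str
    tool_call_id: str

@dataclass
class State:
    messages: list = field(default_factory=list)

# With parallel_tool_calls=True the model may emit several tool calls in one
# turn; the ToolNode then appends one ToolMessage per call, in order.
state = State()
state.messages.append(AIMessage("", tool_calls=[{"id": "c1"}, {"id": "c2"}]))
state.messages.extend([
    ToolMessage("search result A", tool_call_id="c1"),
    ToolMessage("search result B", tool_call_id="c2"),
])

# The agent_node guard still fires: the *last* message is a ToolMessage even
# when several were appended, so the clarifier resumes correctly.
assert isinstance(state.messages[-1], ToolMessage)
tool_results = [m for m in state.messages if isinstance(m, ToolMessage)]
print(len(tool_results))  # → 2
```

The guard only inspects the tail of the list, so the number of parallel results does not matter for control flow; what remains to verify, as the bot notes, is whether the LLM reconciles multiple concurrent search results sensibly.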

Last reviewed commit: 58ab19b

…nemotron-3-super-120b-a12b and new base URL for improved integration.
…ron Super model

- Introduced a new Jupyter notebook for getting started with the NVIDIA AI-Q Blueprint, detailing installation, environment setup, and usage instructions.
- Added a script to install and serve the Nemotron Super model with tensor parallelism, including configuration options and usage instructions.
- Updated troubleshooting documentation to address potential issues with the Nemotron Super build endpoint stability and recommended self-hosting solutions.
@AjayThorve AjayThorve requested a review from raykallen March 16, 2026 23:22
Collaborator

@raykallen raykallen left a comment


lgtm. we'll patch update for super when ready.

@AjayThorve AjayThorve merged commit 7e53572 into NVIDIA-AI-Blueprints:develop Mar 16, 2026
4 checks passed
