A two-level agentic system combining NVIDIA AI-Q Research Assistant with Universal Deep Research (UDR) for complex, multi-domain research tasks.
This project implements a novel architecture that synthesizes two distinct NVIDIA AI blueprints:
- NVIDIA AI-Q Research Assistant (https://github.com/NVIDIA-AI-Blueprints/aiq-research-assistant) - Production-ready research agent with RAG capabilities
- NVIDIA Universal Deep Research (UDR) - Strategy-as-code engine for dynamic research workflows
The system features a two-level agentic architecture:
- Level 1: AI-Q orchestrator (built on LangGraph) that decides research strategy
- Level 2: UDR engine that dynamically generates and executes custom research code when complexity warrants
This allows the agent to move beyond predefined RAG pipelines and adapt its strategy on-the-fly for complex queries like "Generate a report on 'NIMs on EKS' and include a cost-benefit analysis."
| Component | Technology | Purpose |
|---|---|---|
| User Interface | React/Next.js + CopilotKit | Real-time agentic flow visualization |
| Agent Backend | FastAPI + LangGraph | State management and agent orchestration |
| Reasoning LLM | Nemotron-Super-49B NIM | Planning and reflection |
| Instruct LLM | Llama-3.3-70B NIM | Report writing |
| Embedding Model | NeMo Retriever NIM | Vector search |
| RAG Pipeline | NVIDIA RAG Blueprint | Multi-modal document retrieval |
| Dynamic Strategy | UDR Integration | Strategy-as-code execution |
| Infrastructure | AWS EKS + Karpenter | GPU auto-scaling |
```
User Prompt
    ↓
[Planner Node] ← Nemotron NIM
    ↓
Decision: Complex or Simple?
    ├── Simple  → [Standard RAG Pipeline]
    └── Complex → [UDR Strategy Execution]
          ├── Compile Strategy (Natural Language → Python)
          ├── Execute (Calls NIMs, RAG, Web Search)
          └── Synthesize Results
    ↓
[Final Report Node]
    ↓
User receives report + citations
```
Key Feature: Every step streams state updates to the CopilotKit UI for real-time visualization.
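To make the Level-1 routing concrete, here is a minimal sketch of the planner/route pattern in LangGraph. Node names, the state shape, and the keyword heuristic are illustrative stand-ins (the real graph, which calls the Nemotron NIM to classify the query, lives in `aira/src/aiq_aira/`):

```python
# Hypothetical sketch of the two-level routing idea; not the project's actual code.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    prompt: str
    strategy: str
    report: str

def planner(state: AgentState) -> AgentState:
    # In the real system the Nemotron NIM classifies the query;
    # a keyword check stands in for that call here.
    is_complex = "cost-benefit" in state["prompt"].lower()
    return {**state, "strategy": "DYNAMIC_STRATEGY" if is_complex else "SIMPLE_RAG"}

def route(state: AgentState) -> str:
    return "udr" if state["strategy"] == "DYNAMIC_STRATEGY" else "rag"

graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("rag", lambda s: {**s, "report": "standard RAG pipeline result"})
graph.add_node("udr", lambda s: {**s, "report": "UDR strategy-as-code result"})
graph.set_entry_point("planner")
graph.add_conditional_edges("planner", route, {"rag": "rag", "udr": "udr"})
graph.add_edge("rag", END)
graph.add_edge("udr", END)
app = graph.compile()
```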
- AWS Account with EKS permissions
- NVIDIA NGC API Key (Get it here)
- Tavily API Key (optional, for web search)
- Tools: `terraform`, `kubectl`, `helm`, `docker`, `aws-cli`
```bash
# Set environment variables
export TF_VAR_ngc_api_key="YOUR_NGC_API_KEY"
export TAVILY_API_KEY="YOUR_TAVILY_KEY"  # Optional
export AWS_DEFAULT_REGION="us-west-2"
```
```bash
# 1. Deploy infrastructure (EKS + Karpenter + GPU Operator)
cd infrastructure/terraform
./install.sh  # ~20 minutes

# 2. Deploy NVIDIA NIMs
cd ../kubernetes
./deploy-nims.sh  # ~30 minutes

# 3. Deploy AI-Q + UDR Agent
./deploy-agent.sh  # ~10 minutes

# 4. Access the application
# The script will output the LoadBalancer URL
```

Deploy the NVIDIA RAG Blueprint with Milvus for production-grade document retrieval:
```bash
# 5. Deploy NVIDIA RAG Blueprint (enterprise vector store)
cd ../helm
./deploy-rag-blueprint.sh  # ~15 minutes

# 6. Ingest US Customs Tariff PDFs (99 chapters)
cd ../../scripts
./setup_tariff_rag_enterprise.sh  # ~20 minutes

# Test queries:
# - "What is the tariff for replacement batteries for a Raritan remote management card?"
# - "What's the tariff of Reese's Pieces?"
# - "Tariff of a replacement Roomba vacuum motherboard, used"
```

Features:
- ✅ Milvus Vector Database - Enterprise-grade, scalable
- ✅ Hybrid Search - Vector + keyword (BM25) for tariff codes
- ✅ GPU-Accelerated PDF Processing - NVIDIA NIM microservices
- ✅ Citation Support - Returns source documents with answers
📖 Full Guide: NVIDIA_RAG_BLUEPRINT_DEPLOYMENT.md
🚀 Quick Start: QUICKSTART_RAG_ENTERPRISE.md
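Once the tariff collection is ingested, you can exercise the query server directly. The sketch below assumes a generate-style endpoint and payload shape; the exact path, fields, and collection name may differ in your deployment, so check the guides above for the real API:

```python
# Illustrative only: endpoint path, payload fields, and collection name
# are assumptions; consult NVIDIA_RAG_BLUEPRINT_DEPLOYMENT.md for specifics.
import requests

RAG_URL = "http://rag-query-server.rag-blueprint.svc.cluster.local:8081/v1"

resp = requests.post(
    f"{RAG_URL}/generate",
    json={
        "messages": [
            {"role": "user", "content": "What's the tariff of Reese's Pieces?"}
        ],
        "use_knowledge_base": True,        # assumed flag
        "collection_name": "tariff_docs",  # hypothetical collection name
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```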
- **Nemotron Reasoning NIM** (`llama-3.3-nemotron-super-49b-v1.5`)
  - Purpose: Planning, reflection, strategy compilation
  - GPU: 1x NVIDIA A10G (24GB)
  - Service: `nemotron-nano-service.nim.svc.cluster.local:8000`
- **Llama 3.3 70B Instruct NIM**
  - Purpose: Report writing and Q&A
  - GPU: 2x NVIDIA A10G (48GB)
  - Service: `instruct-llm-service.nim.svc.cluster.local:8000`
- **Embedding NIM (Arctic Embed Large)**
  - Purpose: Vector embeddings for RAG
  - GPU: 1x NVIDIA A10G (24GB)
  - Service: `embedding-service.nim.svc.cluster.local:8000`
- **Milvus Vector Database**
  - Purpose: Scalable vector storage for document collections
  - Storage: 100Gi EBS gp3
  - Service: `milvus-standalone.rag-blueprint.svc.cluster.local:19530`
- **RAG Query Server**
  - Purpose: Search and retrieval with hybrid search (vector + BM25)
  - Replicas: 2 (for HA)
  - Service: `rag-query-server.rag-blueprint.svc.cluster.local:8081`
- **RAG Ingest Server**
  - Purpose: GPU-accelerated PDF processing and document ingestion
  - GPU: 1x NVIDIA A10G (for PDF processing)
  - Service: `rag-ingest-server.rag-blueprint.svc.cluster.local:8082`
- **AI-Q + UDR Agent Backend**
  - FastAPI service with CopilotKit integration
  - Namespace: `aiq-agent`
  - Replicas: 2 (for HA)
- **Frontend UI**
  - Next.js application with real-time agent visualization
  - Exposed via AWS LoadBalancer
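Before pointing the agent at these services, it can help to confirm each NIM is actually serving. NIMs expose OpenAI-compatible APIs, so a quick probe of `/v1/models` from any pod inside the cluster (the same check as the `curl` in Troubleshooting below) is a minimal sketch:

```python
# Run from a pod inside the cluster; service DNS names as listed above.
import requests

SERVICES = [
    "nemotron-nano-service.nim.svc.cluster.local:8000",
    "instruct-llm-service.nim.svc.cluster.local:8000",
    "embedding-service.nim.svc.cluster.local:8000",
]

for svc in SERVICES:
    try:
        r = requests.get(f"http://{svc}/v1/models", timeout=10)
        models = [m["id"] for m in r.json().get("data", [])]
        print(f"{svc} -> {r.status_code} {models}")
    except requests.RequestException as exc:
        print(f"{svc} -> unreachable ({exc})")
```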
- EKS Cluster (Kubernetes 1.28)
- Karpenter (GPU node auto-scaling)
- NVIDIA GPU Operator (Driver management)
- VPC (3 AZs, public + private subnets)
Total GPU Requirement:
- Base deployment: 4x NVIDIA A10G GPUs (Reasoning, Instruct, Embedding)
- With enterprise RAG: 5x NVIDIA A10G GPUs (+ PDF processing)
Estimated Cost:
- Base: ~$15-20/hour when fully running
- With RAG Blueprint: ~$20-25/hour
- Tip: Use Spot instances to reduce costs by 50-70%
Save ~90% on compute costs when not actively developing:
```bash
# End of day
bash infrastructure/scripts/sleep-cluster.sh

# Next morning
bash infrastructure/scripts/wake-cluster.sh
bash infrastructure/scripts/monitor-cluster-readiness.sh  # Auto-exits when ready
```

What happens:
- Sleep: Scales down NIMs + Backend (GPU-intensive)
- Keeps running: Milvus + Frontend (lightweight, ~$1/day)
- Wake time: ~17 minutes (Milvus stays warm)
- Cost savings: ~90% reduction
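Under the hood, sleeping amounts to scaling the GPU-heavy Deployments to zero while leaving Milvus and the frontend untouched. Here is a rough sketch with the Kubernetes Python client; the deployment and namespace names are hypothetical stand-ins, so check the scripts for the real list:

```python
# Rough equivalent of sleep-cluster.sh; deployment/namespace names below
# are assumptions, not necessarily what your release created.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

GPU_WORKLOADS = [
    ("nim", "nemotron-nano"),
    ("nim", "instruct-llm"),
    ("nim", "embedding"),
    ("aiq-agent", "aiq-agent-backend"),
]

for namespace, name in GPU_WORKLOADS:
    # Scale to 0 to sleep; restore the original replica counts to wake.
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": 0}},
    )
    print(f"scaled {namespace}/{name} to 0")
```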
For maximum savings when gone for 2+ days:
```bash
# Before leaving
bash scripts/deep-sleep-cluster.sh

# When back
bash scripts/deep-wake-cluster.sh
```

Trade-offs:
- ✅ 95% cost savings (stops everything)
- ✅ All-in-one script (built-in monitoring)
- ❌ ~20+ minute wake time (Milvus rehydration)
| Script | Purpose | Wake Time | Savings |
|---|---|---|---|
| `infrastructure/scripts/sleep-cluster.sh` | Daily use (recommended) | ~17 min | 90% |
| `infrastructure/scripts/wake-cluster.sh` | Quick wake | - | - |
| `infrastructure/scripts/monitor-cluster-readiness.sh` | Wait for ready (auto-exits) | - | - |
| `infrastructure/scripts/test-sleep-wake-cycle.sh` | Full lifecycle test | 17 min | - |
| `scripts/deep-sleep-cluster.sh` | Extended downtime (2+ days) | ~20+ min | 95% |
| `scripts/deep-wake-cluster.sh` | Wake from deep sleep | - | - |
💡 Tip: Use `infrastructure/scripts/` for the daily workflow (faster, modular). Use `scripts/deep-sleep-cluster.sh` only for extended downtime when you need maximum savings. See `scripts/README.md` for details.
```bash
cd backend

# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export NEMOTRON_NIM_URL="http://localhost:8000"  # Or hosted NIM URL
export INSTRUCT_LLM_URL="http://localhost:8001"
export RAG_SERVER_URL="http://localhost:8081/v1"
export NGC_API_KEY="your_key"

# Run backend
python main.py
```

```bash
cd frontend

# Install dependencies
npm install

# Run dev server (auto-detects localhost backend)
npm run dev

# Open http://localhost:3000
```

The frontend supports multiple ways to configure the backend URL, in priority order:
1. **Runtime Config (Recommended for Production)** - No rebuild required!

   Edit `frontend/public/config.js`:

   ```js
   window.__RUNTIME_CONFIG__ = { BACKEND_URL: "http://your-backend-url.example.com" };
   ```

   For Kubernetes deployments, this file is mounted via ConfigMap. Update `values.yaml`:

   ```yaml
   frontend:
     runtimeConfig:
       backendUrl: "http://your-backend-elb.amazonaws.com"
   ```

   Then run `helm upgrade` - pods restart automatically.

2. **Build-time Environment Variable**

   ```bash
   # During Docker build
   docker build --build-arg NEXT_PUBLIC_BACKEND_URL="http://backend:8000" ...
   ```

3. **Automatic Detection** - The frontend auto-detects:
   - `localhost` → uses `http://localhost:8000`
   - AWS ELB hostname → uses the configured backend ELB

Note: `NEXT_PUBLIC_*` variables are baked into the JavaScript bundle at build time in Next.js. For true runtime configuration without rebuilding, use option 1 (Runtime Config).
```python
from aiq_aira.udr_integration import UDFIntegration
from langchain_openai import ChatOpenAI

# Initialize UDR
llm = ChatOpenAI(base_url="http://nemotron-nim:8000/v1")
udr = UDFIntegration(
    compiler_llm=llm,
    rag_url="http://rag-server:8081/v1",
    nemotron_nim_url="http://nemotron-nim:8000",
    embedding_nim_url="http://embedding-nim:8000",
)

# Execute a dynamic strategy
strategy = """
1. Search RAG for 'NIMs on EKS deployment patterns'
2. Search web for 'AWS EKS GPU pricing'
3. Synthesize findings into cost-benefit analysis
"""
result = await udr.execute_dynamic_strategy(strategy, context={})
print(result.synthesized_report)
```
```
Research_as_a_Code/
├── aira/                         # Copied from NVIDIA AI-Q repo
│   ├── src/aiq_aira/             # Core AI-Q agent code
│   ├── hackathon_agent.py        # ⭐ Enhanced agent with UDR
│   └── udr_integration.py        # ⭐ UDR strategy-as-code engine
├── backend/                      # FastAPI backend
│   ├── main.py                   # ⭐ CopilotKit integration
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/                     # Next.js UI
│   ├── app/
│   │   ├── layout.tsx            # CopilotKit provider
│   │   ├── page.tsx              # Main page
│   │   └── components/
│   │       ├── AgentFlowDisplay.tsx  # ⭐ Real-time flow visualization
│   │       ├── ResearchForm.tsx
│   │       └── ReportDisplay.tsx
│   ├── package.json
│   └── Dockerfile
├── infrastructure/
│   ├── terraform/                # IaC for EKS
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── karpenter-provisioner.yaml
│   │   └── install.sh
│   └── kubernetes/               # K8s manifests
│       ├── agent-deployment.yaml
│       ├── deploy-nims.sh
│       └── deploy-agent.sh
├── configs/                      # AI-Q configuration
│   └── config.yml
├── demo/                         # Demo assets
├── deploy/                       # Original AI-Q deployment files
└── README.md                     # This file
```

⭐ = New files created for the hackathon
CopilotKit provides the "glue" between the LangGraph backend and React frontend:
Backend (Python):
```python
from copilotkit import CopilotKit

copilot = CopilotKit()
copilot.add_langgraph_endpoint(
    app_id="ai_q_researcher",
    endpoint="/copilotkit",
    graph=agent_graph,
    config_factory=lambda: config,
)
app.include_router(copilot.router)
```

Frontend (TypeScript):
```typescript
import { useCoAgentStateRender } from "@copilotkit/react-core";

const { state } = useCoAgentStateRender<AgentState>({
  name: "ai_q_researcher", // Must match backend app_id
  render: ({ state }) => {
    // Render state.logs, state.queries, etc.
  },
});
```

The UDR module converts natural language plans into executable Python:
Natural Language:

```
1. Search RAG for X
2. Search web for Y
3. Synthesize Z
```

↓ (Compiler)

Python Code:

```python
result1 = await search_rag("X", collection)
result2 = await search_web("Y")
report = await synthesize_findings([result1, result2])
return {"report": report, "sources": [...]}
```

↓ (Executor)

Actual NIM calls executed in a sandbox
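In spirit, the compile-then-execute loop looks something like the sketch below. The prompt, helper names, and the bare `exec`-based "sandbox" are simplifications, not what `udr_integration.py` actually does:

```python
# Simplified illustration of strategy-as-code; not the project's real
# compiler or sandbox. `llm` is a LangChain chat model (e.g. ChatOpenAI).
COMPILER_PROMPT = """Convert this research plan into a Python async function
named `run` that may only call: search_rag(query, collection),
search_web(query), synthesize_findings(results).
It must return {{"report": ..., "sources": [...]}}.

Plan:
{plan}
"""

async def compile_and_execute(llm, plan: str, tools: dict) -> dict:
    # 1. Compile: ask the reasoning NIM to emit executable Python.
    code = (await llm.ainvoke(COMPILER_PROMPT.format(plan=plan))).content
    # 2. Execute: expose only the whitelisted tools to the generated code
    #    (a stand-in for the real sandboxed executor).
    namespace: dict = dict(tools)
    exec(code, namespace)  # defines `run`
    return await namespace["run"]()
```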
When a NIM pod requests a GPU:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```

Karpenter:
- Detects the unschedulable pod
- Provisions a g5.xlarge Spot instance (~$0.50/hr)
- NVIDIA GPU Operator installs drivers
- Pod is scheduled on the new node
- When idle, the node is terminated to save costs
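You can watch this provisioning happen programmatically; a small sketch using the Kubernetes Python client that polls until some node advertises allocatable NVIDIA GPUs (the `nvidia.com/gpu` resource name is the one the GPU Operator registers):

```python
# Poll until Karpenter + GPU Operator have produced a schedulable GPU node.
import time
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

while True:
    gpu_nodes = [
        n.metadata.name
        for n in v1.list_node().items
        if "nvidia.com/gpu" in (n.status.allocatable or {})
    ]
    if gpu_nodes:
        print("GPU nodes ready:", gpu_nodes)
        break
    print("Waiting for Karpenter to provision a GPU node...")
    time.sleep(15)
```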
Prompt: "What is Amazon EKS?"
Expected Flow:
- Planner selects "Simple RAG"
- Standard AI-Q pipeline executes
- Report generated from RAG + web sources
Prompt: "Generate a report on 'NIMs on EKS' and include a cost-benefit analysis comparing on-premise vs hosted deployment"
Expected Flow:
- Planner selects "Dynamic UDR Strategy"
- UDR compiles multi-step research plan
- Plan executes (RAG + web + synthesis)
- Comprehensive report with analysis
- Submit any query
- Watch the "Agentic Flow" panel
- Should see logs streaming in real-time:
- "π€ Analyzing research complexity..."
- "β Strategy: DYNAMIC_STRATEGY"
- "π Executing dynamic UDR strategy..."
- etc.
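If the panel stays empty, first confirm the agent is registered on the backend. The `/copilotkit/info` path below is the same one referenced in the Troubleshooting section; this sketch assumes a local dev backend or port-forward on port 8000:

```python
# Quick check that the backend exposes the "ai_q_researcher" agent.
import requests

info = requests.get("http://localhost:8000/copilotkit/info", timeout=10)
info.raise_for_status()
print(info.json())  # should mention the ai_q_researcher agent
```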
| Requirement | Implementation | Status |
|---|---|---|
| Use NVIDIA NIM | 3x NIMs deployed (Nemotron, Llama, Embedding) | ✅ |
| Deploy on EKS | Terraform + Karpenter on AWS EKS | ✅ |
| Agentic Framework | LangGraph (NVIDIA NeMo Agent Toolkit) | ✅ |
| Visualize Agent Flow | CopilotKit `useCoAgentStateRender` | ✅ |
| Infrastructure as Code | Terraform + Helm + K8s manifests | ✅ |
| Innovation | Two-level agent with UDR strategy-as-code | ✅ |
- Design Plan: Comprehensive architectural design (735 lines)
- AI-Q Blueprint: Original AI-Q documentation
- UDR Paper: Universal Deep Research research paper
- Data on EKS: AWS EKS blueprints
- CopilotKit Docs: CopilotKit documentation
This project integrates and builds upon:
- **NVIDIA AI-Q Research Assistant** (GitHub)
  - Apache 2.0 License
  - Production-ready research agent with RAG
- **NVIDIA Universal Deep Research** (GitHub)
  - Strategy-as-code paradigm
  - Dynamic research planning
- **AWS Data on EKS** (GitHub)
  - Apache 2.0 License
  - EKS + Karpenter blueprints
- **CopilotKit** (Website)
  - MIT License
  - AG-UI protocol for agentic UI
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Components used:
- NVIDIA AI-Q: Apache 2.0
- NVIDIA UDR: Apache 2.0
- AWS Blueprints: Apache 2.0
- CopilotKit: MIT
Solution: Check GPU availability and NGC API key

```bash
kubectl get pods -n nim
kubectl describe pod <nim-pod> -n nim
kubectl logs -n nim <nim-pod>

# Check if Karpenter provisioned GPU nodes
kubectl get nodes --show-labels | grep nvidia
```

Symptoms:
- `ERR_CONNECTION_REFUSED` errors
- `Failed to load runtime info (http://localhost:8000/copilotkit/info)`
- `Agent ai_q_researcher not found`
Solution 1: Check if backend URL is configured correctly
The frontend may be trying to connect to localhost:8000 instead of the deployed backend.
For Kubernetes deployments, update the runtime config:
```bash
# Edit values.yaml
frontend.runtimeConfig:
  backendUrl: "http://your-backend-elb.amazonaws.com"

# Apply changes
helm upgrade <release-name> ./deploy/helm/aiq-aira
```

Or directly edit the ConfigMap:

```bash
kubectl edit configmap <release-name>-frontend-config -n <namespace>
```

Solution 2: Check service networking

```bash
kubectl get svc -n aiq-agent
kubectl logs -n aiq-agent -l component=backend
```

Solution: Check Nemotron NIM connectivity

```bash
kubectl exec -n aiq-agent deployment/aiq-agent-backend -- \
  curl http://nemotron-nano-service.nim.svc.cluster.local:8000/v1/models
```
1. **Introduction (30s)**
   - "This is the AI-Q Research Assistant enhanced with Universal Deep Research"
   - Show architecture diagram
2. **Simple Query (1 min)**
   - Enter: "What is Amazon EKS?"
   - Show: Agent flow selecting "Simple RAG"
   - Show: Report generated
3. **Complex Query (2 min)**
   - Enter: "Generate a report on NIMs on EKS with cost-benefit analysis"
   - Show: Agent flow selecting "Dynamic UDR Strategy"
   - Show: Real-time logs (compilation, execution)
   - Show: Comprehensive multi-section report
4. **Infrastructure (1 min)**
   - Show: `kubectl get nodes` (Karpenter-provisioned GPUs)
   - Show: `kubectl get pods -n nim` (3 NIMs running)
   - Show: EKS console
5. **Conclusion (30s)**
   - Recap: Two-level agentic system
   - Highlight: Dynamic strategy adaptation
   - Call to action: Try it yourself!
For questions about this hackathon submission:
- GitHub Issues: Create an issue
- Hackathon: AWS & NVIDIA Agentic AI Unleashed 2025
Built with ❤️ using NVIDIA AI and AWS EKS