2 changes: 2 additions & 0 deletions docs/inference-providers/_toctree.yml
@@ -38,6 +38,8 @@
title: Overview
- local: integrations/adding-integration
title: Add Your Integration
- local: integrations/datadesigner
title: NeMo Data Designer
- local: integrations/macwhisper
title: MacWhisper
- local: integrations/opencode
105 changes: 105 additions & 0 deletions docs/inference-providers/integrations/datadesigner.md
@@ -0,0 +1,105 @@
# NeMo Data Designer

[DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) is NVIDIA NeMo's framework for generating high-quality synthetic datasets using LLMs. It enables you to create diverse data using statistical samplers, LLMs, or existing seed datasets while maintaining control over field relationships and data quality.

## Overview

DataDesigner supports OpenAI-compatible endpoints, making it easy to use any model available through Hugging Face Inference Providers for synthetic data generation.
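Because the router is a plain OpenAI-compatible HTTP endpoint, you can see what any such client does under the hood with a stdlib-only sketch (the model ID and prompt here are illustrative, and the request is only sent when `HF_TOKEN` is set):

```python
import json
import os
import urllib.request

# Build a standard OpenAI-style chat completion request against the HF router.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    "https://router.huggingface.co/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
        "Content-Type": "application/json",
    },
)

# Only fire the request when a real token is configured.
if os.environ.get("HF_TOKEN"):
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Any client that speaks this protocol, including a provider configured with `provider_type="openai"`, works against the same endpoint.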

## Prerequisites

- DataDesigner installed (`pip install data-designer`)
- A Hugging Face account and a fine-grained [API token](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained) with the "Make calls to Inference Providers" permission

## Configuration

### 1. Set your HF token

```bash
export HF_TOKEN="hf_your_token_here"
```
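If you want your script to fail fast when the variable is missing, rather than at request time, a small helper can check it up front (this is a hypothetical convenience, not part of DataDesigner):

```python
import os

def require_hf_token() -> str:
    """Return the HF token from the environment, or raise with a helpful message."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export a fine-grained Hugging Face token first."
        )
    return token
```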

### 2. Configure HF as a provider

```python
from data_designer.essentials import (
CategorySamplerParams,
DataDesigner,
DataDesignerConfigBuilder,
LLMTextColumnConfig,
ModelConfig,
ModelProvider,
SamplerColumnConfig,
SamplerType,
)

# Define HF Inference Provider (OpenAI-compatible)
hf_provider = ModelProvider(
name="huggingface",
endpoint="https://router.huggingface.co/v1",
provider_type="openai",
    api_key="HF_TOKEN",  # Name of the environment variable to read the token from
)

# Define a model available via HF Inference Providers
hf_model = ModelConfig(
alias="hf-gpt-oss",
model="openai/gpt-oss-120b",
provider="huggingface",
)

# Create DataDesigner with HF provider
data_designer = DataDesigner(model_providers=[hf_provider])
config_builder = DataDesignerConfigBuilder(model_configs=[hf_model])
```

### 3. Generate synthetic data

```python
# Add a sampler column
config_builder.add_column(
SamplerColumnConfig(
name="category",
sampler_type=SamplerType.CATEGORY,
params=CategorySamplerParams(
values=["Electronics", "Books", "Clothing"],
),
)
)

# Add an LLM-generated column
config_builder.add_column(
LLMTextColumnConfig(
name="product_name",
model_alias="hf-gpt-oss",
prompt="Generate a creative product name for a {{ category }} item.",
)
)

# Preview the generated data
preview = data_designer.preview(config_builder=config_builder, num_records=5)
preview.display_sample_record()

# Access the DataFrame
df = preview.dataset
print(df)
```

## Using Different Models

You can use any model available through [Inference Providers](https://huggingface.co/models?inference_provider=all). Simply update the `model` field:

```python
# Use a different model
hf_model = ModelConfig(
alias="hf-olmo",
model="allenai/OLMo-3-7B-Instruct",
provider="huggingface",
)
```

## Resources

- [DataDesigner Documentation](https://nvidia-nemo.github.io/DataDesigner/)
- [GitHub Repository](https://github.com/NVIDIA-NeMo/DataDesigner)
- [Available Models on Inference Providers](https://huggingface.co/models?inference_provider=all&pipeline_tag=text-generation)
7 changes: 7 additions & 0 deletions docs/inference-providers/integrations/index.md
@@ -18,6 +18,7 @@ This table lists _some_ tools, libraries, and applications that work with Hugging Face Inference Providers.
| Integration | Description | Resources |
| ------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| [CrewAI](https://www.crewai.com/) | Framework for orchestrating AI agent teams | [Official docs](https://docs.crewai.com/en/concepts/llms#hugging-face) |
| [NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Synthetic dataset generation framework | [HF docs](./datadesigner) |
| [GitHub Copilot Chat](https://docs.github.com/en/copilot) | AI pair programmer in VS Code | [HF docs](./vscode) |
| [fast-agent](https://fast-agent.ai/) | Flexible framework building MCP/ACP powered Agents, Workflows and evals | [Official docs](https://fast-agent.ai/models/llm_providers/#hugging-face) |
| [Haystack](https://haystack.deepset.ai/) | Open-source LLM framework for building production applications | [Official docs](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator) |
@@ -71,6 +72,12 @@ LLM application frameworks and orchestration platforms.
- [PydanticAI](https://ai.pydantic.dev/) - Framework for building AI agents with Python ([Official docs](https://ai.pydantic.dev/models/huggingface/))
- [smolagents](https://huggingface.co/docs/smolagents) - Framework for building LLM agents with tool integration ([Official docs](https://huggingface.co/docs/smolagents/reference/models#smolagents.InferenceClientModel))

### Synthetic Data

Tools for creating synthetic datasets.

- [NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) - NVIDIA NeMo framework for synthetic data generation ([HF docs](./datadesigner))

<!-- ## Add Your Integration

Building something with Inference Providers? [Let us know](./adding-integration) and we'll add it to the list. -->