diff --git a/AGENTS.md b/AGENTS.md index eb71694a2..feec7eac1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,574 +1,254 @@ # AGENTS.md -This file provides guidance to agents when working with code in this repository. - -## Project Overview - -**DataDesigner** is an NVIDIA NeMo project for creating synthetic datasets from scratch. It's a comprehensive framework that generates structured data using multiple generation strategies: - -- **Sampled data**: Built-in generators (UUID, DateTime, etc.) and Faker integration -- **LLM-generated content**: Text, code, and structured data via LiteLLM -- **Expression-based columns**: Derived columns using Jinja2 templates -- **Validation & scoring**: Python, SQL, and remote validators; LLM-based judge scoring -- **Seed dataset-based generation**: Generate from existing datasets - -### Architecture - -The project follows a layered architecture: - -1. **Config Layer** ([packages/data-designer-config/src/data_designer/config/](packages/data-designer-config/src/data_designer/config/)): User-facing configuration API - - `config_builder.py`: Main builder API for constructing configurations - - `column_configs.py`: Column configuration types (Sampler, LLMText, LLMCode, LLMStructured, LLMJudge, Expression, Validation, SeedDataset) - - `models.py`: Model configurations and inference parameters - - `sampler_params.py`: Parametrized samplers (Uniform, Category, Person, DateTime, etc.) - -2. **Engine Layer** ([packages/data-designer-engine/src/data_designer/engine/](packages/data-designer-engine/src/data_designer/engine/)): Internal generation and processing - - `column_generators/`: Generates individual columns from configs - - `dataset_builders/`: Orchestrates full dataset generation with DAG-based dependency management - - `models/`: LLM integration via LiteLLM with response parsing - - `validators/`: Column validation (Python, SQL, Code, Remote) - - `sampling_gen/`: Sophisticated person/entity sampling - -3. 
**Interface Layer** ([packages/data-designer/src/data_designer/interface/](packages/data-designer/src/data_designer/interface/)): Public API - - `data_designer.py`: Main `DataDesigner` class (primary entry point) - - `results.py`: Result containers - - `errors.py`: Public error types - -### Recommended Import Pattern - -```python -import data_designer.config as dd -from data_designer.interface import DataDesigner - -# Usage: -data_designer = DataDesigner() -config_builder = dd.DataDesignerConfigBuilder() -config_builder.add_column( - dd.SamplerColumnConfig( - name="category", - sampler_type=dd.SamplerType.CATEGORY, - params=dd.CategorySamplerParams(values=["A", "B"]), - ) -) +This file is the operating guide for agents working in this repository. + +## Mission + +Help users build, debug, and extend **DataDesigner** quickly and safely. + +Optimize for: +- Correctness first (schema validity, generation behavior, validation behavior) +- Fast iteration (small diffs, targeted tests, clear failure diagnosis) +- API stability (prefer compatibility-preserving changes) + +## First 5 Minutes Checklist + +1. Read this file and user request. +2. Identify impacted layer(s): `config`, `engine`, `interface`, plugin package(s). +3. Locate the execution path from public API to concrete implementation. +4. Pick `make` targets for install/check/test steps before using direct `uv` commands. +5. Make minimal code changes with explicit type annotations. +6. Run focused lint/tests for changed scope, then broaden only if needed. 
+ +## Architecture At A Glance + +DataDesigner is split into three packages: +- `data-designer-config`: user-facing config models/builders +- `data-designer-engine`: compilation + generation/validation runtime +- `data-designer`: interface entrypoint and result types + +```mermaid +flowchart TD + U[User or App Code] --> C[DataDesignerConfigBuilder] + U --> I[DataDesigner Interface] + CLI[data-designer CLI] --> I + + subgraph Workspace_Packages + direction LR + CFG[data-designer-config] + ENG[data-designer-engine] + API[data-designer] + end + + C --> CFG + I --> API + API --> ENG + ENG --> CFG + + C --> BLD[build to DataDesignerConfig] + BLD --> CP[compile_data_designer_config] + + I --> RP[create_resource_provider] + RP --> MR[ModelRegistry] + RP --> MCP[MCPRegistry] + RP --> SR[SeedReaderRegistry] + RP --> AS[ArtifactStorage] + + CP --> DWB[ColumnWiseDatasetBuilder] + DWB --> DAG[DAG sort and config compile] + DAG --> REG[DataDesignerRegistry] + REG --> GEN[Column Generators] + GEN --> BM[DatasetBatchManager] + BM --> PROC[Processors] + PROC --> AS + + AS --> PROF[DatasetProfiler] + PROF --> RES[Results] + + PLUG[PluginRegistry] --> REG + PLUG --> SR ``` -### Key Design Patterns - -- **Builder pattern**: Configuration construction via `DataDesignerConfigBuilder` -- **Registry pattern**: Plugin system for column generators, validators, and profilers -- **Strategy pattern**: Multiple generation approaches (sampled, LLM, expression, seed) -- **DAG-based execution**: Column dependencies managed as directed acyclic graph +## High-Value File Map -## Development Workflow +- `packages/data-designer/src/data_designer/interface/data_designer.py`: primary public entrypoint +- `packages/data-designer-config/src/data_designer/config/config_builder.py`: config assembly API +- `packages/data-designer-config/src/data_designer/config/column_configs.py`: column config schemas +- `packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py`: 
orchestration core +- `packages/data-designer-engine/src/data_designer/engine/column_generators/`: generation implementations +- `packages/data-designer-engine/src/data_designer/engine/validators/`: validator implementations +- `packages/data-designer-engine/src/data_designer/engine/models/`: model adapters/registry -This project uses `uv` for dependency management and `make` for common tasks: - -```bash -# Install dependencies -uv sync +## Core Patterns To Preserve -# Install with dev dependencies -uv sync --all-extras +- Builder pattern for config creation (`DataDesignerConfigBuilder`) +- Registry pattern for generators/validators/profilers/models +- Strategy pattern for sampled/LLM/expression/seed-driven generation +- DAG-based dependency handling across column generation -# Run the main module (if applicable) -uv run python -m data_designer -``` +## Non-Negotiable Coding Rules -### Code Quality +- Add type annotations to all functions/methods/class attributes (including tests). +- Use absolute imports only. +- Keep imports at module scope (unless unavoidable). +- Use double quotes for strings. +- Keep lines `<= 120` chars. +- Avoid vacuous comments; comment only non-obvious logic. +- Prefer `make` targets over raw tool commands when a matching `Makefile` target exists. +- Do not use raw `uv sync` for standard setup/install flows; use the corresponding `make install*` target. 
+- Preserve or add NVIDIA SPDX headers in Python files: -```bash -# Using Make (recommended) -make lint # Run ruff linter -make lint-fix # Fix linting issues automatically -make format # Format code with ruff -make format-check # Check code formatting without changes -make check-all # Run all checks (format-check + lint) -make check-all-fix # Run all checks with autofix (format + lint-fix) - -# Direct commands -uv run ruff check # Lint all files -uv run ruff check --fix # Lint with autofix -uv run ruff format # Format all files -uv run ruff format --check # Check formatting +```python +# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 ``` -### Running Tests +- For newly created Python files, create the file first, then run: ```bash -# Run all tests -uv run pytest - -# Run tests with verbose output -uv run pytest -v - -# Run a specific test file -uv run pytest tests/config/test_sampler_constraints.py - -# Run tests with coverage -uv run pytest --cov=data_designer --cov-report=term-missing --cov-report=html - -# Using Make -make test # Run all tests -make coverage # Run tests with coverage report +make update-license-headers ``` -## Key Files - -- [packages/data-designer/src/data_designer/interface/data_designer.py](packages/data-designer/src/data_designer/interface/data_designer.py) - Main entry point (`DataDesigner` class) -- [packages/data-designer-config/src/data_designer/config/config_builder.py](packages/data-designer-config/src/data_designer/config/config_builder.py) - Configuration API (`DataDesignerConfigBuilder`) -- [packages/data-designer-config/src/data_designer/config/__init__.py](packages/data-designer-config/src/data_designer/config/__init__.py) - User-facing config API exports -- 
[packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py](packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py) - Generation orchestrator -- [pyproject.toml](pyproject.toml) - Project dependencies and tool configurations -- [Makefile](Makefile) - Common development commands - -## Working Guidelines - -- **Comments**: Only insert comments when code is especially important to understand. For basic code blocks, comments aren't necessary. We want readable code without vacuous comments. -- **License headers**: All Python files must include the NVIDIA SPDX license header: - ```python - # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. - # SPDX-License-Identifier: Apache-2.0 - ``` - Use `make update-license-headers` to add headers to all files automatically. -- **Imports**: Avoid importing Python modules inside method definitions. Prefer module-level imports for better performance and clarity. -- **Type annotations**: ALWAYS add type annotations to all functions, methods, and class attributes (including tests). - -## Code Style - -This project uses `ruff` (v0.12.3) for linting and formatting. Follow these guidelines to avoid linter errors: - -### General Formatting - -- **Line length**: Maximum 120 characters per line -- **Quote style**: Always use double quotes (`"`) for strings -- **Indentation**: Use 4 spaces (never tabs) -- **Target version**: Python 3.11+ +## Imports, Lazy Loading, TYPE_CHECKING -### Type Annotations +For heavy third-party modules, use lazy imports via: +- `packages/data-designer-config/src/data_designer/lazy_heavy_imports.py` -Type annotations are REQUIRED for all code in this project. This is strictly enforced for code quality and maintainability. 
- -- **ALWAYS** add type annotations to all functions, methods, and class attributes (including tests) -- Use primitive types when possible: `list` not `List`, `dict` not `Dict`, `set` not `Set`, `tuple` not `Tuple` -- Use modern union syntax with `|` for optional and union types (Python 3.10+): - - `str | None` not `Optional[str]` - - `int | str` not `Union[int, str]` -- Only import from `typing` when absolutely necessary for complex generic types -- For Pydantic models, use field-level type annotations - - ```python - # Good - def process_items(items: list[str], max_count: int | None = None) -> dict[str, int]: - return {item: len(item) for item in items} - - # Avoid - missing type annotations - def process_items(items, max_count=None): - return {item: len(item) for item in items} - - # Avoid - old-style typing - from typing import List, Dict, Optional - def process_items(items: List[str], max_count: Optional[int] = None) -> Dict[str, int]: - return {item: len(item) for item in items} - ``` - -### Import Style - -- **ALWAYS** use absolute imports, never relative imports -- Place imports at module level, not inside functions (exception: it is unavoidable for performance reasons) -- Import sorting is handled by `ruff`'s `isort` - imports should be grouped and sorted: - 1. Standard library imports - 2. Third-party imports (use `lazy_heavy_imports` for heavy libraries) - 3. 
First-party imports (`data_designer`) -- Use standard import conventions (enforced by `ICN`) -- See [Lazy Loading and TYPE_CHECKING](#lazy-loading-and-type_checking) section for optimization guidelines - - ```python - # Good - from data_designer.config.config_builder import DataDesignerConfigBuilder - - # Bad - relative import (will cause linter errors) - from .config_builder import DataDesignerConfigBuilder - - # Good - imports at module level - from pathlib import Path - - def process_file(filename: str) -> None: - path = Path(filename) - - # Bad - import inside function - def process_file(filename: str) -> None: - from pathlib import Path - path = Path(filename) - ``` - -### Lazy Loading and TYPE_CHECKING - -This project uses lazy loading for heavy third-party dependencies to optimize import performance. - -#### When to Use Lazy Loading - -**Heavy third-party libraries** (>100ms import cost) should be lazy-loaded via `lazy_heavy_imports.py`: +Pattern: ```python -# ❌ Don't import directly -import pandas as pd -import numpy as np - -# ✅ Use lazy loading with IDE support from typing import TYPE_CHECKING -from data_designer.lazy_heavy_imports import pd, np +from data_designer.lazy_heavy_imports import pd if TYPE_CHECKING: - import pandas as pd # For IDE autocomplete and type hints - import numpy as np + import pandas as pd ``` -This pattern provides: -- Runtime lazy loading (fast startup) -- Full IDE support (autocomplete, type hints) -- Type checker validation - -**See [lazy_heavy_imports.py](packages/data-designer-config/src/data_designer/lazy_heavy_imports.py) for the current list of lazy-loaded libraries.** - -#### Adding New Heavy Dependencies - -If you add a new dependency with significant import cost (>100ms): +Rules: +- Put `data_designer` type-only imports in `TYPE_CHECKING` blocks. +- Do **not** move runtime-required imports (e.g., Pydantic model field types) into `TYPE_CHECKING`. -1. 
**Add to `lazy_heavy_imports.py`:** - ```python - _LAZY_IMPORTS = { - # ... existing entries ... - "your_lib": "your_library_name", - } - ``` +## Testing and Validation Workflow -2. **Update imports across codebase:** - ```python - from typing import TYPE_CHECKING - from data_designer.lazy_heavy_imports import your_lib +Use `make` targets by default. Use direct `uv run ...` only when no equivalent `make` target exists (for example, one +specific test file during tight iteration). - if TYPE_CHECKING: - import your_library_name as your_lib # For IDE support - ``` +Quick `make` command map: +- Install baseline: `make install` +- Install dev environment: `make install-dev` +- Install with notebook deps: `make install-dev-notebooks` +- Install with recipe deps: `make install-dev-recipes` +- Lint + format checks: `make check-all` +- Autofix checks: `make check-all-fix` +- Package tests: `make test-config`, `make test-engine`, `make test-interface` +- Full tests: `make test` +- Coverage: `make coverage` +- License headers: `make update-license-headers` -3. **Verify with performance test:** - ```bash - make perf-import CLEAN=1 - ``` +Preferred local flow: -#### Using TYPE_CHECKING Blocks - -`TYPE_CHECKING` blocks defer imports that are only needed for type hints, preventing circular dependencies and reducing import time. 
- -**For internal data_designer imports:** - -```python -from __future__ import annotations # Always include at top +```bash +# install workspace and dev tooling +make install-dev -from typing import TYPE_CHECKING +# lint + format +make check-all -# Runtime imports -from pathlib import Path -from data_designer.config.base import ConfigBase +# targeted package tests during iteration +make test-config +# or: make test-engine +# or: make test-interface -if TYPE_CHECKING: - # Type-only imports - only visible to type checkers - from data_designer.engine.models.facade import ModelFacade +# optional: one specific test file when needed +uv run pytest path/to/test_file.py -v -def get_model(model: ModelFacade) -> str: - return model.name +# full test pass before large merge-ready change +make test ``` -**For lazy-loaded libraries (see pattern in "When to Use Lazy Loading" above):** -- Import from `lazy_heavy_imports` for runtime -- Add full import in `TYPE_CHECKING` block for IDE support - -**Rules for TYPE_CHECKING:** - -✅ **DO put in TYPE_CHECKING:** -- Internal `data_designer` imports used **only** in type hints -- Imports that would cause circular dependencies -- **Full imports of lazy-loaded libraries for IDE support** (e.g., `import pandas as pd` in addition to runtime `from data_designer.lazy_heavy_imports import pd`) - -❌ **DON'T put in TYPE_CHECKING:** -- **Standard library imports** (`Path`, `Any`, `Callable`, `Literal`, `TypeAlias`, etc.) -- **Pydantic model types** used in field definitions (needed at runtime for validation) -- **Types used in discriminated unions** (Pydantic needs them at runtime) -- **Any import used at runtime** (instantiation, method calls, base classes, etc.) 
- -**Examples:** - -```python -# ✅ CORRECT - Lazy-loaded library with IDE support -from typing import TYPE_CHECKING -from data_designer.lazy_heavy_imports import pd - -if TYPE_CHECKING: - import pandas as pd # IDE gets full type hints - -def load_data(path: str) -> pd.DataFrame: # IDE understands pd.DataFrame - return pd.read_csv(path) - -# ✅ CORRECT - Standard library NOT in TYPE_CHECKING -from pathlib import Path -from typing import Any +Other useful commands: -def process_file(path: Path) -> Any: - return path.read_text() - -# ✅ CORRECT - Internal type-only import -from typing import TYPE_CHECKING - -if TYPE_CHECKING: - from data_designer.engine.models.facade import ModelFacade - -def get_model(model: ModelFacade) -> str: # Only used in type hint - return model.name - -# ❌ INCORRECT - Pydantic field type in TYPE_CHECKING -from typing import TYPE_CHECKING - -if TYPE_CHECKING: - from data_designer.config.models import ModelConfig # Wrong! - -class MyConfig(BaseModel): - model: ModelConfig # Pydantic needs this at runtime! 
- -# ✅ CORRECT - Pydantic field type at runtime -from data_designer.config.models import ModelConfig - -class MyConfig(BaseModel): - model: ModelConfig +```bash +make install +make install-dev-notebooks +make install-dev-recipes +make check-all-fix +make coverage +make update-license-headers +make verify-imports ``` -### Naming Conventions (PEP 8) - -Follow PEP 8 naming conventions: - -- **Functions and variables**: `snake_case` -- **Classes**: `PascalCase` -- **Constants**: `UPPER_SNAKE_CASE` -- **Private attributes**: prefix with single underscore `_private_var` - - ```python - # Good - class DatasetGenerator: - MAX_RETRIES = 3 - - def __init__(self) -> None: - self._cache: dict[str, str] = {} - - def generate_dataset(self, config: dict[str, str]) -> pd.DataFrame: - pass - - # Bad - class dataset_generator: # Should be PascalCase - maxRetries = 3 # Should be UPPER_SNAKE_CASE - - def GenerateDataset(self, Config): # Should be snake_case - pass - ``` - -### Common Pitfalls to Avoid - -1. **Mutable default arguments**: - - ```python - # Bad - mutable default argument - def add_item(item: str, items: list[str] = []) -> list[str]: - items.append(item) - return items - - # Good - def add_item(item: str, items: list[str] | None = None) -> list[str]: - if items is None: - items = [] - items.append(item) - return items - ``` - -2. **Unused imports and variables**: - - ```python - # Bad - unused import - from pathlib import Path - from typing import Any # Not used - - def process() -> None: - pass - - # Good - only import what you use - from pathlib import Path - - def process() -> None: - pass - ``` - -3. **Simplify code where possible** (enforced by `SIM`): - - ```python - # Bad - if condition: - return True - else: - return False - - # Good - return condition - - # Bad - if key in my_dict: - value = my_dict[key] - else: - value = default - - # Good - value = my_dict.get(key, default) - ``` - -4. 
**Use comprehensions properly**: - - ```python - # Bad - list([x for x in items]) # Unnecessary list() call - - # Good - [x for x in items] - - # Bad - dict([(k, v) for k, v in items]) +Command preference: +- Use `make` targets whenever applicable (lint, format, checks, tests, coverage, headers) to keep behavior consistent with project workflows. - # Good - {k: v for k, v in items} - ``` +## Change Strategy -5. **Proper return statements**: +When implementing a request: +1. Trace from user-facing API to runtime implementation path. +2. Make the smallest coherent patch. +3. Add/adjust tests in same scope as behavior change. +4. Verify lint + tests. +5. Summarize behavior change and residual risks. - ```python - # Bad - unnecessary else after return - def get_value(condition: bool) -> str: - if condition: - return "yes" - else: - return "no" +## Common Pitfalls - # Good - def get_value(condition: bool) -> str: - if condition: - return "yes" - return "no" - ``` +- Circular/plugin import order issues. + - If plugin config imports trigger plugin discovery too early, check: + - `packages/data-designer-config/src/data_designer/plugins/plugin.py` + - `packages/data-designer-config/src/data_designer/plugins/registry.py` + - `packages/data-designer-config/src/data_designer/config/column_types.py` +- Registry state leakage across subclasses. + - Validate class-level mutable registry storage in shared base classes. +- Type hints diverging from runtime behavior. + - Keep signatures/docs synchronized with concrete return values. 
-### Active Linter Rules +## Column Config Types (Quick Reference) -The following ruff linter rules are currently enabled (see [pyproject.toml](pyproject.toml)): +- `SamplerColumnConfig` +- `LLMTextColumnConfig` +- `LLMCodeColumnConfig` +- `LLMStructuredColumnConfig` +- `LLMJudgeColumnConfig` +- `ExpressionColumnConfig` +- `ValidationColumnConfig` +- `SeedDatasetColumnConfig` -- `W`: pycodestyle warnings -- `F`: pyflakes (unused imports, undefined names) -- `I`: isort (import sorting) -- `ICN`: flake8-import-conventions (standard import names) -- `PIE`: flake8-pie (miscellaneous lints) +See: +- `packages/data-designer-config/src/data_designer/config/column_configs.py` -**Note**: Additional rules (E, N, UP, ANN, B, C4, DTZ, RET, SIM, PTH) are commented out but may be enabled in the future. Write code that would pass these checks for future-proofing. +## Model Config (Quick Reference) -## Testing Patterns +`ModelConfig` includes: +- `alias` +- `model` +- `inference_parameters` +- `provider` +- `skip_health_check` -The project uses `pytest` with the following patterns: +See: +- `packages/data-designer-config/src/data_designer/config/models.py` -- **Fixtures**: Shared test data and configurations in [tests/conftest.py](tests/conftest.py) -- **Stub configs**: YAML-based configuration stubs for testing (see `stub_data_designer_config_str` fixture) -- **Mocking**: Use `unittest.mock.patch` for external services and dependencies -- **Async support**: pytest-asyncio for async tests (`asyncio_default_fixture_loop_scope = "session"`) -- **HTTP mocking**: pytest-httpx for mocking HTTP requests -- **Coverage**: Track test coverage with pytest-cov - -Example test structure: +## Recommended Public API Usage ```python -import pytest -from data_designer.config.config_builder import DataDesignerConfigBuilder - -def test_something(stub_model_configs): - """Test description.""" - builder = DataDesignerConfigBuilder(model_configs=stub_model_configs) - # ... 
test implementation - assert expected == actual -``` - -## Column Configuration Types - -When working with column configurations, understand these key types: - -- **`SamplerColumnConfig`**: Built-in samplers (UUID, Category, Uniform, Gaussian, Person, DateTime, etc.) -- **`LLMTextColumnConfig`**: LLM text generation with Jinja2 templating -- **`LLMCodeColumnConfig`**: Code generation with language specification -- **`LLMStructuredColumnConfig`**: Structured JSON generation with schema -- **`LLMJudgeColumnConfig`**: Judge/scoring columns for quality assessment -- **`ExpressionColumnConfig`**: Expression-based derived columns (Python eval or Jinja2) -- **`ValidationColumnConfig`**: Validation results (Python, SQL, Code, Remote validators) -- **`SeedDatasetColumnConfig`**: Data from seed datasets - -See [packages/data-designer-config/src/data_designer/config/column_configs.py](packages/data-designer-config/src/data_designer/config/column_configs.py) for detailed schemas. - -## Model Configuration - -Models are configured via `ModelConfig` with: - -- `alias`: User-defined alias for the model -- `model`: Model ID (e.g., from build.nvidia.com) -- `inference_parameters`: Temperature, top_p, max_tokens (can be distribution-based) -- `system_prompt`: Optional system prompt -- `image_modality`: Support for image inputs - -See [packages/data-designer-config/src/data_designer/config/models.py](packages/data-designer-config/src/data_designer/config/models.py) for details. - -## Registry System - -The project uses a registry pattern for extensibility. 
Key registries: - -- **Column generators**: [packages/data-designer-engine/src/data_designer/engine/column_generators/registry.py](packages/data-designer-engine/src/data_designer/engine/column_generators/registry.py) -- **Validators**: [packages/data-designer-engine/src/data_designer/engine/validators/](packages/data-designer-engine/src/data_designer/engine/validators/) -- **Column profilers**: [packages/data-designer-engine/src/data_designer/engine/analysis/column_profilers/registry.py](packages/data-designer-engine/src/data_designer/engine/analysis/column_profilers/registry.py) -- **Models**: [packages/data-designer-engine/src/data_designer/engine/models/registry.py](packages/data-designer-engine/src/data_designer/engine/models/registry.py) - -When adding new generators or validators, register them appropriately. - -## Pre-commit Hooks - -The project uses pre-commit hooks to enforce code quality. Install them with: - -```bash -uv run pre-commit install -``` - -Hooks include: -- Trailing whitespace removal -- End-of-file fixer -- YAML/JSON/TOML validation -- Merge conflict detection -- Debug statement detection -- Ruff linting and formatting - -## Common Development Tasks - -```bash -# Clean up generated files -make clean - -# Update license headers -make update-license-headers - -# Run all checks before committing -make check-all-fix -make test +import data_designer.config as dd +from data_designer.interface import DataDesigner -# Generate coverage report -make coverage -# View htmlcov/index.html in browser +data_designer = DataDesigner() +config_builder = dd.DataDesignerConfigBuilder() +config_builder.add_column( + dd.SamplerColumnConfig( + name="category", + sampler_type=dd.SamplerType.CATEGORY, + params=dd.CategorySamplerParams(values=["A", "B"]), + ) +) ``` -## Additional Resources +## Quality Gate Before Hand-Off -- **README.md**: Installation and basic usage examples -- **packages/data-designer-config/src/data_designer/config/**: Configuration API 
documentation -- **tests/**: Comprehensive test suite with usage examples +- Changed behavior covered by tests or justified if not testable. +- If new Python files were added, `make update-license-headers` was run. +- Ruff checks pass for modified files/scope (prefer `make check-all` or package-scoped checks). +- No regressions in import style/type annotation rules. +- User-facing behavior and tradeoffs explained clearly.
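
The "Registry state leakage across subclasses" entry under Common Pitfalls in the new file is easier to act on with a concrete case. The following is a minimal sketch of that pitfall in plain Python, not DataDesigner's actual registry code: every class name here (`LeakyRegistry`, `IsolatedRegistry`, and their subclasses) is hypothetical and invented for illustration. It assumes the problem takes the usual form, a mutable class attribute declared once on a shared base class, and shows one possible fix using `__init_subclass__`.

```python
from __future__ import annotations

from typing import Any, ClassVar


class LeakyRegistry:
    """Hypothetical base class: the dict below is created once and shared by all subclasses."""

    _entries: ClassVar[dict[str, type]] = {}

    @classmethod
    def register(cls, name: str, item: type) -> None:
        # Mutates the single dict owned by LeakyRegistry, whichever subclass calls it.
        cls._entries[name] = item

    @classmethod
    def names(cls) -> list[str]:
        return sorted(cls._entries)


class LeakyGenerators(LeakyRegistry):
    pass


class LeakyValidators(LeakyRegistry):
    pass


class IsolatedRegistry:
    """Hypothetical fix: each subclass receives its own dict when it is defined."""

    _entries: ClassVar[dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Rebind the attribute on the new subclass so it no longer aliases the base dict.
        cls._entries = {}

    @classmethod
    def register(cls, name: str, item: type) -> None:
        cls._entries[name] = item

    @classmethod
    def names(cls) -> list[str]:
        return sorted(cls._entries)


class IsolatedGenerators(IsolatedRegistry):
    pass


class IsolatedValidators(IsolatedRegistry):
    pass


LeakyGenerators.register("uuid", str)
IsolatedGenerators.register("uuid", str)

# The generator entry is visible through the unrelated validator registry.
print(LeakyValidators.names())
# The isolated validator registry stays empty.
print(IsolatedValidators.names())
```

A test that registers an entry on one subclass and asserts it is absent from a sibling is a cheap way to catch this kind of leak; whether `__init_subclass__` or per-instance storage is the better fix depends on how the real registries are constructed.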