docs: Added starter dev notes on push to hugging face hub #355

Open
nabinchha wants to merge 4 commits into main from nmulepati/docs/dev-notes-push-to-huggingface-hub

@nabinchha nabinchha commented Feb 26, 2026

Adds a dev notes post covering the push_to_hub feature of Data Designer.

@nabinchha nabinchha requested a review from a team as a code owner February 26, 2026 18:20

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds a developer notes post documenting the push_to_hub feature of Data Designer, enabling users to publish generated datasets directly to the Hugging Face Hub. The post is well-written, covering the core API, the upload pipeline, processor handling, auto-generated dataset cards, auth, and reproducible pipeline round-trips — with supporting diagrams and working code examples.

  • New blog post push-datasets-to-hugging-face-hub.md added with clear examples for both the results.push_to_hub() and HuggingFaceHubClient.push_to_hub_from_folder() workflows
  • Four new supporting images added (push-to-hub-hero.png, push-to-hub-pipeline.png, push-to-hub-round-trip.png, push-to-hub-schema-transform.png)
  • New author nmulepati (Nabin Mulepati) registered in .authors.yml and correctly referenced in post frontmatter
  • mkdocs.yml nav updated to include the new post
  • Minor: the dataset card template path reference at line 229 (integrations/huggingface/dataset_card_template.md) is incomplete — the actual path in the repo is packages/data-designer/src/data_designer/integrations/huggingface/dataset_card_template.md

Confidence Score: 5/5

  • This is a documentation-only PR with no code changes; safe to merge after addressing the template path reference.
  • All changes are documentation and static assets. The post is technically accurate, the <!-- more --> marker fix from a previous round has been applied correctly (single marker after the intro), the .authors.yml entry is consistent with the post frontmatter, and the mkdocs.yml nav entry is correctly placed. The only issue is a minor incorrect file path hint for the dataset card template.
  • docs/devnotes/posts/push-datasets-to-hugging-face-hub.md — line 229 has an incomplete template path reference.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/devnotes/posts/push-datasets-to-hugging-face-hub.md | New dev-notes post documenting the push_to_hub feature; well-structured with clear code examples and diagrams. One minor issue: the dataset card template path reference is incomplete and points to a non-existent location from the repo root. |
| docs/devnotes/.authors.yml | Adds new author entry nmulepati (Nabin Mulepati) with correct avatar URL and description, matching the post frontmatter. |
| mkdocs.yml | Adds the new devnotes post to the nav in the correct position (newest first under Dev Notes). |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DataDesigner.create] --> B[results object]
    B -->|results.push_to_hub| C{Upload Pipeline}
    D[Saved folder on disk] -->|HuggingFaceHubClient\n.push_to_hub_from_folder| C

    C --> E[1. Upload README.md\nauto-generated dataset card]
    E --> F[2. Upload data/*.parquet\nremapped from parquet-files/]
    F --> G[3. Upload images/*\nskipped if no image columns]
    G --> H[4. Upload processor dirs\nremapped from processors-files/]
    H --> I[5. Upload builder_config.json]
    I --> J[6. Upload metadata.json\npaths rewritten for HF layout]

    J --> K[HuggingFace Hub Repo]

    K -->|from_config URL| L[DataDesignerConfigBuilder\nfully hydrated]
    L --> M[Inspect / Tweak / Re-run]
    M --> A
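
The flowchart says parquet files move from a local `parquet-files/` folder to `data/` on the Hub. The sketch below illustrates only that one remapping rule. The helper name `remap_parquet_path` is hypothetical and is not taken from the Data Designer source; the other upload steps (images, processor directories, metadata rewriting) are deliberately left out because the flowchart does not spell out their target paths.

```python
from pathlib import PurePosixPath

LOCAL_PARQUET_DIR = "parquet-files"  # local folder name shown in the flowchart
HUB_PARQUET_DIR = "data"             # Hub-side folder name shown in the flowchart


def remap_parquet_path(local_path: str) -> str:
    """Map a local relative path to its Hub path (hypothetical helper).

    Only the parquet-files/ -> data/ rule from the flowchart is modeled;
    every other path is returned unchanged.
    """
    parts = PurePosixPath(local_path).parts
    if parts and parts[0] == LOCAL_PARQUET_DIR:
        # Swap the leading folder and keep the rest of the path as-is.
        return str(PurePosixPath(HUB_PARQUET_DIR, *parts[1:]))
    return local_path
```

Under these assumptions, `remap_parquet_path("parquet-files/part-0.parquet")` would yield a path under `data/`, while a top-level file such as `builder_config.json` passes through untouched. The file name `part-0.parquet` is illustrative only.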

Last reviewed commit: ba8a055

dhruvnathawani previously approved these changes Feb 26, 2026
@dhruvnathawani dhruvnathawani left a comment

Did you use AI for the images?
LGTM

Move the single <!-- more --> to after the intro paragraph for a shorter
blog teaser and remove the 6 redundant markers throughout the post.

1. Explicit `token=` parameter
2. `HF_TOKEN` env var
3. Cached creds from `hf auth login`

hf auth login availability note

hf auth login is the newer CLI subcommand style introduced in huggingface_hub ≥ 0.22. Users on older installations will only have huggingface-cli login. Consider mentioning the legacy form as a fallback so the auth section stays accurate for a wider range of installed versions:

Suggested change
- 3. Cached creds from `hf auth login`
+ 3. Cached creds from `huggingface-cli login` (or `hf auth login` on huggingface_hub ≥ 0.22)
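
The three-step precedence quoted from the post (explicit `token=`, then `HF_TOKEN`, then cached credentials) can be sketched as a small resolver. This is a minimal illustration of the ordering only, not the code Data Designer or `huggingface_hub` actually runs. The function `resolve_token` and its `load_cached_token` callback are hypothetical names introduced for the example.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_token(
    explicit_token: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    load_cached_token: Callable[[], Optional[str]] = lambda: None,
) -> Optional[str]:
    """Return the first available token, following the order in the post.

    Hypothetical helper: `load_cached_token` stands in for whatever reads
    the credentials stored by the CLI login command.
    """
    env = os.environ if environ is None else environ

    # 1. An explicit token argument always wins.
    if explicit_token:
        return explicit_token

    # 2. Otherwise fall back to the HF_TOKEN environment variable.
    env_token = env.get("HF_TOKEN")
    if env_token:
        return env_token

    # 3. Finally, try the cached login credentials (may be None).
    return load_cached_token()
```

Passing the environment and the cache loader as parameters keeps the sketch testable without touching real credentials.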

@nabinchha

> Did you use AI for the images? LGTM

@dhruvnathawani, yes!

Comment on lines +229 to +230
The template lives at `integrations/huggingface/dataset_card_template.md` if you
want to see the Jinja2 source.

Incorrect template file path

The path given for the dataset card template is incomplete. The actual location in the repository is packages/data-designer/src/data_designer/integrations/huggingface/dataset_card_template.md. A developer following this hint and trying to find or open integrations/huggingface/dataset_card_template.md from the repo root will not find it.

Suggested change
- The template lives at `integrations/huggingface/dataset_card_template.md` if you
- want to see the Jinja2 source.
+ The template lives at `packages/data-designer/src/data_designer/integrations/huggingface/dataset_card_template.md` if you
+ want to see the Jinja2 source.
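
A path hint like the one under review can be checked against a local checkout before it goes into the post. The sketch below searches a directory tree for the template file name and returns each match relative to the search root. The helper `find_template` is a hypothetical name for this example, and the repo root is assumed to be whatever directory the caller passes in.

```python
from pathlib import Path
from typing import List

TEMPLATE_NAME = "dataset_card_template.md"  # file name quoted in the review comment


def find_template(repo_root: str) -> List[Path]:
    """List every file named TEMPLATE_NAME under repo_root (hypothetical helper).

    Paths are returned relative to repo_root so they can be pasted into docs.
    """
    root = Path(repo_root)
    return sorted(
        match.relative_to(root)
        for match in root.rglob(TEMPLATE_NAME)
        if match.is_file()
    )
```

Run from the repository root, for example with `find_template(".")`, this would show whether the documented path matches a real file location.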
