@KE7 commented Oct 30, 2025

No description provided.

@KE7 merged commit 13290aa into main on Oct 30, 2025
2 of 3 checks passed
@KE7 deleted the license branch on October 30, 2025 22:44
@cursor (bot) left a comment

Bug: License Metadata Inconsistency

The _create_metadata method still hardcodes CC BY-NC 4.0 license details, including commercial_license_required=True and license_contact. This creates inconsistent licensing in generated datasets, as the embedded metadata contradicts the project's Apache 2.0 license, CLI messages, and README.
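One way the inconsistency described above could be resolved is sketched below. This is a minimal illustration and not the project's actual code: the function names `create_license_metadata` and `find_stale_license_fields`, the constant `APACHE_2_0`, and the `license` and `license_url` keys are hypothetical. Only `commercial_license_required` and `license_contact` are field names taken from this report.

```python
from __future__ import annotations

# Hypothetical license fields aligned with the project's Apache 2.0 license.
# The key names "license" and "license_url" are assumptions for illustration.
APACHE_2_0 = {
    "license": "Apache-2.0",
    "license_url": "https://www.apache.org/licenses/LICENSE-2.0",
    "commercial_license_required": False,
}


def create_license_metadata() -> dict:
    """Return license fields consistent with an Apache 2.0 project."""
    # Return a copy so callers cannot mutate the shared constant.
    return dict(APACHE_2_0)


def find_stale_license_fields(metadata: dict) -> list[str]:
    """List keys that still reflect the old CC BY-NC 4.0 terms."""
    stale = []
    if "CC BY-NC" in str(metadata.get("license", "")):
        stale.append("license")
    if metadata.get("commercial_license_required"):
        stale.append("commercial_license_required")
    if "license_contact" in metadata:
        stale.append("license_contact")
    return stale
```

A check like `find_stale_license_fields` could run in a unit test against whatever `_create_metadata` returns, so that a future license change cannot silently diverge from the embedded dataset metadata again.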

graid/src/graid/data/generate_dataset.py#L1495-L1498

# Create repository
create_repo(
    hub_repo_id, repo_type="dataset", private=hub_private, exist_ok=True
)

graid/src/graid/data/generate_dataset.py#L1299-L1302

4. HuggingFace dataset construction with embedded PIL images
5. Optional local saving and Hub upload
Key Features:

graid/src/graid/data/generate_dataset.py#L1618-L1622

Raises:
    KeyboardInterrupt: If user cancels the selection process
Example:

graid/src/graid/data/generate_dataset.py#L1068-L1071

dataset = dataset.cast_column("image", HFImage())
# Add metadata
metadata = self._create_metadata()

graid/src/graid/data/generate_dataset.py#L1423-L1427

>>> from graid.models import YoloModel, DetectronModel
>>> models = [YoloModel("yolov8x.pt"), DetectronModel("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")]
>>> dataset = generate_dataset(
...     dataset_name="bdd",
...     split="train",

graid/src/graid/data/generate_dataset.py#L1435-L1438

...     hub_repo_id="myuser/bdd-reasoning-dataset"
... )
Custom question configuration:

graid/src/graid/data/generate_dataset.py#L1030-L1033

f"Processed {processed_images} images, generated {total_qa_pairs} QA pairs"
)
def build(self):

graid/src/graid/data/generate_dataset.py#L1356-L1363

num_workers (int): Number of parallel workers for data loading.
    Should typically match CPU core count. Default: 4
qa_workers (int): Number of parallel workers for QA generation.
    - 1: Sequential processing (debugging, memory-limited)
    - >1: Parallel processing (production, high-throughput)
    Recommended: 2-4x CPU cores. Default: 4
