A comprehensive 3D object dataset and annotation pipeline for robotics simulations and vision-language-action (VLA) models. This project provides semantic and physical annotations for thousands of 3D objects via an automated, Large Language Model (LLM)-driven annotation pipeline.
```
RoboTwin-Objects/
├── objects/                  # 3D object assets
│   ├── objects_glb/          # GLB format 3D models with collision meshes (.coacd.ply)
│   ├── objects_glb_pictures/ # Preview images for GLB objects
│   ├── objects_urdf/         # URDF format models with physics properties
│   ├── objects_xml/          # XML format scene descriptions
│   ├── prompt_test/          # LLM prompt testing and evaluation
│   └── textures/             # Material textures for rendering
├── prompt/                   # LLM prompts for annotation tasks
├── scripts/                  # Automation and processing scripts
│   ├── call_llm/             # LLM API interaction scripts
│   ├── upload_hf/            # Hugging Face dataset upload utilities
│   └── utils/                # Data processing utilities
└── *.json                    # Dataset metadata and annotations
```
- `robotwin_objects.json` - Base object catalog with IDs, names, categories, and tags
- `robotwin_info_generated_by_llm.json` - Complete LLM-generated annotations including:
  - Physical dimensions (`real_size` in meters)
  - Material properties (`density`, `static_friction`, `dynamic_friction`, `restitution`)
  - Semantic descriptions (`Basic_description`, `Functional_description`)
- `robotwin_real_sizes_meter.json` - Real-world size measurements in meters
- `robotwin_uuid_map.csv` - UUID to object name mappings
- `filtered_robotwin_*.json` - Filtered subsets with image associations
- `size_valid_refine_scale_results.json` - Scale validation and refinement results
- `updated_robotwin_info.json` - Post-processed annotations
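The UUID map ties every file in the dataset back to a human-readable name. A minimal lookup sketch — the `uuid`/`object_name` column names are an assumption; check the actual header of `robotwin_uuid_map.csv` before relying on them (the sample row reuses the example object shown later in this README):

```python
import csv
import io

# Hypothetical two-column layout (uuid, object_name) standing in for
# robotwin_uuid_map.csv; replace the StringIO with open("robotwin_uuid_map.csv").
sample = io.StringIO(
    "uuid,object_name\n"
    "00aff23a-2075-44d5-a4eb-da6d5998a409,boxed_playing_cards\n"
)
uuid_to_name = {row["uuid"]: row["object_name"] for row in csv.DictReader(sample)}

print(uuid_to_name["00aff23a-2075-44d5-a4eb-da6d5998a409"])  # boxed_playing_cards
```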
The project uses multiple LLM-based annotation stages to generate comprehensive object metadata:
Analyzes 6-view composite images to extract:
- Object name and category classification
- Real-world size estimation (bounding box in meters)
- Material identification and physical properties
- Functional descriptions and use cases
- **Input:** Multi-view object images + object names
- **Output:** Complete semantic and physical annotations
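Because LLM output can omit fields, it is worth validating each returned record before merging it into the catalog. A minimal sketch (not part of the pipeline's scripts); the field names are taken from the annotation files described above:

```python
# Fields every complete annotation record should carry, per the dataset schema.
REQUIRED_FIELDS = {
    "object_name", "category", "real_size", "density",
    "static_friction", "dynamic_friction", "restitution",
    "Basic_description", "Functional_description",
}

def missing_fields(annotation: dict) -> set:
    """Return the required keys absent from one LLM annotation record."""
    return REQUIRED_FIELDS - annotation.keys()

# A partial record, e.g. from a truncated LLM response:
record = {"object_name": "boxed_playing_cards", "category": "cards"}
print(sorted(missing_fields(record)))
```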
Estimates the longest dimension of objects for scale calculation:
- Conservative real-world size estimation
- Handles perspective distortion in images
- Outputs measurements in meters
- **Input:** Object images
- **Output:** `{"longest_m": <float>}`
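Given `longest_m`, a per-object scale factor falls out by dividing it by the model's longest raw extent. This is a sketch of the idea behind `compute_scale_from_longest.py`; the actual script's logic and rounding rules may differ:

```python
def scale_from_longest(longest_m: float, model_dims: list[float]) -> float:
    """Scale factor mapping raw model units to meters, pinned to the longest extent."""
    return longest_m / max(model_dims)

# A model whose raw bounding box is 6.5 x 2.2 x 9.0 units, with an estimated
# longest real-world edge of 0.09 m:
scale = scale_from_longest(0.09, [6.5, 2.2, 9.0])
print(scale)  # 0.01
```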
Validates computed object sizes against real-world expectations:
- Checks if dimensions are reasonable for object category
- Identifies objects that are too large/small
- Suggests corrective scale factors
- **Input:** Object dimensions, scales, computed sizes
- **Output:** Validation results with suggested corrections
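Applying a validation result is then a matter of rescaling the bounding box when the object was flagged. A minimal sketch assuming the validation output fields shown later in this README (`is_proper`, `suggested_scale`):

```python
def apply_validation(real_size: list[float], result: dict) -> list[float]:
    """Rescale a bounding box when validation flags it and suggests a factor."""
    if result.get("is_proper") or result.get("suggested_scale") is None:
        return real_size
    factor = result["suggested_scale"]
    return [dim * factor for dim in real_size]

# An object judged ten times too large:
corrected = apply_validation(
    [0.65, 0.22, 0.9],
    {"is_proper": False, "suggested_scale": 0.1},
)
print(corrected)
```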
- `robotwin_call_gpt_image.py` - Multi-threaded GPT API calls with image inputs
- `robotwin_call_gpt_image_new.py` - Updated version with refined prompts
- `robotwin_generate_real_size.py` - Dedicated size estimation pipeline
- `validate_real_sizes.py` - Automated size validation
- `compute_scale_from_longest.py` - Calculate scaling factors from longest dimensions
- `calculate_real_sizes.py` - Convert model units to real-world meters
- `convert_real_size_to_m.py` - Unit conversion utilities
- `proceed_output_scale.py` - Apply scale corrections to object data
- `hf_upload_info.py` - Upload datasets to the Hugging Face Hub
The dataset covers diverse object categories including:
- Household items: mugs, plates, utensils, containers
- Food items: fruits, packaged foods, beverages
- Furniture: tables, chairs, cabinets, shelves
- Electronics: appliances, devices
- Tools and equipment: kitchen tools, office supplies
- Personal items: shoes, books, accessories
Each object includes detailed physical annotations:
- `real_size`: 3D bounding box [width, depth, height] in meters
- `scale`: Scaling factors to convert from model units to meters
- `density`: Mass density in g/cm³
- `static_friction`: Static friction coefficient (μs) on wood surfaces
- `dynamic_friction`: Dynamic friction coefficient (μk) on wood surfaces
- `restitution`: Coefficient of restitution (bounce) on wood surfaces
- `Basic_description`: Concise physical description
- `Functional_description`: List of primary use cases and functions
- `category`: Object classification (mug, food, furniture, etc.)
- `tags`: `"StructuralEntities"` (large/fixed) or `"DynamicEntities"` (movable)
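These fields are enough to derive quantities the dataset does not store directly, such as an approximate mass from density and bounding-box volume. A sketch, not a pipeline script — note it gives an upper bound, since real objects rarely fill their bounding box, and that 1 g/cm³ = 1000 kg/m³:

```python
def bbox_mass_kg(real_size_m: list[float], density_g_cm3: float) -> float:
    """Upper-bound mass: bounding-box volume (m^3) times density (g/cm^3 -> kg/m^3)."""
    w, d, h = real_size_m
    return w * d * h * density_g_cm3 * 1000.0

# The boxed_playing_cards example: 0.065 x 0.022 x 0.09 m at 0.65 g/cm3
mass = bbox_mass_kg([0.065, 0.022, 0.09], 0.65)
print(round(mass, 4))  # 0.0837 (kg)
```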
The `objects/prompt_test/` directory contains comparative testing of different LLM prompts:
- `claude_prompt.txt` - Anthropic Claude-optimized prompts
- `gpt_prompt.txt` - OpenAI GPT-optimized prompts
- `doubao_prompt.txt` - ByteDance Doubao prompts
- `deepseek_prompt.txt` - DeepSeek model prompts
- `grok_prompt.txt` - xAI Grok prompts
- `results.md` - Comparative performance analysis
- `robotwin_scale_generated_by_*_*.json` - Results from different model/prompt combinations
- `sort_json.py` - Utility for organizing test results
```python
import json

# Load complete object database
with open('robotwin_info_generated_by_llm.json', 'r') as f:
    objects = json.load(f)

# Get object properties
obj = objects['00aff23a-2075-44d5-a4eb-da6d5998a409']
print(f"Object: {obj['object_name']}")
print(f"Size: {obj['real_size']} meters")
print(f"Density: {obj['density']} g/cm³")
print(f"Functions: {obj['Functional_description']}")
```

```bash
# Run size validation
python scripts/call_llm/validate_real_sizes.py \
    --input robotwin_info_generated_by_llm.json \
    --output size_validation_results.json

# Compute scales from longest dimensions
python scripts/utils/compute_scale_from_longest.py \
    -l robotwin_longest_m_by_gpt41.json \
    -d filtered_robotwin_dim_img.json \
    -o robotwin_scale_from_longest.json
```

```json
{
  "00aff23a-2075-44d5-a4eb-da6d5998a409": {
    "object_name": "boxed_playing_cards",
    "category": "cards",
    "real_size": [0.065, 0.022, 0.09],
    "density": 0.65,
    "static_friction": 0.45,
    "dynamic_friction": 0.34,
    "restitution": 0.3,
    "Basic_description": "A rectangular box containing a standard deck of playing cards",
    "Functional_description": [
      "used for card games",
      "used for magic tricks",
      "used for educational purposes"
    ]
  }
}
```

```json
{
  "is_proper": true,
  "assessment": null,
  "typical_size_range": "0.06–0.10 m length, 0.02–0.03 m thickness",
  "suggested_scale": null
}
```

- `OPENAI_MODEL` - Model selection (gpt-4o, gpt-4o-mini, etc.)
- `WORKERS` - Concurrent processing threads (default: 6)
- API keys - Configure OpenAI, Anthropic, or other LLM provider credentials
- `MAX_RETRIES` - API retry attempts (default: 3)
- Scale precision - 4 decimal places with truncation
- Image formats - Support for PNG and JPG composite views
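A possible shell setup before running the annotation scripts — the variable names come from the configuration list above, while `OPENAI_API_KEY` is the standard OpenAI client convention and is an assumption here:

```shell
# Hypothetical environment setup; adjust names to match the scripts' actual config.
export OPENAI_API_KEY="sk-..."        # your provider credential
export OPENAI_MODEL="gpt-4o-mini"     # model selection
export WORKERS=6                      # concurrent processing threads
export MAX_RETRIES=3                  # API retry attempts

python scripts/call_llm/robotwin_call_gpt_image.py
```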