**`.github/workflows/check.yaml`** (3 additions, 4 deletions)

```diff
@@ -15,13 +15,12 @@ jobs:
       - name: Checkout Repository
         uses: actions/checkout@v4
 
-      - name: Install Poetry
+      - name: Install uv
         run: |
-          curl -sSL https://install.python-poetry.org | python3 -
-          echo "$HOME/.local/bin" >> $GITHUB_PATH
+          curl -LsSf https://astral.sh/uv/install.sh | sh
 
       - name: Install Dependencies
-        run: poetry install
+        run: uv sync && uv sync --group dev
 
       - name: Ensure check.sh exists and is executable
         run: |
```
**`README.md`** (266 additions, 46 deletions)

## GRAID: <u>G</u>enerating <u>R</u>easoning questions from <u>A</u>nalysis of <u>I</u>mages via <u>D</u>iscriminative artificial intelligence

[Design Doc](https://docs.google.com/document/d/1zgb1odK3zfwLg2zKts2eC1uQcQfUd6q_kKeMzd1q-m4/edit?tab=t.0)

## 🚀 Quick Start

### Installation
0. Install uv (skip if you already have it): `curl -LsSf https://astral.sh/uv/install.sh | sh` (or see the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/))
1. Create a virtual environment: `uv venv`
2. Activate it: `source .venv/bin/activate` (or use direnv with the provided .envrc)
3. Install dependencies: `uv sync`
4. Install all backends: `uv run install_all`

### 🤗 HuggingFace Dataset Generation

**Generate high-quality VQA datasets for modern ML workflows:**
```bash
# Interactive mode with step-by-step guidance
graid generate-dataset

# Or using uv run (equivalent, but not necessary after uv sync)
uv run graid generate-dataset
```

**Key Features:**
- **🎯 Object Filtering**: Smart allowable sets for focused object detection
- **🔬 Multi-Model Ensemble**: Weighted Boxes Fusion (WBF) for improved accuracy
- **⚙️ Flexible Configuration**: JSON configs for reproducible experiments
- **🌐 HuggingFace Hub Integration**: Direct upload to share datasets
- **🖼️ PIL Image Support**: Ready for modern vision-language models
- **📊 Rich Metadata**: Comprehensive dataset documentation

**Quick Examples:**
```bash
# Generate with specific object types (autonomous driving focus)
uv run graid generate-dataset --allowable-set "person,car,truck,bicycle,traffic light"

# Multi-model ensemble for enhanced accuracy
uv run graid generate-dataset --config examples/wbf_ensemble.json

# Upload directly to HuggingFace Hub
uv run graid generate-dataset --upload-to-hub --hub-repo-id "your-org/dataset-name"

# List all valid COCO objects
uv run graid generate-dataset --list-objects
```

### 🎛️ Configuration-Driven Workflows

**Create reusable configurations for systematic experiments:**

**Basic Configuration:**
```json
{
  "dataset_name": "bdd",
  "split": "val",
  "models": [
    {
      "backend": "detectron",
      "model_name": "faster_rcnn_R_50_FPN_3x",
      "confidence_threshold": 0.7
    },
    {
      "backend": "mmdetection",
      "model_name": "co_detr",
      "confidence_threshold": 0.6
    }
  ],
  "use_wbf": true,
  "wbf_config": {
    "iou_threshold": 0.6,
    "model_weights": [1.0, 1.2]
  },
  "allowable_set": ["person", "car", "truck", "bus", "motorcycle", "bicycle"],
  "confidence_threshold": 0.5,
  "batch_size": 4
}
```

**Advanced Configuration with Custom Questions and Transforms:**
```json
{
  "dataset_name": "bdd",
  "split": "val",
  "models": [
    {
      "backend": "ultralytics",
      "model_name": "yolov8x.pt",
      "confidence_threshold": 0.6
    }
  ],
  "use_wbf": false,
  "allowable_set": ["person", "car", "bicycle", "motorcycle", "traffic light"],
  "confidence_threshold": 0.5,
  "batch_size": 2,

  "questions": [
    { "name": "HowMany", "params": {} },
    { "name": "Quadrants", "params": { "N": 3, "M": 3 } },
    { "name": "WidthVsHeight", "params": { "threshold": 0.4 } },
    { "name": "LargestAppearance", "params": { "threshold": 0.35 } },
    { "name": "MostClusteredObjects", "params": { "threshold": 80 } }
  ],

  "transforms": {
    "type": "yolo_bdd",
    "new_shape": [640, 640]
  },

  "save_path": "./datasets/custom_bdd_vqa",
  "upload_to_hub": true,
  "hub_repo_id": "your-org/bdd-reasoning-dataset",
  "hub_private": false
}
```

**Custom Model Configuration:**
```json
{
  "dataset_name": "custom",
  "split": "train",
  "models": [
    {
      "backend": "detectron",
      "model_name": "custom_retinanet",
      "custom_config": {
        "config": "path/to/config.yaml",
        "weights": "path/to/model.pth"
      }
    },
    {
      "backend": "ultralytics",
      "model_name": "custom_yolo",
      "custom_config": {
        "model_path": "path/to/custom_yolo.pt"
      }
    }
  ],
  "transforms": {
    "type": "yolo_bdd",
    "new_shape": [832, 832]
  },
  "questions": [
    { "name": "IsObjectCentered", "params": {} },
    { "name": "LeftOf", "params": {} },
    { "name": "RightOf", "params": {} }
  ]
}
```
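Any of these configuration files can be passed to the CLI via the `--config` flag shown earlier, e.g. `uv run graid generate-dataset --config custom_models.json` (the filename here is hypothetical; use whatever path you saved the JSON to).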

### 📦 Custom Dataset Support

**Bring Your Own Data**: GRAID supports any PyTorch-compatible dataset:

```python
import torch
from torch.utils.data import Dataset

from graid.data.generate_dataset import generate_dataset


class CustomDataset(Dataset):
    """Your custom dataset implementation."""

    def __len__(self):
        return 1  # replace with the size of your dataset

    def __getitem__(self, idx):
        # Return: (image_tensor, optional_annotations, metadata).
        # Annotations are only needed for mAP/mAR evaluation;
        # for VQA generation, only images are required.
        image = torch.zeros(3, 720, 1280)  # placeholder; load your image here
        return image, None, {"index": idx}


# Generate a HuggingFace dataset from your data
dataset = generate_dataset(
    dataset_name="custom",
    split="train",
    models=your_models,  # your configured detection models
    allowable_set=["person", "vehicle"],
    save_path="./datasets/custom_vqa",
)
```

**Key Point**: Custom datasets only require images for VQA generation. Annotations are optional and only needed if you want to evaluate model performance with mAP/mAR metrics.

## 🔧 Advanced Features

### **Multi-Model Ensemble with WBF**
Combine predictions from multiple models using Weighted Boxes Fusion for enhanced detection accuracy; a short sketch of the fusion step follows this list:
- Improved precision through model consensus
- Configurable fusion parameters and model weights
- Supports mixed backends (Detectron2 + MMDetection + Ultralytics)
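
As a concrete illustration of the fusion step, here is a minimal sketch using the open-source `ensemble_boxes` package. Whether GRAID wraps this exact package is an assumption, and the boxes, scores, and weights below are made-up values mirroring the `wbf_config` example above.

```python
from ensemble_boxes import weighted_boxes_fusion

# Per-model predictions; boxes are normalized to [0, 1] as [x1, y1, x2, y2]
boxes_list = [
    [[0.10, 0.10, 0.40, 0.50]],  # model 1 (e.g. a Detectron2 model)
    [[0.12, 0.11, 0.42, 0.52]],  # model 2 (e.g. an MMDetection model)
]
scores_list = [[0.90], [0.75]]
labels_list = [[0], [0]]         # integer class ids

# Mirrors the wbf_config above: per-model weights and IoU threshold
boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1.0, 1.2], iou_thr=0.6,
)
print(boxes, scores, labels)  # one fused box with a consensus score
```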

### **Intelligent Object Filtering**
Focus datasets on specific object categories:
- **Common presets**: Autonomous driving, indoor scenes, animals
- **Interactive selection**: Visual picker from 80 COCO categories
- **Manual specification**: Comma-separated object lists
- **Validation**: Automatic checking against the COCO standard (a toy check is sketched below)
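
A toy version of that validation check (hypothetical helper, not GRAID's internal API; the real category list is available via `--list-objects`):

```python
# Subset of the 80 COCO category names, for illustration only
COCO_CLASSES = {
    "person", "car", "truck", "bus", "motorcycle", "bicycle",
    "traffic light", "stop sign", "dog", "cat",
}

def validate_allowable_set(names):
    """Reject any name that is not a COCO category."""
    unknown = [n for n in names if n not in COCO_CLASSES]
    if unknown:
        raise ValueError(f"Not valid COCO categories: {unknown}")
    return names

validate_allowable_set(["person", "car", "traffic light"])
```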

### **Production-Ready Outputs**
Generated datasets include (see the loading sketch below):
- **PIL Images**: Direct compatibility with vision-language models
- **Rich Annotations**: Bounding boxes, confidence scores, object classes
- **Structured QA Pairs**: Question templates with precise answers
- **Comprehensive Metadata**: Model info, generation parameters, statistics
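
Once generated, a dataset can be loaded back with the standard `datasets` API. The path matches the `save_path` from the configuration examples, and the field names in the comments are assumptions; inspect `ds.features` for the actual schema.

```python
from datasets import load_from_disk

# Path corresponds to "save_path" in the configuration above
ds = load_from_disk("./datasets/custom_bdd_vqa")
print(ds)  # splits (if any), features, and row counts

row = ds[0]        # first record, assuming a single-split Dataset
print(row.keys())  # e.g. image / question / answer (assumed names)
```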

## 📊 Supported Models & Datasets

### Backends

|                        | Detectron2 | MMDetection | Ultralytics |
|------------------------|------------|-------------|-------------|
| Object Detection       | ✅          | ✅           | ✅           |
| Instance Segmentation  | ✅          | ✅           | ✅           |
| WBF Ensemble           | ✅          | ✅           | ✅           |

### Built-in Datasets

|                        | BDD100K | NuImages | Waymo |
|------------------------|---------|----------|-------|
| Object Detection       | ✅       | ✅        | ✅     |
| Instance Segmentation  | ✅       | ✅        | ✅     |
| HuggingFace Export     | ✅       | ✅        | ✅     |

### Example Models

**Detectron2:** `faster_rcnn_R_50_FPN_3x`, `retinanet_R_101_FPN_3x`
**MMDetection:** `co_detr`, `dino`, `rtmdet`
**Ultralytics:** `yolov8x`, `yolov10x`, `yolo11x`, `rtdetr-x`

## 🎯 Research Applications

This framework enables systematic evaluation of:
- **Vision-Language Models**: Generate targeted VQA benchmarks
- **Object Detection Methods**: Compare model performance on specific object types
- **Reasoning Capabilities**: Create challenging spatial and counting questions
- **Domain Adaptation**: Generate domain-specific evaluation sets
- **Ensemble Methods**: Evaluate fusion strategies across detection models

## 📈 Quality Assurance

Generated datasets undergo comprehensive validation; a toy example of the filtering and duplicate-removal step follows this list:
- **Model Verification**: Automatic testing of model loading and inference
- **Annotation Quality**: Confidence score filtering and duplicate removal
- **Metadata Integrity**: Complete provenance tracking for reproducibility
- **Format Compliance**: COCO-standard annotations with HuggingFace compatibility
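
For intuition, here is an illustrative version of the confidence-filtering and duplicate-removal step (not GRAID's internal implementation):

```python
def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def clean_detections(dets, conf_thr=0.5, dup_iou=0.9):
    """Drop low-confidence boxes, then near-duplicates of the same class."""
    dets = sorted((d for d in dets if d["score"] >= conf_thr),
                  key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        if all(d["label"] != k["label"] or iou(d["box"], k["box"]) < dup_iou
               for k in kept):
            kept.append(d)
    return kept
```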

## 🔍 Legacy Support

**Interactive CLI**: User-friendly prompts for dataset and model selection
```bash
uv run graid generate
```

Generated databases are saved in `data/databases_ablations/{dataset}_{split}_{conf}_{backend}_{model}.sqlite`.

**Available Commands:**
```bash
uv run graid --help # Show help
uv run graid list-models # List available models
uv run graid list-questions # List available question types with parameters
uv run graid info # Show project information
uv run graid generate-dataset # Modern HuggingFace generation

# Interactive features
uv run graid generate-dataset --interactive-questions # Select questions interactively
uv run graid generate-dataset --list-questions # Show available questions
```

## ✨ Key Advantages

- **🚀 Modern Format**: HuggingFace datasets for seamless ML integration
- **🎯 Targeted Generation**: Focus on relevant object categories
- **🔬 Ensemble Support**: Multi-model fusion for enhanced accuracy
- **⚙️ Reproducible**: Configuration-driven experiments
- **🌐 Shareable**: Direct HuggingFace Hub integration
- **📊 Comprehensive**: Rich metadata and quality metrics
- **🔧 Extensible**: Support for custom datasets and models

**✅ Ready for production VQA research and applications!**