Support private datasets via configs/ directory #107

@scepter914

Summary

This issue proposes enabling AWML to consume private datasets without modifying the upstream repo.
Concretely, we introduce a top-level configs/ directory and make the config loader discover additional config roots so that each company can maintain its own private dataset/model configs externally.
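
As a rough illustration of the "additional config roots" idea, the loader could resolve a relative config path against several roots. The sketch below is hypothetical: resolve_config and AWML_CONFIG_ROOTS do not exist in AWML or mmengine today.

```python
import os
from pathlib import Path

# Hypothetical helper: only illustrates discovering additional config roots.
DEFAULT_ROOTS = [Path("autoware_ml/configs"), Path("configs")]

def resolve_config(relative_path: str) -> Path:
    """Return the first existing file for relative_path among the known config roots."""
    roots = list(DEFAULT_ROOTS)
    # A company could register private roots without touching the upstream repo.
    roots += [Path(p) for p in os.environ.get("AWML_CONFIG_ROOTS", "").split(":") if p]
    for root in roots:
        candidate = root / relative_path
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{relative_path!r} not found under {roots}")
```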

Background and Goal

Today, dataset configs live under autoware_ml/configs/... and project configs under projects/.... I think this layout has three drawbacks for usability, OSS adoption, and extensibility.

  • Current layout

```
- AWML/
  - autoware_ml/
    - ...
    - configs/
      - t4dataset/
        - db_jpntaxi_v1.yaml
        - ...
      - detection3d/
        - t4dataset/
          - base.py
          - pretrain.py
          - ...
      - ...
  - projects/
    - CenterPoint/
      - configs/
        - t4dataset/
          - CenterPoint/
            - base_model.py
            - j6gen2_model.py
            - ...
```

    1. Usability

We believe that enabling each company to train with its own private dataset is important as an OSS strategy.
At present, dataset configs are managed only inside AWML, which limits extensibility.
By shifting to an architecture that various companies can use more easily, we aim to make AWML easier to adopt in products.

    2. Low extensibility of configs under projects/ that load the upstream base configs

When we train today, we use a config under projects/, for example:

```
python3 tools/detection3d/train.py projects/CenterPoint/configs/t4dataset/CenterPoint/base_model.py
```

This works because projects/CenterPoint/configs/t4dataset/CenterPoint/base_model.py loads autoware_ml/configs/detection3d/t4dataset/base.py.
Due to this loading scheme, whenever we create a new private dataset, we also have to recreate the config on the CenterPoint side, which is not friendly for OSS users.
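
The coupling is easiest to see as an mmengine-style _base_ chain. A simplified sketch follows; the relative path depth and the override comment are illustrative:

```python
# projects/CenterPoint/configs/t4dataset/CenterPoint/base_model.py (simplified sketch)
_base_ = [
    # Hard-coded reference to the upstream dataset config: adding a private
    # dataset therefore forces a new copy of this file on the CenterPoint side.
    "../../../../../autoware_ml/configs/detection3d/t4dataset/base.py",
]
# ... dataset-specific model overrides follow here ...
```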

    3. Not adhering to the separation-of-concerns principle

The DevOps engineers who add datasets, update configs, and retrain models have an entirely different domain of concern from the engineers who develop the core libraries, yet both currently have to touch autoware_ml/.
This is not ideal from a domain-driven design perspective and can make maintenance harder.

Proposal

I propose to introduce a top-level /configs directory and decouple model configs from dataset configs, as follows.

```
- AWML/
  - autoware_ml/
    - ...
  - projects/
    - CenterPoint/
      - configs/
        - centerpoint.py
        - centerpoint_short_range.py
        - centerpoint_convnextpc.py
  - configs/
    - examples/
      - datasets/
        - t4dataset/
          - example.yaml
        - detection3d/
          - t4dataset/
            - base.py
      - CenterPoint/
        - centerpoint.py
        - centerpoint_short_range.py
        - centerpoint_convnextpc.py
    - {repository_name}/
      - datasets/
        - t4dataset/
          - db_jpntaxi_v1.yaml
          - ...
        - detection3d/
          - base.py
          - ...
        - detection2d/
          - ...
      - CenterPoint/
        - CenterPoint/
          - base.py
          - j6gen2.py
        - CenterPoint-ShortRange/
          - base.py
```
  • Overall config design

In this proposal, I move the configs from /autoware_ml/configs into /configs/examples.
In addition to the AWML examples, I add /configs/{repository_name} as the place where each user, including TIER IV, maintains its own configs.
/configs/{repository_name} is ignored by git in the AWML repository.

  • .gitignore

With this design, AWML officially supports using private datasets.
To have git track only the AWML-maintained examples while ignoring private content, I set the .gitignore as below. The examples directory itself must be negated, because git cannot re-include files whose parent directory is excluded.

```
configs/*
!configs/examples/
!configs/README.md
```
  • projects/CenterPoint/configs/

Today, projects/CenterPoint/configs/* contains settings for T4dataset.
After this change, it contains only model configs, as below.

```python
model = dict(
    data_preprocessor=dict(
        type="Det3DDataPreprocessor",
        voxel=True,
        voxel_layer=dict(
            max_num_points=32,
            voxel_size=voxel_size,
            point_cloud_range=point_cloud_range,
            max_voxels=(64000, 64000),
            deterministic=True,
        ),
    ),
    pts_voxel_encoder=dict(
        type="PillarFeatureNet",
        in_channels=4,
        feat_channels=[32, 32],
        with_distance=False,
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=point_cloud_range,
        voxel_size=voxel_size,
        norm_cfg=dict(type="BN1d", eps=1e-3, momentum=0.01),
        legacy=False,
    ),
    pts_middle_encoder=dict(type="PointPillarsScatter", in_channels=32, output_shape=(grid_size[0], grid_size[1])),
    pts_backbone=dict(
        type="SECOND",
        in_channels=32,
        out_channels=[64, 128, 256],
        layer_nums=[3, 5, 5],
        layer_strides=[1, 2, 2],
        norm_cfg=dict(type="BN", eps=1e-3, momentum=0.01),
        conv_cfg=dict(type="Conv2d", bias=False),
    ),
    pts_neck=dict(
        type="SECONDFPN",
        in_channels=[64, 128, 256],
        out_channels=[128, 128, 128],
        upsample_strides=[1, 2, 4],
        norm_cfg=dict(type="BN", eps=0.001, momentum=0.01),
        upsample_cfg=dict(type="deconv", bias=False),
        use_conv_for_no_stride=True,
    ),
    pts_bbox_head=dict(
        type="CenterHead",
        in_channels=sum([128, 128, 128]),
        tasks=[
            dict(num_class=5, class_names=["car", "truck", "bus", "bicycle", "pedestrian"]),
        ],
        bbox_coder=dict(
            voxel_size=voxel_size,
            pc_range=point_cloud_range,
            # No filter by range
            post_center_range=[-200.0, -200.0, -10.0, 200.0, 200.0, 10.0],
            out_size_factor=out_size_factor,
        ),
        # sigmoid(-4.595) = 0.01 for initial small values
        separate_head=dict(type="CustomSeparateHead", init_bias=-4.595, final_kernel=1),
        loss_cls=dict(type="mmdet.AmpGaussianFocalLoss", reduction="none", loss_weight=1.0),
        loss_bbox=dict(type="mmdet.L1Loss", reduction="mean", loss_weight=0.25),
        norm_bbox=True,
    ),
    train_cfg=dict(
        pts=dict(
            grid_size=grid_size,
            voxel_size=voxel_size,
            point_cloud_range=point_cloud_range,
            out_size_factor=out_size_factor,
        ),
    ),
    test_cfg=dict(
        pts=dict(
            grid_size=grid_size,
            out_size_factor=out_size_factor,
            pc_range=point_cloud_range,
            voxel_size=voxel_size,
            # No filter by range
            post_center_limit_range=[-200.0, -200.0, -10.0, 200.0, 200.0, 10.0],
        ),
    ),
)
```
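
Note that every dataset-dependent quantity above (point_cloud_range, voxel_size, grid_size, out_size_factor) is left as a free variable to be supplied by the dataset-side config; this is exactly the decoupling the proposal aims at.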
  • configs/examples

The example configs, which each private configs/{repository_name}/ directory mirrors, live under configs/examples/.
As the training entry point, we run

```
python3 tools/detection3d/train.py configs/examples/CenterPoint/centerpoint.py
```

instead of

```
python3 tools/detection3d/train.py projects/CenterPoint/configs/t4dataset/base_model.py
```

  • configs/{repository_name}/datasets/detection3d/base.py

Each user, including TIER IV, sets up its own directory here for private dataset configs, as sketched below.
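
A hypothetical sketch follows; every key name and value is an assumption modeled on the current autoware_ml/configs/detection3d/t4dataset/base.py, not a fixed schema:

```python
# configs/{repository_name}/datasets/detection3d/base.py (hypothetical sketch)

# Classes match the CenterPoint head tasks shown earlier in this issue.
class_names = ["car", "truck", "bus", "bicycle", "pedestrian"]

# Point at the private dataset split definitions kept next to this file.
dataset_version_config_root = "configs/{repository_name}/datasets/t4dataset/"
dataset_version_list = ["db_jpntaxi_v1"]  # one yaml per private DB dataset

# Dataset-dependent geometry consumed by the model-only configs (illustrative values).
point_cloud_range = [-121.6, -121.6, -3.0, 121.6, 121.6, 5.0]
voxel_size = [0.32, 0.32, 8.0]
```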

  • configs/{repository_name}/CenterPoint/CenterPoint/base.py

Each user also sets up the training config for each model (such as CenterPoint), composing the private dataset config with the model-only config; see the sketch below.
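
A minimal sketch, assuming the proposed tree above; the relative paths and the override key are illustrative:

```python
# configs/{repository_name}/CenterPoint/CenterPoint/base.py (hypothetical sketch)
_base_ = [
    # Private dataset settings, maintained by the dataset/DevOps side.
    "../../datasets/detection3d/base.py",
    # Model-only config, maintained upstream under projects/.
    "../../../../projects/CenterPoint/configs/centerpoint.py",
]

# Site-specific overrides go here (illustrative).
train_dataloader = dict(batch_size=4)
```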

  • configs/examples/datasets/t4dataset/example.yaml

I think any of the following would work well:

  1. Make the T4dataset public.
  2. Keep the currently published config as-is and provide it as the example (still accessible only to those who have WebAuto permissions).
  3. Provide a script that converts the nuScenes dataset into the T4dataset format and use that as the example.
