Hi, thank you for your great work. Since the FSF model for AV2 hasn't been released, I'm trying to train one on AV2 myself. I noticed that the pretrained FSD checkpoints for nuScenes and AV2 differ greatly in size (900+ MB for nuScenes vs. 100+ MB for AV2). Looking into the configs, I found that the Sparse UNet backbone settings are very different: the nuScenes backbone's channel counts are generally twice as large as those for AV2. Is there any insight behind this design, or was it just tuned empirically? Thank you very much for any feedback.
Config for AV2:
backbone=dict(
    type='SimpleSparseUNet',
    in_channels=64,
    sparse_shape=[32, 2048, 2048],
    order=('conv', 'norm', 'act'),
    norm_cfg=dict(type='naiveSyncBN1d', eps=1e-3, momentum=0.01),
    base_channels=64,
    output_channels=128,
    encoder_channels=((64, ), (64, 64, 64), (64, 64, 64), (128, 128, 128)),
    encoder_paddings=((1, ), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)),
    decoder_channels=((128, 128, 64), (64, 64, 64), (64, 64, 64), (64, 64, 64)),
    decoder_paddings=((1, 0), (1, 0), (0, 0), (0, 1)),
),

Config for nuScenes:
backbone=dict(
    type='SimpleSparseUNet',
    in_channels=64,
    sparse_shape=sparse_shape,
    order=('conv', 'norm', 'act'),
    norm_cfg=dict(type='naiveSyncBN1d', eps=1e-3, momentum=0.01),
    base_channels=64,
    output_channels=128,
    encoder_channels=((128, ), (128, 128, 128), (128, 128, 128), (256, 256, 256), (512, 512, 512)),
    encoder_paddings=((1, ), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1), (1, 1, 1)),
    decoder_channels=((512, 512, 256), (256, 256, 128), (128, 128, 128), (128, 128, 128), (128, 128, 128)),
    decoder_paddings=((1, 1), (1, 0), (1, 0), (0, 0), (0, 1)),
),
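
As a rough sanity check on the size gap, the two `encoder_channels` settings above can be compared with a back-of-the-envelope weight count. This is only a sketch under stated assumptions: the helper `encoder_conv_weights` is hypothetical (not part of the repo), it assumes every encoder layer is a 3x3x3 convolution whose input is the previous layer's output (starting from `base_channels`), and it ignores the decoder, biases, norm layers, and the rest of the detector.

```python
def encoder_conv_weights(encoder_channels, base_channels=64, kernel_volume=27):
    """Estimate conv weights in the encoder only.

    Assumption: each layer is a 3x3x3 conv (27 kernel positions) that maps
    the previous layer's channels to its own; the first layer reads
    base_channels. Decoder, bias, and norm parameters are not counted.
    """
    total = 0
    in_ch = base_channels
    for stage in encoder_channels:
        for out_ch in stage:
            total += kernel_volume * in_ch * out_ch
            in_ch = out_ch
    return total


# encoder_channels copied from the two configs above
av2_encoder = ((64,), (64, 64, 64), (64, 64, 64), (128, 128, 128))
nus_encoder = ((128,), (128, 128, 128), (128, 128, 128),
               (256, 256, 256), (512, 512, 512))

av2_weights = encoder_conv_weights(av2_encoder)
nus_weights = encoder_conv_weights(nus_encoder)

print(f"AV2 encoder conv weights:      {av2_weights:,}")
print(f"nuScenes encoder conv weights: {nus_weights:,}")
print(f"ratio: {nus_weights / av2_weights:.1f}x")
```

Under these assumptions the estimate is about 1.9M weights for the AV2 encoder and about 25.0M for the nuScenes encoder. Doubling both input and output channels quadruples a layer's weights, and the additional 512-channel stage in the nuScenes config accounts for most of the total, so a checkpoint that is several times larger is expected from the config alone.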