
Results change during testing / evaluation #21

@yt2639

Hi authors @quantaji, thanks for providing the code. I used the code from https://huggingface.co/labelmaker/PTv3-ARKit-LabelMaker/tree/main/scannet200/insseg-pointgroup-v1m1-pt-v3m1-ppt-ft to finetune PTv3, but I found that the results change across runs during testing / evaluation.

Here is my config:

from PTv3.code.pointcept.models.point_prompt_training.point_prompt_training_v1m2_decoupled import PointPromptTraining
from PTv3.code.pointcept.models.point_transformer_v3.point_transformer_v3m1_base import PointTransformerV3

backbone = dict(
    in_channels=15,  # originally this is 6, but I need a different type of input, so the input dim is now 15
    order=('z', 'z-trans', 'hilbert', 'hilbert-trans'),
    stride=(2, 2, 2, 2),
    enc_depths=(3, 3, 3, 6, 3),
    enc_channels=(48, 96, 192, 384, 512),
    enc_num_head=(3, 6, 12, 24, 32),
    enc_patch_size=(1024, 1024, 1024, 1024, 1024),
    dec_depths=(3, 3, 3, 3),
    dec_channels=(64, 96, 192, 384),
    dec_num_head=(4, 6, 12, 24),
    dec_patch_size=(1024, 1024, 1024, 1024),
    mlp_ratio=4,
    qkv_bias=True,
    qk_scale=None,
    attn_drop=0.0,
    proj_drop=0.0,
    drop_path=0.3,
    shuffle_orders=True,
    pre_norm=True,
    enable_rpe=False,
    enable_flash=True,
    upcast_attention=False,
    upcast_softmax=False,
    cls_mode=False,
    pdnorm_bn=True,
    pdnorm_ln=True,
    pdnorm_decouple=True,
    pdnorm_adaptive=False,
    pdnorm_affine=True,
    pdnorm_conditions=('ScanNet', 'ScanNet200', 'ScanNet++',
                       'Structured3D', 'ALC'),
)

backbone = PointTransformerV3(**backbone)

model = dict(
    backbone=backbone,
    context_channels=256,
    conditions=('ScanNet', 'ScanNet200', 'ScanNet++', 'Structured3D', 'ALC'),
    num_classes=(20, 200, 100, 25, 185),
    backbone_mode=True,  # I only use this to extract features
)
        
self.segmentor = PointPromptTraining(**model)

I then obtain the results with:

ptv3_output = self.segmentor(data_dict) # [total_num_points, 64]

I changed PointPromptTraining to be a feature extractor only (which corresponds to setting backbone_mode=True). I made sure self.training is False for every module and requires_grad is False for every parameter, and I verified that data_dict is not modified between calls. Yet when I run the above code several times, it gives totally different results, which is definitely not just floating-point precision noise.
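Concretely, the checks I performed amount to something like the following (a minimal sketch; freeze_and_check is just an illustrative helper name, not a function from the repo):

```python
import torch


def freeze_and_check(model: torch.nn.Module) -> None:
    """Put every submodule in eval mode, freeze all parameters,
    and assert that both flags actually took effect."""
    model.eval()  # recursively sets self.training = False
    for p in model.parameters():
        p.requires_grad_(False)

    # Verify the flags on every module and parameter.
    assert all(not m.training for m in model.modules())
    assert all(not p.requires_grad for p in model.parameters())
```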

I am wondering what I am missing here. How can I obtain deterministic results during testing / evaluation?
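In case it matters, this is the kind of global seeding I would expect to need as a starting point (a sketch using standard PyTorch / NumPy calls, not code from the repo; I assume it may still be insufficient if the model itself draws random numbers at inference time, e.g. via shuffle_orders=True):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Pin the common RNG sources so repeated runs start from the
    same state."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices
    # Ask PyTorch for deterministic kernels where they exist;
    # warn_only avoids hard errors on ops without a deterministic path.
    torch.use_deterministic_algorithms(True, warn_only=True)
```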

Many thanks!
