
Results change during testing / evaluation #21

@yt2639

Hi authors @quantaji, thanks for providing the code. I used the code from https://huggingface.co/labelmaker/PTv3-ARKit-LabelMaker/tree/main/scannet200/insseg-pointgroup-v1m1-pt-v3m1-ppt-ft to finetune PTv3, but I found that the results change across runs during testing / evaluation.

Here is my config:

from PTv3.code.pointcept.models.point_prompt_training.point_prompt_training_v1m2_decoupled import PointPromptTraining
from PTv3.code.pointcept.models.point_transformer_v3.point_transformer_v3m1_base import PointTransformerV3

backbone = dict(
    in_channels=15,  # originally this is 6, but I need a different type of input, so the input dim is now 15
    order=('z', 'z-trans', 'hilbert', 'hilbert-trans'),
    stride=(2, 2, 2, 2),
    enc_depths=(3, 3, 3, 6, 3),
    enc_channels=(48, 96, 192, 384, 512),
    enc_num_head=(3, 6, 12, 24, 32),
    enc_patch_size=(1024, 1024, 1024, 1024, 1024),
    dec_depths=(3, 3, 3, 3),
    dec_channels=(64, 96, 192, 384),
    dec_num_head=(4, 6, 12, 24),
    dec_patch_size=(1024, 1024, 1024, 1024),
    mlp_ratio=4,
    qkv_bias=True,
    qk_scale=None,
    attn_drop=0.0,
    proj_drop=0.0,
    drop_path=0.3,
    shuffle_orders=True,
    pre_norm=True,
    enable_rpe=False,
    enable_flash=True,
    upcast_attention=False,
    upcast_softmax=False,
    cls_mode=False,
    pdnorm_bn=True,
    pdnorm_ln=True,
    pdnorm_decouple=True,
    pdnorm_adaptive=False,
    pdnorm_affine=True,
    pdnorm_conditions=('ScanNet', 'ScanNet200', 'ScanNet++',
                       'Structured3D', 'ALC'),
)

backbone = PointTransformerV3(**backbone)

model = dict(
    backbone=backbone,
    context_channels=256,
    conditions=('ScanNet', 'ScanNet200', 'ScanNet++', 'Structured3D', 'ALC'),
    num_classes=(20, 200, 100, 25, 185),
    backbone_mode=True,  # I only use this to extract features
)
        
self.segmentor = PointPromptTraining(**model)

I then obtain the results with:

ptv3_output = self.segmentor(data_dict) # [total_num_points, 64]

I changed PointPromptTraining to be a feature extractor only (which corresponds to setting backbone_mode=True). I made sure self.training is False for every module and requires_grad is False for every parameter, and I verified that data_dict is not modified between calls. Yet when I run the above code several times, it gives totally different results, which is definitely not just floating-point precision noise.
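Concretely, the checks I performed amount to something like the following (a minimal sketch; freeze_and_check is just an illustrative helper name, not a function from the repo):

```python
import torch


def freeze_and_check(model: torch.nn.Module) -> None:
    """Put every submodule in eval mode, freeze all parameters,
    and assert that both flags actually took effect."""
    model.eval()  # recursively sets self.training = False
    for p in model.parameters():
        p.requires_grad_(False)

    # Verify the flags on every module and parameter.
    assert all(not m.training for m in model.modules())
    assert all(not p.requires_grad for p in model.parameters())
```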

I am wondering what I am missing here. How can I obtain deterministic results during testing / evaluation?
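In case it matters, this is the kind of global seeding I would expect to need as a starting point (a sketch using standard PyTorch / NumPy calls, not code from the repo; I assume it may still be insufficient if the model itself draws random numbers at inference time, e.g. via shuffle_orders=True):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Pin the common RNG sources so repeated runs start from the
    same state."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices
    # Ask PyTorch for deterministic kernels where they exist;
    # warn_only avoids hard errors on ops without a deterministic path.
    torch.use_deterministic_algorithms(True, warn_only=True)
```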

Many thanks!
