
Adding colab notebook demo and an interface #3

Open
idhamari wants to merge 18 commits into cwmok:master from idhamari:master

Conversation

@idhamari

The updated code includes:

  • Adding a Colab notebook demo so new users can run the code without problems.
  • Adding an interface to run either training or testing with specific parameters.
  • The notebook downloads the code and the dataset from the forked repository and runs the training.
  • Fixing some minor bugs.

Future to do:

  • Improve the code.
  • Add a testing example.
  • Add more explanation to the code and the repository, as a tutorial.
  • Use TensorBoard to plot the training and testing losses.

@idhamari
Author

Dear @cwmok, thanks for sharing. I noticed when running the notebook that the training loss is a large negative value, e.g.:

    Training lvl1...
     step "869" -> training loss "-10781668352.0000" - sim_NCC "-10781668352.000000" - Jdet "136.1196746826" -smo "72.8813"one epoch pass
     step "1396" -> training loss "-118240641024.0000" - sim_NCC "-118240641024.000000" - Jdet "133.8394470215" -smo "72.2927"

Is this normal?

@cwmok
Owner

cwmok commented Apr 13, 2021

Hi @idhamari, this is not normal. I think the problem could be related to your dataset/preprocessing pipeline. Did you normalize the intensity of your data, i.e. to [0, 1]? Please also check out the related issue at #2.

@idhamari
Author

Hi @cwmok, thanks for your quick reply. I am using your code without modification and one of the original challenge public datasets. I think there is already a normalization process in the generator class.

@cwmok
Owner

cwmok commented Apr 13, 2021

Thanks for spotting it. The zero-mean, 1-std normalization does not work with the normalized cross-correlation function (the background intensity of the image should be 0). In my experiments, I preprocessed the training data to [0, 1] and set norm=False. I will update the code very soon. I apologize for the confusion.
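A minimal sketch of the [0, 1] min-max normalization described above (the helper name is illustrative, not part of the repository):

```python
import numpy as np

def minmax_normalize(img):
    """Rescale an image to [0, 1]. A background at the minimum intensity
    maps exactly to 0, which is what the NCC loss expects.
    (Illustrative helper; unlike zero-mean/unit-std normalization,
    this keeps the background at 0.)"""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min())
```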

@idhamari
Author

@cwmok thanks again for the clarification. I modified the code as you suggested and also added normalization code. However, I still get a negative loss value (it is smaller now). I also get a zero Jacobian determinant, e.g.:

      Training lvl1...
      step "113" -> training loss "-0.1964" - sim_NCC "-0.198268" - Jdet "0.0000000000" -smo "0.0019"

Maybe there is still something missing?

@cwmok
Owner

cwmok commented Apr 13, 2021

Hi @idhamari, I found that the "normalization code" is not executed because I set Dataset_epoch(names, norm=False) in the training scripts (Train_LapIRN_disp.py and Train_LapIRN_diff.py) and in the testing script as well. Therefore, you can either set norm=True in the training script and implement the [0, 1] normalization, or simply normalize the dataset before you run the code. Remember that your setting should be consistent between the training and testing scripts to avoid domain-shift issues.

@cwmok
Owner

cwmok commented Apr 13, 2021

Now the loss function looks good. In our paper, we use negative NCC, so the value of sim_NCC ranges from -1 to 0.
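For intuition, a global version of the negative NCC can be sketched as below (the paper uses a local/windowed NCC, so this is only an illustration of why the value lies in [-1, 0] for well-matched images):

```python
import numpy as np

def neg_ncc(a, b, eps=1e-8):
    """Negative (global) normalized cross-correlation between two images:
    -1 for a perfect match, about 0 for uncorrelated images.
    Illustrative only; the paper's loss uses a local (windowed) NCC."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return -float((a * b).sum() / denom)
```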

@idhamari
Author

idhamari commented Apr 13, 2021

@cwmok Thanks for your comment.

I found that the "normalization code" is not executed as I set Dataset_epoch(names, norm=False) in the training_scripts (Train_LapIRN_disp.py and Train_LapIRN_diff.py) and testing script as well.

Yes, it is clear now.

Therefore, you might either set norm=True in the training script and implement the [0, 1] normalization, or simply normalize the dataset first before you run this code. Remember your setting should be consistent in both the training script and testing script to avoid domain shift issue.

I am confused; doesn't this conflict with:

The zero-mean, 1-std normalization does not work with the normalized cross-correlation function (the background intensity of the image should be 0). In my experiments, I preprocessed the training data to [0, 1] and set norm=False.

more questions :)

Now the loss function looks good. In our paper, we use negative NCC, so the value of sim_NCC ranges from -1 to 0.

So is it normal that Jdet = 0?

I also noticed that the training loss is going up and down; is this also expected?

  Training lvl1...
  step "869" -> training loss "-0.2139" - sim_NCC "-0.214692" - Jdet "0.0000000000" -smo "0.0008"one epoch pass
  step "1739" -> training loss "-0.2048" - sim_NCC "-0.208969" - Jdet "0.0000000000" -smo "0.0041"one epoch pass
  step "2609" -> training loss "-0.0881" - sim_NCC "-0.197564" - Jdet "0.0094617307" -smo "0.1095"one epoch pass
  step "3000" -> training loss "-0.1529" - sim_NCC "-0.217951" - Jdet "0.0024806887" -smo "0.0650"one epoch pass

@cwmok
Owner

cwmok commented Apr 13, 2021

I have fixed the normalization method. Here is the sample training loss for your reference:

Training lvl1...
step "20" -> training loss "-0.2638" - sim_NCC "-0.265795" - Jdet "0.0000000000" -smo "0.0020"

@cwmok
Owner

cwmok commented Apr 13, 2021

I am confused; doesn't this conflict with:

Just remember: do not use zero-mean, 1-std normalization with the NCC function. Please use min-max normalization instead (i.e., to [0, 1]).

So is it normal that Jdet = 0?

Yes, it is normal at the lower level. At the higher levels, the resulting deformation field will be more complex and Jdet will not be 0 for the method parameterized with a displacement field. For the diffeomorphic version, Jdet will be close to zero across all levels.

I also noticed that the training loss is going up and down; is this also expected?

Yes, it is expected. A similar (less misaligned) input pair tends to yield a better loss value because it is easier to register. Conversely, image pairs with large appearance differences tend to yield higher loss values. To visualize the training loss, plotting the average loss value for each epoch will help.
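The per-epoch averaging suggested above can be sketched like this (an illustrative helper, not code from the repository):

```python
import numpy as np

def epoch_mean_loss(step_losses, steps_per_epoch):
    """Average per-step losses into one value per epoch, smoothing out
    the pair-to-pair variation (easy pairs register better than hard ones).
    Any trailing incomplete epoch is dropped."""
    n = (len(step_losses) // steps_per_epoch) * steps_per_epoch
    arr = np.asarray(step_losses[:n], dtype=np.float64)
    return arr.reshape(-1, steps_per_epoch).mean(axis=1)
```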

@idhamari
Author

@cwmok thanks for your patience and support.

Just remember: do not use zero-mean, 1-std normalization with the NCC function. Please use min-max normalization instead (i.e., to [0, 1]).

This is what I am doing. As I understand it, everything is now as expected. Forgive me for having many questions; since yours is a good paper, my goal is to write a tutorial about it, providing a Colab notebook with some working cases.

To visualize the training loss, plotting the average loss value for each epoch will help.

As I understand it, since the training loss is going up and down, we cannot judge the performance until we test the model, right? More questions :)

  1. Did you use augmentation? If yes, how many total images did you use in your training?
  2. Why is there no testing loss provided?

@cwmok
Owner

cwmok commented Apr 13, 2021

@cwmok thanks for your patience and support. This is what I am doing. As I understand it, everything is now as expected. Forgive me for having many questions; since yours is a good paper, my goal is to write a tutorial about it, providing a Colab notebook with some working cases.

No problem. And thanks a lot for your interest in our work. :)

As I understand it, since the training loss is going up and down, we cannot judge the performance until we test the model, right? More questions :)

Yes, like other deep learning applications, you need to implement your own validation code that uses validation data during training. In my paper, we use the Dice score of the segmentation maps to search for the best model during training (the code is not provided as it is too customized).

Did you use augmentation? if yes, how many total images did you use in your training?

In the paper, we didn't use any augmentation, which provides a fair comparison to other state-of-the-art methods. However, in the Learn2Reg challenge, we did use augmentation (i.e., affine augmentation and random horizontal flipping). Empirically, augmentation yields 3-5% gains in Dice score on a small dataset.
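The random-flipping part of that augmentation can be sketched as below (an illustrative helper, not challenge code; the affine part is omitted). The key point is that both volumes of a pair must be flipped together so the pair stays spatially consistent:

```python
import numpy as np

def random_flip_pair(moving, fixed, p=0.5, axis=0, rng=None):
    """With probability p, flip BOTH the moving and the fixed volume along
    the same axis, so their spatial correspondence is preserved.
    (Sketch of the flipping augmentation only; the challenge pipeline
    also used affine augmentation.)"""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < p:
        moving = np.flip(moving, axis=axis).copy()
        fixed = np.flip(fixed, axis=axis).copy()
    return moving, fixed
```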

@idhamari
Author

@cwmok

These are the current training results, I noticed the following:

  • the similarity measure is not stable.

  • the training loss becomes inf after some iterations in level 3, e.g.:

        step "829" -> training loss "1475.7574" - sim_NCC "-0.215631" - Jdet "28.3991413116" -smo "1475.9730"
        step "830" -> training loss "1476.0317" - sim_NCC "-0.288371" - Jdet "28.3438167572" -smo "1476.3201"
        step "831" -> training loss "1476.0468" - sim_NCC "-0.198550" - Jdet "28.3512592316" -smo "1476.2454"
        step "832" -> training loss "1475.9606" - sim_NCC "-0.195672" - Jdet "28.3945274353" -smo "1476.1562"
        step "833" -> training loss "-inf" - sim_NCC "-inf" - Jdet "28.4055004120" -smo "1476.1052"
        step "834" -> training loss "nan" - sim_NCC "0.000000" - Jdet "nan" -smo "nan"
        step "835" -> training loss "nan" - sim_NCC "-0.121906" - Jdet "nan" -smo "nan"
        step "836" -> training loss "nan" - sim_NCC "-0.145650" - Jdet "nan" -smo "nan"
    

Since you have already worked with the same dataset, could you please give some feedback on how to reproduce your results and explain why I am getting these results?

level 1 training: logLvl1.txt, logLvl1_lossAll, logLvl1_lossJdet, logLvl1_lossSim, logLvl1_lossSmo, logLvl1_lossTrn

level 2 training: logLvl2.txt, logLvl2_lossAll, logLvl2_lossJdet, logLvl2_lossSim, logLvl2_lossSmo, logLvl2_lossTrn

level 3 training: logLvl3.txt, logLvl3_lossAll, logLvl3_lossJdet, logLvl3_lossSim, logLvl3_lossSmo, logLvl3_lossTrn

@cwmok
Owner

cwmok commented Apr 22, 2021

Hi @idhamari,

These are the current training results, I noticed the following:

the similarity measure is not stable.

the training loss becomes inf after some iterations in level 3, e.g.:

step "829" -> training loss "1475.7574" - sim_NCC "-0.215631" - Jdet "28.3991413116" -smo "1475.9730"
step "830" -> training loss "1476.0317" - sim_NCC "-0.288371" - Jdet "28.3438167572" -smo "1476.3201"
step "831" -> training loss "1476.0468" - sim_NCC "-0.198550" - Jdet "28.3512592316" -smo "1476.2454"
step "832" -> training loss "1475.9606" - sim_NCC "-0.195672" - Jdet "28.3945274353" -smo "1476.1562"
step "833" -> training loss "-inf" - sim_NCC "-inf" - Jdet "28.4055004120" -smo "1476.1052"
step "834" -> training loss "nan" - sim_NCC "0.000000" - Jdet "nan" -smo "nan"
step "835" -> training loss "nan" - sim_NCC "-0.121906" - Jdet "nan" -smo "nan"
step "836" -> training loss "nan" - sim_NCC "-0.145650" - Jdet "nan" -smo "nan"

Since you have already worked with the same dataset, could you please give some feedback on how to reproduce your results and explain why I am getting these results?

Debugging process

Observation:

  1. -smo and -Jdet are extremely large -> the magnitude of the deformation field is extremely large, i.e., out of bounds.
  2. -sim_NCC is normal even though the magnitude of the deformation field is extremely large -> the reference image and the warped image have high similarity, even though the warped image is filled with background intensity.
  3. You are using the L2R_Task3_AbdominalCT dataset without any preprocessing -> the background of the image is not necessarily equal to 0 after normalization.

Conclusion:

  1. Since the intensity value of the image's background is not zero, it is much easier for the deep learning model to achieve high similarity by warping the image with background pixels.
  2. Please apply windowing with the lower and upper bounds set to -500 and 800 HU, as described in our workshop paper, making sure that the intensity value of the image's background is zero and that there is enough contrast between different anatomical structures.
  3. To achieve the results in the Learn2Reg challenge, you have to add the data augmentation code and anatomical label supervision during training, as described in our workshop paper.
  4. The hyperparameters we used for this dataset are: --start_channel=28, --smo=2, --antifold=0, --label_supervision=2, --iteration_lvl3=73000.
  5. You may also train the method in an unsupervised manner, but the registration performance will degrade compared to the semi-supervised version.

I hope the debugging process and the conclusion help.

Regards,
Tony

@idhamari
Author

@cwmok

Thanks a lot for your feedback, I really appreciate it.

This step is not clear: "apply the windowing with lower and upper bound set to -500 and 800"

Shall I normalize the image to [0, 1] and then map the values to the [-500, 800] range? That would conflict with "Making sure that the intensity value of the image's background is zero". I checked the paper in your link; it does not explain this step or provide a reference explaining it.

@cwmok
Owner

cwmok commented Apr 22, 2021

@idhamari

Shall I normalize the image to [0, 1] and then map the values to the [-500, 800] range? That would conflict with "Making sure that the intensity value of the image's background is zero". I checked the paper in your link; it does not explain this step or provide a reference explaining it.

No. Windowing is also known as grey-level mapping, contrast stretching, histogram modification, or contrast enhancement. It is a common step in CT preprocessing pipelines, used to bring out the target anatomical structure for subsequent analysis. For more detail, you may refer to this website.

To achieve this, one way is to apply the numpy clip function to the raw image, setting the min and max to your desired HU range, i.e., [-500, 800] in this case, so that all the anatomical structures are clearly visible. As the last step, normalize the preprocessed image to [0, 1] such that the background intensity of each image is zero.
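Putting the two steps together, a minimal sketch (the function name is illustrative):

```python
import numpy as np

def window_and_normalize(ct_hu, lower=-500.0, upper=800.0):
    """Clip raw CT intensities (in HU) to [lower, upper], then rescale to
    [0, 1]. Air background (around -1000 HU) is clipped to `lower`, so it
    maps exactly to 0 after normalization, as the NCC loss requires."""
    clipped = np.clip(ct_hu.astype(np.float32), lower, upper)
    return (clipped - lower) / (upper - lower)
```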

@TVayne

TVayne commented Oct 17, 2021

@cwmok
I am sorry; I read the dialogue above, but why does this error occur when I simply train with the OASIS dataset?
Screenshot from 2021-10-17 13-34-09

Screenshot from 2021-10-17 13-37-49

Should I change anything?

@cwmok
Owner

cwmok commented Oct 17, 2021

@tzp123456
I guess you forgot to change the imgshape as mentioned in the README.MD file. In our code, we use the cropped OASIS dataset by default.

@TVayne

TVayne commented Oct 19, 2021

Hi @cwmok
Thanks a lot for your feedback. I solved the problem successfully, but I still have some questions.
I cropped the images to the same size as yours with numpy, but my CUDA device (11 GB) still ran out of memory, so how much memory does your GPU have?
And in your experiments, the moving images and the fixed images are selected in order during training, right? So would it make a difference if I picked them randomly from the dataset?
Looking forward to your reply.

@cwmok
Owner

cwmok commented Oct 19, 2021

@tzp123456

I cropped the images to the same size as yours with numpy, but my CUDA device (11 GB) still ran out of memory, so how much memory does your GPU have?

The code was tested with a GTX 1080 Ti GPU and an RTX Titan GPU, which have 11 GB and 24 GB of GPU memory, respectively. The default setting consumes around 10.8 GB of GPU memory. Therefore, if your GPU is also used for display/OS/Chrome, you will face this error. In that case, lowering the number of feature maps will help, e.g., setting "--start_channel" to 6 instead of 7; it wouldn't make a big difference in registration quality on the OASIS dataset.

And in your experiments, the moving images and the fixed images are selected in order during training, right? So would it make a difference if I picked them randomly from the dataset?

No, the moving images and fixed images are randomly selected during training; see training_generator = Data.DataLoader(Dataset_epoch(names, norm=False), batch_size=1, shuffle=True, num_workers=4).
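A minimal sketch of that random pairing (assuming the dataset enumerates ordered scan pairs; the file names below are hypothetical):

```python
import itertools
import random

# Enumerate all ordered (moving, fixed) pairs of the scans, then shuffle the
# order in which they are visited -- which is what DataLoader(shuffle=True)
# effectively does over the dataset's pair index each epoch.
names = ["scan_a.nii.gz", "scan_b.nii.gz", "scan_c.nii.gz"]  # hypothetical
pairs = list(itertools.permutations(names, 2))  # 6 ordered pairs for 3 scans
random.shuffle(pairs)  # random visiting order, one epoch's worth of pairs
```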

Please open a new issue on my Github repository if you have further questions. Or if you have questions regarding @idhamari's tutorial, please contact him directly.
