I am trying to reproduce the results from your paper and have a few questions about the code and methodology:
- Zero replacement: In the "Compositional Feature Dropout" section of the paper, you mention that you add a small positive pseudo-count (the inverse of the library size) to each component before renormalising. Where in the repository is this implemented, and for a given dataset is the same pseudo-count used in the other two augmentation strategies?
- From the available code, it looks like all models are trained on the data transformed into proportions, without any further data transformation (e.g. CLR, ILR, etc.). Is that correct?
- To my understanding, in your implementation of compositional feature dropout you set randomly selected entries of the training examples to one, rather than to zero as described in the paper (see the definition of augment_X in train_and_evaluate.py). Why is that?
- Using the paper’s nomenclature, for task 7 (colorectal cancer data), do you apply any preprocessing beyond excluding features with zero standard deviation? Specifically, in task 7 do you use all 980 taxa as-is? Also, since task 7 includes paired samples from the same patients (two samples per patient), how do you account for within-subject correlation?
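To make my reading of the first three points concrete, here is a minimal sketch of what I currently understand the pipeline to do. All function names here are my own (hypothetical, not from the repository), and the `fill` value in the dropout step reflects the discrepancy I am asking about (the code appears to use 1, the paper says 0):

```python
import numpy as np

def zero_replace(counts):
    """My reading of the paper's zero replacement: add a pseudo-count of
    1/library_size to each component, then renormalise to proportions.
    Hypothetical helper, not from the repository."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)      # per-sample library size
    shifted = counts + 1.0 / lib_size                 # small positive pseudo-count
    return shifted / shifted.sum(axis=1, keepdims=True)

def feature_dropout(X, rate, rng, fill=1.0):
    """Compositional feature dropout as I read augment_X: randomly selected
    entries are set to `fill` (1.0 in the code, 0.0 per the paper),
    then rows are renormalised. Hypothetical helper."""
    X = np.array(X, dtype=float)
    mask = rng.random(X.shape) < rate                 # entries to perturb
    X[mask] = fill
    return X / X.sum(axis=1, keepdims=True)

def clr(P):
    """Centred log-ratio transform, which the code does NOT seem to apply;
    included only to clarify what I mean by 'further data transformation'."""
    logP = np.log(P)
    return logP - logP.mean(axis=1, keepdims=True)
```

If this matches your intent, then the models see `feature_dropout(zero_replace(counts), ...)` directly as proportions, with no `clr` (or ILR) step in between. Please correct me where this sketch diverges from the actual implementation.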
Thanks in advance for your help!