Label ascending #4
Hi, I'm trying to compare my model with your work using the same data split.
I am in the process of replicating the label-ascending evaluation.
However, the description in the paper is not quite enough for me to understand it fully.
Could you clarify it?
This paragraph is from the Methods section of your article:
"
The second approach is the label-ascending dataset split method for the experiment to figure out the robustness of our model. This approach offers consistent partitioning protocols across different datasets, reducing the need for extensive training and evaluation times while simultaneously assessing the model’s capacity to handle diverse data intricacies. With this method, data are initially organized based on the antigen–antibody taxonomy. Subsequently, mutant labels for each complex are arranged in ascending order. By adhering to predefined ratios, we select the subset containing the highest labels within the mutant group for each complex as our test set. The remaining subset, comprising a smaller proportion of data, is designated as the training set. After this division, our model and baselines are trained and evaluated using these distinct datasets. This strategy ensures a comprehensive evaluation of our model’s performance across different data distributions and complexities.
"
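To make sure we are talking about the same procedure, here is a minimal sketch of how I currently read it. It is my own illustration, not your code. The (complex_id, mutant_id, label) row layout, the 20% default, rounding the per-complex test count up, and the name `label_ascending_split` are all my assumptions.

```python
from collections import defaultdict

# Hypothetical test share in percent; the paper only says "predefined ratios".
TEST_PERCENT = 20


def label_ascending_split(rows, test_percent=TEST_PERCENT):
    """Assign the highest-label mutants of each complex to the test set.

    rows: iterable of (complex_id, mutant_id, label) tuples (assumed layout).
    Returns (train, test) as lists of the same tuples.
    """
    # Group mutants by their antigen-antibody complex.
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    train, test = [], []
    for members in groups.values():
        # Arrange the mutant labels of this complex in ascending order.
        members.sort(key=lambda r: r[2])
        # Test count rounded up with integer math (rounding rule is an assumption).
        n_test = (len(members) * test_percent + 99) // 100
        cut = len(members) - n_test
        train.extend(members[:cut])  # lower labels -> training set
        test.extend(members[cut:])   # highest labels -> test set
    return train, test


rows = [
    ("1ABC", "m1", 0.5), ("1ABC", "m2", 2.0), ("1ABC", "m3", 1.0),
    ("1ABC", "m4", -0.3), ("1ABC", "m5", 1.5),
    ("2XYZ", "m1", 0.1), ("2XYZ", "m2", 0.9),
]
train, test = label_ascending_split(rows)
print([r[:2] for r in test])  # the highest-label mutant of each complex
```

Under this reading, every label in a complex's training part is lower than or equal to every label in its test part, so the test set is an extrapolation to higher labels.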
Here is what I understood, along with the questions it raises.
[By adhering to predefined ratios, we select the subset containing the highest labels within the mutant group for each complex as our test set.]
- Given the *order.csv file, I take some portion of the highest labels in each PDB group as the test set.
Is that portion 20%? I think I saw a 20% ratio somewhere on GitHub,
but the function data.load.split does not seem to be present in this repository.
[The remaining subset, comprising a smaller proportion of data, is designated as the training set.]
- The remaining data becomes the training set. But why is the training set the smaller proportion compared to the test set?
Thank you in advance for your reply.
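For reference, here is the per-complex arithmetic behind my question. It is a sketch under my own assumptions (the test count is rounded up, and `split_sizes` is a hypothetical helper). With a 20% test share the training set is the larger part, so it would only be the smaller part if the test share were above 50%.

```python
def split_sizes(n_mutants, test_percent):
    """Return (n_train, n_test) for one complex, rounding the test count up (assumption)."""
    n_test = (n_mutants * test_percent + 99) // 100
    return n_mutants - n_test, n_test


for pct in (20, 50, 80):
    n_train, n_test = split_sizes(100, pct)
    print(f"test={pct}%: train={n_train}, test={n_test}, train smaller: {n_train < n_test}")
# test=20%: train=80, test=20, train smaller: False
# test=50%: train=50, test=50, train smaller: False
# test=80%: train=20, test=80, train smaller: True
```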