This repository was archived by the owner on Jul 17, 2025. It is now read-only.

Custom data + negative samples #46

@ercz16

Hello Professor Zhang,
I'm experimenting with a custom dataset built on the InMemoryDataset class. In my link prediction task, an edge is labeled true if it goes from a chemical to a disease with a negative correlation (meaning the chemical treats the disease); all other edges are labeled false. My sample graph has about 30k nodes and around 1M edges. So the dataset contains a list of edges, a list of corresponding labels, a list of node features, and a list of edge features.
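To make the labeling scheme concrete, here is a minimal sketch in plain Python that separates the target edges (label 1) from all other edges (label 0). The helper `split_by_label` and the toy node IDs are hypothetical and not part of the repository. Note that the label-0 edges are still real edges of the graph, not sampled non-edges.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def split_by_label(edges: List[Edge], labels: List[int]) -> Tuple[List[Edge], List[Edge]]:
    """Separate target edges (label 1) from all other edges (label 0)."""
    if len(edges) != len(labels):
        raise ValueError("edges and labels must have the same length")
    positives = [e for e, y in zip(edges, labels) if y == 1]
    others = [e for e, y in zip(edges, labels) if y == 0]
    return positives, others


# Toy data: (chemical, disease) pairs; label 1 = "chemical treats disease".
edges = [(0, 10), (1, 11), (2, 3), (0, 12)]
labels = [1, 0, 0, 1]
pos, other = split_by_label(edges, labels)
print(pos)    # [(0, 10), (0, 12)]
print(other)  # [(1, 11), (2, 3)]
```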

My question is about how negative sampling works in your code. It appears that negative sampling produces instances for the model to recognize as false. But I'm not sure my dataset fits this code, because it already has false labels. The code seems to expect all input edges to be true and then generates negative samples to supplement them, whereas my data already contains both true and false edges from my data preparation. Do I need to restructure my input dataset? For example, I could include only the true/positive edges in the edge input list ([x, y], [a, b], ...) and let the split_edge dict separate positive and negative edges.
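If only positive edges go into the split, the restructuring could look roughly like the sketch below (plain Python; `split_positive_edges` is a hypothetical helper). It assumes an OGB-style nested layout with `"train"`, `"valid"` and `"test"` keys and an `"edge"` list under each; the exact key names and the tensor format the code expects should be checked against the repository before relying on this.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def split_positive_edges(
    pos_edges: List[Edge],
    val_ratio: float = 0.05,
    test_ratio: float = 0.10,
    seed: int = 0,
) -> Dict[str, Dict[str, List[Edge]]]:
    """Randomly split positive (target) edges into train/valid/test.

    The nested {"train": {"edge": [...]}, ...} layout is an assumed
    OGB-style split_edge format, not a confirmed repository API.
    """
    edges = list(pos_edges)
    random.Random(seed).shuffle(edges)  # reproducible shuffle
    n_val = int(len(edges) * val_ratio)
    n_test = int(len(edges) * test_ratio)
    return {
        "valid": {"edge": edges[:n_val]},
        "test": {"edge": edges[n_val:n_val + n_test]},
        "train": {"edge": edges[n_val + n_test:]},
    }


pos = [(c, 100 + c) for c in range(20)]  # 20 toy chemical->disease edges
split_edge = split_positive_edges(pos)
print({k: len(v["edge"]) for k, v in split_edge.items()})
```

Converting each list to a tensor afterwards (for example with `torch.tensor(edges, dtype=torch.long)`) is left out so the sketch runs without PyTorch.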

Or is it possible to make the model use the negative edges I have already labeled, perhaps by sorting the negative and positive edges into the split_edge dict manually? I'm also trying to keep a positive-to-negative ratio of 1:10.
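To reuse already-labeled negatives instead of random ones, a sketch along these lines fills an `"edge_neg"` list for each split while keeping the 1:10 ratio. The helper `attach_negatives` and the `"edge_neg"` key are assumptions modeled on the OGB-style layout; whether the training loop reads a precomputed training `"edge_neg"` list or resamples training negatives on its own is something to verify in the code.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def attach_negatives(
    split_edge: Dict[str, Dict[str, List[Edge]]],
    labeled_negatives: List[Edge],
    ratio: int = 10,
    seed: int = 0,
) -> bool:
    """Give each split up to `ratio` negatives per positive edge.

    Negatives are drawn without replacement, so no negative edge is
    shared between splits. Returns True if every split got its full share.
    """
    pool = list(labeled_negatives)
    random.Random(seed).shuffle(pool)
    start = 0
    for name in ("train", "valid", "test"):
        n_needed = ratio * len(split_edge[name]["edge"])
        # Slicing past the end of the pool simply yields fewer negatives.
        split_edge[name]["edge_neg"] = pool[start:start + n_needed]
        start += n_needed
    return start <= len(pool)


split_edge = {
    "train": {"edge": [(0, 100), (1, 101)]},
    "valid": {"edge": [(2, 102)]},
    "test": {"edge": [(3, 103)]},
}
negatives = [(c, 200 + c) for c in range(50)]  # toy pre-labeled negatives
enough = attach_negatives(split_edge, negatives)
print(enough, {k: len(v["edge_neg"]) for k, v in split_edge.items()})
# True {'train': 20, 'valid': 10, 'test': 10}
```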

My last question: does this implementation keep node embeddings limited to the training data, so that they are not reused on the test data, in order to prevent data leakage? My project leader wants to confirm this.
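For the leakage question, one property worth confirming is that validation and test positive edges never appear in the graph the model sees during training, since message passing or node embeddings learned on that graph could otherwise encode them. The minimal check below assumes an undirected edge list and the same hypothetical `split_edge` layout; it only illustrates the property and does not describe how the repository actually builds its training graph.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]
SplitEdge = Dict[str, Dict[str, List[Edge]]]


def training_graph_edges(all_edges: List[Edge], split_edge: SplitEdge) -> List[Edge]:
    """Drop held-out (valid/test) positive edges from the training graph."""
    held_out: Set[Edge] = set()
    for name in ("valid", "test"):
        for u, v in split_edge[name]["edge"]:
            # Mark both directions so an undirected copy is removed too.
            held_out.update({(u, v), (v, u)})
    return [e for e in all_edges if e not in held_out]


def assert_no_leakage(train_edges: List[Edge], split_edge: SplitEdge) -> None:
    """Raise if any held-out positive edge is present in either direction."""
    present = set(train_edges)
    for name in ("valid", "test"):
        for u, v in split_edge[name]["edge"]:
            if (u, v) in present or (v, u) in present:
                raise AssertionError(f"{name} edge {(u, v)} is in the training graph")


all_edges = [(0, 100), (1, 101), (2, 102), (5, 6), (102, 2)]
split_edge = {
    "train": {"edge": [(0, 100), (1, 101)]},
    "valid": {"edge": [(2, 102)]},
    "test": {"edge": []},
}
train_edges = training_graph_edges(all_edges, split_edge)
assert_no_leakage(train_edges, split_edge)
print(train_edges)  # [(0, 100), (1, 101), (5, 6)]
```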

Any help is appreciated!
