- data/
  - raw/
    - README.md
    - HIV_train_oversampled.csv
    - test.csv
- GNN/
  - data_preprocessing.py
  - hyper_parameters.py
  - main.py
  - model.py
  - test.py
  - train.py
  - utils.py
- CNN/
  - featurizer.py
  - graph_sage_embedding.py
  - model.py
  - train.py
- SVM/
  - train.py
- data/
  - raw/: Contains raw molecular data stored in CSV format.
    - HIV_train_oversampled.csv: Dataset with oversampled instances to address class imbalance.
    - test.csv: Dataset for testing and validation purposes.
- GNN/
  - data_preprocessing.py: Implements a graph converter that transforms molecular strings into graph-structured data using PyTorch Geometric, storing the processed data in data/processed.
  - hyper_parameters.py: Stores hyperparameters for the GNN model for easy tuning.
  - main.py: Main script for training and evaluating the GNN model.
  - model.py: Defines the architecture and implementation of the GNN model.
  - test.py: Provides testing functionality for evaluating the model performance.
  - train.py: Implements training functionality for the GNN model.
  - utils.py: Contains utility functions for evaluating the model's performance, such as calculating confusion metrics and determining the total number of model parameters.
- CNN/
  - featurizer.py: Provides functions for converting graph objects into embeddings using GraphSAGE.
  - graph_sage_embedding.py: Defines the architecture of GraphSAGE.
  - model.py: Defines the architecture of a 1-dimensional CNN.
  - train.py: Contains the main logic for training the CNN model.
- SVM/
  - train.py: Contains the main logic for fitting the SVM model.
- requirements.txt: Specifies the packages and their versions used in this project.
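The training file above is described as oversampled to address class imbalance, but the source does not say which method was used. As an illustration only, one common approach is random duplication of minority-class rows. The sketch below assumes rows are dictionaries with a hypothetical `label` key; the function name `random_oversample` and the toy rows are likewise made up for this example.

```python
import random
from collections import Counter


def random_oversample(rows, label_key="label", seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Toy data: four inactive molecules (0) and one active molecule (1).
rows = [
    {"smiles": "CCO", "label": 0},
    {"smiles": "CCN", "label": 0},
    {"smiles": "CCC", "label": 0},
    {"smiles": "CCCl", "label": 0},
    {"smiles": "c1ccccc1", "label": 1},
]
balanced = random_oversample(rows)
counts = Counter(row["label"] for row in balanced)
print(counts)
```

Oversampling of this kind is normally applied to the training split only, which matches the separate, non-oversampled test.csv listed above.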
- Optional: Setting up a virtual environment:
    python -m venv .venv
    # Activate the virtual environment (Windows)
    .venv\Scripts\activate
    # On Linux/macOS, activate with: source .venv/bin/activate
- Installing required packages:
    pip install -r requirements.txt
- Navigate to the GNN directory:
    cd GNN
- Run the main script for training the GNN model:
    python main.py
  Note: On the first run, converting the molecular strings into graphs may take some time.
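The slow first run is consistent with a process-once, reuse-later pattern: PyTorch Geometric datasets skip processing when their processed files already exist. The sketch below illustrates that idea with the standard library only. It is not the repository's code; `load_or_process`, `fake_convert`, and the pickle cache file are hypothetical stand-ins for the real graph conversion.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_process(raw_items, cache_file, convert):
    """Return converted items, calling `convert` only when no cache exists."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Later runs: read the stored result instead of converting again.
        with cache_file.open("rb") as fh:
            return pickle.load(fh)

    # First run: do the expensive conversion and store the result.
    processed = [convert(item) for item in raw_items]
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(processed, fh)
    return processed


calls = []


def fake_convert(smiles):
    """Stand-in for a molecular-string-to-graph conversion."""
    calls.append(smiles)
    return {"smiles": smiles, "num_chars": len(smiles)}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "processed" / "graphs.pkl"
    first = load_or_process(["CCO", "c1ccccc1"], cache, fake_convert)
    second = load_or_process(["CCO", "c1ccccc1"], cache, fake_convert)

print(len(calls))
```

In this sketch the second call reads from the cache, so the conversion function runs only during the first call.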
- From the project root, navigate to the CNN directory:
    cd CNN
- Run the training script for the CNN model:
    python train.py
  Note: Encoding the graphs with GraphSAGE before training the CNN model may take some time.
- From the project root, navigate to the SVM directory:
    cd SVM
- Run the fitting script for the SVM model:
    python train.py
  Note: Encoding the graphs with GraphSAGE before fitting the SVM model may take some time.
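Whichever model is run, its predictions can be summarized with confusion-based metrics, which is what GNN/utils.py is described as computing. The following is a minimal sketch for binary labels, not the repository's implementation; the function names `confusion_counts` and `summarize` and the toy labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = active, 0 = inactive.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = summarize(y_true, y_pred)
print(metrics)
```

On an imbalanced test set, precision and recall for the positive class are usually more informative than accuracy alone.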