This project focuses on predicting the g-factor of chiral nanoparticles using various machine learning models and feature encoding strategies. The repository contains the complete workflow from data preprocessing and feature encoding to model training, validation, and visualization.
Three different encoding approaches are implemented and compared:
1. Chemical Encoding (chemical_encoding/): Domain-specific feature representation for chiral nanoparticles using chemical descriptors and properties
2. One-Hot Encoding (onehot_encoding/): Categorical variable encoding for machine learning compatibility, creating binary columns for each category
3. Ordinal Encoding (ordinal_encoding/): Ordered categorical encoding preserving inherent relationships between values
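To make the difference between the one-hot and ordinal approaches concrete, here is a minimal sketch using scikit-learn. The `size_class` column and its values are hypothetical and are not taken from `raw_data.csv`; the chemical encoding is not shown because it depends on the specific descriptors chosen for the dataset.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy column for illustration only (not from raw_data.csv).
df = pd.DataFrame({"size_class": ["small", "large", "medium", "small"]})

# One-hot encoding: one binary column per category
# (categories are sorted alphabetically: large, medium, small).
onehot_encoder = OneHotEncoder()
onehot = onehot_encoder.fit_transform(df[["size_class"]]).toarray()

# Ordinal encoding: a single integer-valued column that keeps the
# order given explicitly here (small < medium < large).
ordinal_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordinal = ordinal_encoder.fit_transform(df[["size_class"]])

print(onehot.shape)     # three binary columns for three categories
print(ordinal.ravel())  # one column of ordered codes
```

One-hot encoding makes no assumption about ordering but adds a column per category, while ordinal encoding stays compact and only makes sense when the categories have a meaningful order.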
Each encoding folder contains:
- A processed_data.csv file with the encoded dataset
- A complete analysis workflow in Jupyter notebooks (0-9)
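As a rough sketch of how the three encoded datasets could be loaded side by side for comparison, assuming the script runs from the repository root: the helper name `load_processed` is hypothetical and is not part of the repository.

```python
from pathlib import Path

import pandas as pd

# Folder names as listed in this README.
ENCODING_DIRS = ["chemical_encoding", "onehot_encoding", "ordinal_encoding"]


def load_processed(repo_root="."):
    """Return {folder name: DataFrame} for every encoding folder
    that contains a processed_data.csv file."""
    frames = {}
    for name in ENCODING_DIRS:
        csv_path = Path(repo_root) / name / "processed_data.csv"
        if csv_path.exists():
            frames[name] = pd.read_csv(csv_path)
    return frames


if __name__ == "__main__":
    # Print the shape of each encoded dataset that was found.
    for name, frame in load_processed().items():
        print(name, frame.shape)
```

Comparing the shapes is a quick sanity check: the one-hot dataset would be expected to have more columns than the ordinal one for the same rows.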
- Create virtual environment:
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
- Install dependencies:
pip install numpy pandas scikit-learn matplotlib seaborn joblib tqdm

├── raw_data.csv # Original dataset (raw data)
├── chemical_encoding/ # Chemical feature encoding results and analysis
│   ├── processed_data.csv # Processed data after chemical encoding
│   ├── 0_Dataset_Description.ipynb
│   ├── 1_Scaling_and_Transforming.ipynb
│   ├── 2_size_aug_model.ipynb
│   ├── 3_augmentation.ipynb
│   ├── 4_g_aug_model.ipynb
│   ├── 5_Corelation.ipynb
│   ├── 6_Single_Output.ipynb
│   ├── 7_k-fold_cross_validation.ipynb
│   ├── 8_PCA.ipynb
│   └── 9_Plot.ipynb
├── onehot_encoding/ # One-hot encoding results and analysis
│   ├── processed_data.csv # Processed data after one-hot encoding
│   ├── 0_Dataset_Description.ipynb
│   ├── 1_Scaling_and_Transforming.ipynb
│   ├── ... (same notebook structure as above)
│   └── 9_Plot.ipynb
├── ordinal_encoding/ # Ordinal encoding results and analysis
│   ├── processed_data.csv # Processed data after ordinal encoding
│   ├── 0_Dataset_Description.ipynb
│   ├── ... (same notebook structure as above)
│   └── 9_Plot.ipynb
└── README.md # This file
raw_data.csv
│
├── chemical_encoding/ → processed_data.csv → Analysis (0-9 notebooks)
├── onehot_encoding/ → processed_data.csv → Analysis (0-9 notebooks)
└── ordinal_encoding/ → processed_data.csv → Analysis (0-9 notebooks)
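
The analysis stage includes scaling and k-fold cross-validation notebooks. The following is only a sketch of what such a step might look like with scikit-learn, not the notebooks' actual code: the data is synthetic (generated with `make_regression`), and the `Ridge` model, the 5 folds, and the R² metric are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features (X) and the g-factor target (y).
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Scaling lives inside the pipeline so it is re-fitted on each training
# fold, which avoids leaking validation statistics into training.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# 5-fold cross-validation with shuffling; fold count is an assumption.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R^2 per fold:", scores)
print("Mean R^2:", scores.mean())
```

Running the same evaluation on each folder's processed_data.csv would give one score distribution per encoding, which is one way the three approaches could be compared.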