This repository implements a Conditional Variational Autoencoder (CVAE) for the inverse design of Refractory High Entropy Alloys (RHEAs), focusing on predicting and generating candidate alloys with optimized yield strength under varying testing temperatures.
- Domain: Materials Informatics, Inverse Design, Deep Generative Models
- Objective: Generate new alloy compositions with desired mechanical properties (e.g., yield strength at high temperatures).
- Approach:
  - Encode alloy compositions, processing conditions, and properties into latent space.
  - Train a CVAE conditioned on temperature.
  - Generate candidate alloys matching target yield strength.
  - Validate with explainability analysis (correlation + permutation feature importance).
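
For reference, the usual training objective for a CVAE can be written as below. This is the textbook form, not something read from `src/cvae.py`: here $x$ stands for the encoded alloy features, $c$ for the testing temperature, $d$ for the latent dimension, and $\beta$ is a hypothetical weighting factor (the standard objective uses $\beta = 1$). The standard normal prior is also an assumption.

```latex
\mathcal{L}(x, c) =
  \mathbb{E}_{q_\phi(z \mid x, c)}\left[ -\log p_\theta(x \mid z, c) \right]
  + \beta \, D_{\mathrm{KL}}\left( q_\phi(z \mid x, c) \,\|\, \mathcal{N}(0, I) \right)

% Closed form of the KL term for a diagonal Gaussian encoder
% with mean \mu and variance \sigma^2:
D_{\mathrm{KL}} = -\frac{1}{2} \sum_{j=1}^{d}
  \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)
```

The first term rewards accurate reconstruction given the condition; the second keeps the latent codes close to the prior so that sampling $z \sim \mathcal{N}(0, I)$ at generation time is meaningful.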
RHEA-Inverse-Design-Using-VAE/
│
├── data/
│ ├── scripts/rhea_data_encoding.py # Data cleaning & encoding
│ ├── data.csv # Raw dataset
│ ├── encoded_data.csv # Cleaned + encoded dataset (model input)
│ └── processed/ # Processed splits + scalers
│
├── models/
│ └── cvae_best.pt # Trained model checkpoint
│
├── outputs/ # Generated alloys, plots, explainability results
│ ├── explainability/
│ │ ├── correlation_train_vs_generated.png
│ │ └── pfi_importance.png
│ └── tsne/
│   ├── latent_tsne_all.png
│   ├── latent_tsne_yield_strength.png
│   └── latent_tsne_temperature.png
│
├── src/
│ ├── data_prep.py # Train/val split, scaling
│ ├── cvae.py # CVAE model definition
│ ├── train_cvae.py # Training script with early stopping
│ ├── generate.py # Alloy generation (sampling + refinement)
│ ├── evaluate_cvae.py # Interactive query interface
│ ├── latent_vis.py # Latent space visualization (t-SNE)
│ └── explainability.py # Correlation + PFI explainability
│
└── README.md
git clone https://github.com/shruti-sivakumar/RHEA-Inverse-Design-Using-VAE
cd RHEA-Inverse-Design-Using-VAE
pip install -r requirements.txt

Requirements:
- Python 3.9+
- PyTorch
- NumPy, Pandas, Scikit-learn
- Seaborn, Matplotlib
- Joblib
python data/scripts/rhea_data_encoding.py

Generates:
- encoded_data.csv (model-ready, numeric)
- encoded_data_human.csv (readable, for analysis)
python src/data_prep.py

Outputs scalers and splits in data/processed/.
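
As an illustration of the split-then-scale step, here is a minimal standard-library sketch. It assumes standardization (zero mean, unit variance) fitted on the training split only; the scaler type actually used by `src/data_prep.py` is not stated here, and the function names `train_val_split`, `fit_scaler`, and `transform` are hypothetical.

```python
import random
import statistics


def train_val_split(rows, val_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a validation fraction."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_val = max(1, int(len(rows) * val_fraction))
    val = [rows[i] for i in idx[:n_val]]
    train = [rows[i] for i in idx[n_val:]]
    return train, val


def fit_scaler(train):
    """Compute per-column mean and standard deviation from the training rows only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, scaler):
    """Apply a fitted scaler to any split (train or validation)."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data (hypothetical): two varying columns and one constant column.
rows = [[float(i), 2.0 * i, 5.0] for i in range(10)]
train, val = train_val_split(rows)
scaler = fit_scaler(train)
train_scaled = transform(train, scaler)
val_scaled = transform(val, scaler)
```

Fitting the scaler on the training split and reusing it for validation avoids leaking validation statistics into training.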
python src/train_cvae.py

- Early stopping enabled
- Saves best model to models/cvae_best.pt
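
The early-stopping behavior mentioned for training can be sketched as follows. This is a generic patience-based scheme, not the code in `src/train_cvae.py`; the class name `EarlyStopping`, the `patience` and `min_delta` parameters, and the `run` helper are all hypothetical.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True if this epoch set a new best loss."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self):
        return self.bad_epochs >= self.patience


def run(val_losses, patience=3):
    """Replay a list of validation losses; return (best_epoch, last_epoch_run)."""
    stopper = EarlyStopping(patience=patience)
    best_epoch = None
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            best_epoch = epoch  # a real training script would save the checkpoint here
        if stopper.should_stop:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


best_epoch, last_epoch = run([1.0, 0.8, 0.9, 0.85, 0.95, 0.7], patience=3)
```

In this replay the loss at epoch 1 is never beaten within three further epochs, so training stops at epoch 4 and the checkpoint from epoch 1 is the one kept.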
python src/evaluate_cvae.py

Interactive menu with 3 query modes:
- Highest yield strength at given temperature
- Closest to a target yield strength
- Multi-constraint (two temperature constraints)
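
The three query modes can be illustrated with a small sketch over a list of candidates. Everything here is hypothetical: the record layout (predicted yield strength keyed by testing temperature), the numbers, the function names, and the reading of "multi-constraint" as a minimum strength at each of two temperatures. The actual interface in `src/evaluate_cvae.py` may differ.

```python
# Hypothetical candidates; "ys" maps testing temperature to predicted
# yield strength (units left unspecified on purpose).
CANDIDATES = [
    {"alloy": "cand_a", "ys": {25: 1500.0, 1000: 300.0}},
    {"alloy": "cand_b", "ys": {25: 1100.0, 1000: 650.0}},
    {"alloy": "cand_c", "ys": {25: 1300.0, 1000: 500.0}},
]


def highest_at(cands, temp):
    """Mode 1: candidate with the highest predicted yield strength at `temp`."""
    return max(cands, key=lambda c: c["ys"][temp])


def closest_to(cands, temp, target):
    """Mode 2: candidate whose predicted yield strength at `temp` is nearest `target`."""
    return min(cands, key=lambda c: abs(c["ys"][temp] - target))


def meets_both(cands, t1, min1, t2, min2):
    """Mode 3 (assumed meaning): candidates meeting a minimum strength at two temperatures."""
    return [c for c in cands if c["ys"][t1] >= min1 and c["ys"][t2] >= min2]


best_hot = highest_at(CANDIDATES, 1000)
nearest = closest_to(CANDIDATES, 25, 1250.0)
both = meets_both(CANDIDATES, 25, 1200.0, 1000, 450.0)
```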
python src/latent_vis.py

- t-SNE plots of latent distribution
- Colored by yield strength & temperature
python src/explainability.py

Produces:
- Correlation heatmap: Train vs Generated alloys
- Permutation Feature Importance (PFI) plot
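
For readers unfamiliar with PFI, the idea can be sketched in a few lines of standard-library Python: shuffle one feature column at a time and measure how much the prediction error grows. This is a generic illustration, not the code in `src/explainability.py`; the `permutation_importance` function, the mean-squared-error metric, and the toy model are assumptions.

```python
import random


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mse([predict(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse([predict(row) for row in X_perm], y)
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup (hypothetical): the target depends on feature 0 only.
X = [[float(i), float((7 * i) % 20)] for i in range(20)]
y = [3.0 * row[0] for row in X]


def toy_predict(row):
    return 3.0 * row[0]


scores = permutation_importance(toy_predict, X, y)
```

Here shuffling feature 1 leaves the error unchanged, so its score is zero, while shuffling feature 0 breaks the predictions and yields a positive score.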
Correlation Heatmap (Train vs Generated):

Permutation Feature Importance (PFI):

- The CVAE can generate novel alloy compositions with a desired yield strength.
- Conditioning on testing temperature allows generated properties to be tuned for different temperatures.
- The explainability analysis supports both data fidelity (correlation analysis) and model interpretability (PFI).
- Add case-study Integrated Gradients (IG) for per-alloy local explanations.
- Benchmark against other generative models (GANs, diffusion models).
- Validate generated alloys with external simulation/experimental datasets.
This project is licensed under the MIT License – see the LICENSE file for details.


