This repository contains code, data, and results for generating synthetic security checkpoint data using a Tabular Variational Autoencoder (TVAE). It covers data preprocessing and visualization, model training, synthetic data generation, and downstream task analysis.
- data_processing.ipynb
  Notebook for initial data cleaning, feature engineering, and prepping downstream datasets.
- descriptives.ipynb
  Exploratory data analysis and descriptive statistics of the security checkpoint data.
- tvae_train.ipynb
  Training workflow for the Tabular Variational Autoencoder model on the training data.
- downstream_tasks.ipynb
  Implements downstream analyses (regression) using real and synthetic data.
- security_checkpoint_data.xlsx
  Raw security checkpoint data (Excel) with passenger and screening station metadata.
- train_data_tvae.csv
  Preprocessed training dataset used as input to the TVAE.
- processed_data.csv
  Final processed dataset after feature engineering, joins, and cleaning.
- tvae_only_project/
  Directory containing additional scripts or submodules specific to the TVAE implementation.
- tvae_synthesizer.pkl
  Pickled synthesizer object used to generate synthetic samples from the TVAE latent space.
- requirements.txt
  List of Python dependencies and their versions required to run the notebooks and scripts.
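
Before running the notebooks, it can help to confirm that the data and model files listed above are in place. The following is a minimal sketch and not part of the repository: the helper name `missing_files` is hypothetical, and it assumes the files sit at the repository root.

```python
from pathlib import Path

# File names are taken from the repository listing; their location at the
# repository root is an assumption.
EXPECTED_FILES = [
    "security_checkpoint_data.xlsx",
    "train_data_tvae.csv",
    "processed_data.csv",
    "tvae_synthesizer.pkl",
    "requirements.txt",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected file names that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Run it from the repository root; an empty result means every listed file was found.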
Prerequisites
- Python 3.12 or higher
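
To check the prerequisite programmatically, a small sketch along these lines could be used. The function name `python_ok` is hypothetical; only the 3.12 minimum comes from this README.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_ok() else "too old, 3.12 or higher is required"
    print(f"Python {found}: {status}")
```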
- Clone the repository

      git clone https://github.com/SynthAIr/passengerflow.git
      cd passengerflow

- Create and activate a Python environment

      python3 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install requirements

      pip install -r requirements.txt

- Data Preprocessing
  Open and run data_processing.ipynb to load the raw data, clean it, engineer features, and export processed_data.csv.
- Descriptive Analysis
  Use descriptives.ipynb for exploratory analysis of the processed data, including plots and summary tables.
- Model Training
  Execute tvae_train.ipynb to train the TVAE on train_data_tvae.csv. The trained model will be saved as trained_tvae.pkl.
- Synthetic Data Generation
  Load tvae_synthesizer.pkl in downstream_tasks.ipynb or custom scripts to generate synthetic samples.
- Downstream Evaluation
  Run downstream_tasks.ipynb to evaluate classification or regression tasks using both real and synthetic data.
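
One common way to compare real and synthetic data on a regression task is a "train on synthetic, test on real" comparison: fit the same model once on real training rows and once on synthetic rows, then score both on held-out real rows. The sketch below illustrates that idea with the standard library only. It is an assumption about how such an evaluation could look, not the code in downstream_tasks.ipynb; the toy data, the single-feature linear model, and all function names are placeholders.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def toy_rows(n, seed):
    """Placeholder data (y = 2x + 1 plus noise); not the checkpoint data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


real_train = toy_rows(200, seed=0)
real_test = toy_rows(100, seed=1)
synthetic = toy_rows(200, seed=2)  # stands in for rows sampled from the TVAE

# Both models are scored on the same held-out real rows.
mae_real = mean_abs_error(fit_line(*real_train), *real_test)
mae_synth = mean_abs_error(fit_line(*synthetic), *real_test)
print(f"MAE, trained on real:      {mae_real:.3f}")
print(f"MAE, trained on synthetic: {mae_synth:.3f}")
```

If the synthetic rows capture the relationship in the real data, the two errors should be close; a large gap suggests the synthetic data misses structure the task depends on.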
- TVAE implementation adapted from sdv-dev/CTGAN, licensed under MIT.
This project is licensed under the MIT License. See the LICENSE file for details.