A complete synthetic data generation pipeline that uses a Generative Adversarial Network (GAN) to create privacy-safe synthetic app usage data. Built with Python, TensorFlow, and scikit-learn, it is suited to data augmentation and to training models without exposing real user records.
- Upload & Manage Data: Upload your app usage dataset in CSV format using the Google Colab file uploader.
- Data Preprocessing: Drop unneeded columns and normalize numeric data using MinMaxScaler.
- Generator & Discriminator Models: Fully configured Keras-based Generator and Discriminator architectures for synthetic data generation.
- Adversarial Training: Train the GAN in an adversarial setup so that the generated data becomes realistic.
- Model Persistence: Save the generator model (generator.h5), scaler (scaler.pkl), and column mapping (columns.json) for reuse.
- Generate Synthetic Samples: Generate and export new synthetic records after training.
# Clone this repo
git clone https://github.com/your-username/gan-synthetic-data-generator.git
cd gan-synthetic-data-generator
# Open in Google Colab
# Upload your 'screentime_analysis.csv'
# Install required dependencies
pip install -r requirements.txt
# (Optional) Install Hugging Face tools
pip install huggingface_hub
apt-get install git-lfs -y
git lfs install
# Run the notebook in Google Colab
# Run the cells step by step to preprocess the data, build the models, train the GAN, and generate synthetic data
# After training:
# generator.h5, scaler.pkl, columns.json will be saved automatically
The dataset: screentime_analysis.csv
Example columns:
✦ Date: Date (dropped during preprocessing)
✦ App: App name (dropped during preprocessing)
✦ Usage: Time spent using the app
✦ Notifications: Number of notifications received
✦ Times Opened: Number of times the app was opened
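The preprocessing applied to these columns can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the `preprocess` helper and the demo rows are hypothetical, and only the column names and the use of MinMaxScaler come from this README.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def preprocess(df, drop_cols=("Date", "App")):
    """Drop the non-numeric columns and scale the rest to [0, 1].

    Hypothetical helper: returns the scaled array, the fitted scaler,
    and the names of the columns that were kept.
    """
    features = df.drop(columns=list(drop_cols), errors="ignore")
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(features)
    return scaled, scaler, list(features.columns)


# Made-up rows for illustration only (not taken from screentime_analysis.csv).
demo = pd.DataFrame({
    "Date": ["2024-08-01", "2024-08-02", "2024-08-03"],
    "App": ["AppA", "AppB", "AppA"],
    "Usage": [30, 90, 60],
    "Notifications": [5, 20, 10],
    "Times Opened": [2, 12, 7],
})

scaled, scaler, columns = preprocess(demo)
print(columns)       # names of the numeric columns that remain
print(scaled.shape)  # one row per record, one column per kept feature
```

Keeping the fitted scaler matters because the same object is needed later to map generated values back to the original units.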
Place your CSV inside data/ (e.g., data/screentime_analysis.csv) or upload it via the Google Colab file uploader.
✦ App Usage Time
✦ Notifications Received
✦ App Open Count

✦ Data Normalization (MinMaxScaler)
✦ Generator & Discriminator Deep Neural Networks (DNNs)
✦ Adversarial Training Loop (GAN)
✦ Synthetic Data Generation
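The Generator, Discriminator, and adversarial training loop listed above could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the layer sizes, the latent dimension, the learning rates, the sigmoid output (chosen to match the [0, 1] range produced by MinMaxScaler), and the use of `tf.GradientTape` rather than whatever training API the notebook uses.

```python
import numpy as np
import tensorflow as tf

LATENT_DIM = 16  # assumed size of the noise vector
N_FEATURES = 3   # Usage, Notifications, Times Opened


def build_generator(latent_dim=LATENT_DIM, n_features=N_FEATURES):
    # Sigmoid keeps outputs in [0, 1], the range of MinMaxScaler-scaled data.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features, activation="sigmoid"),
    ])


def build_discriminator(n_features=N_FEATURES):
    # Emits one logit per row; a higher value means "looks real".
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),
    ])


def train_gan(real_data, epochs=200, batch_size=32, seed=0):
    """Alternate discriminator and generator updates for `epochs` steps."""
    tf.random.set_seed(seed)
    generator, discriminator = build_generator(), build_discriminator()
    g_opt = tf.keras.optimizers.Adam(1e-3)
    d_opt = tf.keras.optimizers.Adam(1e-3)
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    real_data = tf.constant(real_data, dtype=tf.float32)
    n_rows = real_data.shape[0]
    ones = tf.ones([batch_size, 1])
    zeros = tf.zeros([batch_size, 1])

    for _ in range(epochs):
        # Sample a random batch of real rows (with replacement).
        idx = tf.random.uniform([batch_size], minval=0, maxval=n_rows, dtype=tf.int32)
        real = tf.gather(real_data, idx)

        # Discriminator step: label real rows 1 and generated rows 0.
        noise = tf.random.normal([batch_size, LATENT_DIM])
        with tf.GradientTape() as tape:
            fake = generator(noise, training=True)
            d_loss = (bce(ones, discriminator(real, training=True))
                      + bce(zeros, discriminator(fake, training=True)))
        d_vars = discriminator.trainable_variables
        d_opt.apply_gradients(zip(tape.gradient(d_loss, d_vars), d_vars))

        # Generator step: try to get generated rows labelled as real.
        noise = tf.random.normal([batch_size, LATENT_DIM])
        with tf.GradientTape() as tape:
            fake = generator(noise, training=True)
            g_loss = bce(ones, discriminator(fake, training=True))
        g_vars = generator.trainable_variables
        g_opt.apply_gradients(zip(tape.gradient(g_loss, g_vars), g_vars))

    return generator, float(d_loss), float(g_loss)


if __name__ == "__main__":
    # Random values stand in for the scaled dataset in this demo.
    placeholder = np.random.default_rng(0).random((64, N_FEATURES))
    gen, d_loss, g_loss = train_gan(placeholder, epochs=100)
    print(f"d_loss={d_loss:.3f} g_loss={g_loss:.3f}")
```

Only the discriminator's variables are updated in the first step and only the generator's in the second, which is what makes the setup adversarial.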
Example synthetic data output (after training):
[[482.3, 18.5, 12.1], [397.8, 10.3, 7.6], [510.2, 15.8, 11.0]]
The samples are generated to mimic the distribution of the real data rather than to reproduce individual real records.
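Producing output like the example above involves feeding noise to the trained generator and mapping its scaled output back to the original units with the saved scaler. The sketch below shows that step under stated assumptions: `generate_samples` and `StandInGenerator` are hypothetical names, the stand-in replaces a trained Keras model only so that the snippet runs on its own, and the feature ranges used to fit the scaler are invented.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def generate_samples(generator, scaler, n_samples, latent_dim, seed=None):
    """Sample noise, run the generator, and undo the MinMax scaling."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n_samples, latent_dim)).astype("float32")
    scaled = np.asarray(generator.predict(noise))
    # Clip to the scaler's [0, 1] range in case the generator overshoots.
    scaled = np.clip(scaled, 0.0, 1.0)
    return scaler.inverse_transform(scaled)


class StandInGenerator:
    """Hypothetical placeholder with a Keras-like predict() method."""

    def predict(self, noise):
        # Squash the first three noise dimensions into (0, 1).
        return 1.0 / (1.0 + np.exp(-noise[:, :3]))


# Invented per-column minimum and maximum values, for illustration only.
demo_scaler = MinMaxScaler().fit(np.array([[0.0, 0.0, 0.0],
                                           [600.0, 50.0, 40.0]]))

samples = generate_samples(StandInGenerator(), demo_scaler,
                           n_samples=3, latent_dim=8, seed=0)
print(np.round(samples, 1))
```

With the real artifacts, the stand-in would be replaced by the model loaded from generator.h5 (for example via `tf.keras.models.load_model`) and `demo_scaler` by the scaler loaded from scaler.pkl.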
✦ generator.h5: Trained Generator model
✦ scaler.pkl: Saved MinMaxScaler for consistent normalization
✦ columns.json: Original column names used in the dataset
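Saving and reloading the scaler and column names could be done along these lines. This is a sketch under assumptions: the helper names are hypothetical, and columns.json is assumed to hold a plain JSON list of column names. The generator itself is only mentioned in a comment so that the snippet runs without a trained model.

```python
import json
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def save_artifacts(scaler, columns, out_dir):
    """Write scaler.pkl and columns.json into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(scaler, out_dir / "scaler.pkl")
    (out_dir / "columns.json").write_text(json.dumps(columns))
    # A trained Keras generator would be saved alongside, e.g.:
    # generator.save(str(out_dir / "generator.h5"))


def load_artifacts(out_dir):
    """Read back the scaler and column names written by save_artifacts."""
    out_dir = Path(out_dir)
    scaler = joblib.load(out_dir / "scaler.pkl")
    columns = json.loads((out_dir / "columns.json").read_text())
    return scaler, columns


# Round trip in a temporary directory, using invented feature ranges.
with tempfile.TemporaryDirectory() as tmp:
    fitted = MinMaxScaler().fit(np.array([[0.0, 0.0, 0.0],
                                          [600.0, 50.0, 40.0]]))
    save_artifacts(fitted, ["Usage", "Notifications", "Times Opened"], tmp)
    restored_scaler, restored_columns = load_artifacts(tmp)
    print(restored_columns)
    print(restored_scaler.data_max_)
```

Reusing the saved scaler, rather than fitting a new one, keeps generated values on the same scale as the data the generator was trained on.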
✦ Backend: Python, Google Colab
✦ ML/DS: TensorFlow / Keras, Pandas, NumPy, Scikit-learn
✦ Utilities: joblib (model persistence), huggingface_hub (optional model hosting)
Contributions, issues, and feature requests are welcome!