
Generate realistic synthetic data using Generative Adversarial Networks (GANs) trained on app usage statistics. Ideal for privacy-safe data analysis and machine learning applications.


nandanarnandu/Synthetic_Data_Generator-GAN-


🧠📊 Synthetic Data Generator using GANs

Python TensorFlow scikit-learn pandas License: MIT

A complete synthetic data generation solution that uses Generative Adversarial Networks (GANs) to create privacy-safe synthetic app usage data. It is built with Python, TensorFlow, and scikit-learn and is suited to data augmentation and safe model training.


✨ Features

  • 📤 Upload & Manage Data
    Upload your app usage dataset in CSV format using the Google Colab file uploader.

  • 🧹 Data Preprocessing
    Drop unneeded columns and normalize numeric data with MinMaxScaler.

  • ⚙️ Generator & Discriminator Models
    Fully configured Keras-based Generator and Discriminator architectures for synthetic data generation.

  • 🔄 Adversarial Training
    Train the GAN in an adversarial setup to generate realistic synthetic data.

  • 💾 Model Persistence
    Save the generator model (generator.h5), scaler (scaler.pkl), and column mapping (columns.json) for reuse.

  • 📊 Generate Synthetic Samples
    Generate and export new synthetic records after training.


🚀 Quick Start

# Clone this repo
git clone https://github.com/your-username/gan-synthetic-data-generator.git
cd gan-synthetic-data-generator

# Open in Google Colab
# Upload your 'screentime_analysis.csv'

# Install required dependencies
pip install -r requirements.txt

# (Optional) Install Hugging Face tools
pip install huggingface_hub
apt-get install git-lfs -y
git lfs install

# Run the notebook in Google Colab
# Run the cells step by step to preprocess the data, build the models, train the GAN, and generate synthetic data

# After training:
# generator.h5, scaler.pkl, columns.json will be saved automatically

📂 Dataset

The dataset: screentime_analysis.csv

Example columns:

⦁ Date: Date (dropped during preprocessing)

⦁ App: App name (dropped during preprocessing)

⦁ Usage: Time spent using the app

⦁ Notifications: Number of notifications received

⦁ Times Opened: Number of times the app was opened

Upload your CSV (e.g., screentime_analysis.csv) through the Google Colab file uploader.
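The preprocessing described for this dataset can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small in-memory DataFrame is a hypothetical stand-in for screentime_analysis.csv, and its values are made up for demonstration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in rows (made-up values). In the notebook this would
# instead come from pd.read_csv("screentime_analysis.csv") after upload.
df = pd.DataFrame({
    "Date": ["2024-08-01", "2024-08-01", "2024-08-02"],
    "App": ["AppA", "AppB", "AppA"],
    "Usage": [120, 45, 300],
    "Notifications": [20, 5, 35],
    "Times Opened": [10, 3, 25],
})

# Drop the columns listed as "dropped during preprocessing".
features = df.drop(columns=["Date", "App"])
columns = list(features.columns)

# Scale each remaining numeric column to the [0, 1] range.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)

print(columns)
print(scaled)
```

Keeping the fitted scaler and the column list around matters later: the same scaler is needed to map generated values back to the original units.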

📊 Features Used

⦁ App Usage Time

⦁ Notifications Received

⦁ App Open Count

🔍 Techniques Applied

⦁ Data Normalization (MinMaxScaler)

⦁ Generator & Discriminator Deep Neural Networks (DNNs)

⦁ Adversarial Training Loop (GAN)

⦁ Synthetic Data Generation
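As a rough illustration of how these techniques fit together, the sketch below builds a small Keras generator and discriminator and runs a few adversarial training steps with tf.GradientTape. The README does not specify the architecture, so the layer sizes, the latent dimension (LATENT_DIM), the Adam learning rates, and the random stand-in batch are all assumptions made for this example.

```python
import numpy as np
import tensorflow as tf

LATENT_DIM = 16  # assumed size of the noise vector
N_FEATURES = 3   # Usage, Notifications, Times Opened

def build_generator():
    # Maps noise to N_FEATURES values in [0, 1], matching MinMaxScaler output.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(LATENT_DIM,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(N_FEATURES, activation="sigmoid"),
    ])

def build_discriminator():
    # Outputs the estimated probability that a row is real.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(N_FEATURES,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

generator = build_generator()
discriminator = build_discriminator()
bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)

def train_step(real_batch):
    batch_size = real_batch.shape[0]

    # 1) Update the discriminator: real rows -> 1, generated rows -> 0.
    noise = tf.random.normal((batch_size, LATENT_DIM))
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        real_pred = discriminator(real_batch, training=True)
        fake_pred = discriminator(fake, training=True)
        d_loss = (bce(tf.ones_like(real_pred), real_pred)
                  + bce(tf.zeros_like(fake_pred), fake_pred))
    grads = tape.gradient(d_loss, discriminator.trainable_weights)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_weights))

    # 2) Update the generator: try to make the discriminator output 1.
    noise = tf.random.normal((batch_size, LATENT_DIM))
    with tf.GradientTape() as tape:
        fake_pred = discriminator(generator(noise, training=True), training=True)
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    grads = tape.gradient(g_loss, generator.trainable_weights)
    g_opt.apply_gradients(zip(grads, generator.trainable_weights))

    return float(d_loss), float(g_loss)

# Random stand-in for a batch of normalized real rows (not real data).
real = np.random.rand(64, N_FEATURES).astype("float32")
for step in range(5):
    d_loss, g_loss = train_step(real)
    print(f"step {step}: d_loss={d_loss:.4f} g_loss={g_loss:.4f}")
```

The sigmoid output layer on the generator is a design choice that keeps generated values in the same [0, 1] range as the normalized training data.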

📌 Sample Output

Example synthetic data output (after training):

[[482.3, 18.5, 12.1], [397.8, 10.3, 7.6], [510.2, 15.8, 11.0]]

These values are generated to mimic the real-world data distribution while preserving privacy.
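Because the sample rows are in original units rather than the [0, 1] range, the generator output is presumably passed back through the saved scaler. The sketch below shows that step under stated assumptions: fake_generator is a hypothetical stand-in for the trained model's prediction call, and the array used to fit the scaler contains made-up values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up rows used only to fit the scaler for this illustration.
real = np.array([
    [120.0, 20.0, 10.0],
    [45.0, 5.0, 3.0],
    [300.0, 35.0, 25.0],
])
scaler = MinMaxScaler().fit(real)

def fake_generator(n_samples, n_features=3, seed=0):
    """Hypothetical stand-in for the trained generator: values in [0, 1)."""
    rng = np.random.default_rng(seed)
    return rng.random((n_samples, n_features))

# Generate normalized rows, then map them back to the original units.
normalized = fake_generator(3)
synthetic = scaler.inverse_transform(normalized)

print(np.round(synthetic, 1))
```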

📈 Output

⦁ generator.h5: Trained Generator model

⦁ scaler.pkl: Saved MinMaxScaler for consistent normalization

⦁ columns.json: Original column names used in the dataset
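A minimal sketch of how the scaler and column names might be written and reloaded is shown below. It assumes columns.json holds a plain JSON list and that the scaler is stored with joblib (listed in the tech stack); the temporary directory and the sample values are for illustration only.

```python
import json
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

columns = ["Usage", "Notifications", "Times Opened"]
# Scaler fitted on made-up values, standing in for the one fitted in training.
scaler = MinMaxScaler().fit(np.array([[45.0, 5.0, 3.0], [300.0, 35.0, 25.0]]))

# Write the artifacts to a temporary directory for this example.
out_dir = tempfile.mkdtemp()
scaler_path = os.path.join(out_dir, "scaler.pkl")
columns_path = os.path.join(out_dir, "columns.json")

joblib.dump(scaler, scaler_path)
with open(columns_path, "w") as f:
    json.dump(columns, f)

# The trained Keras generator would be saved next to these files, e.g. with
# generator.save("generator.h5"); that step is omitted here to keep the
# sketch free of a TensorFlow dependency.

# Reload the artifacts, as a later generation script might do.
loaded_scaler = joblib.load(scaler_path)
with open(columns_path) as f:
    loaded_columns = json.load(f)

print(loaded_columns)
```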

🛠️ Tech Stack

⦁ Backend: Python, Google Colab

⦁ ML/DS: TensorFlow / Keras, Pandas, NumPy, Scikit-learn

⦁ Utilities: joblib (model persistence), huggingface_hub (optional model hosting)

💡 Contributions, issues, and feature requests are welcome!

