Skip to content

Conversation

@Siddharth7269
Copy link
Collaborator

Added the completed model for TabDDPM Model

Implemented TabDDPM training pipeline for synthetic tabular data generation
Built end-to-end diffusion-based code following Kotelnikov et al.’s architecture (2209.15421v2), including scheduler, sampler and noise schedule modules.

Completed experiments using two evaluation protocols

50 / 50 real–synthetic split with 2-fold cross-validation, repeated 3 times per dataset

70 / 30 train–test split matching the paper’s original setup

Integrated comprehensive performance metrics

TSTR accuracy (MLP, Logistic Regression, XGBoost, Random Forest)

Jensen–Shannon Divergence (JSD)

Wasserstein Distance (WD)

Developed class-injection logic for missing labels
Automatically detect underrepresented classes in synthetic outputs and inject real samples to ensure compatibility with XGBoost and other downstream models.

Added dynamic epoch configuration

100 epochs for small / medium datasets

150 epochs for large datasets

Aligned hyperparameters and benchmarking with the paper
Matched learning rate schedule, batch size, and model depth exactly as in 2209.15421v2 to validate reproducibility.

Surpassed published benchmarks
Achieved an F1-score of 0.80 on the UCI Adult dataset versus the paper’s 0.795 benchmark
.

Containerized the full pipeline with Docker
Created Dockerfile and Compose scripts to encapsulate all dependencies for environment-independent execution.

Modularized dataset handling and preprocessing scripts
Encapsulated loading, cleaning, encoding and splitting logic into reusable modules for rapid onboarding of new datasets.

Managed experiments via GitHub
Employed feature branches, structured commits and CI-driven validation to track code, hyperparameters and results.

Automated final result aggregation
Wrote scripts to compile averaged evaluation metrics and divergence scores across repeats into a consolidated CSV report.

Added  the completed model for TabDDPM Model 

Signed-off-by: Siddharth Yadav <55278616+Siddharth7269@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants