Hybrid GAN - Copula Data Synthesizer

Hybrid Generative Adversarial Neural Network and Copula Data Synthesizer

Model Demo: [ https://drive.google.com/file/d/1fd-9cMRGWv_yN45iL_XvwTw8XLJSGH6C/view?usp=drive_link ]

A powerful Streamlit-based application that enables you to upload a real-world tabular dataset, preprocess it automatically, and generate high-quality synthetic datasets using Gaussian Copula and Generative Adversarial Network (GAN)–based synthesis models.

Overview

This project provides a hybrid framework for tabular data generation using both statistical and deep-learning-based generative models.

It combines the strengths of:

Gaussian Copula models for learning statistical dependencies,
Generative Adversarial Networks (CTGAN) for capturing complex, non-linear relationships.

The interface is built with Streamlit, providing an intuitive and interactive web-based environment for:

Uploading datasets,
Configuring synthesis parameters,
Training models,
Evaluating generated data quality,
Downloading synthetic datasets.

Key Features

1. Dual Synthesis Engines

Choose between:

Gaussian Copula: Fast, statistically driven model for smaller datasets.
Generative Adversarial Network (GAN/CTGAN): Deep-learning approach for complex, large, or high-dimensional data.

2. Automated Preprocessing

Detects column types automatically.
Cleans missing values, encodes categories, parses datetimes.
Handles high-cardinality columns with configurable Top-K mapping.
Auto-converts incompatible types to SDV-supported formats.

3. Evaluation Metrics

After generating synthetic data, the app computes:

Kolmogorov–Smirnov (KS) Statistic: For numeric distribution similarity.
Jensen–Shannon (JS) Divergence: For categorical distributions.
Correlation Matrix MSE: To assess structure preservation.
ML Utility (AUC): Tests predictive similarity between real and synthetic data.
Privacy Metrics (Memorization Check): Ensures synthetic data doesn’t memorize real rows.

4. Synthetic Data Export

Instantly download your generated synthetic dataset as a CSV file.
Columns and types aligned with your original dataset.

Tech Stack

Component	Technology
Frontend	Streamlit
Backend / Model Engine	SDV (Synthetic Data Vault)
Models Used	GaussianCopula, CTGAN
Language	Python 3.8+
Libraries	pandas, numpy, sdv, scipy, scikit-learn
Interface	Streamlit UI with custom CSS

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt
synth_gen.py		synth_gen.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Hybrid GAN - Copula Data Synthesizer

Model Demo: [ https://drive.google.com/file/d/1fd-9cMRGWv_yN45iL_XvwTw8XLJSGH6C/view?usp=drive_link ]

Overview

Key Features

1. Dual Synthesis Engines

2. Automated Preprocessing

3. Evaluation Metrics

4. Synthetic Data Export

Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Hybrid GAN - Copula Data Synthesizer

Model Demo: [ https://drive.google.com/file/d/1fd-9cMRGWv_yN45iL_XvwTw8XLJSGH6C/view?usp=drive_link ]

Overview

Key Features

1. Dual Synthesis Engines

2. Automated Preprocessing

3. Evaluation Metrics

4. Synthetic Data Export

Tech Stack

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages