Metadata-Version: 2.1
Name: synthetic-dataset
Version: 0.0.0.2
Summary: Generating accurate and safe synthetic datasets for tabular, classification, and time-series labeling tasks
Author: Synthetic Dataset AI Team
Author-email: <admin@syntheticdataset.ai>
Keywords: python,pandas,numpy,scikit-learn,scipy,matplotlib,seaborn
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: License :: Free To Use But Restricted
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE


# Synthetic Data Generation for Tabular, Classification, and Time-Series Labels



This repository contains a Python-based framework for generating accurate and safe synthetic datasets for tabular, classification, and time-series labeling tasks. It is designed to help researchers, data scientists, and machine learning engineers create high-quality, realistic datasets for training and evaluating their models while ensuring privacy and compliance with data protection regulations.



## Features



1. **Tabular Data Generation**: Easily generate synthetic tabular datasets with customizable column types, distribution patterns, and correlations between variables.

2. **Classification Data Generation**: Create datasets for binary or multi-class classification tasks, controlling class imbalance and feature importance.

3. **Time-Series Data Generation**: Generate synthetic time-series datasets with user-defined seasonality, trend, and noise components.

4. **Data Privacy**: Ensure data privacy by using differential privacy techniques and limiting the degree of similarity between the original and synthetic datasets.

5. **Flexible and Extensible**: The framework is designed to be easily extended and adapted to a wide range of data generation tasks, with support for custom data generation modules and integration with other data generation tools.



## Installation



Clone the repository and install the required dependencies:



```bash

git clone https://github.com/syntheticdataset/synthetic-dataset.git

cd synthetic-dataset

pip install -r requirements.txt

```



## Usage

Refer to the provided examples and documentation for guidance on how to generate synthetic datasets for your specific use case.



from synthetic_data import TabularDataGenerator, ClassificationDataGenerator, TimeSeriesDataGenerator



```python

# Tabular data generation

tabular_gen = TabularDataGenerator(num_rows=1000)

tabular_data = tabular_gen.generate()



# Classification data generation

classification_gen = ClassificationDataGenerator(num_samples=1000, num_classes=3)

classification_data, labels = classification_gen.generate()



# Time-series data generation

time_series_gen = TimeSeriesDataGenerator(num_samples=1000, seasonal_period=12)

time_series_data = time_series_gen.generate()

```





## Contributing

Please read the CONTRIBUTING.md file for details on how to contribute to the project. We welcome pull requests, bug reports, and feature requests.



## License

This project is licensed under the MIT License - [Licence](https://github.com/syntheticdataset/synthetic-dataset/blob/main/LICENSE) see the  file for details.

