Metadata-Version: 2.1
Name: random-data-gen
Version: 0.1.3
Summary: Package to generate random transactional data
Home-page: https://github.com/felipesassi/random-data-gen
Author: Felipe Sassi
Author-email: felipesassi@outlook.com
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: numpy (>=1.22.1,<2.0.0)
Requires-Dist: pandas (>=1.4.0,<2.0.0)
Project-URL: Repository, https://github.com/felipesassi/random-data-gen
Description-Content-Type: text/markdown

# RandomDataGen - Random Data Generator Package

<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
[![Downloads](https://pepy.tech/badge/random-data-gen)](https://pepy.tech/project/random-data-gen)

RandomDataGen is a package for generating random transactional data. You can use it to practice Pandas operations or to study clustering methods like RFM.

With this package you can create a table with transactional data containing:

- consumer_id: ID of the consumer who makes the transaction;
- transaction_created_at: Date of transaction;
- transaction_payment_value: Monetary value of transaction.

All the fields are customizable.

## How the data is generated

The *consumer_id* field is generated by a range function, returning a sequence of integers from 1 to *n_consumers*:

``` python
consumer_ids = range(1, n_consumers + 1)
```

The *transaction_created_at* field is generated by the Pandas function date_range. You can read more about this function in the [Pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.date_range.html):

``` python
created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
```
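
As a small illustration of what this call produces: when *start*, *end* and *periods* are all given, date_range returns evenly spaced timestamps between the two dates. The dates and the number of periods below are chosen only for the example:

```python
import pandas as pd

# Three evenly spaced timestamps between the start and end dates (both included).
timestamps = list(pd.date_range(start="2020-01-01", end="2020-01-02", periods=3))

for ts in timestamps:
    print(ts)
```

This is why the generated timestamps usually carry fractional seconds: the interval between the first and last dates is divided evenly by the number of rows.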

The *transaction_payment_value* field is sampled from a normal distribution whose mean is the *transaction_mean_value* parameter and whose standard deviation is the *transaction_std_value* parameter:

``` python
list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
```
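
The three steps above can be combined into a rough, self-contained sketch. This is not the package's implementation: the function name `sketch_transactions` is hypothetical, the way consumer IDs are assigned to rows (uniform random choice) is an assumption that the text above does not specify, and a seeded NumPy generator is used in place of `np.random.normal` so the result is reproducible:

```python
import numpy as np
import pandas as pd


def sketch_transactions(n_rows, n_consumers, mean_value, std_value,
                        first_date, last_date, seed=0):
    """Hypothetical helper that mimics the three generation steps."""
    rng = np.random.default_rng(seed)

    # Step 1: candidate consumer IDs from 1 to n_consumers.
    consumer_ids = np.arange(1, n_consumers + 1)

    return pd.DataFrame(
        {
            # Assumption: each row gets a uniformly chosen consumer ID.
            "consumer_id": rng.choice(consumer_ids, size=n_rows),
            # Step 2: evenly spaced timestamps between the two dates.
            "transaction_created_at": pd.date_range(
                start=first_date, end=last_date, periods=n_rows
            ),
            # Step 3: payment values drawn from a normal distribution.
            "transaction_payment_value": rng.normal(mean_value, std_value, n_rows),
        }
    )


sketch_df = sketch_transactions(1000, 100, 100, 10, "2020-01-01", "2021-01-01")
print(sketch_df.head())
```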

## How to use

You can get started with RandomDataGen using this example code:

``` python
from random_data_gen.data_generator import TransactionalDataGenerator

TRGenerator = TransactionalDataGenerator(
    n_rows=1000,
    n_consumers=100,
    transaction_mean_value=100,
    transaction_std_value=10,
    first_transaction_date="2020-01-01",
    last_transaction_date="2021-01-01",
)

df = TRGenerator()
```

This snippet defines a dataframe with 1000 rows, 100 unique consumers, a mean transaction value of 100 monetary units, a standard deviation of 10 monetary units, a first transaction date of 2020-01-01 and a last transaction date of 2021-01-01.

The returned dataframe has the following form (the values shown are only illustrative):

```
| consumer_id |     transaction_created_at    | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
|     234     | 2020-01-01 00:00:00.000000000 |           120.10          |
|      43     | 2020-01-01 08:47:34.054054054 |           87.10           |
|     321     | 2021-10-23 10:27:12.092356134 |           12.98           |
|     3123    | 2020-12-30 21:37:17.837837840 |           12.84           |
```

The number of rows in this dataframe is defined by the *n_rows* parameter.
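
Since the package is meant for studying methods like RFM (recency, frequency, monetary), here is a minimal sketch of how an RFM table could be computed from a dataframe with these three columns. To keep the example self-contained, it uses a small hand-written dataframe instead of the generator; the values, the choice of the latest transaction as reference date, and the output column names are assumptions made only for illustration:

```python
import pandas as pd

# Hand-written sample with the same columns as the generated dataframe.
df = pd.DataFrame(
    {
        "consumer_id": [1, 1, 2, 3, 3, 3],
        "transaction_created_at": pd.to_datetime(
            ["2020-01-05", "2020-03-01", "2020-02-10",
             "2020-01-20", "2020-02-15", "2020-03-31"]
        ),
        "transaction_payment_value": [100.0, 50.0, 80.0, 20.0, 30.0, 40.0],
    }
)

# Assumption: recency is measured against the latest transaction in the data.
reference_date = df["transaction_created_at"].max()

rfm = df.groupby("consumer_id").agg(
    # Days since each consumer's most recent transaction.
    recency_days=("transaction_created_at", lambda s: (reference_date - s.max()).days),
    # Number of transactions per consumer.
    frequency=("transaction_created_at", "count"),
    # Total amount spent per consumer.
    monetary=("transaction_payment_value", "sum"),
)

print(rfm)
```

The same aggregation should work on the dataframe returned by the generator, because it only relies on the three column names listed above.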

## Contribute

To contribute you need to install [Poetry](https://python-poetry.org/).

After installing, you need to clone this repo and run the following command:

```
poetry install -n
```

Before sending the code to the repo, apply the project style to the new code by running:

```
make format
```

After that, run:

```
make check
```

This command will check your code with flake8 and pytest.


