Metadata-Version: 2.1
Name: TemporalGeneralizedRules
Version: 1.0.2
Summary: Algorithms for association Rule mining
Author: Ignacio Fernandez y Emiliano Galimberti
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.md

## Temporal Generalized Association Rules

This library provides four algorithms related to Association Rule mining. 
You can download this repository as a package with:

pip install TemporalGeneralizedRules

The algorithms are:
- Apriori
- Cumulate
- HTAR
- HTGAR

These algorithms use a transactional dataset that is transformed to a vertical format for optimization.
Dataset MUST follow the following format:

| order_id | product_name |
|----------|--------------|
| 1        | Bread        |
| 1        | Milk         |
| 2        | Bread        |
| 2        | Beer         |
| 3        | Eggs         |

Or if timestamps are provided:

| order_id | timestamp | product_name | 
|----------|-----------|--------------|
| 1        | 852087600 | Bread        |
| 1        | 852087600 | Milk         |
| 2        | 854420400 | Bread        |
| 2        | 854420400 | Beer         |
| 3        | 854420400 | Eggs         |

For taxonomy file use the following format (don't provide headers):

| child    | parent       |
|----------|--------------|
| Bread    | Dairy        |
| Milk     | Dairy        |
| Beer     | Beverage     |

One line for each child, parent

Each field is separated by ","

## TGAR 

This is the main class that must be instantiated once. 

### Usage 
import TemporalGeneralizedRules

tgar = TemporalGeneralizedRules.TGAR()


## Apriori

This algorithm has four parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- parallel_count: Optional parameter that enables parallelization in candidate count phase of the algorithm.

### Usage
tgar.apriori("dataset.csv", 0.05, 0.5)

## Cumulate 

This algorithm has six parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- taxonomy_filepath: Filepath of the taxonomy in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- min_r: Minimum R-interesting threshold.
- parallel_count: Optional parameter that enables parallelization in candidate count phase of the algorithm. It can make execution faster.

### Usage
tgar.cumulate("dataset.csv", 0.05, 0.5, 1.1)

## HTAR 
This algorithm has four parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- parallel_count: Optional parameter that enables parallelization in candidate count phase of the algorithm. It can make execution faster.

### Usage
tgar.htar("dataset.csv", 0.05, 0.5)


## HTGAR 
This algorithm has six parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- taxonomy_filepath: Filepath of the taxonomy in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- min_r: Minimum R-interesting threshold.
- parallel_count: Optional parameter that enables parallelization in candidate count phase of the algorithm. It can make execution faster.

### Usage

tgar.htgar("dataset.csv", 0.05, 0.5, 1.1)


## Pypy

For a better performance we recommend using this package with Pypy, a faster implementation of python.

https://www.pypy.org/


## Bibliography

The algorithms provided in this library were based on the following papers:

- Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 487–499. https://dl.acm.org/doi/10.5555/645920.672836


- Ramakrishnan Srikant, Rakesh Agrawal, Mining generalized association rules, Future Generation Computer Systems, Volume 13, Issues 2–3, 1997, Pages 161-180, ISSN 0167-739X. https://www.sciencedirect.com/science/article/pii/S0167739X97000198


- R. Agrawal and J. C. Shafer, "Parallel mining of association rules," in IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 962-969, Dec. 1996, doi: 10.1109/69.553164. https://ieeexplore.ieee.org/document/553164


- Hong et al., 2016.Hong, T.-P., Lan, G.-C., Su, J.-H., Wu, P.-S., and Wang, S.-L. (2016). Discovery of temporal association rules with hierarchical granular framework. Applied Computing and Informatics, 12(2):134–141 https://www.sciencedirect.com/science/article/pii/S2210832716000041
