Metadata-Version: 2.1
Name: TriesWithFrequencies
Version: 0.0.1
Summary: This package provides tries (prefix trees) with frequencies implementation based on dictionaries.
Home-page: https://github.com/antononcube/Python-packages/tree/main/TriesWithFrequencies
Author: Anton Antonov
Author-email: antononcube@posteo.net
Keywords: trie,prefix tree,frequency,frequencies
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# Tries with frequencies

This Python package has functions for creation and manipulation of 
[Tries (Prefix trees)](https://en.wikipedia.org/wiki/Trie) 
with frequencies.

The package provides Machine Learning (ML) functionalities, not "just" a 
[Trie]((https://en.wikipedia.org/wiki/Trie)) 
data structure.

This Python implementation closely follows the Mathematica implementation [AAp2].

--------

## Installation

### From GitHub:

```shell
pip install -e git+https://github.com/antononcube/Python-packages.git#egg=TriesWithFrequencies-antononcube\&subdirectory=TriesWithFrequencies
```

### From PyPI:

```shell
python3 -m pip install TriesWithFrequencies
```

------------------------------------------------------------------------

## Setup

```python
from TriesWithFrequencies import *
```

------------------------------------------------------------------------

## Creation examples

In this section we show a few ways to create tries with frequencies.

Consider a trie (prefix tree) created over a list of words:

```python
tr = trie_create_by_split( ["bar", "bark", "bars", "balm", "cert", "cell"] )
trie_form(tr)
```


    TRIEROOT => 6.0
    ├─b => 4.0
    │ └─a => 4.0
    │   ├─r => 3.0
    │   │ └─k => 1.0
    │   │ └─s => 1.0
    │   └─l => 1.0
    │     └─m => 1.0
    └─c => 2.0
      └─e => 2.0
        ├─r => 1.0
        │ └─t => 1.0
        └─l => 1.0
          └─l => 1.0


Here we convert the trie with frequencies above into a trie with
probabilities:


```python
ptr = trie_node_probabilities( tr )
trie_form(ptr)
```


    TRIEROOT => 1.0
    ├─b => 0.6666666666666666
    │ └─a => 1.0
    │   ├─r => 0.75
    │   │ ├─k => 0.3333333333333333
    │   │ └─s => 0.3333333333333333
    │   └─l => 0.25
    │     └─m => 1.0
    └─c => 0.3333333333333333
      └─e => 1.0
        ├─r => 0.5
        │ └─t => 1.0
        └─l => 0.5
          └─l => 1.0


------------------------------------------------------------------------

## Shrinking

Here we shrink the trie with probabilities above:

```python
trie_form(trie_shrink(ptr))
```


    TRIEROOT => 1.0
    └─ba => 1.0
      └─r => 0.75
        └─k => 0.3333333333333333
        └─s => 0.3333333333333333
      └─lm => 1.0
    └─ce => 1.0
      └─rt => 1.0
      └─ll => 1.0


Here we shrink the frequencies trie using a separator:


```python
trie_form(trie_shrink(tr, sep="~"))
```


    TRIEROOT => 6.0
    └─b~a => 4.0
      └─r => 3.0
        └─k => 1.0
        └─s => 1.0
      └─l~m => 1.0
    └─c~e => 2.0
      └─r~t => 1.0
      └─l~l => 1.0


------------------------------------------------------------------------

## Retrieval and sub-tries

Here we retrieve a sub-trie with a key:

```python
trie_form(trie_sub_trie(tr, list("bar")))
```


    r => 3.0
    └─k => 1.0
    └─s => 1.0


------------------------------------------------------------------------

## Classification

Create a trie:

```python
words = [*(["bar"] * 6), *(["bark"] * 3), *(["bare"] * 2), *(["cam"] * 3), "came", *(["camelia"] * 4)]
tr = trie_create_by_split(words)
tr = trie_node_probabilities(tr)
```
Show node counts:

```python
trie_node_counts(tr)
```

    {'total': 13, 'internal': 10, 'leaves': 3}


Show the trie form:

```python
trie_form(tr)
```


    TRIEROOT => 1.0
    ├─b => 0.5789473684210527
    │ └─a => 1.0
    │   └─r => 1.0
    │     ├─k => 0.2727272727272727
    │     └─e => 0.18181818181818182
    └─c => 0.42105263157894735
      └─a => 1.0
        └─m => 1.0
          └─e => 0.625
            └─l => 0.8
              └─i => 1.0
                └─a => 1.0

Classify with the letters of the word \"cam\":

```python
trie_classify(tr, list("cam"), prop="Probabilities")
```

    {'a': 0.5, 'm': 0.375, 'e': 0.12499999999999997}



------

## References

### Articles

[AA1] Anton Antonov,
["Tries with frequencies for data mining"](https://mathematicaforprediction.wordpress.com/2013/12/06/tries-with-frequencies-for-data-mining/),
(2013),
[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).

[AA2] Anton Antonov,
["Removal of sub-trees in tries"](https://mathematicaforprediction.wordpress.com/2014/10/12/removal-of-sub-trees-in-tries/),
(2013),
[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).

[AA3] Anton Antonov,
["Tries with frequencies in Java"](https://mathematicaforprediction.wordpress.com/2017/01/31/tries-with-frequencies-in-java/)
(2017),
[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).
[GitHub Markdown](https://github.com/antononcube/MathematicaForPrediction).

[RAC1] Tib,
["Day 10: My 10 commandments for Raku performances"](https://raku-advent.blog/2020/12/10/day-10-my-10-commandments-for-raku-performances/),
(2020),
[Raku Advent Calendar](https://raku-advent.blog).

[WK1] Wikipedia entry, [Trie](https://en.wikipedia.org/wiki/Trie).

### Packages

[AAp1] Anton Antonov, 
[Tries with frequencies Mathematica Version 9.0 package](https://github.com/antononcube/MathematicaForPrediction/blob/master/TriesWithFrequenciesV9.m),
(2013), 
[MathematicaForPrediction at GitHub](https://github.com/antononcube/MathematicaForPrediction).

[AAp2] Anton Antonov,
[Tries with frequencies Mathematica package](https://github.com/antononcube/MathematicaForPrediction/blob/master/TriesWithFrequencies.m),
(2013-2018),
[MathematicaForPrediction at GitHub](https://github.com/antononcube/MathematicaForPrediction).

[AAp3] Anton Antonov, 
[Tries with frequencies in Java](https://github.com/antononcube/MathematicaForPrediction/tree/master/Java/TriesWithFrequencies), 
(2017),
[MathematicaForPrediction at GitHub](https://github.com/antononcube/MathematicaForPrediction).

[AAp4] Anton Antonov, 
[Java tries with frequencies Mathematica package](https://github.com/antononcube/MathematicaForPrediction/blob/master/JavaTriesWithFrequencies.m), 
(2017),
[MathematicaForPrediction at GitHub](https://github.com/antononcube/MathematicaForPrediction).

[AAp5] Anton Antonov, 
[Java tries with frequencies Mathematica unit tests](https://github.com/antononcube/MathematicaForPrediction/blob/master/UnitTests/JavaTriesWithFrequencies-Unit-Tests.wlt), 
(2017), 
[MathematicaForPrediction at GitHub](https://github.com/antononcube/MathematicaForPrediction).

[AAp6] Anton Antonov,
[ML::TriesWithFrequencies Raku package](https://github.com/antononcube/Raku-ML-TriesWithFrequencies),
(2021),
[GitHub/antononcube](https://github.com/antononcube).


### Videos

[AAv1] Anton Antonov,
["Prefix Trees with Frequencies for Data Analysis and Machine Learning"](https://www.youtube.com/watch?v=MdVp7t8xQbQ),
(2017),
Wolfram Technology Conference 2017,
[Wolfram channel at YouTube](https://www.youtube.com/channel/UCJekgf6k62CQHdENWf2NgAQ).
