Metadata-Version: 2.1
Name: elotl
Version: 0.0.1.14
Summary: Package for NLP of indigenous languages
Home-page: https://github.com/ElotlMX/py-elotl
Author: Paul Aguilar (@penserbjorne)
Author-email: paul.aguilar.enriquez@hotmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: Spanish
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Development Status :: 2 - Pre-Alpha
Requires-Python: >=3
Description-Content-Type: text/markdown
License-File: LICENSE

# Py-Elotl

Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.

This is a project of [Comunidad Elotl](https://elotl.mx/).

Developed by:
- Paul Aguilar [@penserbjorne](https://github.com/penserbjorne), [paul.aguilar.enriquez@hotmail.com](mailto:paul.aguilar.enriquez@hotmail.com)
- Robert Pugh [@Lguyogiro](https://github.com/Lguyogiro), [robertpugh408@gmail.com](mailto:robertpugh408@gmail.com)

Requires Python 3 or later.

- Development status: `Pre-Alpha`. See [Classifiers](https://pypi.org/classifiers/)
- pip package: [elotl](https://pypi.org/project/elotl/)
- GitHub repository: [ElotlMX/py-elotl](https://github.com/ElotlMX/py-elotl)

## Installation

### Using `pip`

```bash
pip install elotl
```

### From source

```bash
git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .
```

## Usage

### Working with corpora

```python
import elotl.corpus
```

#### Listing available corpora

Code:

```python
print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)
```

Output:

```bash
Name		Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-otomí parallel corpus']

```
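The rows above print as raw Python lists. A minimal sketch of aligning them into columns follows. `format_corpus_table` is a hypothetical helper, not part of `elotl`, and the sample rows are copied from the output above so the block runs without the package installed.

```python
def format_corpus_table(rows):
    """Return [name, description] rows as aligned text lines."""
    # Hypothetical helper for illustration; not part of elotl.
    width = max([len("Name")] + [len(name) for name, _ in rows])
    lines = [f"{'Name':<{width}}  Description"]
    for name, description in rows:
        lines.append(f"{name:<{width}}  {description}")
    return lines


# Sample rows copied from the output above.
sample_rows = [
    ["axolotl", "Is a Spanish-Nahuatl parallel corpus"],
    ["tsunkua", "Is a Spanish-otomí parallel corpus"],
]

table = format_corpus_table(sample_rows)
for line in table:
    print(line)
```

With the package installed, the result of `elotl.corpus.list_of_corpus()` could be passed in place of `sample_rows`.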

#### Loading a corpus

If a non-existent corpus is requested, a value of 0 is returned.

```python
axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
```
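Callers that prefer exceptions over a sentinel value could wrap the loader. The sketch below is an assumption about how one might do that: `CorpusNotFoundError`, `load_or_raise`, and the stand-in `fake_load` are all hypothetical names, and only the "returns 0 for an unknown name" behaviour comes from the text above.

```python
class CorpusNotFoundError(LookupError):
    """Raised when the loader signals an unknown corpus name."""


def load_or_raise(name, loader):
    """Call loader(name) and turn the 0 sentinel into an exception."""
    result = loader(name)
    # The documented sentinel for a missing corpus is the integer 0.
    if isinstance(result, int) and result == 0:
        raise CorpusNotFoundError(f"No corpus named {name!r}")
    return result


def fake_load(name):
    """Stand-in loader with placeholder data, mimicking the documented behaviour."""
    return [["text a", "text b", "", "demo document"]] if name == "demo" else 0


try:
    load_or_raise("axolotlr", fake_load)
except CorpusNotFoundError as error:
    print(error)
```

In real use, `elotl.corpus.load` would be passed as the `loader` argument instead of `fake_load`.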

If the corpus exists, a list of rows is returned.

```python
axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    print(row)
```

```bash
['Hay que adivinar: un pozo, a la mitad del cerro, te vas a encontrar.', 'See tosaasaanil, see tosaasaanil. Tias iipan see tepeetl, iitlakotian tepeetl, tikoonextis san see aameyalli.', '', 'Adivinanzas nahuas']
```

Each row in the list has four fields, in this order:

- non_original_language
- original_language
- variant
- document_name

```python
tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0])  # non_original_language
    print(row[1])  # original_language
    print(row[2])  # variant
    print(row[3])  # document_name
```

```bash
Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra

```
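Numeric indices are easy to mix up. One option, sketched here under the assumption that every row keeps the four-field order listed above, is to wrap each row in a `namedtuple`. The `CorpusRow` type is hypothetical and not provided by `elotl`; the sample row is copied from the output above.

```python
from collections import namedtuple

# Field names follow the list above; the tuple type itself is hypothetical.
CorpusRow = namedtuple(
    "CorpusRow",
    ["non_original_language", "original_language", "variant", "document_name"],
)

# Sample row copied from the tsunkua output above.
raw_row = [
    "Una vez una señora se emborrachó",
    "nándi na ra t'u̱xú bintí",
    "Otomí del Estado de México (ots)",
    "El otomí de toluca, Yolanda Lastra",
]

row = CorpusRow(*raw_row)
print(row.original_language)
print(row.variant)
```

Index access such as `row[1]` keeps working on the wrapped row, so existing loops do not need to change.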

### Normalizing Nahuatl orthographies

Import the orthography module and load the axolotl Nahuatl corpus.

```python
import elotl.corpus
import elotl.nahuatl.orthography
a = elotl.corpus.load("axolotl")
```

Create a normalizer object, passing the name of the normalization to use as a parameter.

The following normalizations are currently available:

- sep-u-j
- sep-w-h
- ack

If an unsupported normalization is specified, sep-u-j will be used by default.
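Because the fallback is silent, a typo in the normalization name can go unnoticed. The following sketch checks the name before it is used. `resolve_normalization` and the two constants are hypothetical helpers written for this example; only the three names and the `sep-u-j` default come from the list above.

```python
import warnings

# Names taken from the list above.
SUPPORTED_NORMALIZATIONS = ("sep-u-j", "sep-w-h", "ack")
DEFAULT_NORMALIZATION = "sep-u-j"


def resolve_normalization(name):
    """Return name if it is supported, otherwise warn and return the default."""
    if name in SUPPORTED_NORMALIZATIONS:
        return name
    warnings.warn(
        f"Unsupported normalization {name!r}; using {DEFAULT_NORMALIZATION!r}",
        stacklevel=2,
    )
    return DEFAULT_NORMALIZATION


print(resolve_normalization("ack"))
```

The returned name could then be passed to `elotl.nahuatl.orthography.Normalizer`, which makes the fallback visible as a warning instead of a silent substitution.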

Use the `normalize` method to convert a text to the selected orthography, and the `to_phones` method to get
its phonemes.

```python
>>> n = elotl.nahuatl.orthography.Normalizer("sep-u-j")
>>> n.normalize(a[1][1])
'au in ye yujki in on tlenamakak niman ye ik teixpan on motlalia se tlakatl itech mokaua.'
>>> n.to_phones(a[1][1])
'aw in ye yuʔki in on ƛenamakak niman ye ik teiʃpan on moƛalia se ƛakaƛ itet͡ʃ mokawa.'
```

```python
>>> n = elotl.nahuatl.orthography.Normalizer("sep-w-h")
>>> n.normalize(a[1][1])
'aw in ye yuhki in on tlenamakak niman ye ik teixpan on motlalia se tlakatl itech mokawa.'
>>> n.to_phones(a[1][1])
'aw in ye yuʔki in on ƛenamakak niman ye ik teiʃpan on moƛalia se ƛakaƛ itet͡ʃ mokawa.'
```

```python
>>> n = elotl.nahuatl.orthography.Normalizer("ack")
>>> n.normalize(a[1][1])
'auh in ye yuhqui in on tlenamacac niman ye ic teixpan on motlalia ce tlacatl itech mocahua.'
>>> n.to_phones(a[1][1])
'aw in ye yuʔki in on ƛenamakak niman ye ik teiʃpan on moƛalia se ƛakaƛ itet͡ʃ mokawa.'
```

## Package structure

The following structure is a reference. As the package grows it will be better documented.

```
elotl/                              Top-level package
          __init__.py               Initialize the package
          corpora/                  Corpus data files
          corpus/                   Subpackage for loading corpora
          nahuatl/                  Nahuatl language subpackage
                  orthography.py    Module to normalize Nahuatl orthography and get phonemes
          utils/                    Subpackage with useful functions and files
                  fst/              Finite State Transducer functions
                        att/        Module with static .att files
test/                               Unit test scripts
```

## Development

### Build FSTs

Building the FSTs requires [HFST](https://github.com/hfst/hfst). Once it is installed, build them with `make`.

```bash
make all
```

### Create a virtual environment and activate it

```bash
virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate
```

### Update `pip` and generate distribution files

```bash
python -m pip install --upgrade pip
python -m pip install --upgrade setuptools wheel
python setup.py clean sdist bdist_wheel
```

### Testing the package locally

```bash
python -m pip install -e .
```

### Send to PyPI

```bash
python -m pip install twine
twine upload dist/*
```

## License

[Mozilla Public License 2.0 (MPL 2.0)](./LICENSE)

## References

- [https://elotl.mx/](https://elotl.mx/)
- [Packaging Python Projects](https://packaging.python.org/tutorials/packaging-projects/)
- [How To Package Your Python Code](https://python-packaging.readthedocs.io/en/latest/minimal.html)
- [Making a Python Package](https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html)


