
[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)
<!-- [![forthebadge](https://forthebadge.com/images/badges/made-with-java.svg)](https://forthebadge.com) -->

<div align = center>
<a href = "github.com/plugyawn"><img width="600px" height="180px" src= "https://user-images.githubusercontent.com/76529011/185376373-787f65d5-b78b-4f11-a7fb-e9aa19dc3a04.png"></a>
</div>

-----------------------------------------
[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)![Compatibility](https://img.shields.io/badge/compatible%20with-python3.9.x-blue.svg)

CMTT is a wrapper library that makes code-mixed text processing more efficient than ever. More documentation incoming!

### Installation
```
pip install code-mixed-text-toolkit
```

### Get started
How to use this library:

```Python
import code_mixed_text_toolkit.data as cmtt_data
import code_mixed_text_toolkit.preprocessing as cmtt_pp

# Loading json files
result_json = cmtt_data.load('https://world.openfoodfacts.org/api/v0/product/5060292302201.json')

# Loading csv files
result_csv = cmtt_data.load('https://gist.githubusercontent.com/rnirmal/e01acfdaf54a6f9b24e91ba4cae63518/raw/b589a5c5a851711e20c5eb28f9d54742d1fe2dc/datasets.csv')

# List all datasets available
cmtt_data.list_datasets(show_key="url")

# Download specific datasets
cmtt_data.download("openfoodfacts")
cmtt_data.download("rnirmal")

# Load and preprocess txt dataset
result_txt = cmtt_data.load('https://www.w3.org/TR/PNG/iso_8859-1.txt')
result_txt_tokenized = cmtt_pp.tokenizer.word_tokenize(result_txt)

# Search target word in txt corpus
cmtt_pp.search.search_word(result_txt, 'with', tokenize = True, width = 3)
```

### Contributors
 - [Paras Gupta](https://github.com/paras-gupt)
 - [Tarun Sharma](https://github.com/tarun2001sharma)
 - [Reuben Devanesan](https://github.com/Reuben27)