Metadata-Version: 2.1
Name: cleantxty
Version: 0.0.2
Summary: Python package to clean strings and make them suitable for NLP
Home-page: UNKNOWN
Author: Tripathi Aditya Prakash
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# cleantxty
Python package to clean strings and make them suitable for NLP.

**cleantxty** is an open-source Python package for cleaning raw text. The source code for the library can be found [here.](https://github.com/aditya0072001/cleantxty)



## Features 

cleantxty has two main methods:
* **clean**: to clean raw text and return the cleaned text
* **clean_words**: to clean raw text and return a list of clean words

Other methods that can be used alongside them are:
* **remove_link**: to remove links from the text
* **remove_extra_white_space**: to remove extra white space from the text
* **lower_text**: to convert the text to lower case
* **upper_text**: to convert the text to upper case
* **remove_stopwords**: to remove stopwords from the text
* **remove_digits**: to remove digits from the text
* **remove_punctuations**: to remove punctuation from the text
* **custom_regex**: to apply a custom regex to the text
* **stem_text**: to stem the provided text
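
To give a rough idea of what several of the operations listed above do, here is a minimal standalone sketch using only the Python standard library. It is not cleantxty's source code: the `*_sketch` function names, the regular expressions, and the order of the steps are all assumptions made for illustration, and the library's actual behaviour may differ.

``` python
import re
import string


def remove_link_sketch(text):
    # Hypothetical stand-in: drop anything that looks like an http(s) or www URL.
    return re.sub(r"(https?://|www\.)\S+", "", text)


def remove_digits_sketch(text):
    # Hypothetical stand-in: delete every run of digits.
    return re.sub(r"\d+", "", text)


def remove_punctuations_sketch(text):
    # Hypothetical stand-in: delete all ASCII punctuation characters.
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_extra_white_space_sketch(text):
    # Hypothetical stand-in: collapse whitespace runs and trim both ends.
    return " ".join(text.split())


def clean_sketch(text):
    # Assumed order: links, digits, punctuation, whitespace, then lower case.
    text = remove_link_sketch(text)
    text = remove_digits_sketch(text)
    text = remove_punctuations_sketch(text)
    text = remove_extra_white_space_sketch(text)
    return text.lower()


print(clean_sketch("Visit https://example.com NOW!!  It has 42 tips."))
# visit now it has tips
```

Stopword removal and stemming are left out of the sketch because they depend on NLTK data.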


## Installation

cleantxty requires [Python 3](https://www.python.org/downloads/) and [NLTK](http://www.nltk.org/install.html) to run.

To install using pip, use

`pip install cleantxty`

## Usage

* **Import the library**:

``` python
import cleantxty
```

* **Choose a method:**

 To return the text in a string format, 
 
``` python
cleantxty.clean("raw_text_here") 
```
 
 To return a list of words from the text,
 
``` python
cleantxty.clean_words("raw_text_here") 
```
 
 To choose a specific set of cleaning operations,

``` python
cleantxty.clean("raw_text_here",
default_case="lower",  # "lower" by default; change to "upper" for an upper-case result
regex=None  # Provide a custom regex to use
)

cleantxty.clean_words("raw_text_here",
default_case="lower",  # "lower" by default; change to "upper" for an upper-case result
regex=None  # Provide a custom regex to use
)
```

## Examples

``` python
import cleantxty
cleantxty.clean('This is A s$ample ? tExt3% to   cleaN566556+wow8 ')
```

returns,

``` Python
'this is a sample text to clean'
```

----

``` Python
import cleantxty
cleantxty.clean_words('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133')
```

returns,

``` Python
['sampl', 'text', 'clean']
```

----

``` Python
from cleantxty import clean
text = "my id, name1@dom1.com and your, name2@dom2.in"
clean(text, regex=r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+")
```

returns,

``` Python
"my id, email and your, email"
```
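
For readers unfamiliar with regex substitution, the following standalone sketch shows how a pattern like the one above matches email addresses. It uses the standard library's `re.sub` directly rather than cleantxty, and the replacement string `"email"` is an assumption chosen to mirror the output shown above; how the library picks its replacement text is not documented here.

``` python
import re

# Same pattern as in the example above.
EMAIL_PATTERN = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"

text = "my id, name1@dom1.com and your, name2@dom2.in"

# List every substring the pattern matches.
matches = re.findall(EMAIL_PATTERN, text)
print(matches)  # ['name1@dom1.com', 'name2@dom2.in']

# Replace each match with a placeholder ("email" is an assumed choice).
replaced = re.sub(EMAIL_PATTERN, "email", text)
print(replaced)  # my id, email and your, email
```

Note that the pattern only covers lower-case letters, so mixed-case addresses would need the `re.IGNORECASE` flag or a wider character class.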

## License

##### MIT

For questions, bug reports, and suggestions, please open an issue [here](https://github.com/aditya0072001/cleantxt/issues).

