Metadata-Version: 2.1
Name: doc2term
Version: 0.1
Summary: A fast NLP tokenizer that detects tokens and removes duplication and punctuation
Home-page: https://github.com/callforpapers-source/doc2term
Maintainer: Saeed Dehqan
Maintainer-email: saeed.dehghan@owasp.org
License: Apache License, Version 2.0
Description: # doc2term
        
        [![Build Status](https://travis-ci.com/callforpapers-source/doc2term.svg?branch=main)](https://travis-ci.com/callforpapers-source/doc2term)
        [![pypi](https://badge.fury.io/py/doc2term.svg)](https://pypi.org/project/doc2term/)
        [![license](https://img.shields.io/:license-Apache%202-blue.svg)](http://github.com/callforpapers-source/doc2term/blob/master/LICENSE.txt)
        
        A fast NLP tokenizer that detects sentences, words, numbers, URLs, hostnames, emails, filenames, and phone numbers. Tokenization integrates and standardizes documents and removes punctuation and duplication.
        
        ## Installation
        
        ```
        pip install doc2term
        ```
        
        ### Compilation
        
        Installation compiles the original C code, so `gcc` must be available.
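        
        Since the build depends on a C compiler, it can help to check for one before running `pip`. The following is a minimal sketch using only the standard library; the helper name `find_compiler` is hypothetical and not part of doc2term.
        
        ```python
        import shutil
        from typing import Optional
        
        
        def find_compiler(name: str = "gcc") -> Optional[str]:
            """Return the full path of the compiler on PATH, or None if it is missing."""
            return shutil.which(name)
        
        
        if __name__ == "__main__":
            path = find_compiler()
            if path is None:
                print("gcc not found on PATH; install a C compiler before running pip")
            else:
                print(f"gcc found at {path}")
        ```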
        
        ## Usage
        
        Example notebook: [doc2term](https://nbviewer.jupyter.org/github/callforpapers-source/doc2term/blob/main/examples/doc2term.ipynb)
        
        ### Example
        
        ```python
        >>> import doc2term
        
        >>> doc2term.doc2term_str("Actions speak louder than words. ... ")
        "Actions speak louder than words ."
        >>> doc2term.doc2term_str("You can't judge a book by its cover. ... from thoughtcatalog.com")
        "You can't judge a book by its cover . from thoughtcatalog.com"
        
        ```
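        
        To process several documents at once, the same call can be applied to each one. The sketch below assumes only that the tokenizer takes a string and returns a string, as `doc2term_str` does in the example above; the helper name `normalize_corpus` is hypothetical and not part of doc2term.
        
        ```python
        from typing import Callable, Iterable, List
        
        
        def normalize_corpus(docs: Iterable[str], tokenize: Callable[[str], str]) -> List[str]:
            """Apply a string-to-string tokenizer to each document.
        
            Documents whose tokenized form is empty are dropped.
            """
            results = []
            for doc in docs:
                tokens = tokenize(doc)
                if tokens:  # skip documents that reduce to an empty string
                    results.append(tokens)
            return results
        ```
        
        With doc2term installed, the call would look like `normalize_corpus(texts, doc2term.doc2term_str)`, where `texts` is any iterable of strings.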
        
Keywords: tokenizer,NLP,punctuation,standardization,duplicate-detection,text-processing,text-tokenizing,doc2term
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.7
Description-Content-Type: text/markdown
