Metadata-Version: 2.1
Name: gotext
Version: 0.9
Summary: GoText is a universal text extraction and preprocessing tool for python which supportss wide variety of document formats.
Home-page: https://github.com/VaibhavHaswani/GoText
Author: Vaibhav Haswani
Author-email: vaibhavhaswani@gmail.com
License: UNKNOWN
Download-URL: https://github.com/VaibhavHaswani/GoText/archive/refs/tags/v0.9.tar.gz
Project-URL: Bug Tracker, https://github.com/VaibhavHaswani/GoText/issues
Description: # GoText v0.9
        GoText is a universal text extraction and preprocessing tool for python which supportss wide variety of document formats.
        
        ## Install
        
        ```
        pip install gotext
        ```
        
        ## How To Use
        
        ``` python
        from gotext import GoDocument
        
        # process single document
        doc_path='docs/test.docx'
        go_obj=GoDocument(doc_path=doc_path)
        print(go_obj._text) #returns text extracted from document
        print(go_obj.preprocess()) #preprocess document and returns preprocessed text
        
        #process all the documents within a directory
        docs_dir='docs/'
        go_obj=GoDocument(docs_dir=docs_dir)
        print(go_obj._text) #returns a list of texts extracted from all the documents
        print(go_obj.preprocess()) #preprocess documents and returns a list of preprocessed text
        ```
        
        ## Feedback / Queries
        
        For any queries or feedback feel free to write to vaibhavhaswani@gmail.com
        
Keywords: text extraction,text preprocessing,document extraction,text utils
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
