Metadata-Version: 2.1
Name: nlp-id
Version: 0.1.9.3
Summary: Kumparan's NLP Services
Home-page: https://github.com/kumparan/nlp-id
Author: Frandy Eddy
Author-email: eddy.frandy@gmail.com
License: MIT
Description: # Kumparan's NLP Services
        
        `nlp-id` is a collection of modules that provide various Natural Language Processing functions for Bahasa Indonesia. This repository contains all source code related to the NLP services.
        
        ## Installation
        
        To install `nlp-id`, use the following command:
        
            $ pip install nlp-id     
        
        
        ## Usage
        
        This section describes in more detail how to use the lemmatizer, tokenizer, and POS tagger.
        
        ### Lemmatizer
        
        The lemmatizer returns the root word of every word in a sentence.
        
            from nlp_id.lemmatizer import Lemmatizer 
            lemmatizer = Lemmatizer() 
            lemmatizer.lemmatize('Saya sedang mencoba') 
            # saya sedang coba 
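        
        To illustrate what lemmatization involves, here is a toy, self-contained sketch that strips common Indonesian verb prefixes (meN-, ber-, ter-) and checks the result against a small root lexicon. The `ROOTS` lexicon, the `PREFIXES` rules, and the `lemmatize` and `lemmatize_word` functions are hypothetical and are not taken from `nlp_id`; a real lemmatizer also has to handle suffixes, confixes, and a much larger dictionary.
        
        ```python
        # Toy root lexicon (hypothetical); a real lemmatizer needs a full dictionary.
        ROOTS = {"saya", "sedang", "coba", "baca", "kerja", "tidur"}
        
        # Prefixes tried longest first, each paired with the initial letters the
        # prefix may have replaced (e.g. men- + tulis -> menulis drops the "t").
        PREFIXES = [
            ("meng", ["", "k"]),
            ("meny", ["s"]),
            ("mem", ["", "p"]),
            ("men", ["", "t"]),
            ("ber", [""]),
            ("ter", [""]),
            ("be", [""]),
            ("me", [""]),
        ]
        
        
        def lemmatize_word(word):
            """Return the root of `word` if a known prefix can be stripped,
            otherwise the lowercased word itself."""
            w = word.lower()
            if w in ROOTS:
                return w
            for prefix, restorations in PREFIXES:
                if w.startswith(prefix):
                    stem = w[len(prefix):]
                    # Try restoring each letter the prefix may have absorbed.
                    for letter in restorations:
                        if letter + stem in ROOTS:
                            return letter + stem
            return w
        
        
        def lemmatize(sentence):
            """Lemmatize a whitespace-separated sentence word by word."""
            return " ".join(lemmatize_word(w) for w in sentence.split())
        
        
        print(lemmatize("Saya sedang mencoba"))  # saya sedang coba
        ```
        
        Trying longer prefixes first keeps `mem-` from being read as `me-`; the letter restorations cover the nasal assimilation that turns, for example, *tulis* into *menulis*.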
            
        ### Tokenizer
        
        The tokenizer splits text into tokens such as words, punctuation, numbers, dates, emails, and URLs.
        This repository provides two tokenizers: the **standard tokenizer** and the **phrase tokenizer**.
        The **standard tokenizer** splits text into tokens in which every word token is a single word.
        
            from nlp_id.tokenizer import Tokenizer 
            tokenizer = Tokenizer() 
            tokenizer.tokenize('Lionel Messi pergi ke pasar di area Jakarta Pusat.') 
            # ['Lionel', 'Messi', 'pergi', 'ke', 'pasar', 'di', 'area', 'Jakarta', 'Pusat', '.']
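        
        One common way to build a standard tokenizer is a single regular expression whose alternatives are tried in order, most specific first. The sketch below is a simplified stand-in written for illustration, not the `nlp_id.tokenizer.Tokenizer` implementation; `TOKEN_PATTERN`, `tokenize`, and the supported formats (for example, dates only in the `dd/mm/yyyy` form) are assumptions.
        
        ```python
        import re
        
        # Alternatives are tried left to right, so more specific patterns come first.
        TOKEN_PATTERN = re.compile(
            r"""
            https?://\S+[^\s.,]             # URL, without a trailing period or comma
            | [\w.+-]+@[\w-]+(?:\.[\w-]+)+  # email address
            | \d{1,2}/\d{1,2}/\d{2,4}       # numeric date such as 17/08/1945
            | \d+(?:[.,]\d+)*               # number, including 0,5 and 1.000.000
            | \w+(?:-\w+)*                  # word, including reduplication (anak-anak)
            | [^\w\s]                       # any other single non-space character
            """,
            re.VERBOSE,
        )
        
        
        def tokenize(text):
            """Split `text` into word, number, date, email, URL, and symbol tokens."""
            return TOKEN_PATTERN.findall(text)
        
        
        print(tokenize("Kirim ke info@example.com sebelum 17/08/2020."))
        # ['Kirim', 'ke', 'info@example.com', 'sebelum', '17/08/2020', '.']
        ```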
            
        The **phrase tokenizer** splits text into tokens in which word tokens are phrases (single- or multi-word tokens).
        
            from nlp_id.tokenizer import PhraseTokenizer 
            tokenizer = PhraseTokenizer() 
            tokenizer.tokenize('Lionel Messi pergi ke pasar di area Jakarta Pusat.') 
            # ['Lionel Messi', 'pergi', 'ke', 'pasar', 'di', 'area', 'Jakarta Pusat', '.']
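        
        One simple way to obtain phrase tokens is to post-process standard tokens with a greedy longest match against a lexicon of known multi-word expressions. This is a swapped-in technique for illustration: `PHRASES` and `merge_phrases` are hypothetical, and `PhraseTokenizer` does not necessarily work this way.
        
        ```python
        # Hypothetical lexicon of multi-word expressions, stored as token tuples.
        PHRASES = {("Lionel", "Messi"), ("Jakarta", "Pusat"), ("Piala", "Dunia")}
        MAX_LEN = max(len(p) for p in PHRASES)
        
        
        def merge_phrases(tokens, phrases=PHRASES, max_len=MAX_LEN):
            """Greedily merge runs of tokens that form a known phrase."""
            out, i = [], 0
            while i < len(tokens):
                # Try the longest candidate first, down to two tokens.
                for n in range(min(max_len, len(tokens) - i), 1, -1):
                    if tuple(tokens[i:i + n]) in phrases:
                        out.append(" ".join(tokens[i:i + n]))
                        i += n
                        break
                else:
                    # No phrase starts here: keep the single token.
                    out.append(tokens[i])
                    i += 1
            return out
        
        
        tokens = ['Lionel', 'Messi', 'pergi', 'ke', 'pasar', 'di', 'area', 'Jakarta', 'Pusat', '.']
        print(merge_phrases(tokens))
        # ['Lionel Messi', 'pergi', 'ke', 'pasar', 'di', 'area', 'Jakarta Pusat', '.']
        ```
        
        Because the merge only needs a token list, it can sit on top of any word-level tokenizer; the trade-off is that phrases missing from the lexicon are never merged.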
            
        ### POS Tagger
        
        The POS tagger assigns a Part-Of-Speech tag to each token in a text.
        This repository provides two POS taggers, the **standard POS tagger** and the **phrase POS tagger**, exposed by the `PosTag` class through `get_pos_tag` and `get_phrase_tag` respectively.
        The tokens of the **standard POS tagger** are single words, while the tokens of the **phrase POS tagger** are phrases (single- or multi-word tokens).
        
            from nlp_id.postag import PosTag
            postagger = PosTag() 
            postagger.get_pos_tag('Lionel Messi pergi ke pasar di area Jakarta Pusat.') 
            # [('Lionel', 'NNP'), ('Messi', 'NNP'), ('pergi', 'VB'), ('ke', 'IN'), ('pasar', 'NN'), ('di', 'IN'), ('area', 'NN'),
            #  ('Jakarta', 'NNP'), ('Pusat', 'NNP'), ('.', 'SYM')]
            
            postagger.get_phrase_tag('Lionel Messi pergi ke pasar di area Jakarta Pusat.')
            # [('Lionel Messi', 'NP'), ('pergi', 'VP'), ('ke', 'IN'), ('pasar', 'NN'), ('di', 'IN'), ('area', 'NN'),
            #  ('Jakarta Pusat', 'NP'), ('.', 'SYM')]
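        
        Both taggers return a list of `(token, tag)` tuples, so the output can be post-processed with ordinary Python. As a small, hypothetical example, the helper below joins runs of consecutive `NNP` tokens in a list shaped like the `get_pos_tag` output above; `proper_noun_spans` is not part of `nlp_id`, and the hard-coded `tagged` list stands in for a real tagger call.
        
        ```python
        from itertools import groupby
        
        
        def proper_noun_spans(tagged):
            """Join each run of consecutive NNP tokens into a single string."""
            spans = []
            for tag, group in groupby(tagged, key=lambda pair: pair[1]):
                if tag == "NNP":
                    spans.append(" ".join(token for token, _ in group))
            return spans
        
        
        # Stand-in for postagger.get_pos_tag(...) on the example sentence.
        tagged = [('Lionel', 'NNP'), ('Messi', 'NNP'), ('pergi', 'VB'), ('ke', 'IN'),
                  ('pasar', 'NN'), ('di', 'IN'), ('area', 'NN'), ('Jakarta', 'NNP'),
                  ('Pusat', 'NNP'), ('.', 'SYM')]
        print(proper_noun_spans(tagged))  # ['Lionel Messi', 'Jakarta Pusat']
        ```
        
        For this sentence the runs coincide with the `NP` phrases returned by `get_phrase_tag`, but `NP` also covers phrases headed by common nouns, so the two are not equivalent in general.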
        
            
        Tagset used by the POS tagger:
        
        | No. | Tag | Description | Examples |
        |:-----:|:-----:|:--------|:------------|
        | 1 | ADV | Adverb. Includes adverbs, modals, and auxiliary verbs. | sangat, hanya, justru, boleh, harus, mesti |
        | 2 | CC | Coordinating conjunction. Links two or more syntactically equivalent parts of a sentence, such as independent clauses, phrases, or words. | dan, tetapi, atau |
        | 3 | DT | Determiner/article. A grammatical unit that limits the potential referent of a noun phrase; its basic role is to mark the noun phrase as definite or indefinite. | para, sang, si |
        | 4 | FW | Foreign word. A word that comes from a foreign language and is not yet included in the Indonesian dictionary. | online, e-commerce |
        | 5 | IN | Preposition. Links a word or phrase to the constituent preceding the preposition, forming a prepositional phrase. | dalam, dengan, di, ke |
        | 6 | JJ | Adjective. A word that describes, modifies, or specifies a property of the head noun of a phrase. | bersih, panjang, jauh, marah |
        | 7 | NEG | Negation. | tidak, belum, jangan |
        | 8 | NN | Noun. A word that refers to a person, animal, thing, concept, or notion. | meja, kursi, monyet, perkumpulan |
        | 9 | NNP | Proper noun. The specific name of a person, thing, place, event, etc. | Indonesia, Jakarta, Piala Dunia, Idul Fitri, Jokowi |
        | 10 | NUM | Number. Includes cardinal and ordinal numbers. | 9876, 2019, 0,5, empat |
        | 11 | PR | Pronoun. Includes personal and demonstrative pronouns. | saya, kami, kita, kalian, ini, itu |
        | 12 | RP | Particle. Emphasizes an interrogative, imperative, or declarative sentence. | pun, lah, kah |
        | 13 | SC | Subordinating conjunction. Links two or more clauses, one of which is a subordinate clause. | sejak, jika, seandainya, dengan, bahwa, yang |
        | 14 | SYM | Symbols and punctuation. | +, %, @ |
        | 15 | UH | Interjection. Expresses a feeling or state of mind and has no syntactic relation to other words. | ayo, mari, aduh |
        | 16 | VB | Verb. Includes transitive, intransitive, active, and passive verbs, as well as copulas. | tertidur, bekerja, membaca |
        | 17 | WH | Question word. | siapa, apa, kapan, bagaimana |
        | 18 | ADJP | Adjective phrase. A group of words headed by an adjective that describes a noun or pronoun. | sangat tinggi |
        | 19 | DP | Date phrase. A date written with whitespace between its parts. | 1 Januari 2020 |
        | 20 | NP | Noun phrase. A phrase whose head is a noun (or indefinite pronoun). | Jakarta Pusat, Lionel Messi |
        | 21 | NUMP | Number phrase. | 10 juta |
        | 22 | VP | Verb phrase. A syntactic unit composed of at least one verb and its dependents. | tidak makan |
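        
        If you need the tagset in code, for instance to label tagger output for display, a plain dictionary is enough. The sketch below copies the tags from the table with shortened descriptions and keeps word-level tags (rows 1 to 17) apart from phrase-level tags (rows 18 to 22); the names `WORD_TAGS`, `PHRASE_TAGS`, `TAGSET`, and `describe` are hypothetical and are not exported by `nlp_id`.
        
        ```python
        # Word-level tags (rows 1-17 of the table), with shortened descriptions.
        WORD_TAGS = {
            "ADV": "adverb", "CC": "coordinating conjunction", "DT": "determiner/article",
            "FW": "foreign word", "IN": "preposition", "JJ": "adjective", "NEG": "negation",
            "NN": "noun", "NNP": "proper noun", "NUM": "number", "PR": "pronoun",
            "RP": "particle", "SC": "subordinating conjunction", "SYM": "symbol/punctuation",
            "UH": "interjection", "VB": "verb", "WH": "question word",
        }
        
        # Phrase-level tags (rows 18-22 of the table).
        PHRASE_TAGS = {
            "ADJP": "adjective phrase", "DP": "date phrase", "NP": "noun phrase",
            "NUMP": "number phrase", "VP": "verb phrase",
        }
        
        TAGSET = {**WORD_TAGS, **PHRASE_TAGS}
        
        
        def describe(tagged):
            """Attach a readable description to each (token, tag) pair."""
            return [(token, tag, TAGSET.get(tag, "unknown tag")) for token, tag in tagged]
        
        
        print(describe([('Lionel Messi', 'NP'), ('ke', 'IN')]))
        # [('Lionel Messi', 'NP', 'noun phrase'), ('ke', 'IN', 'preposition')]
        ```
        
        Keeping the two groups separate is useful because the phrase tagger's output mixes them, as in `('ke', 'IN')` next to `('Lionel Messi', 'NP')`.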
Keywords: Indonesian,Bahasa,NLP
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
