Metadata-Version: 2.1
Name: rnc
Version: 0.2
Summary: API for National Russian Corpus
Home-page: https://github.com/FaustGoethe/RNC
Author: Kolobov Kirill, Python beginner
Author-email: alniconim@gmail.com
License: MIT
Description: ### API for [National Russian Corpus](http://ruscorpora.ru) 
        
        #### Installation
        ```bash
        pip install bs4 aiohttp lxml rnc
        ```
        
        #### Structure
        
        ---
        A Corpus object contains list of obtained examples.
        There're two types of example:
        ![](https://github.com/FaustGoethe/RNC/blob/master/docs/Two_ex_types.png?raw=true) <br> 
        * If `out` is `normal`, API uses normal example, which name is equal to the Corpus class name:
        ```python
        ru = MainCorpus(...)
        ru.request_examples()
        
        print(type(ru[0]))
        >>> MainExample
        ```
        * if `out` is `kwic`, API uses `KwicExample`.
        
        Example objects [properties](https://github.com/FaustGoethe/RNC/blob/master/docs/Examples.md)   
        
        #### Usage
        
        ---
        ```python
        import rnc
        
        ru = rnc.corpus_name(
            query='корпус', 
            p_count=5,
            file='filename.csv',
            **kwargs
        )
        
        ru.request_examples()
        ```
        * query – one str or dict with tags. Word to found, one should give the vocabulary form of it.
        * p_count – count of PAGES.
        * file – name of local csv file, optional.
        * kwargs – additional params.
        
        [Corpora](https://github.com/FaustGoethe/RNC/blob/master/docs/Corpora.md)
        
        ##### Full version of query
        ```python
        query = {
            'word1': {
                'gramm': 'acc', # grammar tags for lexgramm search
                'flags': 'bdot' # additional tags for lexgramm search
            },
            # you can get as a value one string or dict of params
            # params are: any name of dict key, name of tag (you can see them below)  
            'word2': {
                'gramm': { 
                    # the NAMES of these keys may be any
                    'pos (any name)': 'S' or ['S', 'A'], # one value or list of values,
                    'case (any name)': 'acc' or ['acc', 'nom'],
                },
                'flags': {}, # all the same to here
                # distance between first and second words
                'min': 1,  
                'max': 3
            },  
        }
        
        corp = rnc.corpus_name(
            query=query,
            p_count=5,
            file='filename.csv',
            **kwargs
        )
        corp.reques_examples()
        ```
        [Lexgramm search params](https://github.com/FaustGoethe/RNC/tree/master/docs/Lexgram%20search%20params)
        
        
        
        ##### Additional params
        These params are optional, you can ignore them. 
        ```python
        ru = rnc.corpus_name(
            query=query, 
            p_count=5,
            file='filename.csv',
            marker=str.upper, # function, with which found wordforms'll be marked
            dpp=5, # documents per page
            spd=1, # sentences per document
            text='lexgramm' or 'lexform', # way to search
            out='normal' or 'kwic', # output format
            kwsz=5, # if out=kwic, count of words in context
            sort='sort_key', # way to sort the results
            subcorpus='', # see below how to set it
            accent=0, # with accentology (1) or without (0), if it's available
        )
        ```
        
        ##### API can works with local base too
        ```python
        ru = rnc.corpus_name(file='local_database.csv') # it must exist
        print(ru)
        ```
        If the file exists, API works with it and you can't request new examples.
        
        ##### Working with corpora
        ```python
        corp = rnc.corpus_name(...) 
        ```
        * `corp.request_examples()` – request examples. 
        There's an exception if:
            * Data still exist. 
            * No results found.
            * Requested page doesn't exist (if there're 10 pages in the Corpus, but you've requested > 10).
            * There's a mistake in the request.
            * You have no access to Internet.
            * There's a problem while getting access to Corpus.
            * another problems...
        * `corp()` – the same as `request_examples()`.
        * `corp.data` – list of examples.
        * `corp.found_wordforms` – dict with found wordforms and their frequency.
        * `corp.dump()` – write two files: csv file with all data and json file with request params.
        * `corp.copy()` – create a copy.
        * `corp.shuffle()` – shuffle data.
        * `corp.pop(index)` – remove and return the example at the index from the data list.
        * `corp.sort(key=, reverse=)` – sort the list of examples. Here HTTP keys doesn't work.  
        * `corp.url` – URl to first page of the Corpus result.
        * `corp.open_url()` – open first page of the Corpus result.
        * `corp.add_pages()` – in developing...
        * `str(corp)` – str with info about Corpus, enumerated examples. 
        * `len(corp)` – count if examples.
        * `bool(corp)` – whether data exist.
        * `corp.dpp` or another request param.
        * `corp[index or slice]` – get element at the index or create new obj with sliced data:
        ```python
        first_ten = corp[:10]
        ``` 
        Compare corp length with length of another obj or int.  
        * `corp > `
        * `corp >= `
        * `corp < `
        * `corp <= `
        
        Also you can use cycle for. For example we want to see only left context (out=kwic) and source:
        ```python
        corp = rnc.corpus_name('корпус', 5, out='kwic', kwsz=7)
        corp.request_examples()
        for r in corp:
            print(r.left)
            print(r.src)
        ```
        
        
        #### ATTENTION
        * Don't forget to call this function
        ```python
        corp.request_examples()
        ```
        * If you've requested more than 10 pages, Corpus returns 429 error (Too many requests).
        For example requesting 100 pages you should wait about 3 minutes: 
        ![100 pages](https://github.com/FaustGoethe/RNC/blob/master/docs/100_pages.png?raw=true)
        * If you want to see messages like that:
        ```python
        rnc.corpora.stream_handler.setLevel(level='DEBUG')
        ```
        
        
        #### How to
        ##### How to set sort?
        [Here](https://github.com/FaustGoethe/RNC/blob/master/docs/HTTP%20params.md) you can find sort keys and their descriptions.
        
        
        ##### How to set subcorpus?
        There're default keys in rnc.Subcorpus.Person – Russian writers and poets: 
        * Pushkin
        * Dostoyevsky
        * TolstoyLN
        * Chekhov
        * Gogol
        * Turgenev
        
        Example:
        ```python
        ru = rnc.MainCorpus('нету', 1, subcorpus=rnc.Subcorpus.Person.Pushkin)
        ```
        
        **OR**
        
         
        ![1](https://raw.githubusercontent.com/FaustGoethe/RNC/master/docs/How%20to%20set%20subcorpus/1.png)
        ![2](https://raw.githubusercontent.com/FaustGoethe/RNC/master/docs/How%20to%20set%20subcorpus/2.png)
        ![3](https://raw.githubusercontent.com/FaustGoethe/RNC/master/docs/How%20to%20set%20subcorpus/3.png)
        ![4](https://raw.githubusercontent.com/FaustGoethe/RNC/master/docs/How%20to%20set%20subcorpus/4.png)
        
        ---
        [Documentation](https://github.com/FaustGoethe/RNC/tree/master/docs)
        
        ---
        If you found a bug or have an idea to improve the API write to me – alniconim@gmail.com.  
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
