Metadata-Version: 2.1
Name: ekorpkit
Version: 0.1.38.post0.dev31
Summary: eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization.
Home-page: https://github.com/entelecheia/ekorpkit
Author: Young Joon Lee
License: UNKNOWN
Description: # ekorpkit 【iːkɔːkɪt】 : **eKo**nomic **R**esearch **P**ython Tool**kit**
        
        [![PyPI version](https://badge.fury.io/py/ekorpkit.svg)](https://badge.fury.io/py/ekorpkit) [![Jupyter Book Badge](https://jupyterbook.org/en/stable/_images/badge.svg)](https://entelecheia.github.io/ekorpkit-book/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6497226.svg)](https://doi.org/10.5281/zenodo.6497226) [![release](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml) [![CodeQL](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml) [![test](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml) [![CircleCI](https://circleci.com/gh/entelecheia/ekorpkit/tree/main.svg?style=shield)](https://circleci.com/gh/entelecheia/ekorpkit/tree/main) [![codecov](https://codecov.io/gh/entelecheia/ekorpkit/branch/main/graph/badge.svg?token=8I4ORHRREL)](https://codecov.io/gh/entelecheia/ekorpkit) [![markdown-autodocs](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml)
        
        eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by [Hydra](https://hydra.cc/).
        
        ## Key features
        
        ### Easy Configuration
        
        - You can compose your configuration dynamically, enabling you to easily get the perfect configuration for each research. 
        - You can override everything from the command line, which makes experimentation fast, and removes the need to maintain multiple similar configuration files. 
        - With a help of the **eKonf** class, it is also easy to compose configurations in a jupyter notebook environment.
        
        ### No Boilerplate
        
        - eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code like command line flags, loading configuration files, logging etc.
        
        ### Workflows
        
        - A workflow is a configurable automated process that will run one or more jobs.
        - You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
        - You can have multiple workflows, each of which can perform a different set of tasks.
        
        ### Sharable and Reproducible
        
        - With eKorpkit, you can easily share your datasets and models.
        - Sharing configs along with datasets and models makes every research reproducible.
        - You can share each unit jobs or an entire workflow.
        
        ### Pluggable Architecture
        
        - eKorpkit has a pluggable architecture, enabling it to combine with your own implementation.
        
        ## [Tutorials](https://entelecheia.github.io/ekorpkit-book)
        
        Tutorials for [ekorpkit](https://github.com/entelecheia/ekorpkit) package can be found at https://entelecheia.github.io/ekorpkit-book/
        
        ## [Installation](https://entelecheia.github.io/ekorpkit-book/docs/basics/install.html)
        
        Install the latest version of ekorpkit:
        
        ```bash
        pip install ekorpkit
        ```
        
        To install all extra dependencies,
        
        ```bash
        pip install ekorpkit[all]
        ```
        
        ## [The eKorpkit Corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/corpus/README.md)
        
        The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.
        
        ![ekorpkit corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/figs/ekorpkit_corpus.png?raw=true)
        
        ## Citation
        
        ```tex
        @software{lee_2022_6497226,
          author       = {Young Joon Lee},
          title        = {eKorpkit: eKonomic Research Python Toolkit},
          month        = apr,
          year         = 2022,
          publisher    = {Zenodo},
          doi          = {10.5281/zenodo.6497226},
          url          = {https://doi.org/10.5281/zenodo.6497226}
        }
        ```
        
        ```tex
        @software{lee_2022_ekorpkit,
          author       = {Young Joon Lee},
          title        = {eKorpkit: eKonomic Research Python Toolkit},
          month        = apr,
          year         = 2022,
          publisher    = {GitHub},
          url          = {https://github.com/entelecheia/ekorpkit}
        }
        ```
        
        ## License
        
        - eKorpkit is licensed under the Creative Commons License(CCL) 4.0 [CC-BY](https://creativecommons.org/licenses/by/4.0). This license covers the eKorpkit package and all of its components.
        - Each corpus adheres to its own license policy. Please check the license of the corpus before using it!
        
Platform: UNKNOWN
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: all
Provides-Extra: tokenize
Provides-Extra: fugashi
Provides-Extra: mecab
Provides-Extra: mecab-python3
Provides-Extra: mecab-ko-dic
Provides-Extra: dataset
Provides-Extra: tokenize-en
Provides-Extra: pysbd
Provides-Extra: pynori
Provides-Extra: soynlp
Provides-Extra: nltk
Provides-Extra: sacremoses
Provides-Extra: emoji
Provides-Extra: tomotopy
Provides-Extra: topic
Provides-Extra: wordcloud
Provides-Extra: visualize
Provides-Extra: pyLDAvis
Provides-Extra: pyhwp
Provides-Extra: hwp
Provides-Extra: bok
Provides-Extra: parser
Provides-Extra: wikiextractor
Provides-Extra: wiki
Provides-Extra: namu-wiki-extractor
Provides-Extra: fomc
Provides-Extra: pdfplumber
Provides-Extra: mail
Provides-Extra: mail-parser
Provides-Extra: pubmed
Provides-Extra: pubmed_parser
Provides-Extra: jsonpath-ng
Provides-Extra: beautifulsoup4
Provides-Extra: html
Provides-Extra: html-to-json
Provides-Extra: fasttext-langdetect
Provides-Extra: langdetect
Provides-Extra: edgar
Provides-Extra: cssutils
Provides-Extra: transformers
Provides-Extra: accelerate
Provides-Extra: model
Provides-Extra: evaluate
Provides-Extra: simpletransformers
Provides-Extra: datasets
Provides-Extra: tensorflow
Provides-Extra: automl
Provides-Extra: flaml
Provides-Extra: yellowbrick
Provides-Extra: scipy
Provides-Extra: aws
Provides-Extra: cached-path
Provides-Extra: boto3
Provides-Extra: google
Provides-Extra: google-api-python-client
Provides-Extra: google-cloud-storage
Provides-Extra: huggingface-hub
Provides-Extra: hf
Provides-Extra: modin
Provides-Extra: ddbackend
Provides-Extra: ray
Provides-Extra: wget
Provides-Extra: fetch
Provides-Extra: py7zr
Provides-Extra: zstandard
Provides-Extra: pymongo
Provides-Extra: database
Provides-Extra: plotly
Provides-Extra: doc
Provides-Extra: kaleido
Provides-Extra: rich
Provides-Extra: matplotlib
Provides-Extra: seaborn
Provides-Extra: Pillow
Provides-Extra: tensorflow-datasets
Provides-Extra: lsh
Provides-Extra: orjson
Provides-Extra: datasketch
Provides-Extra: pysimdjson
Provides-Extra: p_tqdm
Provides-Extra: joblib
Provides-Extra: jsonlines
Provides-Extra: loky
Provides-Extra: pathos
Provides-Extra: guesslang
Provides-Extra: openpyxl
Provides-Extra: numba
Provides-Extra: fredapi
Provides-Extra: fred
Provides-Extra: nasdaq-data-link
Provides-Extra: nasdaq
Provides-Extra: statsmodels
Provides-Extra: stats
Provides-Extra: art
Provides-Extra: lpips
Provides-Extra: disco
Provides-Extra: timm
Provides-Extra: einops
Provides-Extra: jax
Provides-Extra: dalle-mini
Provides-Extra: flax
Provides-Extra: unidecode
Provides-Extra: opencv-python
Provides-Extra: label
Provides-Extra: cleanlab
Provides-Extra: snorkel
Provides-Extra: rubrix
Provides-Extra: exhaustive
