Metadata-Version: 2.1
Name: pycantonese
Version: 3.2.3
Summary: PyCantonese: Cantonese Linguistics and NLP in Python
Home-page: https://pycantonese.org
Author: Jackson L. Lee
Author-email: jacksonlunlee@gmail.com
License: MIT License
Download-URL: https://pypi.org/project/pycantonese/#files
Project-URL: Bug Tracker, https://github.com/jacksonllee/pycantonese/issues
Project-URL: Source Code, https://github.com/jacksonllee/pycantonese
Description: PyCantonese: Cantonese Linguistics and NLP in Python
        ====================================================
        
        
        
        Full Documentation: https://pycantonese.org
        
        |
        
        .. image:: https://badge.fury.io/py/pycantonese.svg
           :target: https://pypi.python.org/pypi/pycantonese
           :alt: PyPI version
        
        .. image:: https://img.shields.io/pypi/pyversions/pycantonese.svg
           :target: https://pypi.python.org/pypi/pycantonese
           :alt: Supported Python versions
        
        .. image:: https://circleci.com/gh/jacksonllee/pycantonese.svg?style=shield
           :target: https://circleci.com/gh/jacksonllee/pycantonese
           :alt: CircleCI Builds
        
        |
        
        .. start-sphinx-website-index-page
        
        PyCantonese is a Python library for Cantonese linguistics and natural language
        processing (NLP). Currently implemented features (more to come!):
        
        - Accessing and searching corpus data
        - Parsing and conversion tools for Jyutping romanization
        - Stop words
        - Word segmentation
        - Part-of-speech tagging
        
        .. _download_install:
        
        Download and Install
        --------------------
        
        To download and install the most recent stable version::
        
            $ pip install --upgrade pycantonese
        
        Ready for more?
        Check out the `Quickstart <https://pycantonese.org/quickstart.html>`_ page.
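        
        To confirm that the installation succeeded, a check along the following lines
        may help. This is a minimal sketch that uses only the Python standard library.
        The helper name ``installed_version`` is hypothetical and not part of
        PyCantonese, and the sketch assumes that the package exposes a ``__version__``
        attribute, falling back to ``"unknown"`` if it does not.
        
        .. code-block:: python
        
           import importlib
           import importlib.util
        
        
           def installed_version(package_name="pycantonese"):
               """Return the package's version string, or None if it is not importable."""
               # find_spec() looks the package up without importing it.
               if importlib.util.find_spec(package_name) is None:
                   return None
               module = importlib.import_module(package_name)
               # Assumption: the package exposes ``__version__``; otherwise report "unknown".
               return getattr(module, "__version__", "unknown")
        
        
           version = installed_version()
           if version is None:
               print("pycantonese is not installed; run: pip install --upgrade pycantonese")
           else:
               print("pycantonese version:", version)
        
        If the first message is printed, the ``pip`` command may have targeted a
        different Python environment from the one running the check.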
        
        Consulting
        ----------
        
        If your team would like professional assistance in using PyCantonese,
        technical consulting and training services are available.
        Please email `Jackson L. Lee <https://jacksonllee.com>`_.
        
        Support
        -------
        
        If you have found PyCantonese useful and would like to offer support,
        `buying me a coffee <https://www.buymeacoffee.com/pycantonese>`_ would go a long way!
        
        Links
        -----
        
        * Source code: https://github.com/jacksonllee/pycantonese
        * Bug tracker: https://github.com/jacksonllee/pycantonese/issues
        * Social media:
          `Facebook <https://www.facebook.com/pycantonese>`_
          and `Twitter <https://twitter.com/pycantonese>`_
        
        How to Cite
        -----------
        
        PyCantonese is authored and maintained by `Jackson L. Lee <https://jacksonllee.com>`_.
        
        A talk introducing PyCantonese:
        
        Lee, Jackson L. 2015. PyCantonese: Cantonese linguistic research in the age of big data.
        Talk at the Childhood Bilingualism Research Centre, Chinese University of Hong Kong. September 15, 2015.
        `Notes+slides <https://pycantonese.org/papers/Lee-pycantonese-2015.html>`_
        
        License
        -------
        
        MIT License. Please see ``LICENSE.txt`` in the GitHub source code for details.
        
        The HKCanCor dataset included in PyCantonese is substantially modified from
        its source in terms of format. The original dataset has a CC BY license.
        Please see ``pycantonese/data/hkcancor/README.md``
        in the GitHub source code for details.
        
        The rime-cantonese data (release 2020.09.09) is
        incorporated into PyCantonese for word segmentation and
        characters-to-Jyutping conversion.
        This data has a CC BY 4.0 license.
        Please see ``pycantonese/data/rime_cantonese/README.md``
        in the GitHub source code for details.
        
        Logo
        ----
        
        The PyCantonese logo is the Chinese character 粵 meaning Cantonese,
        with artistic design by albino.snowman (Instagram handle).
        
        Acknowledgments
        ---------------
        
        Wonderful resources with a permissive license that have been incorporated into PyCantonese:
        
        - HKCanCor
        - rime-cantonese
        
        Individuals who have contributed feedback, bug reports, etc.
        (in alphabetical order of last names):
        
        - @cathug
        - Litong Chen
        - Jenny Chim
        - @g-traveller
        - Rachel Han
        - Ryan Lai
        - Charles Lam
        - Chaak Ming Lau
        - Hill Ma
        - @richielo
        - @rylanchiu
        - Stephan Stiller
        - Tsz-Him Tsui
        - Robin Yuen
        
        .. end-sphinx-website-index-page
        
        Changelog
        ---------
        
        Please see ``CHANGELOG.md``.
        
        Setting up a Development Environment
        ------------------------------------
        
        The latest code under development is available on GitHub at
        `jacksonllee/pycantonese <https://github.com/jacksonllee/pycantonese>`_.
        You need to have `Git LFS <https://git-lfs.github.com/>`_ installed on your system.
        To obtain this version for experimental features or for development:
        
        .. code-block:: bash
        
           $ git clone https://github.com/jacksonllee/pycantonese.git
           $ cd pycantonese
           $ git lfs pull
           $ pip install -r dev-requirements.txt
           $ pip install -e .
        
        To run tests and style checks:
        
        .. code-block:: bash
        
           $ pytest -vv --doctest-modules --cov=pycantonese pycantonese docs/source
           $ flake8 pycantonese
           $ black --check pycantonese
        
        To build the documentation website files:
        
        .. code-block:: bash
        
           $ python build_docs.py
Keywords: computational linguistics,natural language processing,NLP,Cantonese,linguistics,corpora,speech,language,Chinese,Jyutping
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Chinese (Traditional)
Classifier: Natural Language :: Cantonese
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Human Machine Interfaces
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
