Metadata-Version: 1.1
Name: hanzidentifier-preview
Version: 1.1.0
Summary: Python module that identifies Chinese text as Simplified or Traditional.
Home-page: https://github.com/tsroten/hanzidentifier
Author: Thomas Roten
Author-email: thomas@roten.us
License: MIT
Description: ================
        Hanzi Identifier
        ================
        
        .. image:: https://badge.fury.io/py/hanzidentifier.png
            :target: http://badge.fury.io/py/hanzidentifier
        
        .. image:: https://travis-ci.org/tsroten/hanzidentifier.png?branch=develop
            :target: https://travis-ci.org/tsroten/hanzidentifier
        
        Hanzi Identifier is a simple Python module that identifies whether a string of
        text is written in Simplified or Traditional Chinese characters.
        
        * GitHub: https://github.com/tsroten/hanzidentifier
        * Free software: MIT license
        
        About
        -----
        
        Easy-to-use helper functions for identifying strings:
        
        .. code:: python
        
            >>> import hanzidentifier
            >>> hanzidentifier.has_chinese('Hello my name is John.')
            False
            >>> hanzidentifier.is_simplified('John说：你好！')
            True
            >>> hanzidentifier.is_traditional('John說：你好！')
            True
            >>> hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.')
            True
        
        The helper functions wrap ``hanzidentifier.identify``, which can also be called directly:
        
        .. code:: python
        
            >>> hanzidentifier.identify('Hello my name is Thomas.') is hanzidentifier.UNKNOWN
            True
            >>> hanzidentifier.identify('Thomas 说：你好！') is hanzidentifier.SIMPLIFIED
            True
            >>> hanzidentifier.identify('Thomas 說：你好！') is hanzidentifier.TRADITIONAL
            True
            >>> hanzidentifier.identify('你好！') is hanzidentifier.BOTH
            True
            >>> hanzidentifier.identify('Country in Simplified: 国家. Country in Traditional: 國家.') is hanzidentifier.MIXED
            True
        
        ``hanzidentifier.identify`` has five possible return values:
        
        * ``hanzidentifier.UNKNOWN``: there are no recognized Chinese characters in the string.
        * ``hanzidentifier.BOTH``: the string is compatible with both Simplified and Traditional character systems.
        * ``hanzidentifier.TRADITIONAL``: every recognized character in the string is a Traditional character, and at least one is not also a Simplified character.
        * ``hanzidentifier.SIMPLIFIED``: every recognized character in the string is a Simplified character, and at least one is not also a Traditional character.
        * ``hanzidentifier.MIXED``: the string contains both characters recognized only as Traditional and characters recognized only as Simplified.
        
        Hanzi Identifier uses the CC-CEDICT data provided by `Zhon <https://github.com/tsroten/zhon>`_ to identify Chinese characters.
        Characters that aren't found in CC-CEDICT are ignored when determining a string's identity.
        
        Because the Traditional and Simplified Chinese character systems overlap, a
        string containing Simplified characters could identify as
        ``hanzidentifier.SIMPLIFIED`` or ``hanzidentifier.BOTH``, depending on whether
        those characters are also Traditional characters.
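        
        As a rough illustration of the overlap described above, the following
        self-contained sketch classifies a string with plain set operations. It is
        not Hanzi Identifier's actual implementation: ``identify_sketch``, the string
        labels, and the tiny character sets are hypothetical stand-ins for the
        CC-CEDICT data, and the example assumes Python 3.
        
        .. code:: python
        
            # Hypothetical sample data; the real module relies on CC-CEDICT via Zhon.
            TRADITIONAL_CHARS = set('說國家你好')
            SIMPLIFIED_CHARS = set('说国家你好')
            # Characters that belong to both systems (here: 家, 你, 好).
            SHARED_CHARS = TRADITIONAL_CHARS & SIMPLIFIED_CHARS
        
        
            def identify_sketch(text):
                """Return a label naming the character system used by ``text``."""
                # Ignore anything that is not in the (sample) character data.
                known = set(text) & (TRADITIONAL_CHARS | SIMPLIFIED_CHARS)
                if not known:
                    return 'UNKNOWN'
                if known <= SHARED_CHARS:
                    return 'BOTH'
                if known <= TRADITIONAL_CHARS:
                    return 'TRADITIONAL'
                if known <= SIMPLIFIED_CHARS:
                    return 'SIMPLIFIED'
                # At least one Traditional-only and one Simplified-only character.
                return 'MIXED'
        
        
            print(identify_sketch('你好！'))            # BOTH
            print(identify_sketch('Thomas 说：你好！'))  # SIMPLIFIED
            print(identify_sketch('国家 and 國家'))      # MIXED
        
        The shared-character check comes first so that a string made up only of
        characters common to both systems is reported as compatible with both,
        rather than as one system or the other.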
        
        Hanzi Identifier's functions accept and return Unicode.
        
        Getting Started
        ---------------
        
        * Install Hanzi Identifier: ``$ pip install hanzidentifier``
        * Report bugs and ask questions via `GitHub Issues <https://github.com/tsroten/hanzidentifier/issues>`_
        * `Contribute features or bug fixes <https://github.com/tsroten/hanzidentifier/pulls>`_
        
        
        Change Log
        ----------
        
        v1.0.2 (2015-08-06)
        ~~~~~~~~~~~~~~~~~~~
        
        * New README format
        * Adds Travis CI support
        * Uses ``io.open()`` in ``setup.py``. Fixes #1.
        
        v1.0.1 (2014-04-14)
        ~~~~~~~~~~~~~~~~~~~
        
        * Fixes URL typo.
        
        v1.0 (2014-04-12)
        ~~~~~~~~~~~~~~~~~
        
        Version 1.0 merges some changes from Dragon Mapper. It is not backwards compatible with
        previous versions of Hanzi Identifier (for example, some of the constants were renamed).
        
        * Merges code from `Dragon Mapper <http://github.com/tsroten/dragonmapper>`_ project.
        * Adds tox support.
        
        v0.1 (2013-04-24)
        ~~~~~~~~~~~~~~~~~
        
        * Initial release.
        
Keywords: chinese,mandarin,hanzi,characters,simplified,traditional,identify,identification,cjk
Platform: any
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Text Processing :: Linguistic
