Metadata-Version: 1.1
Name: unicategories
Version: 0.0.5
Summary: Unicode category database
Home-page: https://github.com/ergoithz/unicategories
Author: Felipe A. Hernandez
Author-email: ergoithz@gmail.com
License: MIT
Download-URL: https://github.com/ergoithz/unicategories/archive/0.0.5.tar.gz
Description: unicategories
        =============
        
        Unicode category database, generated on setup.
        
        This module exposes a category dictionary containing `RangeGroup`s, an
        immutable iterable (based on tuple, with some useful methods) of (start, end)
        tuples being, like python's `range`, open at the end.
        
        This method have been chosen for memory efficiency, storing individually all
        characters on memory would take a lot of memory.
        
        RangeGroup class provides the following methods:
        
        * **RangeGroup.characters(self)**::
        
            Get iterator with all characters on this range group.
        
            :yields: iterator of characters (str of size 1)
            :ytype: str
        
        * **RangeGroup.codes(self)**::
        
            Get iterator for all unicode code points contained in this range group.
        
            :yields: iterator of character index (int)
            :ytype: int
        
        * **RangeGroup.has(self, character)**::
        
            Get if character (or character code point) is contained by any range on
            this range group.
        
            :param character: character or unicode code point to look for
            :type character: str or int
            :returns: True if character is contained by any range, False otherwise
            :rtype: bool
        
        Example
        -------
        
        .. code:: python
        
            from unicategories import categories
        
            upperchars = categories['Lu'].characters()  # iterator
            print('Unicode uppercase caracters are "%s"' % ''.join(upperchars))
            # Unicode uppercase caracters are "ABCDEF..."
        
        Unicode categories
        ------------------
        
        Taken from `wikipedia <https://en.wikipedia.org/wiki/Template:General_Category_(Unicode)>`_.
        
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Value  | Category Major, minor      | Basic type     | Character assigned     | Fixed                                                       | Remarks                                                                                                                   |
        +========+============================+================+========================+=============================================================+===========================================================================================================================+
        | Lu     | Letter, uppercase          | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Ll     | Letter, lowercase          | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Lt     | Letter, titlecase          | Graphic        | Character              |                                                             | Ligatures containing uppercase followed by lowercase letters (e.g., ǅ , ǈ , ǋ , and ǲ )                                |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Lm     | Letter, modifier           | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Lo     | Letter, other              | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Mn     | Mark, nonspacing           | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Mc     | Mark, spacing combining    | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Me     | Mark, enclosing            | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Nd     | Number, decimal digit      | Graphic        | Character              |                                                             | All these, and only these, have Numeric Type = De                                                                         |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Nl     | Number, letter             | Graphic        | Character              |                                                             | Numerals composed of letters or letterlike symbols (e.g., Roman numerals )                                                |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | No     | Number, other              | Graphic        | Character              |                                                             | E.g., vulgar fractions , superscript and subscript digits                                                                 |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Pc     | Punctuation, connector     | Graphic        | Character              |                                                             | Includes "_" underscore                                                                                                   |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Pd     | Punctuation, dash          | Graphic        | Character              |                                                             | Includes several hyphen characters                                                                                        |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Ps     | Punctuation, open          | Graphic        | Character              |                                                             | Opening bracket characters                                                                                                |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Pe     | Punctuation, close         | Graphic        | Character              |                                                             | Closing bracket characters                                                                                                |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Pi     | Punctuation, initial quote | Graphic        | Character              |                                                             | Opening quotation mark . Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Pf     | Punctuation, final quote   | Graphic        | Character              |                                                             | Closing quotation mark. May behave like Ps or Pe depending on usage                                                       |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Po     | Punctuation, other         | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Sm     | Symbol, math               | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Sc     | Symbol, currency           | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Sk     | Symbol, modifier           | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | So     | Symbol, other              | Graphic        | Character              |                                                             |                                                                                                                           |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Zs     | Separator, space           | Graphic        | Character              |                                                             | Includes the space, but not TAB , CR , or LF , which are Cc                                                               |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Zl     | Separator, line            | Format         | Character              |                                                             | Only U+2028 LINE SEPARATOR (LSEP)                                                                                         |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Zp     | Separator, paragraph       | Format         | Character              |                                                             | Only U+2029 PARAGRAPH SEPARATOR (PSEP)                                                                                    |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Cc     | Other, control             | Control        | Character              | Fixed 65                                                    | No name     , <control>                                                                                                   |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Cf     | Other, format              | Format         | Character              |                                                             | Includes the soft hyphen , control characters to support bi-directional text , and language tag characters                |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Cs     | Other, surrogate           | Surrogate      | Not (but abstract)     | Fixed 2,048                                                 | No name     , <surrogate>                                                                                                 |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Co     | Other, private use         | Private-use    | Not (but abstract)     | Fixed 137,468 total: 6,400 in BMP , 131,068 in Planes 15–16 | No name     , <private-use>                                                                                               |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Cn     | Other, not assigned        | Noncharacter   | Not                    | Fixed 66                                                    | No name     , <noncharacter>                                                                                              |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        | Cn     | Other, not assigned        | Reserved       | Not                    | Not fixed                                                   | No name     , <reserved>                                                                                                  |
        +--------+----------------------------+----------------+------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
        
Keywords: unicode
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
