Metadata-Version: 2.2
Name: zh-normalization
Version: 0.0.2
Summary: Chinese Text Normalization(for speech recognition and text to speech)
Home-page: https://github.com/shibing624/zh-normalization
Author: XuMing
Author-email: xuming624@qq.com
License: Apache 2.0
Keywords: TTS,ASR,text to speech,speech
Platform: Windows
Platform: Linux
Platform: Solaris
Platform: Mac OS-X
Platform: Unix
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: Chinese (Simplified)
Classifier: Natural Language :: Chinese (Traditional)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pypinyin
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: platform
Dynamic: requires-dist
Dynamic: summary

# zh-normalization
Chinese sentence NSW(Non-Standard-Word) Normalization

## Supported NSW (Non-Standard-Word) Normalization

|NSW type|raw|normalized|
|:--|:-|:-|
|serial number|电影中梁朝伟扮演的陈永仁的编号27149|电影中梁朝伟扮演的陈永仁的编号二七一四九|
|cardinal|这块黄金重达324.75克<br>我们班的最高总分为583分|这块黄金重达三百二十四点七五克<br>我们班的最高总分为五百八十三分|
|numeric range |12\~23<br>-1.5\~2|十二到二十三<br>负一点五到二|
|date|她出生于86年8月18日，她弟弟出生于1995年3月1日|她出生于八六年八月十八日， 她弟弟出生于一九九五年三月一日|
|time|等会请在12:05请通知我|等会请在十二点零五分请通知我
|temperature|今天的最低气温达到-10°C|今天的最低气温达到零下十度
|fraction|现场有7/12的观众投出了赞成票|现场有十二分之七的观众投出了赞成票|
|percentage|明天有62％的概率降雨|明天有百分之六十二的概率降雨|
|money|随便来几个价格12块5，34.5元，20.1万|随便来几个价格十二块五，三十四点五元，二十点一万|
|telephone|这是固话0421-33441122<br>这是手机+86 18544139121|这是固话零四二一三三四四一一二二<br>这是手机八六一八五四四一三九一二一|

## Usage
```shell
pip install zh-normalization
```

Run the following code to normalize the Chinese sentence:
```python
from zh_normalization import TextNormalizer

m = TextNormalizer()
text = "电影中梁朝伟扮演的陈永仁的编号27149!"
sents = m.normalize(text)
new_text = ''.join(sents)
print(new_text)
```

Output:
```shell
电影中梁朝伟扮演的陈永仁的编号二七幺四九!
```
## References
[Pull requests #658 of DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech/pull/658/files)
