Metadata-Version: 2.1
Name: functionwords
Version: 0.8
Summary: Extract curated Chinese and English function words from texts.
Home-page: https://github.com/Wang-Haining/functionwords
License: CC-BY-SA 4.0
Author: Haining Wang
Author-email: hw56@indiana.edu
Requires-Python: >=3.8,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Project-URL: Repository, https://github.com/Wang-Haining/functionwords
Description-Content-Type: text/markdown

# functionwords
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

The `functionwords` package provides **curated** Chinese and English function words.
It supports five function word lists, as listed below.
Chinese function words are only available in simplified form.


|`function_words_list`      |# of function words|Description|
|:----:|:----:|:----|
| `chinese_simplified_modern`      |  819   |compiled from the [dictionary][1]     |
| `chinese_classical_naive`        |  32    |harvested from the [platforms][2]     |
| `chinese_classical_comprehensive`|  466   |compiled from the [dictionary][3]     |
| `chinese_comprehensive`          |  1,122 | a combination of `chinese_simplified_modern`, `chinese_classical_naive`, and `chinese_classical_comprehensive`|
| `english`                        |  512   |found in  [software][4]               |
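
The three component lists sum to 819 + 32 + 466 = 1,317 entries, yet `chinese_comprehensive` has 1,122. That is consistent with overlapping entries being counted once. The sketch below shows such a deduplicating merge on toy data. The three short lists and the `combine()` helper are hypothetical and are not taken from the package, whose actual merging procedure may differ.

```python
# Toy stand-ins for the three component lists (hypothetical contents).
modern = ["的", "了", "而"]
classical_naive = ["之", "而"]
classical_comprehensive = ["之", "乎", "者"]


def combine(*word_lists):
    """Merge word lists, keeping the first occurrence of each word."""
    merged = {}
    for words in word_lists:
        for word in words:
            merged.setdefault(word, None)  # dict keys preserve insertion order
    return list(merged)


combined = combine(modern, classical_naive, classical_comprehensive)
total = len(modern) + len(classical_naive) + len(classical_comprehensive)
print(len(combined), "unique words out of", total)
```

Here "而" and "之" each appear in two lists, so the merged list is shorter than the sum of its parts.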

The `FunctionWords` class does the heavy lifting.
Instantiate it with the desired `function_words_list`.
An instance has two methods (`transform()` and `get_feature_names()`) and
three attributes (`function_words_list`, `function_words`, and `description`).

For more details, see the `description` attribute of a `FunctionWords` instance.

## Installation

```bash
pip install -U functionwords
```

## Getting started


```python
from functionwords import FunctionWords

raw = "The present King of Singapore is bald."

# to instantiate a FunctionWords instance
# `function_words_list` can be 'chinese_simplified_modern', 'chinese_classical_naive',
# 'chinese_classical_comprehensive', 'chinese_comprehensive', or 'english'
fw = FunctionWords(function_words_list='english')

# to count function words accordingly
# returns a list of counts
fw.transform(raw)

# to list all function words given `function_words_list`
# returns a list
fw.get_feature_names()

```
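
To make the shape of the result concrete, here is a minimal plain-Python sketch of function-word counting. It is not the package's implementation. The four-word list, the `count_function_words()` helper, the lowercasing, and the regex tokenization are all assumptions made for illustration, and `transform()` may tokenize and normalize differently.

```python
import re
from collections import Counter

# Hypothetical mini word list; the package's `english` list has 512 entries.
TOY_FUNCTION_WORDS = ["a", "is", "of", "the"]


def count_function_words(text, function_words):
    """Return one count per function word, in the order of `function_words`."""
    # Assumed tokenization: lowercase, then runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in function_words]


raw = "The present King of Singapore is bald."
counts = count_function_words(raw, TOY_FUNCTION_WORDS)
print(dict(zip(TOY_FUNCTION_WORDS, counts)))
# prints {'a': 0, 'is': 1, 'of': 1, 'the': 1}
```

Each position in the list of counts lines up with the word at the same position in the word list, which is how the outputs of `transform()` and `get_feature_names()` can be paired.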

## Requirements

Only Python 3.8+ is required.

## Important links

- Source code: https://github.com/Wang-Haining/functionwords
- Issue tracker: https://github.com/Wang-Haining/functionwords/issues

## Version

- Created on March 17, 2021. v.0.5, launch.
- Modified on Nov. 19, 2021. v.0.6, fix bugs in extracting Chinese ngram features.
- Modified on Jan. 03, 2022. v.0.7, add `chinese_comprehensive` feature set.
- Modified on Jan. 23, 2022. v.0.8, count Chinese ngram features finely.

## Licence

This package is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

## References
[1]: Ziqiang, W. (1998). Modern Chinese Dictionary of Function Words. Shanghai Dictionary Press.

[2]: https://baike.baidu.com/item/%E6%96%87%E8%A8%80%E8%99%9A%E8%AF%8D and 
https://zh.m.wikibooks.org/zh-hans/%E6%96%87%E8%A8%80/%E8%99%9B%E8%A9%9E

[3]: Hai, W., Changhai, Z., Shan, H., Keying, W. (1996). Classical Chinese Dictionary of Function Words. Peking University Press.

[4]: [Jstylo](https://github.com/psal/jstylo/blob/master/src/main/resources/edu/drexel/psal/resources/koppel_function_words.txt).


