# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['pysentimiento', 'pysentimiento.baselines', 'pysentimiento.lince']

package_data = \
{'': ['*']}

install_requires = \
['datasets>=1.13.3,<2.0.0',
 'emoji>=1.6.1,<2.0.0',
 'sklearn>=0.0,<0.1',
 'torch>=1.9.0,<2.0.0',
 'transformers==4.13']

setup_kwargs = {
    'name': 'pysentimiento',
    'version': '0.4.1',
    'description': 'A Transformer-based library for SocialNLP tasks',
    'long_description': '# pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks\n\n\n![Tests](https://github.com/finiteautomata/pysentimiento/workflows/run_tests/badge.svg)\n\nA Transformer-based library for SocialNLP tasks.\n\nCurrently supports:\n\n- Sentiment Analysis (Spanish, English)\n- Emotion Analysis (Spanish, English)\n- Hate Speech Detection (Spanish, English)\n- Named Entity Recognition (Spanish + English)\n- POS Tagging (Spanish + English)\n\n\nJust do `pip install pysentimiento` and start using it:\n\n## Getting Started\n\n[![Test it in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pysentimiento/pysentimiento/blob/master/notebooks/PySentimiento_Sentiment_Analysis_in_Spanish.ipynb)\n\n```python\nfrom pysentimiento import create_analyzer\nanalyzer = create_analyzer(task="sentiment", lang="es")\n\nanalyzer.predict("Qué gran jugador es Messi")\n# returns AnalyzerOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})\nanalyzer.predict("Esto es pésimo")\n# returns AnalyzerOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})\nanalyzer.predict("Qué es esto?")\n# returns AnalyzerOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})\n\nanalyzer.predict("jejeje no te creo mucho")\n# AnalyzerOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})\n"""\nEmotion Analysis in English\n"""\n\nanalyzer = create_analyzer(task="emotion", lang="en")\n\nemotion_analyzer.predict("yayyy")\n# returns AnalyzerOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})\nemotion_analyzer.predict("fuck off")\n# returns AnalyzerOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})\n\n"""\nHate Speech (misogyny & racism)\n"""\nhate_speech_analyzer = create_analyzer(task="hate_speech", lang="es")\n\nhate_speech_analyzer.predict("Esto es una mierda pero no es odio")\n# returns AnalyzerOutput(output=[], probas={hateful: 0.022, targeted: 0.009, aggressive: 0.018})\nhate_speech_analyzer.predict("Esto es odio porque los inmigrantes deben ser aniquilados")\n# returns AnalyzerOutput(output=[\'hateful\'], probas={hateful: 0.835, targeted: 0.008, aggressive: 0.476})\n\nhate_speech_analyzer.predict("Vaya guarra barata y de poca monta es XXXX!")\n# returns AnalyzerOutput(output=[\'hateful\', \'targeted\', \'aggressive\'], probas={hateful: 0.987, targeted: 0.978, aggressive: 0.969})\n```\n\nAlso, you might use pretrained models directly with [`transformers`](https://github.com/huggingface/transformers) library.\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\ntokenizer = AutoTokenizer.from_pretrained("pysentimiento/robertuito-sentiment-analysis")\n\nmodel = AutoModelForSequenceClassification.from_pretrained("pysentimiento/robertuito-sentiment-analysis")\n```\n\n## Preprocessing\n\n`pysentimiento` features a tweet preprocessor specially suited for tweet classification with transformer-based models.\n\n```python\nfrom pysentimiento.preprocessing import preprocess_tweet\n\n# Replaces user handles and URLs by special tokens\npreprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa") # "@usuario debería cambiar esto url"\n\n# Shortens repeated characters\npreprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2) # "no entiendo naadaa"\n\n# Normalizes laughters\npreprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj") # "jaja no lo puedo creer jaja"\n\n# Handles hashtags\npreprocess_tweet("esto es #UnaGenialidad")\n# "esto es una genialidad"\n\n# Handles emojis\npreprocess_tweet("🎉🎉", lang="en")\n# \'emoji party popper emoji emoji party popper emoji\'\n```\n\n## Trained models so far\n\nCheck [CLASSIFIERS.md](CLASSIFIERS.md) for details on the reported performances of each model.\n\n\n## Instructions for developers\n\n0. Clone and install\n\n```\ngit clone https://github.com/pysentimiento/pysentimiento\npip install poetry\npoetry shell\npoetry install\n```\n\n1. Get the data and put it under `data/`\n\nOpen an issue or email us if you are not able to get the it.\n\n2. Run script to train models\n\nCheck [TRAIN.md](TRAIN.md) for further information on how to train your models\n\n3. Upload models to Huggingface\'s Model Hub\n\nCheck ["Model sharing and upload"](https://huggingface.co/transformers/model_sharing.html) instructions in `huggingface` docs.\n\n## License\n\n`pysentimiento` is an open-source library. However, please be aware that models are trained with third-party datasets and are subject to their respective licenses, many of which are for non-commercial use\n\n1. [TASS Dataset license](http://tass.sepln.org/tass_data/download.php) (License for Sentiment Analysis in Spanish, Emotion Analysis in Spanish & English)\n2. [SEMEval 2017 Dataset license](https://www.dropbox.com/s/byzr8yoda6bua1b/2017_English_final.zip?file_subpath=%2F2017_English_final%2FDOWNLOAD%2FREADME.txt) (Sentiment Analysis in English)\n\n3. [LinCE Datasets](https://ritual.uh.edu/lince/datasets) (License for NER & POS tagging)\n\n## Suggestions and bugfixes\n\nPlease use the repository [issue tracker](https://github.com/pysentimiento/pysentimiento/issues) to point out bugs and make suggestions (new models, use another datasets, some other languages, etc)\n\n\n## Citation\n\nIf you use `pysentimiento` in your work, please cite [this paper](https://arxiv.org/abs/2106.09462)\n\n```\n@misc{perez2021pysentimiento,\n      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},\n      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},\n      year={2021},\n      eprint={2106.09462},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n\nAlso, pleace cite related pre-trained models and datasets for the specific models you use:\n\n```bibtex\n\n%%%%%%%%%%%%%%%%%%%%%%%%%%\n% Pretrained models      %\n%%%%%%%%%%%%%%%%%%%%%%%%%%\n% RoBERTuito\n@article{perez2021robertuito,\n  title={RoBERTuito: a pre-trained language model for social media text in Spanish},\n  author={P{\\\'e}rez, Juan Manuel and Furman, Dami{\\\'a}n A and Alemany, Laura Alonso and Luque, Franco},\n  journal={arXiv preprint arXiv:2111.09453},\n  year={2021}\n}\n% BETO\n@article{canete2020spanish,\n  title={Spanish pre-trained bert model and evaluation data},\n  author={Canete, Jos{\\\'e} and Chaperon, Gabriel and Fuentes, Rodrigo and Ho, Jou-Hui and Kang, Hojin and P{\\\'e}rez, Jorge},\n  journal={Pml4dc at iclr},\n  volume={2020},\n  pages={2020},\n  year={2020}\n}\n% BERTweet\n@inproceedings{nguyen2020bertweet,\n  title={BERTweet: A pre-trained language model for English Tweets},\n  author={Nguyen, Dat Quoc and Vu, Thanh and Nguyen, Anh Tuan},\n  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},\n  pages={9--14},\n  year={2020}\n}\n%%%%%%%%%%%%%%%%%%%%%%%%%%\n% Datasets               %\n%%%%%%%%%%%%%%%%%%%%%%%%%%\n% TASS 2020 (sentiment in Spanish)\n\n@article{garcia2020overview,\n  title={Overview of TASS 2020: introducing emotion detection},\n  author={Garc{\\\'\\i}a-Vegaa, Manuel and D{\\\'\\i}az-Galianoa, Manuel Carlos and Garc{\\\'\\i}a-Cumbrerasa, Miguel {\\\'A} and del Arcoa, Flor Miriam Plaza and Montejo-R{\\\'a}eza, Arturo and Jim{\\\'e}nez-Zafraa, Salud Mar{\\\'\\i}a and C{\\\'a}marab, Eugenio Mart{\\\'\\i}nez and Aguilarc, C{\\\'e}sar Antonio and Antonio, Marco and Cabezudod, Sobrevilla and others},\n  year={2020}\n}\n\n% EmoEvent (Emotion Analysis Spanish & English)\n\n@inproceedings{del2020emoevent,\n  title={EmoEvent: A multilingual emotion corpus based on different events},\n  author={del Arco, Flor Miriam Plaza and Strapparava, Carlo and Lopez, L Alfonso Urena and Mart{\\\'\\i}n-Valdivia, M Teresa},\n  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},\n  pages={1492--1498},\n  year={2020}\n}\n\n% Hate Speech Detection (Spanish & English)\n\n\n@inproceedings{hateval2019semeval,\n  title={SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter},\n  author={Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel, Francisco and Rosso, Paolo and Sanguinetti, Manuela},\n  booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019)},\n  year={2019},\n  publisher= {Association for Computational Linguistics}\n}\n% Sentiment Analysis in English\n\n@article{nakov2019semeval,\n  title={SemEval-2016 task 4: Sentiment analysis in Twitter},\n  author={Nakov, Preslav and Ritter, Alan and Rosenthal, Sara and Sebastiani, Fabrizio and Stoyanov, Veselin},\n  journal={arXiv preprint arXiv:1912.01973},\n  year={2019}\n}\n\n% LinCE (NER & POS Tagging)\n\n@inproceedings{aguilar2020lince,\n  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},\n  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},\n  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},\n  pages={1803--1813},\n  year={2020}\n}\n```\n',
    'author': 'Juan Manuel Pérez',
    'author_email': 'jmperez@dc.uba.ar',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/pysentimiento/pysentimiento/',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.7,<3.10',
}


setup(**setup_kwargs)
