Metadata-Version: 2.1
Name: classy-classification
Version: 0.2.1
Summary: This repository contains an easy and intuitive approach to few-shot text classification using sentence-transformers and spacy embeddings.
Home-page: https://github.com/davidberenstein1957/classy-classification
License: MIT
Keywords: spacy,rasa,few-shot classification,nlu,sentence-transformers
Author: David Berenstein
Author-email: david.berenstein@pandoraintelligence.com
Requires-Python: >=3.6,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Requires-Dist: scikit-learn (>=1.0.2,<2.0.0)
Requires-Dist: sentence-transformers (>=2.2.0,<3.0.0)
Requires-Dist: spacy[transformers] (>=3.2.2,<4.0.0)
Project-URL: Documentation, https://github.com/davidberenstein1957/classy-classification
Project-URL: Repository, https://github.com/davidberenstein1957/classy-classification
Description-Content-Type: text/markdown

# Classy few shot classification
This repository contains an easy and intuitive approach to few-shot text classification. 

# Why?
[Hugging Face](https://huggingface.co/) offers some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has [a nice approach](https://rasa.com/blog/rasa-nlu-in-depth-part-1-intent-classification/) for this, but it is too embedded in their codebase for easy use outside of Rasa/chatbots. Additionally, it made sense to integrate [sentence-transformers](https://github.com/UKPLab/sentence-transformers) instead of default [word embeddings](https://arxiv.org/abs/1301.3781). Finally, I decided to integrate with spaCy, since training a custom [spaCy TextCategorizer](https://spacy.io/api/textcategorizer) seems like a lot of hassle if you want something quick.

# Install
```
pip install classy-classification
```
# Quickstart
Take a look at the examples directory. 
## Some quick and dirty training data
``` 
training_data = {
    "politics": [
        "Putin orders troops into pro-Russian regions of eastern Ukraine.",
        "The president decided not to go through with his speech.",
        "There is much uncertainty surrounding the coming elections.",
        "Democrats are engaged in a ‘new politics of evasion’"
    ],
    "sports": [
        "The soccer team lost.",
        "The team won by two against zero.",
        "I love all sport.",
        "The olympics were amazing.",
        "Yesterday, the tennis players wrapped up wimbledon."
    ],
    "weather": [
        "It is going to be sunny outside.",
        "Heavy rainfall and wind during the afternoon.",
        "Clear skies in the morning, but mist in the evening.",
        "It is cold during the winter.",
        "There is going to be a storm with heavy rainfall."
    ]
}

validation_data = [
    "I am surely talking about politics.",
    "Sports is all you need.",
    "Weather is amazing."
]
```
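The training data is a plain dictionary that maps each label to a list of example sentences. As a minimal sketch of how such a dictionary can be checked and flattened into parallel lists (the form most classifiers expect), the helper below may be useful. `flatten_training_data` and `example_data` are hypothetical names used for illustration and are not part of this package.

```python
def flatten_training_data(data):
    """Turn {label: [sentence, ...]} into parallel lists of sentences and labels."""
    sentences, labels = [], []
    for label, examples in data.items():
        # A label without examples cannot be learned, so fail early.
        if not examples:
            raise ValueError(f"label {label!r} has no examples")
        sentences.extend(examples)
        labels.extend([label] * len(examples))
    return sentences, labels


# Small stand-in for the training_data dictionary shown above.
example_data = {
    "sports": ["The soccer team lost.", "I love all sport."],
    "weather": ["It is cold during the winter."],
}
sentences, labels = flatten_training_data(example_data)
print(labels)  # ['sports', 'sports', 'weather']
```

Running a check like this before training makes it easy to spot empty or badly unbalanced classes.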


## Using an individual sentence-transformer
```
from classy_classification import classyClassifier

classifier = classyClassifier(data=training_data)
classifier(validation_data[0])
classifier.pipe(validation_data)

# overwrite training data; new_training_data is a dict shaped like training_data
classifier.set_training_data(data=new_training_data)

# overwrite embedding model, see https://www.sbert.net/docs/pretrained_models.html
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")

# overwrite SVC config
classifier.set_svc(
    config={                              
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],                              
        "max_cross_validation_folds": 5
    }
)
```
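The SVC config above reads like the input to a cross-validated grid search. The sketch below shows one plausible way such a config could be translated into grid-search settings. It is an assumption about the intent of the keys, not a copy of the package's internals, and `build_grid_search_settings` is a hypothetical helper.

```python
from collections import Counter


def build_grid_search_settings(labels, config):
    """Translate a set_svc-style config into a parameter grid and a fold count."""
    # Every C value is tried with every kernel.
    param_grid = [{"C": config["C"], "kernel": config["kernels"]}]
    # Assumption: stratified folds need each class in every fold, so the fold
    # count is capped by the smallest class size, with a minimum of 2 folds.
    smallest_class = min(Counter(labels).values())
    folds = max(2, min(config["max_cross_validation_folds"], smallest_class))
    return param_grid, folds


# Label counts matching the training data above: 4 politics, 5 sports, 5 weather.
labels = ["politics"] * 4 + ["sports"] * 5 + ["weather"] * 5
config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernels": ["linear"],
    "max_cross_validation_folds": 5,
}
param_grid, folds = build_grid_search_settings(labels, config)
print(param_grid)
print(folds)  # 4, because the smallest class has four examples
```

With scikit-learn, values like these could be passed to `GridSearchCV(SVC(), param_grid, cv=folds)`. Whether the package caps the folds in exactly this way is not confirmed here.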

## External sentence-transformer within spaCy pipeline
```
import spacy

import classy_classification

nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": training_data}) # provide similar config as above
nlp(validation_data[0])._.cats
docs = list(nlp.pipe(validation_data))  # nlp.pipe is lazy; consume it to run the pipeline
```
## Internal spaCy word2vec embeddings
```
import spacy

import classy_classification

nlp = spacy.load("en_core_web_md") 
nlp.add_pipe("text_categorizer", config={"data": training_data, "model": "spacy"})  # use internal embeddings from the spaCy model
nlp(validation_data[0])._.cats
docs = list(nlp.pipe(validation_data))  # nlp.pipe is lazy; consume it to run the pipeline
```
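In both spaCy examples the predictions end up in `doc._.cats`. Assuming that attribute is a dictionary mapping each label to a score, a small helper can pick the most likely label. `top_category` and the scores below are made up for illustration.

```python
def top_category(cats):
    """Return the (label, score) pair with the highest score."""
    label = max(cats, key=cats.get)
    return label, cats[label]


# Made-up scores standing in for the dictionary stored on a processed doc.
example_cats = {"politics": 0.81, "sports": 0.11, "weather": 0.08}
print(top_category(example_cats))  # ('politics', 0.81)
```

Since `nlp.pipe` returns a generator, a loop such as `for doc in nlp.pipe(validation_data): print(top_category(doc._.cats))` would process the texts one by one.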

# Todo
- [ ] Look into a way to integrate spaCy trf models.


# Inspiration Drawn From
- [Scikit-learn](https://github.com/scikit-learn/scikit-learn)
- [Rasa NLU](https://github.com/RasaHQ/rasa) 
- [Sentence Transformers](https://github.com/UKPLab/sentence-transformers)
- [Spacy](https://github.com/explosion/spaCy)

