Metadata-Version: 2.1
Name: TweetBench-andrewmagill
Version: 0.0.2
Summary: TweetBench allows you to queue, run, and benchmark Tweet classification experiments with minimal configuration.
Home-page: https://git.txstate.edu/CS7311/a-m730/tree/master/Project/source
Author: Andrew Magill
Author-email: andrewmagill@txstate.edu
License: UNKNOWN
Description: # TweetBench
        
        TweetBench allows you to queue, run, and benchmark Tweet classification experiments with minimal configuration. TweetBench imports libraries and utilities, loads data, gathers experiments, executes each pipeline on five different train/test splits, evaluates and averages the scores, compares them to a baseline, and generates a submission file for you. All you have to do is add or modify the pipelines in the ```./expirements/``` folder (Jupyter Notebooks or Python scripts) with your parameters.
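        
        The evaluate-average-compare loop described above can be sketched roughly as follows. This is a minimal illustration, not TweetBench's actual code: the helper `mean_score`, the toy tweets, the choice of `StratifiedShuffleSplit`, and accuracy as the metric are all assumptions made for the example.
        
        ```python
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
        from sklearn.pipeline import Pipeline
        
        # Toy data, made up for illustration only (1 = conspiracy, 0 = other).
        texts = [
            "5g towers spread the virus", "5g causes covid", "5g radiation made us sick",
            "burn the 5g masts now", "5g is the real cause", "the virus travels over 5g",
            "wash your hands often", "stay home and stay safe", "new vaccine trial starts",
            "hospital opens new ward", "masks are required in shops", "cases fell this week",
        ]
        labels = [1] * 6 + [0] * 6
        
        
        def mean_score(pipeline, texts, labels, n_splits=5, seed=0):
            """Average accuracy of `pipeline` over `n_splits` random train/test splits.
        
            Hypothetical helper; the metric and split strategy are assumptions.
            """
            splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=0.25, random_state=seed)
            scores = cross_val_score(pipeline, texts, labels, cv=splitter, scoring="accuracy")
            return scores.mean()
        
        
        baseline = Pipeline([("vectorizer", CountVectorizer()), ("classifier", LogisticRegression())])
        candidate = Pipeline([
            ("vectorizer", CountVectorizer(ngram_range=(1, 2))),
            ("classifier", LogisticRegression(C=10.0)),
        ])
        
        # Score both pipelines on the same five splits and report the difference.
        baseline_score = mean_score(baseline, texts, labels)
        candidate_score = mean_score(candidate, texts, labels)
        print(f"baseline={baseline_score:.3f} candidate={candidate_score:.3f} "
              f"delta={candidate_score - baseline_score:+.3f}")
        ```
        
        Using the same `seed` for both calls keeps the splits identical, so the reported difference reflects the pipelines rather than the sampling.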
        
        ### Prerequisites
        
        * Python 3
        * setuptools
        * wheel
        * virtualenv (optional)
        
        ### Requirements (included in installation)
        
        * Jupyter Notebook
        * matplotlib
        * pandas
        * scikit-learn
        
        ### Installation
        
        Clone this repository
        
        > git clone git@git.txstate.edu:CS7311/a-m730.git # or https://git.txstate.edu/CS7311/a-m730.git
        
        > cd a-m730/Project/source
        
        It is recommended that you work in a virtual environment:
        
        > python -m virtualenv tweetbench_env && source tweetbench_env/bin/activate
        
        Run installation:
        
        > python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps TweetBench-andrewmagill
        
        ### Run Benchmark Pipeline
        
        Start Jupyter Notebook:
        
        > jupyter notebook
        
        Open and execute benchmark.ipynb to run the experiments contained in ```./expirements/```. To add a new experiment to the queue, add another Jupyter Notebook or Python script to the ```./expirements/``` directory and re-run the notebook. Results are displayed in the benchmark.ipynb notebook and written to the ```./output/``` directory.
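        
        As a rough picture of how a queue might be gathered from a folder, the sketch below lists notebooks and scripts by file name. The function `find_experiments` is hypothetical and only illustrates the idea; it is not taken from TweetBench, and it runs against a temporary directory rather than a real project folder.
        
        ```python
        from pathlib import Path
        import tempfile
        
        
        def find_experiments(folder):
            """Return notebook and script paths in `folder`, sorted by file name.
        
            Hypothetical helper: TweetBench's own discovery logic may differ.
            """
            folder = Path(folder)
            found = [p for p in folder.iterdir() if p.is_file() and p.suffix in {".ipynb", ".py"}]
            return sorted(found, key=lambda p: p.name)
        
        
        # Demonstrate on a throwaway directory with made-up file names.
        with tempfile.TemporaryDirectory() as tmp:
            for name in ("011.ipynb", "001.ipynb", "custom.py", "notes.txt"):
                (Path(tmp) / name).touch()
            names = [p.name for p in find_experiments(tmp)]
        
        print(names)  # ['001.ipynb', '011.ipynb', 'custom.py']
        ```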
        
        ### Creating New Experiments
        
        TweetBench will run pipelines found in any Jupyter Notebook or Python script (.py file) in the ```./expirements/``` folder. There is one requirement: for an experiment to run, it must be written as a [scikit-learn Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) (documentation and examples can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)).
        
        For example, the simplest possible pipeline, which should be run as the baseline for most of your experiments:
        
        > pipeline = Pipeline([('vectorizer', CountVectorizer()), ('classifier', LogisticRegression())])
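        
        For reference, here is a self-contained version of that baseline with the scikit-learn imports it needs, fitted on a handful of made-up tweets. The training texts and label names are assumptions for illustration; inside TweetBench the data loading and fitting are handled for you.
        
        ```python
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import Pipeline
        
        # Bag-of-words features followed by a logistic regression classifier.
        pipeline = Pipeline([
            ("vectorizer", CountVectorizer()),
            ("classifier", LogisticRegression()),
        ])
        
        # Toy training data, invented for this example only.
        train_texts = [
            "5g towers spread the virus",
            "5g causes covid",
            "wash your hands often",
            "stay home and stay safe",
        ]
        train_labels = ["conspiracy", "conspiracy", "non-conspiracy", "non-conspiracy"]
        
        pipeline.fit(train_texts, train_labels)
        predictions = pipeline.predict(["is 5g behind the virus"])
        print(predictions)
        ```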
        
        You may also want to include metadata for your experiment. This step is optional, but it is necessary if you want to designate a pipeline as your baseline for comparison. Your metadata variable must be named *META*, must be a Python dictionary, and may contain the following optional fields: **name**: str, **description**: str, **baseline**: bool. Your pipeline's parameters will be inserted into the metadata and output along with your experiment's evaluation scores and predictions.
        
        Example metadata:
        
        >META = {  
        >    "name": "fine-grained logreg text classifier",  
        >    "description": "Fine grained four classification: 5G Conspiracy, Other-Conspiracy, Non-conspiracy, Indeterminate",  
        >    "baseline": False  
        >}  
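        
        One way to picture how such a dictionary could be checked and combined with pipeline parameters is sketched below. The helper `build_record`, the `"parameters"` key, and the default of `False` for a missing **baseline** field are assumptions for illustration, not TweetBench's actual behaviour. For a real scikit-learn pipeline, the parameter dictionary could come from `pipeline.get_params()`.
        
        ```python
        # Optional META fields and their expected types, as listed in this README.
        ALLOWED_FIELDS = {"name": str, "description": str, "baseline": bool}
        
        
        def build_record(meta, params):
            """Validate optional META fields and attach pipeline parameters.
        
            Hypothetical helper; the record layout is an assumption.
            """
            record = {}
            for key, value in meta.items():
                if key not in ALLOWED_FIELDS:
                    raise KeyError(f"unexpected META field: {key!r}")
                if not isinstance(value, ALLOWED_FIELDS[key]):
                    raise TypeError(f"META[{key!r}] must be {ALLOWED_FIELDS[key].__name__}")
                record[key] = value
            record.setdefault("baseline", False)  # assumed default when the field is omitted
            record["parameters"] = dict(params)
            return record
        
        
        META = {
            "name": "baseline logreg text classifier",
            "description": "CountVectorizer followed by LogisticRegression",
            "baseline": True,
        }
        
        # A stand-in for the dictionary a pipeline's get_params() might return.
        record = build_record(META, {"classifier__C": 1.0})
        print(record["name"], record["baseline"], record["parameters"])
        ```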
        
        ### MediaEval 2020: FakeNews
        
        The code used for the coarse- and fine-grained text classification and for classification augmented by OCR on Tweet images, as well as Lia Nogueria's community labels, are included in the ```./expirements/``` folder.
        
        * [001 - Fine-grained text classification](https://git.txstate.edu/CS7311/a-m730/blob/master/Project/source/expirements/001.ipynb)
        * [002 - Fine-grained text classification with OCR](https://git.txstate.edu/CS7311/a-m730/blob/master/Project/source/expirements/002.ipynb)
        * [004 - Fine-grained text classification with community labels](https://git.txstate.edu/CS7311/a-m730/blob/master/Project/source/expirements/004.ipynb)
        * [011 - Coarse-grained text classification](https://git.txstate.edu/CS7311/a-m730/blob/master/Project/source/expirements/011.ipynb)
        * [012 - Coarse-grained text classification with OCR](https://git.txstate.edu/CS7311/a-m730/blob/master/Project/source/expirements/012.ipynb)
        * [014 - Coarse-grained text classification with community labels](https://git.txstate.edu/CS7311/a-m730/blob/master/Project/source/expirements/014.ipynb)
        
        **Note: These experiments are run in the [benchmark.ipynb]() notebook, which imports libraries, loads data, gathers pipelines, and outputs results.**
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
