# Hover

`Hover` is a machine learning helper library that enables smooth human supervision. In other words, it's an interface where you hover over your data and label it... differently. Think of driving a hovercraft compared with being on foot.

Hover caters to a variety of programming backgrounds: the "vanilla" use case should be friendly to a Python/Pandas beginner, and there are optional but powerful features to pick up as you go.

## The vanilla use case



## Docs

For documentation, please check out the mkdocs site.

## Dependencies

`./requirements.txt` is for developers.

Please refer to `hover/requirements.txt` for pip-installable dependencies.

Note that dev dependencies and package dependencies are kept in separate files.

## What is Hover?

Hover is designed for efficient machine teaching. Since efficiency can be measured differently in different contexts, Hover accommodates your needs to:

* cold-start a supervised model, with any amount of annotation (possibly none) available at first
* locate, investigate, and fix 'bad cases' of your model
* minimize costs (developer time) given desired gains (model performance)
* maximize gains (model performance) given budgeted costs (developer time)

The name comes from "patch a bunch of things together -- it works!" but it is also a reference to the World of Warcraft character [Hover](https://wowwiki.fandom.com/wiki/Hover), a powerful amalgamation of vastly different pieces, alluding to an almost-alive intelligence built of not-so-alive components.

Here's a list of what currently gets patched together:

* distant supervision `LabelingFunction`s based on [Snorkel](https://www.snorkel.org) but with additional attributes and functionality
* active learning, or just plain annotation, based on [Prodigy](https://prodi.gy) but with much more flexible model architecture and format
* prior knowledge integration compatible with any pre-trained embedding / language model
* interactive visualization based on [Bokeh](https://bokeh.org) specialized in data exploration, labeling function engineering, and neural net interpretation
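
To make "distant supervision" concrete: a labeling function maps a raw sample to a label, or abstains when its rule does not apply, and several noisy votes are then combined into one label. The plain-Python sketch below illustrates only that idea. It is not Snorkel's or Hover's API. The label constants and keyword rules are hypothetical, and the majority vote is a deliberately simple stand-in for Snorkel's `LabelModel`.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label scheme; -1 means "abstain" (no vote).
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

LabelingFn = Callable[[str], int]

def lf_contains_great(text: str) -> int:
    # Keyword rule: vote POSITIVE if "great" appears, otherwise abstain.
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_contains_refund(text: str) -> int:
    # Keyword rule: vote NEGATIVE if the text mentions a refund.
    return NEGATIVE if "refund" in text.lower() else ABSTAIN

def apply_lfs(texts: List[str], lfs: List[LabelingFn]) -> List[List[int]]:
    # Label matrix: one row per sample, one column per labeling function.
    return [[lf(text) for lf in lfs] for text in texts]

def majority_vote(row: List[int]) -> int:
    # Ignore abstentions; abstain if nobody voted or the top vote is tied.
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN
    return top

texts = ["Great product!", "I want a refund.", "It arrived on Tuesday."]
matrix = apply_lfs(texts, [lf_contains_great, lf_contains_refund])
print([majority_vote(row) for row in matrix])  # [1, 0, -1]
```

A learned label model improves on majority vote by weighting each function by its estimated accuracy, which is why the combination step is worth swapping out in practice.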

## Core Modules

* `hover.annotation` - for creating labeled data and accepting/rejecting `LabelingFunction`s.
* `hover.evaluation` - for assessing `LabelingFunction`s and samples for annotation.
* `hover.generation` - for creating `LabelingFunction`s and more.
* `hover.representation` - for representing collections of texts, vector transformations, and more.
* `hover.proposal` - for selecting `LabelingFunction`s and samples for annotation.
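
As an illustration of what "assessing a `LabelingFunction`" can mean, two common metrics are coverage (how often it votes at all) and accuracy on the samples it does vote on, measured against a small labeled dev set. The sketch below is a hypothetical plain-Python version of those metrics; `assess_lf`, the rule, and the dev data are made up for the example and do not reflect the actual `hover.evaluation` interface.

```python
from typing import Callable, Dict, List

ABSTAIN = -1  # hypothetical "no vote" label

def assess_lf(lf: Callable[[str], int], texts: List[str], gold: List[int]) -> Dict[str, float]:
    """Return coverage and accuracy-on-covered-samples for one labeling function."""
    votes = [lf(t) for t in texts]
    covered = [(v, y) for v, y in zip(votes, gold) if v != ABSTAIN]
    coverage = len(covered) / len(texts) if texts else 0.0
    # Accuracy is only measured where the function actually voted.
    accuracy = sum(v == y for v, y in covered) / len(covered) if covered else 0.0
    return {"coverage": coverage, "accuracy": accuracy}

def lf_mentions_refund(text: str) -> int:
    # Hypothetical rule: label 0 ("complaint") whenever "refund" appears.
    return 0 if "refund" in text.lower() else ABSTAIN

dev_texts = ["Refund please", "refund was fast, love it", "Nice", "Where is my refund?"]
dev_gold = [0, 1, 1, 0]
result = assess_lf(lf_mentions_refund, dev_texts, dev_gold)
print(result)  # coverage 0.75, accuracy 2/3
```

High coverage with low accuracy usually signals a rule that needs narrowing; high accuracy with tiny coverage signals a rule that contributes little.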

## High-level Usage

* `hover.workflow` contains the lowest-level objects that most users _have_ to interact with.
    - `hover.workflow.Dataset` helps you manage your train/dev/test sets of data.
        - if your goal is to produce supervised data to feed to your own model, this is what you will eventually export.
    - `hover.workflow.LabelingFunctionPopulation` maintains a healthy collection of both generated and user-defined labeling functions.
        - if your goal is to produce empirical rules or a Snorkel `LabelModel`, this is what you will eventually export.
    - `hover.workflow.Automated` puts `Dataset` and `LabelingFunctionPopulation` together, iterating and cross-checking back and forth.
        - it has a built-in model architecture, both for active learning and for establishing a baseline of model performance.
        - whether you care more about data or rules, this helps you greedily provide the most relevant supervision.
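
Active learning, as mentioned for `Automated`, typically means asking a human to label the samples the current model is least sure about. One common strategy is least-confidence sampling, sketched below in plain Python. This is a generic technique chosen for illustration, not necessarily what `Automated` does; the function name and the toy probabilities are hypothetical.

```python
from typing import List, Sequence

def least_confident(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k samples whose top predicted probability is lowest."""
    confidence = [max(row) for row in probs]
    # Sort sample indices by ascending confidence; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return ranked[:k]

# Hypothetical class probabilities from some model, one row per unlabeled sample.
probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.51, 0.49],
]
print(least_confident(probs, k=2))  # [3, 1]
```

Other selection rules (margin, entropy, or disagreement between labeling functions and the model) plug into the same spot: score every unlabeled sample, then send the top `k` to the annotator.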
        
## Advanced Usage

* Ideally, `Automated` should be a template with replaceable parts: it incorporates the core modules by default, but lets users extend or replace any of them as long as the interfaces are compatible.
    - [dev] make an abstract base class as a parent class of `Automated`, thinking carefully about customization.
        - for example, `Automated` currently uses Prodigy for annotation, but one could use `hover.annotation.PromptCollector` instead, especially if Prodigy is not available. These two annotators behave quite differently, so the base class needs to capture only what they have in common.
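
The abstract-base-class idea can be sketched with Python's `abc` module: the workflow depends only on a small abstract annotator interface, and each back end implements it. All names here (`BaseAnnotator`, `annotate`, `DictAnnotator`, `AutomatedSketch`) are hypothetical and do not describe the actual interfaces of Hover, Prodigy, or `PromptCollector`. The "common factor" assumed here is simply "given candidate samples, return labels for whichever ones got labeled".

```python
from abc import ABC, abstractmethod
from typing import Dict, List

class BaseAnnotator(ABC):
    """Hypothetical common interface shared by all annotation back ends."""

    @abstractmethod
    def annotate(self, samples: List[str]) -> Dict[int, int]:
        """Return {sample_index: label} for whichever samples got labeled."""

class DictAnnotator(BaseAnnotator):
    """Toy back end that looks answers up in a dict, standing in for a human."""

    def __init__(self, answers: Dict[str, int]):
        self.answers = answers

    def annotate(self, samples: List[str]) -> Dict[int, int]:
        # Skipping unknown samples mimics an annotator who rejects or ignores some items.
        return {i: self.answers[s] for i, s in enumerate(samples) if s in self.answers}

class AutomatedSketch:
    """Hypothetical workflow that relies only on the BaseAnnotator interface."""

    def __init__(self, annotator: BaseAnnotator):
        self.annotator = annotator
        self.labeled: Dict[str, int] = {}

    def step(self, candidates: List[str]) -> int:
        # Ask the annotator, store whatever came back, report how many were labeled.
        new = self.annotator.annotate(candidates)
        for i, label in new.items():
            self.labeled[candidates[i]] = label
        return len(new)

workflow = AutomatedSketch(DictAnnotator({"good": 1, "bad": 0}))
print(workflow.step(["good", "meh", "bad"]))  # 2
```

Because `AutomatedSketch` never touches back-end specifics, a Prodigy-backed or prompt-backed annotator could be swapped in, provided it returns the same shape of result.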



## Project layout

    mkdocs.yml    # The configuration file.
    docs/
        index.md  # The documentation homepage.
        ...       # Other markdown pages, images and other files.
