

<div class="row">
    <div class="col-md-4">
        <img src="resources/favicon_large.png" alt="logo" width="100" height="80" />
    </div>
    <div class="col-md-8">
        <h1>SPAR: Semantic Projection with Active Retrieval</h1>
    </div>
</div>

- [Overview](#overview)
- [Installation and Quick Start](#installation-and-quick-start)
- [Interface and Usage](#interface-and-usage)
- [Citation](#citation)

## Overview
SPAR is a Python NLP package that enables interactive quantification of text. With SPAR, you can quantify short documents (e.g., social media posts) on latent, continuous scales such as *`creativity`*, *`collaboration`*, and *`danger`* by measuring their semantic similarity to a set of example (seed) documents, such as _`'encourage new ways of thinking'`_, _`'working together to weather the storm'`_, and _`'we are facing a deadly virus.'`_

Main features:

* conducts domain-adaptive and few-shot measurements without requiring any model training or fine-tuning. It combines the idea of semantic projection ([Grand et al. 2022](https://www.nature.com/articles/s41562-022-01316-8), Authors 2023) with active semantic search, which allows users to find the most relevant domain-specific documents to define the scales.
* supports multiple state-of-the-art text embedding methods, such as [Sentence Transformers](https://www.sbert.net/docs/pretrained_models.html) or [OpenAI Text Embeddings API](https://platform.openai.com/docs/guides/embeddings).
* comes with a user-friendly web interface that makes defining scales and conducting measurements intuitive and accessible. 

SPAR is built on other open source packages such as [HuggingFace Transformers](https://huggingface.co/transformers/), [SentenceTransformers](https://github.com/UKPLab/sentence-transformers/), and [Gradio](https://gradio.app/). 

**Please note that the project is currently in a research preview (pre-alpha) stage**. To view the planned features for the project, please see the [Road Map](ROADMAP.MD).

## Installation and Quick Start
It is recommended to use a virtual environment and upgrade pip first with `pip install -U pip`. SPAR can be installed via pip: 

    pip install -U spar-measure

To launch SPAR, use the following command in your terminal:

    python -m spar_measure.gui

Then open the interactive app in your browser at `http://localhost:7860/`.

If a GPU is available, SPAR will use it to speed up embedding. If you prefer not to use a CUDA device, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string:

    CUDA_VISIBLE_DEVICES="" python -m spar_measure.gui

Alternatively, click the badge below to run SPAR in Google Colab:   
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ISR2022128/resources/example_colab.ipynb)

See the [full documentation](resources/Manual.MD) for other usage options.

## Interface and Usage

SPAR is based on the following four steps:

1. Upload a CSV file that contains the text to be measured and a document ID column. Select an embedding method and embed the documents.

<img src="resources/imgs/sc1.png" alt="sc1" style="width: 70%;"/>
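The input format for step 1 can be illustrated with a minimal sketch that reads a CSV with an ID column and a text column. The column names `doc_id` and `text` and the helper `load_documents` are hypothetical and are not part of SPAR's API; the embedding step itself is left out because it depends on the model you select.

```python
import csv
import io

# Hypothetical column names, chosen only for this illustration.
ID_COLUMN = "doc_id"
TEXT_COLUMN = "text"


def load_documents(csv_file, id_column=ID_COLUMN, text_column=TEXT_COLUMN):
    """Read {id: text} pairs from a CSV file object, skipping rows with empty text."""
    reader = csv.DictReader(csv_file)
    missing = {id_column, text_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    documents = {}
    for row in reader:
        text = (row[text_column] or "").strip()
        if text:
            documents[row[id_column]] = text
    return documents


# An in-memory CSV stands in for an uploaded file.
sample_csv = io.StringIO(
    "doc_id,text\n"
    "1,The smiling faces say it all.\n"
    "2,How do you prevent the spread of a deadly virus?\n"
    "3,\n"
)
documents = load_documents(sample_csv)
print(documents)
```

Checking the columns before embedding can catch a malformed file early, since embedding a large corpus may take a while.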

2. Define a set of dimensions and seed queries. For example:
   * `Creative`: *"We should adapt and innovate."*
   * `Positive emotion`: *"We are happy."*
   * `Danger`: *"It is dangerous."*
  
    Then, search for sentences in a corpus that are similar to the seed queries, and use the results to define dimensions **in the context** of the corpus. 
    For example:

   * `Creative`: 
     * *"Digital technology will play a huge role going forward."*
     * *"How do you adapt to these uncharted waters? "*
   * `Positive emotion`: 
     * *"The smiling faces say it all."*
     * *"A round of applause to all of our recent WaFd Foundation grant recipients!"*
   * `Danger`: *"How do you prevent the spread of a deadly virus?"*
    
    Enter them into the query box and click the "Embed Queries and Save Dimensions" button.
<img src="resources/imgs/sc2.png" alt="sc2" style="width: 70%;"/>
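The search in step 2 can be thought of as ranking corpus sentences by their similarity to a seed query. The sketch below assumes cosine similarity and uses toy three-dimensional vectors in place of real sentence embeddings; the function names are illustrative and are not taken from SPAR's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, corpus_vectors, top_k=2):
    """Return the top_k (score, sentence) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), sentence)
        for sentence, vector in corpus_vectors.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings produced by a real model.
corpus_vectors = {
    "Digital technology will play a huge role going forward.": [0.9, 0.1, 0.0],
    "The smiling faces say it all.": [0.1, 0.9, 0.1],
    "How do you prevent the spread of a deadly virus?": [0.0, 0.1, 0.9],
}
# Stands in for the embedding of the seed query "We should adapt and innovate."
seed_query_vector = [0.8, 0.2, 0.1]

results = search(seed_query_vector, corpus_vectors, top_k=2)
for score, sentence in results:
    print(f"{score:.3f}  {sentence}")
```

The highest-ranked sentences are candidates for defining the dimension in the context of the corpus.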

3. Define scales, each of which consists of one or more dimensions. For example:  
     * `Sentiment = Positive emotion - Negative emotion`
     * `Creativity = Creative` 

<img src="resources/imgs/sc3.png" alt="sc3" style="width: 70%;"/>
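One way to read a scale definition such as `Sentiment = Positive emotion - Negative emotion` is as a signed sum of dimension vectors. The sketch below assumes that each dimension is represented by the mean of its query embeddings, which may differ from how SPAR aggregates queries; the vectors are toy values and the helper names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def build_scale(dimensions, signed_terms):
    """Combine dimension vectors into one scale vector.

    signed_terms is a list of (sign, dimension_name) pairs, e.g.
    [(+1, "Positive emotion"), (-1, "Negative emotion")].
    """
    size = len(next(iter(dimensions.values())))
    scale = [0.0] * size
    for sign, name in signed_terms:
        scale = [s + sign * d for s, d in zip(scale, dimensions[name])]
    return scale


# Toy 2-d query embeddings for each dimension (assumed values).
dimensions = {
    "Positive emotion": mean_vector([[0.9, 0.1], [0.7, 0.3]]),
    "Negative emotion": mean_vector([[0.1, 0.9], [0.3, 0.7]]),
}

sentiment = build_scale(
    dimensions, [(+1, "Positive emotion"), (-1, "Negative emotion")]
)
print(sentiment)
```

A single-dimension scale such as `Creativity = Creative` is the special case with one positive term.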

4. Project the document embeddings onto the scale embeddings. A CSV file with the results can be downloaded.

<img src="resources/imgs/sc4.png" alt="sc4" style="width: 70%;"/>
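The projection in step 4 can be sketched as a scalar projection of each document embedding onto each scale vector, with the scores written out as CSV. This is an illustration under assumed toy vectors; SPAR's exact scoring formula and output columns may differ, and the names used here are hypothetical.

```python
import csv
import io
import math


def project(document_vector, scale_vector):
    """Scalar projection of a document vector onto a scale vector."""
    dot = sum(d * s for d, s in zip(document_vector, scale_vector))
    scale_norm = math.sqrt(sum(s * s for s in scale_vector))
    return dot / scale_norm


# Toy 2-d vectors standing in for document and scale embeddings.
document_vectors = {"1": [0.9, 0.1], "2": [0.2, 0.8]}
scales = {"Sentiment": [0.6, -0.6]}

# Score every document on every scale.
scores = {
    doc_id: {name: project(vector, scale) for name, scale in scales.items()}
    for doc_id, vector in document_vectors.items()
}

# Write the results to an in-memory CSV (a file path would work the same way).
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["doc_id", *scales])
for doc_id, row in scores.items():
    writer.writerow([doc_id, *(round(row[name], 4) for name in scales)])
csv_text = output.getvalue()
print(csv_text)
```

In this toy setup, document 1 points toward the positive end of the scale and receives a positive score, while document 2 receives a negative one.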

## Citation
If you find SPAR useful in your work, please cite the following paper:

* Blinded Authors (2023), A Computational Framework for Understanding Firm Communication During Disasters, Under Review at *Information Systems Research*.
