Metadata-Version: 2.1
Name: tmplot
Version: 0.0.8
Summary: Visualization of Topic Modeling Results
Home-page: https://github.com/maximtrp/tmplot
Author: maximtrp
Author-email: maximtrp@gmail.com
License: MIT
Keywords: data science,data analytics
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: General
Description-Content-Type: text/markdown
License-File: LICENSE

# tmplot

[![Documentation Status](https://readthedocs.org/projects/tmplot/badge/?version=latest)](https://tmplot.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://pepy.tech/badge/tmplot)](https://pepy.tech/project/tmplot)
![PyPI](https://img.shields.io/pypi/v/tmplot)
[![Issues](https://img.shields.io/github/issues/maximtrp/tmplot.svg)](https://github.com/maximtrp/tmplot/issues)

**tmplot** is a Python package for analysis and visualization of topic modeling results. It provides the interactive report interface that borrows much from LDAvis/pyLDAvis and builds upon it offering a number of metrics for calculating topic distances and a number of algorithms for calculating scatter coordinates of topics. It can be used to select closest and stable topics across multiple models.

![Plots](https://raw.githubusercontent.com/maximtrp/tmplot/main/images/topics_terms_plots.png)

## Features

- Supported models:

  - [tomotopy](https://bab2min.github.io/tomotopy/): `LDAModel`, `LLDAModel`, `CTModel`, `DMRModel`, `HDPModel`, `PTModel`, `SLDAModel`, `GDMRModel`
  - [gensim](https://radimrehurek.com/gensim/): `LdaModel`, `LdaMulticore`
  - [bitermplus](https://github.com/maximtrp/bitermplus): `BTM`

- Supported distance metrics:

  - Kullback-Leibler (symmetric and non-symmetric) divergence
  - Jenson-Shannon divergence
  - Jeffrey's divergence
  - Hellinger distance
  - Bhattacharyya distance
  - Total variation distance
  - Jaccard inversed index

- Supported [algorithms](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) for calculating topics scatter coordinates:

  - t-SNE
  - SpectralEmbedding
  - MDS
  - LocallyLinearEmbedding
  - Isomap

## Installation

The package can be installed from PyPi:

```bash
pip install tmplot
```

Or directly from this repository:

```bash
pip install git+https://github.com/maximtrp/tmplot.git
```

## Dependencies

- `numpy`
- `scipy`
- `scikit-learn`
- `pandas`
- `altair`
- `ipywidgets`
- `tomotopy`, `gensim`, and `bitermplus` (optional)

## Quick example

```python
# Importing packages
import tmplot as tmp
import pickle as pkl
import pandas as pd

# Reading a model from a file
with open('data/model.pkl', 'rb') as file:
    model = pkl.load(file)

# Reading documents from a file
docs = pd.read_csv('data/docs.txt.gz', header=None).values.ravel()

# Plotting topics as a scatter plot
topics_coords = tmp.prepare_coords(model)
tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label')

# Plotting terms probabilities
terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)
tmp.plot_terms(terms_probs)

# Running report interface
tmp.report(model, docs=docs, width=250)
```

You can find more examples in the [tutorial](https://tmplot.readthedocs.io/en/latest/tutorial.html).


