Metadata-Version: 2.1
Name: mesi
Version: 1.0.1
Summary: Measure similarity in a many-to-many fashion
Home-page: https://github.com/Michionlion/mesi
License: GPL-3.0-or-later
Keywords: diff,similarity,check
Author: Saejin Mahlau-Heinert
Author-email: saejinmh@gmail.com
Requires-Python: >=3.6.2,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Education :: Testing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Dist: polyleven (>=0.7,<0.8)
Requires-Dist: tabulate (>=0.8.9,<0.9.0)
Requires-Dist: textdistance[extras] (>=4.2.1,<5.0.0)
Requires-Dist: tqdm (>=4.62.3,<5.0.0)
Requires-Dist: typer[all] (>=0.4.0,<0.5.0)
Project-URL: Bug Tracker, https://github.com/Michionlion/mesi/issues
Project-URL: Documentation, https://github.com/Michionlion/mesi
Project-URL: Repository, https://github.com/Michionlion/mesi
Description-Content-Type: text/markdown

# Mesi

[![Lint and Test](https://github.com/Michionlion/mesi/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/Michionlion/mesi/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/Michionlion/mesi/branch/main/graph/badge.svg?token=RdzwvXDrxp)](https://codecov.io/gh/Michionlion/mesi)
[![PyPI](https://img.shields.io/pypi/v/mesi)](https://pypi.org/project/mesi)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/mesi)](https://pypi.org/project/mesi/#files)
[![License](https://img.shields.io/github/license/Michionlion/mesi.svg)](https://github.com/Michionlion/mesi/blob/master/LICENSE)

---

Mesi measures the similarity between long-form documents, such as Python source
code or technical writing, in a many-to-many fashion: every file is compared
against every other file. The output helps determine which files in a
collection are the most similar to each other.

## Installation

Python 3.9+ and [pipx](https://pypa.github.io/pipx/) are recommended, although
Python 3.6.2+ and [pip](https://pip.pypa.io/en/stable/) will also work.

```bash
pipx install mesi
```

If you'd like to test out Mesi before installing it, use the remote execution
feature of `pipx`, which will temporarily download Mesi and run it in an
isolated virtual environment.

```bash
pipx run mesi --help
```

## Usage

For a directory structure that looks like:

```text
projects
├── project-one
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── project-two
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│
```

where similarity should be measured between each project's
`deliverables/python_program.py` file, run the command:

```bash
mesi projects/*/deliverables/python_program.py
```

A lower distance in the produced table equates to a higher degree of similarity.
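
To illustrate what a many-to-many comparison involves, here is a minimal sketch
in plain Python. It is not Mesi's actual implementation: the `levenshtein` and
`pairwise_distances` helpers, the sample strings, and the `project-three` entry
are all hypothetical and exist only for this example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def pairwise_distances(documents: dict) -> list:
    """Compare every document against every other, closest pairs first."""
    rows = [
        (name_a, name_b, levenshtein(documents[name_a], documents[name_b]))
        for name_a, name_b in combinations(sorted(documents), 2)
    ]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical file contents standing in for each project's deliverable.
documents = {
    "project-one": "print('hello world')",
    "project-two": "print('hello, world')",
    "project-three": "import sys; sys.exit(0)",
}
for name_a, name_b, distance in pairwise_distances(documents):
    print(f"{name_a} <-> {name_b}: {distance}")
```

The pair with the smallest distance is listed first, so the most similar
documents appear at the top of the output.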

See the help menu (`mesi --help`) for additional options and configuration.

### Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi
supports all the
[algorithms](https://github.com/life4/textdistance#algorithms) provided by
[TextDistance](https://github.com/life4/textdistance). In general, `levenshtein`
is a safe choice, which is why it is the default.

## Bugs/Requests

Please use the [GitHub issue
tracker](https://github.com/Michionlion/mesi/issues) to submit bugs or request
new features, options, or algorithms.

## Dependencies

Mesi relies on two primary dependencies for text similarity calculation:
[polyleven](https://github.com/fujimotos/polyleven) and
[TextDistance](https://github.com/life4/textdistance). Polyleven is used by
default because its dedicated implementation of [Levenshtein
distance](https://en.wikipedia.org/wiki/Levenshtein_distance) is faster in most
situations. If a different edit distance algorithm is requested, TextDistance's
implementation is used instead.
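
The selection between the two libraries could be sketched roughly as follows.
This is an assumption about the general shape of such a dispatch, not Mesi's
source code: `select_distance` and `fallback_levenshtein` are hypothetical
names, and the pure-Python fallback exists only so the example runs without
polyleven installed.

```python
from typing import Callable


def fallback_levenshtein(a: str, b: str) -> int:
    """Pure-Python edit distance, used only if polyleven is not installed."""
    row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # `diagonal` holds the value up and to the left of the current cell.
        diagonal, row[0] = row[0], i
        for j, char_b in enumerate(b, start=1):
            diagonal, row[j] = row[j], min(
                row[j] + 1,                      # deletion
                row[j - 1] + 1,                  # insertion
                diagonal + (char_a != char_b),   # substitution
            )
    return row[-1]


def select_distance(algorithm: str = "levenshtein") -> Callable[[str, str], float]:
    """Return a distance function for the named algorithm (hypothetical helper)."""
    if algorithm == "levenshtein":
        try:
            from polyleven import levenshtein  # fast C implementation
            return levenshtein
        except ImportError:
            return fallback_levenshtein
    # Any other algorithm is looked up by name in TextDistance.
    import textdistance
    return getattr(textdistance, algorithm).distance


distance = select_distance()  # defaults to "levenshtein"
print(distance("kitten", "sitting"))  # 3
```

Looking algorithms up by name keeps the fast path for the default while leaving
every other TextDistance algorithm available without extra code.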

## License

Distributed under the terms of the [GPL v3](LICENSE) license (or any later
version), Mesi is free and open source software.

