# UNIQmin: An alignment-independent tool for the study of pathogen sequence diversity at any given rank of taxonomic lineage

[![DOI - 10.3390/biology10090853](https://img.shields.io/badge/DOI-10.3390%2Fbiology10090853-2ea44f)](https://doi.org/10.3390/biology10090853)
[![ChongLC - MinimalSetofViralPeptidome-UNIQmin](https://img.shields.io/static/v1?label=ChongLC&message=UNIQmin&color=blue&logo=github)](https://github.com/ChongLC/MinimalSetofViralPeptidome-UNIQmin)
[![stars - MinimalSetofViralPeptidome-UNIQmin](https://img.shields.io/github/stars/ChongLC/MinimalSetofViralPeptidome-UNIQmin?style=social)](https://github.com/ChongLC/MinimalSetofViralPeptidome-UNIQmin)
[![forks - MinimalSetofViralPeptidome-UNIQmin](https://img.shields.io/github/forks/ChongLC/MinimalSetofViralPeptidome-UNIQmin?style=social)](https://github.com/ChongLC/MinimalSetofViralPeptidome-UNIQmin)
[![License](https://img.shields.io/badge/License-MIT-blue)](#license)


### Brief Description
Sequence variation among pathogens, even of a single amino acid, can expand their host repertoire or enhance their infectivity. An alignment-independent approach offers an alternative way to study pathogen diversity because it does not require sequence conservation to perform comparative analyses. Herein, we present UNIQmin, a tool that uses an alignment-independent method to generate the minimal set of pathogen sequences, as a way to study their diversity across any rank of taxonomic lineage. The minimal set is the smallest possible number of sequences required to capture the entire repertoire of pathogen peptidome diversity (all unique *k*-mers) present in a sequence dataset.
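To make the idea of a minimal set more concrete, the sketch below treats each sequence as the set of its overlapping *k*-mers and greedily picks sequences until every unique *k*-mer in the dataset is covered. This is a hedged illustration only. It uses a textbook greedy set-cover heuristic, which approximates but does not guarantee the smallest possible set, and it is not UNIQmin's own algorithm. The function names `kmers` and `greedy_minimal_set` and the toy sequences are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of overlapping k-mers (window moved one residue at a time)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_minimal_set(seqs: Dict[str, str], k: int = 9) -> List[str]:
    """Hypothetical greedy set-cover sketch, NOT UNIQmin's implementation.

    Repeatedly selects the sequence covering the most not-yet-covered k-mers
    until every k-mer in the dataset is covered. Greedy set cover yields a
    small set, but it is not guaranteed to be the minimum.
    """
    kmer_sets = {name: kmers(s, k) for name, s in seqs.items()}
    # Every unique k-mer across the whole dataset must end up covered.
    uncovered: Set[str] = set().union(*kmer_sets.values())
    chosen: List[str] = []
    while uncovered:
        # max() returns the first maximal item, so iterating over sorted
        # names breaks ties alphabetically and keeps the result deterministic.
        best = max(sorted(kmer_sets), key=lambda n: len(kmer_sets[n] & uncovered))
        chosen.append(best)
        uncovered -= kmer_sets[best]
    return chosen


if __name__ == "__main__":
    toy = {
        "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
        "seqB": "MKTAYIAKQR",  # both of its 9-mers also occur in seqA
        "seqC": "QISFVKSHFW",  # contributes one 9-mer absent from seqA
    }
    print(greedy_minimal_set(toy, k=9))  # ['seqA', 'seqC']
```

Here `seqB` is redundant at *k* = 9 because all of its nonamers already occur in `seqA`, so it is left out of the set.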

### Installation
`pip install uniqmin`

### Usage
`uniqmin [-h] [-i INPUT] [-o OUTPUT] [-k [KMERLENGTH]] [-cpu [CPUSIZE]]`

For example, the following command applies UNIQmin to a sample input file (`exampleinput.fas`) to generate a minimal set in an output directory named `example`, using a *k*-mer window size of nine (nonamers) and 14 CPU cores.

`uniqmin -i exampleinput.fas -o example -k 9 -cpu 14`

### Command-line Arguments
| Argument | Parameter             | Type    | Required | Default | Description                                |
|----------|-----------------------|---------|----------|---------|--------------------------------------------|
| -h       | help                  | N/A     | FALSE    | N/A     | Show this help message and exit            |
| -i       | sequence input file   | String  | TRUE     | N/A     | Path of the input file (in FASTA format)   |
| -o       | output directory name | String  | TRUE     | N/A     | Path of the output directory to be created |
| -k       | *k*-mer window size   | Integer | FALSE    | 9       | The length of *k*-mers to be used          |
| -cpu     | number of CPU cores   | Integer | FALSE    | 14      | The number of CPU cores to be used         |
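As an illustration of what the `-k` window size controls, the sketch below splits a sequence into overlapping *k*-mers. It assumes the conventional definition, a window of length *k* moved one residue at a time, so a sequence of length *L* yields *L* − *k* + 1 *k*-mers and a sequence shorter than *k* yields none. The helper `sliding_kmers` and the peptide are hypothetical and are not part of UNIQmin.

```python
from typing import List


def sliding_kmers(seq: str, k: int = 9) -> List[str]:
    """Return all overlapping k-mers of seq in order.

    Hypothetical helper: assumes a window of length k moved one residue
    at a time, the conventional meaning of a k-mer window size.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A sequence of length L yields L - k + 1 windows (none if L < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    peptide = "MKTAYIAKQRQ"  # made-up 11-residue sequence
    print(sliding_kmers(peptide, k=9))
    # ['MKTAYIAKQ', 'KTAYIAKQR', 'TAYIAKQRQ']
    print(len(sliding_kmers(peptide, k=5)))
    # 7
```

A smaller *k* produces more, shorter *k*-mers per sequence, while a larger *k* produces fewer, longer ones.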

For more information, please visit the [UNIQmin GitHub page](https://github.com/ChongLC/MinimalSetofViralPeptidome-UNIQmin).

### License
© 2021 Chong LC

This repository is licensed under the MIT license.
See LICENSE for details.