Metadata-Version: 2.1
Name: pyabpoa
Version: 1.3.0.0
Summary: pyabpoa: SIMD-based partial order alignment using adaptive band
Home-page: https://github.com/yangao07/abPOA
Author: Yan Gao
Author-email: gaoy286@mail.sysu.edu.cn
License: MIT
Keywords: multiple-sequence-alignment  partial-order-graph-alignment
Platform: UNKNOWN
Description-Content-Type: text/markdown
License-File: LICENSE

# pyabpoa: abPOA Python interface
## Introduction
pyabpoa provides an easy-to-use interface to [abPOA](https://github.com/yangao07/abPOA), it contains all the APIs that can be used to perform MSA for a set of sequences and consensus calling from the final alignment graph.

## Installation

### Install pyabpoa with pip

pyabpoa can be installed with pip:

```
pip install pyabpoa
```

### Install pyabpoa from source
Alternatively, you can install pyabpoa from source (cython is required):
```
git clone --recursive https://github.com/yangao07/abPOA.git
cd abPOA
make install_py
```

## Examples
The following code illustrates how to use pyabpoa.
```
import pyabpoa as pa
a = pa.msa_aligner()
seqs=[
'CCGAAGA',
'CCGAACTCGA',
'CCCGGAAGA',
'CCGAAGA'
]
res=a.msa(seqs, out_cons=True, out_msa=True, out_pog='pog.png', incr_fn='') # perform multiple sequence alignment 
                                                                # generate a figure of alignment graph to pog.png

for seq in res.cons_seq:
    print(seq)  # print consensus sequence

res.print_msa() # print row-column multiple sequence alignment in PIR format
```
You can also try the example script provided in the source folder:
```
python ./python/example.py
```


## APIs

### Class pyabpoa.msa_aligner
```
pyabpoa.msa_aligner(aln_mode='g', ...)
```
This constructs a multiple sequence alignment handler of pyabpoa, it accepts the following arguments:

* **aln_mode**: alignment mode. 'g': global, 'l': local, 'e': extension; default: **'g'**
* **is_aa**: input is amino acid sequence; default: **False**
* **match**: match score; default: **2**
* **mismatch**: match penaty; default: **4**
* **score_matrix**: scoring matrix file, **match** and **mismatch** are not used when **score_matrix** is used; default: **''**
* **gap_open1**: first gap opening penalty; default: **4**
* **gap_ext1**: first gap extension penalty; default: **2**
* **gap_open2**: second gap opening penalty; default: **24**
* **gap_ext2**: second gap extension penalty; default: **1**
* **extra_b**: first adaptive banding paremeter; set as < 0 to disable adaptive banded DP; default: **10**
* **extra_f**: second adaptive banding paremete; the number of extra bases added on both sites of the band is *b+f\*L*, where *L* is the length of the aligned sequence; default : **0.01**
* **is_diploid**: set as 1 if input is diploid datal default: **0**
* **min_freq**: minimum frequency of each consensus to output for diploid datal default: **0.3**


The `msa_aligner` handler provides one method which performs multiple sequence alignment and takes four arguments:
```
pyabpoa.msa_aligner.msa(seqs, out_cons, out_msa, out_pog='', incr_fn='')
```

* **seqs**: a list variable containing a set of input sequences; **positional**
* **out_cons**: a bool variable to ask pyabpoa to generate consensus sequence; **positional**
* **out_msa**: a bool variable to ask pyabpoa to generate RC-MSA; **positional**
* **out_pog**: name of a file (`.png` or `.pdf`) to store the plot of the final alignment graph; **optional**, default: **''**
* **incr_fn**: name of an existing graph (GFA) or MSA (FASTA) file, incrementally align sequence to this graph/MSA; **optional**, default: **''**

### Class pyabpoa.msa_result
```
pyabpoa.msa_result(seq_n, cons_n, cons_len, ...)
```
This class describes the information of the generated consensus sequence and the RC-MSA. The returned result of `pyabpoa.msa_aligner.msa()` is an object of this class that has the following properties:

* **seq_n**: number of input aligned sequences
* **cons_n**: number of generated consensus sequences (generally 1, could be 2 if input is specified as diploid)
* **cons_len**: an array of consensus sequence length(s)
* **cons_seq**: an array of consensus sequence(s)
* **msa_len**: size of each row in the RC-MSA
* **msa_seq**: an array containing `seq_n` strings that demonstrates the RC-MSA, each consisting of one input sequence and several `-` indicating the alignment gaps. 

`pyabpoa.msa_result()` has a function of `print_msa` which prints the RC-MSA to screen.

```
pyabpoa.msa_result().print_msa()
```


