Metadata-Version: 2.1
Name: readpaf
Version: 0.0.6a3
Summary: minimap2 PAF file reader
Home-page: https://github.com/alexomics/read-paf
Author: Alexander Payne
Author-email: alexander.payne@nottingham.ac.uk
License: MIT
Project-URL: Bug Tracker, https://github.com/alexomics/read-paf/issues
Project-URL: Source Code, https://github.com/alexomics/read-paf
Description: readpaf
        =======
        [![Build](https://github.com/alexomics/read-paf/actions/workflows/main.yml/badge.svg)](https://github.com/alexomics/read-paf/actions/workflows/main.yml)
        [![PyPI](https://img.shields.io/pypi/v/readpaf)](https://pypi.org/p/readpaf)
        
        readpaf is a fast parser for [minimap2](https://github.com/lh3/minimap2) PAF (**P**airwise m**A**pping **F**ormat) files. It is
        pure Python with no dependencies (unless you want a DataFrame).
        
        
        Installation
        ===
        ```bash
        pip install readpaf
        ```
        
        <details>
          <summary>Other install methods</summary>
            
           ### Install with pandas:
           This is only needed if you want to manipulate the PAF file as a `pandas.DataFrame`.
        
           ```bash
           pip install readpaf[pandas]
           ```
        
           ### Direct download:
           using cURL
        
           ```bash
           curl -O https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
           ```
        
           or wget
        
           ```bash
           wget https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
           ```
        </details>
        
        Usage
        ===
        
        readpaf has a single user-facing function, `parse_paf`, which accepts a file-like object. This
        is any Python object with a file-oriented API (`sys.stdin`, `stdout` from a subprocess,
        `io.StringIO`, or files opened with `gzip.open` or `open`).
        
        The following script demonstrates how minimap2 output can be piped into readpaf:
        
        ```python
        from readpaf import parse_paf
        from sys import stdin
        
        for record in parse_paf(stdin):
            print(record.query_name, record.target_name)
        ```
        
        readpaf can also generate a pandas DataFrame:
        
        ```python
        from readpaf import parse_paf
        
        with open("test.paf", "r") as handle:
            df = parse_paf(handle, dataframe=True)
        
        ```
        
        Functions
        ===
        
        readpaf has a single user function:
        
        parse_paf
        ---
        
        ```python
        parse_paf(file_like=file_handle, fields=list, dataframe=bool)
        ```
        Parameters:
        
         - **file_like:** A file-like object, such as `sys.stdin`, a file handle from `open`, or an `io.StringIO` object
         - **fields:** A list of 13 field names to use for the PAF file, default:
            ```python
            "query_name", "query_length", "query_start", "query_end", "strand",
            "target_name", "target_length", "target_start", "target_end",
            "residue_matches", "alignment_block_length", "mapping_quality", "tags"
            ```
            These are based on the [PAF specification](https://github.com/lh3/miniasm/blob/master/PAF.md).
         - **dataframe:** bool, if `True`, return a `pandas.DataFrame` with the tags expanded into separate Series
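        
        As a rough illustration of how the 13 default field names line up with the columns of a PAF line, the
        sketch below splits one line by hand. It is not readpaf's implementation: `Record`, `split_paf_line`, and
        the sample line are made up for this example, and it assumes that the first 12 columns are the fixed PAF
        columns and that everything after them is a tag.
        
        ```python
        from collections import namedtuple
        
        # Default field names as listed above.
        FIELDS = [
            "query_name", "query_length", "query_start", "query_end", "strand",
            "target_name", "target_length", "target_start", "target_end",
            "residue_matches", "alignment_block_length", "mapping_quality", "tags",
        ]
        # 0-based positions of the integer columns (PAF columns 2-4 and 7-12).
        INT_COLUMNS = {1, 2, 3, 6, 7, 8, 9, 10, 11}
        
        Record = namedtuple("Record", FIELDS)  # hypothetical name, for illustration
        
        
        def split_paf_line(line):
            """Split one PAF line into the 12 fixed columns plus the raw tag strings."""
            columns = line.rstrip("\n").split("\t")
            fixed = [
                int(value) if index in INT_COLUMNS else value
                for index, value in enumerate(columns[:12])
            ]
            # Everything after column 12 is kept as a list of raw "name:type:value" tags.
            return Record(*fixed, columns[12:])
        
        
        line = "read1\t1000\t10\t990\t+\tchr1\t50000\t2000\t2980\t950\t980\t60\ttp:A:P\tcm:i:90"
        record = split_paf_line(line)
        print(record.query_name, record.target_name, record.mapping_quality)
        print(record.tags)
        ```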
         
        If used as an iterator, each object returned is a named tuple representing a single line in the PAF file.
        Each named tuple has field names as specified by the `fields` parameter. The SAM-like tags are converted into
        their correct types and stored in a dictionary. When `print` or `str` is called on a `PAF` record (named tuple),
        a formatted PAF string is returned, which is useful for writing records to a file. The `PAF` record also has a
        method, `blast_identity`, which calculates the [BLAST identity](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity) for
        that record.
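        
        The identity calculation can be sketched in a few lines. This assumes the method follows the definition in
        the linked post, that is, the number of residue matches divided by the alignment block length (PAF columns
        10 and 11). The standalone function and the numbers below are illustrative only and are not taken from
        readpaf.
        
        ```python
        def blast_identity(residue_matches, alignment_block_length):
            """Fraction of alignment columns that are matches (PAF column 10 / column 11)."""
            return residue_matches / alignment_block_length
        
        
        # Hypothetical record: 950 matching residues over a 980-column alignment block.
        identity = blast_identity(950, 980)
        print(round(identity, 4))
        ```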
        
        If used to generate a pandas DataFrame, then each row represents a line in the PAF file and the SAM-like tags 
        are expanded into individual series.
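        
        The tag handling can be pictured with a small pure-Python sketch: typed conversion of SAM-like tags into a
        dictionary, followed by promoting each tag to its own key, much like a column in a DataFrame row.
        `parse_tags`, `expand_row`, and the sample values are hypothetical, and readpaf's own conversion rules may
        differ (here only the `i` and `f` types are converted; other types stay strings).
        
        ```python
        # Converters for two common SAM-style type codes; anything else stays a string.
        CONVERTERS = {"i": int, "f": float}
        
        
        def parse_tags(raw_tags):
            """Turn ["NM:i:3", "tp:A:P"] into {"NM": 3, "tp": "P"}."""
            tags = {}
            for raw in raw_tags:
                # Split at most twice so values that contain ":" stay intact.
                name, type_code, value = raw.split(":", 2)
                tags[name] = CONVERTERS.get(type_code, str)(value)
            return tags
        
        
        def expand_row(row):
            """Return a flat copy of row with each tag promoted to its own key."""
            flat = {key: value for key, value in row.items() if key != "tags"}
            flat.update(row["tags"])
            return flat
        
        
        row = {
            "query_name": "read1",
            "target_name": "chr1",
            "tags": parse_tags(["tp:A:P", "NM:i:3", "de:f:0.0306", "cg:Z:10M2D5M"]),
        }
        flat = expand_row(row)
        print(flat)
        ```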
        
Platform: OS Independent
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*
Description-Content-Type: text/markdown
Provides-Extra: pandas
