Metadata-Version: 2.1
Name: pyppl
Version: 3.1.0
Summary: A Python PiPeLine framework
Home-page: https://github.com/pwwang/PyPPL
License: MIT
Author: pwwang
Author-email: pwwang@pwwang.com
Requires-Python: >=3.6,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Dist: attr_property
Requires-Dist: attrs (>=19.3,<20.0)
Requires-Dist: cmdy
Requires-Dist: colorama (>=0.4.1,<0.5.0)
Requires-Dist: diot
Requires-Dist: filelock (>=3.0.0,<4.0.0)
Requires-Dist: liquidpy
Requires-Dist: pluggy (<1.0.0)
Requires-Dist: psutil (>=5.6,<6.0)
Requires-Dist: pyparam
Requires-Dist: pyppl_echo
Requires-Dist: pyppl_export
Requires-Dist: pyppl_lock
Requires-Dist: pyppl_rich
Requires-Dist: pyppl_runners
Requires-Dist: pyppl_strict
Requires-Dist: python-simpleconf
Requires-Dist: python-varname
Requires-Dist: toml (>=0.10,<0.11)
Requires-Dist: transitions (>=0.7,<0.8)
Project-URL: Repository, https://github.com/pwwang/PyPPL
Description-Content-Type: text/markdown

# [PyPPL][3] - A [Py](#)thon [P](#)i[P](#)e[L](#)ine framework

[![Pypi][10]][18] [![Github][11]][3] [![PythonVers][14]][18] [![docs][19]][1] [![Travis building][5]][7] [![Codacy][4]][8] [![Codacy coverage][6]][8]

[Documentation][1] | [API][2] | [Change log][9]

<!-- toc -->
## Features
- Process caching.
- Process reusability.
- Process error handling.
- Runner customization.
- Easy running profile switching.
- Plugin system.

## Installation
```bash
pip install PyPPL
```

## Plugin gallery

(*) shipped with `PyPPL`

- [pyppl_report](https://github.com/pwwang/pyppl_report): Generating reports for PyPPL pipelines
- [pyppl_flowchart](https://github.com/pwwang/pyppl_flowchart): Generating flowchart for PyPPL
- [pyppl_export](https://github.com/pwwang/pyppl_export)*: Exporting outputs generated by PyPPL pipeline
- [pyppl_echo](https://github.com/pwwang/pyppl_echo)*: Echoing script output to PyPPL logs"
- [pyppl_rich](https://github.com/pwwang/pyppl_rich)*: Richer information in logs for PyPPL
- [pyppl_strict](https://github.com/pwwang/pyppl_strict)*: More strict check of job success for PyPPL
- [pyppl_lock](https://github.com/pwwang/pyppl_lock)*: Preventing running processes from running again for PyPPL
- [pyppl_annotate](https://github.com/pwwang/pyppl_annotate): Adding long description/annotation for processes
- [pyppl_require](https://github.com/pwwang/pyppl_require): Checking and installing requirements for processes
- [pyppl_jobtime](https://github.com/pwwang/pyppl_jobtime): Job running time statistics for PyPPL
- [pyppl_notify](https://github.com/pwwang/pyppl_notify): Email notifications for PyPPL
- [pyppl_runcmd](https://github.com/pwwang/pyppl_runcmd): Allowing to run local command before and after each process for PyPPL
- [pyppl_runners](https://github.com/pwwang/pyppl_runners): Common runners for PyPPL

## Writing pipelines with predefined processes
Let's say we are implementing the [TCGA DNA-Seq Re-alignment Workflow][16]
(The very left part of following figure).
For demonstration, we will skip the QC and the co-clean parts here.

[![DNA_Seq_Variant_Calling_Pipeline][15]][16]

`demo.py`:
```python
from pyppl import PyPPL, Channel
# import predefined processes
from TCGAprocs import pBamToFastq, pAlignment, pBamSort, pBamMerge, pMarkDups

# Load the bam files
pBamToFastq.input = Channel.fromPattern('/path/to/*.bam')
# Align the reads to reference genome
pAlignment.depends = pBamToFastq
# Sort bam files
pBamSort.depends = pAlignment
# Merge bam files
pBamMerge.depends = pBamSort
# Mark duplicates
pMarkDups.depends = pBamMerge
# Export the results
pMarkDups.config.export_dir = '/path/to/realigned_Bams'
# Specify the start process and run the pipeline
PyPPL().start(pBamToFastq).run()
```

[![asciicast][20]][21]

## Implementing individual processes
`TCGAprocs.py`:
```python
from pyppl import Proc
pBamToFastq = Proc(desc = 'Convert bam files to fastq files.')
pBamToFastq.input = 'infile:file'
pBamToFastq.output = [
    'fq1:file:{{i.infile | stem}}_1.fq.gz',
    'fq2:file:{{i.infile | stem}}_2.fq.gz']
pBamToFastq.script = '''
bamtofastq collate=1 exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
    filename= {{i.infile}} gz=1 inputformat=bam level=5 \
    outputdir= {{job.outdir}} outputperreadgroup=1 tryoq=1 \
    outputperreadgroupsuffixF=_1.fq.gz \
    outputperreadgroupsuffixF2=_2.fq.gz \
    outputperreadgroupsuffixO=_o1.fq.gz \
    outputperreadgroupsuffixO2=_o2.fq.gz \
    outputperreadgroupsuffixS=_s.fq.gz
'''

pAlignment = Proc(desc = 'Align reads to reference genome.')
pAlignment.input = 'fq1:file, fq2:file'
#                             name_1.fq.gz => name.bam
pAlignment.output = 'bam:file:{{i.fq1 | stem | stem | [:-2]}}.bam'
pAlignment.script = '''
bwa mem -t 8 -T 0 -R <read_group> <reference> {{i.fq1}} {{i.fq2}} | \
    samtools view -Shb -o {{o.bam}} -
'''

pBamSort = Proc(desc = 'Sort bam files.')
pBamSort.input = 'inbam:file'
pBamSort.output = 'outbam:file:{{i.inbam | basename}}'
pBamSort.script = '''
java -jar picard.jar SortSam CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} SORT_ORDER=coordinate VALIDATION_STRINGENCY=STRICT
'''

pBamMerge = Proc(desc = 'Merge bam files.')
pBamMerge.input = 'inbam:file'
pBamMerge.output = 'outbam:file:{{i.inbam | basename}}'
pBamMerge.script = '''
java -jar picard.jar MergeSamFiles ASSUME_SORTED=false CREATE_INDEX=true \
    INPUT={{i.inbam}} MERGE_SEQUENCE_DICTIONARIES=false OUTPUT={{o.outbam}} \
    SORT_ORDER=coordinate USE_THREADING=true VALIDATION_STRINGENCY=STRICT
'''

pMarkDups = Proc(desc = 'Mark duplicates.')
pMarkDups.input = 'inbam:file'
pMarkDups.output = 'outbam:file:{{i.inbam | basename}}'
pMarkDups.script = '''
java -jar picard.jar MarkDuplicates CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} VALIDATION_STRINGENCY=STRICT
'''
```

Each process is indenpendent so that you may also reuse the processes in other pipelines.

## Pipeline flowchart
```python
# When try to run your pipline, instead of:
#   PyPPL().start(pBamToFastq).run()
# do:
PyPPL().start(pBamToFastq).flowchart().run()
```
Then an SVG file endswith `.pyppl.svg` will be generated under current directory.
Note that this function requires [Graphviz][13] and [graphviz for python][12].

See plugin [details][22].

![flowchart][17]

## Pipeline report
See plugin [details][23]

````python
pPyClone.report = """
## {{title}}

PyClone[1] is a tool using Probabilistic model for inferring clonal population structure from deep NGS sequencing.

![Similarity matrix]({{path.join(job.o.outdir, "plots/loci/similarity_matrix.svg")}})

```table
caption: Clusters
file: "{{path.join(job.o.outdir, "tables/cluster.tsv")}}"
rows: 10
```

[1]: Roth, Andrew, et al. "PyClone: statistical inference of clonal population structure in cancer." Nature methods 11.4 (2014): 396.
"""

# or use a template file

pPyClone.report = "file:/path/to/template.md"
````

```python
PyPPL().start(pPyClone).run().report('/path/to/report', title = 'Clonality analysis using PyClone')
```

![report][24]

## Full documentation
[ReadTheDocs][1]


[1]: https://pyppl.readthedocs.io/en/latest/
[2]: https://pyppl.readthedocs.io/en/latest/api/
[3]: https://github.com/pwwang/pyppl/
[4]: https://img.shields.io/codacy/grade/30db7165ed6c4d7aa3261cd1f5530813.svg?style=flat-square
[5]: https://img.shields.io/travis/pwwang/PyPPL.svg?style=flat-square
[6]: https://img.shields.io/codacy/coverage/30db7165ed6c4d7aa3261cd1f5530813.svg?style=flat-square
[7]: https://travis-ci.org/pwwang/PyPPL
[8]: https://app.codacy.com/project/pwwang/PyPPL/dashboard
[9]: https://pyppl.readthedocs.io/en/latest/CHANGELOG/
[10]: https://img.shields.io/pypi/v/pyppl.svg?style=flat-square
[11]: https://img.shields.io/github/tag/pwwang/PyPPL.svg?style=flat-square
[12]: https://github.com/xflr6/graphviz
[13]: https://www.graphviz.org/
[14]: https://img.shields.io/pypi/pyversions/PyPPL.svg?style=flat-square
[15]: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/images/dna-alignment-pipeline_0.png
[16]: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/
[17]: https://raw.githubusercontent.com/pwwang/PyPPL/development/examples/demo/demo.pyppl.svg?sanitize=true
[18]: https://pypi.org/project/PyPPL/
[19]: https://img.shields.io/readthedocs/pyppl.svg?style=flat-square
[20]: https://asciinema.org/a/Uiz6Wdo1buGCGPFd89bWiZzwn.svg?sanitize=true
[21]: https://asciinema.org/a/Uiz6Wdo1buGCGPFd89bWiZzwn
[22]: https://github.com/pwwang/pyppl_flowchart
[23]: https://github.com/pwwang/pyppl_report
[24]: https://pyppl_report.readthedocs.io/en/latest/snapshot.png

