Metadata-Version: 2.1
Name: probNORM
Version: 1.0.1
Summary: Method for structural probing signal calculation that eliminates read distribution bias and prevents reactivity underestimation.
Home-page: https://github.com/zywicki-lab/probNORM
Author: Agnieszka Chełkowska-Pauszek
Author-email: agnieszka.chelkowska@amu.edu.pl
License: GNU General Public License v3.0
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
License-File: LICENSE

# Welcome to probNORM

probNORM is a new method for structural probing signal calculation that eliminates read distribution bias and prevents reactivity underestimation. It is based on the analysis of background RT stops in treated and control samples of a single replicate and enables statistical discrimination of the probing-sensitive nucleotides. The reactivities obtained by probNORM are highly consistent with the structural models, allowing the separation of single- and double-stranded nucleotides.

## **For detailed documentation please see: https://zywicki-lab.github.io/probNORM/**
<br>

### Required

|              |                                                                                                                                                                                                                                                                                                      |
|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[**Python**](https://www.python.org):       |version 3.6 or greater (Python 3 is supported). If you’re setting up Python for the first time,<br>the [Anaconda Python distribution](https://www.anaconda.com/products/distribution) is highly recommended.                                                                                                                                  |
|**Libraries**:    |[pysam](https://pysam.readthedocs.io/en/latest/installation.html), [numpy](https://numpy.org/install/), [scipy](https://scipy.org/install/)                                                                                                                                                                                                                                                                                   |
|[**BEDTools**](https://bedtools.readthedocs.io/en/latest/index.html):     |The version is not important, but later versions will have more features so it’s a good idea<br>to get the latest. Follow the instructions at [https://github.com/arq5x/bedtools2](https://github.com/arq5x/bedtools2) to install,<br>and make sure the programs are on your path. That is, you should be able to call bedtools<br>from any directory.|

## BEDTools installation

- via conda:

        conda install -c bioconda bedtools

- via apt-get for Debian-like systems:

        sudo apt-get install bedtools
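To confirm that the requirements listed above are visible from the current environment, a short check along these lines may help. It is a minimal sketch: the helper name `check_requirements` is hypothetical and not part of probNORM; it only tests whether the Python libraries can be located and whether `bedtools` is on your path.

```python
import importlib.util
import shutil


def check_requirements(modules=("pysam", "numpy", "scipy"), tools=("bedtools",)):
    """Return {name: bool} telling whether each module or tool can be found.

    Hypothetical helper for illustration; not part of probNORM.
    """
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as missing can be installed with the instructions linked in the table above.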

<br>

# Quick start

The main executable of probNORM is **probnorm**. To quickly run probNORM on the provided example files, type:

    probnorm bam -t example/treated.sorted.bam -c example/control.sorted.bam -o output.txt

for BAM format input, and:

    probnorm counts -i example/counts-input.txt -o output.txt

for count format input.

**These commands run probNORM with the default parameters.**
<br>

probNORM accepts two input formats: BAM files or a custom counts file. The additional options vary depending on the input type.
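For scripted runs, the two quick-start invocations can be assembled from Python. The sketch below is only an illustration: `build_command` is a hypothetical helper, and it uses nothing beyond the subcommands and the `-t`, `-c`, `-i` and `-o` options shown in the commands above.

```python
import shutil
import subprocess


def build_command(mode, output, treated=None, control=None, counts=None):
    """Build the argument list for a default probnorm run.

    Hypothetical helper; it mirrors the quick-start commands only.
    """
    if mode == "bam":
        if treated is None or control is None:
            raise ValueError("bam mode needs both treated and control BAM files")
        return ["probnorm", "bam", "-t", treated, "-c", control, "-o", output]
    if mode == "counts":
        if counts is None:
            raise ValueError("counts mode needs a counts input file")
        return ["probnorm", "counts", "-i", counts, "-o", output]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    cmd = build_command(
        "bam",
        "output.txt",
        treated="example/treated.sorted.bam",
        control="example/control.sorted.bam",
    )
    if shutil.which("probnorm"):
        # Run only when the executable is actually installed
        result = subprocess.run(cmd)
        print("probnorm exited with code", result.returncode)
    else:
        print("probnorm not found on PATH; would run:", " ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.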

## The example files are provided at https://github.com/zywicki-lab/probNORM

<br>

# Output file
## Format

<br>
The file contains full information about the normalized transcript(s). It consists of nine tab-separated columns:

<br>


| Column name | Description |
|-------------|-------------|
| transcript_id | ID of normalized transcript, the same as in the input file |
| position | Position in transcript |
| stops_treated | Stops count in the treated sample: from input counts file or calculated from BAM file |
| stops_control | Stops count in the control sample: from input counts file or calculated from BAM file |
| stops_norm_control | Normalized stops count in the control sample. Stops are normalized by incorporating the normalization factor (nf). |
| reactivity | Reactivity, calculated based on the normalized control stops. |
| fold_change | Fold change between the stops counts in the treated sample and the normalized control sample. In the example output below, the values equal log2(stops_treated / stops_norm_control). |
| p_value | Probability that the nucleotide at a given position is part of the background, i.e. that its signal is not statistically significant. |
| passed_quality_filter | Quality filter (Y - yes / N - no). Transcript positions that pass the filtering step are those with a stops count higher than zero (in both control and treated samples), without any missing parameters, and with a proper coverage value (when probNORM itself determines the stops counts, i.e. for BAM input). |

<br>

    #probnorm counts -i example/counts-input.txt -o output.txt

    transcript_id	position	stops_treated	stops_control	stops_norm_control	reactivity	fold_change	p_value	passed_quality_filter
    RDN18-1	1	3095.0	3472.0	2669.1000000000004	1.0632124544542494	0.2135860512052699	0.37310634695017253	Y
    RDN18-1	2	2029.0	1148.0	882.5250000000001	2.5274855472882036	1.2010598126290937	0.03438625350046609	Y
    RDN18-1	3	315.0	360.0	276.75	0.09548691331973486	0.18676851160572655	0.38858771448505425	Y
    RDN18-1	4	264.0	405.0	311.34375	0.0	-0.23797038886541122	0.6407954148840493	Y
    RDN18-1	5	139.0	171.0	131.45625	0.018832141238058788	0.08050214738573189	0.45145693582080115	Y
    ...
    RDN18-1	1776	0	0	0.0	0.0	0	0.5	N
    RDN18-1	1777	0	0	0.0	0.0	0	0.5	N
    RDN18-1	1778	0	0	0.0	0.0	0	0.5	N
    RDN18-1	1779	0	0	0.0	0.0	0	0.5	N
    RDN18-1	1780	25.0	9.0	6.91875	0.04513784971143676	1.8533447778805348	0.002490274610317811	Y
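The output can be loaded with the Python standard library for downstream filtering. The following is a minimal sketch under a few assumptions: the file starts with the header row shown above, lines beginning with `#` (if any) are comments, and the 0.05 p-value cut-off is an arbitrary illustration. The function names are hypothetical and not part of probNORM.

```python
import csv

NUMERIC_COLUMNS = (
    "stops_treated",
    "stops_control",
    "stops_norm_control",
    "reactivity",
    "fold_change",
    "p_value",
)


def read_probnorm_output(path):
    """Yield one dict per transcript position with numeric columns converted."""
    with open(path, newline="") as handle:
        # Assumption: blank lines and '#' comment lines carry no data
        lines = (line for line in handle if line.strip() and not line.startswith("#"))
        for row in csv.DictReader(lines, delimiter="\t"):
            row["position"] = int(row["position"])
            for column in NUMERIC_COLUMNS:
                row[column] = float(row[column])
            yield row


def significant_positions(rows, max_p_value=0.05):
    """Keep rows that passed the quality filter and fall under the p-value cut-off."""
    return [
        row
        for row in rows
        if row["passed_quality_filter"] == "Y" and row["p_value"] <= max_p_value
    ]
```

For example, `significant_positions(read_probnorm_output("output.txt"))` would return the positions that passed the quality filter with a p-value of at most 0.05.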


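The relationships between the columns can be checked against the example rows above. The sketch below is inferred from the printed numbers only, not from the probNORM source: in this example, `stops_norm_control` appears to be `stops_control` multiplied by a single normalization factor, and `fold_change` matches log2(`stops_treated` / `stops_norm_control`).

```python
import math

# (stops_treated, stops_control, stops_norm_control, fold_change),
# copied from the RDN18-1 example rows for positions 1, 2, 4 and 1780
EXAMPLE_ROWS = [
    (3095.0, 3472.0, 2669.1000000000004, 0.2135860512052699),
    (2029.0, 1148.0, 882.5250000000001, 1.2010598126290937),
    (264.0, 405.0, 311.34375, -0.23797038886541122),
    (25.0, 9.0, 6.91875, 1.8533447778805348),
]


def inferred_nf(control, norm_control):
    """Normalization factor implied by one row (an inference from the example)."""
    return norm_control / control


def inferred_fold_change(treated, norm_control):
    """log2 ratio of treated stops to normalized control stops."""
    return math.log2(treated / norm_control)


if __name__ == "__main__":
    for treated, control, norm_control, reported in EXAMPLE_ROWS:
        nf = inferred_nf(control, norm_control)
        recomputed = inferred_fold_change(treated, norm_control)
        print(f"nf={nf:.5f}  recomputed={recomputed:.6f}  reported={reported:.6f}")
```

The same normalization factor is implied by every row of the example, and a negative `fold_change` (as at position 4) corresponds to fewer stops in the treated sample than in the normalized control.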

## Summary information

After each run, probNORM prints a summary. It contains information such as:

- input file type
- input and output file names
- parameter thresholds: coverage, p-value, reactive positions
- statistics about normalized transcripts

See the examples below.

- BAM input

        ***** SUMMARY *****

            input mode: BAM
            input file/s: control: example/control.sorted.bam treated: example/treated.sorted.bam
            output file: test.output
            min coverage: 0
            max p-value: 1.0
            min reactive positions per transcript: 20%
            selected transcripts:  all
            total number of input transcripts: 3
            transcripts omitted due to low reactivity: 0
            transcripts normalized: 3

        *******************

- COUNTS input

        ***** SUMMARY *****

            input mode: COUNTS
            input file/s: data/counts-input.txt
            output file: test.output
            max p-value: 1.0
            min reactive positions per transcript: 20%
            total number of input transcripts: 5
            transcripts omitted due to low reactivity: 0
            transcripts normalized: 5

        *******************
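If the summary needs to be recorded, for instance in a pipeline log, its `key: value` lines can be turned into a dictionary. This is a minimal sketch that assumes the summary text has been captured and follows the layout shown above; `parse_summary` is a hypothetical helper, not part of probNORM.

```python
def parse_summary(text):
    """Parse the 'key: value' lines of a captured probNORM summary into a dict."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '*****' banner lines
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only, so values may contain colons themselves
        key, separator, value = line.partition(":")
        if separator:
            summary[key.strip()] = value.strip()
    return summary


if __name__ == "__main__":
    example = """
    ***** SUMMARY *****

        input mode: COUNTS
        input file/s: data/counts-input.txt
        output file: test.output
        max p-value: 1.0
        min reactive positions per transcript: 20%
        total number of input transcripts: 5
        transcripts omitted due to low reactivity: 0
        transcripts normalized: 5

    *******************
    """
    for key, value in parse_summary(example).items():
        print(f"{key} = {value}")
```

All values are kept as strings, so convert counts and thresholds (for example with `int` or `float`) where needed.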

# Contribution


If you notice any errors, or would like to suggest new features, please report them using GitHub's issue tracking system at [probNORM](https://github.com/zywicki-lab/probNORM). You are also welcome to send a pull request with your corrections and suggestions.

<br>

# License

This project is licensed under the terms of the GNU General Public License v3.0.

