Metadata-Version: 2.1
Name: rnpfind
Version: 0.6.0
Summary: Collect and generate RNA-RBP interaction data in various formats
Home-page: https://github.com/mnahinkhan/rnpfind
Author: Nahin Khan
Author-email: mnahinkhan@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# rnpfind

Command line tool for collecting and generating RNA-RBP interaction data in
various formats.


## Installation
```
pip install rnpfind
```

*Requirements*:
 - Python >=3.8

## Usage

`rnpfind` can be used as a command line tool as follows:

```
rnpfind <transcript>
```

where `transcript` is a gene name such as "PTEN", or an hg38 coordinate range
such as 11:65497688-65506516.
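
To make the two accepted input forms concrete, here is a minimal sketch of how
a transcript argument could be told apart. It is not taken from the `rnpfind`
source: the `Region` type, the `parse_region` helper, and the exact pattern
(including the optional `chr` prefix) are assumptions for illustration only.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical container for a coordinate range."""

    chrom: str
    start: int
    end: int


# Assumed pattern: "<chromosome>:<start>-<end>", with an optional "chr" prefix.
_REGION_RE = re.compile(r"^(?:chr)?([0-9]{1,2}|[XYM]):(\d+)-(\d+)$", re.IGNORECASE)


def parse_region(text: str) -> Optional[Region]:
    """Return a Region if `text` looks like a coordinate range, else None."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        # Not a coordinate range; a caller would treat it as a gene name.
        return None
    chrom = match.group(1).upper()
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return Region(chrom, start, end)


print(parse_region("11:65497688-65506516"))
print(parse_region("PTEN"))
```

A string that does not match the pattern returns `None`, which a caller could
use as the cue to look the input up as a gene name instead.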

The tool takes a transcript as input, and computes the binding sites of various
RBPs on the transcript. The information is collected from data sources
including RBPDB, ATTRACT, and POSTAR.

The tool produces as output the binding data in a folder (use `--out-dir` to
specify). A few output formats are supported (use `--out-format` to specify):

 - `bed` format: a widely used
   [format](https://en.wikipedia.org/wiki/BED_(file_format))
   for displaying intervals. A bed file created in this way could be visualized
   on a genome browser, for example. Note the `--trackhub` option, which
   generates a trackhub structure (useful for hosting a large number of indexed
   bed files, i.e. bigBed files, and letting users view them on genome browsers
   such as the UCSC Genome Browser).

 - `csv` format: an `N`x`N` table (where `N`=number of RBPs) showing binding
   correlations of RBPs on the particular transcript analyzed. This could be
   useful for inferring molecular mechanisms on certain regions of the
   transcriptome.
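
As an illustration of the `bed` format mentioned above, the sketch below turns
a few binding sites into minimal four-column BED lines (chromosome, 0-based
start, exclusive end, name). The `BindingSite` type, the `to_bed_lines` helper,
and the example sites are made up for this sketch; the columns `rnpfind`
actually writes may differ.

```python
from typing import Iterable, List, NamedTuple


class BindingSite(NamedTuple):
    """Hypothetical binding site with 1-based, inclusive coordinates."""

    chrom: str
    start: int
    end: int
    rbp: str


def to_bed_lines(sites: Iterable[BindingSite]) -> List[str]:
    """Format sites as tab-separated BED lines, sorted by position."""
    lines = []
    for site in sorted(sites, key=lambda s: (s.chrom, s.start, s.end)):
        # BED starts are 0-based and ends are exclusive, so only the start
        # shifts when converting from 1-based inclusive coordinates.
        lines.append(f"{site.chrom}\t{site.start - 1}\t{site.end}\t{site.rbp}")
    return lines


# Placeholder data, not real binding sites.
example_sites = [
    BindingSite("chr11", 201, 215, "RBP_B"),
    BindingSite("chr11", 101, 110, "RBP_A"),
]
for line in to_bed_lines(example_sites):
    print(line)
```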

For more options, run `rnpfind --help`.


### Within Python
You can import `rnpfind` into your Python code as follows:

```
from rnpfind import rnpfind

# Collect data on Malat1
rnpfind("malat1")
```

The data is written to disk, just as in the command line call.
Check `help(rnpfind)` for the available keyword arguments.


Perhaps not so usefully, you can find the genome version `rnpfind` is working
with programmatically:

```
from rnpfind import GENOME_VERSION
print(GENOME_VERSION)
```



## How does it work?
In principle, RNA-RBP interactions can be backed by two forms of evidence:
experimental and computational.

The experimental binding sites are collected on large databases such as POSTAR.
The computational binding sites are generated by scanning RNA-binding-motifs of
various RBPs (collected from RBPDB and ATTRACT) across a transcript to look for
hits.
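
The idea of scanning a motif across a transcript can be sketched as follows.
This is a simplified illustration, not the method `rnpfind` uses: it matches an
IUPAC consensus string exactly, whereas motif collections often describe motifs
as position weight matrices that are scored against a threshold. The
`scan_motif` helper and the example sequences are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expressed as RNA character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def scan_motif(sequence: str, motif: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans where `motif` matches."""
    # Normalize DNA-style input to RNA so "T" and "U" are interchangeable.
    rna = sequence.upper().replace("T", "U")
    pattern = "".join(IUPAC[base] for base in motif.upper().replace("T", "U"))
    # A lookahead lets overlapping hits be reported as well.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.start() + len(m.group(1))) for m in regex.finditer(rna)]


print(scan_motif("AUGUGUGA", "UGUG"))
```

Each hit is a candidate binding site; in practice the spans would still need to
be mapped from transcript positions back to genome coordinates.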

*Because these data sources must be available locally, the tool requires around
6.4GB of data to function.* The data is downloaded automatically on the first
run of the tool, or can be downloaded manually using `rnpfind-download`.

If a 6.4GB download is too much for you to handle, consider using the web tool
available at https://rnpfind.com



## Contributing
Any suggestions or pull requests are welcome!


## Development
Enable recommended Git Hooks as follows:
```
git config --local core.hooksPath .githooks/
```
The above will run the following to ensure code consistency every time you
commit:
 - [black](https://github.com/psf/black)
 - [isort](https://github.com/PyCQA/isort)

Also use [fit-commit](https://github.com/m1foley/fit-commit) to ensure
consistent commit message style.


