Metadata-Version: 2.1
Name: jasper-vh
Version: 1.1
Summary: Just a simple virus's hosts predictor
Home-page: https://github.com/777moneymaker/jasper
Author: Milosz Chodkowski
Author-email: milosz.chodkowski@student.put.poznan.pl
License: GPLv3
Download-URL: https://github.com/777moneymaker/jasper/archive/v1.0.2.tar.gz
Description: # JASPER
        
        ![pypi](https://img.shields.io/pypi/v/jasper-vh.svg?branch=master)
        
        ![JASPER LOGO](https://github.com/777moneymaker/jasper/blob/main/logo.png?raw=true)
        
        JASPER is a free bioinformatics tools for predicting virus hosts. 
        JASPER uses a bunch of bioinformatics tools to prediction virus hosts. It includes genome-genome alignment, CRISPR spacers analyzation, tRNA analyzation and more.
        JASPER contains few, independent modules `blast`, `crispr`, `trna`, `wish`, `mash`, `merge`.
        
        # Requirements
        
        ### Python 3.7
        
        You need `Python >= 3.7` to use JASPER.
        
        ### Naming convention
        
        **Jasper** depends on good file naming convention. The best is to use sequence ID as file name, e.x. `NC_008876.fna`. Software will use this id to name every temp file that needs to be created and also it will use this ID in results file.
        
        **WARNING** It's not the best idea to use `|` char in your filename and also in sequence header. Just use normal fasta naming like `>NC_00876 additional_info more_additional_info`.
        
        If you put multiple contigs in a single file, there is no problem with that. Just be sure that every contig is in it's right file. **Jasper** repairs every file it read, by default naming it `<id from filename>|<#contig>` e.x.:
        ```
        >NC_000856|1
        ATGCT....
        >NC_000856|2
        ATGCA....
        # and so on
        ```
        So even if you have, for instance, one genome in your file, then **Jasper** will change it's id to `<id from filename>|1`.
        
        ### Extensions
        
        Jasper uses input files that ends with `[fa, fna, fasta]` only!
        
        
        ### Additional software
        
        ```
        NCBI-Blast+
        PILER-CR
        WIsH
        Mash
        tRNAscan-SE
        ```
        # Installation
        
        **JASPER** uses additional software. It calls every program with `subprocess` so every program that is stated in above should be installed and added to `$PATH`.
        
        On Ubuntu:
        * To install NCBI-Blast+ use `sudo apt install ncbi-blast+`
        * To install PILER-CR go [here](http://www.drive5.com/pilercr/), download compiled software, move somewhere and add to `$PATH` under name `pilercr`.
        * To install tRNAscan-SE go [here](http://lowelab.ucsc.edu/tRNAscan-SE/), download, compile, move somewhere and add to `$PATH` under name `tRNAscan-SE`. Remember that tRNAscan-SE needs Infernal to work properly.
        * To install WIsH go [here](https://github.com/soedinglab/WIsH), download, compile, move somewhere and add to `$PATH` under name `WIsH`.
        * To install Mash go [here](https://github.com/marbl/Mash), download release, move somewhere and add to `$PATH` under name `mash`.
        
        Source code for additional software:
        * NCBI-Blast+: [here](https://www.ncbi.nlm.nih.gov/books/NBK279671/)
        * PILER-CR [here](http://www.drive5.com/pilercr/)
        * tRNAscan-SE [here](http://lowelab.ucsc.edu/tRNAscan-SE/)
        * WIsH [here](https://github.com/soedinglab/WIsH)
        * Mash [here](https://github.com/marbl/Mash)
        
        **Remember to install everything and add it to path**
        
        **You can also download the script `install_dependencies.sh` which will install everything.**
        
        After that go to JASPER's main directory and:
        ```
        python setup.py install
        ```
        
        or you can use pip `pip3 install jasper-vh` or `python -m pip install jasper-vh`.
        
        ### PATH
        
        By defaults some pip on linux drops scripts to `~/.local/bin`. Add it to your `$PATH` at the end.
        `export PATH="$HOME/.local/bin:$PATH"`
        Now you're done and you can start using `jasper-vh`.
        
        # Tests
        
        If you want to test, go to proj directory and type `python -m unittest discover`.
        It's recommended to do that, since it performs tool check (ensures that user has all dependencies and proper python version).
        
        # Usage
        
        JASPER uses bunch of arguments. A lot of parameters are BLAST parameters and can be configured with JSON file and passed to JASPER.
        
        ### Basic usage
        
        ```
        jasper-vh blast --virus path/to/virus/dir --create-db host_db --host /path/to/host/dir --clear
        jasper-vh crispr --host path/to/host/dir --create-db vir_db --host /path/to/vir/dir --clear
        jasper-vh trna --host path/to/host/dir --virus /path/to/vir/dir --clear
        jasper-vh wish --host path/to/host/dir --virus /path/to/vir/dir --clear
        jasper-vh mash --host path/to/host/dir --virus /path/to/vir/dir --clear
        jasper-vh merge *.csv --output final_results.csv 
        ```
        
        For more check `--help` on jasper individual modules: `jasper-vh  {blast,crispr,trna,wish,mash,merge} --help`
        
        # Blast config
        
        You can provide blast config as as a `*.json` file.
        Every module uses different task so there are few arguments that are forbidden:
        `['query', 'db', 'outfmt', 'max_target_seqs', 'num_alignments']`
        
        # References
        
        * Edgar, R.C. (2007) [*PILER-CR: fast and accurate identification of CRISPR repeats*](http://www.ncbi.nlm.nih.gov/pubmed/17239253), BMC Bioinformatics, Jan 20;8:18
        * Fichant and Burks, J. Mol. Biol. (1991) *Identification of tRNA genes in genomic DNA*, 220:659-671.
        * Clovis Galiez, Matthias Siebert et al. *WIsH: who is the host? Predicting prokaryotichosts from metagenomic phage contigs*
        * Ondov, B.D., Treangen, T.J., Melsted, P. et al. [*Mash: fast genome and metagenome distance estimation using MinHash.*](https://doi.org/10.1186/s13059-016-0997-x) Genome Biol 17, 132 (2016).
        * [NCBI-BLAST+](https://www.ncbi.nlm.nih.gov/books/NBK279690/)
        
        # License
        
        [GPLv3](https://www.gnu.org/licenses/gpl-3.0.html)
        
Keywords: bioinformatics sequence DNA trna CRISPR blast virus host
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Description-Content-Type: text/markdown
