ms1searchpy - a DirectMS1 proteomics search engine for LC-MS1 spectra
-----------------------------------------------------------------------

A peptide feature table (.tsv) or an mzML file, together with a .fasta protein database, is required for basic operation of the script.
The .tsv file is a tab-separated text file with peptide features generated from an mzML file by Dinosaur (J. Teleman et al., "Dinosaur: A Refined Open-Source Peptide MS Feature Detector", JPR 2016) or Biosaur (https://github.com/abdrakhimov1/Biosaur). It can also be generated by any other peak-picking software, provided that it contains the columns 'massCalib', 'rtApex', 'charge' and 'nIsotopes'.
For convenience, mzML files can be used directly, in which case the script runs a bundled version of Dinosaur (an installed Java is required).  
To make use of retention time, the user can install the ELUDE prediction algorithm and pass -elude path_to_elude_binary in the parameters.
For the most efficient use of retention time, the user can install the DeepLC prediction algorithm and pass -deeplc path_to_deeplc_binary in the parameters.
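
The column requirement for a feature table can be checked before starting a search. The following is a minimal sketch, assuming a plain tab-separated file with a single header row; the function name `missing_feature_columns` is hypothetical and is not part of ms1searchpy.

```python
import csv
import io

# Columns that the feature table must contain, as listed above.
REQUIRED_COLUMNS = ("massCalib", "rtApex", "charge", "nIsotopes")


def missing_feature_columns(handle):
    """Return the required columns absent from a tab-separated feature table.

    `handle` is any open text file object; only the header row is read.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Example with an in-memory table; for a real file use open(path, newline="").
sample = io.StringIO("massCalib\trtApex\tcharge\tmz\n1000.5\t12.3\t2\t501.25\n")
print(missing_feature_columns(sample))  # ['nIsotopes']
```

An empty list means that all four columns are present. The check does not validate the values themselves.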

The algorithm can be run with either of the following commands:

    ms1searchpy path_to_MZML -d path_to_fasta

    OR

    ms1searchpy path_to_peptideFeatures -d path_to_fasta

The script produces the following output files: all identified proteins (filename_proteins_full.tsv), filtered proteins (filename_proteins.tsv), all matched peptide match fingerprints (filename_PFMs.tsv), the same matches with features prepared for machine learning (filename_PFMs_ML.tsv), and a log file with the estimated mass and RT accuracies (filename_log.txt).

Citing ms1searchpy
-------------------
Ivanov et al. DirectMS1: MS/MS-free identification of 1000 proteins of cellular proteomes in 5 minutes. https://doi.org/10.1021/acs.analchem.9b05095

Installation
-------------
Using pip:

    pip install ms1searchpy
    
Example for full installation and usage:
-----------------------------------------

 Convert raw files to mzML: 
 
    msconvert path_to_file.raw -o path_to_output_folder --mzML --filter "peakPicking true 1-" --filter "MS2Deisotope" --filter "zeroSamples removeExtra" --filter "threshold absolute 1 most-intense"

There are two suggested ways to install ms1searchpy with all external software (Diffacto, DeepLC) to get the maximum efficiency from ms1searchpy.

The first way, suggested for Linux users, is to use a Python virtual environment.

1. “pip3 install virtualenv”
2. “virtualenv3 --python=python3.6 /home/mark/env_ms1” . Comment: While ms1searchpy and Diffacto support all versions of Python 3.6+, DeepLC is stable only with Python 3.6. The name and path of the virtual environment are not limited to the example above.
3. “source /home/mark/env_ms1/bin/activate” . Comment: to activate the virtual environment. You need to activate it every time you work with ms1searchpy.
4. “pip3 install ms1searchpy” . Comment: to install the latest ms1searchpy from PyPI.
5. “pip3 install deeplc” . Comment: to install the latest DeepLC from PyPI.
6. “pip3 install https://github.com/statisticalbiotechnology/diffacto/archive/master.zip” . Comment: to install the latest Diffacto from GitHub. Note that the current PyPI version of Diffacto is outdated and has a critical bug.
7. “deactivate” . Comment: to deactivate the virtual environment.

Examples of using ms1searchpy from virtual environment:

1. “source /home/mark/env_ms1/bin/activate”
2. “ms1searchpy /home/mark/test.mzML -d /home/mark/sprot_human.fasta -deeplc /home/mark/env_ms1/bin/deeplc -ad 1” . Comment: this command runs ms1searchpy with DeepLC RT prediction. The “-ad 1” option creates a shuffled decoy database for FDR estimation. Use it only once and reuse the created database for later searches.
   Alternatively: “ms1searchpy /home/mark/test.features.tsv -d /home/mark/sprot_human_shuffled.fasta -deeplc /home/mark/env_ms1/bin/deeplc” . Comment: a file with peptide features can be used instead of the mzML file. This file is created automatically by ms1searchpy the first time the mzML file is processed.
3. “ms1todiffacto -dif /home/mark/env_ms1/bin/diffacto -S1 sample1_r1.proteins.tsv sample1_r2.proteins.tsv sample1_r3.proteins.tsv -S2 sample2_r1.proteins.tsv sample2_r2.proteins.tsv sample2_r3.proteins.tsv -norm median -out diffacto_output.tsv -min_samples 3” . Comment: the ms1todiffacto command prepares the input file for Diffacto from the ms1searchpy output and runs Diffacto automatically.
4. “deactivate” . Comment: to finish work with ms1searchpy.
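
When many files are searched, the ms1searchpy command from step 2 can be assembled by a script. The following is a minimal sketch, assuming `ms1searchpy` is on the PATH of the activated environment. `build_search_command` and `run_search` are hypothetical helpers, not part of the ms1searchpy API, and only the options shown above (-d, -deeplc, -ad) are used.

```python
import subprocess


def build_search_command(input_path, fasta_path, deeplc_path=None, add_decoy=False):
    """Build the argument list for one ms1searchpy run.

    input_path: an mzML file or a peptide feature table (.tsv).
    add_decoy:  adds "-ad 1", which creates the shuffled decoy database;
                intended for the first search only.
    """
    cmd = ["ms1searchpy", str(input_path), "-d", str(fasta_path)]
    if deeplc_path is not None:
        cmd += ["-deeplc", str(deeplc_path)]
    if add_decoy:
        cmd += ["-ad", "1"]
    return cmd


def run_search(*args, **kwargs):
    """Run one search and raise CalledProcessError if it fails."""
    subprocess.run(build_search_command(*args, **kwargs), check=True)


# Print the command for the first search from the example above.
print(" ".join(build_search_command(
    "/home/mark/test.mzML",
    "/home/mark/sprot_human.fasta",
    deeplc_path="/home/mark/env_ms1/bin/deeplc",
    add_decoy=True,
)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.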



The alternative way to install and use ms1searchpy is with docker. This method is suggested for Windows users because of multiple difficulties with installing and using DeepLC under Windows.

1. Install docker. For details: https://docs.docker.com/docker-for-windows/install/
2. Open a terminal, for example by pressing Win+R and typing “cmd”.
3. “docker pull abdrakhimov1/ms1searchpy”
4. “docker tag abdrakhimov1/ms1searchpy name_for_docker_container” . Comment: This step is optional and makes the docker image more convenient to use.
5. “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container ms1searchpy data/test.mzML -d data/sprot_human.fasta -deeplc /deeplc/bin/deeplc -ad 1” . Comment: Running ms1searchpy with docker is similar to the general usage described in the virtualenv section. The main difference is that the command should always start with “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container”. The path to data_folder lets docker use data from the Windows system inside the container. Note that DeepLC is already installed in the docker container and the default path (/deeplc/bin/deeplc) should be used. Note also that the command contains both types of slashes, “/” and “\”.
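
The fixed prefix of the docker command can also be added programmatically. The following is a minimal sketch; `docker_wrap` is a hypothetical helper, and the image name and folder are the placeholders from the steps above. It only builds and prints the command, because the “-it” option expects an interactive terminal.

```python
def docker_wrap(host_dir, image, inner_cmd, mount_point="/data"):
    """Prefix a command with the "docker run" part described above.

    host_dir:  folder on the host (for example a Windows path) that holds the data.
    image:     name of the docker image (or the tag assigned to it).
    inner_cmd: the ms1searchpy command to run inside the container, as a list.
    """
    volume = f"{host_dir}:{mount_point}"  # host path uses "\", container path uses "/"
    return ["docker", "run", "-it", "-v", volume, image] + list(inner_cmd)


inner = ["ms1searchpy", "data/test.mzML", "-d", "data/sprot_human.fasta",
         "-deeplc", "/deeplc/bin/deeplc", "-ad", "1"]
print(" ".join(docker_wrap(r"C:\Users\mark\data_folder", "name_for_docker_container", inner)))
```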

Dependencies
------------

- pyteomics
- numpy
- scipy
- scikit-learn (imported as sklearn)
- lightgbm
- pandas
- biosaur

Links
-----

- GitHub repo & issue tracker: https://github.com/markmipt/ms1searchpy
- Mailing list: markmipt@gmail.com

- Diffacto repo: https://github.com/statisticalbiotechnology/diffacto
- DeepLC repo: https://github.com/compomics/DeepLC
- Dinosaur repo: https://github.com/fickludd/dinosaur
