
# ANSURR | Accuracy of NMR Structures Using RCI and Rigidity v2.0.53

ANSURR uses backbone chemical shifts to validate the accuracy of NMR protein structures, as described in https://www.nature.com/articles/s41467-020-20177-1. This repository contains the code required to install and run ANSURR on Linux or Mac. ANSURR v1.2.1 is also available on NMRbox (https://nmrbox.org/software/ansurr). Please let me know if you have any issues.

## Installation

ANSURR v2 is installed using pip (https://packaging.python.org/en/latest/tutorials/installing-packages/). 

`pip install ansurr`

You will also need Java to re-reference chemical shifts using PANAV, which is recommended (https://java.com/en/download/help/download_options.html).
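
A quick way to confirm that both prerequisites are reachable is to look them up on your `PATH`. The sketch below is only an illustration: the helper name `find_prerequisites` is hypothetical, and it assumes that `pip` placed the `ansurr` command on your `PATH` and that Java is exposed as `java`.

```python
import shutil


def find_prerequisites(tools=("ansurr", "java")):
    """Map each tool name to its location on PATH, or None if it is not found.

    The default tool names are assumptions: `ansurr` is the command used in
    this README, and `java` is the usual name of the Java launcher.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_prerequisites().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If `java` is reported as not found, ANSURR can still be run without the `-r` option.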

## Running ANSURR

ANSURR requires two input files: an NMR protein structure in PDB format and a chemical shifts file in NEF or NMR-STAR v3 format. To re-reference chemical shifts using PANAV before running ANSURR (recommended):

`ansurr -p xxxx.pdb -s xxxx.nef -r`

To run without re-referencing chemical shifts:

`ansurr -p xxxx.pdb -s xxxx.nef`

Options:

`-p` input PDB file

`-s` input shifts file

`-h` print the help message 

`-l` include free ligands when computing flexibility.

`-m` only output the ANSURR scores in a text file

`-n` include non-standard residues when computing flexibility. RCI is not calculated for non-standard residues, so they are not used to compute validation scores. Including them is still a good idea, as it avoids breaks in the protein structure that would otherwise make those regions too floppy.

`-o` combine chains into a single structure when calculating flexibility. This is useful when the structure is an oligomer as oligomerisation will often result in changes in flexibility.

`-r` re-reference chemical shifts using PANAV before running ANSURR (recommended).

`-q` suppress output to the terminal

`-v` print version details

`-w` compute ANSURR scores for the well-defined residues identified by CYRANGE. These scores are computed using a separate benchmark for well-defined residues.
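
If you want to script ANSURR over many structures, the command lines above can be assembled in Python. This is a minimal sketch rather than part of ANSURR: the function names and keyword arguments are hypothetical, only the flags documented above are used, and each flag is passed as a separate argument.

```python
import subprocess


def build_ansurr_command(pdb, shifts, rereference=True, well_defined=False,
                         oligomer=False, quiet=False):
    """Return the ansurr command line as a list of arguments.

    The keyword names are illustrative; they map onto the documented
    -r, -w, -o and -q options.
    """
    cmd = ["ansurr", "-p", str(pdb), "-s", str(shifts)]
    if rereference:
        cmd.append("-r")  # re-reference shifts with PANAV (needs Java)
    if well_defined:
        cmd.append("-w")  # also score well-defined residues (CYRANGE)
    if oligomer:
        cmd.append("-o")  # combine chains into a single structure
    if quiet:
        cmd.append("-q")  # suppress output to the terminal
    return cmd


def run_ansurr(pdb, shifts, **options):
    """Run ansurr and return the CompletedProcess for the caller to inspect."""
    return subprocess.run(build_ansurr_command(pdb, shifts, **options))
```

For example, `run_ansurr("xxxx.pdb", "xxxx.nef", well_defined=True)` would execute `ansurr -p xxxx.pdb -s xxxx.nef -r -w`.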

## Output

A directory called `<yourpdbfile>_<yourshiftfile>` is created to hold the output. It will be overwritten if you run ANSURR again with input files that have the same names as before. It contains two directories, `ANSURR_output` and `other_output`. `ANSURR_output` contains:

* `scores.out` - a text file with the validation scores for each model 
* `<yourpdbfile>_<yourshiftfile>_ansurr.nef` - a NEF file with most output generated by ANSURR
* `<yourpdbfile>_<yourshiftfile>_ansurr.json` - a JSON file with most output generated by ANSURR
* `<yourpdbfile>_<yourshiftfile>.png` - a graphical summary of the validation scores for each model 
* `out/` - text files for each model which detail the following for each residue: flexibility predicted by RCI, flexibility predicted by FIRST, secondary structure according to DSSP, well-defined regions of the ensemble according to CYRANGE, backbone chemical shift completeness and which atom types have chemical shift data
* `figs/` - plots of protein flexibility predicted by RCI (blue) and FIRST (orange) for each model. Alpha-helical and beta-sheet secondary structure are indicated by red and blue dots, respectively. Green dots indicate regions that are well-defined according to CYRANGE. Black crosses indicate residues with no chemical shift data (not used to compute validation scores).

`other_output` contains output from various programs run as part of ANSURR:

* `PANAV/` - re-referenced chemical shifts
* `RCI/` - flexibility predicted from chemical shifts using RCI
* `extracted_pdbs/` - PDB files for each model extracted from the NMR structure
* `DSSP/` - secondary structure for each model according to the program DSSP
* `FIRST/` - flexibility predicted for each model using FIRST
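
When post-processing results, it can help to compute where the files listed above should be. The sketch below is illustrative only: the helper name is hypothetical, and it assumes that `<yourpdbfile>` and `<yourshiftfile>` are the input file names without their extensions and that the output directory is created in the current working directory.

```python
from pathlib import Path


def expected_output_paths(pdb, shifts, base="."):
    """Return the output locations described in this README.

    Assumption: the directory name is built from the input file names with
    their extensions removed, e.g. xxxx.pdb + xxxx.nef -> xxxx_xxxx.
    """
    name = f"{Path(pdb).stem}_{Path(shifts).stem}"
    root = Path(base) / name
    ansurr_out = root / "ANSURR_output"
    other_out = root / "other_output"
    return {
        "root": root,
        "scores": ansurr_out / "scores.out",
        "nef": ansurr_out / f"{name}_ansurr.nef",
        "json": ansurr_out / f"{name}_ansurr.json",
        "summary_png": ansurr_out / f"{name}.png",
        "per_model_text": ansurr_out / "out",
        "figures": ansurr_out / "figs",
        "panav": other_out / "PANAV",
        "rci": other_out / "RCI",
        "extracted_pdbs": other_out / "extracted_pdbs",
        "dssp": other_out / "DSSP",
        "first": other_out / "FIRST",
    }
```

Checking `expected_output_paths("xxxx.pdb", "xxxx.nef")["scores"].exists()` after a run is one way to confirm that validation scores were written, subject to the naming assumption above.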

## Help

Contact Nick Fowler (njfowler.com) for support, queries or suggestions.

## Known Issues

- The Mac version of ANSURR gives slightly different ANSURR scores (mean difference of 1.2) for 0.5% of models tested so far. The remaining 99.5% of models have identical ANSURR scores in the Linux and Mac versions.

- Secondary structure is currently not computed in the Mac version.

## Acknowledgements

Random Coil Index (RCI) | Berjanskii, M.V. & Wishart, D.S. A simple method to predict protein flexibility using secondary chemical shifts. Journal of the American Chemical Society 127, 14970-14971 (2005).

Floppy Inclusions and Rigid Substructure Topography (FIRST) | Jacobs, D.J., Rader, A.J., Kuhn, L.A. & Thorpe, M.F. Protein flexibility predictions using graph theory. Proteins-Structure Function and Genetics 44, 150-165 (2001).

Probabilistic Approach to NMR Assignment and Validation (PANAV) | Bowei Wang, Yunjun Wang and David S. Wishart. "A probabilistic approach for validating protein NMR chemical shift assignments". Journal of Biomolecular NMR. Volume 47, Number 2 / June 2010: 85-99

DSSP | A series of PDB related databases for everyday needs. Wouter G Touw, Coos Baakman, Jon Black, Tim AH te Beek, E Krieger, Robbie P Joosten, Gert Vriend. Nucleic Acids Research 2015 January; 43(Database issue): D364-D368. | Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Kabsch W, Sander C, Biopolymers. 1983 22 2577-2637.

adjustText - automatic label placement for matplotlib | https://github.com/Phlya/adjustText

CYRANGE | D.K. Kirchner & P. Güntert, BMC Bioinformatics 2011, 12 170.











