Metadata-Version: 2.1
Name: rafm
Version: 0.3.0
Summary: rafm
Home-page: https://github.com/unmtransinfo/rafm
License: MIT
Keywords: science,biology,bioinformatics,pharmacology,data science,protein,sequences,structural biology,AlphaFold
Author: UNM Translational Informatics Team
Author-email: datascience.software@salud.unm.edu
Requires-Python: >=3.8.1,<3.10
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Dist: loguru (>=0.5.3,<0.6.0)
Requires-Dist: numpy
Requires-Dist: pandas (>=1.3.4,<2.0.0)
Requires-Dist: statsdict (>=0.1.3,<0.2.0)
Requires-Dist: typer
Project-URL: Changelog, https://github.com/unmtransinfo/rafm/releases
Project-URL: Documentation, https://rafm.readthedocs.io
Project-URL: Repository, https://github.com/unmtransinfo/rafm
Description-Content-Type: text/x-rst

================================
rafm Reliable AlphaFold Measures
================================

.. image:: https://raw.githubusercontent.com/unmtransinfo/rafm/master/docs/_static/calmodulin.png
   :target: https://raw.githubusercontent.com/unmtransinfo/rafm/master/docs/_static/calmodulin.png
   :alt: AlphaFold model and two crystal structures of calmodulin

*rafm* computes per-model measures associated with atomic-level accuracy for
AlphaFold models from *pLDDT* confidence scores.  Output is written to a
tab-separated file.
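
Because the output is plain tab-separated text, it can be inspected without *rafm*
itself. The sketch below is an illustration only. It assumes the file has the *file*
and *passing* columns described under *plddt-stats* and that booleans are written as
``True``/``False``. ``passing_files`` is a hypothetical helper name.

.. code:: python

   import csv


   def passing_files(tsv_lines):
       """Return the ``file`` entries whose ``passing`` column reads "True".

       tsv_lines: any iterable of text lines, such as an open output file.
       Assumes booleans are written as the strings "True"/"False".
       """
       reader = csv.DictReader(tsv_lines, delimiter="\t")
       # Rows with a missing or "False" passing value are skipped.
       return [row["file"] for row in reader if row.get("passing") == "True"]

For example, ``with open("rafm.tsv", newline="") as fh: print(passing_files(fh))``,
where ``rafm.tsv`` is a placeholder for whatever file name *--file-stem* produces.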


Installation
------------

You can install *rafm* via pip_ from PyPI_:

.. code:: console

   $ pip install rafm


Usage
-----
*rafm --help* lists all commands. Current commands are:

* *plddt-stats*
    Calculate stats on bounded pLDDTs from a list of AlphaFold model files
    in PDB format.

    Options:

        * *--criterion FLOAT*
            The cutoff value on truncated pLDDT for possible utility. [default: 91.2]
        * *--min-length INTEGER*
            The minimum sequence length for which to calculate truncated stats.
            [default: 20]
        * *--min-count INTEGER*
            The minimum number of truncated *pLDDT* values for which to calculate stats.
            [default: 20]
        * *--lower-bound INTEGER*
            The *pLDDT* value below which stats will not be calculated. [default: 80]
        * *--upper-bound INTEGER*
            The *pLDDT* value above which stats will not be calculated. [default: 100]
        * *--file-stem TEXT*
            Output file name stem. [default: rafm]

    Output columns (where *NN* is the bounds specifier, default: 80):

        * *residues_in_pLDDT*
            The number of residues in the AlphaFold model.
        * *pLDDT_mean*
            The mean value of pLDDT over all residues.
        * *pLDDT_median*
            The median value of pLDDT over all residues.
        * *pLDDTNN_count*
            The number of residues within bounds.
        * *pLDDTNN_frac*
            The fraction of pLDDT values within bounds, if the
            count is greater than the minimum.
        * *pLDDTNN_mean*
            The mean of pLDDT values within bounds, if the
            count is greater than the minimum.
        * *pLDDTNN_median*
            The median of pLDDT values within bounds, if the
            count is greater than the minimum.
        * *LDDT_expect*
            The expectation value of global *LDDT* over the
            residues with *LDDT* within bounds.  Only
            produced if default bounds are used.
        * *passing*
            True if the model passed the criterion, False
            otherwise.  Only produced if default bounds are
            used.
        * *file*
            The path to the model file.

* *plddt-select-residues*
    Writes a tab-separated file of residues from passing models,
    using an input file of values selected by *plddt-stats*.
    Input options are the same as *plddt-stats*.

    Output columns:

        * *file*
            Path to the model file.
        * *residue*
            Residue number, starting from 0 and numbered
            sequentially.  Note that *all* residues will be
            written, regardless of bounds set.
        * *pLDDT*
            pLDDT value for that residue.
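
The per-model statistics listed for *plddt-stats* can be sketched in plain Python.
This is a hypothetical illustration, not *rafm*'s own code. It assumes that each
residue's pLDDT is stored in the B-factor field of its CA atom (as in AlphaFold PDB
files), that the bounds are inclusive, and that *passing* means the in-bounds median
meets *--criterion*. *LDDT_expect* is left out because its formula is not given here.
The names ``parse_plddts`` and ``bounded_stats`` are made up for this example.

.. code:: python

   import statistics


   def parse_plddts(pdb_lines):
       """Collect one pLDDT per residue from the B-factor field of CA atoms.

       Assumes AlphaFold-style PDB files, where pLDDT is written into the
       B-factor columns (61-66) of ATOM records.
       """
       plddts = []
       for line in pdb_lines:
           # Atom name occupies columns 13-16; HETATM records are ignored.
           if line.startswith("ATOM") and line[12:16].strip() == "CA":
               plddts.append(float(line[60:66]))
       return plddts


   def bounded_stats(plddts, lower_bound=80, upper_bound=100, min_length=20,
                     min_count=20, criterion=91.2):
       """Hypothetical re-creation of the plddt-stats columns for one model."""
       nn = lower_bound  # bounds specifier used in column names, e.g. pLDDT80_count
       stats = {"residues_in_pLDDT": len(plddts)}
       if not plddts:
           return stats
       stats["pLDDT_mean"] = statistics.mean(plddts)
       stats["pLDDT_median"] = statistics.median(plddts)
       in_bounds = [p for p in plddts if lower_bound <= p <= upper_bound]
       stats[f"pLDDT{nn}_count"] = len(in_bounds)
       # Truncated stats only when the model is long enough and enough values are in bounds.
       if len(plddts) >= min_length and len(in_bounds) > min_count:
           stats[f"pLDDT{nn}_frac"] = len(in_bounds) / len(plddts)
           stats[f"pLDDT{nn}_mean"] = statistics.mean(in_bounds)
           stats[f"pLDDT{nn}_median"] = statistics.median(in_bounds)
       if (lower_bound, upper_bound) == (80, 100):
           # Assumed pass rule: in-bounds median at or above the criterion.
           median = stats.get(f"pLDDT{nn}_median")
           stats["passing"] = median is not None and median >= criterion
       return stats

For instance, ``with open("model.pdb") as fh: stats = bounded_stats(parse_plddts(fh))``
yields a dictionary keyed by the column names above, where ``model.pdb`` stands in for
any AlphaFold model file.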

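The residue selection performed by *plddt-select-residues* can likewise be sketched.
This is an assumption-laden illustration. It takes rows like those ``csv.DictReader``
yields from the *plddt-stats* output, assumes booleans there are spelled
``True``/``False``, and uses a hypothetical ``plddts_by_file`` mapping in place of
re-reading each PDB file. Following the column description, residues are numbered
from 0 and every residue of a passing model is kept, regardless of bounds.

.. code:: python

   import csv
   import sys


   def select_residues(stats_rows, plddts_by_file):
       """Yield (file, residue, pLDDT) for every residue of every passing model."""
       for row in stats_rows:
           passing = row["passing"]
           if isinstance(passing, str):
               # Text read from the TSV; assumed to be "True" or "False".
               passing = passing == "True"
           if not passing:
               continue
           # Number residues sequentially from 0 and ignore the pLDDT bounds.
           for residue, plddt in enumerate(plddts_by_file[row["file"]]):
               yield row["file"], residue, plddt


   def write_selected(rows, out=sys.stdout):
       """Write the selected residues as tab-separated text with a header line."""
       writer = csv.writer(out, delimiter="\t", lineterminator="\n")
       writer.writerow(["file", "residue", "pLDDT"])
       writer.writerows(rows)

Keeping selection and writing separate lets the generator feed any destination, for
example ``write_selected(select_residues(rows, plddts_by_file))`` to print to the console.
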
Statistical Basis
-----------------
The default parameters were chosen to select for *LDDT* values greater
than 80 on a set of crystal structures obtained since AlphaFold was trained.  The
distributions of *LDDT* scores for the passing and non-passing sets, along
with an (overlapping) set of PDB files at 100% sequence identity over
at least 80% of the sequence, look like this:

.. image:: https://raw.githubusercontent.com/unmtransinfo/rafm/master/docs/_static/lddt_dist.png
   :target: https://raw.githubusercontent.com/unmtransinfo/rafm/master/docs/_static/lddt_dist.png
   :alt: Distribution of high-scoring, low-scoring, and high-similarity structures

The markers on the *x*-axis indicate the size of the conformational changes observed
between crystal structures of the following proteins:

* *CALM*
    Between calcium-bound and calcium-free calmodulin (depicted in the logo image above).
* *ERK2*
    Between unphosphorylated and doubly-phosphorylated ERK2 kinase.
* *HB*
    Between R- and T-state hemoglobin.
* *MB*
    Between carbonmonoxy- and deoxy-myoglobin.

When applied to a set of "dark" genomes with no previous PDB entries, the distributions of
median *pLDDT* scores with a lower bound of 80 and of per-residue *pLDDT* scores look like
this:

.. image:: https://raw.githubusercontent.com/unmtransinfo/rafm/master/docs/_static/tdark_dist.png
   :target: https://raw.githubusercontent.com/unmtransinfo/rafm/master/docs/_static/tdark_dist.png
   :alt: Distribution of pLDDT80 scores and per-residue pLDDT scores


Contributing
------------

Contributions are very welcome.
To learn more, see the `Contributor Guide`_.


License
-------

Distributed under the terms of the `MIT license`_,
*rafm* is free and open source software.


Issues
------

If you encounter any problems,
please `file an issue`_ along with a detailed description.


Credits
-------

This project was generated from the `UNM Translational Informatics Python Cookiecutter`_ template.

*rafm* was written by Joel Berendzen and Jessica Binder.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _MIT license: https://opensource.org/licenses/MIT
.. _PyPI: https://pypi.org/
.. _UNM Translational Informatics Python Cookiecutter: https://github.com/unmtransinfo/cookiecutter-unmtransinfo-python
.. _file an issue: https://github.com/unmtransinfo/rafm/issues
.. _pip: https://pip.pypa.io/
.. github-only
.. _Contributor Guide: CONTRIBUTING.rst
.. _Usage: https://rafm.readthedocs.io/en/latest/usage.html

