Metadata-Version: 2.1
Name: deepnull
Version: 0.2.1
Summary: Models nonlinear interactions between covariates and phenotypes
Home-page: https://github.com/google-health/genomics-research/tree/main/nonlinear-covariate-gwas
Author: Google LLC
License: UNKNOWN
Description: # DeepNull: Modeling non-linear covariate effects improves phenotype prediction and association power
        
        This repository contains code implementing nonlinear covariate modeling to
        increase power in genome-wide association studies, as described in "DeepNull:
        Modeling non-linear covariate effects improves phenotype prediction and
        association power"
        ([Hormozdiari et al., 2021](https://doi.org/10.1101/2021.05.26.445783)).
        The code is written using Python 3.7 and TensorFlow 2.4.
        
        ## Installation
        
        Installation is not required to run DeepNull end-to-end; you can just
        [open `DeepNull_e2e.ipynb` in Colab](https://colab.research.google.com/github/Google-Health/genomics-research/blob/main/nonlinear-covariate-gwas/DeepNull_e2e.ipynb)
        to try it out.
        
        To install DeepNull locally, run
        
        ```bash
        pip install --upgrade pip
        pip install --upgrade deepnull
        ```
        
        on a machine with Python 3.7+. This installs a CPU-only version, as there are
        typically few enough covariates that using accelerators does not provide
        meaningful speedups.
        
        Verify that the installation is working properly by executing all tests:
        
        ```bash
        python -m deepnull.config_test
        python -m deepnull.data_test
        python -m deepnull.metrics_test
        python -m deepnull.main_test
        python -m deepnull.model_test
        python -m deepnull.train_eval_test
        ```
        
        ## How to run DeepNull
        
        To run locally, there is a single required input file. This file contains the
        phenotype of interest and covariates used to predict the phenotype, formatted as
        a *tab-separated* file suitable for GWAS analysis with
        [PLINK](https://www.cog-genomics.org/plink/2.0/assoc) or
        [BOLT-LMM](https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html).
        
        Briefly, the file must contain a single header line. The first two columns must
        be `FID` and `IID`, and all `IID` values must be unique.
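        
        The layout rules above can be checked before training. The following is a
        minimal sketch using only the Python standard library, not part of DeepNull
        itself: the function name `validate_phenocovar_tsv` and the toy rows are
        hypothetical, and the per-row field count check is an extra sanity check
        that the rules above do not state explicitly.
        
        ```python
        import csv
        import io
        
        
        def validate_phenocovar_tsv(handle, target, covariates):
            """Checks the layout rules and returns the number of data rows.
        
            Hypothetical helper, not a DeepNull API. Raises ValueError on the
            first violation found.
            """
            reader = csv.reader(handle, delimiter="\t")
            header = next(reader, None)
            if header is None:
                raise ValueError("file is empty; a header line is required")
            if header[:2] != ["FID", "IID"]:
                raise ValueError("first two columns must be FID and IID")
            missing = [c for c in [target, *covariates] if c not in header]
            if missing:
                raise ValueError(f"missing columns: {missing}")
            seen = set()
            for row in reader:
                # Extra sanity check (an assumption): every row matches the header.
                if len(row) != len(header):
                    raise ValueError(
                        f"row has {len(row)} fields, expected {len(header)}")
                iid = row[1]
                if iid in seen:
                    raise ValueError(f"duplicate IID: {iid}")
                seen.add(iid)
            return len(seen)
        
        
        # Toy data; the values are made up for illustration.
        example = io.StringIO(
            "FID\tIID\tage\tsex\tgenotyping_array\tpheno\n"
            "1\t1\t54\t0\t1\t1.7\n"
            "2\t2\t61\t1\t0\t2.3\n"
        )
        n_rows = validate_phenocovar_tsv(
            example, target="pheno", covariates=["age", "sex", "genotyping_array"])
        print(n_rows)
        ```
        
        To check a real file, pass an open file handle instead of the `io.StringIO`
        object, for example `open(path, newline="")`.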
        
        An example command to train DeepNull to predict the phenotype `pheno` from
        covariates `age`, `sex`, and `genotyping_array` is the following:
        
        ```bash
        python -m deepnull.main \
          --input_tsv=/input/ORIGINAL_PHENOCOVAR_TSV \
          --output_tsv=/output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
          --target=pheno \
          --covariates="age,sex,genotyping_array"
        ```
        
        To see all available flags, run
        
        ```bash
        python -m deepnull.main --help 2> /dev/null
        ```
        
        Of particular note is the `--model_config` flag. DeepNull uses the
        [ml_collections](https://github.com/google/ml_collections) library to specify
        all parameters related to the model and training regimen. The supported
        configuration code is located in [`config.py`](config.py), and parameters can
        be modified as described in detail in the
        [`ml_collections README`](https://github.com/google/ml_collections#parameterising-the-get_config-function).
        As a brief example, to use the DeepNull architecture with the `elu` activation
        and train with batch size 4096, the above example command would be modified as
        follows:
        
        ```bash
        python -m deepnull.main \
          --input_tsv=/input/ORIGINAL_PHENOCOVAR_TSV \
          --output_tsv=/output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
          --target=pheno \
          --covariates="age,sex,genotyping_array" \
          --model_config=/path/to/config.py:deepnull \
          --model_config.model_config.mlp_activation=elu \
          --model_config.training_config.batch_size=4096
        ```
        
        where `/path/to/config.py` provides the path to [`config.py`](config.py) on your
        machine.
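        
        As a mental model, each `--model_config.<path>=<value>` flag overrides one
        field in a nested configuration. The sketch below illustrates that idea
        with plain dictionaries. It is not how `ml_collections` is implemented,
        `apply_override` is a hypothetical helper, and the starting values are
        placeholders rather than DeepNull's real defaults. Only the two field
        names are taken from the command above.
        
        ```python
        def apply_override(config, dotted_key, value):
            """Sets config[a][b]...[z] = value for dotted_key 'a.b...z'.
        
            This sketch chooses to reject unknown keys so that a typo cannot
            silently add a new field.
            """
            *parents, leaf = dotted_key.split(".")
            node = config
            for name in parents:
                node = node[name]
            if leaf not in node:
                raise KeyError(dotted_key)
            node[leaf] = value
        
        
        # Placeholder values for illustration only; NOT DeepNull's real defaults.
        config = {
            "model_config": {"mlp_activation": "relu"},
            "training_config": {"batch_size": 1024},
        }
        
        # Mirrors --model_config.model_config.mlp_activation=elu
        apply_override(config, "model_config.mlp_activation", "elu")
        # Mirrors --model_config.training_config.batch_size=4096
        apply_override(config, "training_config.batch_size", 4096)
        print(config)
        ```
        
        The fields that can actually be overridden are the ones defined in
        [`config.py`](config.py).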
        
        ## Incorporating DeepNull into a GWAS analysis
        
        As shown in "How to run DeepNull" above, DeepNull adds a single column to the
        phenotype+covariate file: a nonlinear prediction of the target phenotype. To
        incorporate it into a GWAS analysis, **add** this column to the existing
        covariates rather than replacing them. A concrete example with `BOLT-LMM`,
        using the same file, phenotype `pheno`, and covariates `age`, `sex`, and
        `genotyping_array` as above, is shown below:
        
        ### Original example GWAS command
        ```bash
        # N.B. Data loading flags are omitted for brevity.
        
        bolt \
          --phenoFile /input/ORIGINAL_PHENOCOVAR_TSV \
          --covarFile /input/ORIGINAL_PHENOCOVAR_TSV \
          --qCovarCol age \
          --qCovarCol sex \
          --qCovarCol genotyping_array \
          --phenoCol pheno
        ```
        
        After running DeepNull on `/input/ORIGINAL_PHENOCOVAR_TSV` to create the new
        TSV `/output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV`, which includes the
        column `pheno_deepnull`, the updated command is given below:
        
        ### Updated GWAS command to incorporate DeepNull
        ```bash
        # N.B. Data loading flags are omitted for brevity.
        # Note the addition of the single `--qCovarCol pheno_deepnull` line.
        
        bolt \
          --phenoFile /output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
          --covarFile /output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
          --qCovarCol age \
          --qCovarCol sex \
          --qCovarCol genotyping_array \
          --qCovarCol pheno_deepnull \
          --phenoCol pheno
        ```
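        
        When the GWAS command is generated from a script, the only difference
        between the two commands above is the input file and one extra covariate.
        The sketch below builds both argument lists from the same function. It
        assumes a hypothetical helper named `bolt_covariate_args`, uses only the
        flags shown above, and, like those commands, omits the data loading flags.
        
        ```python
        def bolt_covariate_args(tsv_path, target, covariates):
            """Builds the phenotype/covariate part of a BOLT-LMM command line.
        
            Hypothetical helper. Every covariate is passed with --qCovarCol, as in
            the example commands; data loading flags are intentionally left out.
            """
            args = ["bolt", "--phenoFile", tsv_path, "--covarFile", tsv_path]
            for name in covariates:
                args += ["--qCovarCol", name]
            args += ["--phenoCol", target]
            return args
        
        
        covariates = ["age", "sex", "genotyping_array"]
        
        original = bolt_covariate_args(
            "/input/ORIGINAL_PHENOCOVAR_TSV", "pheno", covariates)
        
        # The DeepNull prediction is appended to the covariates, not substituted.
        updated = bolt_covariate_args(
            "/output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV",
            "pheno",
            covariates + ["pheno_deepnull"],
        )
        print(" ".join(updated))
        ```
        
        The resulting list can be passed to `subprocess.run` once the data loading
        flags for your dataset have been appended.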
        
        ## Data
        
        Datasets used to reproduce the results from the above publication are available
        to researchers with approved access to the
        [UK Biobank](https://www.ukbiobank.ac.uk/).
        
        NOTE: the content of this research code repository (i) is not intended to be a
        medical device; and (ii) is not intended for clinical use of any kind, including
        but not limited to diagnosis or prognosis.
        
        This is not an officially supported Google product.
        
Keywords: GWAS
Platform: UNKNOWN
Requires-Python: >=3.7, <4
Description-Content-Type: text/markdown
