Overview
=========

The CpGtools package provides a number of Python programs to annotate, QC, visualize, and
analyze DNA methylation data generated from Illumina
`HumanMethylation450 BeadChip (450K) <https://support.illumina.com/array/array_kits/infinium_humanmethylation450_beadchip_kit.html>`_ /
`MethylationEPIC BeadChip (850K) <https://www.illumina.com/documents/products/datasheets/datasheet_CytoSNP850K_POP.pdf>`_ arrays or from
`RRBS / WGBS <https://www.illumina.com/science/sequencing-method-explorer/kits-and-arrays/rrbs-seq-scrrbs.html>`_.

These programs can be divided into three groups:

- CpG position analysis modules
- CpG signal analysis modules
- Differential CpG analysis modules

CpG position analysis modules
-----------------------------
These modules are primarily used to analyze the genomic locations of CpGs.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Name
     - Description
   * - `CpG_aggregation.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_aggregation.html>`_
     - Aggregate proportion values of CpGs located in given genomic regions (e.g., CpG islands, promoters, exons).
   * - `CpG_anno_position.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_anno_position.html>`_
     - Add annotation information to CpGs according to their genomic coordinates.
   * - `CpG_anno_probe.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_anno_probe.html>`_
     - Add annotation information to 450K/850K probes.
   * - `CpG_density_gene_centered.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_density_gene_centered.html>`_
     - Generate the `CpG density (count) profile <https://cpgtools.readthedocs.io/en/latest/_images/CpG_density.png>`_ over the gene body and the upstream and downstream intergenic regions.
   * - `CpG_distrb_chrom.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_distrb_chrom.html>`_
     - Calculate the distribution of CpGs over chromosomes.
   * - `CpG_distrb_gene_centered.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_distrb_gene_centered.html>`_
     - Calculate the distribution of CpGs over `gene-centered genomic regions <https://cpgtools.readthedocs.io/en/latest/_images/geneDist.png>`_.
   * - `CpG_distrb_region.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_distrb_region.html>`_
     - Calculate the distribution of CpGs over `user-specified genomic regions <https://cpgtools.readthedocs.io/en/latest/_images/regionDist.png>`_.
   * - `CpG_logo.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_logo.html>`_
     - Generate a `DNA motif logo <https://cpgtools.readthedocs.io/en/latest/_images/450_CH.logo.png>`_ and matrices for a given set of CpGs.
   * - `CpG_to_gene.py <https://cpgtools.readthedocs.io/en/latest/demo/CpG_to_gene.html>`_
     - Assign CpGs to their putative target genes, using an algorithm similar to `GREAT <http://great.stanford.edu/public/html/>`_.
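
As a rough illustration of what a position-level summary looks like, the sketch below counts CpGs per chromosome from BED-like records. It is not the implementation of ``CpG_distrb_chrom.py``; the function name ``count_cpgs_per_chrom``, the three-column input layout, and the example coordinates are assumptions made for this example.

.. code-block:: python

    from collections import Counter


    def count_cpgs_per_chrom(bed_lines):
        """Count CpG records per chromosome from BED-like text lines.

        Assumes the chromosome name is the first whitespace-separated field.
        """
        counts = Counter()
        for line in bed_lines:
            line = line.strip()
            # Skip blank lines and common BED header/comment lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            chrom = line.split()[0]
            counts[chrom] += 1
        return counts


    # Hypothetical CpG positions (chrom, start, end).
    example = [
        "chr1\t10468\t10469",
        "chr1\t10470\t10471",
        "chr2\t20000\t20001",
    ]
    counts = count_cpgs_per_chrom(example)
    total = sum(counts.values())
    for chrom in sorted(counts):
        # Print the chromosome, its CpG count, and its share of all CpGs.
        print(chrom, counts[chrom], f"{counts[chrom] / total:.2f}")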

CpG signal analysis modules
----------------------------
These modules are primarily used to analyze the DNA methylation beta values of CpGs.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Name
     - Description
   * - `beta_PCA.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_PCA.html>`_
     - Perform `PCA <https://en.wikipedia.org/wiki/Principal_component_analysis>`_ (principal component analysis) on samples.
   * - `beta_jitter_plot.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_jitter_plot.html>`_
     - Generate a `jitter plot <https://cpgtools.readthedocs.io/en/latest/_images/Jitter.png>`_ (a.k.a. strip chart) and a bean plot for each sample.
   * - `beta_m_conversion.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_m_conversion.html>`_
     - Convert Beta-values into M-values or *vice versa*.
   * - `beta_profile_gene_centered.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_profile_gene_centered.html>`_
     - Calculate the `methylation profile <https://cpgtools.readthedocs.io/en/latest/_images/gene_profile.png>`__ (i.e., average beta value) for genomic regions around genes.
   * - `beta_profile_region.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_profile_region.html>`_
     - Calculate the `methylation profile <https://cpgtools.readthedocs.io/en/latest/_images/region_profile.png>`__ (i.e., average beta value) around user-specified genomic regions.
   * - `beta_stacked_barplot.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_stacked_barplot.html>`_
     - Create a `stacked barplot <https://cpgtools.readthedocs.io/en/latest/_images/stacked_bar.png>`_ for each sample. The stacked barplot shows the proportions of CpGs whose beta values fall into [0, 0.25], [0.25, 0.5], [0.5, 0.75], and [0.75, 1].
   * - `beta_stats.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_stats.html>`_
     - Summarize basic information on CpGs located in each genomic region.
   * - `beta_tSNE.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_tSNE.html>`_
     - Perform `t-SNE <https://lvdmaaten.github.io/tsne/>`_ (t-Distributed Stochastic Neighbor Embedding) analysis on samples.
   * - `beta_topN.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_topN.html>`_
     - Select the top N most variable CpGs (according to standard deviation) from the input file.
   * - `beta_trichotmize.py <https://cpgtools.readthedocs.io/en/latest/demo/beta_trichotmize.html>`_
     - Use a `Bayesian Gaussian Mixture model <https://scikit-learn.org/stable/modules/generated/sklearn.mixture.BayesianGaussianMixture.html>`_ to trichotomize beta values into three states ('Un-methylated', 'Semi-methylated', 'Full-methylated') or 'unassigned'.
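
For readers unfamiliar with the two scales, the Beta-value/M-value conversion can be sketched with the commonly used logit (base 2) relationship, M = log2(Beta / (1 - Beta)). The snippet below is a minimal illustration and not the code of ``beta_m_conversion.py``; the function names and the clamping constant ``eps`` are assumptions introduced here.

.. code-block:: python

    import math


    def beta_to_m(beta, eps=1e-6):
        """Convert a Beta-value in [0, 1] to an M-value.

        ``eps`` is an illustrative clamp that keeps the logarithm finite
        when beta is exactly 0 or 1.
        """
        b = min(max(beta, eps), 1.0 - eps)
        return math.log2(b / (1.0 - b))


    def m_to_beta(m):
        """Convert an M-value back to a Beta-value."""
        return 2.0 ** m / (2.0 ** m + 1.0)


    for beta in (0.1, 0.5, 0.8):
        m = beta_to_m(beta)
        # Round-trip each value to show that the two functions are inverses.
        print(f"beta={beta:.2f}  M={m:.3f}  back={m_to_beta(m):.2f}")

A Beta-value of 0.5 maps to an M-value of 0, and values above or below 0.5 map to positive or negative M-values, respectively.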

Differential CpG analysis modules
----------------------------------
These modules are primarily used to identify CpGs that are differentially methylated between groups.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Name
     - Description
   * - `dmc_Bayes.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_Bayes.html>`_
     - Differential CpG analysis using the Bayesian approach. For 450K/850K data.
   * - `dmc_bb.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_bb.html>`_
     - Differential CpG analysis using the beta-binomial model. For RRBS/WGBS count data.
   * - `dmc_fisher.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_fisher.html>`_
     - Differential CpG analysis using Fisher's Exact Test. For RRBS/WGBS count data.
   * - `dmc_glm.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_glm.html>`_
     - Differential CpG analysis using the `GLM <https://en.wikipedia.org/wiki/Generalized_linear_model>`_ (generalized linear model). For 450K/850K data.
   * - `dmc_logit.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_logit.html>`_
     - Differential CpG analysis using a logistic regression model. For RRBS/WGBS count data.
   * - `dmc_nonparametric.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_nonparametric.html>`_
     - Differential CpG analysis using the `Mann-Whitney U test <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html>`_ for two-group comparisons and the `Kruskal-Wallis H-test <https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance>`_ for multiple-group comparisons.
   * - `dmc_ttest.py <https://cpgtools.readthedocs.io/en/latest/demo/dmc_ttest.html>`_
     - Differential CpG analysis using the t-test. For 450K/850K data.
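
To make the count-based case concrete, the sketch below computes a two-sided Fisher's exact test p-value for a single CpG from methylated and unmethylated read counts in two groups. It is a standalone illustration using only the standard library, not the code of ``dmc_fisher.py``; the function name, the way counts are arranged in the 2x2 table, and the example read counts are all assumptions.

.. code-block:: python

    from math import comb


    def fisher_exact_two_sided(a, b, c, d):
        """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

        Here a/b are methylated/unmethylated read counts in group 1 and
        c/d are the same counts in group 2 (an assumed layout).
        """
        row1, row2 = a + b, c + d
        col1 = a + c
        denom = comb(row1 + row2, col1)

        def prob(x):
            # Hypergeometric probability of x in the top-left cell,
            # with all row and column totals held fixed.
            return comb(row1, x) * comb(row2, col1 - x) / denom

        p_obs = prob(a)
        lo = max(0, col1 - row2)
        hi = min(row1, col1)
        # Sum the probabilities of all tables no more likely than the observed
        # one; the small tolerance guards against floating-point ties.
        return sum(
            p for p in (prob(x) for x in range(lo, hi + 1))
            if p <= p_obs * (1 + 1e-7)
        )


    # Hypothetical read counts at one CpG: group 1 has 8 methylated and
    # 2 unmethylated reads, group 2 has 1 methylated and 5 unmethylated reads.
    p_value = fisher_exact_two_sided(8, 2, 1, 5)
    print(f"p = {p_value:.4f}")

In a genome-wide analysis the same test would be repeated for every CpG, so the resulting p-values would normally need a multiple-testing correction.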

