Metadata-Version: 2.1
Name: epigenomic_dataset
Version: 1.2.0
Summary: Python package wrapping ENCODE epigenomic data for a number of reference cell lines.
Home-page: https://github.com/LucaCappelletti94/epigenomic_dataset
Author: Luca Cappelletti
Author-email: cappelletti.luca94@gmail.com
License: MIT
Description: epigenomic_dataset
        =========================================================================================
        |travis| |sonar_quality| |sonar_maintainability|
        |codacy| |code_climate_maintainability| |pip| |downloads|
        
        Python package wrapping ENCODE epigenomic data
        for several reference cell lines.
        
        How do I install this package?
        ----------------------------------------------
        As usual, just install it using pip:
        
        .. code:: shell
        
            pip install epigenomic_dataset
        
        Tests Coverage
        ----------------------------------------------
        Since different coverage tools sometimes report slightly
        different results, here are three of them:
        
        |coveralls| |sonar_coverage| |code_climate_coverage|
        
        
        TODO: THE FOLLOWING SECTION WILL NEED RESTRUCTURING IN A LITTLE BIT!
        
        Preprocessed data for cis-regulatory regions
        -----------------------------------------------
        We have already downloaded the data and computed the max window value for each promoter and enhancer
        region for the cell lines A549, GM12878, H1, HEK293, HepG2, K562 and MCF-7 in the Fantom dataset
        and for the cell lines A549, GM12878, H1, HepG2 and K562 in the Roadmap dataset, taking into consideration
        all the target features listed `in the complete table of epigenomes <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/epigenomic_dataset/epigenomes.csv>`__.
        The tables below link to the resulting files for window sizes of 200 and 1000.
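        
        As a rough illustration of what a "max window value" can mean, the sketch below takes the
        maximum of a signal inside a fixed-size window centred on each region. It is only one
        plausible reading of the preprocessing, not the package's actual code: the function
        ``max_window_values``, the toy signal and the region centres are all hypothetical.
        
        .. code:: python
        
            def max_window_values(signal, centers, window_size):
                """Return the maximum of ``signal`` within a window centred on each position."""
                half = window_size // 2
                values = []
                for center in centers:
                    # Clip the window to the bounds of the signal.
                    start = max(center - half, 0)
                    end = min(center + half, len(signal))
                    values.append(max(signal[start:end]))
                return values
        
            # Toy signal and centres (hypothetical, for illustration only).
            signal = [0.0, 1.0, 5.0, 2.0, 0.5, 3.0, 0.0, 0.0]
            print(max_window_values(signal, centers=[2, 6], window_size=4))  # [5.0, 3.0]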
        
        The thresholds used for classifying the activations of enhancers and promoters in Fantom are the
        defaults described in the sister pipeline `CRR labels <https://github.com/LucaCappelletti94/crr_labels>`__,
        which handles the download and preprocessing of the data from Fantom and Roadmap.
        
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        |   Dataset         |   Cell line         |   Promoters                                                                                                                                                                                                                                                                           |   Enhancers                                                                                                                                                                                                                                                                           |
        +===================+=====================+==========================================================================================================================================+============================================================================================================================================+==========================================================================================================================================+============================================================================================================================================+
        | Fantom            | A549                | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/A549.csv.gz?raw=true>`__     | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/A549.csv.gz?raw=true>`__     | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/A549.csv.gz?raw=true>`__     | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/A549.csv.gz?raw=true>`__     |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Fantom            | GM12878             | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/GM12878.csv.gz?raw=true>`__  | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/GM12878.csv.gz?raw=true>`__  | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/GM12878.csv.gz?raw=true>`__  | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/GM12878.csv.gz?raw=true>`__  |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Fantom            | H1                  | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/H1.csv.gz?raw=true>`__       | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/H1.csv.gz?raw=true>`__       | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/H1.csv.gz?raw=true>`__       | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/H1.csv.gz?raw=true>`__       |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Fantom            | HEK293              | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/HEK293.csv.gz?raw=true>`__   | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/HEK293.csv.gz?raw=true>`__   | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/HEK293.csv.gz?raw=true>`__   | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/HEK293.csv.gz?raw=true>`__   |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Fantom            | HepG2               | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/HepG2.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/HepG2.csv.gz?raw=true>`__    | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/HepG2.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/HepG2.csv.gz?raw=true>`__    |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Fantom            | K562                | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/K562.csv.gz?raw=true>`__     | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/K562.csv.gz?raw=true>`__     | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/K562.csv.gz?raw=true>`__     | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/K562.csv.gz?raw=true>`__     |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Fantom            | MCF-7               | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters/MCF-7.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters/MCF-7.csv.gz?raw=true>`__    | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers/MCF-7.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers/MCF-7.csv.gz?raw=true>`__    |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Roadmap           | A549                | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/promoters/A549.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/promoters/A549.csv.gz?raw=true>`__    | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/enhancers/A549.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/enhancers/A549.csv.gz?raw=true>`__    |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Roadmap           | GM12878             | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/promoters/GM12878.csv.gz?raw=true>`__ | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/promoters/GM12878.csv.gz?raw=true>`__ | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/enhancers/GM12878.csv.gz?raw=true>`__ | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/enhancers/GM12878.csv.gz?raw=true>`__ |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Roadmap           | H1                  | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/promoters/H1.csv.gz?raw=true>`__      | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/promoters/H1.csv.gz?raw=true>`__      | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/enhancers/H1.csv.gz?raw=true>`__      | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/enhancers/H1.csv.gz?raw=true>`__      |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Roadmap           | HepG2               | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/promoters/HepG2.csv.gz?raw=true>`__   | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/promoters/HepG2.csv.gz?raw=true>`__   | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/enhancers/HepG2.csv.gz?raw=true>`__   | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/enhancers/HepG2.csv.gz?raw=true>`__   |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
        | Roadmap           | K562                | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/promoters/K562.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/promoters/K562.csv.gz?raw=true>`__    | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/enhancers/K562.csv.gz?raw=true>`__    | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/enhancers/K562.csv.gz?raw=true>`__    |
        +-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
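        
        The links in the table above follow a regular pattern, so a URL can also be composed
        programmatically. The helper below is a hypothetical sketch: ``preprocessed_url`` and
        ``CELL_LINES`` are not part of the package, they only mirror the pattern and the
        cell lines visible in the table.
        
        .. code:: python
        
            BASE_URL = "https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed"
        
            # Cell lines listed in the table above for each dataset.
            CELL_LINES = {
                "fantom": ["A549", "GM12878", "H1", "HEK293", "HepG2", "K562", "MCF-7"],
                "roadmap": ["A549", "GM12878", "H1", "HepG2", "K562"],
            }
        
            def preprocessed_url(dataset, window_size, regions, cell_line):
                """Compose the download URL of one preprocessed CSV file."""
                if cell_line not in CELL_LINES.get(dataset, []):
                    raise ValueError(f"{cell_line} is not listed for dataset {dataset!r}")
                if window_size not in (200, 1000) or regions not in ("promoters", "enhancers"):
                    raise ValueError("Unsupported window size or region type")
                return f"{BASE_URL}/{dataset}/{window_size}/{regions}/{cell_line}.csv.gz?raw=true"
        
            print(preprocessed_url("fantom", 200, "promoters", "K562"))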
        
        Here are the labels for all the considered cell lines.
        
        +-------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
        |   Dataset         |   Promoters                                                                                                                                                                                                                                                           |   Enhancers                                                                                                                                                                                                                                                           |
        +===================+==================================================================================================================================+====================================================================================================================================+==================================================================================================================================+====================================================================================================================================+
        | Fantom            | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/promoters.bed.gz?raw=true>`__  | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/promoters.bed.gz?raw=true>`__  | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/200/enhancers.bed.gz?raw=true>`__  | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/fantom/1000/enhancers.bed.gz?raw=true>`__  |
        +-------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
        | Roadmap           | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/promoters.bed.gz?raw=true>`__ | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/promoters.bed.gz?raw=true>`__ | `200 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/200/enhancers.bed.gz?raw=true>`__ | `1000 <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/preprocessed/roadmap/1000/enhancers.bed.gz?raw=true>`__ |
        +-------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
        
        TODO: align promoters and enhancers in a reference labels dataset.
        
        The complete pipeline used to retrieve the CRR epigenomic data is available
        `here <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/run_crr_build.py>`__.
        
        Automatic retrieval of preprocessed data
        ----------------------------------------------
        You can automatically retrieve the data as follows:
        
        .. code:: python
        
            from epigenomic_dataset import load_epigenomes
        
            X, y = load_epigenomes(
                cell_line="K562",
                dataset="fantom",
                regions="promoters",
                window_size=200,
                root="datasets"  # Directory where the data is downloaded
            )
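        
        To load several combinations in one go, it can help to enumerate the arguments first.
        The sketch below builds the keyword arguments for every combination listed in the tables
        of preprocessed data. The helper ``epigenome_configurations`` is hypothetical, and the
        values ``"roadmap"`` and ``"enhancers"`` are assumptions inferred from the file paths
        rather than documented argument values.
        
        .. code:: python
        
            from itertools import product
        
            # Cell lines available for each dataset, as listed in the tables.
            CELL_LINES = {
                "fantom": ["A549", "GM12878", "H1", "HEK293", "HepG2", "K562", "MCF-7"],
                "roadmap": ["A549", "GM12878", "H1", "HepG2", "K562"],
            }
        
            def epigenome_configurations():
                """Yield keyword arguments for every dataset, cell line, region and window size."""
                for dataset, cell_lines in CELL_LINES.items():
                    for cell_line, regions, window_size in product(
                        cell_lines, ("promoters", "enhancers"), (200, 1000)
                    ):
                        yield dict(
                            cell_line=cell_line,
                            dataset=dataset,
                            regions=regions,
                            window_size=window_size,
                        )
        
            configurations = list(epigenome_configurations())
            print(len(configurations))  # 48
        
            # Each entry can then be passed on, for instance:
            # X, y = load_epigenomes(**configurations[0], root="datasets")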
        
        Pipeline for epigenomic data
        ----------------------------------------------
        The considered raw data are from `this query from the ENCODE project <https://www.encodeproject.org/search/?searchTerm=fold+change+over+control&type=Experiment&assembly=hg19&status=released&biosample_ontology.classification=cell+line&files.file_type=bigWig&replication_type=isogenic&audit.ERROR.category%21=extremely+low+read+depth&audit.ERROR.category%21=inconsistent+genetic+modification+reagent+source+and+identifier&audit.ERROR.category%21=missing+control+alignments&audit.ERROR.category%21=extremely+low+read+length&audit.NOT_COMPLIANT.category%21=insufficient+read+depth&audit.NOT_COMPLIANT.category%21=missing+controlled_by&audit.NOT_COMPLIANT.category%21=insufficient+read+length&audit.NOT_COMPLIANT.category%21=insufficient+replicate+concordance&audit.NOT_COMPLIANT.category%21=severe+bottlenecking&audit.NOT_COMPLIANT.category%21=control+insufficient+read+depth&audit.NOT_COMPLIANT.category%21=poor+library+complexity&limit=all>`_
        
        You can find the `complete table of the available epigenomes here <https://github.com/LucaCappelletti94/epigenomic_dataset/blob/master/epigenomic_dataset/epigenomes.csv>`_.
        These datasets were selected to have,
        at the time of writing (07/02/2020),
        the fewest known problems, such as
        low read resolution.
        
        For example, to extract the epigenomic features for the cell lines
        HepG2 and H1, run the pipeline as follows:
        
        .. code:: python
        
            from epigenomic_dataset import build
        
            build(
                bed_path="path/to/my/bed/file.bed",
                cell_lines=["HepG2", "H1"]
            )
        
        To specify where to store the files, pass the ``path`` argument:
        
        .. code:: python
        
            from epigenomic_dataset import build
        
            build(
                bed_path="path/to/my/bed/file.bed",
                cell_lines=["HepG2", "H1"],
                path="path/to/my/target"
            )
        
        By default, the downloaded bigWig files are not deleted.
        You can choose to delete the files as follows:
        
        .. code:: python
        
            from epigenomic_dataset import build
        
            build(
                bed_path="path/to/my/bed/file.bed",
                cell_lines=["HepG2", "H1"],
                path="path/to/my/target",
                clear_download=True
            )
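        
        The ``bed_path`` argument used above points to a BED file describing the regions of interest.
        The following is a minimal sketch for writing one. It assumes that a plain three-column,
        tab-separated BED file (chromosome, start, end) is sufficient, which is not stated in this
        document; ``write_bed``, the file name and the coordinates are placeholders.
        
        .. code:: python
        
            import csv
        
            def write_bed(path, regions):
                """Write (chromosome, start, end) tuples as a tab-separated BED file."""
                with open(path, "w", newline="") as handle:
                    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
                    for chrom, start, end in regions:
                        if start >= end:
                            raise ValueError(f"Empty or inverted region: {chrom}:{start}-{end}")
                        writer.writerow([chrom, start, end])
        
            # Placeholder coordinates, for illustration only.
            write_bed("regions.bed", [("chr1", 1000, 1200), ("chr2", 5000, 5200)])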
        
        
        .. |travis| image:: https://travis-ci.org/LucaCappelletti94/epigenomic_dataset.png
           :target: https://travis-ci.org/LucaCappelletti94/epigenomic_dataset
           :alt: Travis CI build
        
        .. |sonar_quality| image:: https://sonarcloud.io/api/project_badges/measure?project=LucaCappelletti94_epigenomic_dataset&metric=alert_status
            :target: https://sonarcloud.io/dashboard/index/LucaCappelletti94_epigenomic_dataset
            :alt: SonarCloud Quality
        
        .. |sonar_maintainability| image:: https://sonarcloud.io/api/project_badges/measure?project=LucaCappelletti94_epigenomic_dataset&metric=sqale_rating
            :target: https://sonarcloud.io/dashboard/index/LucaCappelletti94_epigenomic_dataset
            :alt: SonarCloud Maintainability
        
        .. |sonar_coverage| image:: https://sonarcloud.io/api/project_badges/measure?project=LucaCappelletti94_epigenomic_dataset&metric=coverage
            :target: https://sonarcloud.io/dashboard/index/LucaCappelletti94_epigenomic_dataset
            :alt: SonarCloud Coverage
        
        .. |coveralls| image:: https://coveralls.io/repos/github/LucaCappelletti94/epigenomic_dataset/badge.svg?branch=master
            :target: https://coveralls.io/github/LucaCappelletti94/epigenomic_dataset?branch=master
            :alt: Coveralls Coverage
        
        .. |pip| image:: https://badge.fury.io/py/epigenomic-dataset.svg
            :target: https://badge.fury.io/py/epigenomic-dataset
            :alt: Pypi project
        
        .. |downloads| image:: https://pepy.tech/badge/epigenomic-dataset
            :target: https://pepy.tech/badge/epigenomic-dataset
            :alt: Pypi total project downloads
        
        .. |codacy| image:: https://api.codacy.com/project/badge/Grade/85bc1e3d96bf4c43a2ca70ca233a1bca
            :target: https://www.codacy.com/manual/LucaCappelletti94/epigenomic_dataset?utm_source=github.com&utm_medium=referral&utm_content=LucaCappelletti94/epigenomic_dataset&utm_campaign=Badge_Grade
            :alt: Codacy Maintainability
        
        .. |code_climate_maintainability| image:: https://api.codeclimate.com/v1/badges/64bfb8eb5a73959ea0d3/maintainability
            :target: https://codeclimate.com/github/LucaCappelletti94/epigenomic_dataset/maintainability
            :alt: Maintainability
        
        .. |code_climate_coverage| image:: https://api.codeclimate.com/v1/badges/64bfb8eb5a73959ea0d3/test_coverage
            :target: https://codeclimate.com/github/LucaCappelletti94/epigenomic_dataset/test_coverage
            :alt: Code Climate Coverage
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Provides-Extra: test
