Metadata-Version: 1.1
Name: dtool-overlay
Version: 0.3.1
Summary: dtool CLI utilities for working with per item metadata
Home-page: https://github.com/jic-dtool/dtool-overlay
Author: Tjelvar Olsson
Author-email: tjelvar.olsson@gmail.com
License: MIT
Download-URL: https://github.com/jic-dtool/dtool-overlay/tarball/0.3.1
Description: dtool CLI commands for working with per item metadata
        =====================================================
        
        .. image:: https://badge.fury.io/py/dtool-overlay.svg
           :target: http://badge.fury.io/py/dtool-overlay
           :alt: PyPI package
        
        .. image:: https://travis-ci.org/jic-dtool/dtool-overlay.svg?branch=master
           :target: https://travis-ci.org/jic-dtool/dtool-overlay
           :alt: Travis CI build status (Linux)
        
        .. image:: https://codecov.io/github/jic-dtool/dtool-overlay/coverage.svg?branch=master
           :target: https://codecov.io/github/jic-dtool/dtool-overlay?branch=master
           :alt: Code Coverage
        
        Installation
        ------------
        
        ::
        
            pip install dtool-overlay
        
        Example usage
        -------------
        
        Get a dataset to play with::
        
            LOCAL_DS_URI=$(dtool cp -q http://bit.ly/Ecoli-ref-genome .)
        
        Show the existing overlays::
        
            $ dtool overlays show $LOCAL_DS_URI
            identifiers,relpaths
            23ebd7cd21a905d5f255919ca1d0491901cb8718,reference.4.bt2
            37e2d68bb38271036d96b6979d24666e0d4fd814,reference.rev.1.bt2
            41fb9ae5d4f6c37226ff324c701b84bc3110709e,reference.1.bt2
            828ebf503926b7c1b8b07c1995b4ca818814b404,reference.rev.2.bt2
            b445ff5a1e468ab48628a00a944cac2e007fb9bc,U00096.3.fasta
            d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,reference.3.bt2
            dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,reference.2.bt2
        
        The output above shows that there are no overlays on this dataset. (The
        "identifiers" and "relpaths" columns are there for bookkeeping.)
        
        Create an "is_fasta" boolean overlay template by using a glob pattern::
        
            $ dtool overlays template glob $LOCAL_DS_URI is_fasta '*.fasta' > is_fasta.csv
            $ cat is_fasta.csv
            identifiers,is_fasta,relpaths
            23ebd7cd21a905d5f255919ca1d0491901cb8718,False,reference.4.bt2
            37e2d68bb38271036d96b6979d24666e0d4fd814,False,reference.rev.1.bt2
            41fb9ae5d4f6c37226ff324c701b84bc3110709e,False,reference.1.bt2
            828ebf503926b7c1b8b07c1995b4ca818814b404,False,reference.rev.2.bt2
            b445ff5a1e468ab48628a00a944cac2e007fb9bc,True,U00096.3.fasta
            d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,False,reference.3.bt2
            dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,False,reference.2.bt2
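        
        The boolean column above can be pictured as a glob match applied to each
        relpath. The sketch below reproduces that idea with Python's standard
        ``fnmatch`` module. It is an illustration only: the helper name
        ``glob_overlay`` is hypothetical, and the matching rules used internally
        by dtool-overlay may differ from those of ``fnmatch``.
        
        ```python
        from fnmatch import fnmatch
        
        # Relpaths copied from the example dataset above.
        relpaths = [
            "reference.4.bt2",
            "reference.rev.1.bt2",
            "reference.1.bt2",
            "reference.rev.2.bt2",
            "U00096.3.fasta",
            "reference.3.bt2",
            "reference.2.bt2",
        ]
        
        
        def glob_overlay(relpaths, pattern):
            """Map each relpath to True if it matches the glob pattern (hypothetical helper)."""
            return {relpath: fnmatch(relpath, pattern) for relpath in relpaths}
        
        
        is_fasta = glob_overlay(relpaths, "*.fasta")
        for relpath, value in is_fasta.items():
            print(relpath, value)
        ```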
        
        Write the overlay template to the dataset::
        
            $ dtool overlays write $LOCAL_DS_URI is_fasta.csv
            
        Show the newly created overlay::
        
            $ dtool overlays show $LOCAL_DS_URI
            identifiers,is_fasta,relpaths
            23ebd7cd21a905d5f255919ca1d0491901cb8718,False,reference.4.bt2
            37e2d68bb38271036d96b6979d24666e0d4fd814,False,reference.rev.1.bt2
            41fb9ae5d4f6c37226ff324c701b84bc3110709e,False,reference.1.bt2
            828ebf503926b7c1b8b07c1995b4ca818814b404,False,reference.rev.2.bt2
            b445ff5a1e468ab48628a00a944cac2e007fb9bc,True,U00096.3.fasta
            d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,False,reference.3.bt2
            dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,False,reference.2.bt2
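        
        Because overlays are shown as plain CSV, the output can be consumed with
        standard tooling. The sketch below uses Python's ``csv`` module to select
        the relpaths flagged by the "is_fasta" overlay. It assumes the CSV text
        has been captured exactly as printed above; the helper name
        ``select_relpaths`` is hypothetical and not part of dtool-overlay.
        
        ```python
        import csv
        
        # CSV lines copied from the ``dtool overlays show`` output above.
        csv_lines = [
            "identifiers,is_fasta,relpaths",
            "23ebd7cd21a905d5f255919ca1d0491901cb8718,False,reference.4.bt2",
            "37e2d68bb38271036d96b6979d24666e0d4fd814,False,reference.rev.1.bt2",
            "41fb9ae5d4f6c37226ff324c701b84bc3110709e,False,reference.1.bt2",
            "828ebf503926b7c1b8b07c1995b4ca818814b404,False,reference.rev.2.bt2",
            "b445ff5a1e468ab48628a00a944cac2e007fb9bc,True,U00096.3.fasta",
            "d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,False,reference.3.bt2",
            "dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,False,reference.2.bt2",
        ]
        
        
        def select_relpaths(csv_lines, overlay_name):
            """Return relpaths of rows whose overlay value is the text 'True'."""
            reader = csv.DictReader(csv_lines)
            return [row["relpaths"] for row in reader if row[overlay_name] == "True"]
        
        
        fasta_files = select_relpaths(csv_lines, "is_fasta")
        print(fasta_files)
        ```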
        
        To extract several pieces of metadata from each item's relpath, use the
        ``dtool overlays template parse`` command. This takes as input a dataset URI, a
        parse rule (see https://pypi.org/project/parse/ for more details) and a glob
        rule. The latter decides which relpaths the parsing is applied to.
        
        Consider for example the dataset below::
        
            $ dtool ls http://bit.ly/Ecoli-reads-minified
            8bda245a8cd526673aab775f90206c8b67d196af  ERR022075_2.fastq.gz
            9760280dc6313d3bb598fa03c5931a7f037d7ffc  ERR022075_1.fastq.gz
        
        
        The command below generates a template for the overlays
        "useful_name" and "read"::
        
            $ dtool overlays template parse  \
                http://bit.ly/Ecoli-reads-minified  \
                '{useful_name}_{read:d}.fastq.gz'
        
        This results in the CSV output below::
        
            identifiers,read,useful_name,relpaths
            8bda245a8cd526673aab775f90206c8b67d196af,2,ERR022075,ERR022075_2.fastq.gz
            9760280dc6313d3bb598fa03c5931a7f037d7ffc,1,ERR022075,ERR022075_1.fastq.gz
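        
        To make the parse rule more concrete, the sketch below extracts the same
        two fields from a relpath. It deliberately swaps in a hand-written regular
        expression from Python's standard ``re`` module as a stand-in for the rule
        ``'{useful_name}_{read:d}.fastq.gz'``. It does not call the ``parse``
        library, whose matching semantics may differ in detail, and the name
        ``parse_relpath`` is hypothetical.
        
        ```python
        import re
        
        # Assumed regex equivalent of '{useful_name}_{read:d}.fastq.gz':
        # a non-greedy name, an underscore, an integer, then the fixed suffix.
        RULE = re.compile(r"^(?P<useful_name>.+?)_(?P<read>\d+)\.fastq\.gz$")
        
        
        def parse_relpath(relpath):
            """Return the extracted fields as a dict, or None if the relpath does not match."""
            match = RULE.match(relpath)
            if match is None:
                return None
            return {
                "useful_name": match.group("useful_name"),
                "read": int(match.group("read")),
            }
        
        
        for relpath in ["ERR022075_2.fastq.gz", "ERR022075_1.fastq.gz"]:
            print(relpath, parse_relpath(relpath))
        ```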
        
        To ignore a variable element when parsing, use unnamed curly braces. The
        command below, for example, generates only the overlay "useful_name"::
        
            $ dtool overlays template parse  \
                http://bit.ly/Ecoli-reads-minified  \
                '{useful_name}_{:d}.fastq.gz'
            identifiers,useful_name,relpaths
            8bda245a8cd526673aab775f90206c8b67d196af,ERR022075,ERR022075_2.fastq.gz
            9760280dc6313d3bb598fa03c5931a7f037d7ffc,ERR022075,ERR022075_1.fastq.gz
        
         
        Sometimes it is useful to find pairs of items, for example when
        dealing with genomic sequencing data that has forward and reverse reads.
        
        One can create a "pair_id" overlay CSV template for this dataset using the
        command below::
        
            $ dtool overlays template pairs http://bit.ly/Ecoli-reads-minified .fastq.gz
            identifiers,pair_id,relpaths
            8bda245a8cd526673aab775f90206c8b67d196af,9760280dc6313d3bb598fa03c5931a7f037d7ffc,ERR022075_2.fastq.gz
            9760280dc6313d3bb598fa03c5931a7f037d7ffc,8bda245a8cd526673aab775f90206c8b67d196af,ERR022075_1.fastq.gz
        
        In the above, the suffix ".fastq.gz" is used to extract the prefix ``ERR022075_``,
        which is then used to find matching pairs.
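        
        The pairing idea can be sketched in a few lines of Python. This is not
        dtool-overlay's implementation: it assumes the shared prefix is obtained
        by stripping the suffix and then dropping a single trailing character (the
        read label), which holds for the two items above but is only an assumption
        in general. The helper name ``pair_ids`` is hypothetical.
        
        ```python
        from collections import defaultdict
        
        # Identifiers and relpaths copied from the example dataset above.
        items = {
            "8bda245a8cd526673aab775f90206c8b67d196af": "ERR022075_2.fastq.gz",
            "9760280dc6313d3bb598fa03c5931a7f037d7ffc": "ERR022075_1.fastq.gz",
        }
        
        
        def pair_ids(items, suffix):
            """Map each identifier to the identifier of its partner item."""
            groups = defaultdict(list)
            for identifier, relpath in items.items():
                if suffix and relpath.endswith(suffix):
                    stem = relpath[: -len(suffix)]  # e.g. "ERR022075_2"
                    prefix = stem[:-1]  # assumed one-character read label
                    groups[prefix].append(identifier)
            pairs = {}
            for members in groups.values():
                if len(members) == 2:  # only unambiguous pairs are linked
                    first, second = members
                    pairs[first] = second
                    pairs[second] = first
            return pairs
        
        
        pairs = pair_ids(items, ".fastq.gz")
        for identifier, partner in pairs.items():
            print(identifier, partner)
        ```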
        
        
        Related packages
        ----------------
        
        - `dtoolcore <https://github.com/jic-dtool/dtoolcore>`_
        - `dtool-cli <https://github.com/jic-dtool/dtool-cli>`_
        - `dtool-create <https://github.com/jic-dtool/dtool-create>`_
        
Platform: UNKNOWN
