Metadata-Version: 2.1
Name: pyd4
Version: 0.3.6
Summary: UNKNOWN
Home-page: UNKNOWN
License: UNKNOWN
Description: # pyd4 - Python Binding for D4 File Format
        
        This Python module lets Python programs read and write D4 files. It gives Python users an efficient way to handle quantitative genomic data, with efficient routines to summarize, load, and dump data from and to the D4 format.
        
        It also provides a very efficient way to profile the coverage of a BAM file and load it into a NumPy array (the entire process typically takes less than 2 minutes).
        
        ## Installation
        
        *Note: PyD4 doesn't support Python 2; please use Python 3.*
        
        Installing through pip is recommended:
        
        ```bash
        pip install pyd4
        ```
        
        Alternatively, you can install from source with `setup.py`:
        
        ```bash
        git clone https://github.com/38/d4-format.git
        cd d4-format/pyd4
        ./setup.py install
        ```
        
        
        ## Quick start by Example
        
        Here are some basic examples of how to use the package.
        
        ```python
        from pyd4 import D4File
        
        # Open a D4 File
        file = D4File("test.d4")
        
        # Print the chrom list
        print(file.chroms())
        
        # Get the mean coverage for region chr1:10000000-20000000
        print(file.mean("chr1:10000000-20000000"))
        
        # Get an iterator over values
        for i in file.value_iter("chr1", 0, 10000):
            print(i)
        
        # Load the values into a NumPy array
        data = file["chr1:0-10000"]
        ```
        
        ## Use PyD4 with NumPy
        
        PyD4 can be used efficiently with NumPy. It can load data from a D4 file as a NumPy array for further analysis. For example:
        
        ```python
        from pyd4 import D4File
        
        # Open a D4 File
        file = D4File("test.d4")
        
        # Load chr1 as a NumPy array (this takes less than 1s)
        per_base_depth = file.load_to_np("1")
        
        # Then we can count the number of loci with depth greater than 30 using the NumPy API
        print((per_base_depth > 30).sum())
        ```
        
        Alternatively, you can use the index operator to do the same:
        
        ```python
        per_base_depth_2 = file["2"]
        ```
        
        It's also possible to load a region of a chromosome instead of the entire chromosome:
        
        ```python
        per_base_from_1m = file["3:1000000-"]
        per_base_first_1m = file["3:-1000000"]
        per_base_12345_to_22345 = file["3:12345-22345"]
        ```
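        
        The region strings above follow a `chrom`, `chrom:start-`, `chrom:-end`, or `chrom:start-end` pattern. The sketch below shows one way such strings could be split into their parts. `parse_region` is a hypothetical helper written for illustration and is not part of PyD4; how PyD4 itself parses regions (and whether coordinates are 0-based or half-open) is not stated here, so open ends are simply returned as `None`.
        
        ```python
        from typing import Optional, Tuple
        
        
        def parse_region(region: str) -> Tuple[str, Optional[int], Optional[int]]:
            """Split a region string into (chrom, start, end).
        
            Hypothetical helper, not a PyD4 API. A missing start or end
            is returned as None rather than being filled in.
            """
            chrom, colon, span = region.partition(":")
            if not colon:
                # Only a chromosome name was given, e.g. "3"
                return chrom, None, None
            start_text, dash, end_text = span.partition("-")
            if not dash:
                raise ValueError(f"expected 'start-end' after ':' in {region!r}")
            start = int(start_text) if start_text else None
            end = int(end_text) if end_text else None
            return chrom, start, end
        
        
        for text in ["3", "3:1000000-", "3:-1000000", "3:12345-22345"]:
            print(text, "->", parse_region(text))
        ```
        
        This simple version assumes chromosome names do not themselves contain a colon.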
        
        ## Use PyD4 as a BAM Coverage Profiler
        
        PyD4 can compute the per-base coverage of a BAM file, typically in less than 2 minutes:
        
        ```python
        import pyd4
        
        # Create D4 file from a BAM input
        d4_file = pyd4.bam_to_d4("input.bam")
        
        chr1_per_base = d4_file.load_to_np("1")
        
        # Print the number of loci with depth > 30
        print((chr1_per_base > 30).sum())
        ```
        
        ## Dump a NumPy Array as a D4 File
        
        ```python
        import pyd4
        
        input_file = pyd4.D4File("input.d4")
        
        chr1_data = input_file["1"]
        
        chr1_flags = chr1_data > 64
        
        # create_on_same_genome creates a new D4 file with the same genome sizes as input_file;
        # the list ["1"] tells the API to copy only chromosome 1.
        # for_bit_array tells PyD4 that the output should be optimized for a boolean array.
        # Finally, get_writer returns the writer.
        output_file = input_file.create_on_same_genome("output.d4", ["1"]).for_bit_array().get_writer()
        
        # Then we can dump the numpy array to the D4 file
        # The first parameter specifies the chromosome we want to write
        # The second parameter specifies the locus at which the first value of the np array is written
        # The last parameter is the actual np array
        output_file.write_np_array("1", 0, chr1_flags)
        ```
        
        ## Fast Summarization
        
        One of the key advantages of D4 is that it provides a highly efficient way to summarize data on multi-core CPUs. PyD4 provides an API that exposes this feature to Python users. Although most summarization tasks can be done with the `load_to_np` API and NumPy routines, NumPy doesn't use multi-core CPUs efficiently. The summarize API is therefore a faster way to summarize data.
        
        To get the mean depth of chromosome 1:
        ```python
        import pyd4
        
        input_file = pyd4.D4File("input.d4")
        
        # Slower way (single threaded) to compute the mean depth with numpy
        np_array = input_file["1"]
        print(np_array.mean())
        
        # Faster way (parallel) to compute the same summary
        print(input_file.mean("1"))
        ```
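        
        To illustrate why a summary such as the mean lends itself to parallel computation, the sketch below splits a depth list into chunks, sums each chunk independently, and combines the partial sums. It is only a conceptual illustration under stated assumptions: `chunk_sum` and `chunked_mean` are hypothetical names, the data is synthetic, and this is not how PyD4 or D4 implement `mean`. Python threads are used purely to show the split-and-combine structure; in CPython they do not give real CPU parallelism for this kind of work.
        
        ```python
        from concurrent.futures import ThreadPoolExecutor
        
        
        def chunk_sum(values, start, end):
            """Sum one chunk; each chunk can be processed independently."""
            return sum(values[start:end])
        
        
        def chunked_mean(values, chunk_size=1000, workers=4):
            """Compute the mean by combining per-chunk partial sums."""
            if not values:
                raise ValueError("values must not be empty")
            # Half-open [start, end) bounds covering the whole list
            bounds = [
                (start, min(start + chunk_size, len(values)))
                for start in range(0, len(values), chunk_size)
            ]
            with ThreadPoolExecutor(max_workers=workers) as pool:
                partial_sums = pool.map(lambda b: chunk_sum(values, *b), bounds)
                total = sum(partial_sums)
            return total / len(values)
        
        
        # Synthetic per-base depths standing in for a chromosome
        depths = [i % 50 for i in range(10_000)]
        print(chunked_mean(depths))
        ```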
        
        D4 also provides a highly efficient way to perform batch summarization (for example, downsampling chromosome 1 into 1000-base-pair windows):
        
        ```python
        import pyd4
        
        input_file = pyd4.D4File("input.d4")
        
        down_sampled_chr1 = input_file.resample("1", bin_size = 1000)
        print(down_sampled_chr1)
        ```
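        
        As a rough picture of what per-window downsampling computes, the sketch below averages a synthetic depth list over fixed-size windows in plain Python. `window_means` is a hypothetical helper, not a PyD4 function; the return type of `resample` and its handling of a final partial window are not described here, so this version simply averages whatever values remain in the last window.
        
        ```python
        def window_means(values, bin_size=1000):
            """Return the mean of each consecutive bin_size-wide window."""
            if bin_size <= 0:
                raise ValueError("bin_size must be positive")
            means = []
            for start in range(0, len(values), bin_size):
                window = values[start:start + bin_size]
                means.append(sum(window) / len(window))
            return means
        
        
        # Synthetic depths: two full windows and one partial window
        depths = [10] * 1000 + [30] * 1000 + [20] * 500
        print(window_means(depths, bin_size=1000))
        ```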
        
        ## Changelog
        
        ### 0.3.1.1
        
        - Added documentation to the PyPI page
        - Fixed minor bugs
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Rust
Classifier: Operating System :: POSIX
Classifier: Operating System :: MacOS :: MacOS X
Description-Content-Type: text/markdown
