Metadata-Version: 2.1
Name: bioframe
Version: 0.2.0
Summary: Pandas utilities for tab-delimited and other genomic files
Home-page: https://github.com/open2c/bioframe
Author: Open2C
Author-email: nezar@mit.edu
License: MIT
Description: # Bioframe: Operations on Genomic Interval Dataframes
        
        ![Python package](https://github.com/open2c/bioframe/workflows/Python%20package/badge.svg)
        [![DOI](https://zenodo.org/badge/69901992.svg)](https://zenodo.org/badge/latestdoi/69901992)
        
        <img src="./docs/figs/bioframe-logo.png" width=75%> 
        
        Bioframe is a library to enable flexible and scalable operations on genomic interval dataframes in Python. Building bioframe directly on top of [pandas](https://pandas.pydata.org/) enables immediate access to a rich set of dataframe operations. Working in Python also enables rapid visualization (e.g. with matplotlib or seaborn) and fast iteration on genomic analyses.
        
        The philosophy underlying bioframe is to enable flexible operations: instead of creating a function for every possible use case, we encourage users to compose functions to achieve their goals. As a rough rule of thumb, if a function requires three steps and is crucial for genomic interval arithmetic, we have included it; conversely, if it can be performed in a single line by composing two of the core functions, we have not included it.
        
        ## Core functions
        - `closest`: For every interval in a dataframe, find the closest intervals in a second dataframe. 
        - `cluster`: Group overlapping intervals in a dataframe into clusters.
        - `complement`: Find genomic intervals that are not covered by any interval from a dataframe.
        - `overlap`: Find pairs of overlapping genomic intervals between two dataframes. 
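        
        As an illustration of the distance notion behind `closest`, here is a naive pure-pandas sketch, not bioframe's implementation. It assumes half-open, BED-style `[start, end)` coordinates in `chrom`, `start`, and `end` columns, pairs every interval in one dataframe with every interval on the same chromosome in the other, and keeps the nearest partner; overlapping or touching intervals get distance 0. The helper `naive_closest` and the toy data are hypothetical.
        
        ```python
        import numpy as np
        import pandas as pd
        
        def naive_closest(df1, df2):
            """Hypothetical helper: nearest df2 interval for each df1 interval (same chrom)."""
            # Pair every df1 interval with every df2 interval on the same chromosome,
            # keeping the original row labels as idx_1 / idx_2.
            pairs = pd.merge(
                df1.reset_index().rename(columns={"index": "idx_1"}),
                df2.reset_index().rename(columns={"index": "idx_2"}),
                on="chrom",
                suffixes=("_1", "_2"),
            )
            # Gap between half-open intervals; clipped to 0 when they overlap or touch.
            gap = np.maximum(pairs["start_1"], pairs["start_2"]) - np.minimum(
                pairs["end_1"], pairs["end_2"]
            )
            pairs["distance"] = gap.clip(lower=0)
            # Keep the smallest-distance partner for each df1 interval.
            best = pairs.loc[pairs.groupby("idx_1")["distance"].idxmin()]
            return best.reset_index(drop=True)
        
        # Toy example data (hypothetical).
        df1 = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [1, 20], "end": [5, 25]})
        df2 = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [7, 30], "end": [10, 35]})
        print(naive_closest(df1, df2)[["start_1", "end_1", "start_2", "end_2", "distance"]])
        ```
        
        Pairing everything on the same chromosome keeps the logic obvious but scales quadratically; it is meant only to pin down what "closest" means.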
        
        Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including `coverage`, `expand`, `merge`, `select`, and `subtract`.
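        
        Many of these helpers reduce to ordinary dataframe operations. For example, keeping the intervals that overlap a region is a single boolean mask in plain pandas. This is a sketch assuming half-open coordinates and `chrom`, `start`, and `end` columns; the helper `in_region` is hypothetical and is not bioframe's `select`.
        
        ```python
        import pandas as pd
        
        def in_region(df, chrom, start, end):
            """Hypothetical helper: rows of df overlapping the half-open region [start, end)."""
            # Same chromosome, starts before the region ends, ends after the region starts.
            mask = (df["chrom"] == chrom) & (df["start"] < end) & (df["end"] > start)
            return df[mask]
        
        # Toy example data (hypothetical).
        df = pd.DataFrame(
            {"chrom": ["chr1", "chr1", "chr2"], "start": [0, 50, 10], "end": [20, 80, 30]}
        )
        hits = in_region(df, "chr1", 10, 60)
        print(hits)
        ```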
        
        Bioframe also has functions for loading diverse genomic data formats and for performing operations on special classes of genomic intervals, including chromosome arms and fixed-size bins.
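        
        Fixed-size bins are themselves just a genomic interval dataframe. Below is a minimal sketch of tiling chromosomes into half-open bins, assuming chromosome sizes are given as a plain `dict` of name to length. The helper `make_bins` and the sizes used are hypothetical, and bioframe's own binning utilities may differ in details such as column names or dtypes.
        
        ```python
        import numpy as np
        import pandas as pd
        
        def make_bins(chromsizes, binsize):
            """Hypothetical helper: tile each chromosome into [start, end) bins of binsize bp."""
            frames = []
            for chrom, length in chromsizes.items():
                starts = np.arange(0, length, binsize)
                # The last bin is clipped to the chromosome end, so it may be shorter.
                ends = np.minimum(starts + binsize, length)
                frames.append(pd.DataFrame({"chrom": chrom, "start": starts, "end": ends}))
            return pd.concat(frames, ignore_index=True)
        
        # Hypothetical chromosome sizes.
        bins = make_bins({"chrA": 250, "chrB": 100}, binsize=100)
        print(bins)
        ```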
        
        Read the [docs](https://bioframe.readthedocs.io/en/latest/) and explore the [Jupyter notebooks](https://github.com/open2c/bioframe/tree/master/docs/notebooks/).
        
        ## Genomic interval operations
        
        
        To `overlap` two dataframes, call:
        ```python
        import bioframe as bf
        
        bf.overlap(df1, df2)
        ```
        
        For these two input dataframes, with intervals all on the same chromosome:
        
        <img src="./docs/figs/df1.png" width=60%> 
        <img src="./docs/figs/df2.png" width=60%> 
        
        
        `overlap` will return the following interval pairs as overlaps:
        
        <img src="./docs/figs/overlap_inner_0.png" width=60%> 
        <img src="./docs/figs/overlap_inner_1.png" width=60%> 
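        
        Conceptually, two half-open intervals `[s1, e1)` and `[s2, e2)` on the same chromosome overlap when `s1 < e2` and `s2 < e1`. The naive pure-pandas sketch below applies that test to every pair of intervals sharing a chromosome. It assumes BED-style `chrom`, `start`, and `end` columns, is quadratic per chromosome, and is not how bioframe implements `overlap`; `naive_overlap` and the toy dataframes are hypothetical.
        
        ```python
        import pandas as pd
        
        def naive_overlap(df1, df2):
            """Hypothetical helper: all overlapping (df1, df2) interval pairs on the same chrom."""
            # Cartesian product of intervals within each chromosome.
            pairs = pd.merge(df1, df2, on="chrom", suffixes=("_1", "_2"))
            # Half-open intervals overlap iff each one starts before the other ends.
            hit = (pairs["start_1"] < pairs["end_2"]) & (pairs["start_2"] < pairs["end_1"])
            return pairs[hit].reset_index(drop=True)
        
        # Toy example data (hypothetical).
        df1 = pd.DataFrame({"chrom": ["chr1"] * 3, "start": [1, 3, 8], "end": [5, 9, 10]})
        df2 = pd.DataFrame({"chrom": ["chr1"] * 2, "start": [4, 10], "end": [6, 11]})
        print(naive_overlap(df1, df2))
        ```
        
        Note that `[8, 10)` and `[10, 11)` share only an endpoint, so under half-open coordinates they are not reported as overlapping.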
        
        
        To `merge` all overlapping intervals in a dataframe, call:
        ```python
        import bioframe as bf
        
        bf.merge(df1)
        ```
        
        For this input dataframe, with intervals all on the same chromosome:
        
        <img src="./docs/figs/df1.png" width=60%> 
        
        `merge` will return a new dataframe with these merged intervals:
        
        <img src="./docs/figs/merge_df1.png" width=60%> 
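        
        One way to think about `merge` is as clustering followed by a per-cluster aggregation. This minimal pandas sketch, assuming half-open coordinates in `chrom`, `start`, and `end` columns, sorts the intervals, tracks the running maximum end on each chromosome, and opens a new cluster whenever an interval starts at or past that end. Book-ended intervals (one ending exactly where the next starts) therefore stay separate here; bioframe's own handling of them may differ, so consult its documentation. `naive_merge` is a hypothetical name.
        
        ```python
        import pandas as pd
        
        def naive_merge(df):
            """Hypothetical helper: merge overlapping half-open intervals per chromosome."""
            df = df.sort_values(["chrom", "start"]).reset_index(drop=True)
            # Largest end seen among earlier intervals on the same chromosome (NaN for the first).
            prev_end = df.groupby("chrom")["end"].cummax().groupby(df["chrom"]).shift()
            # A new cluster starts at a chromosome's first interval or after a gap.
            new_cluster = prev_end.isna() | (df["start"] >= prev_end)
            cluster_id = new_cluster.cumsum()
            merged = df.groupby(cluster_id).agg(
                chrom=("chrom", "first"),
                start=("start", "min"),
                end=("end", "max"),
                n_intervals=("start", "size"),
            )
            return merged.reset_index(drop=True)
        
        # Toy example data (hypothetical).
        df = pd.DataFrame(
            {"chrom": ["chr1"] * 4 + ["chr2"], "start": [1, 3, 8, 12, 0], "end": [5, 9, 10, 15, 4]}
        )
        print(naive_merge(df))
        ```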
        
        
        See this [Jupyter notebook](https://github.com/open2c/bioframe/tree/genomic_interval_arithmetic/docs/notebooks/intervals_tutorials.ipynb) for visualizations of other core bioframe functions.
        
        See this [Jupyter notebook](https://github.com/open2c/bioframe/tree/genomic_interval_arithmetic/docs/notebooks/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.
        
        
        ## Requirements
        The following are required before installing bioframe:
        * Python 3.6+
        * `numpy`
        * `pandas>=1.0.3`
        
        ## Installation
        ```sh
        pip install bioframe
        ```
        
        ## Projects currently using bioframe
        * [cooler](https://github.com/open2c/cooler)
        * [cooltools](https://github.com/open2c/cooltools)
        * yours? :)
        
Keywords: pandas,dataframe,genomics,epigenomics,bioinformatics,intervals
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
