Metadata-Version: 2.1
Name: bio-datasets
Version: 0.0.3
Summary: Open-source collection of biology datasets and pre-trained embeddings.
Home-page: https://github.com/DeepChainBio/datasets
Author: InstaDeep
Author-email: a.delfosse@instadeep.com
License: Apache-2.0
Description: # bio-datasets
        Open-source collection of biology datasets and pre-trained embeddings.
        
        ## Description
        bio-datasets is a collaborative framework for fetching publicly available, sequence-based protein datasets.
        Pre-trained contextual embeddings are also provided for these datasets.
        
        ## Installation
        Install the package with `pip install bio-datasets`. It is imported in Python as `biodatasets`.
        
        ## How it works
        
        ```python
        from biodatasets import list_datasets, load_dataset
        
        print(list_datasets())
        
        pathogen = load_dataset("pathogen")
        X, y = pathogen.to_npy_arrays(input_names=["sequence"], target_names=["class"])
        embeddings = pathogen.get_embeddings("sequence", "protbert", "cls")
        ```
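        
        One way to use the outputs above is to train a downstream model on the embeddings. The following is a minimal, hypothetical sketch of that step: it fits a nearest-centroid classifier with plain NumPy. It assumes `embeddings` is a 2-D NumPy array with one row per sequence, aligned row-for-row with a 1-D label array `y` (use `y.ravel()` if `y` is 2-D). These return types are assumptions, the helpers `nearest_centroid_fit` and `nearest_centroid_predict` are illustrative and not part of bio-datasets, and synthetic data stands in for the real outputs so the snippet runs on its own.
        
        ```python
        import numpy as np
        
        
        def nearest_centroid_fit(embeddings, labels):
            """Compute one mean embedding (centroid) per class label (hypothetical helper)."""
            classes = np.unique(labels)
            centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
            return classes, centroids
        
        
        def nearest_centroid_predict(embeddings, classes, centroids):
            """Assign each embedding to the class with the closest centroid (Euclidean)."""
            # Broadcast to shape (n_samples, n_classes, dim), then reduce to (n_samples, n_classes).
            distances = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
            return classes[np.argmin(distances, axis=1)]
        
        
        # Synthetic stand-ins (assumption): 100 "sequences" with 8-dim embeddings and
        # binary labels. Replace these with the real `embeddings` and `y`.
        rng = np.random.default_rng(0)
        y = rng.integers(0, 2, size=100)
        embeddings = rng.normal(size=(100, 8)) + y[:, None] * 3.0
        
        # Simple 80/20 train/test split on shuffled indices.
        idx = rng.permutation(len(y))
        train, test = idx[:80], idx[80:]
        
        classes, centroids = nearest_centroid_fit(embeddings[train], y[train])
        pred = nearest_centroid_predict(embeddings[test], classes, centroids)
        accuracy = (pred == y[test]).mean()
        print(f"test accuracy: {accuracy:.2f}")
        ```
        
        Nearest-centroid is used here only because it needs nothing beyond NumPy; any classifier that accepts a feature matrix and a label vector could take its place.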
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Software Development
Requires-Python: >=3.7
Description-Content-Type: text/markdown
