Metadata-Version: 2.1
Name: datalabs
Version: 0.1.4.dev0
Summary: Datalabs
Home-page: https://github.com/expressai/datalabs
Author: expressai
Author-email: stefanpengfei@gamil.com
License: Apache 2.0
Download-URL: https://github.com/expressai/datalabs/tags
Description: # DataLab API CN
        
        ## Installation
        ### Install
        
        ```shell
        pip install --upgrade pip
        pip install datalabs
        ```
        
        or, to install from source:
        
        ```shell
        pip install --upgrade pip
        git clone https://github.com/ExpressAI/Datalab.git
        cd Datalab
        pip install .
        ```
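        
        To confirm which version of `datalabs` ended up installed, a quick check like the one below can help. It is a sketch that assumes Python 3.8+, where `importlib.metadata` is part of the standard library. `installed_version` is a helper name made up for this example.
        
        ```python
        # Assumes Python 3.8+, where importlib.metadata is in the standard library.
        from importlib.metadata import PackageNotFoundError, version
        from typing import Optional
        
        
        def installed_version(dist_name: str) -> Optional[str]:
            """Hypothetical helper: return the installed version of a distribution, or None if absent."""
            try:
                return version(dist_name)
            except PackageNotFoundError:
                return None
        
        
        # The distribution name on PyPI is "datalabs" (plural).
        print(installed_version("datalabs"))
        ```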
        
         
        ### Dataset Operation
        
        ```python
        
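        # pip install datalabs
        from datalabs import operations, load_dataset
        from featurize import *  # featurizing operators, e.g. get_text_length, get_entity_spacy, get_postag_spacy
        from edit import *  # editing operators, e.g. add_typo, change_person_name
        
        dataset = load_dataset("ag_news")
        
        # print the task schema (task templates) of the test split
        print(dataset["test"]._info.task_templates)
        
        # featurize: compute the text length of each sample;
        # apply() returns an iterator, so next() gives the result for the first sample
        res = dataset["test"].apply(get_text_length)
        print(next(res))
        
        # featurize: extract named entities with spaCy
        res = dataset["test"].apply(get_entity_spacy)
        print(next(res))
        
        # featurize: get part-of-speech tags with spaCy
        res = dataset["test"].apply(get_postag_spacy)
        print(next(res))
        
        # edit: add typos to the text
        res = dataset["test"].apply(add_typo)
        print(next(res))
        
        # edit: change person names in the text
        res = dataset["test"].apply(change_person_name)
        print(next(res))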
        ```
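        
        The `apply` calls above return an iterator, so each `next(res)` yields the result for one sample. The plain-Python sketch below imitates that lazy pattern on two made-up records. `apply_lazily`, `get_text_length`, and `add_typo` here are hypothetical stand-ins written for illustration, not DataLab's own implementations. For instance, counting whitespace-separated tokens as the "length" is an assumption.
        
        ```python
        import random
        from typing import Callable, Dict, Iterable, Iterator, List
        
        
        def apply_lazily(samples: Iterable[Dict[str, str]],
                         op: Callable[[Dict[str, str]], object]) -> Iterator[object]:
            """Hypothetical stand-in for Dataset.apply: yield op(sample) one sample at a time."""
            for sample in samples:
                yield op(sample)
        
        
        def get_text_length(sample: Dict[str, str]) -> Dict[str, int]:
            """Hypothetical featurizer: count whitespace-separated tokens in `text`."""
            return {"text_length": len(sample["text"].split())}
        
        
        def add_typo(sample: Dict[str, str], seed: int = 0) -> Dict[str, str]:
            """Hypothetical editor: swap two adjacent characters in `text` to simulate a typo."""
            text = sample["text"]
            if len(text) < 2:
                return dict(sample)
            i = random.Random(seed).randrange(len(text) - 1)
            swapped = text[:i] + text[i + 1] + text[i] + text[i + 2:]
            return {**sample, "text": swapped}
        
        
        # Made-up records in the text-classification shape (text, label).
        samples: List[Dict[str, str]] = [
            {"text": "Stocks rallied on Monday", "label": "Business"},
            {"text": "The team won the final", "label": "Sports"},
        ]
        
        res = apply_lazily(samples, get_text_length)
        print(next(res))  # {'text_length': 4}
        print(next(apply_lazily(samples, add_typo)))
        ```
        
        Because results are produced lazily, nothing is computed for samples that are never requested. This keeps inspecting the first few outputs cheap even on large splits.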
        
        ### Task Schema
        
        * `text-classification`
            * `text`:str
            * `label`:ClassLabel
            
        * `text-matching`
            * `text1`:str
            * `text2`:str
            * `label`:ClassLabel
            
        * `summarization`
            * `text`:str
            * `summary`:str
            
        * `sequence-labeling`
            * `tokens`:List[str]
            * `tags`:List[ClassLabel]
            
        * `question-answering-extractive`
            * `context`:str
            * `question`:str
            * `answers`:List[{"text":"","answer_start":""}]
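        
        To make the field lists above concrete, the sketch below encodes each schema as a list of required fields and checks a hand-written record against it. The sample record and the `SCHEMAS` / `check_record` names are illustrative assumptions. For extractive QA it also assumes the common SQuAD-style convention that `answer_start` is the character offset in `context` at which the answer `text` begins.
        
        ```python
        from typing import Any, Dict, List
        
        # Required fields per task schema, following the list above (illustrative only).
        SCHEMAS: Dict[str, List[str]] = {
            "text-classification": ["text", "label"],
            "text-matching": ["text1", "text2", "label"],
            "summarization": ["text", "summary"],
            "sequence-labeling": ["tokens", "tags"],
            "question-answering-extractive": ["context", "question", "answers"],
        }
        
        
        def check_record(task: str, record: Dict[str, Any]) -> List[str]:
            """Hypothetical validator: return the problems found in `record` (empty list if none)."""
            problems = [f"missing field: {name}" for name in SCHEMAS[task] if name not in record]
            if problems:
                return problems
            if task == "sequence-labeling" and len(record["tokens"]) != len(record["tags"]):
                problems.append("tokens and tags differ in length")
            if task == "question-answering-extractive":
                context = record["context"]
                for answer in record["answers"]:
                    start = int(answer["answer_start"])  # assumed: character offset into context
                    if context[start:start + len(answer["text"])] != answer["text"]:
                        problems.append(f"answer {answer['text']!r} not found at offset {start}")
            return problems
        
        
        qa_record = {
            "context": "The cat sat on the mat.",
            "question": "Where did the cat sit?",
            "answers": [{"text": "on the mat", "answer_start": 12}],
        }
        print(check_record("question-answering-extractive", qa_record))  # []
        ```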
        
        
        One can use `dataset[SPLIT]._info.task_templates` to inspect a split's task-dependent information, such as its task schema, where
        `SPLIT` is a split name such as `train`, `validation`, or `test`.
        
        
        ### Supported Datasets
        * See the [datasets directory](https://github.com/ExpressAI/DataLab/tree/main/datasets) for the full list.
        
Keywords: dataset
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
Provides-Extra: audio
Provides-Extra: apache-beam
Provides-Extra: tensorflow
Provides-Extra: tensorflow_gpu
Provides-Extra: torch
Provides-Extra: s3
Provides-Extra: streaming
Provides-Extra: dev
Provides-Extra: tests
Provides-Extra: quality
Provides-Extra: benchmarks
Provides-Extra: docs
