Metadata-Version: 2.1
Name: interpret_eval
Version: 0.1.4
Summary: Interpretable Evaluation for Natural Language Processing
Home-page: https://github.com/neulab/ExplainaBoard
Author: Pengfei Liu
License: MIT License
Description: # ExplainaBoard: An Explainable Leaderboard for NLP
        
        [**Introduction**](##introduction) | 
        [**Website**](#website) |
        [**Download**](#download-system-outputs) |
        [**Backend**](#test-your-results) |
        [**Paper**](https://arxiv.org/pdf/2104.06387.pdf) |
        [**Video**](https://www.youtube.com/watch?v=3X6NgpbN_GU) |
        [**Bib**](http://explainaboard.nlpedia.ai/explainaboard.bib)
        
        
        <img src="./fig/logo-full-v2.png" width="800" class="center">
        
        ## Introduction
        ### ExplainaBoard is an interpretable, interactive and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboard.
        * F1: *Single-system Analysis*: What is a system good or bad at?
        * F2: *Pairwise Analysis*: Where is one system better (worse) than another?
        * F3: *Data Bias Analysis*: What are the characteristics of different evaluated datasets?
        * F5: *Common errors*: What are common mistakes that top-5 systems made?
        * F6: *Fine-grained errors*: where will errors occur?
        * F7: *System Combination*: Is there potential complementarity between different systems?
        
        
        <img src="./fig/intro.png" width="400" class="center">
        
        
        
        
        
        
        ## Website
        We deploy ExplainaBoard as a [Web toolkit](http://explainaboard.nlpedia.ai/), which includes 9 NLP tasks, 
        40 datasets and 300 systems. Detailed information is as follows.
        
        ### Task 
        
        | Task                     | Sub-task         | Dataset | Model | Attribute | 
        |--------------------------|------------------|---------|-------|-----------|  
        |				           | Sentiment		  | 8       | 40    | 2         |
        | Text Classification      | Topics           | 4       | 18    | 2         |
        |					       | Intention        | 1       | 3     | 2         |
        | Text-Span Classification | Aspect Sentiment | 4       | 20    | 4         |
        | Text pair Classification | NLI              | 2       | 6     | 7         |
        |                          | NER              | 3       | 74    | 9         |
        | Sequence Labeling	       | POS              | 3       | 14    | 4         |	
        | 					       | Chunking         | 3       | 14    | 9         |
        | 					       | CWS              | 7       | 64    | 7         |
        | Structure Prediction     | Semantic Parsing | 4       | 12    | 4         | 
        | Text Generation          | Summarization    | 2       | 36    | 7         | 
        
        <img src="./fig/demo.gif" width="800" class="center">
        
        
        ## Download System Outputs
        We haven't released datasets or corresponding system outputs that require licenses. But If you have licenses please fill in this [form](https://docs.google.com/forms/d/1rl7dgOTroT4hazUsd8CaSbGPKFbo2HNOO5pFBsM8IY0/edit) and we will send them to you privately. (Description of output's format can refer [here](https://github.com/neulab/ExplainaBoard/tree/main/output_format)
        If these system outputs are useful for you, you can [cite our work](http://explainaboard.nlpedia.ai/explainaboard.bib).
        
        
        
        ## Test Your Results
        ```
        pip install -r requirements.txt
        ```
        
        ### Description of Each Directory
        * `task-[task_name]`: fine-grained analysis for each task, 
          aiming to generating fine-grained analysis results with the json format.
          For example, task-mlqa can calculate the fine-graied F1 scores for different systems,
          and output corresponding json files in task-mlqa/output/ .
          
        * `meta-eval` is a sort of controller, which can be used to start the fine-graind anlsysis of all
        tasks, and analyze output json files.
        
            - calculate fine-grained results for all tasks: ./meta-eval/run-allTasks.sh
            ```js
                cd ./meta-eval/
                ./run-allTasks.sh
             ```
          
            - merge json files of all tasks into a csv file, which would be useful for further SQL import:
            ./meta-eval/genCSV/json2csv.py
          
            ```js
                cd ./meta-eval/genCSV/json2csv.py
                python json2csv.py > explainabord.csv
            ```
        
        * `src` stores some auxiliary codes.
        
        ## Submit Your Results
        You can submit your system's output by this [form](https://docs.google.com/forms/d/e/1FAIpQLSdb_3PPRTXXjkl9MWUeVLc8Igw0eI-EtOrU93i6B61X9FRJKg/viewform) following the format [description](https://github.com/neulab/ExplainaBoard/tree/main/output_format).
        
        
        
        ## Acknowledgement
        We thanks all authors who share their system outputs with us: Ikuya Yamada, Stefan Schweter,
        Colin Raffel, Yang Liu, Li Dong. We also thank
        Vijay Viswanathan, Yiran Chen, Hiroaki Hayashi for useful discussion and feedback about ExplainaBoard.
        
        
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/markdown
