Metadata-Version: 2.1
Name: bazema-linker
Version: 1.2
Summary: Application building relations between drugs, scientific publications, pubmed, journals and clinical trials.
Home-page: UNKNOWN
Author: Baptiste Azéma
Author-email: baptiste@azema.tech
License: LICENSE
Description: Bazema linker
        =============
        
        Application that builds relations between drugs,
        scientific publications (PubMed), journals and
        clinical trials.
        
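        The output is a JSON file listing, for each drug, the PubMed articles and
        clinical trials that mention it, and the journals in which it is mentioned.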
        
        ## Design
               +-------------------------+
               | input folder            |
               |   + drugs.csv           |
               |   | pubmed.csv          |
               |   | pubmed.json         |
               |   + clinical_trials.csv |
               +-------------------------+
                           +         move valid
                           |         files    +------------------+
                           v           +----> |  archive folder  |
                  +--------+-------+   |      +------------------+
                  |                |+--+
                  | bazema_linker  |
                  | python job     |     move invalid
                  |                |+--+ files
                  +----------------+   |      +------------------+
                           +           +----> |  errors folder   |
                           |                  +------------------+
                           v
            +-----------------------------+
            |  output folder              |
            |   + result_2020_10_06.json  |
            +-----------------------------+
        
        Once the job is done, valid input files are moved to an `archive`
        folder. Invalid files (invalid name, invalid format, or content that
        cannot be parsed) are moved to an `errors` folder.
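        
        The archive/errors routing could look roughly like the sketch below. It is not the package's actual code: the helper name `route_input_file` is hypothetical, and the sketch assumes the caller has already decided whether the file is valid.
        
        ```python
        import shutil
        from pathlib import Path
        
        
        def route_input_file(path: Path, is_valid: bool, archive_dir: Path, errors_dir: Path) -> Path:
            """Move a processed input file to the archive or errors folder (hypothetical helper)."""
            # Valid files go to the archive folder, invalid ones to the errors folder.
            target_dir = archive_dir if is_valid else errors_dir
            target_dir.mkdir(parents=True, exist_ok=True)
            destination = target_dir / path.name
            # str() keeps the call compatible with older Python versions.
            shutil.move(str(path), str(destination))
            return destination
        ```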
        
        ## Structure of input files
        
        - `drugs.csv`: 2 columns, `atccode` and `drug`
        - `pubmed.csv`: 4 columns, `id`, `title`, `date` and `journal`
        - `pubmed.json`: same fields as `pubmed.csv`, stored as JSON
        - `clinical_trials.csv`: 4 columns, `id`, `scientific_title`, `date` and `journal`
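        
        A minimal sketch of loading these files and checking their columns with the standard library. The names `EXPECTED_COLUMNS`, `check_columns` and `load_records` are hypothetical; a `ValueError` marks a file that a caller would move to the `errors` folder.
        
        ```python
        import csv
        import io
        import json
        from typing import Iterable, List
        
        # Expected columns, as listed in "Structure of input files".
        EXPECTED_COLUMNS = {
            "drugs": {"atccode", "drug"},
            "pubmed": {"id", "title", "date", "journal"},
            "clinical_trials": {"id", "scientific_title", "date", "journal"},
        }
        
        
        def check_columns(kind: str, columns: Iterable[str]) -> None:
            """Raise ValueError if any expected column is missing."""
            missing = EXPECTED_COLUMNS[kind] - set(columns)
            if missing:
                raise ValueError(f"{kind}: missing columns {sorted(missing)}")
        
        
        def load_records(kind: str, text: str, fmt: str = "csv") -> List[dict]:
            """Parse CSV or JSON text into a list of records, validating columns."""
            if fmt == "csv":
                reader = csv.DictReader(io.StringIO(text))
                records = [dict(row) for row in reader]
                # fieldnames is None when the text is empty.
                check_columns(kind, reader.fieldnames or [])
            elif fmt == "json":
                # json.JSONDecodeError is itself a ValueError.
                records = json.loads(text)
                # JSON has no header row, so check every record.
                for record in records:
                    check_columns(kind, record.keys())
            else:
                raise ValueError(f"unsupported format: {fmt}")
            return records
        ```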
        
        ## Structure of generated output
            [
                {
                    "drug": "drug name",
                    "clinical_trials": [
                        {
                            "title": "title of article",
                            "date": "2020-01-01"
                        }, {...}
                    ],
                    "pubmed": [
                        {
                            "title": "title of article",
                            "date": "2020-01-01"
                        }, {...}
                    ],
                    "journals": [
                        {
                            "date": "2020-01-01",
                            "journal": "journal name"
                        }, {...}
                    ]
                },
                {...}
            ]
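        
        One possible way to build this structure is sketched below. It assumes, without confirmation from the package, that a drug is linked to an article when its name appears case-insensitively in the article title (`title` for PubMed, `scientific_title` for clinical trials), that dates are already in ISO format, and that each `(date, journal)` pair is listed once per drug. `link_drugs` is a hypothetical name.
        
        ```python
        from typing import Dict, List
        
        
        def link_drugs(drugs: List[str], pubmed: List[Dict], trials: List[Dict]) -> List[Dict]:
            """Build one output entry per drug (assumed rule: case-insensitive substring match)."""
            result = []
            for drug in drugs:
                needle = drug.lower()
                entry = {"drug": drug, "clinical_trials": [], "pubmed": [], "journals": []}
                seen_journals = set()
                # (output key, records, name of the title column)
                sources = (("pubmed", pubmed, "title"), ("clinical_trials", trials, "scientific_title"))
                for key, records, title_field in sources:
                    for record in records:
                        title = record[title_field]
                        if needle not in title.lower():
                            continue
                        entry[key].append({"title": title, "date": record["date"]})
                        # List each (date, journal) pair once per drug.
                        journal_key = (record["date"], record["journal"])
                        if journal_key not in seen_journals:
                            seen_journals.add(journal_key)
                            entry["journals"].append({"date": record["date"], "journal": record["journal"]})
                result.append(entry)
            return result
        ```
        
        A plain substring match also matches a drug name embedded in a longer word; a word-boundary regular expression is a stricter alternative.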
        
        
        ## Usage
        
        ### Requirements
        
        - Python >= 3.6
        
        ### Installation
        
            virtualenv -p python3 venv
            source venv/bin/activate
            
            pip install bazema_linker
        
        Display usage:
            
            bazema_linker -h
            
        ### Example
        
            bazema_linker --input_dir data --output_dir result
            
        ### Development
            
            # Install
            virtualenv -p python3 venv
            source venv/bin/activate
            make install
            
            # Build
            make test # runs the tests with coverage
            make linter # runs pylint
            make build
            
        ## Ad-hoc Top journals
        
        You can get the name of the journal that mentions the most distinct drugs
        by running the script `top_journals.py` on a result file produced by `bazema_linker`.
        
        ### Usage
            # no dependency required
            python top_journals.py result/result_2020-10-06.json
            
            # output
            Journal with most different drugs is "Science" with a total of "15" different drugs.
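        
        The counting done by `top_journals.py` could be sketched as follows, assuming it reads the `journals` list of each drug entry and counts each drug at most once per journal. `top_journal` and `main` are hypothetical names, not the script's actual functions.
        
        ```python
        import json
        from collections import defaultdict
        from typing import Dict, List, Set, Tuple
        
        
        def top_journal(result: List[Dict]) -> Tuple[str, int]:
            """Return the journal mentioning the most distinct drugs and that number."""
            drugs_per_journal = defaultdict(set)  # type: Dict[str, Set[str]]
            for entry in result:
                for mention in entry.get("journals", []):
                    # A set counts each drug once per journal, even across several dates.
                    drugs_per_journal[mention["journal"]].add(entry["drug"])
            if not drugs_per_journal:
                raise ValueError("no journal found in the result file")
            journal, drugs = max(drugs_per_journal.items(), key=lambda item: len(item[1]))
            return journal, len(drugs)
        
        
        def main(path: str) -> None:
            """Read a bazema_linker result file and print the top journal."""
            with open(path, encoding="utf-8") as handle:
                name, count = top_journal(json.load(handle))
            print(f'Journal with most different drugs is "{name}" with a total of "{count}" different drugs.')
        ```
        
        Calling `main("result/result_2020-10-06.json")` prints a line in the format shown above. On a tie, `max` keeps the first journal encountered.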
        
        ## TODO
        
         - Handle high volumes of data (e.g. a few terabytes) by using a highly
          scalable framework (e.g. Apache Spark, Apache Beam). Pay attention when
          broadcasting data across workers.
         - Deploy to PyPI using GitHub Actions
        
Platform: UNKNOWN
Requires-Python: ~=3.6
Description-Content-Type: text/markdown
