Metadata-Version: 2.1
Name: hca2scea
Version: 0.1.1
Summary: A tool to assist in the automatic conversion of hca metadata to scea metadata MAGE-TAB files.
Home-page: https://github.com/ebi-ait/hca-to-scea-tools
Author: Ami Day, Yusra Haider, Alegria Aclan, Javier Ferrer
Author-email: ami@ebi.ac.uk, yhaider@ebi.ac.uk, aaclan@ebi.ac.uk, javier.f.g@um.es
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: Apache Software License
Description-Content-Type: text/markdown

## hca2scea
A tool to assist in the automatic conversion of an hca metadata spreadsheet to scea metadata MAGE-TAB files.

## Installation

    pip install hca2scea
    
## Description

The tool takes as input an HCA metadata spreadsheet and converts the metadata to SCEA MAGE-TAB files which are then saved to an output directory.

## Usage

To run it as a package, after installing it via pip:


```shell script
$ hca2scea -h                                                  
usage: hca2scea [-h] -s SPREADSHEET -id PROJECT_UUID -study STUDY [-name {cs_name,cs_id,sp_name,sp_id,other}] -ac
                ACCESSION_NUMBER -c CURATORS [CURATORS ...] -et {baseline,differential} [-facs] -f EXPERIMENTAL_FACTORS
                [EXPERIMENTAL_FACTORS ...] -pd PUBLIC_RELEASE_DATE -hd HCA_UPDATE_DATE [-o OUTPUT_DIR] [-zip]

run hca -> scea tool

optional arguments:
  -h, --help            show this help message and exit
  -s SPREADSHEET, --spreadsheet SPREADSHEET
                        Please provide a path to the HCA project spreadsheet.
  -id PROJECT_UUID, --project_uuid PROJECT_UUID
                        Please provide an HCA ingest project submission id.
  -study STUDY          Please provide the SRA or ENA study accession.
  -name {cs_name,cs_id,sp_name,sp_id,other}
                        Please indicate which field to use as the sample name. cs=cell suspension, sp = specimen.
  -ac ACCESSION_NUMBER, --accession_number ACCESSION_NUMBER
                        Provide an E-HCAD accession number. Please find the next suitable accession number by checking
                        the google tracker sheet.
  -c CURATORS [CURATORS ...], --curators CURATORS [CURATORS ...]
                        space separated names of curators
  -et {baseline,differential}, --experiment_type {baseline,differential}
                        Please indicate whether this is a baseline or differential experimental design
  -facs                 Please specify if FACS was used to isolate single cells.
  -f EXPERIMENTAL_FACTORS [EXPERIMENTAL_FACTORS ...], --experimental_factors EXPERIMENTAL_FACTORS [EXPERIMENTAL_FACTORS ...]
                        space separated list of experimental factors
  -pd PUBLIC_RELEASE_DATE, --public_release_date PUBLIC_RELEASE_DATE
                        Please enter the public release date in this format: YYYY-MM-DD
  -hd HCA_UPDATE_DATE, --hca_update_date HCA_UPDATE_DATE
                        Please enter the last time the HCA prohect submission was updated in this format: YYYY-MM-DD
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Provide full path to preferred output dir
  -zip, --zip_format    Please indicate whether you would like the script to output alltxt files separately or together
                        in 1 zip file.```

To run it as a python module:

```shell script 
cd /path-to/hca2scea
python -m hca-to-scea-tools.hca_to_scea.hca2scea -h
```

## Arguments

| Argument   | Argument name            | Description                                                                                        | Required? |
|------------|--------------------------|----------------------------------------------------------------------------------------------------|-----------|
|-s          | HCA spreadsheet          | Path to HCA spreadsheet (.xlsx)                                                                    | yes       |
|-id         | HCA project uuid         | This is added to the 'secondary accessions' field in idf file                                      | yes       | 
|-c          | Curator initials         | HCA Curator initials. Space-separated list.                                                        | yes       |
|-ac         | accession number         | Provide an SCEA accession number (integer).                                                        | yes       |
|-et         | Experiment type          | Must be 1 of [differential,baseline]                                                               | yes       |
|-f          | Factor value             | A space-separated list of user-defined factor values e.g. age disease                              | yes       |
|-pd         | Dataset publication date | provide in YYYY-MM-DD E.g. from GEO                                                                | yes       |
|-hd         | HCA last update date     | provide in YYYY-MM-DD The last time the HCA project was updated in ingest  UI (production)         | yes       |
|-study      | study accession (SRPxxx) | The study accession will be used to find the paths to the fastq files for the given runs           | yes       |
|-name       | HCA name field           | Which HCA field to use for the biomaterial names columns. Must be 1 of                             | no        |
|            |                          | [cs_name, cs_id, sp_name, sp_id, other] where cs indicates cell suspension and sp indicates        |           |
|            |                          |  specimen from organism. Default is cs_name.                                                       |           |
|-facs       | optional argument        | If FACS was used as a single cell isolation method, indicate this by adding the -facs argument.    | no        |
|-o          | optional argument        | An output dir path can optionally be provided. If it does not exist, it will be created.           | no        |
|-zip        | optional argument        | Indicate if you would like the resulting output files to be output in a single zip file.           | no        |         

## Definitions

**Factor values**

A factor value is a chosen experimental characteristic which can be used to group or differentiate samples. Multiple factor values can be entered and should be chosen from the following list.

- Known disease(s)
- Development stage
- Organ
- Organ part
- Selected cell type(s)
- Individual

There must be at least 1 factor value. If you cannot identify a factor value i.e. all donors and samples share the same metadata with respect to the above list of factor values, then enter 'Individual'.

Datasets with more than 1 technolgoy type are not eligible for SCEA. Therefore, technology type is not a valid factor value.

**Experiment type**

An experiment with samples which can be grouped or differentiatied by a factor value is classified as 'differential'. The list of possible factor values can be found above.

If 1 or more factor values other than 'Individual' is identified, then the experiment type should be 'Differential'. If the only factor value is 'Individual', then the experiment type should be 'Baseline'.

**Related E-HCAD-ID**

If the project has been split into two separate E-HCAD datasets, due to different technologies being used in the same project, or any other reason, then enter the E-HCAD-ID for the other dataset here. Example: E-HCAD-50.

## Examples

**Required arguments only**

`python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12`

**Specify optional name argument**

`python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -name cs_name -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12`

**Specify that FACS was used**

`python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12 -facs`

**Specify optional output dir**

`python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12 -o my_output_dir`

## Developer Notes

### Developing Code in Editable Mode

Using `pip`'s editable mode, projects using hca-to-scea as a dependency can refer to the latest code in this repository 
directly without installing it through PyPI. This can be done either by manually cloning the code
base:

    pip install -e path/to/hca2scea

or by having `pip` do it automatically by providing a reference to this repository:

    pip install -e \
    git+https://github.com/ebi-ait/hca-to-scea-tools.git\
    #egg=hca2scea
    
    
### Publish to PyPI

1. Create PyPI Account through the [registration page](https://pypi.org/account/register/).
    
   Take note that PyPI requires email addresses to be verified before publishing.

2. Package the project for distribution.
 
        python setup.py sdist
    
3. Install [Twine](https://pypi.org/project/twine/)

        pip install twine        
    
4. Upload the distribution package to PyPI. 

        twine upload dist/*
        
    Running `python setup.py sdist` will create a package in the `dist` directory of the project
    base directory. 
  


