Metadata-Version: 2.1
Name: HAPPE
Version: 0.1.2
Summary: A tool to visualize the haplotype pattern and various information in excel.
Home-page: UNKNOWN
Author: Cong Feng
Author-email: congfeng97@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# HAPPE

A tool to visualize the haplotype pattern and various information in excel.
Please cite this paper when using HAPPE for your publications
> Cong Feng, Xingwei Wang, Shishi Wu, Weidong Ning, Bo Song, Jianbin Yan, and Shifeng Cheng. 2022. “HAPPE: A Tool for Population Haplotype Analysis and Visualization in Editable Excel Tables.” Frontiers in Plant Science 13 (July): 927407. https://doi.org/10.3389/fpls.2022.927407.


![avatar](./Figure_S2.jpg)


## Installing HAPPE

There easiest way to install `HAPPE` is to use pip3. 

```sh
pip3 install HAPPE
```

or you can clone the project to your local directory and installing it with:

```sh
python3 setup.py install --record log.txt
#if u want to uninstall the package:
#cat log.txt | xargs rm -rf
```

then you should have the  `HAPPE` command available.
```sh
$ HAPPE -h

usage: HAPPE [-h] -g CONFIG -v GZVCF [-k KEEP] [-r REGION]
                          [-s SNPLIST] -i INF -c COLOR [-I SNPINF] [-R REF]
                          [-F FUNCANN] [-f | -x | -n] [-D DEPTH] [-d DEPTHBIN]
                          -o OUTPUT

show haplotype patterns in excel file./fengcong@caas.cn

optional arguments:
  -h, --help            show this help message and exit
  -g CONFIG, --config CONFIG
                        config file.[required]
  -v GZVCF, --gzvcf GZVCF
                        gzvcf, bcftools indexed.use to annotation and get
                        ref/alt basepair.[required]
  -k KEEP, --keep KEEP  keep sample, if u wana plot a subset of
                        --gzvcf.[optional]
  -r REGION, --region REGION
                        if u wana plot a subset of --gzvcf, u can use this
                        option. if u use this option , ucant use -s
                        option[optional]
  -s SNPLIST, --snplist SNPLIST
                        snp id list(format:chr_pos). if u use this option , u
                        cant use -r option.[optional]
  -w TREEWIDTH, --treewidth TREEWIDTH
                        How many columns do you want to occupy for this tree
                        topology.(default=1000)[optional]
  -i INF, --inf INF     the information of each sample.[required]
  -c COLOR, --color COLOR
                        the color of each sample.[required]
  -I SNPINF, --snpinf SNPINF
                        more information about SNP.[optional]
  -R REF, --Ref REF     change Reference and color system.[optional]
  -F FUNCANN, --FuncAnn FUNCANN
                        functional annotation file.[optional]
  -f, --functional      only functional SNP
  -x, --coding          only coding region SNP
  -n, --noncoding       only noncoding region SNP
  -D DEPTH, --Depth DEPTH
                        depth dir for each sample.[optional]
  -d DEPTHBIN, --Depthbin DEPTHBIN
                        Depth bin size.[optional,default:50]
  -o OUTPUT, --output OUTPUT
                        output prefix
```
## Preparing config file
```ini
[software]
bgzip=
bcftools=
tabix=
```
## Preparing the vcf file

1. The SNP/INDEL ID must be in the format :`Chromosome_position`.
2. Only bi-allelic remains in vcf file.
2. Compress `vcf` to `vcf.gz` using bgzip
3. Use `bcftools index` to create an index for the `vcf.gz` file.

## Preparing the depth information
if you want to integrate the depth information, you need to prepare the depth file as follows:

1. Create a directory for each sample with the name of the sample.
2. using `mosdepth` to calc the depth of each position for each sample.
```sh
#example
mosdepth -f ref.fa -Q 0 sample1/sample1.Q0  path/to/sample1.bam
```

## Usage
```sh
"-g  CONFIG", required parameter, give the paths to bcftools, bgzip and tabix in the CONFIG file. 

"-v GZVCF", required parameter, input vcf file.

"-k SAMPLELIST", required parameter, list of samples to be retained, one sample per line.

"-r REGION", optional parameter, the genomic region to be displayed, format: chromosome: start-end.

"-s VARIANTLIST", optional parameter, the list of variant IDs you need to keep, using this parameter you cannot use the -r parameter.

"-w TREEWIDTH", optional parameter, the width of the tree topology.

"-i INFORMATION", optional parameter, additional sample information, the first column must be the sample ID.

"-c COLOR", optional parameter, Specify the color of each sample, the first column is the sample id and the second column is the color hex code.

"-I VARINFORMATION", optional parameter, Additional variant annotation information, such as GWAS p-value. the first colum is the variant id and each column is the annotation information with header.

"-f", optional parameter, Only the variant that changes the amino acid is retained.( Requires that the input vcf file has been annotated with SnpEff.)

"-x", optional parameter, Only the variant in the CDS region is retained.( Requires that the input vcf file has been annotated with SnpEff.)

"-n"， optional parameter, Only the variant in the non-coding region is retained.( Requires that the input vcf file has been annotated with SnpEff.)

"-D DIRECTORY"， optional parameter, This directory contains the depth information for each sample calculated using mosdepth, one directory per sample.

"-d WINDOWSIZE", optional parameter, window size for calculate normalized depth.

"-o PREFIX"， required parameter, output prefix.

```
## example
```sh
HAPPE \
-g config.ini \
-v test.vcf.gz \
-r chr7A:71669854-71670886 \
-i 1000_Inf.txt \
-c 1000.pop.color \
-F FunctionalAnnotation_v1__HCgenes_v1.0.TAB \
-D path/to/depth_data/ \
-f \
-o test
## each file of the prameter
## -g config.ini
# [software]
# bgzip=path_to/bgzip
# bcftools=path_to/bcftools
# tabix=path_to/tabix

## -i 1000_Inf.txt
## Just make sure the first column is the sample name.
# Sample_ID	... ...
# sample1   ... ...

## -c 1000.pop.color
## Just make sure the first column is the sample name and the second column is color code.
# Sample_ID	color
# sample1	FF0000
# ...       ...

## -F FunctionalAnnotation_v1__HCgenes_v1.0.TAB
## just make sure the first column is the gene name , and the forth column is the functional annotation.
# Gene_name	XXX XXX function ... ...
# gene1     XXX XXX func1    ... ...

## -D path/to/depth_data/
##Make sure that the files *mosdepth.summary.txt and *per-base.bed.gz are in the directory for each sample in this directory.
```


