Metadata-Version: 2.1
Name: smunger
Version: 0.0.16
Summary: munger for GWAS summary statistics.
Home-page: https://github.com/jianhua/smunger
License: MIT
Author: Jianhua Wang
Author-email: jianhua.mert@gmail.com
Requires-Python: >=3.8,<3.12
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: dev
Provides-Extra: doc
Provides-Extra: test
Requires-Dist: black (>=22.3.0) ; extra == "test"
Requires-Dist: bump2version (>=1.0.1,<2.0.0) ; extra == "dev"
Requires-Dist: flake8 (>=3.9.2,<4.0.0) ; extra == "test"
Requires-Dist: flake8-docstrings (>=1.6.0,<2.0.0) ; extra == "test"
Requires-Dist: isort (>=5.8.0,<6.0.0) ; extra == "test"
Requires-Dist: jupyter (>=1.0.0,<2.0.0)
Requires-Dist: liftover (>=1.1.16,<2.0.0)
Requires-Dist: mkdocs (>=1.4.2,<2.0.0) ; extra == "doc"
Requires-Dist: mkdocs-autorefs (>=0.4.1,<0.5.0) ; extra == "doc"
Requires-Dist: mkdocs-include-markdown-plugin (>=4.0.3,<5.0.0) ; extra == "doc"
Requires-Dist: mkdocs-material (>=8.5.11,<9.0.0) ; extra == "doc"
Requires-Dist: mkdocs-material-extensions (>=1.1.1,<2.0.0)
Requires-Dist: mkdocstrings[python] (>=0.19.1,<0.20.0) ; extra == "doc"
Requires-Dist: mypy (>=0.900,<0.901) ; extra == "test"
Requires-Dist: pandas (>=1.5.3,<2.0.0)
Requires-Dist: pip (>=20.3.1,<21.0.0) ; extra == "dev"
Requires-Dist: pre-commit (>=2.12.0,<3.0.0) ; extra == "dev"
Requires-Dist: pytabix (>=0.1,<0.2)
Requires-Dist: pytest (>=6.2.4,<7.0.0) ; extra == "test"
Requires-Dist: pytest-cov (>=2.12.0,<3.0.0) ; extra == "test"
Requires-Dist: requests (>=2.28.2,<3.0.0)
Requires-Dist: rich (>=13.3.1,<14.0.0)
Requires-Dist: scipy (>=1.10.1,<2.0.0)
Requires-Dist: toml (>=0.10.2,<0.11.0) ; extra == "dev"
Requires-Dist: tox (>=3.20.1,<4.0.0) ; extra == "dev"
Requires-Dist: twine (>=3.3.0,<4.0.0) ; extra == "dev"
Requires-Dist: typer (>=0.7.0,<0.8.0)
Requires-Dist: virtualenv (>=20.2.2,<21.0.0) ; extra == "dev"
Description-Content-Type: text/markdown

# smunger


[![pypi](https://img.shields.io/pypi/v/smunger.svg)](https://pypi.org/project/smunger/)
[![python](https://img.shields.io/pypi/pyversions/smunger.svg)](https://pypi.org/project/smunger/)
<!-- [![Build Status](https://github.com/jianhua/smunger/actions/workflows/dev.yml/badge.svg)](https://github.com/jianhua/smunger/actions/workflows/dev.yml) -->
<!-- [![codecov](https://codecov.io/gh/jianhua/smunger/branch/main/graphs/badge.svg)](https://codecov.io/github/jianhua/smunger) -->
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)



A munger for GWAS summary statistics: it maps headers, checks and harmonizes columns, lifts over coordinates, and annotates rsIDs.


<!-- * Documentation: <https://jianhua.github.io/smunger> -->
<!-- * GitHub: <https://github.com/jianhua/smunger> -->
* PyPI: <https://pypi.org/project/smunger/>
* Free software: MIT


## Features

- [x]  define column properties
    - [x]  required columns: CHR, BP, EA, NEA
    - [x]  optional columns: BETA, SE, P, EAF, MAF
    - [x]  auxiliary columns: OR, OR_SE, Z
    - [x]  data types
    - [x]  data ranges
    - [x]  whether missing values are allowed, and the default value used to fill them
- [x]  semi-automatic header mapping
    - [x]  read the first five rows and display them in the terminal
    - [x]  guess the header map from common column names
    - [x]  manually check whether the mapping is correct
    - [x]  enter the right column number if the guess is wrong
    - [x]  if BETA and SE are absent, check whether OR, OR_SE, and Z are present
    - [x]  save the final column map to JSON for further munging
- [x]  data munging
    - [x]  require EA ≠ NEA
    - [x]  if EAF is present, MAF = min(EAF, 1-EAF)
    - [x]  convert OR/OR_SE to BETA/SE if BETA and SE are absent and OR and OR_SE are present
    - [x]  remove duplicate SNPs with the same chr-bp-sorted(EA,NEA) key, keeping the one with the lowest P
    - [x]  output: tab-separated, `bgzip`-compressed, `tabix`-indexed
    - [x]  optional output: significant SNPs, munge report
    
    |  | CHR | BP | rsID | EA | NEA | EAF | MAF | BETA | SE | P | OR | OR_SE | Z |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | type | int | int | str | str | str | float | float | float | float | float | float | float | float |
    | allow null | False | False | True | False | False | False | False | True | False | True | True | False | True |
    | null value |  |  |  |  |  |  |  | 0 |  | 0.999 | 1 |  | 0 |
    | range | [1,23] | (0,inf) |  | only contains ‘ACGT’ | only contains ‘ACGT’ | [0,1] | [0,0.5] | (-inf,inf) | (0,inf) | (0,1) | (0,inf) | (0,inf) | (-inf,inf) |
- [x]  liftover
    - [x]  guess genome build
    - [x]  liftover
- [x]  annotate
    - [x]  annotate rsID
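
The header-mapping step in the feature list can be pictured with a small standalone sketch. It does not use smunger's own API: the alias table `COMMON_NAMES` and the helpers `guess_header_map` and `missing_required` are hypothetical names chosen for illustration, and the aliases are only examples of commonly seen column names.

```python
import json

# Hypothetical alias table: standard column -> lowercase names often seen in GWAS files.
# This is an illustration, not the table that smunger ships with.
COMMON_NAMES = {
    "CHR": {"chr", "chrom", "chromosome", "#chrom"},
    "BP": {"bp", "pos", "position", "base_pair_location"},
    # "a1"/"a2" mean different things in different tools, hence the manual check.
    "EA": {"ea", "a1", "effect_allele"},
    "NEA": {"nea", "a2", "other_allele", "non_effect_allele"},
    "BETA": {"beta", "effect"},
    "SE": {"se", "standard_error", "stderr"},
    "P": {"p", "pval", "p_value", "pvalue"},
    "EAF": {"eaf", "effect_allele_frequency", "frq"},
    "OR": {"or", "odds_ratio"},
}
REQUIRED = ("CHR", "BP", "EA", "NEA")


def guess_header_map(columns):
    """Return {standard name: original column name} for columns with a known alias."""
    mapping = {}
    for col in columns:
        key = col.strip().lower()
        for standard, aliases in COMMON_NAMES.items():
            if key in aliases and standard not in mapping:
                mapping[standard] = col
                break
    return mapping


def missing_required(mapping):
    """List the required standard columns that the guess did not find."""
    return [name for name in REQUIRED if name not in mapping]


header = [
    "chromosome", "base_pair_location", "effect_allele",
    "other_allele", "beta", "standard_error", "p_value",
]
colmap = guess_header_map(header)
# The confirmed map can be serialized to JSON and reused for later munging runs.
colmap_json = json.dumps(colmap, indent=2)
```

Anything returned by `missing_required` would be the columns a user is asked to assign by hand before the map is saved.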

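The data-munging rules listed under Features can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not smunger's implementation: the function name `munge_sketch` is hypothetical, rows with EA equal to NEA are simply dropped, and OR_SE is converted to the log scale with the delta-method approximation SE ≈ OR_SE / OR, which may differ from what the package does.

```python
import numpy as np
import pandas as pd


def munge_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed munging rules to a frame with standard column names."""
    # Keep only rows where the effect and non-effect alleles differ.
    out = df[df["EA"] != df["NEA"]].copy()

    # MAF = min(EAF, 1 - EAF) when EAF is available.
    if "EAF" in out.columns:
        out["MAF"] = np.minimum(out["EAF"], 1 - out["EAF"])

    # Derive BETA/SE from OR/OR_SE only when BETA and SE are both absent.
    has_or = {"OR", "OR_SE"} <= set(out.columns)
    if "BETA" not in out.columns and "SE" not in out.columns and has_or:
        out["BETA"] = np.log(out["OR"])
        out["SE"] = out["OR_SE"] / out["OR"]  # assumption: delta-method approximation

    # Build a strand-agnostic key: chr-bp-sorted(EA, NEA).
    alleles = pd.Series(
        ["-".join(sorted(pair)) for pair in zip(out["EA"], out["NEA"])],
        index=out.index,
    )
    out["_key"] = out["CHR"].astype(str) + "-" + out["BP"].astype(str) + "-" + alleles

    # Among duplicates, keep the row with the lowest P.
    out = out.sort_values("P").drop_duplicates("_key", keep="first")
    return out.drop(columns="_key").sort_values(["CHR", "BP"]).reset_index(drop=True)
```

The result could then be written with `to_csv(sep="\t")` before compressing with `bgzip` and indexing with `tabix`, as described in the output item.
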
## Credits

This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [waynerv/cookiecutter-pypackage](https://github.com/waynerv/cookiecutter-pypackage) project template.

