Metadata-Version: 2.1
Name: statcheck
Version: 0.0.1
Summary: Python implementation of the R package 'statcheck' used for extracting and analysing statistical tests in scientific articles.
Author-email: Hubert Plisiecki <hplisiecki@gmail.com>
Project-URL: Homepage, https://github.com/hplisiecki/statcheck_python
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt


<!-- README.md is generated from README.Rmd. Please edit that file -->
<!-- after editing README.Rmd, run devtools::build_readme() -->

# statcheck <a href='http://statcheck.io'><img src='man/figures/logo.jpg' align="right" height="100" /></a>

<!-- badges: start -->


<!-- badges: end -->

## Credits
This is a python implementation of the R package `statcheck` published by Michèle B. Nuijten [MicheleNuijten]. The original package can by found at her Github 
page. The code relies heavily on Nuijten's work and is currently a python implementation of the [original package](https://github.com/MicheleNuijten/statcheck), with the goal of making it more accessible to the 
python community. The original package was published under the GNU General Public License v3.0. The curent implementation is published under the MIT 
License. To ensure usability, all the original tests were recoded to the python version.

## What is statcheck?

`statcheck` is a free, open source Python package that can be used to
automatically extract statistical null-hypothesis significant testing
(NHST) results from articles and recompute the *p*-values based on the
reported test statistic and degrees of freedom to detect possible
inconsistencies.

`statcheck` is mainly useful for:

1.  **Self-checks**: you can use `statcheck` to make sure your
    manuscript doesn’t contain copy-paste errors or other
    inconsistencies before you submit it to a journal.
2.  **Peer review**: editors and reviewers can use `statcheck` to check
    submitted manuscripts for statistical inconsistencies. They can ask
    authors for a correction or clarification before publishing a
    manuscript.
3.  **Research**: `statcheck` can be used to automatically extract
    statistical test results from articles that can then be analyzed.
    You can for instance investigate whether you can predict statistical
    inconsistencies (see e.g., [Nuijten et al.,
    2017](https://www.collabra.org/article/10.1525/collabra.102/)), or
    use it to analyze p-value distributions (see e.g., [Hartgerink et
    al., 2016](https://peerj.com/articles/1935/)).

## How does statcheck work?

The algorithm behind `statcheck` consists of four basic steps:

1.  **Convert** pdf and html articles to plain text files.
2.  **Search** the text for instances of NHST results. Specifically,
    `statcheck` can recognize *t*-tests, *F*-tests, correlations,
    *z*-tests,
    ![\chi^2](https://latex.codecogs.com/png.image?%5Cdpi%7B110%7D&space;%5Cbg_white&space;%5Cchi%5E2 "\chi^2")
    -tests, and Q-tests (from meta-analyses) if they are reported
    completely (test statistic, degrees of freedom, and *p*-value) and
    in APA style.
3.  **Recompute** the *p*-value using the reported test statistic and
    degrees of freedom.
4.  **Compare** the reported and recomputed *p*-value. If the reported
    *p*-value does not match the computed one, the result is marked as
    an *inconsistency* (`Error` in the output). If the reported
    *p*-value is significant and the computed is not, or vice versa, the
    result is marked as a *gross inconsistency* (`DecisionError` in the
    output).

`statcheck` takes into account correct rounding of the test statistic,
and has the option to take into account one-tailed testing. See the
[manual](http://rpubs.com/michelenuijten/statcheckmanual) for details.

## Installation and use

For detailed information about installing and using `statcheck`, see the
Documentation  file in the github repository, or refer to the R [documentation](https://statcheck.readthedocs.io/en/latest/).

### Installation
```bash
pip install statcheck
```
### Example Usage
```python
from statcheck import checkPDFdir
dir = 'path/to/pdf/directory'
Res, pRes = checkPDFdir(dir, subdir = False)

# Res is a pandas dataframe with the analysis of statistical results
Res
# pRes is a pandas dataframe with extracted p-values
pRes
```
### Running tests
```bash
pip install pytest
pytest tests/
```

[statcheck.io](http://statcheck.io/) is a web-based interface for
statcheck.  

### Author of the Python implementation
** Hubert Plisiecki **
* [Github](https://github.com/hplisiecki)
* [ResearchGate](https://www.researchgate.net/profile/Hubert-Plisiecki-2)
* [LinkedIn](https://www.linkedin.com/in/hubert-plisiecki-64182b1ab/)

## Citation
```yaml
---
@misc{MicheleNuijten,  
  author = {Michèle B. Nuijten},  
  title = {statcheck},  
  year = {2021},  
  url = {{https://github.com/MicheleNuijten/statcheck}}  
}
---
```
