Metadata-Version: 2.1
Name: forestplot
Version: 0.2.2
Summary: A Python package to make publication-ready but customizable forest plots.
Home-page: https://github.com/lsys/forestplot
Author: Lucas Shen
Author-email: lucas@lucasshen.com
Maintainer: Lucas Shen
Maintainer-email: lucas@lucasshen.com
License: MIT
Keywords: visualization,python,data-science,dataviz,pandas,matplotlib,mpl,forestplot,blobbogram
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Description-Content-Type: text/markdown
License-File: LICENSE

<div id="top"></div> 
<h1 align="center" >
  Forestplot
</h1>
<p align="center">
  <a href="https://pypi.org/project/forestplot">
  <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/forestplot?label=Python&logo=python&logoColor=white">
  </a><br>
  <b>Easy API for forest plots.</b><br>
  A Python package to make publication-ready but customizable forest plots.
</p>

<p align="center"><img width="100%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/main.png"></p>

-----------------------------------------------------------
This package makes publication-ready forest plots easy to make out-of-the-box. Users provide a `dataframe` (e.g. from a spreadsheet) where rows correspond to a variable/study with columns including estimates, variable labels, and lower and upper confidence interval limits.
Additional options allow easy addition of columns in the `dataframe` as annotations in the plot.

<!---------------------- Project shields ---------------------->

|    |    |
| --- | --- |
| Release | [![PyPI](https://img.shields.io/pypi/v/forestplot?color=blue&label=PyPI&logo=pypi&logoColor=white)](https://pypi.org/project/forestplot/) [![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/forestplot?logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/forestplot) [![GitHub release (latest by date)](https://img.shields.io/github/v/release/lsys/forestplot?color=blue&label=Latest%20release)](https://github.com/LSYS/forestplot/releases) |
| Status | [![CI](https://github.com/LSYS/forestplot/actions/workflows/CI.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/CI.yml) [![Notebooks](https://github.com/LSYS/forestplot/actions/workflows/nb.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/nb.yml) |
| Coverage |  [![Codecov](https://img.shields.io/codecov/c/github/lsys/forestplot?logo=codecov&logoColor=white&label=codecov)](https://app.codecov.io/gh/LSYS/forestplot) |
| Python | [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/forestplot?label=Python%203.6%2B&logo=python&logoColor=white)](https://pypi.org/project/forestplot/) |
| Docs | [![Read the Docs (version)](https://img.shields.io/readthedocs/forestplot/stable?label=docs&logo=readthedocs&logoColor=white)](https://forestplot.readthedocs.io/en/latest/?badge=latest) [![DocLinks](https://github.com/LSYS/forestplot/actions/workflows/links.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/links.yml)|
| Meta | [![GitHub](https://img.shields.io/github/license/lsys/forestplot?color=purple&label=License)](https://choosealicense.com/licenses/mit/) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy) [![DOI](https://zenodo.org/badge/510013191.svg)](https://zenodo.org/badge/latestdoi/510013191) |
| Binder| [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lsys/forestplot/main?labpath=examples%2Freadme-examples.ipynb) |

<!---------------------- TABLE OF CONTENT ---------------------->
# Table of Contents[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#table-of-contents)
<details open><summary><b>show/hide</b></summary><p>

> - [Installation](#installation)
> - [Quick Start](#quick-start)
> - [Some Examples with Customizations](#some-examples-with-customizations)
> - [Gallery and API Options](#gallery-and-api-options)
> - [Known Issues](#known-issues)
> - [Background and Additional Resources](#background-and-additional-resources)
> - [Contributing](#contributing)
</p></details><p></p>

<!------------------------- INSTALLATION ------------------------->
## Installation[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#installation)

Install from PyPI<br>
 [![PyPI](https://img.shields.io/pypi/v/forestplot?color=blue&label=PyPI&logo=pypi&logoColor=white)](https://pypi.org/project/forestplot/)
```bash
pip install forestplot
```

Install from conda-forge<br>
[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/forestplot?logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/forestplot)
```bash
conda install forestplot
```

Install from source<br>
[![GitHub release (latest by date)](https://img.shields.io/github/v/release/lsys/forestplot?color=blue&label=Latest%20release)](https://github.com/LSYS/forestplot/releases)<br>
```bash
git clone https://github.com/LSYS/forestplot.git
cd forestplot
pip install .
```

Developer installation<br>
```bash
git clone https://github.com/LSYS/forestplot.git
cd forestplot
pip install -r requirements_dev.txt

make lint
make test
```

<p align="right">(<a href="#top">back to top</a>)</p>


<!------------------------- QUICK START ------------------------->
## Quick Start[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#quick-start)

```python
import forestplot as fp

df = fp.load_data("sleep")  # companion example data
df.head(3)
```
|    | var      |          r |   moerror | label                     | group         |    ll |    hl |   n |    power |     p-val |
|---:|:---------|-----------:|----------:|:--------------------------|:--------------|------:|------:|----:|---------:|----------:|
|  0 | age      |  0.0903729 | 0.0696271 | in years                  | age           |  0.02 |  0.16 | 706 | 0.671578 | 0.0163089 |
|  1 | black    | -0.0270573 | 0.0770573 | =1 if black               | other factors | -0.1  |  0.05 | 706 | 0.110805 | 0.472889  |
|  2 | clerical |  0.0480811 | 0.0719189 | =1 if clerical worker     | occupation    | -0.03 |  0.12 | 706 | 0.247768 | 0.201948  |

(* This is a toy example of how certain factors correlate with the amount of sleep one gets. See the [notebook that generates the data](https://nbviewer.org/github/LSYS/forestplot/blob/main/examples/get-sleep.ipynb).)

<details><summary><i>The example input dataframe above have 4 key columns</i></summary>

  | Column    | Description                                     | Required  |
  |:----------|:------------------------------------------------|:----------|
  | `var`     | Variable field                                  |           |
  | `r`       | Correlation coefficients (estimates to plot)    | &check;   |
  | `moerror` | Conf. int.'s *margin of error*.                 |           |
  | `label`   | Variable labels                                 | &check;   |
  | `group`   | Variable grouping labels                        |           |
  | `ll`      | Conf. int. *lower limits*                       | &check;  |
  | `hl`      | Containing the conf. int. *higher limits*       | &check;  |
  | `n`       | Sample size                                     |           |
  | `power`   | Statistical power                               |           |
  | `p-val`   | P-value                                         |           |

  (See [Gallery and API Options](#gallery-and-api-options) for more details on required and optional arguments.)  
</details>

Make the forest plot
```python
fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
              varlabel="label",  # column containing variable label
              ylabel="Confidence interval",  # y-label title
              xlabel="Pearson correlation",  # x-label title
              )
```
<p align="left"><img width="55%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/vanilla.png"></p>

Save the plot
```python
plt.savefig("plot.png", bbox_inches="tight")
```
<p align="right">(<a href="#top">back to top</a>)</p>

<!------------------ EXAMPLES of CUSTOMIZATIONS ------------------>
## Some Examples With Customizations[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#examples-with-customizations)

1. Add variable groupings, add group order, and sort by estimate size.
```python
fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              moerror="moerror",  # columns containing conf. int. margin of error
              varlabel="label",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="group",  # Add variable groupings 
              # group ordering
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],
              sort=True,  # sort in ascending order (sorts within group if group is specified)               
              )
```
<p align="left"><img width="75%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/group-grouporder-sort.png"></p>

2. Add p-values on the right and color alternate rows gray
```python
fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
              varlabel="label",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="group",  # Add variable groupings 
              # group ordering
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],
              sort=True,  # sort in ascending order (sorts within group if group is specified)               
              pval="p-val",  # Column of p-value to be reported on right
              color_alt_rows=True,  # Gray alternate rows
              ylabel="Est.(95% Conf. Int.)",  # ylabel to print
              **{"ylabel1_size": 11}  # control size of printed ylabel
              )
```

<p align="left"><img width="80%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/group-grouporder-pvalue-sort-colorrows.png"></p>


3. Customize annotations and make it a table
```python
fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              pval="p-val",  # column containing p-values to be formatted
              annote=["n", "power", "est_ci"],  # columns to report on left of plot
              annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"],  # ^corresponding headers
              rightannote=["formatted_pval", "group"],  # columns to report on right of plot 
              right_annoteheaders=["P-value", "Variable group"],  # ^corresponding headers
              xlabel="Pearson correlation coefficient",  # x-label title
              table=True,  # Format as a table
              )
```

<p align="left"><img width="85%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/leftannote-rightannote-table.png"></p>

4. Strip down all bells and whistle
```python
fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              ci_report=False,  # Turn off conf. int. reporting
              flush=False,  # Turn off left-flush of text
              **{'fontfamily': 'sans-serif'}  # revert to sans-serif                              
              )
```               
<p align="left"><img width="40%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/vcoefplot.png"></p>

5. Example with more customizations
```python
fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              pval="p-val",  # column containing p-values to be formatted
              annote=["n", "power", "est_ci"],  # columns to report on left of plot
              annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"],  # ^corresponding headers
              rightannote=["formatted_pval", "group"],  # columns to report on right of plot 
              right_annoteheaders=["P-value", "Variable group"],  # ^corresponding headers
              groupvar="group",  # column containing group labels
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],                   
              xlabel="Pearson correlation coefficient",  # x-label title
              xticks=[-.4,-.2,0, .2],  # x-ticks to be printed
              sort=True,  # sort estimates in ascending order
              table=True,  # Format as a table
              # Additional kwargs for customizations
              **{"marker": "D",  # set maker symbol as diamond
                 "markersize": 35,  # adjust marker size
                 "xlinestyle": (0, (10, 5)),  # long dash for x-reference line 
                 "xlinecolor": "#808080",  # gray color for x-reference line
                 "xtick_size": 12,  # adjust x-ticker fontsize
                }  
              )
```
<p align="left"><img width="80%" src="https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/main.png"></p>

<details><summary><i>Annotations arguments allowed include:</i></summary>
  
  * `ci_range`: Confidence interval range (e.g. `(-0.39 to -0.25)`).
  * `est_ci`: Estimate and CI (e.g. `-0.32(-0.39 to -0.25)`).
  * `formatted_pval`: Formatted p-values (e.g. `0.01**`).
  
  To confirm what processed `columns` are available as annotations, you can do:
  
  ```python
  processed_df, ax = fp.forestplot(df, 
                                   ...  # other arguments here
                                   return_df=True  # return processed dataframe with processed columns
                                  )
  processed_df.head(3)
  ```
  
  |    | label                | group         |   n |          r | CI95%         |       p-val |      BF10 |   power | var    |    hl |    ll |   moerror |   formatted_r |   formatted_ll |   formatted_hl | ci_range         | est_ci                | formatted_pval   |   formatted_n |   formatted_power | formatted_est_ci      | yticklabel                                                        | formatted_formatted_pval   | formatted_group   | yticklabel2            |
|---:|:---------------------|:--------------|----:|-----------:|:--------------|------------:|----------:|--------:|:-------|------:|------:|----------:|--------------:|---------------:|---------------:|:-----------------|:----------------------|:-----------------|--------------:|------------------:|:----------------------|:------------------------------------------------------------------|:---------------------------|:------------------|:-----------------------|
|  0 | Mins worked per week | Labor factors | 706 | -0.321384  | [-0.39 -0.25] | 1.99409e-18 | 1.961e+15 |    1    | totwrk | -0.25 | -0.39 | 0.0686165 |         -0.32 |          -0.39 |          -0.25 | (-0.39 to -0.25) | -0.32(-0.39 to -0.25) | 0.0***           |           706 |              1    | -0.32(-0.39 to -0.25) | Mins worked per week            706  1.0    -0.32(-0.39 to -0.25) | 0.0***                     | Labor factors     | 0.0***   Labor factors |
|  1 | Years of schooling   | Labor factors | 706 | -0.0950039 | [-0.17 -0.02] | 0.0115515   | 1.137     |    0.72 | educ   | -0.02 | -0.17 | 0.0749961 |         -0.1  |          -0.17 |          -0.02 | (-0.17 to -0.02) | -0.10(-0.17 to -0.02) | 0.01**           |           706 |              0.72 | -0.10(-0.17 to -0.02) | Years of schooling              706  0.72   -0.10(-0.17 to -0.02) | 0.01**                     | Labor factors     | 0.01**   Labor factors |
  
</details>
<p align="right">(<a href="#top">back to top</a>)</p>




<!------------------- GALLERY AND API OPTIONS ------------------->
## Gallery and API Options[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#gallery-and-api-options)

[![Notebooks](https://github.com/LSYS/forestplot/actions/workflows/nb.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/nb.yml)

Check out [this jupyter notebook](https://nbviewer.org/github/LSYS/forestplot/blob/main/examples/readme-examples.ipynb) for a gallery variations of forest plots possible out-of-the-box.
The table below shows the list of arguments users can pass in.
More fined-grained control for base plot options (eg font sizes, marker colors) can be inferred from the [example notebook gallery](https://nbviewer.org/github/LSYS/forestplot/blob/main/examples/readme-examples.ipynb). 


| Option      | Description                                                                                                                                                 | Required   |
|:-------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:---|
| `dataframe`           | Pandas dataframe where rows are variables (or studies for meta-analyses) and columns include estimated effect sizes, labels, and confidence intervals, etc. | &check; |
| `estimate`            | Name of column in `dataframe` containing the *estimates*.                                                                                                   | &check; |
| `varlabel`            | Name of column in `dataframe` containing the *variable labels* (study labels if meta-analyses).                                                             | &check; |
| `ll`                  | Name of column in `dataframe` containing the conf. int. *lower limits*.                                                                                     | &check; |
| `hl`                  | Name of column in `dataframe` containing the conf. int. *higher limits*.                                                                                    | &check; |
| `logscale`            | If True, make the x-axis log scale. Default is False.                                                                                                     |  |
| `capitalize`          | How to capitalize strings. Default is None. One of "capitalize", "title", "lower", "upper", "swapcase".                                                      | |
| `form_ci_report`      | If True (default), report the estimates and confidence interval beside the variable labels.                                                                 |          |
| `ci_report`           | If True (default), format the confidence interval as a string.                                                                                              |          |
| `groupvar`            | Name of column in `dataframe` containing the variable *grouping labels*.                                                                                    |       |
| `group_order`         | List of group labels indicating the order of groups to report in the plot.                                                                                  |       |
| `annote`              | List of columns to add as annotations on the left-hand side of the plot.                                                                                    |       |
| `annoteheaders`       | List of column headers for the left-hand side annotations.                                                                                                  |       |
| `rightannote`         | List of columns to add as annotations on the right-hand side of the plot.                                                                                   |       |
| `right_annoteheaders` | List of column headers for the right-hand side annotations.                                                                                                 |       |
| `pval`                | Name of column in `dataframe` containing the p-values.                                                                                                      |       |
| `starpval`            | If True (default), format p-values with stars indicating statistical significance.                                                                          |          |
| `sort`                | If True, sort variables by `estimate` values in ascending order.                                                                                            |          |
| `sortby`              | Name of column to sort by. Default is `estimate`.                                                                                                           |       |
| `flush`               | If True (default), left-flush variable labels and annotations.                                                                                              |          |
| `decimal_precision`   | Number of decimal places to print. (Default = 2)                                                                                                            |          |
| `figsize`             | Tuple indicating core figure size. Default is (4, 8)                                                                                                        |          |
| `xticks`              | List of xticklabels to print on x-axis.                                                                                                                     |       |
| `ylabel`              | Y-label title.                                                                                                                                              |      |
| `xlabel`              | X-label title.                                                                                                                                              |       |
| `color_alt_rows`      | If True, shade out alternating rows in gray.                                                                                                                |          |
| `preprocess`          | If True (default), preprocess the `dataframe` before plotting.                                                                                              |          |
| `return_df`           | If True, returned the preprocessed `dataframe`.                                                                                                             |          |

<p align="right">(<a href="#top">back to top</a>)</p>

<!------------------------ KNOWN ISSUES ------------------------>
## Known Issues[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#known-issues)

* Variable labels coinciding with group variables may lead to unexpected formatting issues in the graph.
* Left-flushing of annotations relies on the `monospace` font.
* Plot may give strange behavior for few rows of data (six rows or fewer. [see this issue](https://github.com/LSYS/forestplot/issues/52))
* Plot can get cluttered with too many variables/rows (~30 onwards) 
<p align="right">(<a href="#top">back to top</a>)</p>

<!----------------- BACKGROUND AND ADDITIONAL RESOURCES ----------------->
## Background and Additional Resources[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#background-and-additional-resources)

**More about forest plots**

[Forest plots](https://en.wikipedia.org/wiki/Forest_plot) have many aliases (h/t Chris Alexiuk). Other names include coefplots, coefficient plots, meta-analysis plots, dot-and-whisker plots, blobbograms, margins plots, regression plots, and ropeladder plots. 

[Forest plots](https://en.wikipedia.org/wiki/Forest_plot) in the medical and health sciences literature are plots that report results from different studies as a meta-analysis. Markers are centered on the estimated effect and horizontal lines running through each marker depicts the confidence intervals.

The simplest version of a forest plot has two columns: one for the variables/studies, and the second for the estimated coefficients and confidence intervals.
This layout is similar to coefficient plots ([coefplots](http://repec.sowi.unibe.ch/stata/coefplot/getting-started.html)) and is thus useful for more than meta-analyses.

<details><summary><i>More resources about forest plots</i></summary><p>

* [[1]](https://doi.org/10.1038/s41433-021-01867-6) Chang, Y., Phillips, M.R., Guymer, R.H. et al. The 5â€‰min meta-analysis: understanding how to read and interpret a forest plot. Eye 36, 673â€“675 (2022).
* [[2]](https://doi.org/10.1136/bmj.322.7300.1479) Lewis S, Clarke M. Forest plots: trying to see the wood and the trees BMJ 2001; 322 :1479 
</p></details><p></p>

**More about this package**

The package is lightweight, built on `pandas`, `numpy`, and `matplotlib`.

It is slightly opinioniated in that the aesthetics of the plot inherits some of my sensibilities about what makes a nice figure.
You can however easily override most defaults for the look of the graph. This is possible via `**kwargs` in the `forestplot` API (see [Gallery and API options](#gallery-and-api-options)) and the `matplotlib` API.

**Planned enhancements** include forest plots where each row can have multiple coefficients (e.g. from multiple models). 

<details><summary><i>Related packages</i></summary><p>

* [[1]](https://www.stata-journal.com/article.html?article=gr0059) [Stata] Jann, Ben (2014). Plotting regression coefficients and other estimates. The Stata Journal 14(4): 708-737. 
* [[2]](https://www.statsmodels.org/devel/examples/notebooks/generated/metaanalysis1.html) [Python] Meta-Analysis in statsmodels
* [[3]](https://github.com/seafloor/forestplot) [Python] Matt Bracher-Smith's Forestplot
* [[4]](https://github.com/fsolt/dotwhisker) [R] Solt, Frederick and Hu, Yue (2021) dotwhisker: Dot-and-Whisker Plots of Regression Results
* [[5]](https://rpubs.com/mbounthavong/forest_plots_r) [R] Bounthavong, Mark (2021) Forest plots. RPubs by RStudio
</p></details><p></p>

<p align="right">(<a href="#top">back to top</a>)</p>


<!----------------------- CONTRIBUTING ----------------------->
## Contributing[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#contributing)

Contributions are welcome, and they are greatly appreciated!

**Potential ways to contribute:**

* Raise issues/bugs/questions
* Write tests for missing coverage
* Add features (see [examples notebook](https://nbviewer.org/github/LSYS/forestplot/blob/main/examples/readme-examples.ipynb) for a survey of  existing features)
* Add example datasets with companion graphs
* Add your graphs with companion code

**Issues**

Please submit bugs, questions, or issues you encounter to the [GitHub Issue Tracker](https://github.com/lsys/forestplot/issues).
For bugs, please provide a minimal reproducible example demonstrating the problem.

**Pull Requests**

Please feel free to open an issue on the [Issue Tracker](https://github.com/lsys/forestplot/issues) if you'd like to discuss potential contributions via PRs.

<p align="right">(<a href="#top">back to top</a>)</p>


