# austen_plots
## Introduction
This repository contains demo data and code for  
[Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding](https://arxiv.org/abs/2003.01747)  
_Victor Veitch and Anisha Zaveri_

If a common cause affects both a treatment and an outcome, it can induce a spurious correlation between them.
For example, the wealth of a patient influences both their health outcomes and whether they take an expensive drug.
The presence of the common cause (wealth) induces a spurious positive association between the drug and the outcome.
Austen plots are a simple visual method for determining whether some unobserved common cause could explain away the association between a specified treatment and outcome. The included software produces Austen plots from the outputs of standard data modeling used in causal inference pipelines.
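
To make the wealth example concrete, here is a minimal simulation sketch. It is not part of this repository: the probabilities, effect sizes, and the helper `mean_difference` are illustrative assumptions, and the drug is simulated to have no effect at all.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the simulation is reproducible

rows = []
for _ in range(20_000):
    wealthy = rng.random() < 0.5
    # Assumption: wealthy patients are more likely to take the expensive drug.
    treated = rng.random() < (0.8 if wealthy else 0.2)
    # The drug has NO effect here: health depends only on wealth plus noise.
    health = (2.0 if wealthy else 0.0) + rng.gauss(0.0, 1.0)
    rows.append((wealthy, treated, health))


def mean_difference(data):
    """Mean outcome of treated patients minus mean outcome of untreated patients."""
    treated = [y for _, t, y in data if t]
    control = [y for _, t, y in data if not t]
    return mean(treated) - mean(control)


# Ignoring wealth, the drug appears to help even though its true effect is zero.
naive = mean_difference(rows)

# Comparing patients within each wealth group, then averaging the two
# within-group differences, removes the spurious gap.
adjusted = mean(
    mean_difference([r for r in rows if r[0] == w]) for w in (True, False)
)

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

Under these assumed numbers the naive gap is 2 × (0.8 − 0.2) = 1.2 in expectation, while the wealth-adjusted gap is close to zero. When wealth is unobserved, the adjustment is not available, which is the situation Austen plots are meant to address.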

## Requirements
The code has been tested on Python 3.8.12 with the packages specified in `requirements.txt`.

## Instructions
See the demo notebook `austen_plots_demo.ipynb`.
### Without Bootstrapping
Use the files under `example_data/` as a reference.
1) Fit your data using any model and generate predictions for g, the propensity score, and Q, the conditional expected outcome.
2) Generate a .csv file with the following columns: 'g', 'Q', 't', 'y'. These correspond to the propensity score, the conditional expected outcome, the treatment, and the outcome. For reference, see `input_df.csv` under `example_data/`.
3) (Optional, but recommended) Repeat step 1 with key covariates dropped before model fitting. For each such fit, generate a .csv file as in step 2. Save all such files under a single directory (called `covariates` in the example). If you name one of these files 'treatment.csv', the code assumes that its predictions were generated from data without the treatment, so it is not plotted on the graph. However, the Rsqhat value for 'treatment' is still provided in the output coordinates file.
4) Decide on a meaningful amount of bias you would like to test for, based on domain knowledge about your dataset. We fix this at 2 for the example dataset.
5) Run the following code (values correspond to the example dataset).  

```python
from austen_plots.AustenPlot import AustenPlot
import os

input_df_path = './example_data/input_df.csv'
bias = 2.0

# if you have no covariate controls skip specifying covariate_dir_path
covariate_dir_path = './example_data/covariates/'

ap = AustenPlot(input_df_path, covariate_dir_path)
p, plot_coords, variable_coords = ap.fit(bias=bias)
# or, if you would like to calculate an Austen plot using ATT instead
p, plot_coords, variable_coords = ap.fit(bias=bias, do_att=True)

# save outputs (create the output directory if it does not exist yet)
output_dir = './example_data/output/'
os.makedirs(output_dir, exist_ok=True)

p.save(os.path.join(output_dir, 'austen_plot.png'), dpi=500, verbose=False)
plot_coords.to_csv(os.path.join(output_dir, 'plot_coords.csv'), index=False)
variable_coords.to_csv(os.path.join(output_dir, 'variable_coords.csv'), index=False)
```
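
Steps 1-3 above can be sketched as follows. This is only one possible approach, not the method used to produce `example_data/`: the scikit-learn models, the out-of-fold predictions, the toy data, the covariate names `age` and `income`, and the helper `nuisance_predictions` are all assumptions made for illustration. It also assumes that 'Q' is the predicted outcome at each unit's observed treatment. Files are written to a temporary directory so nothing in the repository is overwritten.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_predict


def nuisance_predictions(X, t, y, n_folds=5):
    """Hypothetical helper: out-of-fold 'g' and 'Q' plus the observed 't' and 'y'."""
    # g: estimated probability of treatment given the covariates.
    g = cross_val_predict(
        LogisticRegression(max_iter=1000), X, t, cv=n_folds, method="predict_proba"
    )[:, 1]
    # Q: predicted outcome given the covariates and the observed treatment
    # (assumption: this is what the 'Q' column is expected to contain).
    Q = cross_val_predict(LinearRegression(), X.assign(t=t), y, cv=n_folds)
    return pd.DataFrame({"g": g, "Q": Q, "t": t, "y": y})


# Toy data standing in for a real dataset.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({"age": rng.normal(size=n), "income": rng.normal(size=n)})
t = rng.binomial(1, 1 / (1 + np.exp(-X["income"].to_numpy())))
y = 2.0 * t + X["age"].to_numpy() + X["income"].to_numpy() + rng.normal(size=n)

out_dir = tempfile.mkdtemp()
covariate_dir = os.path.join(out_dir, "covariates")
os.makedirs(covariate_dir, exist_ok=True)

# Steps 1-2: predictions from the full covariate set.
input_df = nuisance_predictions(X, t, y)
input_df.to_csv(os.path.join(out_dir, "input_df.csv"), index=False)

# Step 3: refit with each covariate dropped, writing one file per dropped covariate.
for name in X.columns:
    dropped = nuisance_predictions(X.drop(columns=[name]), t, y)
    dropped.to_csv(os.path.join(covariate_dir, f"{name}.csv"), index=False)
```

In practice you would replace the linear models with whatever model you trust for your data, and you may prefer to drop groups of related covariates together rather than one column at a time.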

### With Bootstrapping
You can optionally generate plots with bootstrap confidence intervals. To do so, after completing steps 1-4 under 'Without Bootstrapping', do the following:
1) Create a directory for bootstrapped inputs. In the example this is called `bootstrap`.
2) Within `bootstrap` create subdirectories for each bootstrap iteration.
3) Within each bootstrapped subdirectory, save the .csv and, optionally, covariate files, as described in steps 1-3 of 'Without Bootstrapping', using 'g', 'Q', 't', 'y' values obtained from a bootstrapped dataset. These should have the same names as those in the parent folder.
_Recommendation_: If you are generating these values using cross-validation techniques on a model, ensure that replicate rows generated by the bootstrapping procedure are within the same fold.
4) (Optional) Decide on a value for the confidence interval cutoff (default: 0.95).
5) Run the following code (values correspond to the example dataset). 

```python
from austen_plots.AustenPlot import AustenPlot
import os

input_df_path = './example_data/input_df.csv'
bias = 2.0
ci_cutoff = 0.9
bootstrap_dir_path = './example_data/bootstrap/'

# if you have no covariate controls skip specifying covariate_dir_path
covariate_dir_path = './example_data/covariates/'

ap = AustenPlot(input_df_path, covariate_dir_path, bootstrap_dir_path)
p, plot_coords, variable_coords = ap.fit(bias=bias, do_bootstrap=True, ci_cutoff=ci_cutoff)
# or, if you would like to calculate an Austen plot using ATT instead
p, plot_coords, variable_coords = ap.fit(bias=bias, do_bootstrap=True, ci_cutoff=ci_cutoff, do_att=True)

# save outputs as shown above
```
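
The recommendation in step 3 about keeping replicate rows in the same fold can be sketched as below. This is an illustrative outline rather than code from this repository: `bootstrap_sample` and `assign_folds` are hypothetical helpers, and the row count and number of folds are arbitrary. The idea is to assign folds per original row, so every bootstrap copy of that row inherits the same fold and never appears in both the training and held-out parts of a split.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement from range(n_rows)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def assign_folds(sampled_rows, n_folds, rng):
    """Give each sampled row a fold id such that copies of one original row share a fold."""
    originals = sorted(set(sampled_rows))
    rng.shuffle(originals)
    # Folds are assigned per ORIGINAL row, then looked up for every copy.
    fold_of = {row: i % n_folds for i, row in enumerate(originals)}
    return [fold_of[row] for row in sampled_rows]


rng = random.Random(0)  # fixed seed for reproducibility
sampled_rows = bootstrap_sample(n_rows=1000, rng=rng)
folds = assign_folds(sampled_rows, n_folds=5, rng=rng)

# Sanity check: no original row ends up in more than one fold.
folds_seen = {}
for row, fold in zip(sampled_rows, folds):
    folds_seen.setdefault(row, set()).add(fold)
assert all(len(fold_ids) == 1 for fold_ids in folds_seen.values())
```

If you use scikit-learn, passing the sampled original row indices as `groups` to `GroupKFold` gives the same guarantee.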
