==========
Validation
==========

Treatment effect estimates cannot be validated the same way as regular ML predictions, because the true value is not available outside of experimental data. Here we focus on internal validation methods, under the assumption that the potential outcomes and the treatment status are unconfounded conditional on the feature set available to us.

Validation with Multiple Estimates
----------------------------------

We can validate the methodology by comparing its estimates with those from other approaches and by checking the consistency of estimates across different levels and cohorts.

Model Robustness for Meta Algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For meta-algorithms, we can assess the quality of user-level treatment effect estimation by comparing estimates from different underlying ML algorithms. We report the MSE, the coverage (overlap of the 95% confidence intervals), and the uplift curve. In addition, we can split the sample within a cohort and compare the results of out-of-sample scoring with those of within-sample scoring.
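
As a minimal sketch of two of these checks, the snippet below computes the MSE between the estimates of two learners and the share of users whose 95% intervals overlap. The function names, the normal-approximation intervals (estimate plus or minus 1.96 standard errors), and the toy numbers are illustrative assumptions, not part of any library API.

.. code-block:: python

   def mse(estimates, reference):
       """Mean squared error between two equal-length sequences."""
       return sum((e - r) ** 2 for e, r in zip(estimates, reference)) / len(estimates)


   def ci_overlap_rate(est_a, se_a, est_b, se_b, z=1.96):
       """Share of users whose intervals from two learners overlap.

       Assumes normal-approximation intervals: estimate +/- z * standard error.
       """
       overlaps = 0
       for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
           lo_a, hi_a = a - z * sa, a + z * sa
           lo_b, hi_b = b - z * sb, b + z * sb
           if lo_a <= hi_b and lo_b <= hi_a:
               overlaps += 1
       return overlaps / len(est_a)


   # Toy estimates from two hypothetical learners for four users.
   tau_a = [0.10, 0.25, 0.40, 0.05]
   tau_b = [0.12, 0.20, 0.80, 0.05]
   se = [0.05, 0.05, 0.05, 0.05]

   print(mse(tau_a, tau_b))
   print(ci_overlap_rate(tau_a, se, tau_b, se))  # 3 of 4 intervals overlap

On synthetic data the same ``mse`` helper can be applied against the known true effect instead of a second learner.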

User Level/Segment Level/Cohort Level Consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can also evaluate the consistency of user-level, segment-level, and cohort-level estimates (as in `CeViChE <https://docs.google.com/presentation/d/1WaXgwIwFsgBmrjz0-awk6TS5kIpVgCJxPg9Wm8mzcmg/edit#slide=id.g4dec088d29_0_141>`_) by conducting a t-test.
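
One way such a test might look, assuming we compare the mean of the user-level estimates within a segment against the segment-level estimate, is a one-sample t statistic. The helper name and the numbers below are hypothetical and only illustrate the calculation.

.. code-block:: python

   import math
   import statistics


   def one_sample_t(user_level_estimates, segment_level_estimate):
       """t statistic for H0: mean user-level estimate == segment-level estimate."""
       n = len(user_level_estimates)
       mean = statistics.fmean(user_level_estimates)
       sd = statistics.stdev(user_level_estimates)  # sample standard deviation
       return (mean - segment_level_estimate) / (sd / math.sqrt(n))


   # Hypothetical user-level estimates for one segment.
   user_tau = [0.8, 1.0, 1.2, 0.9, 1.1]
   segment_tau = 0.9

   t_stat = one_sample_t(user_tau, segment_tau)
   print(t_stat)  # roughly 1.41

The statistic is compared with the critical value of Student's t distribution with ``n - 1`` degrees of freedom (about 2.776 for a two-sided 5% test with 4 degrees of freedom, so this toy example would not flag an inconsistency). With SciPy available, ``scipy.stats.ttest_1samp`` returns the same statistic together with a p-value.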

Stability between Cohorts
~~~~~~~~~~~~~~~~~~~~~~~~~

The treatment effect may vary from cohort to cohort but should not be too volatile. For a given cohort, we compare the scores generated by a model fit to another cohort with the scores generated by its own model.
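
The comparison could be sketched as follows, assuming we hold two score vectors for the same users: one from the cohort's own model and one from a model fit elsewhere. The choice of Pearson correlation and mean absolute difference as stability summaries is an assumption made for illustration; other agreement measures would work equally well.

.. code-block:: python

   import math


   def pearson(x, y):
       """Pearson correlation between two equal-length score vectors."""
       n = len(x)
       mx, my = sum(x) / n, sum(y) / n
       cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
       vx = sum((a - mx) ** 2 for a in x)
       vy = sum((b - my) ** 2 for b in y)
       return cov / math.sqrt(vx * vy)


   def mean_abs_diff(x, y):
       """Average absolute gap between the two sets of scores."""
       return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


   # Hypothetical scores for the same four users.
   own_model_scores = [0.10, 0.20, 0.30, 0.40]
   other_model_scores = [0.20, 0.40, 0.60, 0.80]

   print(pearson(own_model_scores, other_model_scores))
   print(mean_abs_diff(own_model_scores, other_model_scores))

A high correlation with a large absolute gap, as in this toy case, would suggest that the two models rank users similarly but disagree on the magnitude of the effect.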

Validation with Synthetic Data Sets
-----------------------------------

We can test the methodology with simulations, where we generate data with known causal and non-causal links between the outcome, the treatment, and some confounding variables.

We implemented the following synthetic data generation mechanisms based on :cite:`nie2017quasi`:

Mechanism 1
~~~~~~~~~~~

| This generates a complex outcome regression model with an easy treatment effect, with input variables :math:`X_i \sim \mathrm{Unif}(0, 1)^d`.
| The treatment flag is a binomial variable whose data generating process (d.g.p.) is:
|
|   :math:`P(W_i = 1 | X_i) = \mathrm{logit}(\mathrm{trim}_{0.1}(\sin(\pi X_{i1} X_{i2})))`
|
| The outcome variable is:
|
|   :math:`y_i = \sin(\pi X_{i1} X_{i2}) + 2(X_{i3} - 0.5)^2 + X_{i4} + 0.5 X_{i5} + (W_i - 0.5)(X_{i1} + X_{i2})/ 2 + \epsilon_i`
|
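
The treatment effect implied by this outcome model can be read off by differencing the two arms. Assuming the noise :math:`\epsilon_i` has mean zero and is independent of :math:`W_i` given :math:`X_i` (an assumption, since the noise distribution is not stated here):

.. math::

   \tau(X_i) &= E[y_i | X_i, W_i = 1] - E[y_i | X_i, W_i = 0] \\
             &= \big(0.5 - (-0.5)\big) \frac{X_{i1} + X_{i2}}{2} \\
             &= \frac{X_{i1} + X_{i2}}{2}

All terms that do not involve :math:`W_i` cancel, so the effect is linear in the first two features even though the baseline outcome is not.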

Mechanism 2
~~~~~~~~~~~

| This simulates a randomized trial. The input variables are generated by :math:`X_i \sim N(0, I_{d\times d})`.
|
| The treatment flag is generated by a fair coin flip:
|
|   :math:`P(W_i = 1|X_i) = 0.5`
|
| The outcome variable is:
|
|   :math:`y_i = \max(X_{i1} + X_{i2}, X_{i3}, 0) + \max(X_{i4} + X_{i5}, 0) + (W_i - 0.5)(X_{i1} + \log(1 + e^{X_{i2}}))`
|
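
A minimal, standard-library sketch of this mechanism is shown below. The function names and defaults are hypothetical and this is not the library's own implementation; it follows the formulas above literally, so no noise term is added to the outcome.

.. code-block:: python

   import math
   import random


   def simulate_randomized_trial(n=1000, d=5, seed=0):
       """Draw (X, w, y, tau) from the randomized-trial mechanism (no noise term)."""
       if d < 5:
           raise ValueError("the outcome formula uses the first five features")
       rng = random.Random(seed)
       X, w, y, tau = [], [], [], []
       for _ in range(n):
           x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # X_i ~ N(0, I)
           w_i = 1 if rng.random() < 0.5 else 0  # fair coin flip
           # Baseline outcome: max(X1 + X2, X3, 0) + max(X4 + X5, 0)
           b_i = max(x[0] + x[1], x[2], 0.0) + max(x[3] + x[4], 0.0)
           # True treatment effect: X1 + log(1 + exp(X2))
           tau_i = x[0] + math.log1p(math.exp(x[1]))
           X.append(x)
           w.append(w_i)
           y.append(b_i + (w_i - 0.5) * tau_i)
           tau.append(tau_i)
       return X, w, y, tau


   def difference_in_means(w, y):
       """Naive ATE estimate, reasonable here because treatment is randomized."""
       treated = [y_i for w_i, y_i in zip(w, y) if w_i == 1]
       control = [y_i for w_i, y_i in zip(w, y) if w_i == 0]
       return sum(treated) / len(treated) - sum(control) / len(control)


   X, w, y, tau = simulate_randomized_trial(n=5000)
   print("difference in means:", difference_in_means(w, y))
   print("mean true effect:   ", sum(tau) / len(tau))

Because the true effect ``tau`` is returned alongside the data, any estimator run on this sample can be scored directly against it; the difference in means should approach the average true effect as ``n`` grows.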

Mechanism 3
~~~~~~~~~~~

| This one has an easy propensity score but a difficult control outcome. The input variables follow :math:`X_i \sim N(0, I_{d\times d})`.
|
| The treatment flag is a binomial variable whose d.g.p. is:
|
|   :math:`P(W_i = 1 | X_i) = \mathrm{logit}(X_{i2} + X_{i3})`
|
| The outcome variable is:
|
|   :math:`y_i = 2\log(1 + e^{X_{i1} + X_{i2} + X_{i3}}) + (W_i - 0.5)`
|

Mechanism 4
~~~~~~~~~~~

| This contains an unrelated treatment arm and control arm, with input data generated by :math:`X_i \sim N(0, I_{d\times d})`.
|
| The treatment flag is a binomial variable whose d.g.p. is:
|
|   :math:`P(W_i = 1 | X_i) = \mathrm{logit}(X_{i1} + X_{i2})`
|
| The outcome variable is:
|
|   :math:`y_i = \frac{1}{2}\big(\max(X_{i1} + X_{i2} + X_{i3}, 0) + \max(X_{i4} + X_{i5}, 0)\big) + (W_i - 0.5)\big(\max(X_{i1} + X_{i2} + X_{i3}, 0) - \max(X_{i4} + X_{i5}, 0)\big)`
|
