Leakage in X-Learner in-sample prediction #80

## Issue at hand

@ArseniyZvyagintsevQC brought the following to our attention:

Let us assume a binary treatment variant scenario in which we want to work with in-sample predictions, i.e. `is_oos=False`.

The current implementation fits five models, three of which are nuisance models and two of which are treatment models:

| model | target | cross-fitting dataset | stage | name |
|---|---|---|---|---|
| $\hat{\mu}_0$ | $Y_i$ | $\\{(X_i, Y_i) \| W_i=0\\}$ | nuisance | `"treatment_variant"` |
| $\hat{\mu}_1$ | $Y_i$ | $\\{(X_i, Y_i) \| W_i=1\\}$ | nuisance | `"treatment_variant"` |
| $\hat{e}$ | $W_i$ | $\\{(X_i, Y_i)\\}$ | nuisance/propensity | `"propensity_model"` |
| $\hat{\tau}_0$ | $\hat{\mu}_1(X_i) - Y_i$ | $\\{(X_i, Y_i) \| W_i=0\\}$ | treatment | `"control_effect_model"` |
| $\hat{\tau}_1$ | $Y_i - \hat{\mu}_0(X_i)$ | $\\{(X_i, Y_i) \| W_i=1\\}$ | treatment | `"treatment_effect_model"` |

More background on this [here](https://metalearners.readthedocs.io/en/latest/background.html#x-learner).

Note that each of these models is cross-fitted. More precisely, each is cross-fitted wrt the data it has seen at training time.
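To make "cross-fitted" concrete, here is a minimal sketch that uses scikit-learn directly rather than this library's internals. The synthetic data and the choice of a 1-nearest-neighbour regressor are assumptions made purely for illustration; the regressor memorises its training data, so the difference between naive and cross-fitted in-sample predictions is easy to see.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=200)

# A 1-nearest-neighbour regressor memorises its training data.
model = KNeighborsRegressor(n_neighbors=1)

# Naive in-sample prediction: every point is predicted by a model that saw it.
naive_pred = model.fit(X, y).predict(X)

# Cross-fitted in-sample prediction: every point is predicted by the fold
# model whose training split excluded it.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
crossfit_pred = cross_val_predict(model, X, y, cv=cv)

naive_mse = float(np.mean((naive_pred - y) ** 2))
crossfit_mse = float(np.mean((crossfit_pred - y) ** 2))
print(naive_mse, crossfit_mse)
```

The naive error is zero only because each point's own outcome is returned as its prediction, whereas the cross-fitted error reflects genuine out-of-fold performance.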

Let's suppose now that we are at inference time and encounter an in-sample data point $i$. Wlog, let's assume that $W_i=1$.
To produce a CATE estimate, the `predict` method will run:
- $\hat{\tau}_0(X_i)$ with `is_oos=True` since this datapoint has not been seen during training time of the model $\hat{\tau}_0$
- $\hat{\tau}_1(X_i)$ with `is_oos=False` since this datapoint has indeed been seen during the training time of the model $\hat{\tau}_1$

The latter call makes sure we avoid leakage in $\hat{\tau}_1$. The former call, however, does not completely avoid leakage:
even though $i$ hasn't been seen in the training of $\hat{\tau}_0$, it has been seen in the training of $\hat{\mu}_1$, which is, in turn, used to build the training targets of $\hat{\tau}_0$. Therefore, the observed outcome $Y_i$ can leak into the estimate $\hat{\tau}(X_i)$.
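The leakage path can be reproduced on a toy dataset. The following is a simplified sketch and not the library's code: it assumes plain scikit-learn 1-nearest-neighbour regressors, skips cross-fitting entirely, and uses a hypothetical helper `tau_0_prediction_at`. It changes only the outcome of one treated unit and checks whether the control effect model's prediction at that unit moves.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def tau_0_prediction_at(X, y, w, i):
    """Fit mu_1 and tau_0 without cross-fitting and evaluate tau_0 at X[i]."""
    treated, control = w == 1, w == 0
    # mu_1 is trained on all treated units, which includes unit i.
    mu_1 = KNeighborsRegressor(n_neighbors=1).fit(X[treated], y[treated])
    # tau_0 is trained on control units only, so unit i is not in its
    # training set ...
    pseudo_outcome = mu_1.predict(X[control]) - y[control]
    tau_0 = KNeighborsRegressor(n_neighbors=1).fit(X[control], pseudo_outcome)
    # ... yet its targets were built from mu_1, which has seen y[i].
    return float(tau_0.predict(X[i : i + 1])[0])


# Hand-crafted data: each treated unit has a control unit right next to it.
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
w = np.array([1, 0, 1, 0, 1, 0])
y = np.array([1.0, 0.5, 2.0, 1.5, 3.0, 2.5])

i = 0  # a treated, in-sample unit
baseline = tau_0_prediction_at(X, y, w, i)

y_perturbed = y.copy()
y_perturbed[i] += 10.0  # change only the outcome of unit i
perturbed = tau_0_prediction_at(X, y_perturbed, w, i)

leak = perturbed - baseline
print(baseline, perturbed, leak)
```

In this toy setup the whole perturbation of $Y_i$ passes through to $\hat{\tau}_0(X_i)$, even though unit $i$ never appears in the training set of $\hat{\tau}_0$. With smoother base models the effect would be diluted but generally not zero.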

## Next steps
We can devise an extreme, naïve approach to counteract this issue by training every type of model once per datapoint, each time leaving that datapoint out. Clearly, this ensures the absence of data leakage. The challenge is to come up with a design that
- allows for arbitrary numbers (>1, <=n) of cross-fitting folds, i.e. not fixing it to be equal to the number of training data points
- integrates well into the structure of the library
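As a starting point for discussion, the naïve approach can be generalised from one model set per datapoint to one model set per fold. The sketch below is an assumption-laden illustration, not a proposal for the library's structure: `fold_consistent_cate` is a hypothetical helper, the base models are plain scikit-learn estimators, the final estimate uses the usual propensity-weighted combination $\hat{e}\,\hat{\tau}_0 + (1-\hat{e})\,\hat{\tau}_1$, and every training split is assumed to contain both treatment variants.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold


def fold_consistent_cate(X, y, w, n_folds, seed=0):
    """In-sample CATE estimates where no model in the chain that predicts
    unit i has seen unit i (hypothetical helper, not library API)."""
    cate = np.empty(len(y))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        X_tr, y_tr, w_tr = X[train], y[train], w[train]
        treated, control = w_tr == 1, w_tr == 0
        # Nuisance stage, fitted on the training split only.
        mu_0 = LinearRegression().fit(X_tr[control], y_tr[control])
        mu_1 = LinearRegression().fit(X_tr[treated], y_tr[treated])
        propensity = LogisticRegression().fit(X_tr, w_tr)
        # Treatment stage, with pseudo-outcomes built from the same split.
        tau_0 = LinearRegression().fit(
            X_tr[control], mu_1.predict(X_tr[control]) - y_tr[control]
        )
        tau_1 = LinearRegression().fit(
            X_tr[treated], y_tr[treated] - mu_0.predict(X_tr[treated])
        )
        # Propensity-weighted combination, evaluated on the held-out fold.
        e = propensity.predict_proba(X[test])[:, 1]
        cate[test] = e * tau_0.predict(X[test]) + (1 - e) * tau_1.predict(X[test])
    return cate


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
w = rng.integers(0, 2, size=n)
y = X[:, 0] + w * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)

cate = fold_consistent_cate(X, y, w, n_folds=5)

# Perturbing the outcome of unit i should not change the estimate for unit i.
i = 0
y_perturbed = y.copy()
y_perturbed[i] += 10.0
cate_perturbed = fold_consistent_cate(X, y_perturbed, w, n_folds=5)
print(cate[i], cate_perturbed[i])
```

Setting `n_folds` to the number of datapoints recovers the naïve approach. The cost is that all five models are retrained for every fold, and the sketch says nothing about how such a scheme would fit into the library's existing cross-fitting machinery.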

