Provide nuisance estimates to pseudo-outcome methods #82

## Status quo

Currently, the pseudo-outcome methods of the DR-Learner and the R-Learner have the following interfaces:

- DR-Learner https://github.com/Quantco/metalearners/blob/d863df1dda4c278e8c50012c6df02e5732a2a923/metalearners/drlearner.py#L381-L390

- R-Learner
https://github.com/Quantco/metalearners/blob/d863df1dda4c278e8c50012c6df02e5732a2a923/metalearners/rlearner.py#L469-L479

Both kinds of pseudo outcome require nuisance model estimates. Since these are not passed in as arguments, they are computed inside the respective pseudo outcome method.

Importantly, the pseudo outcome methods are treatment-variant specific. Yet the nuisance estimates computed inside them are not:
- In the case of the R-Learner, the overall outcome model $\hat{\mu}$ and the overall propensity model $\hat{e}$ are both applied to all data. Only after the estimation is the data filtered with respect to the treatment variant at hand: https://github.com/Quantco/metalearners/blob/d863df1dda4c278e8c50012c6df02e5732a2a923/metalearners/rlearner.py#L495-L508
  
- In the case of the DR-Learner, the propensity $\hat{e}$ and all conditional average outcomes $\hat{\mu}_k$ are estimated for all data points; filtering of variant-specific information only happens thereafter:
https://github.com/Quantco/metalearners/blob/d863df1dda4c278e8c50012c6df02e5732a2a923/metalearners/drlearner.py#L394-L411
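
For reference, the textbook forms of the two pseudo outcomes make explicit which nuisance estimates each one consumes. This is a sketch in standard notation, not a transcription of the linked code: the indicator notation and the restriction of the R-Learner formula to a binary treatment are assumptions made here, and the library's multi-variant handling (for instance clipping with `epsilon`) may differ in detail.

DR-Learner, variant $k$ versus control $0$:

$$\varphi_k = \hat{\mu}_k(X) - \hat{\mu}_0(X) + \frac{\mathbb{1}\{W = k\}}{\hat{e}_k(X)} \left(Y - \hat{\mu}_k(X)\right) - \frac{\mathbb{1}\{W = 0\}}{\hat{e}_0(X)} \left(Y - \hat{\mu}_0(X)\right)$$

R-Learner, binary treatment $W \in \{0, 1\}$, with pseudo outcome $\tilde{Y}$ and sample weights $\omega$:

$$\tilde{Y} = \frac{Y - \hat{\mu}(X)}{W - \hat{e}(X)}, \qquad \omega = \left(W - \hat{e}(X)\right)^2$$

In both cases the nuisance estimates $\hat{\mu}$, $\hat{\mu}_k$ and $\hat{e}$ do not depend on which variant is being compared against control; only the selection of columns and rows does.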

## Assessment

With $k > 2$ treatment variants, the above approach does redundant work: the same nuisance estimates are recomputed for every treatment variant other than the control.
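
The redundancy can be illustrated with a toy example. The sketch below is not the library's code: `CountingModel` and both helper functions are hypothetical stand-ins that only count how often a nuisance model is asked to predict when predictions happen per variant versus once up front.

```python
import numpy as np


class CountingModel:
    """Hypothetical stand-in for a fitted nuisance model; counts predict calls."""

    def __init__(self) -> None:
        self.n_predict_calls = 0

    def predict(self, X: np.ndarray) -> np.ndarray:
        self.n_predict_calls += 1
        return np.zeros(len(X))


def estimates_per_variant(model: CountingModel, X: np.ndarray, n_variants: int):
    # Mirrors the status quo: each non-control variant triggers its own prediction.
    return [model.predict(X) for _ in range(1, n_variants)]


def estimates_shared(model: CountingModel, X: np.ndarray, n_variants: int):
    # Predict once, then reuse the same estimates for each non-control variant.
    estimates = model.predict(X)
    return [estimates for _ in range(1, n_variants)]


X = np.ones((5, 2))
n_variants = 4  # one control plus three treatment variants

repeated = CountingModel()
estimates_per_variant(repeated, X, n_variants)

shared = CountingModel()
estimates_shared(shared, X, n_variants)

print(repeated.n_predict_calls, shared.n_predict_calls)
```

With four variants, the per-variant approach predicts three times on identical inputs, while the shared approach predicts once.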

Computational burden aside, it is not clear that having the pseudo outcome methods perform the estimation themselves makes for a better interface. Wouldn't it be more natural, and separate concerns better, if the pseudo outcome methods merely _defined_ the pseudo outcome given the nuisance estimates, rather than estimating quantities themselves?
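
One way such an interface might look for the DR-Learner is sketched below. The function name `dr_pseudo_outcome`, its parameter names, and the default `epsilon` are all hypothetical and not part of the current library; the formula is the textbook doubly robust score, and clipping with `np.clip` is an assumption about how small propensities would be guarded.

```python
import numpy as np


def dr_pseudo_outcome(
    y: np.ndarray,
    w: np.ndarray,
    cao_estimates: np.ndarray,  # shape (n_obs, n_variants), hypothetical input
    propensity_estimates: np.ndarray,  # shape (n_obs, n_variants), hypothetical input
    treatment_variant: int,
    epsilon: float = 1e-9,  # hypothetical default
) -> np.ndarray:
    """Define the DR pseudo outcome from precomputed nuisance estimates."""
    mu_0 = cao_estimates[:, 0]
    mu_k = cao_estimates[:, treatment_variant]
    # Guard against division by near-zero propensities.
    e_0 = np.clip(propensity_estimates[:, 0], epsilon, None)
    e_k = np.clip(propensity_estimates[:, treatment_variant], epsilon, None)
    return (
        mu_k
        - mu_0
        + (w == treatment_variant) * (y - mu_k) / e_k
        - (w == 0) * (y - mu_0) / e_0
    )


# Nuisance estimates are produced once (here: made-up numbers) ...
y = np.array([2.0, 2.0, 3.0])
w = np.array([0, 1, 2])
cao_estimates = np.tile([1.0, 2.0, 3.0], (3, 1))
propensity_estimates = np.tile([0.5, 0.25, 0.25], (3, 1))

# ... and reused for every non-control variant.
pseudo_outcomes = {
    k: dr_pseudo_outcome(y, w, cao_estimates, propensity_estimates, k)
    for k in (1, 2)
}
```

The caller decides when and how the nuisance estimates are produced, so they can be computed once and shared across variants; the function itself performs no estimation.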

Preview of `drlearner.py#L381-L390` (DR-Learner signature):

	def _pseudo_outcome(
	    self,
	    X: Matrix,
	    y: Vector,
	    w: Vector,
	    treatment_variant: int,
	    is_oos: bool,
	    oos_method: OosMethod = OVERALL,
	    epsilon: float = _EPSILON,
	) -> np.ndarray:

Preview of `rlearner.py#L469-L479` (R-Learner signature):

	def _pseudo_outcome_and_weights(
	    self,
	    X: Matrix,
	    y: Vector,
	    w: Vector,
	    treatment_variant: int,
	    is_oos: bool,
	    oos_method: OosMethod = OVERALL,
	    mask: Vector | None = None,
	    epsilon: float = _EPSILON,
	) -> tuple[np.ndarray, np.ndarray]:

Preview of `rlearner.py#L495-L508` (R-Learner nuisance estimation):

	y_estimates = self.predict_nuisance(
	    X=X,
	    is_oos=is_oos,
	    model_kind=OUTCOME_MODEL,
	    model_ord=0,
	    oos_method=oos_method,
	)[mask]
	w_estimates = self.predict_nuisance(
	    X=X,
	    is_oos=is_oos,
	    model_kind=PROPENSITY_MODEL,
	    model_ord=0,
	    oos_method=oos_method,
	)[mask]

Preview of `drlearner.py#L394-L411` (DR-Learner nuisance estimation):

	conditional_average_outcome_estimates = (
	    self.predict_conditional_average_outcomes(
	        X=X,
	        is_oos=is_oos,
	        oos_method=oos_method,
	    )
	)

	propensity_estimates = self.predict_nuisance(
	    X=X,
	    is_oos=is_oos,
	    oos_method=oos_method,
	    model_kind=PROPENSITY_MODEL,
	    model_ord=0,
	)

	y0_estimate = conditional_average_outcome_estimates[:, 0]
	y1_estimate = conditional_average_outcome_estimates[:, treatment_variant]
