Prediction with lm_lin() fixes #415 #416

mollyow · 2025-01-07T01:02:22Z

This PR modifies get_X() in predict.lm_robust() to appropriately create model matrices for new data predicted from lm_lin() models with multi-valued and factorial treatments, and fixes #415.

Changes in this PR:

- allows for prediction with lm_lin() when treatment is a factor and/or multi-valued (primary goal)
  - adds saved treatment_levels to the returned lm_lin model object
  - stops prediction for lm_lin if the treatment values in new data are not a subset of treatment_levels. ¹
- standardizes model fit for lm_lin() models with no intercept ²
- adds tests to ensure identical predictions from lm_lin() models where treatment is either numeric or factorial, and fit with/without an intercept (there may be extremely small floating point differences)
- adds relevant examples to predict.lm_robust and lm_lin documentation.

Notes:
1: This change has consequences for use with margins::margins(), which now must be instructed that treatment is a factor.

Why? margins() perturbs values of the variable to get marginal effects. The perturbed values will not be a subset of the original treatment levels, which now throws an error.
If we only ever had binary treatments, this would not be an issue. However, lm_lin() lets users supply a multi-valued treatment as a numeric variable and treats each distinct numeric value as a separate treatment level. margins() would still treat such a variable as continuous, which leads to errors or incorrect marginal effects. The margins package already has intended behavior for factor variables, following the Stata implementation (margins #6); users just need to explicitly tell margins() that treatment is a factor to get correct behavior.
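To make the failure mode concrete, here is a minimal base R sketch of what a small numerical perturbation does to a numerically coded treatment. The step size `h` and the variable names are illustrative assumptions, not the values margins() actually uses.

```r
# Numeric treatment with three distinct values, each a separate level
z <- c(0, 1, 2, 1, 0, 2)
treatment_levels <- sort(unique(z))

# Hypothetical small step, standing in for a numerical-derivative perturbation
h <- 1e-7
z_perturbed <- z + h

# The original values pass a subset check; the perturbed values do not
original_ok <- all(z %in% treatment_levels)
perturbed_ok <- all(z_perturbed %in% treatment_levels)
```

Here `original_ok` is TRUE and `perturbed_ok` is FALSE, which is the situation the new check stops on. Declaring the treatment as a factor avoids the perturbation, since margins() then compares discrete levels instead.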

2: There is a small change to the behavior of lm_lin() for binary 0/1 treatments with no intercept. I think it's an open question what the correct behavior should be here, as in Winston's original paper all models have intercepts. Previously, with a binary treatment and no intercept, you would get a model with a treatment indicator, de-meaned covariates, and treatment interacted with covariates:

set.seed(60637)

N <- 40
dat <- data.frame(
  x = rnorm(N, mean = 2.3),
  x2 = rpois(N, lambda = 2),
  x3 = runif(N)
)

dat$y0 <- rnorm(N) + dat$x
dat$y1 <- dat$y0 + 0.35

dat$z_bin <- sample(0:1, size = nrow(dat), replace = TRUE)
dat$y <- (dat$z_bin == 0)*dat$y0 + (dat$z_bin == 1)*dat$y1

# Binary 0/1 treatment with lm_lin and no intercept
library(estimatr)
lmlin_bin <- lm_lin(y ~ z_bin - 1, covariates = ~ x, data = dat)

Produces:

           Estimate Std. Error   t value     Pr(>|t|)   CI Lower CI Upper DF
z_bin     2.5165919  0.2647700 9.5048223 1.797926e-11  1.9801169 3.053067 37
x_c       0.6103258  0.6999342 0.8719759 3.888449e-01 -0.8078756 2.028527 37
z_bin:x_c 0.5577459  0.7818820 0.7133377 4.801122e-01 -1.0264974 2.141989 37

This is difficult to interpret in terms of a treatment effect.

This PR changes behavior for binary 0/1 treatment to be the same as what you would see when treatment is multi-valued or otherwise treated as a factor. It also allows you to back out the Lin estimate of the ATE.

> devtools::install_github("mollyow/estimatr")
> library(estimatr)
> lm_lin(y ~ z_bin-1, covariates = ~ x, data = dat)
            Estimate Std. Error   t value     Pr(>|t|)  CI Lower  CI Upper DF
z_bin0     2.3921681  0.1136183 21.054425 7.934125e-22 2.1617395 2.6225967 36
z_bin1     2.5165919  0.2647700  9.504822 2.370080e-11 1.9796134 3.0535704 36
z_bin0:x_c 0.7151286  0.1249358  5.723967 1.625271e-06 0.4617470 0.9685102 36
z_bin1:x_c 1.1680717  0.3484703  3.351998 1.896537e-03 0.4613412 1.8748022 36
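The claim that the no-intercept fit lets you back out the Lin estimate can be checked with a small base R sketch. It uses lm() rather than lm_lin(), on freshly simulated data, on the assumption that the point estimates of a Lin model are ordinary least squares estimates on centered covariates; the variable names `x_c` and `z_f` are illustrative.

```r
set.seed(60637)
N <- 40
dat <- data.frame(x = rnorm(N, mean = 2.3))
dat$z_bin <- sample(0:1, size = N, replace = TRUE)
dat$y <- rnorm(N) + dat$x + 0.35 * dat$z_bin

# Center the covariate and code the treatment as a factor
dat$x_c <- dat$x - mean(dat$x)
dat$z_f <- factor(dat$z_bin)

# Intercept parameterization: the coefficient on z_bin is the ATE estimate
fit_intercept <- lm(y ~ z_bin * x_c, data = dat)

# No-intercept parameterization: one mean and one slope per treatment level
fit_no_intercept <- lm(y ~ 0 + z_f + z_f:x_c, data = dat)

b <- coef(fit_no_intercept)
ate_from_no_intercept <- unname(b["z_f1"] - b["z_f0"])
ate_from_intercept <- unname(coef(fit_intercept)["z_bin"])
```

The two quantities agree up to floating point error because both designs span the same column space. Applied to the table above, the same subtraction gives 2.5165919 - 2.3921681 = 0.1244238. A standard error for that difference would also need the covariance between the two level coefficients, which the table does not show.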

A few comments:

(Edit: resolved) If the new data has additional factor levels/treatment values, prediction will fail because the model matrix will not have the correct dimensions. If there are new treatment levels AND old treatment levels are dropped, there could be some bad behavior, as the names of treatment levels are not checked.
My understanding is that the expected behavior for predict.lm_robust() without new data is failure, because the model object does not save the original model matrix. This PR does not modify that behavior, and so doesn't address Luke's question in "predict and residuals have odd behavior" (#403).

mollyow · 2025-01-22T03:56:35Z

tests/testthat/test-lm-robust_margins.R

  data("alo_star_men")
-  lml <- lm_lin(GPA_year1 ~ ssp, ~  gpa0, data = alo_star_men, se_type = "classical")
+  # instruct margins to treat treatment as a factor
+  lml <- lm_lin(GPA_year1 ~ factor(ssp), ~  gpa0, data = alo_star_men, se_type = "classical")


My understanding is that margins needs to be instructed which variables are factors, to treat them accordingly. https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html#Using_the_at_Argument

Otherwise this results in an error in prediction, which now stops if treatment values in the new data are not a subset of treatment values in the original data (a consequence of margins() perturbing variables).

mollyow · 2025-01-22T03:57:29Z

tests/testthat/test-lm-lin.R

  expect_equal(
    lmlo$term,
-    c("z", "X1_c", "z:X1_c")
+    c("z0", "z1", "z0:X1_c", "z1:X1_c")


The PR changes the expected behavior for binary treatment with no intercept.

mollyow · 2025-01-22T03:59:39Z

R/estimatr_lm_lin.R

+  # Store unique treatment values
+  if(attr(terms(model_data), "dataClasses")[attr(terms(model_data),"term.labels")[1]] == "factor"){
+    return_list[["treatment_levels"]] <- model_data$xlevels[[1]]
+  } else {
+    return_list[["treatment_levels"]] <- sort(unique(design_matrix[, design_mat_treatment]))
+  }


This is added so that, when the model matrix is generated for predictions, we can ensure that the new data only includes treatment levels that were in the original model fit. Without this check, predictions could behave unexpectedly when the new data does not share treatment levels with the original data. If treatment is a factor, its levels are already saved in $xlevels in the model object; if treatment is entered into the model as a numeric variable, this information is not otherwise saved.
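A rough base R sketch of the two steps described here, recording levels at fit time and validating new data at prediction time, might look like the following. Both helper functions are hypothetical names written for illustration and are not the functions used in estimatr.

```r
# Hypothetical helper: record treatment levels at fit time
get_treatment_levels <- function(treatment) {
  if (is.factor(treatment)) levels(treatment) else sort(unique(treatment))
}

# Hypothetical helper: stop if new data contains treatment values not seen at fit time
check_new_treatment <- function(new_treatment, treatment_levels) {
  unseen <- setdiff(
    unique(as.character(new_treatment)),
    as.character(treatment_levels)
  )
  if (length(unseen) > 0) {
    stop("Treatment values not present at fit time: ",
         paste(unseen, collapse = ", "))
  }
  invisible(TRUE)
}

levels_numeric <- get_treatment_levels(c(2, 0, 1, 1))
levels_factor <- get_treatment_levels(factor(c("a", "b", "a")))

# A subset of the fitted levels passes
subset_ok <- check_new_treatment(c(0, 2), levels_numeric)

# A value outside the fitted levels raises an error, captured here as a message
new_level_error <- tryCatch(
  check_new_treatment(c(0, 3), levels_numeric),
  error = function(e) conditionMessage(e)
)
```

Comparing values as character strings is a simplification that works for integer-like treatment codes; a real implementation would need more care with non-integer numeric treatments.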

mollyow · 2025-01-22T04:01:13Z

R/S3_predict.R


  X <- get_X(object, newdata, na.action)

-  # lm_lin scaling


All of the lm_lin scaling is moved down into get_X().

mollyow · 2025-01-22T04:01:45Z

R/S3_predict.R

-    # Interacted with treatment
-    treat_name <- attr(object$terms, "term.labels")[1]
-    interacted_covars <- X[, treat_name] * demeaned_covars


This does not have the desired behavior when there are multiple treatment levels.
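The removed code multiplies a single treatment column by the covariates, which only works when treatment occupies one column. As a sketch of the alternative, the base R example below builds one interaction column per treatment level for new data. It is not the package's get_X(); the no-intercept formula, the stored mean `x_mean`, and the names `z_f` and `x_c` are assumptions made for illustration.

```r
# Training data with a three-level numeric treatment
train <- data.frame(z = c(0, 1, 2, 0, 1, 2), x = c(1, 2, 3, 4, 5, 6))
treatment_levels <- sort(unique(train$z))
x_mean <- mean(train$x)  # covariate mean from the training data

# New data uses only a subset of the fitted treatment levels
newdata <- data.frame(z = c(2, 0), x = c(3.5, 5.5))

# Fix the factor levels to those seen at fit time so columns line up
newdata$z_f <- factor(newdata$z, levels = treatment_levels)

# Center with the training mean, not the mean of the new data
newdata$x_c <- newdata$x - x_mean

# One indicator and one covariate interaction per treatment level
X_new <- model.matrix(~ 0 + z_f + z_f:x_c, data = newdata)
```

Fixing the factor levels explicitly is what keeps the matrix at six columns even though the new data never contains treatment level 1.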

mollyow · 2025-01-22T04:18:10Z

R/estimatr_lm_lin.R


  # Check case where treatment is not factor and is not binary
-  if (any(!(treatment %in% c(0, 1)))) {
+  if (any(!(treatment %in% c(0, 1))) | (!has_intercept&ncol(treatment) ==1) ) {


This change and the subsequent one modify how lm_lin() fits without an intercept and with 0/1 treatment values.

mollyow · 2025-01-22T04:18:49Z

R/estimatr_lm_lin.R

-    # If no intercept, but treatment is only one column,
-    # need to add base terms for covariates
-    if (n_treat_cols == 1) {
-      X <- cbind(
-        treatment,
-        demeaned_covars,
-        interacted_covars
-      )


This special case is resolved

cran version 1.0.4

graemeblair and others added 3 commits March 29, 2024 19:30

version up

31e477a

generate factorial model matrix for lm_lin in predict

a1734ba

deal with intercepts

bedcc7e

mollyow changed the title ~~Prediction with lm_lin() in #415~~ Prediction with lm_lin() fixes #415 Jan 7, 2025

mollyow added 4 commits January 21, 2025 12:40

Ensure treatment levels in newdata are subset of those for lin model fit

4b53e99

save treatment levels for lin model prediction

05c9920

resolve issue with interaction ordering in prediction matrix

c1da381

tests for lm lin with no intercept and for prediction

59d612b

mollyow and others added 8 commits January 22, 2025 16:38

adds lm_lin prediction examples

4c9b28f

Merge pull request DeclareDesign#417 from DeclareDesign/cran-patch-mar29

9e24ee8

cran version 1.0.4

cran prep

3d0266c

Merge branch 'main' into pr/416

8ee4040

news & version bump

d4bd29b

docs

b824e87

test fix

121cb5f

githb acion

3f14afb

graemeblair changed the base branch from main to lh-fixes February 28, 2025 17:19

graemeblair added 2 commits February 28, 2025 09:21

remove zzz broom test

8f229d6

Merge branch 'lh-fixes' into main

e31c6ea

graemeblair merged commit b5da215 into DeclareDesign:lh-fixes Feb 28, 2025
