Feature Request: fixest::i() operator to set reference levels and interact categorical variables #244
Hi @matthewwardrop ,
A very common operation in econometric analyses is to interact a categorical variable with another variable (for example, in difference-in-differences regressions).
To handle such interactions flexibly, the fixest R package introduced a dedicated operator, i().
It makes it easy to set reference levels for individual categorical variables and for their interactions. On top of that, it provides sugar for binning levels of categorical variables.
Would you consider adding the i() operator to formulaic's transforms? The best starting point for learning about it is the fixest docs, but I have also attached some examples and comparisons to formulaic below.
import pandas as pd
import numpy as np
from formulaic import model_matrix
from formulaic.transforms import stateful_transform
from formulaic.transforms.contrasts import C, TreatmentContrasts
rng = np.random.default_rng(91)
f1 = rng.choice(["a", "b", "c"], 10)
f2 = rng.choice([1, 2, 3], 10)
y = rng.normal(0, 1, 10)
df = pd.DataFrame({"factor1": f1, "factor2": f2, "y": y})
Easily set the reference level for one categorical variable
library(fixest)
library(reticulate)
df = py$df
fit = feols(y~ i(factor1), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::b" "factor1::c"
fit = feols(y~ i(factor1, ref = "b"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::a" "factor1::c"
This could be easily achieved by a stateful transform in formulaic:
@stateful_transform
def i(factor_var, ref=None, _state=None, _metadata=None, _spec=None):
    # Build the treatment-coded factor once and cache it in the transform state.
    if "i" not in _state:
        _state["i"] = C(data=factor_var, contrasts=TreatmentContrasts(ref))
    return _state["i"]
model_matrix("i(factor1, ref = 'a')", data = df).head()
# Intercept i(factor1, ref='a')[T.b] i(factor1, ref='a')[T.c]
# 0 1.0 0 1
# 1 1.0 0 0
# 2 1.0 1 0
# 3 1.0 0 1
# 4 1.0 0 0
Interacting Variables
library(fixest)
library(reticulate)
df = py$df
fit = feols(y~ i(factor1, factor2), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> head()
# (Intercept) factor1::a:factor2 factor1::b:factor2 factor1::c:factor2
# [1,] 1 0 0 1
# [2,] 1 3 0 0
# [3,] 1 0 3 0
# [4,] 1 0 0 1
# [5,] 1 3 0 0
# [6,] 1 0 0 3
y, X = model_matrix("y ~ C(factor1):factor2", data = df)
X.columns
#Index(['Intercept', 'C(factor1)[a]:factor2', 'C(factor1)[b]:factor2',
# 'C(factor1)[c]:factor2'],
# dtype='object')
Two variables, reference level used
fit = feols(y~ i(factor1, factor2, ref = "a"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::b:factor2" "factor1::c:factor2"
y, X = model_matrix("y ~ C(factor1, contr.treatment('a')):factor2", data = df)
X.columns
#Index(['Intercept',
# 'C(factor1, contr.treatment('a'))[a]:factor2',
# 'C(factor1, contr.treatment('a'))[b]:factor2',
# 'C(factor1, contr.treatment('a'))[c]:factor2'],
# dtype='object')
# so the column 'C(factor1, contr.treatment('a'))[a]:factor2' needs to be dropped by hand
Binning
# binning: group factor levels a & b into 'bin'
fit = feols(y~ i(factor1, factor2, bin = list(bin= c("a","b"))), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::bin:factor2" "factor1::c:factor2"
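To make the requested behaviour concrete, here is a rough pandas-only sketch of what an i()-like helper could compute for the interaction, reference-level and binning cases above. The function name `i_sketch`, its signature and the toy data are all hypothetical; it is not part of formulaic or fixest, it only mimics the fixest column naming, and it drops a level only when `ref` is given explicitly.

```python
import pandas as pd


def i_sketch(factor, var=None, ref=None, bin=None):
    """Hypothetical illustration of fixest-style i(); not a formulaic API.

    `bin` mirrors the fixest argument name (it shadows the Python builtin
    inside this function only).
    """
    factor = pd.Series(factor).astype(object)
    name = factor.name or "factor"

    # Binning: relabel every listed level with the label of its bin.
    if bin:
        mapping = {lvl: label for label, lvls in bin.items() for lvl in lvls}
        factor = factor.map(lambda value: mapping.get(value, value))

    # One indicator column per (possibly binned) level, in sorted order.
    dummies = pd.get_dummies(factor, dtype=float)

    # Reference level: drop its indicator column.
    if ref is not None:
        dummies = dummies.drop(columns=[ref])

    if var is None:
        dummies.columns = [f"{name}::{lvl}" for lvl in dummies.columns]
        return dummies

    # Interaction: scale each indicator by the second variable, row by row.
    var = pd.Series(var)
    var_name = var.name or "var"
    out = dummies.mul(var.astype(float), axis=0)
    out.columns = [f"{name}::{lvl}:{var_name}" for lvl in dummies.columns]
    return out


# Toy data (made up for this sketch, not the data frame used above).
toy = pd.DataFrame({"factor1": ["a", "b", "c", "a", "c"],
                    "factor2": [3, 3, 1, 2, 3]})

X_ref = i_sketch(toy["factor1"], toy["factor2"], ref="a")
X_bin = i_sketch(toy["factor1"], toy["factor2"], bin={"bin": ["a", "b"]})
print(list(X_ref.columns))
print(list(X_bin.columns))
```

A real implementation inside formulaic would presumably need to remember the levels seen at fit time, as the stateful transform above does, so that new data is encoded consistently. The sketch skips that.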