
Feature Request: fixest::i() operator to set reference levels and interact categorical variables #244

@s3alfisc

Hi @matthewwardrop ,

A very common operation in econometric analyses is to interact a categorical variable with another variable (for example, in difference-in-differences regressions).

To handle such interactions flexibly, the fixest R package provides a dedicated operator, i().

It makes it easy to set reference levels for a single categorical variable and for its interaction with another variable. On top of that, it provides syntactic sugar for binning levels of categorical variables.

Would you consider adding the i() operator to formulaic's transforms? The best starting point for learning about it is the fixest documentation, but I have also included some examples and comparisons to formulaic below.

import pandas as pd
import numpy as np
from formulaic import model_matrix
from formulaic.transforms import stateful_transform
from formulaic.transforms.contrasts import C, TreatmentContrasts


rng = np.random.default_rng(91)
f1 = rng.choice(["a", "b", "c"], 10)
f2 = rng.choice([1, 2, 3], 10)
y = rng.normal(0, 1, 10)

df = pd.DataFrame({"factor1":f1, "factor2":f2, "y": y})

Easily set the reference level for one categorical variable

library(fixest)
library(reticulate)
df = py$df

fit = feols(y~ i(factor1), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::b"  "factor1::c" 

fit = feols(y~ i(factor1, ref = "b"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::a"  "factor1::c" 

This could easily be achieved with a stateful transform in formulaic:

@stateful_transform
def i(factor_var, ref=None, _state=None, _metadata=None, _spec=None):

    if "i" not in _state:
        _state["i"] = C(data = factor_var, contrasts = TreatmentContrasts(ref))

    return _state["i"]

model_matrix("i(factor1, ref = 'a')", data = df).head()


# Intercept	i(factor1, ref='a')[T.b]	i(factor1, ref='a')[T.c]
# 0	1.0	0	1
# 1	1.0	0	0
# 2	1.0	1	0
# 3	1.0	0	1
# 4	1.0	0	0
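For reference, the target behaviour of i(factor1, ref = ...) can be written down without any library. The following is a minimal plain-Python sketch, not the implementation used by fixest or formulaic. The helper name `indicator_columns` is hypothetical, and dropping the first sorted level when `ref` is None is an assumption chosen to mirror the column names in the fixest output above.

```python
def indicator_columns(values, ref=None, name="factor1"):
    """Return {column_name: [0/1, ...]} with one column per non-reference level.

    Assumption: if ref is None, the first level in sorted order is dropped,
    which matches the fixest column names shown above (model with intercept).
    """
    levels = sorted(set(values))
    if ref is None:
        ref = levels[0]
    if ref not in levels:
        raise ValueError(f"reference level {ref!r} not found in {levels}")
    # One 0/1 indicator column per level, skipping the reference level.
    return {
        f"{name}::{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != ref
    }


factor1 = ["c", "a", "b", "c", "a"]
cols = indicator_columns(factor1, ref="b")
print(list(cols))  # ['factor1::a', 'factor1::c']
```

Raising on an unknown reference level is a design choice of this sketch; it avoids silently returning a full set of indicator columns.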

Interacting Variables

library(fixest)
library(reticulate)
df = py$df

fit = feols(y~ i(factor1, factor2), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> head()
#      (Intercept) factor1::a:factor2 factor1::b:factor2 factor1::c:factor2
# [1,]           1                  0                  0                  1
# [2,]           1                  3                  0                  0
# [3,]           1                  0                  3                  0
# [4,]           1                  0                  0                  1
# [5,]           1                  3                  0                  0
# [6,]           1                  0                  0                  3

In formulaic:

y, X = model_matrix("y ~ C(factor1):factor2", data = df)
X.columns 
#Index(['Intercept', 'C(factor1)[a]:factor2', 'C(factor1)[b]:factor2',
#       'C(factor1)[c]:factor2'],
#      dtype='object')

Two variables, reference level used

fit = feols(y~ i(factor1, factor2, ref = "a"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)"        "factor1::b:factor2" "factor1::c:factor2"

In formulaic:

y, X = model_matrix("y ~ C(factor1, contr.treatment('a')):factor2", data = df)
X.columns
#Index(['Intercept', 
#      'C(factor1, contr.treatment('a'))[a]:factor2',
#       'C(factor1, contr.treatment('a'))[b]:factor2',
#       'C(factor1, contr.treatment('a'))[c]:factor2'],
#      dtype='object')

# So the column 'C(factor1, contr.treatment('a'))[a]:factor2' still has to be dropped by hand.
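To make the requested behaviour concrete, here is a minimal plain-Python sketch of what i(factor1, factor2, ref = "a") produces: one column per non-reference level, holding the value of the second variable where the factor equals that level and 0 elsewhere. The helper name `interact` and the toy data are hypothetical, and this is not how fixest or formulaic implement it.

```python
def interact(factor, x, ref=None, factor_name="factor1", x_name="factor2"):
    """Interact a categorical variable with a numeric one.

    Returns {column_name: values}. If ref is given, the column for that
    level is omitted; if ref is None, all levels are kept, as in the
    fixest output for i(factor1, factor2) shown earlier.
    """
    levels = sorted(set(factor))
    if ref is not None and ref not in levels:
        raise ValueError(f"reference level {ref!r} not found in {levels}")
    return {
        f"{factor_name}::{level}:{x_name}": [
            xi if f == level else 0 for f, xi in zip(factor, x)
        ]
        for level in levels
        if level != ref
    }


factor1 = ["c", "a", "b", "c"]
factor2 = [1, 3, 3, 2]
cols = interact(factor1, factor2, ref="a")
print(list(cols))  # ['factor1::b:factor2', 'factor1::c:factor2']
```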

Binning

# Binning: group the factor levels a and b into a single level 'bin'
fit = feols(y~ i(factor1, factor2, bin = list(bin= c("a","b"))), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)"          "factor1::bin:factor2" "factor1::c:factor2"  
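Binning can be thought of as relabelling levels before the interaction columns are built. The sketch below illustrates that idea in plain Python. The helper name `bin_levels`, the dict-based `bins` argument, and the toy data are assumptions made for illustration and do not reflect the fixest or formulaic API.

```python
def bin_levels(values, bins):
    """Relabel levels according to bins = {new_level: [old_level, ...]}.

    Levels not mentioned in any bin are left unchanged.
    """
    lookup = {old: new for new, olds in bins.items() for old in olds}
    return [lookup.get(v, v) for v in values]


factor1 = ["c", "a", "b", "c"]
factor2 = [1, 3, 3, 2]

# Group levels a and b into a single level called 'bin'.
binned = bin_levels(factor1, {"bin": ["a", "b"]})

# Build one interaction column per (binned) level.
cols = {
    f"factor1::{level}:factor2": [
        x if f == level else 0 for f, x in zip(binned, factor2)
    ]
    for level in sorted(set(binned))
}
print(list(cols))  # ['factor1::bin:factor2', 'factor1::c:factor2']
```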
