get_model_matrix for MULTISTAGE formula #264

@leostimpfle

Hi @matthewwardrop, we're refactoring pyfixest's formula parser and are considering moving its instrumental variable functionality to formulaic's multi-stage syntax. I realise that it's still labelled experimental in the documentation, so my first question is whether you would, in principle, advise against relying on the syntax in production for now.

My second question concerns the current implementation, specifically how the multi-stage syntax is expected to interact with the construction of model matrices. For example, with the appropriate FeatureFlags set, parsing the multi-stage syntax works as expected. However, naively calling get_model_matrix on the resulting formula throws an error, because the second stage relies on the first-stage estimate X2_hat. In #24, you wrote that:

On a multipart formula like this one, calls to get_model_matrix will need to specify the part and stage for which the model matrix should be generated.

Unfortunately, I wasn't able to infer from the documentation or the code if this is still accurate (I think the quote dates from a few years before the feature was implemented).

I also had a quick look at linearmodels because it already uses formulaic's multi-stage syntax for IV estimation. However, my understanding after a preliminary glance at its codebase is that the multi-stage syntax is parsed internally into single-stage formulas before being passed to get_model_matrix (see here), so that the multi-stage syntax is currently not used.

import formulaic
from formulaic.parser import DefaultFormulaParser
import pyfixest as pf

data = pf.get_data(N=1_000, seed=0, model="Feols")
fml = "Y ~ X1 + [X2 ~ Z1]"

# parsing works as expected
formula = formulaic.Formula(
    fml,
    _parser=DefaultFormulaParser(feature_flags=DefaultFormulaParser.FeatureFlags.ALL),
)

# model matrix construction throws error:
# formulaic.errors.FactorEvaluationError: Unable to evaluate factor `X2_hat`. [NameError: `X2_hat` is not present in the dataset or evaluation context.]
formula(data=data)
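
As a stopgap, the multi-stage formula could be split by hand into single-stage formulas before any model matrix is built, which is roughly what linearmodels appears to do. Below is a minimal sketch of that idea, not formulaic's or linearmodels' actual implementation. `split_iv_formula` is a hypothetical helper, and it assumes a single `[endog ~ instruments]` block and an exogenous part that is a plain `+`-separated sum of terms.

```python
import re

# Hypothetical helper (not part of formulaic, linearmodels, or pyfixest).
# Matches one bracketed block of the form "[endog ~ instruments]".
_IV_BLOCK = re.compile(r"\[([^\[\]~]+)~([^\[\]~]+)\]")


def split_iv_formula(fml: str) -> dict[str, str]:
    """Split 'dep ~ exog + [endog ~ instruments]' into single-stage formulas."""
    dep, rhs = (part.strip() for part in fml.split("~", 1))
    match = _IV_BLOCK.search(rhs)
    if match is None:
        raise ValueError(f"No '[endog ~ instruments]' block found in {fml!r}")
    endog, instruments = (group.strip() for group in match.groups())

    # Drop the bracketed block, then discard the empty terms left behind
    # by a dangling '+'. Assumes exog is a plain sum of terms.
    remainder = rhs[: match.start()] + rhs[match.end():]
    exog_terms = [term.strip() for term in remainder.split("+") if term.strip()]
    exog = " + ".join(exog_terms) or "1"

    return {
        # First stage: endogenous regressor on exogenous terms and instruments.
        "first_stage": f"{endog} ~ {exog} + {instruments}",
        # Second stage: in 2SLS the fitted values from the first stage
        # would take the place of the endogenous regressor here.
        "second_stage": f"{dep} ~ {exog} + {endog}",
    }


stages = split_iv_formula("Y ~ X1 + [X2 ~ Z1]")
print(stages["first_stage"])
print(stages["second_stage"])
```

Each resulting string is an ordinary single-stage formula, so it should be possible to pass it to get_model_matrix without hitting the missing `X2_hat` factor.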

(cc @s3alfisc)
