How to include structural zeros? #152

@windisch

What's the preferred way to model structural zeros in a Formula?

Assume the following toy example: I have a $3\times 2$ contingency table that looks like this

|   | e | f |
|---|---|---|
| a | 1 | 0 |
| b | 2 | 3 |
| c | 4 | 0 |

given as a pandas dataframe as follows:

import pandas as pd

# Long format: one row per (F1, F2) cell, with the cell count in 'n'.
df = pd.DataFrame(
    data={
        'F1': ['a', 'a', 'b', 'b', 'c', 'c'],
        'F2': ['e', 'f', 'e', 'f', 'e', 'f'],
        'n': [1, 0, 2, 3, 4, 0],
    })
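
As a quick sanity check, the long-format frame can be pivoted back into the contingency table shown above. This is only a sketch using `DataFrame.pivot`; the variable name `table` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        'F1': ['a', 'a', 'b', 'b', 'c', 'c'],
        'F2': ['e', 'f', 'e', 'f', 'e', 'f'],
        'n': [1, 0, 2, 3, 4, 0],
    })

# Rows become the levels of F1, columns the levels of F2, values the counts.
table = df.pivot(index='F1', columns='F2', values='n')
print(table)
```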

The combinations $(a, f)$ and $(c, f)$ are structural zeros (i.e., it is impossible for these cells to hold non-zero values). Now, assume I want to fit the model `n ~ C(F1):C(F2)` on that data as follows:

from formulaic import Formula

y, X = Formula('n ~ C(F1):C(F2)').get_model_matrix(df, ensure_full_rank=False)

Then the corresponding variables `C(F1)[T.a]:C(F2)[T.f]` and `C(F1)[T.c]:C(F2)[T.f]` appear as columns of `X`. Is there a way to exclude these parameters directly in the formula? Is there another concept in formulaic for dealing with this type of constraint?
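
For comparison, here is a minimal sketch of the post-hoc workaround this question is trying to avoid: dropping the structural-zero columns (and the matching rows) after the matrix has been built. To keep it self-contained, it uses only pandas and builds a stand-in for `X` by hand, with column names copied from the ones quoted above and without an intercept column. The names `structural_zeros`, `cell_column`, and `drop_structural_zeros` are hypothetical helpers, not part of formulaic.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        'F1': ['a', 'a', 'b', 'b', 'c', 'c'],
        'F2': ['e', 'f', 'e', 'f', 'e', 'f'],
        'n': [1, 0, 2, 3, 4, 0],
    })

# Hypothetical description of the structural zeros as (F1 level, F2 level) pairs.
structural_zeros = [('a', 'f'), ('c', 'f')]


def cell_column(r, c):
    """Column name for cell (r, c), assuming the naming quoted in this issue."""
    return f'C(F1)[T.{r}]:C(F2)[T.{c}]'


# Stand-in for the interaction columns of X: one indicator column per cell.
X = pd.DataFrame({
    cell_column(r, c): ((df['F1'] == r) & (df['F2'] == c)).astype(int)
    for r in sorted(df['F1'].unique())
    for c in sorted(df['F2'].unique())
})


def drop_structural_zeros(X, df, cells):
    """Remove the columns and the observations belonging to the given cells."""
    keep_rows = pd.Series(
        [(r, c) not in cells for r, c in zip(df['F1'], df['F2'])],
        index=df.index,
    )
    columns = [cell_column(r, c) for r, c in cells]
    return X.loc[keep_rows].drop(columns=columns)


X_reduced = drop_structural_zeros(X, df, structural_zeros)
print(list(X_reduced.columns))
```

This works on the matrix rather than on the formula, so the constraint lives outside the model specification, which is exactly the limitation the question is about.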

Labels: enhancement, question