Closed
78 commits
e6a7587
Bypass FormulaParser
leostimpfle Dec 28, 2025
f94a814
Reverse order to match hard-coded targets
leostimpfle Dec 28, 2025
3118d18
Fix pre-commit
leostimpfle Dec 28, 2025
d0a8821
Freeze _MultipleEstimation
leostimpfle Dec 28, 2025
100c357
Sort independents by default for tests against fixest
leostimpfle Dec 28, 2025
f4b2ea0
Encode no fixed effects as None instead of '0'
leostimpfle Dec 28, 2025
2e93cbe
Fix if fixed effects are None
leostimpfle Dec 28, 2025
bf82eb6
Fix encoding for multiple estimation of fixed effects
leostimpfle Dec 28, 2025
a928a6b
Replace typing.Optional with union type
leostimpfle Dec 29, 2025
ce13140
Close #1117
leostimpfle Dec 29, 2025
c4d750a
Reorder checks to comply with test failures
leostimpfle Dec 29, 2025
8e8e5fe
Add new model matrix functionality
leostimpfle Dec 30, 2025
f75da04
Add singleton warning
leostimpfle Dec 30, 2025
761ea08
Various fixes (did2s and i()-syntax still failing)
leostimpfle Dec 30, 2025
d79f4e9
Fix pre-commit
leostimpfle Dec 30, 2025
c24f969
Retain nulls in fixed effect encoding
leostimpfle Dec 31, 2025
972eb66
Refactor fixest::i, closes #782, fixes #921, fixes #1109
leostimpfle Dec 31, 2025
415f5bc
Fix pre-commit
leostimpfle Dec 31, 2025
e23e7b2
Deal with log-related infinities
leostimpfle Jan 1, 2026
9219a81
Drop intercept after matrix construction for fixed effects
leostimpfle Jan 1, 2026
f3b7e67
Monkey patch formulaic
leostimpfle Jan 1, 2026
986d21d
Encode fixed effects only when non-numeric
leostimpfle Jan 1, 2026
be7aa93
Fix inference of reduced_rank
leostimpfle Jan 1, 2026
0e0402d
Use to_numpy
leostimpfle Jan 1, 2026
0e7facf
fix binning to keep values not specified in binning as is instead of NaN
s3alfisc Jan 1, 2026
31714ea
adjust tests for i-interaction
s3alfisc Jan 1, 2026
9aef7c6
Drop first level in factor-factor interaction
leostimpfle Jan 1, 2026
ab5695a
explain in docstrings why no fixed effects in formula first and secon…
s3alfisc Jan 1, 2026
b8ab4c5
Merge branch 'formula' of https://github.com/py-econometrics/pyfixest…
leostimpfle Jan 1, 2026
51142b0
Rewrite ModelMatrix
leostimpfle Jan 2, 2026
8a1ec74
Add documentation for new ModelMatrix, fix MyPy
leostimpfle Jan 2, 2026
e601780
Fix fixed effect encoding
leostimpfle Jan 2, 2026
d6d9943
fix circular import
s3alfisc Jan 2, 2026
6cde256
Update saturated with new i syntax
leostimpfle Jan 2, 2026
f8e575b
Fix pre-commit
leostimpfle Jan 2, 2026
1a736db
drop use of model_matrix_fixest in did2s & run tests against cached v…
s3alfisc Jan 2, 2026
f97e286
all did2s tests marked as pytest.against_r_core
s3alfisc Jan 2, 2026
2f878e9
add new formula functions to docs
s3alfisc Jan 2, 2026
318248f
add deprecation warning for model_matrix_fixest - remove it in future…
s3alfisc Jan 2, 2026
2e70f66
Fix linting and small clean-ups
leostimpfle Jan 2, 2026
af3f36a
deprecation warning for FormulaParser
s3alfisc Jan 2, 2026
9543547
move QuantregMulti from FormulaParser to parse()
s3alfisc Jan 2, 2026
5fa7e00
add unit tests for parser, similar to what exists for the legacy Fixe…
s3alfisc Jan 2, 2026
7b62030
add unit tests for parser, similar to what exists for the legacy Fixe…
s3alfisc Jan 2, 2026
148ba50
pacify mypy
s3alfisc Jan 2, 2026
5a69e10
Clean factor_interaction, add tests with null values
leostimpfle Jan 4, 2026
ec30160
Fix pre-commit
leostimpfle Jan 4, 2026
3395ce5
Improve docs and function/attribute names
leostimpfle Jan 4, 2026
5661b85
Merge branch 'master' into formula
leostimpfle Jan 4, 2026
e67810e
fix incorrect test expectation with IV and fixed effects
s3alfisc Jan 4, 2026
0b4de2d
fix incorrect ordering of fixed effect and IV part of formula
s3alfisc Jan 4, 2026
7065321
test for expected behavior of 0 fixed effects in formula syntax
s3alfisc Jan 4, 2026
aa093f6
clarification on overlap between independent, endogenous, instruments
s3alfisc Jan 4, 2026
292b496
clarifications on overlap of dependent, endogenous, instruments
s3alfisc Jan 4, 2026
a520f06
fix silent pass through of incorrect syntax of Y ~ X | f1 | f2 by cat…
s3alfisc Jan 4, 2026
4ce3c29
only one tilde in part 2 permitted (same motif as before)
s3alfisc Jan 4, 2026
532049b
is_multiple only checks dependent, independent, fixed effects for mul…
s3alfisc Jan 4, 2026
3704dd9
consolidate multiple estimation flag setting & checks
s3alfisc Jan 4, 2026
1ee80af
add examples to specifications
s3alfisc Jan 4, 2026
c21b0e9
Fix pre-commit
leostimpfle Jan 5, 2026
65da109
Remove sort
leostimpfle Jan 5, 2026
647ad27
Remove FORMULAIC_FEATURE_FLAG
leostimpfle Jan 5, 2026
5731196
Fix #1137
leostimpfle Jan 12, 2026
c501470
Simplify formula parsing (#1157)
leostimpfle Feb 2, 2026
1499dfa
Trigger CI
leostimpfle Feb 2, 2026
07474b2
Merge master
leostimpfle Feb 2, 2026
b5d2110
Fix pre-commit
leostimpfle Feb 2, 2026
335dda0
Fix fixed-effect encoding
leostimpfle Feb 2, 2026
efebc3a
Fix pre-commit
leostimpfle Feb 2, 2026
03c9df3
Remove obsolete reference in docs/_quarto.yml
leostimpfle Feb 2, 2026
17c0c0b
Fix encoding of fixed effects (missing -1)
leostimpfle Feb 2, 2026
aefc961
Fix ordering of endogenous and exogenous variables
leostimpfle Feb 2, 2026
bb9c559
Compare based on coefficient names rather than position
leostimpfle Feb 2, 2026
6e06c51
Fix did2s formula
leostimpfle Feb 2, 2026
e4cc517
Remove whitespace in fixest-style formula
leostimpfle Feb 2, 2026
d5bfaba
Retain input formatting of formula
leostimpfle Feb 2, 2026
da2f41f
Replace na_index_str with frozenset[int]
leostimpfle Feb 2, 2026
f25b4e8
Merge branch 'master' into formula
s3alfisc Feb 15, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -42,3 +42,5 @@ coverage.xml
 # pixi environments
 .pixi/*
 !.pixi/config.toml
+SKILL.md
+CLAUDE.md
7 changes: 7 additions & 0 deletions docs/_quarto.yml
@@ -119,6 +119,13 @@ quartodoc:
         - report.coefplot
         - report.iplot
         - did.visualize.panelview
+    - title: Formula Parsing & Model Matrix
+      desc: |
+        Internal APIs for formula parsing and model matrix construction
+      contents:
+        - estimation.formula.parse.Formula
+        - estimation.formula.model_matrix.ModelMatrix
+        - estimation.formula.factor_interaction.factor_interaction
     - title: Misc / Utilities
       desc: |
         PyFixest internals and utilities
6 changes: 6 additions & 0 deletions docs/_sidebar.yml
@@ -34,6 +34,12 @@ website:
       - reference/report.iplot.qmd
       - reference/did.visualize.panelview.qmd
       section: Summarize and Visualize
+    - contents:
+      - reference/estimation.formula.parse.Formula.qmd
+      - reference/estimation.formula.parse.parse.qmd
+      - reference/estimation.formula.model_matrix.ModelMatrix.qmd
+      - reference/estimation.formula.factor_interaction.factor_interaction.qmd
+      section: Formula Parsing & Model Matrix
     - contents:
       - reference/estimation.demean.qmd
       - reference/estimation.detect_singletons.qmd
2 changes: 1 addition & 1 deletion docs/quickstart.qmd
@@ -507,7 +507,7 @@ multi_fit.etable()
 You can access an individual model by its name - i.e. a formula - via the `all_fitted_models` attribute.

 ```{python}
-multi_fit.all_fitted_models["Y~X1"].tidy()
+multi_fit.all_fitted_models["Y ~ X1"].tidy()
 ```

 or equivalently via the `fetch_model` method:
52 changes: 30 additions & 22 deletions pyfixest/did/did2s.py
@@ -8,8 +8,8 @@
 from pyfixest.did.did import DID
 from pyfixest.estimation import feols
 from pyfixest.estimation.feols_ import Feols
-from pyfixest.estimation.FormulaParser import FixestFormulaParser
-from pyfixest.estimation.model_matrix_fixest_ import model_matrix_fixest
+from pyfixest.estimation.formula import model_matrix
+from pyfixest.estimation.formula.parse import Formula


 class DID2S(DID):
@@ -304,37 +304,48 @@ def _did2s_vcov(

     # some formula parsing to get the correct formula for the first and second stage model matrix
     first_stage_x, first_stage_fe = first_stage.split("|")
-    first_stage_fe_list = [f"C({i})" for i in first_stage_fe.split("+")]
+    first_stage_fe_list = [f"C({i.strip()})" for i in first_stage_fe.split("+")]
     first_stage_fe_fml = "+".join(first_stage_fe_list)
-    first_stage = f"{first_stage_x}+{first_stage_fe_fml}"
-
-    second_stage = f"{second_stage}"
+    first_stage_fml = f"{first_stage_x}+{first_stage_fe_fml}"

     # note for future Alex: intercept needs to be dropped! it is not as fixed
     # effects are converted to dummies, hence has_fixed checks are False

-    FML1 = FixestFormulaParser(f"{yname} {first_stage}")
-    FML2 = FixestFormulaParser(f"{yname} {second_stage}")
-    FixestFormulaDict1 = FML1.FixestFormulaDict
-    FixestFormulaDict2 = FML2.FixestFormulaDict
+    # Create Formula objects for the new model_matrix system.
+    # First stage: use `- 1` so that C() dummy encoding keeps all levels,
+    # matching the feols demeaning approach (which implicitly includes all
+    # fixed-effect levels). Removing `- 1` would cause formulaic to drop
+    # reference levels, changing the GMM vcov standard errors.
+    FML1 = Formula(
+        _second_stage=f"{yname} ~ {first_stage_fml.replace('~', '').strip()} - 1",
+    )
+    # Second stage: do NOT use `- 1`. Formulaic needs the intercept present
+    # for full-rank encoding (dropping a reference level for factors like
+    # i(treat)). The intercept column is then removed by drop_intercept=True
+    # below, matching what feols does in _did2s_estimate.
+    FML2 = Formula(
+        _second_stage=f"{yname} ~ {second_stage.replace('~', '').strip()}",
+    )

-    mm_dict_first_stage = model_matrix_fixest(
-        FixestFormula=next(iter(FixestFormulaDict1.values()))[0],
+    mm_first_stage = model_matrix.create_model_matrix(
+        formula=FML1,
         data=data,
         weights=None,
         drop_singletons=False,
-        drop_intercept=False,
+        ensure_full_rank=True,
+        drop_intercept=True,
     )
-    X1 = cast(pd.DataFrame, mm_dict_first_stage.get("X"))
+    X1 = mm_first_stage.independent

-    mm_second_stage = model_matrix_fixest(
-        FixestFormula=next(iter(FixestFormulaDict2.values()))[0],
+    mm_second_stage = model_matrix.create_model_matrix(
+        formula=FML2,
         data=data,
         weights=None,
         drop_singletons=False,
+        ensure_full_rank=True,
         drop_intercept=True,
-    ) # reference values not dropped, multicollinearity error
-    X2 = cast(pd.DataFrame, mm_second_stage.get("X"))
+    )
+    X2 = mm_second_stage.independent

     X1 = csr_matrix(X1.to_numpy() * weights_array[:, None])
     X2 = csr_matrix(X2.to_numpy() * weights_array[:, None])
@@ -359,10 +370,7 @@ def _did2s_vcov(
     X10 = X10.tocsr()
     X2 = X2.tocsr() # type: ignore

-    for (
-        _,
-        g,
-    ) in enumerate(clustid):
+    for _, g in enumerate(clustid):
         idx_g: np.ndarray = cluster_col.values == g
         X10g = X10[idx_g, :]
         X2g = X2[idx_g, :]
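
The first-stage string handling in `_did2s_vcov` above can be sketched in isolation. This is a minimal illustration of the string manipulation only: the helper name `build_first_stage_fml` and the sample formula are assumptions made for this example and are not part of pyfixest.

```python
def build_first_stage_fml(yname: str, first_stage: str) -> str:
    """Hypothetical helper mirroring the string handling shown in the diff."""
    # Split "~ covariates | fe1 + fe2" into its covariate and fixed-effect parts.
    first_stage_x, first_stage_fe = first_stage.split("|")
    # strip() avoids terms such as "C( unit )" when the input contains spaces.
    fe_terms = [f"C({fe.strip()})" for fe in first_stage_fe.split("+")]
    first_stage_fml = f"{first_stage_x}+{'+'.join(fe_terms)}"
    # Remove the leading tilde and append "- 1" so that no intercept is requested.
    return f"{yname} ~ {first_stage_fml.replace('~', '').strip()} - 1"


fml = build_first_stage_fml("Y", "~ 0 | unit + year")
print(fml)
```

Without the `.strip()` call the generated terms would carry the surrounding whitespace of the input, which is what the one-line change to `first_stage_fe_list` addresses.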
42 changes: 16 additions & 26 deletions pyfixest/did/saturated_twfe.py
@@ -203,15 +203,14 @@ def aggregate(
         treated_periods = list(period_set)

         df_agg = pd.DataFrame(
-            index=treated_periods,
+            index=pd.Index(treated_periods, name="period"),
             columns=["Estimate", "Std. Error", "t value", "Pr(>|t|)", "2.5%", "97.5%"],
         )
-        df_agg.index.name = "period"

         for period in treated_periods:
             R = np.zeros(len(coefs))
             for cohort in cohort_list:
-                cohort_pattern = rf"\[{re.escape(str(period))}\]:.*{re.escape(cohort)}$"
+                cohort_pattern = rf"^(?:.+)::{period}:(?:.+)::{cohort}$"
                 match_idx = [
                     i
                     for i, name in enumerate(coefnames)
@@ -319,28 +318,20 @@ def _saturated_event_study(
     unit_id: str,
     cluster: Optional[str] = None,
 ):
-    cohort_dummies = pd.get_dummies(
-        df.first_treated_period, drop_first=True, prefix="cohort_dummy"
+    ff = f"{outcome} ~ i(rel_time, first_treated_period, ref = -1.0, ref2=0.0) | {unit_id} + {time_id}"
+    m = feols(fml=ff, data=df, vcov={"CRV1": cluster}) # type: ignore
+    res = m.tidy().reset_index()
+    res = res.join(
+        res["Coefficient"].str.extract(
+            r".+::(?P<time>.+):.+::(?P<cohort>.+)", expand=True
+        )
     )
-    df_int = pd.concat([df, cohort_dummies], axis=1)
-
-    ff = f"""
-    {outcome} ~
-    {"+".join([f"i(rel_time, {x}, ref = -1.0)" for x in cohort_dummies.columns.tolist()])}
-    | {unit_id} + {time_id}
-    """
-    m = feols(fml=ff, data=df_int, vcov={"CRV1": cluster}) # type: ignore
-    res = m.tidy()
+    res["time"] = res["time"].astype(float)
     # create a dict with cohort specific effect curves
     res_cohort_eventtime_dict: dict[str, dict[str, pd.DataFrame | np.ndarray]] = {}
-    for cohort in cohort_dummies.columns:
-        res_cohort = res.filter(like=cohort, axis=0)
-        event_time = (
-            res_cohort.index.str.extract(r"\[(?:T\.)?(-?\d+(?:\.\d+)?)\]")
-            .astype(float)
-            .values.flatten()
-        )
-        res_cohort_eventtime_dict[cohort] = {"est": res_cohort, "time": event_time}
+    for cohort, res_cohort in res.groupby("cohort"):
+        event_time = res_cohort["time"].to_numpy()
+        res_cohort_eventtime_dict[str(cohort)] = {"est": res_cohort, "time": event_time}

     return m, res_cohort_eventtime_dict

@@ -366,11 +357,10 @@ def _test_treatment_heterogeneity(
     """
     mmres = model.tidy().reset_index()
     P = mmres.shape[0]
-    mmres[["time", "cohort"]] = mmres.Coefficient.str.split(":", expand=True)
-    mmres["time"] = mmres.time.str.extract(r"\[(?:T\.)?(-?\d+(?:\.\d+)?)\]").astype(
-        float
+    mmres[["time", "cohort"]] = mmres["Coefficient"].str.extract(
+        r".+::(?P<time>.+):.+::(?P<cohort>.+)", expand=True
     )
-    mmres["cohort"] = mmres.cohort.str.extract(r"(\d+)")
+    mmres["time"] = mmres["time"].astype(float)
     # indices of coefficients that are deviations from common event study coefs
     event_study_coefs = mmres.loc[~(mmres.cohort.isna()) & (mmres.time > 0)].index
     # Method 2 (K x P) - more efficient
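
The named-group pattern used in the `str.extract` calls above can be tried with the standard library alone. In this sketch the coefficient names are assumed examples of the `<variable>::<level>:<variable>::<level>` shape that the pattern implies; they were not taken from an actual model fit.

```python
import re

# Pattern copied from the diff above.
PATTERN = re.compile(r".+::(?P<time>.+):.+::(?P<cohort>.+)")

# Assumed coefficient names; "X1" stands in for a plain covariate.
names = [
    "rel_time::-2.0:first_treated_period::2005.0",
    "rel_time::3.0:first_treated_period::2010.0",
    "X1",
]

parsed = []
for name in names:
    match = PATTERN.search(name)
    if match is None:
        # Names without the interaction structure yield no time/cohort.
        continue
    parsed.append((float(match["time"]), match["cohort"]))

print(parsed)
```

Names that do not match are skipped here; in the diff, `str.extract` leaves missing values for such rows, which is what the later `mmres.cohort.isna()` filter relies on.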
5 changes: 5 additions & 0 deletions pyfixest/errors/__init__.py
@@ -58,6 +58,10 @@ class EmptyVcovError(Exception): # noqa: D101
     pass


+class FormulaSyntaxError(Exception): # noqa: D101
+    pass
+
+
 __all__ = [
     "CovariateInteractionError",
     "DepvarIsNotNumericError",
@@ -67,6 +71,7 @@ class EmptyVcovError(Exception): # noqa: D101
     "EndogVarsAsCovarsError",
     "FeatureDeprecationError",
     "FixedEffectInteractionError",
+    "FormulaSyntaxError",
     "InstrumentsAsCovarsError",
     "MatrixNotFullRankError",
     "NanInClusterVarError",
22 changes: 12 additions & 10 deletions pyfixest/estimation/FixestMulti_.py
@@ -12,7 +12,7 @@
 from pyfixest.estimation.feols_compressed_ import FeolsCompressed
 from pyfixest.estimation.fepois_ import Fepois
 from pyfixest.estimation.feprobit_ import Feprobit
-from pyfixest.estimation.FormulaParser import FixestFormulaParser
+from pyfixest.estimation.formula.parse import Formula
 from pyfixest.estimation.literals import (
     DemeanerBackendOptions,
     QuantregMethodOptions,
@@ -214,7 +214,6 @@ def _prepare_estimation(
         self._ssc_dict: dict[str, Union[str, bool]] = {}
         self._drop_singletons = False
         self._is_multiple_estimation = False
-        self._drop_intercept = False
         self._weights = weights
         self._has_weights = False
         if weights is not None:
@@ -225,16 +224,19 @@ def _prepare_estimation(
         self._quantile_tol = quantile_tol
         self._quantile_maxiter = quantile_maxiter

-        FML = FixestFormulaParser(fml)
-        FML.set_fixest_multi_flag()
+        formula_dictionary = Formula.parse_to_dict(fml)
         self._is_multiple_estimation = (
-            FML._is_multiple_estimation
+            sum(len(v) for v in formula_dictionary.values()) > 1
             or self._run_split
             or (isinstance(quantile, list) and len(quantile) > 1)
         )
-        self.FixestFormulaDict = FML.FixestFormulaDict
+        self.FixestFormulaDict = formula_dictionary
         self._method = estimation
-        self._is_iv = FML.is_iv
+        self._is_iv = any(
+            formula.first_stage is not None
+            for _, formulas in formula_dictionary.items()
+            for formula in formulas
+        )
         # self._fml_dict = fxst_fml.condensed_fml_dict
         # self._fml_dict_iv = fxst_fml.condensed_fml_dict_iv
         self._ssc_dict = ssc if ssc is not None else {}
@@ -299,9 +301,9 @@ def _estimate_all_models(
         for _, fval in enumerate(_fixef_keys):
             fixef_key_models = FixestFormulaDict.get(fval)

-            # dictionary to cache demeaned data with index: na_index_str,
+            # dictionary to cache demeaned data keyed by na_index,
             # only relevant for `.feols()`
-            lookup_demeaned_data: dict[str, pd.DataFrame] = {}
+            lookup_demeaned_data: dict[frozenset[int], pd.DataFrame] = {}

             for FixestFormula in fixef_key_models: # type: ignore
                 # loop over both dictfe and dictfe_iv (if the latter is not None)
@@ -430,7 +432,7 @@ def _estimate_all_models(
                 # if X is empty: no inference (empty X only as shorthand for demeaning)
                 if not FIT._X_is_empty:
                     # inference
-                    vcov_type = _get_vcov_type(vcov, fval)
+                    vcov_type = _get_vcov_type(vcov)
                     FIT.vcov(
                         vcov=vcov_type,
                         vcov_kwargs=vcov_kwargs,
                     )
9 changes: 9 additions & 0 deletions pyfixest/estimation/FormulaParser.py
@@ -1,4 +1,5 @@
 import re
+import warnings
 from itertools import product
 from typing import Optional, Union

@@ -41,6 +42,14 @@ def __init__(self, fml: str):
         None

         """
+        warnings.warn(
+            "FixestFormulaParser is deprecated and will be removed in a future version. "
+            "Use `pyfixest.estimation.formula.parse.parse()` instead. "
+            "See https://py-econometrics.github.io/pyfixest/reference/estimation.formula.parse.parse.html",
+            FutureWarning,
+            stacklevel=2,
+        )
+
         depvars, covars, fevars, endogvars, instruments = _deparse_fml(fml)

         # Parse all individual formula components that allow for
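
The deprecation pattern added to `__init__` above can be exercised with the standard library. `LegacyParser` is a hypothetical class used only to show how a `FutureWarning` raised with `stacklevel=2` can be observed; it does not reproduce `FixestFormulaParser`.

```python
import warnings


class LegacyParser:
    """Hypothetical legacy class, used only to demonstrate the warning pattern."""

    def __init__(self, fml: str):
        warnings.warn(
            "LegacyParser is deprecated and will be removed in a future version.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not to __init__
        )
        self.fml = fml


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    LegacyParser("Y ~ X1")

print([w.category.__name__ for w in caught])
```

With `stacklevel=2` the reported location is the line that constructs the object, which is usually the line a user has to change.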
16 changes: 8 additions & 8 deletions pyfixest/estimation/demean_.py
@@ -12,8 +12,8 @@ def demean_model(
     X: pd.DataFrame,
     fe: Optional[pd.DataFrame],
     weights: Optional[np.ndarray],
-    lookup_demeaned_data: dict[str, Any],
-    na_index_str: str,
+    lookup_demeaned_data: dict[frozenset[int], Any],
+    na_index: frozenset[int],
     fixef_tol: float,
     fixef_maxiter: int,
     demean_func: Callable,
@@ -42,9 +42,9 @@ def demean_model(
         A dictionary with keys for each fixed effects combination and potentially
         values of demeaned data frames. The function checks this dictionary to
         see if some of the variables have already been demeaned.
-    na_index_str : str
-        A string with indices of dropped columns. Used for caching of demeaned
-        variables.
+    na_index : frozenset[int]
+        A frozenset of indices of dropped rows. Used as a hashable cache key
+        for demeaned variables.
     fixef_tol: float
         The tolerance for the demeaning algorithm.
     fixef_maxiter: int
@@ -79,9 +79,9 @@ def demean_model(
     if fe is not None:
         fe_array = fe.to_numpy()
         # check if looked dict has data for na_index
-        if lookup_demeaned_data.get(na_index_str) is not None:
+        if lookup_demeaned_data.get(na_index) is not None:
             # get data out of lookup table: list of [algo, data]
-            value = lookup_demeaned_data.get(na_index_str)
+            value = lookup_demeaned_data.get(na_index)
             if value is not None:
                 try:
                     _, YX_demeaned_old = value
@@ -146,7 +146,7 @@ def demean_model(
         YX_demeaned = pd.DataFrame(YX_demeaned)
         YX_demeaned.columns = yx_names

-        lookup_demeaned_data[na_index_str] = [None, YX_demeaned]
+        lookup_demeaned_data[na_index] = [None, YX_demeaned]

     else:
         # nothing to demean here
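
The switch from a string key to a `frozenset[int]` key can be illustrated with a small cache. This is a simplified sketch: `demean_cached` is a hypothetical helper that subtracts a plain mean rather than running the fixed-effects demeaning algorithm, and it only shows why a frozenset of dropped row indices works as a dictionary key.

```python
cache: dict[frozenset[int], list[float]] = {}
demean_calls = 0


def demean_cached(values, na_index, cache):
    """Hypothetical helper: demean `values` once per set of dropped rows."""
    global demean_calls
    if na_index in cache:
        # Same set of dropped rows: reuse the earlier result.
        return cache[na_index]
    kept = [v for i, v in enumerate(values) if i not in na_index]
    mean = sum(kept) / len(kept)
    demean_calls += 1
    cache[na_index] = [v - mean for v in kept]
    return cache[na_index]


values = [1.0, 2.0, 3.0, 6.0]
first = demean_cached(values, frozenset({3}), cache)
second = demean_cached(values, frozenset([3]), cache)  # equal key, cache hit

print(first, demean_calls)
```

A frozenset is hashable and compares by content regardless of insertion order, so two models that drop the same rows share a cache entry without the indices having to be serialized into a string first.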
4 changes: 2 additions & 2 deletions pyfixest/estimation/fegaussian_.py
@@ -5,7 +5,7 @@
 import pandas as pd

 from pyfixest.estimation.feglm_ import Feglm
-from pyfixest.estimation.FormulaParser import FixestFormula
+from pyfixest.estimation.formula.parse import Formula as FixestFormula
 from pyfixest.estimation.literals import DemeanerBackendOptions


@@ -24,7 +24,7 @@ def __init__(
         collin_tol: float,
         fixef_tol: float,
         fixef_maxiter: int,
-        lookup_demeaned_data: dict[str, pd.DataFrame],
+        lookup_demeaned_data: dict[frozenset[int], pd.DataFrame],
         tol: float,
         maxiter: int,
         solver: Literal[
4 changes: 2 additions & 2 deletions pyfixest/estimation/feglm_.py
@@ -16,7 +16,7 @@
     _drop_multicollinear_variables,
 )
 from pyfixest.estimation.fepois_ import _check_for_separation
-from pyfixest.estimation.FormulaParser import FixestFormula
+from pyfixest.estimation.formula.parse import Formula as FixestFormula
 from pyfixest.estimation.literals import DemeanerBackendOptions
 from pyfixest.estimation.solvers import solve_ols
 from pyfixest.utils.dev_utils import DataFrameType
@@ -37,7 +37,7 @@ def __init__(
         collin_tol: float,
         fixef_tol: float,
         fixef_maxiter: int,
-        lookup_demeaned_data: dict[str, pd.DataFrame],
+        lookup_demeaned_data: dict[frozenset[int], pd.DataFrame],
         tol: float,
         maxiter: int,
         solver: Literal[