From 0cfea5c193ba70148a8f40fb616cbad62e70ca18 Mon Sep 17 00:00:00 2001
From: mdizadi <mdizadi@gmail.com>
Date: Fri, 6 Mar 2026 16:15:26 +0100
Subject: [PATCH 1/7] Modifying the doc file for IV.

---
 docs/tutorials/instrumental-variables.qmd | 375 ++++++++++++++++------
 pyfixest/utils/dgps.py                    | 191 +++++++++++
 2 files changed, 470 insertions(+), 96 deletions(-)

diff --git a/docs/tutorials/instrumental-variables.qmd b/docs/tutorials/instrumental-variables.qmd
index a43eb4d57..599c2c424 100644
--- a/docs/tutorials/instrumental-variables.qmd
+++ b/docs/tutorials/instrumental-variables.qmd
@@ -8,6 +8,11 @@ format:
 toc: true
 toc-title: "On this page"
 toc-location: left
+execute:
+  freeze: false
+bibliography: references.bib
+nocite: |
+  @wooldridge2010, @neal2020, @lundborg2017, @borusyak2022
 ---
 
 ::: {.callout-note}
@@ -17,20 +22,48 @@ You should have read the [Getting Started](../getting-started.qmd) page and have
 
 ## Introduction
 
-Ordinary Least Squares (OLS) delivers biased estimates when a regressor is correlated with the error term --- due to omitted variables, simultaneity, or measurement error. **Instrumental Variables (IV)** estimation solves this by finding a variable $Z$ that:
+Estimation of a linear model via Ordinary Least Squares (OLS) yields biased and inconsistent estimates when a regressor is correlated with the error term — a problem known as endogeneity, which arises, for example, in the presence of unobserved confounders. **Instrumental Variable (IV)** estimation addresses this by finding a variable $Z$ that satisfies three conditions:
+
+1. **Relevance**: $Z$ has a causal effect on $X$. 
+2. **Exclusion Ristriction**: $Z$ causal effect on $Y$ is fully mediated by $X$. 
+3. **Instrumental Unconfoundedness**: $Z$ has no unobserved common causes with $Y$.
+
+```{dot}
+//| fig-align: center
+//| fig-cap: "IV DAG: Z is an instrumental variable, U is an unobserved confounder of the effect of X on Y. The dashed node and arrows indicate that U is unobserved. The direct arrow from Y to X shows the relevance. There is no direct arrow from Z to Y, encoding the exclusion restriction. No direct dashed arrow from the unobserved variable to Z shows the instrumental unconfoundedness."
+digraph IV {
+    layout=neato;
+    node [shape=circle, fontname="serif", fixedsize=true, width=0.4, fontsize=13];
+    edge [arrowsize=0.6];
+
+    Z [label="Z", pos="-1.0,1.0!"];
+    U [label="U", style=dashed, pos="1.0,1.4!"];
+    X [label="X", pos="0,0!"];
+    Y [label="Y", pos="2,0!"];
+
+    Z -> X;
+    X -> Y;
+    U -> X [style=dashed];
+    U -> Y [style=dashed];
+}
+```
+
+`pyfixest` estimates the IV using two-stage least squares (2SLS) estimator where it first projects $X$ onto $Z$ (and all other exogenous variables) to obtain $\hat{X}$, then uses $\hat{X}$ to estimate the Average Treatment Effect (ATE) of $X$ on $Y$. The intuition is that by using $\hat{X}$ the dashed path from the unobserved varialbe is blocked or removed.
+
+When panel data are available, selection bias may also stem from time-invariant  unobserved heterogeneity — unit-specific characteristics (e.g., ability, culture, geography) that are fixed over time but correlated with both treatment and outcome.`pyfixest` addresses this simultaneously by applying a within-transformation (demeaning) to absorb unit fixed effects before running 2SLS, following the FE-IV approach described in @wooldridge2010 [Ch. 11]. Crucially, after demeaning, the 
+instrument must retain within-unit variation over time — time-invariant instruments are eliminated along with the fixed effects and cannot be used for identification. When both fixed effects and an instrument are specified, `pyfixest` therefore isolates 
+the clean variation in treatment that is both within-unit and driven by the instrument, blocking confounding from time-invariant unobservables and time-varying endogenous confounders simultaneously. 
 
-1. **Relevance**: $Z$ is correlated with the endogenous regressor $X$.
-2. **Exclusion**: $Z$ affects the outcome $Y$ *only through* $X$.
 
-The two-stage least squares (2SLS) estimator first regresses $X$ on $Z$ (and controls), then uses the predicted $\hat{X}$ in the outcome equation. In `pyfixest`, the IV syntax is:
+In `pyfixest`, the IV syntax is:
 
 ```
 Y ~ exogenous_controls | fixed_effects | endogenous ~ instrument
 ```
 
-This tutorial walks through three applications --- each highlighting a different reason researchers reach for IV.
+This tutorial walks through three applications, all addressing endogeneity from **selection bias — a form of Omitted Variable Bias (OVB) where unobserved confounders drive both selection into treatment and the outcome.** Applications 1 and 2 do so in observational settings — from individual-level selection and regional-level sorting respectively — while Application 3 arises in an experimental setting with imperfect compliance, where self-selection into treatment remains a problem despite randomized encouragement.
 
-```python
+```{python}
 import pyfixest as pf
 ```
 
@@ -39,22 +72,26 @@ import pyfixest as pf
 
 **Does having children reduce women's earnings?**
 
-A naive regression of earnings on fertility is biased: women with stronger career ambitions may both earn more *and* have fewer children. Since career ambition is positively correlated with earnings but negatively correlated with fertility, OLS *overstates* the motherhood penalty. Lundborg, Plug & Rasmussen (2017) exploit the quasi-random success of IVF treatment as an instrument for fertility among women who sought treatment.
+A naive regression of earnings on fertility is biased: women with stronger career ambitions may both earn more and have fewer children. Since career ambition is positively correlated with earnings but negatively correlated with fertility, OLS overstates the motherhood penalty. This is a classic case of OVB: career ambition is unobserved, yet it drives both fertility decisions and earnings outcomes. @lundborg2017 exploit the quasi-random success of IVF treatment as an instrument for fertility among women who sought treatment. 
+
+The relevance assumption holds because IVF treatment success creates exogenous variation in the likelihood of having children. The exclusion restriction is supported by the fact that observed working histories of successfully and unsuccessfully treated women are virtually identical before they seek IVF treatment — meaning IVF success is essentially random with respect to labor market potential.
+
+Instrumental unconfoundedness is plausible in this setting, as IVF success is largely determined by biological factors outside a woman's control, making it difficult to conceive of an unobserved variable that jointly drives both the success of treatment and labor market outcomes.
 
 ### Synthetic Data
 
-```python
+```{python}
 ivf_df = pf.get_ivf_data()
 ivf_df.head()
 ```
 
-
+The dataset has $N = 2{,}000$ observations. The **true causal effect** of `num_children` on `earnings` is $\beta = -0.15$ — this is what IV should recover (full DGP in the [Appendix](#appendix-dgp-design-notes)).
 
 ### Naive OLS
 
 Without accounting for endogeneity, OLS overstates the penalty because career ambition is an omitted variable that increases earnings while reducing fertility:
 
-```python
+```{python}
 fit_ols = pf.feols("earnings ~ num_children", data=ivf_df)
 fit_ols.summary()
 ```
@@ -65,9 +102,9 @@ fit_ols.summary()
 
 ### IV Estimation
 
-Using `ivf_success` as an instrument for `num_children`:
+A possible IV would be using `ivf_success` as an instrument for `num_children`:
 
-```python
+```{python}
 fit_iv = pf.feols("earnings ~ 1 | num_children ~ ivf_success", data=ivf_df)
 fit_iv.summary()
 ```
@@ -78,7 +115,7 @@ The IV estimate is closer to the true effect of -0.15 --- less negative than the
 
 ### Compare OLS and IV
 
-```python
+```{python}
 pf.etable(
     [fit_ols, fit_iv],
     labels={"earnings": "Earnings", "num_children": "Number of Children"},
@@ -90,22 +127,93 @@ pf.etable(
 
 ### First-Stage Diagnostics
 
-A strong first stage is essential for IV to work. We check the first-stage F-statistic and run `IV_Diag()`:
+The only IV diagnostic currently available in pyfixest is the IV weakness (relevance) test, which provides two statistics: f_stat and effective_f. A strong first stage is essential for IV to work. We check the first-stage F-statistic where critical values change depending on the error structure (IID vs. robust) and effective_f (Olea & Pflueger (2013) robust version) and run `IV_Diag()`:
 
-```python
+```{python}
 fit_iv.first_stage()
 ```
 
 
-```python
+```{python}
 fit_iv.IV_Diag()
 ```
 
 
-The first-stage F-statistic should be well above 10, confirming that IVF success is a strong predictor of fertility.
+The first-stage F-statistic should be well above 10 (under IID errors) or effective_f above 10 regardeless of error structure, confirming that IVF success is a strong predictor of fertility.
+
+
+## Application 2: Shift-Share (Bartik) Instruments
+
+**Does immigration affect local wages?**
+
+A long-standing question in labor economics. The challenge: regions that attract immigrants may also have booming labor markets, biasing OLS upward. The **shift-share** (Bartik) instrument, formalized by @borusyak2022, constructs predicted local immigration from:
+
+$$
+B_r = \sum_{k=1}^{K} s_{rk} \cdot g_k
+$$
+
+where $s_{rk}$ is region $r$'s historical share of immigrants from origin $k$, and $g_k$ is the national inflow from origin $k$. Because the instrument is constructed from *national* shocks interacted with *historical* shares, it is plausibly exogenous to current local labor demand.
+
+### Synthetic Data
+
+```{python}
+bartik_df = pf.get_bartik_data()
+bartik_df.head()
+```
+
+The dataset has $N = 300$ regions. The **true causal effect** of `immigration` on `wages` is $\beta = -0.3$ — this is what IV should recover (full DGP in the [Appendix](#appendix-dgp-design-notes)).
+
+### OLS vs IV
+
+```{python}
+# OLS: biased because local demand drives both immigration and wages
+fit_ols_b = pf.feols("wages ~ immigration + log_population", data=bartik_df)
+
+# IV: using the Bartik instrument
+fit_iv_b = pf.feols(
+    "wages ~ log_population | immigration ~ bartik_instrument",
+    data=bartik_df,
+)
+```
+
+
+```{python}
+pf.etable(
+    [fit_ols_b, fit_iv_b],
+    labels={
+        "wages": "Wages",
+        "immigration": "Immigration",
+        "log_population": "Log Population",
+    },
+    caption="Effect of Immigration on Wages: OLS vs Bartik IV",
+)
+```
+
+
+
+OLS attenuates the negative wage effect (or may even show a positive coefficient) because local demand is a positive confounder. The IV estimate is closer to the true effect of -0.3.
+
+### Diagnostics
 
+```{python}
+fit_iv_b.first_stage()
+```
 
-## Application 2: A/B Encouragement Design
+
+```{python}
+fit_iv_b.IV_Diag()
+```
+
+
+```{python}
+pf.coefplot([fit_ols_b, fit_iv_b], keep="immigration")
+```
+
+
+
+
+
+## Application 3: A/B Encouragement Design
 
 **Estimating the effect of feature adoption on revenue when users don't comply with treatment assignment.**
 
@@ -113,18 +221,18 @@ A tech company runs an A/B test: half of users are *encouraged* (shown a banner)
 
 ### Synthetic Data
 
-```python
+```{python}
 ab_df = pf.get_encouragement_data()
 ab_df.head()
 ```
 
-
+The dataset has $N = 4{,}000$ users. The **true LATE** of `adopted_feature` on `revenue` is $2.0$ — this is what IV should recover (full DGP in the [Appendix](#appendix-dgp-design-notes)).
 
 ### Three Estimands
 
 We estimate the **reduced form** (ITT), the **first stage**, and the **IV/LATE**:
 
-```python
+```{python}
 # Intent-to-treat (reduced form)
 fit_itt = pf.feols("revenue ~ assigned_treatment | user_type", data=ab_df)
 
@@ -140,7 +248,7 @@ The Wald estimator says: $\text{LATE} = \frac{\text{ITT}}{\text{First Stage}} =
 
 Let's verify:
 
-```python
+```{python}
 itt_coef = fit_itt.coef()["assigned_treatment"]
 fs_coef = fit_fs.coef()["assigned_treatment"]
 late_coef = fit_late.coef()["adopted_feature"]
@@ -155,7 +263,7 @@ print(f"IV/LATE coefficient:     {late_coef:.4f}")
 
 ### Compare All Three
 
-```python
+```{python}
 pf.etable(
     [fit_itt, fit_fs, fit_late],
     labels={
@@ -170,7 +278,7 @@ pf.etable(
 
 
 
-```python
+```{python}
 pf.coefplot([fit_itt, fit_late], keep="assigned_treatment|adopted_feature")
 ```
 
@@ -178,80 +286,11 @@ pf.coefplot([fit_itt, fit_late], keep="assigned_treatment|adopted_feature")
 
 ### IV Diagnostics
 
-```python
+```{python}
 fit_late.IV_Diag()
 ```
 
 
-## Application 3: Shift-Share (Bartik) Instruments
-
-**Does immigration affect local wages?**
-
-A long-standing question in labor economics. The challenge: regions that attract immigrants may also have booming labor markets, biasing OLS upward. The **shift-share** (Bartik) instrument, formalized by Borusyak, Hull & Jaravel (2022), constructs predicted local immigration from:
-
-$$
-B_r = \sum_{k=1}^{K} s_{rk} \cdot g_k
-$$
-
-where $s_{rk}$ is region $r$'s historical share of immigrants from origin $k$, and $g_k$ is the national inflow from origin $k$. Because the instrument is constructed from *national* shocks interacted with *historical* shares, it is plausibly exogenous to current local labor demand.
-
-### Synthetic Data
-
-```python
-bartik_df = pf.get_bartik_data()
-bartik_df.head()
-```
-
-
-
-### OLS vs IV
-
-```python
-# OLS: biased because local demand drives both immigration and wages
-fit_ols_b = pf.feols("wages ~ immigration + log_population", data=bartik_df)
-
-# IV: using the Bartik instrument
-fit_iv_b = pf.feols(
-    "wages ~ log_population | immigration ~ bartik_instrument",
-    data=bartik_df,
-)
-```
-
-
-```python
-pf.etable(
-    [fit_ols_b, fit_iv_b],
-    labels={
-        "wages": "Wages",
-        "immigration": "Immigration",
-        "log_population": "Log Population",
-    },
-    caption="Effect of Immigration on Wages: OLS vs Bartik IV",
-)
-```
-
-
-
-OLS attenuates the negative wage effect (or may even show a positive coefficient) because local demand is a positive confounder. The IV estimate is closer to the true effect of -0.3.
-
-### Diagnostics
-
-```python
-fit_iv_b.first_stage()
-```
-
-
-```python
-fit_iv_b.IV_Diag()
-```
-
-
-```python
-pf.coefplot([fit_ols_b, fit_iv_b], keep="immigration")
-```
-
-
-
 ## IV Diagnostics in PyFixest
 
 Weak instruments --- instruments that are only loosely correlated with the endogenous variable --- lead to biased and unreliable IV estimates. PyFixest provides two key diagnostic tools to detect this problem.
@@ -262,7 +301,7 @@ The `.first_stage()` method re-estimates the first-stage regression and computes
 
 The F-statistic adapts to your variance-covariance specification: if you fit the IV model with heteroskedasticity-robust or cluster-robust standard errors, the first-stage F-statistic is computed accordingly.
 
-```python
+```{python}
 # Re-use the motherhood penalty IV model
 fit_iv.first_stage()
 
@@ -285,7 +324,7 @@ where $\hat{\pi}$ are the first-stage coefficients on the excluded instruments,
 
 The `.IV_Diag()` method computes both the standard F-statistic and the effective F-statistic in one call:
 
-```python
+```{python}
 fit_iv.IV_Diag()
 
 print(f"Standard F-statistic:  {fit_iv._f_stat_1st_stage:.1f}")
@@ -308,3 +347,147 @@ print(f"Effective F-statistic: {fit_iv._eff_F:.1f}")
 - [Regression Tables](regression-tables.qmd) --- customize publication-ready output tables.
 - [`Feiv` API Reference](../reference/estimation.models.feiv_.Feiv.qmd) --- full documentation of the IV estimator class.
 :::
+
+## Appendix: DGP Design Notes
+
+This appendix documents the design rationale, parameter choices, and verification checks for the three synthetic datasets used in this tutorial. All DGPs live in `pyfixest/utils/dgps.py`.
+
+### General Principle {.unlisted}
+
+Every DGP encodes a **true causal effect** as a named constant and introduces an **unobserved confounder** $U$ that creates OVB. The bias follows the formula:
+
+$$
+\text{plim}(\hat{\beta}_{\text{OLS}}) = \beta_{\text{true}} + \underbrace{\frac{\text{Cov}(X,\, U)}{\text{Var}(X)} \cdot \gamma_U}_{\text{OVB}}
+$$
+
+where $\gamma_U$ is the effect of $U$ on $Y$. IV recovers $\beta_{\text{true}}$ because the instrument $Z$ is uncorrelated with $U$ (instrumental unconfoundedness), so $\text{Cov}(Z, U) = 0$.
+
+---
+
+### App 1 — `get_ivf_data(N=2000, seed=1234)` {.unlisted}
+
+**Story:** Motherhood penalty. Career ambition is the unobserved confounder.
+**Instrument:** IVF treatment success — quasi-random conditional on seeking treatment.
+
+#### Structural equations
+
+| Variable | Equation |
+|---|---|
+| `career_ambition` ($U$) | $\sim \mathcal{N}(0,\,1)$ — unobserved |
+| `ivf_success` ($Z$) | $\sim \text{Bernoulli}(0.45)$ — instrument, $\perp U$ |
+| `num_children` ($X$) | $= 1.2 - 0.4\cdot U + 0.8\cdot Z + \mathcal{N}(0,\,0.5)$ |
+| `earnings` ($Y$) | $= 10 + 0.6\cdot U - 0.15\cdot X + \mathcal{N}(0,\,1)$ |
+
+#### Named parameters
+
+| Parameter | Value | Role |
+|---|---|---|
+| `true_effect` | $-0.15$ | Structural coefficient: children $\to$ earnings |
+| `ambition_on_children` | $-0.4$ | Confounder $\to$ $X$ (negative: ambition reduces fertility) |
+| `ambition_on_earnings` | $0.6$ | Confounder $\to$ $Y$ (positive: ambition raises earnings) |
+| `ivf_on_children` | $0.8$ | Instrument $\to$ $X$ (first-stage strength) |
+
+#### OVB derivation
+
+$$
+\text{Cov}(\text{num\_children},\, U) \approx -0.4 \cdot \text{Var}(U) = -0.4
+$$
+$$
+\text{Var}(\text{num\_children}) \approx (-0.4)^2 + (0.8)^2 \cdot 0.45 \cdot 0.55 + 0.25 \approx 0.57
+$$
+$$
+\text{OVB} = 0.6 \cdot \frac{-0.4}{0.57} \approx -0.42
+\qquad \Rightarrow \qquad
+\hat\beta_{\text{OLS}} \approx -0.15 + (-0.42) \approx -0.57
+$$
+
+OLS overstates the penalty roughly 4×. IV recovers $\approx -0.15$.
+
+#### Verification checks
+- First-stage coefficient on `ivf_success` $\approx 0.8$; $F \gg 10$.
+- OLS coefficient on `num_children` $\approx -0.57$ (more negative than true).
+- IV coefficient on `num_children` $\approx -0.15$.
+
+---
+
+### App 2 — `get_bartik_data(N=300, seed=1234)` {.unlisted}
+
+**Story:** Immigration and local wages. Local demand shock is the unobserved confounder.
+**Instrument:** Bartik shift-share — national inflow shocks interacted with historical settlement shares.
+
+#### Structural equations
+
+| Variable | Equation |
+|---|---|
+| `local_demand` ($U$) | $\sim \mathcal{N}(0,\,1)$ — unobserved |
+| `bartik_instrument` ($Z$) | $\sim \mathcal{N}(0,\,1)$ — instrument, $\perp U$ |
+| `log_population` | $= 2 + 0.1\cdot U + \mathcal{N}(0,\,0.3)$ — control |
+| `immigration` ($X$) | $= 0.5 + 0.7\cdot Z + 0.9\cdot U + \mathcal{N}(0,\,0.5)$ |
+| `wages` ($Y$) | $= 8 + 0.5\cdot U - 0.3\cdot X + 0.2\cdot\texttt{log\_pop} + \mathcal{N}(0,\,1)$ |
+
+#### Named parameters
+
+| Parameter | Value | Role |
+|---|---|---|
+| `true_effect` | $-0.3$ | Structural coefficient: immigration $\to$ wages |
+| `demand_on_immig` | $0.9$ | Confounder $\to$ $X$ (positive: booming regions attract immigrants) |
+| `demand_on_wages` | $0.5$ | Confounder $\to$ $Y$ (positive: booming regions pay more) |
+| `bartik_on_immig` | $0.7$ | Instrument $\to$ $X$ (first-stage strength) |
+
+#### OVB derivation
+
+Both `demand_on_immig` and `demand_on_wages` are positive, so $\text{Cov}(\text{immigration},\, U \mid \text{log\_pop}) > 0$ and $\gamma_U = 0.5 > 0$, giving a **positive OVB**:
+
+$$
+\text{OVB} = 0.5 \cdot \frac{0.9}{\text{Var}(\text{immigration} \mid \text{log\_pop})} > 0
+\qquad \Rightarrow \qquad
+\hat\beta_{\text{OLS}} \approx -0.3 + \text{positive} > -0.3
+$$
+
+OLS attenuates (or reverses) the negative wage effect. IV recovers $\approx -0.3$.
+
+#### Verification checks
+- First-stage coefficient on `bartik_instrument` $\approx 0.7$; $F \gg 10$ even at $N = 300$.
+- OLS coefficient on `immigration` less negative than $-0.3$ (or positive).
+- IV coefficient on `immigration` $\approx -0.3$.
+
+---
+
+### App 3 — `get_encouragement_data(N=4000, seed=1234)` {.unlisted}
+
+**Story:** A/B encouragement design with imperfect compliance. No unobserved confounder — endogeneity arises from self-selection into adoption.
+**Instrument:** `assigned_treatment` — fully randomized.
+**Fixed effect:** `user_type` $\in \{0, 1, 2\}$ — absorbed in all three models.
+
+#### Structural equations
+
+| Variable | Equation |
+|---|---|
+| `user_type` | $\sim \text{Uniform}\{0,\,1,\,2\}$ — fixed effect, $\mu = \{0{\to}0,\;1{\to}1,\;2{\to}{-0.5}\}$ |
+| `assigned_treatment` ($Z$) | $\sim \text{Bernoulli}(0.5)$ — randomized instrument |
+| `adopted_feature` ($D$) | $\sim \text{Bernoulli}(p)$, $p = 0.70$ if $Z=1$, else $0.15$ |
+| `revenue` ($Y$) | $= 5 + \mu_{\texttt{user\_type}} + 2.0\cdot D + \mathcal{N}(0,\,1)$ |
+
+#### Named parameters
+
+| Parameter | Value | Role |
+|---|---|---|
+| `true_late` | $2.0$ | LATE: revenue effect of adoption for compliers |
+| `p_adopt_encouraged` | $0.70$ | $\Pr(\text{adopt} \mid Z=1)$ |
+| `p_adopt_control` | $0.15$ | $\Pr(\text{adopt} \mid Z=0)$ |
+
+#### Wald identity (exact by construction)
+
+$$
+\text{First Stage} = 0.70 - 0.15 = 0.55
+\qquad
+\text{ITT} = 2.0 \times 0.55 = 1.10
+\qquad
+\text{LATE} = \frac{1.10}{0.55} = 2.0 \checkmark
+$$
+
+#### Verification checks
+- `fit_fs.coef()["assigned_treatment"]` $\approx 0.55$.
+- `fit_itt.coef()["assigned_treatment"]` $\approx 1.10$.
+- `fit_late.coef()["adopted_feature"]` $\approx 2.0$.
+- Wald ratio `ITT / FS` $\approx 2.0$ (holds exactly in population; negligible sampling noise at $N=4000$).
diff --git a/pyfixest/utils/dgps.py b/pyfixest/utils/dgps.py
index 22d10d212..2c9a9421e 100644
--- a/pyfixest/utils/dgps.py
+++ b/pyfixest/utils/dgps.py
@@ -2,6 +2,197 @@
 import pandas as pd
 
 
+def get_ivf_data(N=2000, seed=1234):
+    """
+    Synthetic data for the motherhood penalty IV application (IVF instrument).
+
+    DGP
+    ---
+    Unobserved confounder: career_ambition ~ N(0, 1)
+
+    First stage (num_children on ivf_success):
+        num_children = 1.2 - 0.4*career_ambition + 0.8*ivf_success + N(0, 0.5)
+        → ivf_success is relevant (first-stage coefficient ≈ 0.8, F >> 10)
+
+    Outcome (structural equation):
+        earnings = 10 + 0.6*career_ambition + TRUE_EFFECT*num_children + N(0, 1)
+        TRUE_EFFECT = -0.15
+
+    OVB formula for naive OLS (earnings ~ num_children):
+        bias ≈ γ_ambition * Cov(num_children, ambition) / Var(num_children)
+             ≈ 0.6 * (-0.4) / 0.57 ≈ -0.42
+        β_OLS ≈ -0.15 + (-0.42) ≈ -0.57   (overstates the penalty)
+        β_IV  ≈ -0.15                        (recovers the true effect)
+
+    Parameters
+    ----------
+    N : int, optional
+        Number of observations. Default is 2000.
+    seed : int, optional
+        Random seed. Default is 1234.
+
+    Returns
+    -------
+    pandas.DataFrame
+        Columns: ``earnings``, ``num_children``, ``ivf_success``.
+    """
+    # --- DGP parameters ---
+    true_effect = -0.15          # causal effect of num_children on earnings
+    ambition_on_children = -0.4  # confounder → endogenous var (creates OVB)
+    ambition_on_earnings = 0.6   # confounder → outcome (creates OVB)
+    ivf_on_children = 0.8        # first-stage strength (instrument → endogenous)
+
+    rng = np.random.default_rng(seed)
+    career_ambition = rng.normal(0, 1, N)
+    ivf_success = rng.binomial(1, 0.45, N)
+    num_children = np.clip(
+        1.2
+        + ambition_on_children * career_ambition
+        + ivf_on_children * ivf_success
+        + rng.normal(0, 0.5, N),
+        0,
+        None,
+    )
+    earnings = (
+        10
+        + ambition_on_earnings * career_ambition
+        + true_effect * num_children
+        + rng.normal(0, 1, N)
+    )
+    return pd.DataFrame(
+        {
+            "earnings": earnings,
+            "num_children": num_children,
+            "ivf_success": ivf_success,
+        }
+    )
+
+
+def get_bartik_data(N=300, seed=1234):
+    """
+    Synthetic data for a Bartik (shift-share) IV application on immigration and wages.
+
+    DGP
+    ---
+    Unobserved confounder: local_demand ~ N(0, 1)
+
+    First stage (immigration on bartik_instrument, conditional on log_population):
+        immigration = 0.5 + 0.7*bartik_instrument + 0.9*local_demand + N(0, 0.5)
+        → bartik_instrument is relevant; bartik ⊥ local_demand (exogenous)
+
+    Outcome (structural equation):
+        wages = 8 + 0.5*local_demand + TRUE_EFFECT*immigration + 0.2*log_population + N(0, 1)
+        TRUE_EFFECT = -0.3
+
+    OVB for naive OLS (wages ~ immigration + log_population):
+        Partial bias from local_demand ≈ 0.5 * 0.9/Var(immigration|log_pop) > 0
+        β_OLS on immigration ≈ -0.3 + positive_bias → attenuated (less negative or positive)
+        β_IV  on immigration ≈ -0.3  (recovers the true effect)
+
+    Parameters
+    ----------
+    N : int, optional
+        Number of observations (regions). Default is 300.
+    seed : int, optional
+        Random seed. Default is 1234.
+
+    Returns
+    -------
+    pandas.DataFrame
+        Columns: ``wages``, ``immigration``, ``log_population``, ``bartik_instrument``.
+    """
+    # --- DGP parameters ---
+    true_effect = -0.3           # causal effect of immigration on wages
+    demand_on_immig = 0.9        # confounder → endogenous var
+    demand_on_wages = 0.5        # confounder → outcome (creates positive OVB)
+    bartik_on_immig = 0.7        # first-stage strength
+
+    rng = np.random.default_rng(seed)
+    local_demand = rng.normal(0, 1, N)
+    bartik_instrument = rng.normal(0, 1, N)
+    log_population = 2 + 0.1 * local_demand + rng.normal(0, 0.3, N)
+    immigration = (
+        0.5
+        + bartik_on_immig * bartik_instrument
+        + demand_on_immig * local_demand
+        + rng.normal(0, 0.5, N)
+    )
+    wages = (
+        8
+        + demand_on_wages * local_demand
+        + true_effect * immigration
+        + 0.2 * log_population
+        + rng.normal(0, 1, N)
+    )
+    return pd.DataFrame(
+        {
+            "wages": wages,
+            "immigration": immigration,
+            "log_population": log_population,
+            "bartik_instrument": bartik_instrument,
+        }
+    )
+
+
+def get_encouragement_data(N=4000, seed=1234):
+    """
+    Synthetic data for an A/B encouragement design IV application.
+
+    DGP
+    ---
+    Instrument: assigned_treatment ~ Bernoulli(0.5)  [randomized, exogenous]
+    Fixed effect: user_type ∈ {0, 1, 2}
+
+    First stage (compliance):
+        P(adopt | encouraged)     = 0.70  (compliers + always-takers)
+        P(adopt | not encouraged) = 0.15  (always-takers only)
+        First-stage coefficient   = 0.70 - 0.15 = 0.55
+
+    Outcome (structural equation):
+        revenue = 5 + user_type_FE + TRUE_LATE*adopted_feature + N(0, 1)
+        TRUE_LATE = 2.0  (effect on compliers)
+
+    Wald identity (exact by construction):
+        ITT  = E[Y|Z=1] - E[Y|Z=0] = 2.0 * 0.55 = 1.10
+        LATE = ITT / first_stage   = 1.10 / 0.55 = 2.0  ✓
+
+    Parameters
+    ----------
+    N : int, optional
+        Number of observations (users). Default is 4000.
+    seed : int, optional
+        Random seed. Default is 1234.
+
+    Returns
+    -------
+    pandas.DataFrame
+        Columns: ``revenue``, ``assigned_treatment``, ``adopted_feature``, ``user_type``.
+    """
+    # --- DGP parameters ---
+    true_late = 2.0              # LATE: causal effect of adoption on revenue for compliers
+    p_adopt_encouraged = 0.70    # P(adopt | Z=1): compliers + always-takers
+    p_adopt_control = 0.15       # P(adopt | Z=0): always-takers only
+    # first_stage = p_adopt_encouraged - p_adopt_control = 0.55
+    # ITT         = true_late * first_stage              = 1.10
+    # LATE        = ITT / first_stage                    = 2.0
+
+    rng = np.random.default_rng(seed)
+    user_type = rng.choice([0, 1, 2], size=N)
+    user_type_effect = np.array([0.0, 1.0, -0.5])[user_type]
+    assigned_treatment = rng.binomial(1, 0.5, N)
+    p_adopt = np.where(assigned_treatment == 1, p_adopt_encouraged, p_adopt_control)
+    adopted_feature = rng.binomial(1, p_adopt, N)
+    revenue = 5 + user_type_effect + true_late * adopted_feature + rng.normal(0, 1, N)
+    return pd.DataFrame(
+        {
+            "revenue": revenue,
+            "assigned_treatment": assigned_treatment,
+            "adopted_feature": adopted_feature,
+            "user_type": pd.Categorical(user_type),
+        }
+    )
+
+
 def get_blw():
     """DGP for effect heterogeneity in panel data from Baker, Larcker, and Wang (2022)."""
     n = np.arange(1, 31)

From 63f40ebcbe23350f470744d925c20a3eaf288a0e Mon Sep 17 00:00:00 2001
From: mdizadi <mdizadi@gmail.com>
Date: Fri, 6 Mar 2026 16:32:17 +0100
Subject: [PATCH 2/7] adding the reference.bib

---
 docs/tutorials/references.bib | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 docs/tutorials/references.bib

diff --git a/docs/tutorials/references.bib b/docs/tutorials/references.bib
new file mode 100644
index 000000000..fda739e2d
--- /dev/null
+++ b/docs/tutorials/references.bib
@@ -0,0 +1,35 @@
+@book{wooldridge2010,
+  title     = {Econometric Analysis of Cross Section and Panel Data},
+  author    = {Wooldridge, Jeffrey M.},
+  year      = {2010},
+  edition   = {2nd},
+  publisher = {MIT Press}
+}
+
+@misc{neal2020,
+  title        = {Introduction to Causal Inference from a Machine Learning Perspective},
+  author       = {Neal, Brady},
+  year         = {2020},
+  howpublished = {\url{https://www.bradyneal.com/causal-inference-course}},
+  note         = {Chapter 19: Instrumental Variables}
+}
+
+@article{lundborg2017,
+  title   = {Can Women Have Children and a Career? {IV} Evidence from {IVF} Treatments},
+  author  = {Lundborg, Petter and Plug, Erik and Rasmussen, Astrid W{\"{u}}rtz},
+  journal = {American Economic Review},
+  volume  = {107},
+  number  = {6},
+  pages   = {1611--1637},
+  year    = {2017}
+}
+
+@article{borusyak2022,
+  title   = {Quasi-Experimental Shift-Share Research Designs},
+  author  = {Borusyak, Kirill and Hull, Peter and Jaravel, Xavier},
+  journal = {Review of Economic Studies},
+  volume  = {89},
+  number  = {1},
+  pages   = {181--213},
+  year    = {2022}
+}
\ No newline at end of file

From 346f84a93c96874c9ecf8219b044cfc28d837077 Mon Sep 17 00:00:00 2001
From: mdizadi <mdizadi@gmail.com>
Date: Fri, 6 Mar 2026 17:17:25 +0100
Subject: [PATCH 3/7] chore: commit __init__ files

---
 pyfixest/__init__.py       | 6 ++++++
 pyfixest/utils/__init__.py | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/pyfixest/__init__.py b/pyfixest/__init__.py
index b1cc65a8b..aa30f9347 100644
--- a/pyfixest/__init__.py
+++ b/pyfixest/__init__.py
@@ -21,7 +21,10 @@
     "feglm",
     "feols",
     "fepois",
+    "get_bartik_data",
     "get_data",
+    "get_encouragement_data",
+    "get_ivf_data",
     "get_motherhood_event_study_data",
     "get_ssc",
     "get_twin_data",
@@ -69,7 +72,10 @@
     "qplot": "pyfixest.report",
     "make_table": "pyfixest.report",
     # utils
+    "get_bartik_data": "pyfixest.utils",
     "get_data": "pyfixest.utils",
+    "get_encouragement_data": "pyfixest.utils",
+    "get_ivf_data": "pyfixest.utils",
     "get_motherhood_event_study_data": "pyfixest.utils",
     "get_twin_data": "pyfixest.utils",
     "get_worker_panel": "pyfixest.utils",
diff --git a/pyfixest/utils/__init__.py b/pyfixest/utils/__init__.py
index 43c37fdac..d09f4871c 100644
--- a/pyfixest/utils/__init__.py
+++ b/pyfixest/utils/__init__.py
@@ -1,4 +1,7 @@
 from pyfixest.utils.dgps import (
+    get_bartik_data,
+    get_encouragement_data,
+    get_ivf_data,
     get_motherhood_event_study_data,
     get_twin_data,
     get_worker_panel,
@@ -10,7 +13,10 @@
 )
 
 __all__ = [
+    "get_bartik_data",
     "get_data",
+    "get_encouragement_data",
+    "get_ivf_data",
     "get_motherhood_event_study_data",
     "get_ssc",
     "get_twin_data",

From 60da724346f04a40e53c2566f7190344299ce82d Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 6 Mar 2026 16:41:14 +0000
Subject: [PATCH 4/7] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 docs/tutorials/instrumental-variables.qmd | 12 ++++++------
 docs/tutorials/references.bib             |  2 +-
 pyfixest/utils/dgps.py                    | 20 ++++++++++----------
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/docs/tutorials/instrumental-variables.qmd b/docs/tutorials/instrumental-variables.qmd
index 599c2c424..a60ce8c71 100644
--- a/docs/tutorials/instrumental-variables.qmd
+++ b/docs/tutorials/instrumental-variables.qmd
@@ -24,8 +24,8 @@ You should have read the [Getting Started](../getting-started.qmd) page and have
 
 Estimation of a linear model via Ordinary Least Squares (OLS) yields biased and inconsistent estimates when a regressor is correlated with the error term — a problem known as endogeneity, which arises, for example, in the presence of unobserved confounders. **Instrumental Variable (IV)** estimation addresses this by finding a variable $Z$ that satisfies three conditions:
 
-1. **Relevance**: $Z$ has a causal effect on $X$. 
-2. **Exclusion Ristriction**: $Z$ causal effect on $Y$ is fully mediated by $X$. 
+1. **Relevance**: $Z$ has a causal effect on $X$.
+2. **Exclusion Ristriction**: $Z$ causal effect on $Y$ is fully mediated by $X$.
 3. **Instrumental Unconfoundedness**: $Z$ has no unobserved common causes with $Y$.
 
 ```{dot}
@@ -50,9 +50,9 @@ digraph IV {
 
 `pyfixest` estimates the IV using two-stage least squares (2SLS) estimator where it first projects $X$ onto $Z$ (and all other exogenous variables) to obtain $\hat{X}$, then uses $\hat{X}$ to estimate the Average Treatment Effect (ATE) of $X$ on $Y$. The intuition is that by using $\hat{X}$ the dashed path from the unobserved varialbe is blocked or removed.
 
-When panel data are available, selection bias may also stem from time-invariant  unobserved heterogeneity — unit-specific characteristics (e.g., ability, culture, geography) that are fixed over time but correlated with both treatment and outcome.`pyfixest` addresses this simultaneously by applying a within-transformation (demeaning) to absorb unit fixed effects before running 2SLS, following the FE-IV approach described in @wooldridge2010 [Ch. 11]. Crucially, after demeaning, the 
-instrument must retain within-unit variation over time — time-invariant instruments are eliminated along with the fixed effects and cannot be used for identification. When both fixed effects and an instrument are specified, `pyfixest` therefore isolates 
-the clean variation in treatment that is both within-unit and driven by the instrument, blocking confounding from time-invariant unobservables and time-varying endogenous confounders simultaneously. 
+When panel data are available, selection bias may also stem from time-invariant  unobserved heterogeneity — unit-specific characteristics (e.g., ability, culture, geography) that are fixed over time but correlated with both treatment and outcome.`pyfixest` addresses this simultaneously by applying a within-transformation (demeaning) to absorb unit fixed effects before running 2SLS, following the FE-IV approach described in @wooldridge2010 [Ch. 11]. Crucially, after demeaning, the
+instrument must retain within-unit variation over time — time-invariant instruments are eliminated along with the fixed effects and cannot be used for identification. When both fixed effects and an instrument are specified, `pyfixest` therefore isolates
+the clean variation in treatment that is both within-unit and driven by the instrument, blocking confounding from time-invariant unobservables and time-varying endogenous confounders simultaneously.
 
 
 In `pyfixest`, the IV syntax is:
@@ -72,7 +72,7 @@ import pyfixest as pf
 
 **Does having children reduce women's earnings?**
 
-A naive regression of earnings on fertility is biased: women with stronger career ambitions may both earn more and have fewer children. Since career ambition is positively correlated with earnings but negatively correlated with fertility, OLS overstates the motherhood penalty. This is a classic case of OVB: career ambition is unobserved, yet it drives both fertility decisions and earnings outcomes. @lundborg2017 exploit the quasi-random success of IVF treatment as an instrument for fertility among women who sought treatment. 
+A naive regression of earnings on fertility is biased: women with stronger career ambitions may both earn more and have fewer children. Since career ambition is positively correlated with earnings but negatively correlated with fertility, OLS overstates the motherhood penalty. This is a classic case of OVB: career ambition is unobserved, yet it drives both fertility decisions and earnings outcomes. @lundborg2017 exploit the quasi-random success of IVF treatment as an instrument for fertility among women who sought treatment.
 
 The relevance assumption holds because IVF treatment success creates exogenous variation in the likelihood of having children. The exclusion restriction is supported by the fact that observed working histories of successfully and unsuccessfully treated women are virtually identical before they seek IVF treatment — meaning IVF success is essentially random with respect to labor market potential.
 
diff --git a/docs/tutorials/references.bib b/docs/tutorials/references.bib
index fda739e2d..b4d9864fc 100644
--- a/docs/tutorials/references.bib
+++ b/docs/tutorials/references.bib
@@ -32,4 +32,4 @@ @article{borusyak2022
   number  = {1},
   pages   = {181--213},
   year    = {2022}
-}
\ No newline at end of file
+}
diff --git a/pyfixest/utils/dgps.py b/pyfixest/utils/dgps.py
index 2c9a9421e..f37c7462c 100644
--- a/pyfixest/utils/dgps.py
+++ b/pyfixest/utils/dgps.py
@@ -37,10 +37,10 @@ def get_ivf_data(N=2000, seed=1234):
         Columns: ``earnings``, ``num_children``, ``ivf_success``.
     """
     # --- DGP parameters ---
-    true_effect = -0.15          # causal effect of num_children on earnings
+    true_effect = -0.15  # causal effect of num_children on earnings
     ambition_on_children = -0.4  # confounder → endogenous var (creates OVB)
-    ambition_on_earnings = 0.6   # confounder → outcome (creates OVB)
-    ivf_on_children = 0.8        # first-stage strength (instrument → endogenous)
+    ambition_on_earnings = 0.6  # confounder → outcome (creates OVB)
+    ivf_on_children = 0.8  # first-stage strength (instrument → endogenous)
 
     rng = np.random.default_rng(seed)
     career_ambition = rng.normal(0, 1, N)
@@ -102,10 +102,10 @@ def get_bartik_data(N=300, seed=1234):
         Columns: ``wages``, ``immigration``, ``log_population``, ``bartik_instrument``.
     """
     # --- DGP parameters ---
-    true_effect = -0.3           # causal effect of immigration on wages
-    demand_on_immig = 0.9        # confounder → endogenous var
-    demand_on_wages = 0.5        # confounder → outcome (creates positive OVB)
-    bartik_on_immig = 0.7        # first-stage strength
+    true_effect = -0.3  # causal effect of immigration on wages
+    demand_on_immig = 0.9  # confounder → endogenous var
+    demand_on_wages = 0.5  # confounder → outcome (creates positive OVB)
+    bartik_on_immig = 0.7  # first-stage strength
 
     rng = np.random.default_rng(seed)
     local_demand = rng.normal(0, 1, N)
@@ -169,9 +169,9 @@ def get_encouragement_data(N=4000, seed=1234):
         Columns: ``revenue``, ``assigned_treatment``, ``adopted_feature``, ``user_type``.
     """
     # --- DGP parameters ---
-    true_late = 2.0              # LATE: causal effect of adoption on revenue for compliers
-    p_adopt_encouraged = 0.70    # P(adopt | Z=1): compliers + always-takers
-    p_adopt_control = 0.15       # P(adopt | Z=0): always-takers only
+    true_late = 2.0  # LATE: causal effect of adoption on revenue for compliers
+    p_adopt_encouraged = 0.70  # P(adopt | Z=1): compliers + always-takers
+    p_adopt_control = 0.15  # P(adopt | Z=0): always-takers only
     # first_stage = p_adopt_encouraged - p_adopt_control = 0.55
     # ITT         = true_late * first_stage              = 1.10
     # LATE        = ITT / first_stage                    = 2.0

From 68c90e49a7ad37b68b6ad6447b554bd0583552c8 Mon Sep 17 00:00:00 2001
From: Alexander Fischer <alexander-fischer1801@t-online.de>
Date: Sat, 7 Mar 2026 15:02:38 +0100
Subject: [PATCH 5/7] update lock file for CI run - it was stale

---
 pixi.lock | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pixi.lock b/pixi.lock
index 8395b6d6a..c01bed625 100644
--- a/pixi.lock
+++ b/pixi.lock
@@ -20395,7 +20395,7 @@ packages:
 - pypi: ./
   name: pyfixest
   version: 0.50.0a1
-  sha256: 5b834268a6f595b19a36496205f354e8a8e78b478b0cdede881790f16a0ec0d8
+  sha256: 0aafb935bff170369a8bf97f21c1ab20827e9faaefdcebbcdcdbf859d196ab8a
   requires_dist:
   - formulaic>=1.1.0
   - joblib>=1.4.2

From 94e2195fabfd9d4c9ece16fe3ff85c6d62e995eb Mon Sep 17 00:00:00 2001
From: Mo <mdizadi@pop-os.fritz.box>
Date: Wed, 11 Mar 2026 15:34:57 +0100
Subject: [PATCH 6/7] Modified the instrumental variables tutorial.

---
 docs/tutorials/instrumental-variables.qmd | 262 ++++++++++++++++++----
 docs/tutorials/references.bib             |  20 ++
 2 files changed, 233 insertions(+), 49 deletions(-)

diff --git a/docs/tutorials/instrumental-variables.qmd b/docs/tutorials/instrumental-variables.qmd
index a60ce8c71..d093d33a6 100644
--- a/docs/tutorials/instrumental-variables.qmd
+++ b/docs/tutorials/instrumental-variables.qmd
@@ -12,7 +12,7 @@ execute:
   freeze: false
 bibliography: references.bib
 nocite: |
-  @wooldridge2010, @neal2020, @lundborg2017, @borusyak2022
+  @wooldridge2010, @neal2020, @lundborg2017, @borusyak2022, @oleapflueger2013, @stockyogo2005
 ---
 
 ::: {.callout-note}
@@ -20,40 +20,81 @@ nocite: |
 You should have read the [Getting Started](../getting-started.qmd) page and have `pyfixest` installed.
 :::
 
+```{=html}
+<style>
+.column-margin figure {
+  margin-top: -1.8em;
+}
+.column-margin figcaption {
+  font-size: 0.72em;
+  margin-top: 0.15em;
+  line-height: 1.3;
+}
+.margin-note {
+  font-size: 0.75em;
+  line-height: 1.4;
+  color: #444;
+}
+.appendix-math .math.display {
+  font-size: 0.85em;
+}
+</style>
+```
+
 ## Introduction
 
 Estimation of a linear model via Ordinary Least Squares (OLS) yields biased and inconsistent estimates when a regressor is correlated with the error term — a problem known as endogeneity, which arises, for example, in the presence of unobserved confounders. **Instrumental Variable (IV)** estimation addresses this by finding a variable $Z$ that satisfies three conditions:
 
-1. **Relevance**: $Z$ has a causal effect on $X$.
-2. **Exclusion Ristriction**: $Z$ causal effect on $Y$ is fully mediated by $X$.
-3. **Instrumental Unconfoundedness**: $Z$ has no unobserved common causes with $Y$.
-
 ```{dot}
-//| fig-align: center
-//| fig-cap: "IV DAG: Z is an instrumental variable, U is an unobserved confounder of the effect of X on Y. The dashed node and arrows indicate that U is unobserved. The direct arrow from Y to X shows the relevance. There is no direct arrow from Z to Y, encoding the exclusion restriction. No direct dashed arrow from the unobserved variable to Z shows the instrumental unconfoundedness."
+//| column: margin
+//| fig-cap: "**Figure 1.** IV setup. Z is the instrument, T the endogenous treatment, Y the outcome, and U an unobserved confounder. Dashed elements are unobserved. U creates a backdoor path T ← U → Y, biasing OLS."
+//| fig-width: 2
+//| fig-height: 2
 digraph IV {
     layout=neato;
-    node [shape=circle, fontname="serif", fixedsize=true, width=0.4, fontsize=13];
-    edge [arrowsize=0.6];
+    node [shape=circle, fontname="serif", fixedsize=true, width=0.3, fontsize=11];
+    edge [arrowsize=0.5];
 
-    Z [label="Z", pos="-1.0,1.0!"];
-    U [label="U", style=dashed, pos="1.0,1.4!"];
-    X [label="X", pos="0,0!"];
+    Z [label="Z", pos="-0.25,1.2!"];
+    U [label="U", style=dashed, pos="1.0,1.2!"];
+    T [label="T", pos="0,0!"];
     Y [label="Y", pos="2,0!"];
 
-    Z -> X;
-    X -> Y;
-    U -> X [style=dashed];
+    Z -> T;
+    T -> Y;
+    U -> T [style=dashed];
     U -> Y [style=dashed];
 }
 ```
 
-`pyfixest` estimates the IV using two-stage least squares (2SLS) estimator where it first projects $X$ onto $Z$ (and all other exogenous variables) to obtain $\hat{X}$, then uses $\hat{X}$ to estimate the Average Treatment Effect (ATE) of $X$ on $Y$. The intuition is that by using $\hat{X}$ the dashed path from the unobserved varialbe is blocked or removed.
+1. **Relevance**: $Z$ has a causal effect on $T$.
+2. **Exclusion Restriction**: $Z$'s causal effect on $Y$ is fully mediated by $T$.
+3. **Instrumental Unconfoundedness**: $Z$ has no unobserved common causes with $Y$.
+
+In Figure 1, the path from $Z$ to $T$ shows relevance, no path from $Z$ to $Y$ encodes the exclusion restriction, and no path from the unobservable to the instrument shows the instrumental unconfoundedness.
 
-When panel data are available, selection bias may also stem from time-invariant  unobserved heterogeneity — unit-specific characteristics (e.g., ability, culture, geography) that are fixed over time but correlated with both treatment and outcome.`pyfixest` addresses this simultaneously by applying a within-transformation (demeaning) to absorb unit fixed effects before running 2SLS, following the FE-IV approach described in @wooldridge2010 [Ch. 11]. Crucially, after demeaning, the
-instrument must retain within-unit variation over time — time-invariant instruments are eliminated along with the fixed effects and cannot be used for identification. When both fixed effects and an instrument are specified, `pyfixest` therefore isolates
-the clean variation in treatment that is both within-unit and driven by the instrument, blocking confounding from time-invariant unobservables and time-varying endogenous confounders simultaneously.
+```{dot}
+//| column: margin
+//| fig-cap: "**Figure 2.** After 2SLS first stage. T is replaced by T̂ (fitted values from projecting T onto Z). The backdoor path U → T̂ is eliminated — only the exogenous variation in T driven by Z remains."
+//| fig-width: 2
+//| fig-height: 2
+digraph IV2 {
+    layout=neato;
+    node [shape=circle, fontname="serif", fixedsize=true, width=0.3, fontsize=11];
+    edge [arrowsize=0.5];
+
+    Z  [label="Z",  pos="-0.25,1.2!"];
+    U  [label="U",  style=dashed, pos="1.0,1.2!"];
+    Th [label="T̂", pos="0,0!"];
+    Y  [label="Y",  pos="2,0!"];
+
+    Z  -> Th;
+    Th -> Y;
+    U  -> Y [style=dashed];
+}
+```
 
+`pyfixest` estimates the IV using two-stage least squares (2SLS) estimator where it first projects $T$ onto $Z$ (and all other exogenous variables) to obtain $\hat{T}$, then uses $\hat{T}$ to estimate the causal effect of $T$ on $Y$. Because $\hat{T}$ is not a function of $U$, we can think of the dashed path from the unobserved variable as blocked or removed.
 
 In `pyfixest`, the IV syntax is:
 
@@ -61,7 +102,13 @@ In `pyfixest`, the IV syntax is:
 Y ~ exogenous_controls | fixed_effects | endogenous ~ instrument
 ```
 
-This tutorial walks through three applications, all addressing endogeneity from **selection bias — a form of Omitted Variable Bias (OVB) where unobserved confounders drive both selection into treatment and the outcome.** Applications 1 and 2 do so in observational settings — from individual-level selection and regional-level sorting respectively — while Application 3 arises in an experimental setting with imperfect compliance, where self-selection into treatment remains a problem despite randomized encouragement.
+::: {.callout-note}
+## Fixed Effects with IV
+
+When panel data are available, endogeneity may also stem from time-invariant unobserved heterogeneity — unit-specific characteristics (e.g., ability, culture, geography) that are fixed over time but correlated with both treatment and outcome. `pyfixest` addresses this simultaneously by applying a within-transformation (demeaning) to absorb unit fixed effects before running 2SLS, following the FE-IV approach described in @wooldridge2010 [Ch. 11]. Crucially, after demeaning, the instrument must retain within-unit variation over time — time-invariant instruments are eliminated along with the fixed effects and cannot be used for identification. When both fixed effects and an instrument are specified, `pyfixest` therefore isolates the clean variation in treatment that is both within-unit and driven by the instrument, blocking confounding from time-invariant unobservables and time-varying endogenous confounders simultaneously.
+:::
+
+This tutorial walks through three applications, all addressing endogeneity from **selection bias — a form of Omitted Variable Bias (OVB) where unobserved confounders drive both selection into treatment and the outcome.** Applications 1 and 2 operate in observational settings — exploiting quasi-random variation from individual-level selection and regional-level sorting respectively — while Application 3 arises in an experimental setting where encouragement is randomly assigned but treatment take-up remains subject to self-selection.
 
 ```{python}
 import pyfixest as pf
@@ -102,7 +149,21 @@ fit_ols.summary()
 
 ### IV Estimation
 
-A possible IV would be using `ivf_success` as an instrument for `num_children`:
+A natural instrument is `ivf_success`, used as an instrument for `num_children`:
+
+::: {.column-margin}
+::: {.margin-note}
+**2SLS = IV = Wald estimator**
+
+With a single instrument $Z$, the 2SLS estimator is numerically identical to the IV estimator:
+
+$$\hat{\beta}_{IV} = \frac{\widehat{\text{Cov}}(Y,\, Z)}{\widehat{\text{Cov}}(T,\, Z)}$$
+
+With a binary instrument, this simplifies to the **Wald estimator**:
+
+$$\hat{\beta}_{\text{Wald}} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{T}_{Z=1} - \bar{T}_{Z=0}}$$
+:::
+:::
 
 ```{python}
 fit_iv = pf.feols("earnings ~ 1 | num_children ~ ivf_success", data=ivf_df)
@@ -115,10 +176,14 @@ The IV estimate is closer to the true effect of -0.15 --- less negative than the
 
 ### Compare OLS and IV
 
+The table below places OLS and IV side by side.
+
 ```{python}
 pf.etable(
     [fit_ols, fit_iv],
     labels={"earnings": "Earnings", "num_children": "Number of Children"},
+    model_heads=["OLS", "IV"],
+
     caption="Motherhood Penalty: OLS vs IV",
 )
 ```
@@ -127,32 +192,41 @@ pf.etable(
 
 ### First-Stage Diagnostics
 
-The only IV diagnostic currently available in pyfixest is the IV weakness (relevance) test, which provides two statistics: f_stat and effective_f. A strong first stage is essential for IV to work. We check the first-stage F-statistic where critical values change depending on the error structure (IID vs. robust) and effective_f (Olea & Pflueger (2013) robust version) and run `IV_Diag()`:
+The only IV diagnostic currently available in pyfixest is the IV weakness (relevance) test, which provides two statistics: `f_stat` and `effective_f`. A strong first stage is essential for IV to work. We check the first-stage F-statistic (iid errors) and effective F-statistic (a robust version, @oleapflueger2013) and run `IV_Diag()`:
 
 ```{python}
+# first_stage() must be called before IV_Diag() — it fits the first-stage OLS
+# regression and stores the model in fit_iv._model_1st_stage, which IV_Diag() requires.
 fit_iv.first_stage()
+fit_iv._model_1st_stage.summary()
 ```
 
-
 ```{python}
 fit_iv.IV_Diag()
+print(f"First-stage F-statistic : {fit_iv._f_stat_1st_stage:.2f}")
+print(f"Effective F-statistic   : {fit_iv._eff_F:.2f}")
 ```
 
-
-The first-stage F-statistic should be well above 10 (under IID errors) or effective_f above 10 regardeless of error structure, confirming that IVF success is a strong predictor of fertility.
+Both F-statistics are well above 10, confirming that IVF success is a strong predictor of fertility. For publication-quality work, refer to the critical value tables in @oleapflueger2013 for effective F-statistic (indexed by number of instruments and tolerable bias level $\tau$) — `pyfixest` reports the statistic only and does not compute these critical values.
 
 
 ## Application 2: Shift-Share (Bartik) Instruments
 
 **Does immigration affect local wages?**
 
-A long-standing question in labor economics. The challenge: regions that attract immigrants may also have booming labor markets, biasing OLS upward. The **shift-share** (Bartik) instrument, formalized by @borusyak2022, constructs predicted local immigration from:
+A long-standing question in labor economics is how immigration affects local wages. The challenge is that regions that attract immigrants may also have booming labor markets, biasing OLS upward. The **shift-share** (Bartik) instrument, formalized by @borusyak2022, which is the weighted average of external shocks using local exposure shares as weights, constructs predicted local immigration from:
 
 $$
 B_r = \sum_{k=1}^{K} s_{rk} \cdot g_k
 $$
 
-where $s_{rk}$ is region $r$'s historical share of immigrants from origin $k$, and $g_k$ is the national inflow from origin $k$. Because the instrument is constructed from *national* shocks interacted with *historical* shares, it is plausibly exogenous to current local labor demand.
+where $s_{rk}$ is region $r$'s historical share of immigrants from origin $k$, and $g_k$ is the national inflow from origin $k$. The key identification assumption in @borusyak2022 is that the *shocks* $g_k$ are as-good-as-randomly assigned across origin countries — uncorrelated with unobserved local labor demand in destination regions. The historical shares $s_{rk}$ can themselves be endogenous (regions that historically attracted many immigrants may differ in other ways); what matters is that the national inflows that drive the instrument are exogenous.
+
+::: {.callout-note}
+## Key Insight from @borusyak2022
+
+The shift-share IV coefficient estimated at the *region* level is numerically identical to an IV estimated at the *shock* (origin-country) level — where regional outcomes and treatment are first averaged up using exposure weights, and shocks instrument for aggregated treatment. Identification therefore lives at the shock level. This means validity should be justified, tested, and visualized using shock-level regressions — not location-level ones — even though the 2SLS is run at the region level. Hence, all IV assumptions — relevance, exclusion restriction, and instrumental unconfoundedness — should be verified at the shock level, not the location level. In practice this means running shock-level regressions to inspect the first stage, test balance, and visualise identifying variation.
+:::
 
 ### Synthetic Data
 
@@ -165,6 +239,8 @@ The dataset has $N = 300$ regions. The **true causal effect** of `immigration` o
 
 ### OLS vs IV
 
+As in the first application, we can compare the naive OLS and IV estimates.
+
 ```{python}
 # OLS: biased because local demand drives both immigration and wages
 fit_ols_b = pf.feols("wages ~ immigration + log_population", data=bartik_df)
@@ -176,7 +252,6 @@ fit_iv_b = pf.feols(
 )
 ```
 
-
 ```{python}
 pf.etable(
     [fit_ols_b, fit_iv_b],
@@ -185,25 +260,32 @@ pf.etable(
         "immigration": "Immigration",
         "log_population": "Log Population",
     },
+    model_heads=["OLS", "IV"],
     caption="Effect of Immigration on Wages: OLS vs Bartik IV",
 )
 ```
 
-
-
 OLS attenuates the negative wage effect (or may even show a positive coefficient) because local demand is a positive confounder. The IV estimate is closer to the true effect of -0.3.
 
 ### Diagnostics
 
+@borusyak2022 recommend running shock-level regressions to inspect the first stage, since identification lives at the shock level. Because the synthetic data used here does not expose the underlying shares and shocks needed to construct those aggregates, the diagnostics below are run at the region level instead.
+
 ```{python}
+# first_stage() must be called before IV_Diag()
 fit_iv_b.first_stage()
+fit_iv_b._model_1st_stage.summary()
 ```
 
-
 ```{python}
 fit_iv_b.IV_Diag()
+print(f"First-stage F-statistic : {fit_iv_b._f_stat_1st_stage:.2f}")
+print(f"Effective F-statistic   : {fit_iv_b._eff_F:.2f}")
 ```
 
+The region-level first stage F-statistic and effective F-statistic both confirm the relevance of the Bartik instrument as a predictor of regional immigration, i.e., the values are well beyond 10. For precise critical values for effective F-statistic accounting for the number of instruments and acceptable bias, refer to the lookup tables in @oleapflueger2013 — `pyfixest` does not compute these automatically.
+
+We can also visualize the OLS and IV coefficient estimates together with their confidence intervals.
 
 ```{python}
 pf.coefplot([fit_ols_b, fit_iv_b], keep="immigration")
@@ -217,7 +299,65 @@ pf.coefplot([fit_ols_b, fit_iv_b], keep="immigration")
 
 **Estimating the effect of feature adoption on revenue when users don't comply with treatment assignment.**
 
-A tech company runs an A/B test: half of users are *encouraged* (shown a banner) to try a new feature. But not everyone who sees the banner actually adopts the feature, and some control users discover it on their own. The simple intent-to-treat (ITT) comparison underestimates the effect on users who actually adopt. Random assignment serves as an instrument for actual adoption, and the IV estimate recovers the **Local Average Treatment Effect (LATE)** on compliers.
+A tech company runs an A/B test: half of users are *encouraged* (shown a banner) to try a 
+new feature. But not everyone who sees the banner actually adopts the feature, and some 
+control users discover it on their own. The simple intent-to-treat (ITT) comparison ---
+contrasting encouraged and non-encouraged groups --- estimates the effect of encouragement, 
+not of adoption itself, and these two quantities differ whenever compliance is imperfect.
+
+To recover the effect of actual adoption, we partition users into four compliance types: 
+*compliers* who adopt if and only if encouraged, *always-takers* who adopt regardless, 
+*never-takers* who never adopt, and *defiers* who do the opposite of their assignment. We
+then invoke **monotonicity**: no user is a defier, meaning encouragement can only weakly
+increase the probability of adoption.
+
+::: {.column-margin}
+::: {.margin-note}
+**No linearity assumed**
+
+The three IV assumptions (relevance, exclusion restriction, instrumental unconfoundedness) still apply — and are credible here given random assignment. What LATE additionally requires, beyond those, is only **monotonicity** — that no unit is a defier. Crucially, no linearity or homogeneity of the treatment effect is assumed. The resulting estimand is the average effect for compliers, without any restriction on the shape of the treatment-response function.
+:::
+:::
+
+Always-takers always take the treatment and never-takers never do, regardless of encouragement — the instrument simply has no effect on them. By the exclusion restriction, it cannot affect their outcomes either. The outcome difference between encouraged and non-encouraged groups therefore reflects only complier variation, scaled down by the share of compliers in the population. Dividing by the first stage — which under monotonicity equals that complier share — recovers the LATE.
+
+::: {.column-margin}
+::: {.margin-note style="margin-top: 5em;"}
+**LATE = CACE (Complier Average Causal Effect)**
+
+$$\text{LATE} = \text{CACE} = E[Y(1) - Y(0) \mid \text{Complier}]$$
+
+where $Y(1)$ and $Y(0)$ are **potential outcomes** — the outcome a unit would realize under treatment and under control, respectively.
+:::
+:::
+
+This is the identification argument behind the **Local Average Treatment Effect (LATE)**.
+The Wald estimand --- the ratio of the reduced-form effect of $Z$ on $Y$ to the first-stage
+effect of $Z$ on $T$,
+
+$$
+\beta_{Wald} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[T \mid Z=1] - E[T \mid Z=0]}
+$$
+
+::: {.column-margin}
+::: {.margin-note}
+**Why Wald = LATE**
+
+By the exclusion restriction, always-takers and never-takers have the same $Y$ regardless of $Z$, so they cancel in the numerator. Only compliers remain:
+
+$$E[Y|Z=1] - E[Y|Z=0]$$
+$$= P(\text{Complier}) \cdot E[Y(1)-Y(0) \mid \text{Complier}]$$
+
+Under monotonicity, the denominator equals the complier share:
+
+$$E[T|Z=1] - E[T|Z=0] = P(\text{Complier})$$
+
+Dividing $P(\text{Complier})$ cancels and we recover LATE.
+:::
+:::
+
+equals **LATE**, also known as the
+**Complier Average Causal Effect (CACE)**, under monotonicity. Since random assignment to encouragement serves as an instrument for actual adoption — where $T \in \{0,1\}$ denotes whether a user adopts the feature — the 2SLS estimator consistently estimates this Wald estimand, and therefore LATE.
 
 ### Synthetic Data
 
@@ -244,9 +384,9 @@ fit_late = pf.feols("revenue ~ 1 | user_type | adopted_feature ~ assigned_treatm
 ```
 
 
-The Wald estimator says: $\text{LATE} = \frac{\text{ITT}}{\text{First Stage}} = \frac{\text{Cov}(Y, Z)}{\text{Cov}(D, Z)}$.
+The Wald estimator is: $\hat{\beta}_{\text{Wald}} = \frac{\widehat{\text{ITT}}}{\widehat{\text{First Stage}}} = \frac{\widehat{\text{Cov}}(Y,\, Z)}{\widehat{\text{Cov}}(D,\, Z)}$.
 
-Let's verify:
+We can verify this numerically using `pyfixest`:
 
 ```{python}
 itt_coef = fit_itt.coef()["assigned_treatment"]
@@ -263,6 +403,8 @@ print(f"IV/LATE coefficient:     {late_coef:.4f}")
 
 ### Compare All Three
 
+The table below places the ITT, first stage, and IV estimates side by side.
+
 ```{python}
 pf.etable(
     [fit_itt, fit_fs, fit_late],
@@ -278,16 +420,22 @@ pf.etable(
 
 
 
+The coefficient plot compares the ITT effect of encouragement on revenue with the IV (LATE) estimate of actual feature adoption. Because not all encouraged users adopt, the LATE is larger than the ITT — scaled up by the complier share.
+
 ```{python}
 pf.coefplot([fit_itt, fit_late], keep="assigned_treatment|adopted_feature")
 ```
 
-
-
 ### IV Diagnostics
 
+Since the instrument (`assigned_treatment`) is fully randomized, the first stage is expected to be very strong and both statistics should comfortably exceed 10.
+
 ```{python}
+# first_stage() must be called before IV_Diag()
+fit_late.first_stage()
 fit_late.IV_Diag()
+print(f"First-stage F-statistic : {fit_late._f_stat_1st_stage:.2f}")
+print(f"Effective F-statistic   : {fit_late._eff_F:.2f}")
 ```
 
 
@@ -297,12 +445,14 @@ Weak instruments --- instruments that are only loosely correlated with the endog
 
 ### The First-Stage F-Statistic
 
-The `.first_stage()` method re-estimates the first-stage regression and computes the **first-stage F-statistic**, which tests $H_0\colon \pi = 0$ (all instrument coefficients are jointly zero). The classic rule of thumb is $F > 10$ (Stock & Yogo, 2005).
+The `.first_stage()` method re-estimates the first-stage regression and computes the **first-stage F-statistic**, which tests $H_0\colon \pi = 0$ (all instrument coefficients are jointly zero). The classic rule of thumb is $F > 10$ for iid errors (@stockyogo2005).
 
-The F-statistic adapts to your variance-covariance specification: if you fit the IV model with heteroskedasticity-robust or cluster-robust standard errors, the first-stage F-statistic is computed accordingly.
 
 ```{python}
 # Re-use the motherhood penalty IV model
+# Note: IV_Diag() switches vcov to hetero internally for the effective F computation.
+# Reset to iid here to get the iid-based first-stage F-statistic.
+fit_iv.vcov("iid")
 fit_iv.first_stage()
 
 # The F-stat is stored as an attribute after calling first_stage()
@@ -314,7 +464,7 @@ print(f"First-stage p-value:     {fit_iv._p_value_1st_stage:.4f}")
 
 ### The Effective F-Statistic
 
-The standard F-statistic can be misleading when there are multiple endogenous regressors or when errors are non-homoskedastic. The **effective F-statistic** (Olea & Pflueger, 2013) is a more robust measure of instrument strength that remains valid under heteroskedasticity:
+The standard F-statistic can be misleading when there are multiple endogenous regressors or when errors are non-homoskedastic. The **effective F-statistic** (@oleapflueger2013) is a more robust measure of instrument strength that remains valid under heteroskedasticity:
 
 $$
 F_{\text{eff}} = \frac{\hat{\pi}' Q_{ZZ} \hat{\pi}}{\text{tr}(\hat{\Sigma} \, Q_{ZZ})}
@@ -322,6 +472,8 @@ $$
 
 where $\hat{\pi}$ are the first-stage coefficients on the excluded instruments, $Q_{ZZ} = Z'Z$, and $\hat{\Sigma}$ is the robust variance-covariance matrix of $\hat{\pi}$.
 
+@oleapflueger2013 provide tabulated critical values for $F_{\text{eff}}$ indexed by the effective degrees of freedom $K_{\text{eff}}$ and the maximum tolerable relative bias $\tau \in \{5\%,\, 10\%,\, 20\%,\, 30\%\}$. Under conditional homoscedasticity and no serial correlation, $K_{\text{eff}}$ reduces to the number of excluded instruments $L$, and with $L = 1$ and $\tau = 10\%$ the critical value is approximately 23.1 — above the conventional rule of thumb of 10. For publication-quality work, $F_{\text{eff}}$ should be compared against the tabulated Montiel Olea-Pflueger critical values for the relevant $K_{\text{eff}}$ and $\tau$. **`pyfixest` reports $F_{\text{eff}}$ only — critical values must be looked up manually.**
+
 The `.IV_Diag()` method computes both the standard F-statistic and the effective F-statistic in one call:
 
 ```{python}
@@ -356,9 +508,11 @@ This appendix documents the design rationale, parameter choices, and verificatio
 
 Every DGP encodes a **true causal effect** as a named constant and introduces an **unobserved confounder** $U$ that creates OVB. The bias follows the formula:
 
+::: {.appendix-math}
 $$
-\text{plim}(\hat{\beta}_{\text{OLS}}) = \beta_{\text{true}} + \underbrace{\frac{\text{Cov}(X,\, U)}{\text{Var}(X)} \cdot \gamma_U}_{\text{OVB}}
+\text{plim}(\hat{\beta}_{\text{OLS}}) = \beta_{\text{true}} + \underbrace{\frac{\text{Cov}(T,\, U)}{\text{Var}(T)} \cdot \gamma_U}_{\text{OVB}}
 $$
+:::
 
 where $\gamma_U$ is the effect of $U$ on $Y$. IV recovers $\beta_{\text{true}}$ because the instrument $Z$ is uncorrelated with $U$ (instrumental unconfoundedness), so $\text{Cov}(Z, U) = 0$.
 
@@ -367,6 +521,7 @@ where $\gamma_U$ is the effect of $U$ on $Y$. IV recovers $\beta_{\text{true}}$
 ### App 1 — `get_ivf_data(N=2000, seed=1234)` {.unlisted}
 
 **Story:** Motherhood penalty. Career ambition is the unobserved confounder.
+
 **Instrument:** IVF treatment success — quasi-random conditional on seeking treatment.
 
 #### Structural equations
@@ -375,20 +530,21 @@ where $\gamma_U$ is the effect of $U$ on $Y$. IV recovers $\beta_{\text{true}}$
 |---|---|
 | `career_ambition` ($U$) | $\sim \mathcal{N}(0,\,1)$ — unobserved |
 | `ivf_success` ($Z$) | $\sim \text{Bernoulli}(0.45)$ — instrument, $\perp U$ |
-| `num_children` ($X$) | $= 1.2 - 0.4\cdot U + 0.8\cdot Z + \mathcal{N}(0,\,0.5)$ |
-| `earnings` ($Y$) | $= 10 + 0.6\cdot U - 0.15\cdot X + \mathcal{N}(0,\,1)$ |
+| `num_children` ($T$) | $= 1.2 - 0.4\cdot U + 0.8\cdot Z + \mathcal{N}(0,\,0.5)$ |
+| `earnings` ($Y$) | $= 10 + 0.6\cdot U - 0.15\cdot T + \mathcal{N}(0,\,1)$ |
 
 #### Named parameters
 
 | Parameter | Value | Role |
 |---|---|---|
 | `true_effect` | $-0.15$ | Structural coefficient: children $\to$ earnings |
-| `ambition_on_children` | $-0.4$ | Confounder $\to$ $X$ (negative: ambition reduces fertility) |
+| `ambition_on_children` | $-0.4$ | Confounder $\to$ $T$ (negative: ambition reduces fertility) |
 | `ambition_on_earnings` | $0.6$ | Confounder $\to$ $Y$ (positive: ambition raises earnings) |
-| `ivf_on_children` | $0.8$ | Instrument $\to$ $X$ (first-stage strength) |
+| `ivf_on_children` | $0.8$ | Instrument $\to$ $T$ (first-stage strength) |
 
 #### OVB derivation
 
+::: {.appendix-math}
 $$
 \text{Cov}(\text{num\_children},\, U) \approx -0.4 \cdot \text{Var}(U) = -0.4
 $$
@@ -400,6 +556,7 @@ $$
 \qquad \Rightarrow \qquad
 \hat\beta_{\text{OLS}} \approx -0.15 + (-0.42) \approx -0.57
 $$
+:::
 
 OLS overstates the penalty roughly 4×. IV recovers $\approx -0.15$.
 
@@ -413,6 +570,7 @@ OLS overstates the penalty roughly 4×. IV recovers $\approx -0.15$.
 ### App 2 — `get_bartik_data(N=300, seed=1234)` {.unlisted}
 
 **Story:** Immigration and local wages. Local demand shock is the unobserved confounder.
+
 **Instrument:** Bartik shift-share — national inflow shocks interacted with historical settlement shares.
 
 #### Structural equations
@@ -422,27 +580,29 @@ OLS overstates the penalty roughly 4×. IV recovers $\approx -0.15$.
 | `local_demand` ($U$) | $\sim \mathcal{N}(0,\,1)$ — unobserved |
 | `bartik_instrument` ($Z$) | $\sim \mathcal{N}(0,\,1)$ — instrument, $\perp U$ |
 | `log_population` | $= 2 + 0.1\cdot U + \mathcal{N}(0,\,0.3)$ — control |
-| `immigration` ($X$) | $= 0.5 + 0.7\cdot Z + 0.9\cdot U + \mathcal{N}(0,\,0.5)$ |
-| `wages` ($Y$) | $= 8 + 0.5\cdot U - 0.3\cdot X + 0.2\cdot\texttt{log\_pop} + \mathcal{N}(0,\,1)$ |
+| `immigration` ($T$) | $= 0.5 + 0.7\cdot Z + 0.9\cdot U + \mathcal{N}(0,\,0.5)$ |
+| `wages` ($Y$) | $= 8 + 0.5\cdot U - 0.3\cdot T + 0.2\cdot\texttt{log\_pop} + \mathcal{N}(0,\,1)$ |
 
 #### Named parameters
 
 | Parameter | Value | Role |
 |---|---|---|
 | `true_effect` | $-0.3$ | Structural coefficient: immigration $\to$ wages |
-| `demand_on_immig` | $0.9$ | Confounder $\to$ $X$ (positive: booming regions attract immigrants) |
+| `demand_on_immig` | $0.9$ | Confounder $\to$ $T$ (positive: booming regions attract immigrants) |
 | `demand_on_wages` | $0.5$ | Confounder $\to$ $Y$ (positive: booming regions pay more) |
-| `bartik_on_immig` | $0.7$ | Instrument $\to$ $X$ (first-stage strength) |
+| `bartik_on_immig` | $0.7$ | Instrument $\to$ $T$ (first-stage strength) |
 
 #### OVB derivation
 
 Both `demand_on_immig` and `demand_on_wages` are positive, so $\text{Cov}(\text{immigration},\, U \mid \text{log\_pop}) > 0$ and $\gamma_U = 0.5 > 0$, giving a **positive OVB**:
 
+::: {.appendix-math}
 $$
 \text{OVB} = 0.5 \cdot \frac{0.9}{\text{Var}(\text{immigration} \mid \text{log\_pop})} > 0
 \qquad \Rightarrow \qquad
 \hat\beta_{\text{OLS}} \approx -0.3 + \text{positive} > -0.3
 $$
+:::
 
 OLS attenuates (or reverses) the negative wage effect. IV recovers $\approx -0.3$.
 
@@ -455,8 +615,10 @@ OLS attenuates (or reverses) the negative wage effect. IV recovers $\approx -0.3
 
 ### App 3 — `get_encouragement_data(N=4000, seed=1234)` {.unlisted}
 
-**Story:** A/B encouragement design with imperfect compliance. No unobserved confounder — endogeneity arises from self-selection into adoption.
+**Story:** A/B encouragement design with imperfect compliance. Unobserved confounders drive self-selection into adoption, making T endogenous despite randomized encouragement.
+
 **Instrument:** `assigned_treatment` — fully randomized.
+
 **Fixed effect:** `user_type` $\in \{0, 1, 2\}$ — absorbed in all three models.
 
 #### Structural equations
@@ -478,6 +640,7 @@ OLS attenuates (or reverses) the negative wage effect. IV recovers $\approx -0.3
 
 #### Wald identity (exact by construction)
 
+::: {.appendix-math}
 $$
 \text{First Stage} = 0.70 - 0.15 = 0.55
 \qquad
@@ -485,6 +648,7 @@ $$
 \qquad
 \text{LATE} = \frac{1.10}{0.55} = 2.0 \checkmark
 $$
+:::
 
 #### Verification checks
 - `fit_fs.coef()["assigned_treatment"]` $\approx 0.55$.
diff --git a/docs/tutorials/references.bib b/docs/tutorials/references.bib
index b4d9864fc..408eab7fc 100644
--- a/docs/tutorials/references.bib
+++ b/docs/tutorials/references.bib
@@ -24,6 +24,26 @@ @article{lundborg2017
   year    = {2017}
 }
 
+@article{oleapflueger2013,
+  title   = {A Robust Test for Weak Instruments},
+  author  = {Montiel Olea, José Luis and Pflueger, Carolin},
+  journal = {Journal of Business \& Economic Statistics},
+  volume  = {31},
+  number  = {3},
+  pages   = {358--369},
+  year    = {2013}
+}
+
+@incollection{stockyogo2005,
+  title     = {Testing for Weak Instruments in Linear {IV} Regression},
+  author    = {Stock, James H. and Yogo, Motohiro},
+  booktitle = {Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg},
+  editor    = {Andrews, Donald W. K. and Stock, James H.},
+  publisher = {Cambridge University Press},
+  pages     = {80--108},
+  year      = {2005}
+}
+
 @article{borusyak2022,
   title   = {Quasi-Experimental Shift-Share Research Designs},
   author  = {Borusyak, Kirill and Hull, Peter and Jaravel, Xavier},

From f8e73d867a0163f4d546285056497fb7c1ac0502 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Wed, 11 Mar 2026 14:55:51 +0000
Subject: [PATCH 7/7] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 docs/tutorials/instrumental-variables.qmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/tutorials/instrumental-variables.qmd b/docs/tutorials/instrumental-variables.qmd
index d093d33a6..3fcc2d636 100644
--- a/docs/tutorials/instrumental-variables.qmd
+++ b/docs/tutorials/instrumental-variables.qmd
@@ -299,14 +299,14 @@ pf.coefplot([fit_ols_b, fit_iv_b], keep="immigration")
 
 **Estimating the effect of feature adoption on revenue when users don't comply with treatment assignment.**
 
-A tech company runs an A/B test: half of users are *encouraged* (shown a banner) to try a 
-new feature. But not everyone who sees the banner actually adopts the feature, and some 
+A tech company runs an A/B test: half of users are *encouraged* (shown a banner) to try a
+new feature. But not everyone who sees the banner actually adopts the feature, and some
 control users discover it on their own. The simple intent-to-treat (ITT) comparison ---
-contrasting encouraged and non-encouraged groups --- estimates the effect of encouragement, 
+contrasting encouraged and non-encouraged groups --- estimates the effect of encouragement,
 not of adoption itself, and these two quantities differ whenever compliance is imperfect.
 
-To recover the effect of actual adoption, we partition users into four compliance types: 
-*compliers* who adopt if and only if encouraged, *always-takers* who adopt regardless, 
+To recover the effect of actual adoption, we partition users into four compliance types:
+*compliers* who adopt if and only if encouraged, *always-takers* who adopt regardless,
 *never-takers* who never adopt, and *defiers* who do the opposite of their assignment. We
 then invoke **monotonicity**: no user is a defier, meaning encouragement can only weakly
 increase the probability of adoption.