Fix crash in IV estimation when demeaning does not converge#639

Open
adamaltmejd wants to merge 1 commit intolrberge:masterfrom
adamaltmejd:fix/iv-nan-ssr-check

@adamaltmejd

Summary

  • When the demeaning algorithm does not converge (e.g. with varying slopes and weights), my_res$ssr is NaN. The IV first-stage checks my_res$ssr < 1e-10 then produce NA, causing the if() to error with "missing value where TRUE/FALSE needed".
  • Wrapping in isTRUE() lets execution continue so the existing convergence warning surfaces, consistent with the non-IV code path.
  • Two identical blocks in estimation.R (with and without fixed effects) are both fixed.

Note

Hard to reproduce without specific data that triggers demeaning non-convergence in the IV first stage — but the fix is minimal and safe (isTRUE() is a no-op when the value is already TRUE/FALSE).
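
The failure mode described in the summary can be reproduced in base R without fixest. The sketch below uses a hypothetical `ssr` variable standing in for `my_res$ssr`; it only illustrates the R semantics involved and is not the package code.

```r
# Hypothetical stand-in for my_res$ssr after a non-converged demeaning step
ssr <- NaN

# A comparison involving NaN yields a logical NA, not FALSE
cmp <- ssr < 1e-10
print(cmp)   # NA

# if() cannot branch on NA and raises an error; capture its message
msg <- tryCatch(
    {
        if (cmp) "flagged" else "not flagged"
    },
    error = function(e) conditionMessage(e)
)
print(msg)

# isTRUE() maps NA to FALSE and leaves a plain TRUE/FALSE unchanged
guarded <- isTRUE(cmp)
print(guarded)   # FALSE
```

With the guard in place the branch is simply not taken, so whatever warning is raised later in the call still has a chance to surface.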

🤖 Generated with Claude Code

When the demeaning algorithm does not converge (e.g. with varying slopes
and weights), SSR can be NaN. The bare comparisons `my_res$ssr < 1e-10`
then produce NA, causing `if (error_endo_no_variation || error_inst_no_expl)`
to error with "missing value where TRUE/FALSE needed".

Wrapping in isTRUE() lets execution continue so the existing convergence
warning surfaces, consistent with the non-IV code path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 19, 2026 10:20

Copilot AI left a comment

Pull request overview

Fixes an IV estimation crash when the demeaning algorithm fails to converge and produces NaN/NA SSR values, aligning IV behavior with the non-IV path (warning instead of hard error).

Changes:

  • Wrap IV first-stage SSR-based error checks in isTRUE() to prevent if() from receiving NA.
  • Apply the fix in both duplicated IV first-stage blocks (with and without fixed effects).

Comment on lines +1352 to 1355
# NaN ssr arises from demeaning non-convergence; isTRUE guards against NA in the if()
error_endo_no_variation = isTRUE(my_res$ssr < 1e-10)
error_inst_no_expl = isTRUE(abs(my_res$ssr - my_res$ssr_no_inst) < 1e-10)
if(error_endo_no_variation || error_inst_no_expl){
@lrberge
Owner

lrberge commented Mar 19, 2026

Hi Adam, I don't understand: $ssr should never be NA. Did you actually witness this case?

@adamaltmejd
Author

Sorry, I already deleted the file setup that was triggering it, but I was running into a problem that seems to have been caused by this. My initial debugging is below, and here is Claude's answer:

To clarify: $ssr is NaN, not NA. It happens when the demeaning algorithm doesn't converge (in my case: IV with varying slopes and weights, hitting the 10k iteration limit). The non-converged residuals are NaN, so $ssr becomes NaN.
The problem is that NaN < 1e-10 evaluates to NA in R, so the if(error_endo_no_variation || error_inst_no_expl) fails with "missing value where TRUE/FALSE needed."
Without IV, the same non-convergence just produces a warning and NaN estimates — which is the behavior isTRUE() restores for the IV path.
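
The NaN versus NA distinction above can be checked directly in base R. This is a minimal sketch that assumes the SSR is a plain sum of squared residuals; the `resid` vector is hypothetical and not taken from the actual data.

```r
# Hypothetical residual vector in which one entry is NaN
resid <- c(0.5, -1.2, NaN)

# Any NaN in the residuals makes the sum of squares NaN
ssr <- sum(resid^2)

is.nan(ssr)        # TRUE
is.na(ssr)         # TRUE: is.na() also returns TRUE for NaN
is.nan(NA_real_)   # FALSE: NA is not NaN

# The comparison result is a logical NA, which is what if() rejects
cmp <- ssr < 1e-10
is.na(cmp)         # TRUE
is.nan(cmp)        # FALSE: a logical NA is not NaN
```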

#errors
feols(xpd(capital_income_t10_t14_log ~ i(field):parent_degree_in_j | prio + id_margin + id_round + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance] | enrolled:i(field) +
enrolled:i(field):parent_degree_in_j~above_cutoff:i(field) +above_cutoff:i(field):parent_degree_in_j), data = dt, weights = w)
# NOTES: 50,831 observations removed because of NA values (LHS: 50,831).
#        1/0/0/0 fixed-effect singleton was removed (1 observation).
# Error in if (error_endo_no_variation || error_inst_no_expl) { :
#   missing value where TRUE/FALSE needed

# also errors
# moving fe to coef
feols(xpd(capital_income_t10_t14_log ~ id_round + i(field):parent_degree_in_j | prio + id_margin + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance] | enrolled:i(field) +
enrolled:i(field):parent_degree_in_j~above_cutoff:i(field) +above_cutoff:i(field):parent_degree_in_j), data = dt, weights = w)


# works
# without weights
feols(xpd(capital_income_t10_t14_log ~ i(field):parent_degree_in_j | prio + id_margin + id_round + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance] | enrolled:i(field) +
enrolled:i(field):parent_degree_in_j~above_cutoff:i(field) +above_cutoff:i(field):parent_degree_in_j), data = dt)
# slightly fewer fixed effects
feols(xpd(capital_income_t10_t14_log ~ i(field):parent_degree_in_j | prio + id_margin + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance] | enrolled:i(field) +
enrolled:i(field):parent_degree_in_j~above_cutoff:i(field) +above_cutoff:i(field):parent_degree_in_j), data = dt, weights = w)


# hints

# without iv it warns about non-convergence
feols(xpd(capital_income_t10_t14_log ~ above_cutoff:i(field) +above_cutoff:i(field):parent_degree_in_j + i(field):parent_degree_in_j | prio + id_margin + id_round +
ag_pooled[cutoff_distance, above_cutoff * cutoff_distance]), data = dt, weights = w)
# NOTES: 50,831 observations removed because of NA values (LHS: 50,831).
#        1/0/0/0 fixed-effect singleton was removed (1 observation).
# OLS estimation, Dep. Var.: capital_income_t10_t14_log
# Observations: 28,849
# Weights: w
# Fixed-effects: prio: 17,  id_margin: 11,  id_round: 60,  ag_pooled: 30
# Varying slopes: cutoff_distance (ag_pooled): 30,  above_cutoff * cutoff_distance (ag_pooled): 30
# Standard-errors: NA (not-available)
#                                     Estimate Std. Error t value Pr(>|t|)
# above_cutoff:field::Agriculture          NaN        NaN     NaN       NA
# above_cutoff:field::Business             NaN        NaN     NaN       NA
# above_cutoff:field::Health               NaN        NaN     NaN       NA
# above_cutoff:field::Humanities           NaN        NaN     NaN       NA
# above_cutoff:field::Law                  NaN        NaN     NaN       NA
# above_cutoff:field::Medicine             NaN        NaN     NaN       NA
# above_cutoff:field::Natural science      NaN        NaN     NaN       NA
# above_cutoff:field::Services             NaN        NaN     NaN       NA
# ... 25 coefficients remaining (display them with summary() or use argument n)
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


# # Evaluations: lhs: 6713, rhs: 10000, 5603, 93, 34, 71, 62, 99, 73 -- tol: 1e-06, iter: 10000
# Warning messages:
# 1: In feols(xpd(capital_income_t10_t14_log ~ 0 + above_cutoff:i(field) +  :
#   There seems to be a convergence problem due to the presence of variables with varying slopes. The precision of the estimates may not be great. As a workaround, and if there are not too many slopes, you can use the variables with varying slopes as regular variables using the function i (see ?i). Or use a lower 'fixef.tol' (sometimes it works)?
# 2: The demeaning algorithm did not converge, the results are not reliable. (tol: 1e-06, iter: 10000)

@lrberge
Owner

lrberge commented Mar 19, 2026

I don't think the issue is the ssr. Non-convergence should not imply NA coefficients or residuals. I think there's a bug in the demeaning step or right after it.

Could you share this example?

@adamaltmejd
Author

Makes sense, I'll try some more debugging. The data is PII and I can't use Claude on it directly, so it's not easy, but I'll see if I can generate a reprex...

@adamaltmejd
Author

Ran some more diagnostics on the actual data (PII, can't share). 5 tests on the same data with different configurations:

Full diagnostic log (fixest 0.14.1)
# fixest NaN SSR diagnostic
# fixest 0.14.1
# R version 4.5.3 (2026-03-11)

## Setup
- Data rows after filtering & bandwidth: 259112
- Non-NA outcome obs: 82579
- Weights: uniform kernel, inverse frequency by id
- FEs: prio + id_margin + id_round + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance]
- IV formula: c(capital_income_t10_t14_log) ~ i(field):parent_degree_in_j |
    prio + id_margin + id_round + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance] |
    enrolled:i(field) + enrolled:i(field):parent_degree_in_j ~
    above_cutoff:i(field) + above_cutoff:i(field):parent_degree_in_j

## Test 1: OLS (no IV), same FEs, with weights
# WARNING: There seems to be a convergence problem due to the presence
#   of variables with varying slopes. [...]
$ssr             = NaN
is.nan($ssr)     = TRUE
NaN coefficients = 33 / 33
NaN residuals    = 28849 / 28849

## Test 2: IV, same FEs, no weights
$ssr             = 129590.676381825
is.nan($ssr)     = FALSE
NaN coefficients = 0 / 33
NaN residuals    = 0 / 28849

## Test 3: Manual IV first stage, with weights
feols(enrolled ~ above_cutoff:i(field) + above_cutoff:i(field):parent_degree_in_j
    + i(field):parent_degree_in_j | prio + id_margin + id_round
    + ag_pooled[cutoff_distance, above_cutoff * cutoff_distance],
    data = data, weights = w)
# WARNING: There seems to be a convergence problem [...]
$ssr             = 14294.481078608
is.nan($ssr)     = FALSE
NaN coefficients = 0 / 33
NaN residuals    = 0 / 79678

## Test 4: Full IV model with weights (the original crash)
# ERROR: missing value where TRUE/FALSE needed

## Test 5: IV with weights, fewer FEs (no id_round)
$ssr             = 116473.081749209
is.nan($ssr)     = FALSE
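
The per-test fields in the log above (`$ssr`, NaN coefficient count, NaN residual count) could be collected with a small helper along these lines. `nan_report` is a hypothetical name, and the sketch assumes the fitted object exposes `ssr`, `coefficients` and `residuals` elements; the mock object below is made up for illustration and is not output from the real data.

```r
# Hypothetical helper: summarise NaN contamination in a fitted model.
# Assumes the object has `ssr`, `coefficients` and `residuals` elements.
nan_report <- function(fit) {
    list(
        ssr        = fit$ssr,
        ssr_is_nan = is.nan(fit$ssr),
        nan_coef   = sum(is.nan(fit$coefficients)),
        n_coef     = length(fit$coefficients),
        nan_resid  = sum(is.nan(fit$residuals)),
        n_resid    = length(fit$residuals)
    )
}

# Mock object standing in for a non-converged fit (not real output)
mock_fit <- list(
    ssr          = NaN,
    coefficients = c(a = NaN, b = NaN),
    residuals    = c(NaN, NaN, NaN)
)

report <- nan_report(mock_fit)
str(report)
```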

What this tells us:

  • Test 1 confirms that the outcome variable's demeaning does not converge with these FEs + weights → $ssr = NaN, all residuals NaN. The non-IV path handles this gracefully (warning, NaN estimates, no crash).
  • Test 3 shows the first stage alone converges fine (SSR is numeric, 0 NaN residuals), even with the same FEs and weights. So the first-stage demeaning is not the problem by itself.
  • Test 4 crashes. In the IV code path, cpp_demean is called jointly on the outcome and the IV variables. The outcome's demeaning fails (same as Test 1), producing NaN residuals for y_demean. These NaN values then propagate through cpp_iv_products into the first-stage result's $ssr, which makes my_res$ssr < 1e-10 evaluate to NA, crashing the if().

So the first-stage SSR shouldn't normally be NaN on its own — the NaN originates from the outcome's non-convergent demeaning propagating through the joint computations. The isTRUE() guard prevents the crash and lets the existing convergence warning surface, matching the non-IV behavior from Test 1. Whether the deeper issue (NaN propagation from the outcome into the first-stage products) should also be addressed upstream is a separate question.
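
The propagation mechanism claimed above can be illustrated with a joint cross-product in base R. This is only a sketch of how NaN in one column contaminates shared products: the column names and values are hypothetical, and how fixest actually combines the output of `cpp_iv_products` is not shown in this thread.

```r
# Hypothetical demeaned columns: the outcome failed to converge (all NaN),
# while the endogenous variable and the instrument are finite.
y_dm    <- rep(NaN, 4)
endo_dm <- c(0.3, -0.1, 0.4, -0.6)
inst_dm <- c(1.0, -1.0, 0.5, -0.5)

# One joint cross-product over all columns
M   <- cbind(y = y_dm, endo = endo_dm, inst = inst_dm)
XtX <- crossprod(M)

# Entries involving y are NaN; the endo/inst block stays finite
print(is.nan(XtX))

# Any statistic that mixes the y entries with the finite block becomes NaN
mixed <- sum(XtX)
print(mixed)
```

A first-stage quantity computed only from the finite block would stay numeric, which is consistent with Test 3 converging on its own.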


Generated with Claude Code
