
Fix edge cases, add validation, and implement CI #3

Open
econbernardo wants to merge 13 commits into main from
fix/edge-cases-and-validation


@econbernardo
Owner

Bug fixes: Crash on degenerate bootstrap, progress bar count, BCa division by zero

New features: random_state, cov_type, compute_all_bca(), special character support

Quality: Input validation, workflow checks, 42 pytest tests, GitHub Actions CI (Python 3.9–3.12)

econbernardo and others added 13 commits January 31, 2026 23:51
When all bootstrap estimates fall on one side of beta_hat, p_star becomes
0 or 1, causing norm.ppf() to return -inf or +inf. This propagated through
the BCa/BC formulas and caused np.quantile() to crash with:
"ValueError: Quantiles must be in the range [0, 1]"

Added explicit check for p_star == 0 or p_star == 1 in both BCa and BC
branches, raising a clear ValueError explaining the issue and suggesting
to increase n_sim or check for data issues.
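
A minimal sketch of the guard described above, not the PR's actual code. The function name `bias_correction_z0` is hypothetical, `p_star` is assumed to be the share of bootstrap estimates strictly below `beta_hat` (the commit message does not define it), and the standard library's `statistics.NormalDist().inv_cdf` stands in for scipy's `norm.ppf` so the block has no third-party dependencies.

```python
from statistics import NormalDist


def bias_correction_z0(boot_estimates, beta_hat):
    """Return z0 = Phi^{-1}(p_star), rejecting degenerate bootstrap distributions.

    Assumption: p_star is the share of bootstrap estimates strictly below beta_hat.
    """
    n = len(boot_estimates)
    if n == 0:
        raise ValueError("boot_estimates must not be empty")
    p_star = sum(b < beta_hat for b in boot_estimates) / n
    # A p_star of exactly 0 or 1 maps to -inf/+inf under the inverse normal CDF,
    # so fail early with an actionable message instead.
    if p_star == 0.0 or p_star == 1.0:
        raise ValueError(
            "All bootstrap estimates fall on one side of beta_hat "
            f"(p_star={p_star:g}); increase n_sim or check the data."
        )
    return NormalDist().inv_cdf(p_star)
```

Checking the proportion before the inverse CDF keeps the failure close to its cause, rather than letting an infinite z0 surface later as an out-of-range quantile.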

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The tqdm progress bar was showing "999/999" for n_sim=1000 because the
first iteration was done outside the loop. Added total=n_sim and initial=1
parameters to tqdm so it correctly displays "1000/1000" at completion.

Applied same fix to jackknife progress bar.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added validation in __init__:
- df: must be a non-empty pandas DataFrame
- dependent_var: must be a string present in df columns
- independent_vars: must be a non-empty list of strings, all in df columns
- alpha: must be a number strictly between 0 and 1

Added validation in perform_bootstrap:
- n_sim: must be a positive integer

All validations raise TypeError or ValueError with clear messages.
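
The rules above could be sketched as standalone helpers like the ones below. The helper names and message wording are hypothetical, and `columns` is a plain list standing in for `df.columns` so the sketch runs without pandas; the PR itself performs these checks inside `__init__` and `perform_bootstrap`.

```python
import numbers


def validate_alpha(alpha):
    """Check that alpha is a real number strictly between 0 and 1."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(alpha, bool) or not isinstance(alpha, numbers.Real):
        raise TypeError(f"alpha must be a number, got {type(alpha).__name__}")
    if not 0 < alpha < 1:
        raise ValueError(f"alpha must be strictly between 0 and 1, got {alpha}")
    return alpha


def validate_n_sim(n_sim):
    """Check that n_sim is a positive integer."""
    if isinstance(n_sim, bool) or not isinstance(n_sim, numbers.Integral):
        raise TypeError(f"n_sim must be an integer, got {type(n_sim).__name__}")
    if n_sim <= 0:
        raise ValueError(f"n_sim must be positive, got {n_sim}")
    return n_sim


def validate_independent_vars(independent_vars, columns):
    """Check that independent_vars is a non-empty list of strings found in columns."""
    if not isinstance(independent_vars, list):
        raise TypeError("independent_vars must be a list of strings")
    if not independent_vars:
        raise ValueError("independent_vars must not be empty")
    for name in independent_vars:
        if not isinstance(name, str):
            raise TypeError(f"independent_vars entries must be strings, got {name!r}")
    missing = [name for name in independent_vars if name not in columns]
    if missing:
        raise ValueError(f"Columns not found in df: {missing}")
    return independent_vars
```

Type problems raise TypeError and out-of-range or missing values raise ValueError, matching the split described in the commit message.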

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use Unicode block characters " ▖▘▝▗▚▞█" for a smoother
progress bar appearance in both the bootstrap and jackknife methods.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added random_state parameter to __init__ that accepts:
- None (default): uses a new numpy Generator with a random (non-reproducible) seed
- int: seeds a new numpy Generator for reproducibility
- numpy.random.Generator: uses the provided generator directly

Bootstrap sampling now uses the internal RNG (self._rng) instead of
the global numpy random state, enabling reproducible results when
the same random_state is provided.
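
One way to accept all three input types is to pass `random_state` straight to `np.random.default_rng`, which handles None, an int, and an existing Generator. This is a sketch of that approach, not necessarily how the PR implements it; `make_rng` and `bootstrap_indices` are hypothetical names.

```python
import numpy as np


def make_rng(random_state=None):
    """Turn None, an int seed, or an existing Generator into a numpy Generator."""
    # np.random.default_rng covers all three cases: None draws fresh OS entropy,
    # an int seeds a new Generator, and a Generator is returned unchanged.
    return np.random.default_rng(random_state)


def bootstrap_indices(rng, n_obs):
    """Draw one bootstrap resample of row positions, with replacement."""
    return rng.integers(0, n_obs, size=n_obs)


# The same seed gives the same resample; the global numpy state is never touched.
first = bootstrap_indices(make_rng(42), n_obs=10)
second = bootstrap_indices(make_rng(42), n_obs=10)
print(np.array_equal(first, second))  # True
```

Keeping the Generator on the instance (as `self._rng` in the PR) means two objects built with the same seed do not interfere with each other or with code that uses `np.random` globally.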

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removed .tolist() conversion in perform_bootstrap() and perform_jackknife().
Distributions are now stored as dictionaries mapping variable names to
numpy arrays instead of Python lists.

This improves memory efficiency for large n_sim values by avoiding
Python object overhead per element. The bca_estimate() method already
uses np.asarray() so it works seamlessly with numpy arrays.

Updated docstrings to reflect the new return types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added RuntimeError checks to ensure proper method call order:
- perform_bootstrap() now requires run_regression() to be called first
- perform_jackknife() now requires run_regression() to be called first

This prevents confusing errors when methods are called out of order
and makes the expected workflow explicit.
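
A toy illustration of the call-order guard, assuming the class tracks whether the regression has run. `WorkflowSketch`, `_regression_done`, and `_require_regression` are hypothetical names, and the method bodies are placeholders rather than the PR's implementation.

```python
class WorkflowSketch:
    """Toy stand-in for the estimator class, showing only the call-order guard."""

    def __init__(self):
        self._regression_done = False  # hypothetical flag name

    def run_regression(self):
        # ... fit the model here ...
        self._regression_done = True

    def _require_regression(self, caller):
        if not self._regression_done:
            raise RuntimeError(
                f"{caller}() requires run_regression() to be called first."
            )

    def perform_bootstrap(self, n_sim=1000):
        self._require_regression("perform_bootstrap")
        # ... resample and refit here ...
        return n_sim

    def perform_jackknife(self):
        self._require_regression("perform_jackknife")
        # ... leave-one-out refits here ...
```

Calling `WorkflowSketch().perform_bootstrap()` raises RuntimeError naming the missing step, instead of failing later on an attribute that was never set.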

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New method that computes BCa confidence intervals for all regression
coefficients at once, returning a pandas DataFrame with columns:
- coef: original OLS coefficient
- bias_corrected: bias-corrected coefficient
- ci_low: lower bound of confidence interval
- ci_high: upper bound of confidence interval

Includes workflow validation (requires run_regression, perform_bootstrap,
and perform_jackknife to be called first). Supports all CI types
(BCa, BC, perc).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added cov_type parameter to __init__ that accepts 'HC0', 'HC1', 'HC2',
or 'HC3' for different heteroscedasticity-consistent covariance
estimators. Default remains 'HC0' for backward compatibility.

HC1-HC3 provide small-sample corrections that typically result in
wider confidence intervals, which can be more appropriate for
smaller datasets.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The BCa formula computes alpha values using:
  alpha = norm.cdf(z0 + (z + z0) / (1 - ahat * (z + z0)))

When ahat * (z + z0) equals 1, division by zero occurs. This can
happen with extremely skewed jackknife distributions (for example,
|ahat| ≈ 0.51 when z0 = 0 and alpha = 0.05, where |z| ≈ 1.96).

Added explicit check for zero denominators with a clear error message
suggesting to use CI_type='BC' or 'perc' as alternatives. Also
refactored to compute denominators once and reuse them.
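
The check can be sketched as follows, using the formula quoted above. The function name `bca_alphas` and the tolerance `tol` are assumptions (the commit message only says zero denominators are checked), and `statistics.NormalDist` stands in for scipy's `norm` to keep the block dependency-free.

```python
from statistics import NormalDist

_norm = NormalDist()


def bca_alphas(z0, ahat, alpha=0.05, tol=1e-12):
    """Return the adjusted (lower, upper) quantile levels for a BCa interval."""
    adjusted = []
    for p in (alpha / 2, 1 - alpha / 2):
        z = _norm.inv_cdf(p)
        # Compute the denominator once, check it, then reuse it.
        denom = 1 - ahat * (z + z0)
        if abs(denom) < tol:  # tol is an assumed near-zero threshold
            raise ValueError(
                "BCa adjustment is undefined because 1 - ahat * (z + z0) is zero; "
                "consider CI_type='BC' or 'perc' instead."
            )
        adjusted.append(_norm.cdf(z0 + (z + z0) / denom))
    return tuple(adjusted)
```

With `z0 = 0` and `ahat = 0` the adjustment is the identity, so the function returns approximately `(alpha / 2, 1 - alpha / 2)`, which is the plain percentile interval.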

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Variable names containing special characters (spaces, operators,
parentheses, etc.) could break the statsmodels formula parser or
cause unexpected behavior.

Added quote_if_needed() helper that wraps variable names in Q()
only when they contain problematic characters. Normal variable
names remain unquoted for backward compatibility.

Supported special characters: space, +, -, *, /, (, ), [, ], {, }, :, ~, ^
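
A sketch of what such a helper might look like, built only from the character list above; the PR's real `quote_if_needed()` may differ. The regular expression and the escaping of backslashes and double quotes are assumptions added here so the generated `Q("...")` call stays a valid string literal.

```python
import re

# The characters listed in the commit message: space + - * / ( ) [ ] { } : ~ ^
_SPECIAL = re.compile(r"[ +\-*/()\[\]{}:~^]")


def quote_if_needed(name):
    """Wrap name in Q("...") only if it contains a character from the list above."""
    if _SPECIAL.search(name):
        # Escape backslashes and double quotes (an assumption, see lead-in).
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'Q("{escaped}")'
    return name


formula = "y ~ " + " + ".join(quote_if_needed(v) for v in ["age", "income (USD)"])
print(formula)  # y ~ age + Q("income (USD)")
```

Plain names pass through untouched, so formulas built from ordinary column names are identical to what they were before the change.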

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add comprehensive test suite with 42 tests covering:
  - Input validation (15 tests)
  - Workflow enforcement (5 tests)
  - Basic functionality (5 tests)
  - compute_all_bca method (4 tests)
  - Reproducibility with random_state (3 tests)
  - cov_type parameter (2 tests)
  - Edge cases (4 tests)
  - Special characters in variable names (4 tests)

- Add GitHub Actions workflow that:
  - Runs on push/PR to main
  - Tests Python 3.9, 3.10, 3.11, 3.12
  - Generates coverage reports

- Add pyproject.toml with pytest configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@econbernardo
Owner Author

waiting for checks to pass

@econbernardo
Owner Author

Go over examples before merging
