Fix edge cases, add validation, and implement CI #3
Open
econbernardo wants to merge 13 commits into main
When all bootstrap estimates fall on one side of beta_hat, p_star becomes 0 or 1, so norm.ppf() returns -inf or +inf. The infinite value propagated through the BCa/BC formulas and made np.quantile() crash with: "ValueError: Quantiles must be in the range [0, 1]"

Added an explicit check for p_star == 0 or p_star == 1 in both the BCa and BC branches. It raises a clear ValueError that explains the issue and suggests increasing n_sim or checking for data issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
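A minimal sketch of the kind of guard this commit describes, not the PR's actual code. The function name `bias_correction_z0` is hypothetical, the strict `<` comparison is an assumption, and the standard library's `NormalDist().inv_cdf` stands in for `norm.ppf` so the example runs without SciPy.

```python
from statistics import NormalDist


def bias_correction_z0(boot_estimates, beta_hat):
    """Return z0 = Phi^{-1}(p_star), rejecting degenerate bootstrap draws.

    Hypothetical helper: the name and the strict '<' are assumptions.
    """
    n = len(boot_estimates)
    if n == 0:
        raise ValueError("boot_estimates must not be empty")
    # p_star: share of bootstrap estimates strictly below the point estimate.
    p_star = sum(b < beta_hat for b in boot_estimates) / n
    # p_star of exactly 0 or 1 would map to -inf/+inf under the inverse CDF.
    if p_star == 0.0 or p_star == 1.0:
        raise ValueError(
            "All bootstrap estimates fall on one side of beta_hat "
            f"(p_star={p_star}); increase n_sim or check the data."
        )
    return NormalDist().inv_cdf(p_star)


# Two of four estimates lie below beta_hat, so p_star is 0.5.
z0 = bias_correction_z0([0.8, 0.9, 1.1, 1.2], beta_hat=1.0)
```

Failing early here keeps the error message close to the cause instead of surfacing later inside the quantile call.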
The tqdm progress bar was showing "999/999" for n_sim=1000 because the first iteration was done outside the loop. Added total=n_sim and initial=1 parameters to tqdm so it correctly displays "1000/1000" at completion. Applied the same fix to the jackknife progress bar.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
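The counting fix can be illustrated roughly as follows. This is a sketch under assumptions: `run_with_progress` is a hypothetical stand-in for the bootstrap loop, and only the `total`, `initial`, and `file` arguments of `tqdm` are used.

```python
import io

from tqdm import tqdm


def run_with_progress(n_sim, file=None):
    """Run n_sim iterations where the first one happens before the loop."""
    results = [0]  # iteration 0, computed outside the loop
    # total=n_sim and initial=1 account for the iteration already done,
    # so the bar finishes at n_sim/n_sim instead of (n_sim-1)/(n_sim-1).
    bar = tqdm(range(1, n_sim), total=n_sim, initial=1, file=file)
    for i in bar:
        results.append(i)
    return results, bar.n


# Write the bar to an in-memory buffer to keep the example quiet.
results, final_count = run_with_progress(1000, file=io.StringIO())
```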
Added validation in __init__:
- df: must be a non-empty pandas DataFrame
- dependent_var: must be a string present in df columns
- independent_vars: must be a non-empty list of strings, all in df columns
- alpha: must be a number strictly between 0 and 1

Added validation in perform_bootstrap:
- n_sim: must be a positive integer

All validations raise TypeError or ValueError with clear messages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
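The rules listed above could be sketched as below. The helper names `validate_inputs` and `validate_n_sim` are hypothetical, and the split between TypeError (wrong type) and ValueError (wrong value) is an assumption; the commit only says that one of the two is raised.

```python
import pandas as pd


def validate_inputs(df, dependent_var, independent_vars, alpha):
    """Hypothetical validator mirroring the rules listed for __init__."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if df.empty:
        raise ValueError("df must not be empty")
    if not isinstance(dependent_var, str):
        raise TypeError("dependent_var must be a string")
    if dependent_var not in df.columns:
        raise ValueError(f"dependent_var {dependent_var!r} not in df columns")
    if not isinstance(independent_vars, list) or not all(
        isinstance(v, str) for v in independent_vars
    ):
        raise TypeError("independent_vars must be a list of strings")
    if not independent_vars:
        raise ValueError("independent_vars must not be empty")
    missing = [v for v in independent_vars if v not in df.columns]
    if missing:
        raise ValueError(f"independent_vars not in df columns: {missing}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(alpha, bool) or not isinstance(alpha, (int, float)):
        raise TypeError("alpha must be a number")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")


def validate_n_sim(n_sim):
    """Hypothetical validator for perform_bootstrap's n_sim argument."""
    if isinstance(n_sim, bool) or not isinstance(n_sim, int):
        raise TypeError("n_sim must be an integer")
    if n_sim <= 0:
        raise ValueError("n_sim must be positive")


example_df = pd.DataFrame({"y": [1.0, 2.0, 3.0], "x": [0.5, 1.5, 2.5]})
validate_inputs(example_df, "y", ["x"], alpha=0.05)  # passes silently
validate_n_sim(1000)
```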
Use the Unicode block characters " ▖▘▝▗▚▞█" as the bar's character set for a smoother progress bar appearance in both the bootstrap and jackknife methods.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added random_state parameter to __init__ that accepts:
- None (default): uses a new numpy Generator with random seed
- int: seeds a new numpy Generator for reproducibility
- numpy.random.Generator: uses the provided generator directly

Bootstrap sampling now uses the internal RNG (self._rng) instead of the global numpy random state, enabling reproducible results when the same random_state is provided.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
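One way to support the three accepted inputs is to lean on `np.random.default_rng`, which already handles None, an int seed, and an existing Generator. The sketch below is an assumption about the approach, not the PR's code; `make_rng` and `bootstrap_indices` are hypothetical names.

```python
import numpy as np


def make_rng(random_state=None):
    """Turn None, an int seed, or a Generator into a numpy Generator."""
    if random_state is None or isinstance(
        random_state, (int, np.integer, np.random.Generator)
    ):
        # default_rng returns a passed-in Generator unaltered.
        return np.random.default_rng(random_state)
    raise TypeError("random_state must be None, an int, or a numpy Generator")


def bootstrap_indices(rng, n_obs):
    """Draw row indices with replacement for one bootstrap resample."""
    return rng.integers(0, n_obs, size=n_obs)


# The same seed yields the same resample; the global numpy state is untouched.
first = bootstrap_indices(make_rng(42), n_obs=10)
second = bootstrap_indices(make_rng(42), n_obs=10)
```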
Removed .tolist() conversion in perform_bootstrap() and perform_jackknife(). Distributions are now stored as dictionaries mapping variable names to numpy arrays instead of Python lists.

This improves memory efficiency for large n_sim values by avoiding Python object overhead per element. The bca_estimate() method already uses np.asarray() so it works seamlessly with numpy arrays.

Updated docstrings to reflect the new return types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added RuntimeError checks to ensure proper method call order:
- perform_bootstrap() now requires run_regression() to be called first
- perform_jackknife() now requires run_regression() to be called first

This prevents confusing errors when methods are called out of order and makes the expected workflow explicit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
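The call-order checks might look something like the sketch below. The class `WorkflowSketch`, the `_regression_done` flag, and the `_require_regression` helper are all hypothetical; only the method names and the RuntimeError come from the commit message.

```python
class WorkflowSketch:
    """Hypothetical stand-in showing only the call-order enforcement."""

    def __init__(self):
        self._regression_done = False  # assumed internal flag

    def run_regression(self):
        # The real method would fit the model; here we only record the call.
        self._regression_done = True

    def _require_regression(self, caller):
        if not self._regression_done:
            raise RuntimeError(
                f"{caller}() requires run_regression() to be called first"
            )

    def perform_bootstrap(self, n_sim=1000):
        self._require_regression("perform_bootstrap")
        return n_sim  # placeholder for the bootstrap distributions

    def perform_jackknife(self):
        self._require_regression("perform_jackknife")
        return True  # placeholder for the jackknife distributions


model = WorkflowSketch()
model.run_regression()
model.perform_bootstrap(n_sim=100)  # allowed once the regression has run
```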
Added compute_all_bca(), a new method that computes BCa confidence intervals for all regression coefficients at once, returning a pandas DataFrame with columns:
- coef: original OLS coefficient
- bias_corrected: bias-corrected coefficient
- ci_low: lower bound of confidence interval
- ci_high: upper bound of confidence interval

Includes workflow validation (requires run_regression, perform_bootstrap, and perform_jackknife to be called first). Supports all CI types (BCa, BC, perc).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added cov_type parameter to __init__ that accepts 'HC0', 'HC1', 'HC2', or 'HC3' for different heteroscedasticity-consistent covariance estimators. Default remains 'HC0' for backward compatibility.

HC1-HC3 provide small-sample corrections that typically result in wider confidence intervals, which can be more appropriate for smaller datasets.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The BCa formula computes alpha values using:

alpha = norm.cdf(z0 + (z + z0) / (1 - ahat * (z + z0)))

When ahat * (z + z0) equals 1, division by zero occurs. This can happen with extremely skewed jackknife distributions (|ahat| ≈ 0.51).

Added an explicit check for zero denominators with a clear error message that suggests using CI_type='BC' or 'perc' as alternatives. Also refactored to compute the denominators once and reuse them.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
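A rough sketch of the denominator check, following the formula quoted in the commit message. The function name `bca_alphas`, its signature, and the tolerance are assumptions, and `statistics.NormalDist` replaces `scipy.stats.norm` to keep the example dependency-free. For reference, with z0 = 0 and alpha = 0.05 the upper denominator vanishes at ahat = 1/1.96 ≈ 0.51.

```python
import math
from statistics import NormalDist


def bca_alphas(z0, ahat, alpha=0.05):
    """Return the adjusted (lower, upper) quantile levels for a BCa interval."""
    nd = NormalDist()
    z_values = (nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2))
    # Compute each denominator once and reuse it below.
    denoms = [1 - ahat * (z + z0) for z in z_values]
    if any(math.isclose(d, 0.0, abs_tol=1e-12) for d in denoms):
        raise ValueError(
            "BCa denominator 1 - ahat * (z + z0) is zero; "
            "consider CI_type='BC' or 'perc' instead."
        )
    return tuple(nd.cdf(z0 + (z + z0) / d) for z, d in zip(z_values, denoms))


# With no bias (z0 = 0) and no acceleration (ahat = 0), the levels reduce
# to the plain percentile levels alpha/2 and 1 - alpha/2.
low, high = bca_alphas(z0=0.0, ahat=0.0, alpha=0.05)
```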
Variable names containing special characters (spaces, operators,
parentheses, etc.) could break the statsmodels formula parser or
cause unexpected behavior.
Added quote_if_needed() helper that wraps variable names in Q()
only when they contain problematic characters. Normal variable
names remain unquoted for backward compatibility.
Supported special characters: space, +, -, *, /, (, ), [, ], {, }, :, ~, ^
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
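The quoting helper could be approximated as follows. Only the name quote_if_needed, the Q() wrapper, and the character list come from the commit message; the body, the `SPECIAL_CHARS` constant, and `build_formula` are a hypothetical sketch.

```python
# Characters the commit message lists as needing quoting.
SPECIAL_CHARS = set(" +-*/()[]{}:~^")


def quote_if_needed(name):
    """Wrap a column name in Q("...") only if it has a special character."""
    if any(ch in SPECIAL_CHARS for ch in name):
        # Assumes the name itself contains no double quotes.
        return f'Q("{name}")'
    return name


def build_formula(dependent_var, independent_vars):
    """Hypothetical helper: assemble a formula string from column names."""
    rhs = " + ".join(quote_if_needed(v) for v in independent_vars)
    return f"{quote_if_needed(dependent_var)} ~ {rhs}"


formula = build_formula("y", ["x1", "log(income)", "hours worked"])
```

Leaving ordinary names unquoted keeps the generated formula identical to the previous behavior for existing users.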
- Add comprehensive test suite with 42 tests covering:
  - Input validation (15 tests)
  - Workflow enforcement (5 tests)
  - Basic functionality (5 tests)
  - compute_all_bca method (4 tests)
  - Reproducibility with random_state (3 tests)
  - cov_type parameter (2 tests)
  - Edge cases (4 tests)
  - Special characters in variable names (4 tests)
- Add GitHub Actions workflow that:
  - Runs on push/PR to main
  - Tests Python 3.9, 3.10, 3.11, 3.12
  - Generates coverage reports
- Add pyproject.toml with pytest configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Owner, Author: Waiting for checks to pass.
Owner, Author: Go over examples before merging.
Bug fixes: crash on degenerate bootstrap, progress bar count, BCa division by zero
New features: random_state, cov_type, compute_all_bca(), special character support
Quality: input validation, workflow checks, 42 pytest tests, GitHub Actions CI (Python 3.9–3.12)