Replace legacy Draper–Smith stepwisefit with a MATLAB-compatible stepwisefit #345

Pasta-coder · 2025-12-19T10:38:56Z

This PR replaces the legacy stepwisefit implementation with a ground-up, well-tested implementation aimed at closer MATLAB parity and better diagnostics. The main user-visible changes and new capabilities are:

Core behavior

Full stepwise selection based on conditional p-values (forward-add then backward-remove cycle) using robust regress calls for candidate evaluation.

Selection performed on optionally standardized predictors (Scale = "on"), while final reported coefficients are on the original data scale.

Missing-value handling: rows with any NaN in X or y are ignored and recorded in stats.wasnan.

Name–Value interface (MATLAB-like)

Supports the following Name–Value options (case-insensitive):

"InModel" — logical vector specifying initial model.
"Keep" — logical vector of predictors that must remain in/out.
"PEnter" — scalar threshold for entering variables (default 0.05).
"PRemove" — scalar threshold for removing variables (default max(PEnter, 0.1)).
"Scale" — "on" / "off" (selection only).
"MaxIter" — positive integer, max number of stepwise iterations.
"Display" — "on" / "off" (accepted for interface compatibility).

Validates inputs (types, lengths) and errors early on invalid options.

Diagnostics & outputs

Returns the full MATLAB-style output set:

b — p×1 coefficient vector (estimates for included predictors and conditional refit for excluded predictors).
se — standard errors (NaN where df insufficient).
pval — two-sided p-values.
`finalmodel — logical row vector indicating selected predictors.
stats — structure with detailed diagnostics:
`source, df0, dfe, SStotal, SSresid, fstat, pval, rmse, xr, yr, B, SE, TSTAT, PVAL, covb, intercept, wasnan.
nextstep — placeholder scalar (currently 0).
history — structure summarizing final model; includes in, df0, rmse, B (B currently a final-state snapshot).
Computes covb (covariance) for intercept + selected predictors and fills a (p+1)x(p+1) matrix with NaN elsewhere to preserve shape.
Computes xr — residuals for predictors not in the final model (orthogonalized w.r.t. the final model), suitable for diagnostics.

Robustness & edge cases

Guards regress calls and degenerate fits (singular designs) to avoid crashes.
Handles small degrees-of-freedom (df <= 0) by returning NaNs for SE/p-values rather than throwing.
Validates Keep and InModel vector lengths and types.

Tests added

Comprehensive BISTs covering:
MATLAB-parity numeric examples (hald and synthetic inline datasets).
Input validation tests (invalid Name–Value, mismatched lengths, Keep behavior).
Shape and content checks for stats fields, covb, xr orthogonality, RMSE/F-statistic consistency.
Scale on/off contract test (shape-preserving).
Intercept-only behavior, small-n robustness, and general shape/contract tests for outputs.

Rationale & design notes

The goal was practical MATLAB compatibility: match MATLAB’s observable behaviour and numeric contract for common usages (selection logic, outputs, statistical fields) rather than reproducing every internal implementation detail.

Selection uses conditional p-values computed from regress fits. Conditional refits for excluded predictors are performed so b, se, and pval match the documented MATLAB contract (estimates for excluded predictors come from fitting the final model plus that predictor).

Scale is applied only during selection to replicate MATLAB’s behavior: selected terms are those that would be chosen if columns were standardized, but final reported statistics are produced on the original scale.

history is intentionally conservative in Phase 1: a final-state summary is provided and history.B stores last-step coefficients. This avoids producing large, possibly inconsistent arrays before we are confident of consistent per-step behavior between Octave and MATLAB.

What remains / future scope (recommended next phases)

I split remaining work into prioritized phases so reviewers and maintainers can assess risk and scope.

Phase 2

nextstep logic — compute the recommended next action (index of variable to add/remove) rather than returning 0. Tests should assert expected next-step values for synthetic multi-step cases.

Full per-step history — build history.in as an (nsteps × p) logical array and history.B as p×nsteps coefficients matrix (excluding intercept in MATLAB convention). This provides parity for users who inspect model evolution.

Positional argument support — accept legacy positional calls stepwisefit(X,y,penter,premove,method) and [] defaults in positional slots, to match older MATLAB usage patterns.

Display output formatting — optionally print step messages matching MATLAB when Display='on'.

Phase 3

Exact tie-breaking determinism — ensure deterministic choice when candidate p-values are numerically equal; add tests to enforce parity with MATLAB in deterministic examples.

Single-precision input support — maintain input type semantics when single is provided.

Small performance improvements — avoid redundant regress calls where possible; microbenchmarks and targeted caching.

Backwards compatibility & risk

The public function signature remains stepwisefit(X, y, varargin). This is a breaking change only with regards to older code that relied on the previous, less featureful Octave implementation. The new implementation is intentionally stricter on input validation to avoid silent misbehavior.

Tests and early runtime checks are included to reduce the risk of regressions elsewhere in the package.

Performance: the new implementation performs many regress calls during selection. For very high-dimensional datasets this may be slower than the legacy implementation — performance profiling and optimizations will be pursued in follow-up PR(s) if necessary.

…errors

Pasta-coder · 2025-12-21T22:24:40Z

open for feedbacks and discussions

pr0m1th3as · 2025-12-23T11:42:29Z

In code commented lines should always use ## instead of %.
Do not use try ... catch blocks in BISTs.

pr0m1th3as · 2025-12-23T11:45:11Z

Use the private pairedArgs function to parse paired arguments and declare their default values. You can study a working example of its use in the latest grpstats function starting at line 240.

Pasta-coder · 2025-12-23T12:46:14Z

Thanks for feedback , will update accordingly.

Pasta-coder · 2025-12-23T15:30:29Z

Replace manual Name–Value parsing with pairedArgs and centralized defaults
Convert all code comments to Octave-style '##'
Validate Scale and Display options explicitly
Remove try/catch usage from BISTs and use fail() for error tests
Minor style and consistency cleanups

No numerical or behavioral changes intended.
All tests pass.

pr0m1th3as · 2025-12-23T17:39:55Z

At the very end of the file, add error tests for all input validation. It should be like

%!error <error message> stepwisefit (syntax that causes the specific error being tested)

See other functions to get an idea.

Pasta-coder · 2025-12-23T18:10:21Z

added remaining input validation and corresponding tests

Pasta-coder added 19 commits December 14, 2025 20:45

stepwisefit: add baseline tests documenting current Octave behavior

78bcf8f

stepwisefit: initialize documented outputs to avoid undefined return …

a4bb012

…errors

stepwisefit: validate scalar penter before calling regress

80ac34e

stepwisefit: BISTs for input validation

6537985

stepwisefit: remove unwanted BISTs

2f2fb52

stepwisefit: cleanup

c02a785

stepwisefit1 :wrapper around stepwisefit

be7e7e7

stepwisefit1 :textinfo update

6e8a480

stepwisefit1 : added Name–Value parsing + InModel support

f74817b

stepwisefit1 : minor input fix

2c47d11

stepwisefit1 : stats field mapping + BISTs addition

7f05e38

stepwisefit1 : history + BIST

9fa8961

stepwisefit1 : name-value

1302352

stepwisefit1 : algo implementation of ,,+BISTs

091895f

stepwisefit1 : textinfo update

b3a2ae0

stepwisefit : migration_1

0a20896

stepwisefit : migration_1

4e72dc6

stepwisefit : migration_2

b394b3c

stepwisefit : migration_3

4fee699

Pasta-coder changed the title ~~Stepwisefit new~~ Replace legacy Draper–Smith stepwisefit with a MATLAB-compatible stepwisefit Dec 21, 2025

stepwisefit : textinfo update

5602dd8

Pasta-coder marked this pull request as ready for review December 21, 2025 22:24

stepwisefit: use pairedArgs, fix comment style, clean up tests

14a641d

stepwisefit: input validation and tests

a4d9bca

small fixes

2ba21c6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace legacy Draper–Smith stepwisefit with a MATLAB-compatible stepwisefit #345

Replace legacy Draper–Smith stepwisefit with a MATLAB-compatible stepwisefit #345

Uh oh!

Pasta-coder commented Dec 19, 2025 •

edited

Loading

Uh oh!

Pasta-coder commented Dec 21, 2025

Uh oh!

pr0m1th3as commented Dec 23, 2025

Uh oh!

pr0m1th3as commented Dec 23, 2025

Uh oh!

Pasta-coder commented Dec 23, 2025

Uh oh!

Pasta-coder commented Dec 23, 2025

Uh oh!

pr0m1th3as commented Dec 23, 2025

Uh oh!

Pasta-coder commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Replace legacy Draper–Smith stepwisefit with a MATLAB-compatible stepwisefit #345

Are you sure you want to change the base?

Replace legacy Draper–Smith stepwisefit with a MATLAB-compatible stepwisefit #345

Uh oh!

Conversation

Pasta-coder commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Pasta-coder commented Dec 21, 2025

Uh oh!

pr0m1th3as commented Dec 23, 2025

Uh oh!

pr0m1th3as commented Dec 23, 2025

Uh oh!

Pasta-coder commented Dec 23, 2025

Uh oh!

Pasta-coder commented Dec 23, 2025

Uh oh!

pr0m1th3as commented Dec 23, 2025

Uh oh!

Pasta-coder commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Pasta-coder commented Dec 19, 2025 •

edited

Loading