-
Notifications
You must be signed in to change notification settings - Fork 48
Add CLI outcome columns option #233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds the --outcome_columns CLI argument to allow explicit selection of which columns should be treated as outcomes, addressing the need to exclude certain columns from outcome statistics without removing them from the input data.
Key Changes:
- Added
--outcome_columnsoptional CLI argument with validation - Modified outcome column inference in
process_batchto preserve DataFrame column order when using defaults - Added comprehensive test coverage for explicit selection, default inference with order preservation, and missing column validation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| balance/cli.py | Added --outcome_columns argument, helper methods has_outcome_columns() and outcome_columns(), input validation for user-specified outcome columns, and modified process_batch to use list comprehension to preserve column order during default inference |
| tests/test_cli.py | Added helper methods _make_cli(), _make_batch_df(), and _recording_sample_cls(), plus three tests covering default inference with order preservation, explicit column selection, and missing column validation |
| CHANGELOG.md | Documented the new --outcome_columns feature under "New Features" section |
talgalili
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super cool.
Could you please make sure that if outcome columns are specified, all the remaining columns not in Id, weights, covars or outcome, would go to ignore column (which you've added before). And test that it works?
talgalili
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks 👍
|
@talgalili has imported this pull request. If you are a Meta employee, you can view this in D89977120. |
|
@talgalili merged this pull request in 54e0b34. |
--outcome_columnsto the CLI to explicitly choose outcome columns and preserved ordered defaults when omitted.Why?