feat: add date range anomalies check and respective tests #4
Pull request overview
This PR adds a new data quality check to detect date columns with suspiciously wide date ranges that may indicate data quality issues such as incorrect date parsing or data entry errors.
Key Changes
- New `check_date_range_anomalies` function that identifies date columns where the range between minimum and maximum dates exceeds a configurable threshold (default 50 years)
- Comprehensive test suite covering edge cases including string dates, NaN values, custom thresholds, and column-specific checking
- Integration with the pandas accessor's report functionality via the `date_anomalies` check
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/lintdata/checks.py | Implements the check_date_range_anomalies function to detect date columns with ranges exceeding the threshold |
| src/lintdata/accessor.py | Adds threshold_years parameter and integrates the new check into the report method |
| tests/test_checks.py | Adds 11 comprehensive test cases covering normal ranges, wide ranges, string dates, custom thresholds, edge cases, and multiple columns |
| CHANGELOG.md | Documents the new feature in version 0.8.0 |
| README.md | Marks the Date Range Anomalies Check as completed |
    "date_anomalies": lambda: checks.check_date_range_anomalies(
        self._df, columns=future_date_columns, threshold_years=threshold_years
    ),
Copilot
AI
Nov 26, 2025
The `date_anomalies` check incorrectly uses `future_date_columns` instead of a dedicated parameter for the columns to check. It reuses the parameter intended for the `future_dates` check, which could cause confusion and unexpected behavior.
The report function should either:
1. Add a new parameter like `date_anomaly_columns` specifically for this check, or
2. Pass `None` to let the function auto-detect date columns
For consistency with other checks like negative_values (which has negative_value_columns), option 1 is recommended.
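The recommended option could be wired up roughly as follows. This is a sketch under stated assumptions, not the project's code: `LintAccessor`, the registry dictionary, and both check functions are hypothetical stubs that only echo their arguments, so the example shows the parameter routing and nothing else.

```python
from typing import Dict, List, Optional


def check_future_dates(df, columns: Optional[List[str]] = None) -> str:
    # Stub standing in for the real check; it only reports what it was given.
    return f"future_dates(columns={columns})"


def check_date_range_anomalies(
    df, columns: Optional[List[str]] = None, threshold_years: int = 50
) -> str:
    # Stub standing in for the real check; it only reports what it was given.
    return f"date_anomalies(columns={columns}, threshold_years={threshold_years})"


class LintAccessor:
    """Hypothetical accessor; the real class and its report signature may differ."""

    def __init__(self, df):
        self._df = df

    def report(
        self,
        future_date_columns: Optional[List[str]] = None,
        date_anomaly_columns: Optional[List[str]] = None,  # dedicated parameter
        threshold_years: int = 50,
    ) -> Dict[str, str]:
        check_registry = {
            "future_dates": lambda: check_future_dates(
                self._df, columns=future_date_columns
            ),
            # Each check now receives its own column list.
            "date_anomalies": lambda: check_date_range_anomalies(
                self._df,
                columns=date_anomaly_columns,
                threshold_years=threshold_years,
            ),
        }
        return {name: run() for name, run in check_registry.items()}


routed = LintAccessor(df=None).report(
    future_date_columns=["due"], date_anomaly_columns=["born"]
)
defaults = LintAccessor(df=None).report()
```

Leaving `date_anomaly_columns` unset passes `None` through, so the check can still fall back to auto-detection, which covers the second option as the default behavior.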
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Added
- `check_date_range_anomalies`: New check to identify date columns with values that fall outside a specified range (e.g., too old or too recent).

Changed
- `threshold_years` parameter to `check_date_range_anomalies` to specify the number of years for anomaly detection.