@patelheet30
Owner

Added

  • check_date_range_anomalies: New check to identify date columns whose values span an implausibly wide range (e.g., entries that are too old or too recent relative to the rest of the column).

Changed

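  • Added a threshold_years parameter (to check_date_range_anomalies and to the accessor's report method) that sets the maximum span, in years, allowed between a column's earliest and latest dates before the column is flagged.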

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new data quality check to detect date columns with suspiciously wide date ranges that may indicate data quality issues such as incorrect date parsing or data entry errors.

Key Changes

  • New check_date_range_anomalies function that identifies date columns where the range between minimum and maximum dates exceeds a configurable threshold (default 50 years)
  • Comprehensive test suite covering edge cases including string dates, NaN values, custom thresholds, and column-specific checking
  • Integration with the pandas accessor's report functionality via the date_anomalies check
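The check summarized above can be sketched roughly as follows. This is a minimal illustration, not the code merged into src/lintdata/checks.py: the signature mirrors the call in the report method (columns, threshold_years, default 50 years), but the list-of-strings return value, the 365.25-day year, and the rule for treating string columns as dates are assumptions.

```python
import pandas as pd


def check_date_range_anomalies(df, columns=None, threshold_years=50):
    """Flag date columns whose earliest-to-latest span exceeds threshold_years.

    Sketch only: returning a list of warning strings and the string-parsing
    rule below are assumptions, not lintdata's actual implementation.
    """
    warnings = []
    # With no explicit columns, scan every column and keep only date-like ones.
    candidates = list(df.columns) if columns is None else list(columns)
    for col in candidates:
        series = df[col]
        if pd.api.types.is_datetime64_any_dtype(series):
            # Native datetime column: ignore NaT values.
            dates = series.dropna()
        else:
            values = series.dropna()
            # Assumed rule: a non-datetime column counts as a date column only
            # if every non-null value is a string that parses as a date.
            if values.empty or not values.map(lambda v: isinstance(v, str)).all():
                continue
            dates = pd.to_datetime(values, errors="coerce")
            if dates.isna().any():
                continue
        if dates.empty:
            continue
        # Approximate the span in years using an average year length.
        span_years = (dates.max() - dates.min()).days / 365.25
        if span_years > threshold_years:
            warnings.append(
                f"Column '{col}' spans {span_years:.1f} years "
                f"({dates.min().date()} to {dates.max().date()}), "
                f"exceeding the {threshold_years}-year threshold."
            )
    return warnings


df = pd.DataFrame({
    "signup": pd.to_datetime(["2020-01-01", "2021-06-15", None]),  # about 1.5 years, with a NaT
    "birth": ["1900-01-01", "2024-01-01", "1990-05-05"],          # string dates, about 124 years
    "score": [1, 2, 3],                                            # not a date column
})
print(check_date_range_anomalies(df))
```

In this sketch only "birth" is flagged at the default threshold; passing columns=["signup"] with a smaller threshold_years shows how the column-specific and custom-threshold cases from the test suite would interact.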

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/lintdata/checks.py | Implements the check_date_range_anomalies function to detect date columns with ranges exceeding the threshold |
| src/lintdata/accessor.py | Adds the threshold_years parameter and integrates the new check into the report method |
| tests/test_checks.py | Adds 11 test cases covering normal ranges, wide ranges, string dates, custom thresholds, edge cases, and multiple columns |
| CHANGELOG.md | Documents the new feature in version 0.8.0 |
| README.md | Marks the Date Range Anomalies Check as completed |


Comment on lines +154 to +156
"date_anomalies": lambda: checks.check_date_range_anomalies(
self._df, columns=future_date_columns, threshold_years=threshold_years
),

Copilot AI Nov 26, 2025

The date_anomalies check incorrectly uses future_date_columns instead of having its own parameter for specifying which columns to check. Reusing the parameter intended for the future_dates check means that restricting the future_dates check to certain columns also silently restricts the date-range check to those same columns, which can lead to confusion and unexpected behavior.

The report function should either:

  1. Add a new parameter like date_anomaly_columns specifically for this check, or
  2. Pass None to let the function auto-detect date columns

For consistency with other checks like negative_values (which has negative_value_columns), option 1 is recommended.
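A minimal sketch of option 1, assuming a dispatch dict of lambdas like the one quoted above. The stub functions check_future_dates and check_date_range_anomalies, the free-standing report function, and the date_anomaly_columns parameter are hypothetical stand-ins, not lintdata's actual code; the stubs only echo the columns they receive so the routing is visible.

```python
import pandas as pd


def check_future_dates(df, columns=None):
    # Hypothetical stand-in for the future_dates check: reports what it received.
    return [f"future_dates checked {columns}"]


def check_date_range_anomalies(df, columns=None, threshold_years=50):
    # Hypothetical stand-in for the real check in src/lintdata/checks.py.
    return [f"date_anomalies checked {columns} (threshold {threshold_years})"]


def report(df, future_date_columns=None, date_anomaly_columns=None, threshold_years=50):
    # Each check gets its own column parameter, so narrowing the future_dates
    # check no longer narrows the date-range check as a side effect.
    dispatch = {
        "future_dates": lambda: check_future_dates(df, columns=future_date_columns),
        "date_anomalies": lambda: check_date_range_anomalies(
            df, columns=date_anomaly_columns, threshold_years=threshold_years
        ),
    }
    results = []
    for run in dispatch.values():
        results.extend(run())
    return results


df = pd.DataFrame({"signup": pd.to_datetime(["2020-01-01", "2099-01-01"])})
print(report(df, future_date_columns=["signup"]))
```

Defaulting date_anomaly_columns to None also covers option 2: callers who do not pass it get the check's own column auto-detection, while callers who need to can still target specific columns.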

@patelheet30 merged commit 5f91840 into main on Nov 26, 2025
4 checks passed
@patelheet30 deleted the feat/date-range-anomalies branch on November 26, 2025 at 13:39