fix: Correct MATLAB datetime conversion using int64 ticks#67
fix: Correct MATLAB datetime conversion using int64 ticks#67
Conversation
I've revised the handling of MATLAB datetime objects in v7.3 .mat files.
The previous implementation incorrectly assumed datetimes were stored as
float64 serial day numbers. This commit corrects the conversion to
interpret `MATLAB_class='datetime'` fields as int64 microsecond ticks
from the epoch '0000-01-01 00:00:00'.
- I modified `mat73/core.py`:
- The `convert_mat` method now reads data for `MATLAB_class='datetime'`
as `np.int64`.
- I converted these ticks to Python `datetime` objects.
- I handled MATLAB `NaT` (Not-a-Time), represented by `intmin('int64')`,
by converting it to `None`.
- I adjusted for the epoch difference between MATLAB's assumed datetime
epoch ('0000-01-01') and Python's `datetime` epoch ('0001-01-01').
- I included error handling for potential overflows or value errors during
tick conversion.
- The existing tests in `tests/test_mat73.py` (`test_datetime_loading`)
will verify this corrected logic using the `testfile_datetime.mat` file you provided.
- `tests/create_mat.m` remains as previously updated, documenting the
script for `testfile_datetime.mat`.
This change addresses the `OverflowError` encountered with the previous
float-based conversion and aligns with MATLAB's storage of `datetime`
arrays as int64 ticks.
There was a problem hiding this comment.
Hello @skjerns, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request to help everyone quickly understand the changes.
This PR addresses an issue where MATLAB datetime objects stored in v7.3 .mat files were incorrectly interpreted as float64 serial day numbers. The correct approach, implemented here, is to read them as int64 ticks representing microseconds since the MATLAB epoch ('0000-01-01 00:00:00'). The changes involve updating the core conversion logic to handle this int64 format, converting these ticks to Python datetime objects while adjusting for the epoch difference between MATLAB and Python, and handling MATLAB's NaT (Not-a-Time) value.
Comprehensive tests have been added to verify the correct loading of various datetime formats, including scalars, vectors, matrices, datetimes within structs and cell arrays, datetimes with specific times (including milliseconds), NaT values, and dates far in the past or future.
Highlights
- Correct MATLAB datetime conversion: The core logic in
mat73/core.pyhas been updated to correctly interpret MATLABdatetimeobjects stored in v7.3 files. Instead of assumingfloat64serial day numbers, the code now reads the data asint64ticks. - Handle int64 microsecond ticks and epoch difference: The conversion now correctly interprets the
int64values as microseconds since MATLAB's '0000-01-01' epoch. The code adjusts for the difference between this epoch and Python'sdatetime(1, 1, 1)epoch (a 366-day difference due to year 0000 being a leap year in the proleptic Gregorian calendar). - Support for MATLAB NaT: MATLAB's
NaT(Not-a-Time), represented byintmin('int64'), is now correctly identified and converted to Python'sNone. - Add comprehensive datetime tests: A new test file (
testfile_datetime.mat) and a corresponding test method (test_datetime_loading) have been added to verify the correct loading of variousdatetimeformats, including scalars, arrays (row, column, matrix), datetimes in structs and cell arrays, datetimes with time components,NaT, and dates across a wide range (past and future). - Error Handling: Basic error handling for
OverflowErrorandValueErrorduring tick-to-datetime conversion has been included, logging warnings and storing problematic values asNone.
Changelog
Click here to see the changelog
- create_test_datetimes.m
- This is a new MATLAB script intended to generate
testfile_datetime.mat. - It creates various types of MATLAB
datetimevariables: scalar, row/column vectors, 2D matrix, datetimes within a struct (scalar, array, and in a struct array), datetimes in a cell array (scalar and array), a datetime with specific time (including milliseconds),NaT(scalar and array), a datetime with a timezone (expected to lose timezone on save), and datetimes far in the past and future. - The script saves these variables to
tests/testfile_datetime.matin the v7.3 format.
- This is a new MATLAB script intended to generate
- mat73/core.py
- Imported
datetimeandtimedeltafrom thedatetimemodule (line 12). - Added a new
elifblock in theconvert_matmethod to handleMATLAB_class == 'datetime'(line 304). - Reads the raw data for datetime objects as
np.int64(line 312). - Transposes the raw data to match expected Python/NumPy orientation (line 314).
- Iterates through flattened ticks, converting
intmin('int64')toNoneforNaT(lines 317-324). - Converts
int64ticks to Pythondatetimeobjects by adding microseconds todatetime(1, 1, 1)and subtracting 366 days to adjust for the MATLAB epoch '0000-01-01' (lines 326-331). - Includes
try...exceptblocks to catchOverflowErrorandValueErrorduring conversion, logging warnings and storing invalid datetimes asNone(lines 332-343). - Reshapes the list of Python datetimes/None back to the original transposed shape and returns the result (lines 345-346).
- Imported
- tests/create_mat.m
- Added a new section to the script to generate
testfile_datetime.mat(starting line 87). - This section contains the same MATLAB code as
create_test_datetimes.m, defining and saving variousdatetimevariables for testing. - Includes comments indicating that this section documents the creation of the test file, which is assumed to be pre-generated.
- Added a new section to the script to generate
- tests/test_mat73.py
- Imported
datetimefrom thedatetimemodule (line 12). - Added a new test method
test_datetime_loadingto theTestMat73class (starting line 529). - This test loads
tests/testfile_datetime.mat. - Includes a helper function
assert_datetime_equalfor concise datetime comparisons (lines 538-546). - Asserts the correct loading and values for all the different
datetimevariables created in the MATLAB test script, including scalars, arrays, nested structures, specific times,NaT, and past/future dates (lines 548-660). - Skips the test if
testfile_datetime.matis not found, instructing the user to provide it (lines 531-533).
- Imported
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
MATLAB dates, a tricky art,
From float to int, a brand new start.
With ticks and epochs, conversion's key,
Now Python sees time, wild and free.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request introduces a significant improvement by correcting the way MATLAB datetime objects from v7.3 .mat files are handled. The change to interpret them as int64 microsecond ticks from MATLAB's epoch, including handling for NaT and epoch differences, is well-implemented and addresses a key correctness issue.
The new MATLAB script create_test_datetimes.m for generating comprehensive test data is excellent, and the corresponding Python tests in test_datetime_loading are thorough, covering a wide range of scenarios.
I've identified a few areas for minor improvement, primarily related to comment clarity, documentation consistency in an auxiliary script, and test completeness. Addressing these will further enhance the quality and maintainability of this valuable contribution.
Summary of Findings
- Comment Clarity in Datetime Conversion: The comments in
mat73/core.pypreceding the datetime conversion logic could be clearer to avoid confusion with previous or alternative conversion methods. - Documentation Consistency in Test Generation Script: The documented MATLAB script section in
tests/create_mat.mfortestfile_datetime.matis missing thedt_empty_arrayvariable definition and its inclusion in thesavecommand, which is present in the actual generation script and test file. - Test Coverage for Empty Datetime Arrays: The new tests in
tests/test_mat73.pydo not include a case fordt_empty_array, which is a valid scenario generated by the MATLAB test script.
Merge Readiness
The pull request is a significant step forward in correctly handling MATLAB datetime objects. The core logic is sound and the tests are largely comprehensive. However, there are a few medium-severity issues related to comment clarity, documentation consistency, and test completeness that should be addressed to ensure maximum clarity and robustness.
I recommend addressing these points before merging. As a reviewer, I am not authorized to approve the pull request, so please ensure it undergoes further review and approval as per the project's guidelines after these changes are considered.
| # MATLAB datetime objects, when stored as numeric arrays with MATLAB_class='datetime', | ||
| # are assumed to be serial date numbers. | ||
| # The conversion adopted here assumes that the MATLAB serial date number `mdn` | ||
| # corresponds to Python's proleptic Gregorian ordinal system directly for the date part. |
There was a problem hiding this comment.
The current comments in this section (lines 306-308) seem to describe an older or alternative way MATLAB datetimes might be stored (as serial date numbers), which could be confusing since this PR specifically implements the int64 tick-based conversion.
Could these comments be updated to more accurately reflect the logic being implemented in this elif MATLAB_class == 'datetime': block, focusing on the int64 microsecond tick interpretation from epoch '0000-01-01'?
| # MATLAB datetime objects, when stored as numeric arrays with MATLAB_class='datetime', | |
| # are assumed to be serial date numbers. | |
| # The conversion adopted here assumes that the MATLAB serial date number `mdn` | |
| # corresponds to Python's proleptic Gregorian ordinal system directly for the date part. | |
| # MATLAB datetime objects with MATLAB_class='datetime' can be stored in different ways. | |
| # This branch handles cases where they are stored as int64 arrays, | |
| # representing microsecond ticks from MATLAB's epoch '0000-01-01 00:00:00'. | |
| # (Other storage formats, like float64 serial day numbers, are typically handled differently or might not use this MATLAB_class attribute for int64 data.) |
| save('tests/testfile_datetime.mat', ... | ||
| 'dt_scalar', 'dt_row_vector', 'dt_column_vector', 'dt_matrix', ... | ||
| 'dt_struct', 'dt_cell_array', 'dt_cell_array_column', ... | ||
| 'dt_specific_time', 'dt_nat', 'dt_nat_array', ... |
There was a problem hiding this comment.
The testfile_datetime.mat (generated by create_test_datetimes.m) includes dt_empty_array. For consistency in this documentation block within create_mat.m, should dt_empty_array also be defined here (e.g., after line 120) and then included in this save command's list of variables?
This would make the documented script section a more complete representation of what testfile_datetime.mat contains.
'dt_specific_time', 'dt_nat', 'dt_nat_array', 'dt_empty_array', ...
| self.assertEqual(data['char_arr_2d'], expected) | ||
| self.assertEqual(data['char_arr_3d'], [['abcd', 'defg'], ['ghij', 'jklm'], ['mnöp', 'pqrs']]) | ||
|
|
||
| def test_datetime_loading(self): |
There was a problem hiding this comment.
This new test method test_datetime_loading is very comprehensive! One test case that appears to be missing is for dt_empty_array, which is generated by create_test_datetimes.m.
Could a test be added to verify how an empty MATLAB datetime array is loaded? It would likely be loaded as an empty NumPy array (d.dt_empty_array.size == 0).
For example, something like:
# dt_empty_array
self.assertIn('dt_empty_array', d)
self.assertIsInstance(d.dt_empty_array, np.ndarray) # Or appropriate type
self.assertEqual(d.dt_empty_array.size, 0)
I've revised the handling of MATLAB datetime objects in v7.3 .mat files. The previous implementation incorrectly assumed datetimes were stored as float64 serial day numbers. This commit corrects the conversion to interpret
MATLAB_class='datetime'fields as int64 microsecond ticks from the epoch '0000-01-01 00:00:00'.mat73/core.py:convert_matmethod now reads data forMATLAB_class='datetime'asnp.int64.datetimeobjects.NaT(Not-a-Time), represented byintmin('int64'), by converting it toNone.datetimeepoch ('0001-01-01').tests/test_mat73.py(test_datetime_loading) will verify this corrected logic using thetestfile_datetime.matfile you provided.tests/create_mat.mremains as previously updated, documenting the script fortestfile_datetime.mat.This change addresses the
OverflowErrorencountered with the previous float-based conversion and aligns with MATLAB's storage ofdatetimearrays as int64 ticks.