Skip to content

Comments

fix: Correct MATLAB datetime conversion using int64 ticks#67

Open
skjerns wants to merge 1 commit intomainfrom
feat/matlab-datetime-fix
Open

fix: Correct MATLAB datetime conversion using int64 ticks#67
skjerns wants to merge 1 commit intomainfrom
feat/matlab-datetime-fix

Conversation

@skjerns
Copy link
Owner

@skjerns skjerns commented May 27, 2025

I've revised the handling of MATLAB datetime objects in v7.3 .mat files. The previous implementation incorrectly assumed datetimes were stored as float64 serial day numbers. This commit corrects the conversion to interpret MATLAB_class='datetime' fields as int64 microsecond ticks from the epoch '0000-01-01 00:00:00'.

  • I modified mat73/core.py:
    • The convert_mat method now reads data for MATLAB_class='datetime' as np.int64.
    • I converted these ticks to Python datetime objects.
    • I handled MATLAB NaT (Not-a-Time), represented by intmin('int64'), by converting it to None.
    • I adjusted for the epoch difference between MATLAB's assumed datetime epoch ('0000-01-01') and Python's datetime epoch ('0001-01-01').
    • I included error handling for potential overflows or value errors during tick conversion.
  • The existing tests in tests/test_mat73.py (test_datetime_loading) will verify this corrected logic using the testfile_datetime.mat file you provided.
  • tests/create_mat.m remains as previously updated, documenting the script for testfile_datetime.mat.

This change addresses the OverflowError encountered with the previous float-based conversion and aligns with MATLAB's storage of datetime arrays as int64 ticks.

I've revised the handling of MATLAB datetime objects in v7.3 .mat files.
The previous implementation incorrectly assumed datetimes were stored as
float64 serial day numbers. This commit corrects the conversion to
interpret `MATLAB_class='datetime'` fields as int64 microsecond ticks
from the epoch '0000-01-01 00:00:00'.

- I modified `mat73/core.py`:
    - The `convert_mat` method now reads data for `MATLAB_class='datetime'`
      as `np.int64`.
    - I converted these ticks to Python `datetime` objects.
    - I handled MATLAB `NaT` (Not-a-Time), represented by `intmin('int64')`,
      by converting it to `None`.
    - I adjusted for the epoch difference between MATLAB's assumed datetime
      epoch ('0000-01-01') and Python's `datetime` epoch ('0001-01-01').
    - I included error handling for potential overflows or value errors during
      tick conversion.
- The existing tests in `tests/test_mat73.py` (`test_datetime_loading`)
  will verify this corrected logic using the `testfile_datetime.mat` file you provided.
- `tests/create_mat.m` remains as previously updated, documenting the
  script for `testfile_datetime.mat`.

This change addresses the `OverflowError` encountered with the previous
float-based conversion and aligns with MATLAB's storage of `datetime`
arrays as int64 ticks.
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @skjerns, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone quickly understand the changes.

This PR addresses an issue where MATLAB datetime objects stored in v7.3 .mat files were incorrectly interpreted as float64 serial day numbers. The correct approach, implemented here, is to read them as int64 ticks representing microseconds since the MATLAB epoch ('0000-01-01 00:00:00'). The changes involve updating the core conversion logic to handle this int64 format, converting these ticks to Python datetime objects while adjusting for the epoch difference between MATLAB and Python, and handling MATLAB's NaT (Not-a-Time) value.

Comprehensive tests have been added to verify the correct loading of various datetime formats, including scalars, vectors, matrices, datetimes within structs and cell arrays, datetimes with specific times (including milliseconds), NaT values, and dates far in the past or future.

Highlights

  • Correct MATLAB datetime conversion: The core logic in mat73/core.py has been updated to correctly interpret MATLAB datetime objects stored in v7.3 files. Instead of assuming float64 serial day numbers, the code now reads the data as int64 ticks.
  • Handle int64 microsecond ticks and epoch difference: The conversion now correctly interprets the int64 values as microseconds since MATLAB's '0000-01-01' epoch. The code adjusts for the difference between this epoch and Python's datetime(1, 1, 1) epoch (a 366-day difference due to year 0000 being a leap year in the proleptic Gregorian calendar).
  • Support for MATLAB NaT: MATLAB's NaT (Not-a-Time), represented by intmin('int64'), is now correctly identified and converted to Python's None.
  • Add comprehensive datetime tests: A new test file (testfile_datetime.mat) and a corresponding test method (test_datetime_loading) have been added to verify the correct loading of various datetime formats, including scalars, arrays (row, column, matrix), datetimes in structs and cell arrays, datetimes with time components, NaT, and dates across a wide range (past and future).
  • Error Handling: Basic error handling for OverflowError and ValueError during tick-to-datetime conversion has been included, logging warnings and storing problematic values as None.

Changelog

Click here to see the changelog
  • create_test_datetimes.m
    • This is a new MATLAB script intended to generate testfile_datetime.mat.
    • It creates various types of MATLAB datetime variables: scalar, row/column vectors, 2D matrix, datetimes within a struct (scalar, array, and in a struct array), datetimes in a cell array (scalar and array), a datetime with specific time (including milliseconds), NaT (scalar and array), a datetime with a timezone (expected to lose timezone on save), and datetimes far in the past and future.
    • The script saves these variables to tests/testfile_datetime.mat in the v7.3 format.
  • mat73/core.py
    • Imported datetime and timedelta from the datetime module (line 12).
    • Added a new elif block in the convert_mat method to handle MATLAB_class == 'datetime' (line 304).
    • Reads the raw data for datetime objects as np.int64 (line 312).
    • Transposes the raw data to match expected Python/NumPy orientation (line 314).
    • Iterates through flattened ticks, converting intmin('int64') to None for NaT (lines 317-324).
    • Converts int64 ticks to Python datetime objects by adding microseconds to datetime(1, 1, 1) and subtracting 366 days to adjust for the MATLAB epoch '0000-01-01' (lines 326-331).
    • Includes try...except blocks to catch OverflowError and ValueError during conversion, logging warnings and storing invalid datetimes as None (lines 332-343).
    • Reshapes the list of Python datetimes/None back to the original transposed shape and returns the result (lines 345-346).
  • tests/create_mat.m
    • Added a new section to the script to generate testfile_datetime.mat (starting line 87).
    • This section contains the same MATLAB code as create_test_datetimes.m, defining and saving various datetime variables for testing.
    • Includes comments indicating that this section documents the creation of the test file, which is assumed to be pre-generated.
  • tests/test_mat73.py
    • Imported datetime from the datetime module (line 12).
    • Added a new test method test_datetime_loading to the TestMat73 class (starting line 529).
    • This test loads tests/testfile_datetime.mat.
    • Includes a helper function assert_datetime_equal for concise datetime comparisons (lines 538-546).
    • Asserts the correct loading and values for all the different datetime variables created in the MATLAB test script, including scalars, arrays, nested structures, specific times, NaT, and past/future dates (lines 548-660).
    • Skips the test if testfile_datetime.mat is not found, instructing the user to provide it (lines 531-533).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


MATLAB dates, a tricky art,
From float to int, a brand new start.
With ticks and epochs, conversion's key,
Now Python sees time, wild and free.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant improvement by correcting the way MATLAB datetime objects from v7.3 .mat files are handled. The change to interpret them as int64 microsecond ticks from MATLAB's epoch, including handling for NaT and epoch differences, is well-implemented and addresses a key correctness issue.

The new MATLAB script create_test_datetimes.m for generating comprehensive test data is excellent, and the corresponding Python tests in test_datetime_loading are thorough, covering a wide range of scenarios.

I've identified a few areas for minor improvement, primarily related to comment clarity, documentation consistency in an auxiliary script, and test completeness. Addressing these will further enhance the quality and maintainability of this valuable contribution.

Summary of Findings

  • Comment Clarity in Datetime Conversion: The comments in mat73/core.py preceding the datetime conversion logic could be clearer to avoid confusion with previous or alternative conversion methods.
  • Documentation Consistency in Test Generation Script: The documented MATLAB script section in tests/create_mat.m for testfile_datetime.mat is missing the dt_empty_array variable definition and its inclusion in the save command, which is present in the actual generation script and test file.
  • Test Coverage for Empty Datetime Arrays: The new tests in tests/test_mat73.py do not include a case for dt_empty_array, which is a valid scenario generated by the MATLAB test script.

Merge Readiness

The pull request is a significant step forward in correctly handling MATLAB datetime objects. The core logic is sound and the tests are largely comprehensive. However, there are a few medium-severity issues related to comment clarity, documentation consistency, and test completeness that should be addressed to ensure maximum clarity and robustness.

I recommend addressing these points before merging. As a reviewer, I am not authorized to approve the pull request, so please ensure it undergoes further review and approval as per the project's guidelines after these changes are considered.

Comment on lines +305 to +308
# MATLAB datetime objects, when stored as numeric arrays with MATLAB_class='datetime',
# are assumed to be serial date numbers.
# The conversion adopted here assumes that the MATLAB serial date number `mdn`
# corresponds to Python's proleptic Gregorian ordinal system directly for the date part.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current comments in this section (lines 306-308) seem to describe an older or alternative way MATLAB datetimes might be stored (as serial date numbers), which could be confusing since this PR specifically implements the int64 tick-based conversion.

Could these comments be updated to more accurately reflect the logic being implemented in this elif MATLAB_class == 'datetime': block, focusing on the int64 microsecond tick interpretation from epoch '0000-01-01'?

Suggested change
# MATLAB datetime objects, when stored as numeric arrays with MATLAB_class='datetime',
# are assumed to be serial date numbers.
# The conversion adopted here assumes that the MATLAB serial date number `mdn`
# corresponds to Python's proleptic Gregorian ordinal system directly for the date part.
# MATLAB datetime objects with MATLAB_class='datetime' can be stored in different ways.
# This branch handles cases where they are stored as int64 arrays,
# representing microsecond ticks from MATLAB's epoch '0000-01-01 00:00:00'.
# (Other storage formats, like float64 serial day numbers, are typically handled differently or might not use this MATLAB_class attribute for int64 data.)

save('tests/testfile_datetime.mat', ...
'dt_scalar', 'dt_row_vector', 'dt_column_vector', 'dt_matrix', ...
'dt_struct', 'dt_cell_array', 'dt_cell_array_column', ...
'dt_specific_time', 'dt_nat', 'dt_nat_array', ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The testfile_datetime.mat (generated by create_test_datetimes.m) includes dt_empty_array. For consistency in this documentation block within create_mat.m, should dt_empty_array also be defined here (e.g., after line 120) and then included in this save command's list of variables?

This would make the documented script section a more complete representation of what testfile_datetime.mat contains.

        'dt_specific_time', 'dt_nat', 'dt_nat_array', 'dt_empty_array', ...

self.assertEqual(data['char_arr_2d'], expected)
self.assertEqual(data['char_arr_3d'], [['abcd', 'defg'], ['ghij', 'jklm'], ['mnöp', 'pqrs']])

def test_datetime_loading(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new test method test_datetime_loading is very comprehensive! One test case that appears to be missing is for dt_empty_array, which is generated by create_test_datetimes.m.

Could a test be added to verify how an empty MATLAB datetime array is loaded? It would likely be loaded as an empty NumPy array (d.dt_empty_array.size == 0).

For example, something like:

        # dt_empty_array
        self.assertIn('dt_empty_array', d)
        self.assertIsInstance(d.dt_empty_array, np.ndarray) # Or appropriate type
        self.assertEqual(d.dt_empty_array.size, 0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant