Some differences in mxOPAQUE CLASS and Object metadata, particularly for Strings #31

@foreverallama

Hi!
I was looking at how objects are stored in MAT files, and came across this repository and your explanation in MatFileHandler/objects.md. When I tried to replicate it, I observed some small differences in the object metadata.

Some general differences I observed:

  • Instead of 6 offsets, there are 8 offsets in the list.
  • The fieldContentsID in Region 4 is numbered starting from 0.
  • Region 3 has a slightly different structure: (classID, 0, 0, X, Y, objectID). I'll get into what X and Y are doing below.
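
To make the Region 3 layout above concrete, here is a minimal Python sketch that splits a Region 3 byte range into blocks of six 32-bit integers. The names `Region3Entry` and `parse_region3` are hypothetical, and the sketch assumes little-endian data and that the caller has already sliced out exactly the Region 3 bytes. It is not taken from MatFileHandler.

```python
import struct
from typing import List, NamedTuple


class Region3Entry(NamedTuple):
    """One six-integer block: (classID, 0, 0, X, Y, objectID)."""
    class_id: int
    x: int          # the fourth field, called X in this issue
    y: int          # the fifth field, called Y in this issue
    object_id: int


def parse_region3(data: bytes) -> List[Region3Entry]:
    """Split raw Region 3 bytes into entries (assumes little-endian uint32)."""
    if len(data) % 24 != 0:
        raise ValueError("Region 3 length is not a multiple of 6 * 4 bytes")
    entries = []
    for class_id, _zero1, _zero2, x, y, object_id in struct.iter_unpack("<6I", data):
        # The second and third fields were 0 in the examples and are dropped here.
        entries.append(Region3Entry(class_id, x, y, object_id))
    return entries
```

Printing the entries for a file side by side makes it easy to see which objects use X and which use Y.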

Some differences for string class:

  • Region 3 had a slightly different structure. As expected, it consisted of blocks of six 32-bit integers, one block per object, laid out as (classID, 0, 0, X, 0, objectID). I will call X the stringObjectID, since it started at 1 and incremented for each string object in the file. In contrast, for user-defined classes X is set to 0, and the fifth field (Y, as mentioned earlier) starts at 1 and increments for every user-defined class in the file. I also tried datetime, which used the Y field as well. My guess is that these two fields are internal identifiers for certain categories of objects.
  • Region 2 was always empty for user-defined classes. For string, however, Region 2 was present and structured exactly as Region 4 would be, i.e., three 32-bit integers in the format (fieldID, 1, fieldContentsID). Only one field is present for each string object. No string-related data was present in Region 4. In some examples fieldID was set to 5, and in others it was set to 1. I need to try a few more examples to pin this down.
  • The fieldContents cell for strings was interesting. Its array flags gave the class as mxUINT64_CLASS, with dimensions [1, (5+k)], where k depends on the length of the string. The first four 64-bit integers were set to [1,2,1,1], and the fifth integer specified the number of characters in the string. The next k columns contained the actual string contents, null-terminated and padded to 8-byte blocks. The content was stored as UTF-16 characters within these 64-bit columns, so each column holds 4 characters.
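
The string fieldContents layout described above can be decoded with a short Python sketch. The function name `decode_string_contents` is hypothetical. The sketch assumes little-endian storage, a single string per array (as in the examples here), and that the caller passes the raw bytes of the uint64 array. It returns the first four values untouched, since their meaning is not yet established.

```python
import struct
from typing import Tuple


def decode_string_contents(raw: bytes) -> Tuple[Tuple[int, ...], str]:
    """Decode a [1, 5+k] uint64 array holding one string object's contents.

    Returns (first_four_values, text). Assumes little-endian data.
    """
    if len(raw) < 40 or len(raw) % 8 != 0:
        raise ValueError("expected at least five whole 64-bit integers")
    header = struct.unpack_from("<5Q", raw, 0)
    first_four = header[:4]   # observed as (1, 2, 1, 1)
    n_chars = header[4]       # fifth integer: number of characters
    start = 40                # the text begins right after the five integers
    end = start + 2 * n_chars # 2 bytes per UTF-16 code unit
    if end > len(raw):
        raise ValueError("character count exceeds the available data")
    # Decode only n_chars code units; the terminator and padding are skipped.
    text = raw[start:end].decode("utf-16-le")
    return first_four, text
```

For "Hello!" this layout gives 12 bytes of text plus a 2-byte terminator, padded to 16 bytes, so k = 2 and the array is [1, 7].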

I was looking for some help to decode what's happening with strings here, or if there's something else I could be missing. I'm looking to incorporate more objects/examples to help break this down.

The input data I used:

astring1 = "Hello!"
dt = datetime('today')
obj3 = myclass(30)
obj6 = myclass(myclass(5))
stringVar2 = "Goodbye!"

% myclass is defined with 3 properties - message, num, aeroplanes
% myclass.message is set to the input argument
