Quantcast
Channel: Debugging – Ofek's Visual C++ stuff
Viewing all articles
Browse latest Browse all 16

Matlab’s mxArray Internals

$
0
0

Everything in Matlab is a Matrix. The scalar 4 is a 1×1 matrix with the single value 4. The string ‘asdf’ is a 4×1 (not a typo – it is in fact a column vector) matrix, with the 4 char values ‘a’, ‘s’, ‘d’, ‘f’, etc. When writing MEX functions in C/C++ (MEX = Matlab Extension), or when feeding data to Matlab-compiled components it is revealed that the underlying unified type for (almost) all Matlab data is the C type mxArray.

It seems that in the distant past (10Y+) Mathworks did deploy headers with the real type definition, but today mxArray is a completely opaque type – it is passed around only via pointers, and its only public declaration is a forward one, in matrix.h:

/**
 * Forward declaration for mxArray
 */
typedef struct mxArray_tag mxArray;

The only serious attempt I’m aware of to reverse mxArray’s layout is this 2000 user-group posting by Peter Boettcher. Twelve years later Peter Li published this work – which is very partial, relies on ‘circumstantial’ evidence and not on investigation of disassembly, and contains multiple errors. The memory layout of mxArray changed considerably since both works, and it is high time for a new investigation.

Below I hope to do more than spill out the results – rather lay out the complete way to improve this work and reproduce it in the future. (when mxArray’s layout changes and re-reversing is needed).

The Setup

Matlab installation includes the useful file  [matlabroot]/extern/examples/mex/explore.c, which demonstrates usage of most mxArray-poking API. To use it –

  1. Build it from the matlab prompt:
    >> mex -g ‘C:\Program Files\MATLAB\R2017a\extern\examples\mex\explore.c’
    (assuming matlabroot is ‘C:\Program Files\MATLAB\R2017a).
    This would create explore.mexw64 in your current folder – make sure you have write permissions to it. The -g switch generates debug symbols, to enable you to step through the generated code and watch variable contents in VS.
  2. Open explore.c in visual studio, set a breakpoint somewhere early at mexFunction():
  3. Attach to the running instance of matlab.
  4. From the matlab command prompt, create a variable of the type you wish to investigate and explore() it. E.g.:
    >> A=rand(3); explore(A)
  5. Step in VS and investigate the underlying mxArrays as detailed below. Repeat for various types.
    Expect to get obscure exceptions once in a while – they seem benign and handled internally by the jvm.

Start Simple

The first mx function called is mxGetNumberOfDimensions. It is the simplest getter possible, with two instructions:

This means that the mxArray member holding the number of dimensions is located at offset 24 (18 hex) from the mxArray start.

Similarly, mxGetPr disassembly shows that the real part of the data is at offset 56 (38 hex):

And the imaginary part of the data at offset 64 (40 hex):

mxIsSparse is an itzy bit more complicated:

Meaning, at offset 36 (24h) is a bit mask of size at least 8 (1 byte). The 5th bit from the right is a ‘sparse’ flag.

mxGetClassID has a little bit more complexity, but the basic location of the info is immediate:

The ClassID data is at offset 8 and it is of type mxClassID. The value 10h is the maximum allowed and values above it are subtracted by 17 before returning. Note that values above 16 are mentioned in matrix.h, but it seems these are not values you’d meet every day:

typedef enum {
    mxUNKNOWN_CLASS = 0,
...
    mxUINT64_CLASS,
    mxFUNCTION_CLASS, // = 16,
    mxOPAQUE_CLASS,
    mxOBJECT_CLASS, /* keep the last real item in the list */
#if defined(_LP64) || defined(_WIN64)
    mxINDEX_CLASS = mxUINT64_CLASS,
#else
    mxINDEX_CLASS = mxUINT32_CLASS,
#endif
    /* TEMPORARY AND NASTY HACK UNTIL mxSPARSE_CLASS IS COMPLETELY ELIMINATED */
    mxSPARSE_CLASS = mxVOID_CLASS /* OBSOLETE! DO NOT USE */
} mxClassID;

And so on and so forth. Much of mxArray’s contents are as apparent as in these examples, but not all. Hopefully these examples are enough to get a feel for the type of investigation required.

Conclusions (some, anyway)

  1. mxArray’s contents are very dense, and many of the members are unions – i.e., have different interpretations in different contexts. For one, dimensions: when number_of_dims (offset 24) is 2, they’re stored as two size_t members – which I called rowdim and coldim. When number_of_dims is greater than 2, the first member (rowdim) is actually a pointer to a heap array of dims, of size number_of_dims.
    The main data pointers (pData, pimag_data) are much larger unions – they can point to an array of anything, from doubles to complete mxArrays, depending on the mxClassID.
  2. Peter Boettcher’s 2000 work describes how Matlab uses ‘crosslinks’ between copies of the same variable to implement copy-on-write semantics. Some time later in the millennium Mathworks decided to trade space for speed, and added bi-directional such links, to save traversing the list of copies when one is changed: the layout as listed now includes both forward and backward crosslinks.
  3. Structs are rather complex. For a single struct –
    1. pData points to an array of mxArrays, containing the field values.
    2. The field names are accessed via 3 indirections:
      1. pimag_data points to a type I called struct_field_info.
      2. a member of struct_field_info points to another type I called struct_field_info_tag.
      3. a member of struct_field_info_tag (‘field_names’) is an array of pointers to null terminated ascii strings, holding the field names in order.
  4. Sparse matrices: the raw data is referenced by pData/pimag_data as for usual matrix. The data’s location is governed by the arrays irptr/jcptr, used to implement the Compressed-Sparse-Column storage and accessible directly via mxGetIr and mxGetJc. The size of these arrays (nnz) is stored at the member I called nelements_allocated.

Results

I imagine that even in the parts I did get right, the real mxArray headers greatly differ from my own. Since I’m interested only in watching mxArrays and not modifying them directly (as you should be too), I did not express, say, pData as a large union but rather as a single void*. The place where the content type manifests itself is inside the natvis file, in sections such as –

      <!--Dense Matrices-->
      <ArrayItems Condition="classID!=2 &amp;&amp; classID!=4 &amp;&amp; dataflags.sparse==false">
        <Direction>Backward</Direction>
        <Rank>number_of_dims</Rank>
        <Size Condition="number_of_dims==2">(&amp;rowdim)[$i]</Size>
        <Size Condition="number_of_dims!=2">((size_t*)rowdim)[$i]</Size>
        <ValuePointer Condition="classID==1">(mxArray_tag*)pData</ValuePointer>
        <ValuePointer Condition="classID==6">(double*)pData</ValuePointer>
        <ValuePointer Condition="classID==7">(float*)pData</ValuePointer>
        <ValuePointer Condition="classID==8">(char*)pData</ValuePointer>
        <ValuePointer Condition="classID==9">(unsigned char*)pData</ValuePointer>
        <ValuePointer Condition="classID==10">(short*)pData</ValuePointer>
        <ValuePointer Condition="classID==11">(unsigned short*)pData</ValuePointer>
        <ValuePointer Condition="classID==12">(int*)pData</ValuePointer>
        <ValuePointer Condition="classID==13">(unsigned int*)pData</ValuePointer>
        <ValuePointer Condition="classID==14">(__int64*)pData</ValuePointer>
        <ValuePointer Condition="classID==15">(unsigned __int64*)pData</ValuePointer>
      </ArrayItems>

Here’s a simple example of the resulting watches:

double_expand

 

While this work is partial – I stopped where it was useful enough to us – it might be of value as is. It is now on github, you’re welcome to use/improve/report issues. The license is as free as I know how to make it.  If you want to learn a bit more about the natvis syntax, here’s where to do it.

The easiest way to use it is to add mxArrayWatch.h and mxArrayWatch.natvis to your c++ project which uses mxArray’s.

MathWorks Plea

My guess is that you guys decided to hide mxArray’s layout after users wrote code that relied on undocumented specifics, the code broke when you upgraded the layout, and unnecessary burden on your support ensued. I can completely relate to that. However, I’m not sure you have a realistic picture of the price your customers pay.

Let’s take a far more ubiquitous runtime as an example – MS CRT. Microsoft have been exposing nearly all of its source – type internals and logic – for at least 15 years now. STL and CRT types do change layout in major versions, code which illegally relied on internal layout comes crumbling down, I’m sure their support suffers as a result – but I dare guess their support would be burdened considerably more had they not opened the CRT source. In real life, more often than not, you just have to peek in.

Encapsulation is a solid design principle – but when taken to extremes it fails. In general, you really need zero knowledge of internals only when everything 100% succeeds on your first attempt, which is never the case in real life projects. Just the other day we had a nasty crash with memory corruption on mxDestroyArray – and it turned out the issue in the code was that we updated the field ‘costGap’ instead of ‘CostGap’. There is no way in the world we would have been able to debug this without reverse engineering the mxArray layout.

If you take interfacing with C/C++ seriously, I urge you guys to reconsider. Please don’t force the community to reverse your stuff to be able to work with it.


Viewing all articles
Browse latest Browse all 16

Trending Articles