
Add coordinate conventions documentation#24

Closed
jungerm2 wants to merge 6 commits into main from coords

Conversation

@jungerm2
Member

@jungerm2 jungerm2 commented Jan 16, 2026

Add some docs for the different camera conventions.

@shantanu-gupta It seems tform_camcoord_gl2bl is actually using the wrong coordinates. Converting between OpenCV and OpenGL would use the matrix opencv_to_opengl = np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]), whereas it seems we've swapped the Y/Z axes, which is non-standard. See here too.
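
For reference, here is a quick check of what the OpenCV/OpenGL matrix quoted above does to the camera axes. This is only a sketch using NumPy: the variable name follows the comment above, the axis descriptions are the usual OpenCV and OpenGL camera conventions, and nothing here is taken from the visionsim code.

```python
import numpy as np

# Matrix quoted above: flips Y and Z, leaves X unchanged.
opencv_to_opengl = np.array([
    [1, 0, 0, 0],
    [0, -1, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, 1],
])

# OpenCV camera frame: +X right, +Y down, +Z forward.
# Directions are written in homogeneous form with w = 0.
down_cv = np.array([0, 1, 0, 0])
forward_cv = np.array([0, 0, 1, 0])

down_gl = opencv_to_opengl @ down_cv        # -Y in OpenGL, which is still "down"
forward_gl = opencv_to_opengl @ forward_cv  # -Z in OpenGL, which is still "forward"

# The matrix is its own inverse, so it converts in both directions.
is_involution = np.array_equal(opencv_to_opengl @ opencv_to_opengl, np.eye(4))
```

Because this matrix only negates two axes, it cannot turn a +Y-up frame into a +Z-up frame; that requires an axis swap, which is a different conversion.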


📚 Documentation preview 📚: https://visionsim--24.org.readthedocs.build/en/24/

@jungerm2
Member Author

jungerm2 commented Jan 16, 2026

Hm, I've added an integration to show the docs once built for a PR, but it seems to be only half working for now.
In the meantime here's the added page: https://visionsim--24.org.readthedocs.build/en/24/sections/conventions.html

EDIT: Link was added to the previous comment by workflow 🚀

@jungerm2
Member Author

We'll need to fix this coordinate convention discrepancy and update the docs too, they currently render weirdly.

@shantanu-gupta
Contributor

shantanu-gupta commented Jan 16, 2026

@jungerm2 That function does not convert OpenGL coordinates to OpenCV's convention, only to Blender's. I think it is correctly written for that case...?

EDIT: Maybe the misconception is that OpenGL and Blender have the same coordinate convention, but they don't. Blender has +Z going upwards but OpenGL has +Y upwards.

@shantanu-gupta
Contributor

> We'll need to fix this coordinate convention discrepancy and update the docs too, they currently render weirdly.

Will look into this.

@jungerm2
Member Author

The documentation you linked to above is for the world coordinate frame, not the camera's local system. In Blender, if you change the transform orientation to local (there's a dropdown in the header), then select a camera and hit "G" to grab it, then hit, say, "Z", you'll find that the axis does line up with the camera's optical axis. If you then type "+1", the camera will back up, confirming that +Z points away from the viewing direction. The same holds for X/Y. Try it.

@shantanu-gupta
Contributor

shantanu-gupta commented Jan 26, 2026

Does the transforms.json file represent the orientation that way then? I get the following T_wc matrix (values rounded to 2 digits) for the first frame of the classroom scene in its transforms.json after rendering (the camera is pointing right-side-up with gravity downwards):

...
"transform_matrix":
[[ 0.88,  0.07,  0.46,  2.57],
 [ 0.47, -0.14, -0.87, -4.46],
 [-0.00,  0.99, -0.16,  1.09],
 [ 0.00,  0.00,  0.00,  1.00]]
...

The gravity vector would be in the [0, -1, 0] direction in the camera frame per OpenGL convention, but gets mapped to the -Z or [0, 0, -1] direction in the world frame by this matrix. That is consistent with the convention written in the tform_camcoord_gl2bl code, and also with the "World frame" description in the NerfStudio link you added at the start.
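
The mapping described above can be checked numerically. The following is a minimal sketch using NumPy and the rounded matrix quoted in this comment; the variable names are illustrative and not part of the codebase.

```python
import numpy as np

# Rounded transform_matrix from the comment above (camera-to-world).
T_wc = np.array([
    [0.88, 0.07, 0.46, 2.57],
    [0.47, -0.14, -0.87, -4.46],
    [-0.00, 0.99, -0.16, 1.09],
    [0.00, 0.00, 0.00, 1.00],
])
R_wc = T_wc[:3, :3]

# "Down" in an OpenGL-style camera frame (+Y up), as a unit direction.
down_cam = np.array([0.0, -1.0, 0.0])

# Rotate into the world frame; translation does not apply to directions.
down_world = R_wc @ down_cam
print(down_world)
```

The result is the negated second column of the rotation, roughly [-0.07, 0.14, -0.99], which is close to the world -Z direction.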

It might be worth renaming this function if it makes its purpose clearer: we really just want the coordinate conventions on both sides of the transform matrix to be the same, that's all. It can be the "World frame" convention (as it currently implements) or OpenGL, either is fine.

@shantanu-gupta
Contributor

shantanu-gupta commented Jan 26, 2026

Updated the code to fix the doc rendering issue, and the description based on my comment above. Also renamed the function itself.

@jungerm2
Member Author

I think the confusion is about global/local coordinate frames. In Blender, a camera's local coordinate frame follows OpenGL (with -Z pointing away from the camera), but the camera's pose matrix as reported by "transform_matrix" is of course in Blender's world frame (otherwise it would be constant), which does indeed have +Z pointing up, as seen in the gizmo here:
image

@jungerm2
Member Author

Consider the following:
image

The camera is placed roughly such that it's looking straight down the +Y global axis, which in its local frame is -Z, so it makes sense that the 3rd column of the camera matrix is -Y, no?

I guess I'm still not sure I understand why this conversion is needed?
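
To make the example above concrete, here is a small sketch of the idealized camera-to-world rotation for a camera looking exactly down world +Y with world +Z as its up direction. It assumes an OpenGL-style local camera frame (+X right, +Y up, -Z along the viewing direction); the variable names are illustrative.

```python
import numpy as np

# Blender world axes (+Z up).
X, Y, Z = np.eye(3)

# Camera local axes expressed in world coordinates.
view_dir = Y        # camera looks down world +Y
right = X           # local +X
up = Z              # local +Y
back = -view_dir    # local +Z points opposite the viewing direction

# Columns of the camera-to-world rotation are the local axes in world coordinates.
R_wc = np.column_stack([right, up, back])
print(R_wc[:, 2])   # third column: local +Z expressed in the world frame
```

The third column comes out as world -Y, as stated above, and the matrix is a proper rotation (determinant +1).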

@shantanu-gupta
Contributor

shantanu-gupta commented Jan 28, 2026

I suppose I can walk through my train of thought when implementing emulate.IMU.

Let's assume the OpenGL convention for coordinate axes (this choice doesn't matter), so we have +Y up and +Z coming out of the image at the origin. The gravity vector in this frame of reference is g_r_gl := [0, -9.8, 0] in m/s^2, representing a vector in the reference coordinate frame according to the OpenGL convention. [Using the term "reference coordinate frame" to avoid confusion with Blender's "world coordinate frame", which we'll see in a bit.]

At any time, I would need to transform this vector to the local frame of reference to simulate an accelerometer measurement in the camera frame. How do I do that? I take the transform_matrix at that time from the transforms.json. Let's say the camera is also at the origin and has the same canonical orientation; then I would expect the transform_matrix to be T_rc_gl_gl = eye(4), so the gravity vector remains the same.

But what will happen with this convention mismatch is that the transforms.json will give me T_rc_w_gl = [[1 0 0 0]; [0 0 -1 0]; [0 1 0 0]; [0 0 0 1]]. We can check that the true gravity vector in Blender's world frame (+Z up) is [0, 0, -9.8], and this matrix will indeed transform the original [0, -9.8, 0] to that, so there's nothing wrong from the viewpoint of the transforms.json generator.

But if I don't know about this convention difference or don't explicitly compensate for it, I will interpret the matrix as the camera being rotated to a different orientation. Then the same gravity vector will look like it corresponds to g_c_gl_computed = T_rc_w_gl(1:3,1:3)' * g_r_gl = [0, 0, 9.8], suggesting that gravity points out of the image when it should actually point downwards. Then the IMU data we generate will also be wrong (both accelerometer and gyroscope, since the rotation axis will also get similarly mangled; EDIT: only accelerometer actually, angular velocity in the camera frame should still be fine).

Somewhere we need to explicitly handle this convention difference, so that we do not misinterpret the transform_matrix.
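
The walk-through above can be reproduced in a few lines of NumPy. This is a sketch under the same assumptions (camera at the origin in canonical orientation). The constant `BL_WORLD_TO_GL` is a hypothetical name for the fixed change of convention from a +Z-up world to a +Y-up reference frame; it illustrates the idea and is not the actual implementation of tform_camcoord_gl2bl.

```python
import numpy as np

# Gravity in the OpenGL-style reference frame (+Y up), in m/s^2.
g_r_gl = np.array([0.0, -9.8, 0.0])

# Rotation part of T_rc_w_gl from the comment above: the camera is in its
# canonical orientation, but the world side uses Blender's +Z-up convention.
R_rc_w_gl = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
])

# Read as intended (camera frame -> Blender world), gravity lands on world -Z.
g_world = R_rc_w_gl @ g_r_gl            # [0, 0, -9.8]

# Misread as a pose between two OpenGL-convention frames, gravity appears
# to point out of the image.
g_c_wrong = R_rc_w_gl.T @ g_r_gl        # [0, 0, 9.8]

# Hypothetical compensation: map (x, y, z) in the +Z-up world to
# (x, z, -y) in the +Y-up reference frame before interpreting the pose.
BL_WORLD_TO_GL = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, -1.0, 0.0],
])
R_rc_gl_gl = BL_WORLD_TO_GL @ R_rc_w_gl  # identity for the canonical camera
g_c = R_rc_gl_gl.T @ g_r_gl              # [0, -9.8, 0], gravity points down again
```

With the convention change applied on the world side, both sides of the transform use the same axes, and the canonical camera recovers the expected identity rotation.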

@shantanu-gupta
Contributor

shantanu-gupta commented Jan 28, 2026

Although in the end this is not really a detail the end-user needs to care about; we only use this conversion as an implementation detail in emulate.imu. We just need to be clear in the CLI about what coordinate convention is being used when specifying the gravity direction.

@jungerm2 jungerm2 mentioned this pull request Feb 5, 2026
6 tasks
@jungerm2
Member Author

This branch has been merged into this PR so I'm closing this issue. I've moved tform_camcoord_gl2bl into imu.py and made it private, linked to this PR and removed the (now empty) pose utils file.

@jungerm2 jungerm2 closed this Feb 12, 2026
@jungerm2 jungerm2 deleted the coords branch February 12, 2026 17:52