
Conversation

@vkakerbeck
Contributor

For a while now, we've been logging "confused" in the stepwise performance column when no semantic sensor is used. This happened because we set the on-object map (semantic) to 0s and 1s when we have no semantic sensor data (which became the default last year when @scottcanoe cleaned up that code for the DMC paper). However, when the stepwise_performance logging code looked at those values, it interpreted the 1s as object ID 1 (i.e. the first object in the dataset). So unless the LM actually recognized object 1, it would log "confused" (and incorrectly logged "correct" if the target wasn't actually the first object).

This PR fixes this issue (although maybe not in the most beautiful way) by setting the semantic values to a large number that wouldn't be in the semantic_id_to_label dict (setting it to np.inf doesn't work for various reasons, and a negative value seemed like it would introduce more confusion). I also added a line that correctly logs the overall no_label performance (which previously didn't happen).
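For reference, here's a minimal sketch of the idea. Note that `UNKNOWN_OBJECT_ID`, `build_on_object_map`, and the toy `semantic_id_to_label` mapping are illustrative names only, not the exact identifiers in tbp.monty:

```python
import numpy as np

# Illustrative sketch only: UNKNOWN_OBJECT_ID and this toy
# semantic_id_to_label mapping are NOT the exact identifiers in tbp.monty.
UNKNOWN_OBJECT_ID = 10_000  # sentinel assumed to never be a real semantic ID

semantic_id_to_label = {1: "mug", 2: "bowl"}  # toy mapping of real object IDs

def build_on_object_map(depth, max_depth=1.0):
    """Mark on-object pixels with the sentinel instead of 1."""
    on_object = depth < max_depth
    return np.where(on_object, UNKNOWN_OBJECT_ID, 0)

depth = np.array([0.2, 0.5, 2.0])  # two on-object pixels, one background
semantic = build_on_object_map(depth)

# With the old 0/1 encoding, the logger would look up ID 1 and conclude the
# ground-truth object is "mug"; the sentinel instead falls through to a
# no-label default.
labels = [semantic_id_to_label.get(int(s), "no_label") for s in semantic if s != 0]
```

The point is that any dict lookup on the sentinel misses, so the logger can fall back to "no_label" rather than treating every on-object pixel as the first object.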

@hlee9212 it would be nice to use this updated version in your demo so people aren't confused why the .csv files show "confused" in the second column if Monty correctly recognized the object. I'll try to merge the PR before then.

I spot-checked benchmarks, but since this only affects the stepwise performance logging, which we don't report in the benchmarks, it's not expected to have any effect.

@vkakerbeck vkakerbeck changed the title Fix stepwise performance logging fix!: Fix stepwise performance logging Dec 15, 2025
@jeremyshoemaker jeremyshoemaker added the triaged This issue or pull request was triaged label Dec 15, 2025
Contributor

@scottcanoe scottcanoe left a comment


note: It makes me slightly nervous to modify semantic_3d to do this, but I couldn't find anywhere in tbp.monty or my experiment code that distinguishes between nonzero values when use_semantic_sensor is False. And other solutions seem pretty complicated, so I get it.

suggestion: Drop a line in the DepthTo3DTransforms indicating this behavior. Maybe line 387.

thought: I bet habitat returns uint8 for semantic. It's a long shot, but 10_000 > 2**8, so I did wonder whether there's anywhere it could cause overflow. I don't think so, though.
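To illustrate the overflow concern: if the semantic map were ever cast to `uint8`, a 10_000 sentinel would silently wrap to a small value that could collide with a real object ID. This is a standalone NumPy sketch of generic casting behavior, not code from this PR:

```python
import numpy as np

# The sentinel survives in NumPy's default integer dtype...
semantic = np.array([10_000, 0])
assert semantic.dtype != np.uint8

# ...but an (unsafe, C-style) cast to uint8 wraps modulo 256:
# 10_000 % 256 == 16, which would look like the semantic ID of a
# real object to the logger.
wrapped = semantic.astype(np.uint8)
print(wrapped[0])  # 16
```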

@vkakerbeck
Contributor Author

Yes, I know. I looked into a lot of other options, but the use_semantic_sensor setting is so isolated from the config info that Monty or even the experiment gets that I couldn't see another way that wasn't super hacky to check for this. How I'm conceptualizing it now is that 10000 basically represents the "unknown_object" ID, i.e. one that would not be in semantic_id_to_label (unless we have 10000 distinct objects).

@tristanls-tbp
Contributor

issue: This will become a problem in the future, as it assumes that 10,000 is large enough. So when it isn't, we are back to the same problem this pull request is intended to fix. Can we find a permanent solution?

@tristanls-tbp
Contributor

note: I'm still having trouble finding where the logging is going wrong. If the desired state is not to log "confused" when the DepthTo3DTransform does not use a semantic sensor, then that seems like it requires a logging configuration to tell the logger the semantic sensor is not being used. Either way there is coupling. However, with a configuration param, the coupling is explicit in the configuration, whereas with this pull request, the coupling is a magic number passing through the data path (in essence, we are changing what Monty observes so that a logger logs correctly).

@vkakerbeck
Contributor Author

Currently, we basically set all on-object pixels to semantic ID 1 (the ID of the first object in the list) when we don't use the semantic sensor. Setting it to a value other than 1 (and one that is not defined for any of the other objects) is already an improvement. I agree that it would be cleanest to have a check like `if use_semantic_sensor is False` in the logging code instead (or in addition). But I couldn't find a good way to do this since the transform parameters are so isolated from the rest of the code. If you have a good suggestion, let me know.
For now, I think this is better than what we had before (i.e. still using a magic number, but one that doesn't have a double meaning). If we want to be safer, I can change the number to be larger, like 9999999999. It's pretty unlikely that we will ever have an experiment with that many objects. FWIW, I initially tried setting it to np.inf, but that doesn't work with some of the type conversions later in the code.
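On the np.inf point, here's a quick standalone check of why infinity breaks under integer conversion while a large finite sentinel doesn't. This shows generic Python/NumPy behavior, not the exact code path in Monty:

```python
import numpy as np

# Python's int() refuses to convert infinity outright, so any explicit
# int conversion along the data path would crash on an np.inf sentinel.
try:
    int(np.inf)
    inf_converts = True
except OverflowError:
    inf_converts = False

# A large finite sentinel, by contrast, round-trips losslessly through
# float64 (it is well below 2**53, the float64 integer-precision limit).
sentinel = 9_999_999_999
assert int(float(sentinel)) == sentinel
```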
