Extract better features RFC #362

OgnjenX · 2025-07-06T18:02:42Z

RFC for initial improvement related to the problem explained here.

…multiscale LBP and depth processing

… RFC - Add Quick Start section for immediate user adoption - Introduce simplified configuration presets (basic, enhanced, complete) - Add simplified feature names mapping (e.g., "lbp_texture" vs "lbp_texture_r2_p16") - Include practical usage examples with minimal and advanced configurations - Enhance TextureFeatureProcessor to support preset-based configuration - Improve risk mitigation strategies for configuration complexity - Maintain backward compatibility while reducing user complexity The RFC now provides a much more user-friendly approach to adopting texture features while preserving the technical depth and comprehensive design.

scottcanoe

Hi @OgnjenX, thank you for this detailed RFC! I think there are some pretty promising ideas here, and I'm looking forward to seeing how this shapes up!

Praise

This RFC is solidly rooted in TBP's goals, and you've done a great job of interpreting the enhanced features objective as well some of our other guidelines (e.g., our preferences regarding deep learning).
Your design choices are principled (composition, easily unit-testable standalone functions, etc.) and well-suited to Monty.
You've clearly thought a lot about how to go about implementing these features, not just in terms of the code, but also in terms of intermediate goals and a roadmap.

Suggestions
I've left a handful of suggestions in the comments, but I wanted to explain a bit more here on a higher-level. The main challenge with this RFC is its length. It's pretty long and structurally complicated, and it'll be more inviting if you can streamline and reorganize it a bit. A few ideas:

Consolidate sections where possible. As an example, see my comments about "Motivation"'s subsections and consolidating the "Research Questions" and "Research Direction Questions" subsections.
There's a good amount of repetition. Composition is referenced 22 times (not including in code samples), and rotational invariance almost as much. I think it's good to stress these points earlier on, but I think we've got it after that.
The team can correct me if I'm wrong on this, but we're generally more focused on concepts than implementation details in the earlier stages of an RFC. Your code look great! But I wonder if there's a way to "hide" some of it so as not to get too distracted by it at this stage. For example, your points about feature dimensionality explosion are a lot more interesting than how you're thinking about configuration strategies. I also assume implementation strategies are subject to a lot of change after back-and-forth discussions.
There are a whole lot of lists. I tend to glaze over them. I'm not sure what to suggest here except maybe push lower priority content into an appendix or something.

Discussion
Here's a handful of questions I had while reading through. Curious to hear everyones thoughts.

How sensitive is LBP to scale? A sensor patch will cover a much larger part of an object's surface if the agent is 20 cm from the object compared with 10 cm. In other words, if we've associated some LBP metrics with a given point when observed from when 20 cm away, what happens when we observe the same point at a distance of 10 cm during inference? It doesn't seem like this would be too bad to test.
Similarly, how do we handle viewing an object's edge? I believe you touched on this, but I just wanted to echo your points about how we need to make some decisions about what to do when some of the image patch is off-object.
In general, I'm very curious to see how LBP works in 3D. It seems really well-suited to 2D images. It may turn out that LBP makes a small amount of difference with 3D datasets but provides a huge boost in 2D. Perhaps even using a non-rotationally-invariant version in 2D would make sense, but I'd have to think about that more.
What are your thoughts on the feasibility of either LBP or depth-based metrics for distant vs surface agents? My intuition is that the sensor patch would have to be surface-agent-close to the object for texture-level variations in depth to show up.

And thanks again for the contribution! Looking forward to keeping tabs on the discussion as it progresses.

scottcanoe · 2025-07-10T03:17:44Z