Extract better features RFC #362
Conversation
…multiscale LBP and depth processing
… RFC
- Add Quick Start section for immediate user adoption
- Introduce simplified configuration presets (basic, enhanced, complete)
- Add simplified feature names mapping (e.g., "lbp_texture" vs "lbp_texture_r2_p16")
- Include practical usage examples with minimal and advanced configurations
- Enhance TextureFeatureProcessor to support preset-based configuration
- Improve risk mitigation strategies for configuration complexity
- Maintain backward compatibility while reducing user complexity

The RFC now provides a much more user-friendly approach to adopting texture features while preserving the technical depth and comprehensive design.
scottcanoe
left a comment
Hi @OgnjenX, thank you for this detailed RFC! I think there are some pretty promising ideas here, and I'm looking forward to seeing how this shapes up!
Praise
- This RFC is solidly rooted in TBP's goals, and you've done a great job of interpreting the enhanced features objective as well as some of our other guidelines (e.g., our preferences regarding deep learning).
- Your design choices are principled (composition, easily unit-testable standalone functions, etc.) and well-suited to Monty.
- You've clearly thought a lot about how to go about implementing these features, not just in terms of the code, but also in terms of intermediate goals and a roadmap.
Suggestions
I've left a handful of suggestions in the comments, but I wanted to explain a bit more here at a higher level. The main challenge with this RFC is its length. It's pretty long and structurally complicated, and it'll be more inviting if you can streamline and reorganize it a bit. A few ideas:
- Consolidate sections where possible. As an example, see my comments about "Motivation"'s subsections and consolidating the "Research Questions" and "Research Direction Questions" subsections.
- There's a good amount of repetition. Composition is referenced 22 times (not including in code samples), and rotational invariance almost as much. I think it's good to stress these points earlier on, but I think we've got it after that.
- The team can correct me if I'm wrong on this, but we're generally more focused on concepts than implementation details in the earlier stages of an RFC. Your code looks great! But I wonder if there's a way to "hide" some of it so as not to get too distracted by it at this stage. For example, your points about feature dimensionality explosion are a lot more interesting than how you're thinking about configuration strategies. I also assume implementation strategies are subject to a lot of change after back-and-forth discussions.
- There are a whole lot of lists. I tend to glaze over them. I'm not sure what to suggest here except maybe push lower priority content into an appendix or something.
Discussion
Here's a handful of questions I had while reading through. Curious to hear everyone's thoughts.
- How sensitive is LBP to scale? A sensor patch will cover a much larger part of an object's surface if the agent is 20 cm from the object compared with 10 cm. In other words, if we've associated some LBP metrics with a given point when observed from 20 cm away, what happens when we observe the same point at a distance of 10 cm during inference? It doesn't seem like this would be too bad to test (a rough sketch of such a test follows this list).
- Similarly, how do we handle viewing an object's edge? I believe you touched on this, but I just wanted to echo your points about how we need to make some decisions about what to do when some of the image patch is off-object.
- In general, I'm very curious to see how LBP works in 3D. It seems really well-suited to 2D images. It may turn out that LBP makes a small amount of difference with 3D datasets but provides a huge boost in 2D. Perhaps even using a non-rotationally-invariant version in 2D would make sense, but I'd have to think about that more.
- What are your thoughts on the feasibility of either LBP or depth-based metrics for distant vs surface agents? My intuition is that the sensor patch would have to be surface-agent-close to the object for texture-level variations in depth to show up.
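To make the scale question concrete, here is a purely illustrative sketch of how it could be tested. The patch sizes, radii, and the skimage-based helper below are my own assumptions, not anything from the RFC; crops of different sizes around the same point stand in for observations from different distances.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.transform import resize

def lbp_histogram(patch: np.ndarray, n_points: int = 16, radius: int = 2) -> np.ndarray:
    """Histogram of rotation-invariant uniform LBP codes for a grayscale patch."""
    codes = local_binary_pattern(patch, n_points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=n_points + 2, range=(0, n_points + 2), density=True)
    return hist

def to_uint8(patch: np.ndarray) -> np.ndarray:
    """Convert a [0, 1] float patch to uint8 for LBP."""
    return (np.clip(patch, 0.0, 1.0) * 255).astype(np.uint8)

rng = np.random.default_rng(0)
surface = rng.random((128, 128))  # stand-in for a textured object surface

# Same point observed from ~20 cm (wide footprint) vs ~10 cm (narrow footprint),
# both resampled to the same 64x64 sensor patch.
far_patch = to_uint8(resize(surface[32:96, 32:96], (64, 64), anti_aliasing=True))
near_patch = to_uint8(resize(surface[48:80, 48:80], (64, 64), anti_aliasing=True))

# A large histogram distance would suggest the LBP features are scale-sensitive.
print(np.linalg.norm(lbp_histogram(far_patch) - lbp_histogram(near_patch)))
```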
And thanks again for the contribution! Looking forward to keeping tabs on the discussion as it progresses.
| - How should we handle processor lifecycle (creation, updates, cleanup)?
| - Does this approach align well with the existing NoiseMixin integration pattern?
|
| 2. **Feature Naming Convention**: Are the proposed multi-scale feature names intuitive?
suggestion: I noticed that TextureFeatureProcessor.extract_features looks for strings that start with depth_. IMO, this casts too wide a net. You're probably better off selecting a prefix and naming convention that's less likely to clash with other features (whether they exist yet or not). Something more like what you did with lbp_* would be safer.
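For illustration only, here is a small sketch of why a broad `depth_` prefix can over-match. The feature names below are hypothetical, not taken from the RFC.

```python
# Hypothetical feature names; only "depth_texture_*" is meant for the texture processor.
requested = ["hsv", "depth_min", "depth_texture_roughness", "lbp_texture_r2_p16"]

# Matching on "depth_" also captures unrelated depth statistics such as "depth_min".
assert [f for f in requested if f.startswith("depth_")] == ["depth_min", "depth_texture_roughness"]

# A narrower prefix only selects the intended texture features.
assert [f for f in requested if f.startswith("depth_texture_")] == ["depth_texture_roughness"]
```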
I agree.
| ## Questions for Community
|
| ### Technical Design Questions
| 1. **Composition Architecture**: Is the TextureFeatureProcessor composition approach optimal?
question/suggestion: Is there a reason to combine LBP- and depth-based texture processing into one interface (i.e., TextureFeatureProcessor)? Though they are both related to textured features, I think there are good reasons to keep them separate.
- I'll expound on this elsewhere, but I'm thinking LBP might actually have the most immediate and clear utility with 2D datasets like MNIST or Omniglot. In that case, we'd like to leave depth-based texture processing out of the equation.
- Correct me if I'm wrong, but LBP- and depth-based methods don't seem to need or make use of each other. I'd suggest making separate processor interfaces to make it easier to mix and match different implementations as needed. I'm guessing others will want to weigh in about separation of concerns as well.
I see you've got "Cross-modal RGB-depth texture integration" in Phase 4 of your plan, in which case you might want to have both methods inform each other in some way. Still, I think something like that is a ways off, and you can always experiment with some hybrid approach later.
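To make the suggestion concrete, here's a rough sketch of what separate interfaces could look like. Class names, feature keys, and the placeholder statistics are hypothetical, not the RFC's API.

```python
from typing import Protocol
import numpy as np

class FeatureProcessor(Protocol):
    """Minimal contract shared by feature processors."""
    def extract(self, patch: np.ndarray) -> dict[str, np.ndarray]: ...

class LBPTextureProcessor:
    """RGB/grayscale texture features; usable on 2D datasets (e.g., MNIST) on its own."""
    def extract(self, patch: np.ndarray) -> dict[str, np.ndarray]:
        gray = patch.mean(axis=-1) if patch.ndim == 3 else patch
        # Placeholder statistic standing in for a real LBP histogram.
        return {"lbp_texture": np.array([gray.std()])}

class DepthTextureProcessor:
    """Surface-texture statistics from a depth patch; an optional, independent add-on."""
    def extract(self, patch: np.ndarray) -> dict[str, np.ndarray]:
        residual = patch - patch.mean()
        return {"depth_texture_roughness": np.array([residual.std()])}

# Mix and match per dataset; neither processor needs to know about the other.
processors_2d: list[FeatureProcessor] = [LBPTextureProcessor()]
processors_3d: list[FeatureProcessor] = [LBPTextureProcessor(), DepthTextureProcessor()]
```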
You are right. It is probably better to keep them as separate interfaces.
| - **Single-pixel sampling**: Only extracts RGBA/HSV from the center pixel (1/4096 pixels used)
| - **No texture information**: Cannot distinguish between smooth vs. textured surfaces with identical colors
| - **Basic depth statistics**: Only min/mean depth, missing surface texture information
| - **No rotation invariance**: Current features change with sensor orientation
Which ones?
|
| The documentation specifically requires **rotation invariant** features that can detect textured patterns regardless of sensor orientation.
|
| ### Research Questions
suggestion: Good questions, but we're still getting a handle on what the main proposal is at this point in the document. There's also another section called Research Direction Questions. I'd recommend consolidating the two sections, and probably holding off on any non-essential discussion points until much later. For example, HTM Spatial Pooling is discussed as a future possibility, but it's not core to the proposal here like LBP or the depth-based method you present.
At this point in the document, I think the main research questions are simply how much improvement we might see with the extracted features you're suggesting, and how robust these features are to noise, scale, etc.
Good point, I will improve that in my next commit.
| - **No rotation invariance**: Current features change with sensor orientation
| - **Limited cross-modal integration**: RGB and depth features are processed independently
|
| ### Problem Statement
nitpick/suggestion: Problem Statement seems to logically precede Current Limitations while also being a bit redundant with the Current Limitations section. In my opinion, the Motivation section doesn't need subsections, and the simplification and removal of redundant content are going to make it easier to hold onto the thread. (Also see the next comment about the Research Questions subsection.)
| 1. **Texture Processing Utilities**: Standalone functions in `src/tbp/monty/frameworks/utils/texture_processing.py`
| 2. **Texture Feature Processor**: Dedicated `TextureFeatureProcessor` class that handles all texture extraction logic
| 3. **Composition Integration**: Sensor modules use composition ("has-a" relationship) to incorporate texture processing
| 4. **Broad Applicability**: Not limited to Habitat-specific sensor modules, works with any RGBA+depth data
| 5. **Clean Architecture**: Avoids multiple inheritance complexity and method resolution order issues entirely
| 6. **Consistent with User Preferences**: Follows the established preference for composition over mixin patterns
suggestion: Bullet points 3, 5, and 6 are pretty much saying the same thing. Recommend consolidation.
|
| ### Architecture Overview
|
| The implementation follows a **pure composition approach** that can be integrated into any sensor module:
suggestion: As a reader, it'd be really helpful if you'd provide more of a high-level description before getting to lists or bullet points. For example, this section could start with something like
To utilize the new features suggested in this RFC, we must first enable sensor modules to extract them. We propose a compositional approach wherein each sensor module owns a set of feature-specific data processors that can be configured and utilized as needed. These feature-specific data processors are intended to serve as a simple interface to standalone functions that do not require the attributes or methods of an owning sensor module. As such, we can avoid complex inheritance patterns incurred by a subclass- or mixin-based approach by preferring a compositional strategy.
I find something like the above easier to follow than a list since it walks you through the logic a bit.
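For what it's worth, a bare-bones sketch of that compositional wiring might look like the following. The class name and the commented usage are illustrative only and assume the hypothetical processors sketched earlier in this thread.

```python
import numpy as np

class TextureAwareSensorModule:
    """A sensor module that owns ("has-a") a configurable set of feature processors."""

    def __init__(self, processors):
        self.processors = processors  # composition instead of mixins or subclassing

    def extract_features(self, patch: np.ndarray) -> dict[str, np.ndarray]:
        features: dict[str, np.ndarray] = {}
        for processor in self.processors:
            features.update(processor.extract(patch))
        return features

# Usage (with the hypothetical processors from the earlier sketch):
# sensor = TextureAwareSensorModule([LBPTextureProcessor(), DepthTextureProcessor()])
# features = sensor.extract_features(patch)
```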
…P and depth processing, make rfc more concise
Hi @scottcanoe, thank you for the detailed review. I will also provide my view on the questions you raised:
You're right that this is a key challenge. When the agent moves from 20 cm to 10 cm, the same sensor patch will capture very different texture granularity, from seeing multiple texture elements to seeing fine details within a single element.
Completely agree, this needs careful handling.
I completely agree with your intuition here. LBP will likely show different effectiveness:
Your intuition perfectly aligns with mine. Surface agents are the primary target:
RFC for an initial improvement related to the problem explained here.