Contributor

@OgnjenX OgnjenX commented Jun 30, 2025

RFC proposal for this feature explained in docs

@OgnjenX OgnjenX changed the title from "feat: add RFC for unsupervised object ID association in cross-modal learning" to "RFC: Generalize Voting to Associative Connections" Jun 30, 2025
@tristanls tristanls added the rfc:proposal and triaged labels Jul 1, 2025
@nielsleadholm nielsleadholm self-assigned this Jul 3, 2025
@nielsleadholm nielsleadholm self-requested a review July 3, 2025 17:02
Contributor

@nielsleadholm nielsleadholm left a comment

Thanks again @OgnjenX for putting this together. The majority of my high-level comments are in PR #358, but I've added a few comments here for points that were only in the RFC.

2. **Probabilistic Vote Mapping**
- Map incoming votes to local object IDs using learned associations
- Weight votes by association confidence
- Handle uncertainty in associations gracefully
Contributor

question: Can you please clarify what this means exactly? What kind of uncertainty are you anticipating?

Contributor Author

The “uncertainty” refers to noisy or conflicting association signals—e.g., multiple LMs proposing different object IDs with similar confidence, or spurious co-occurrences when objects briefly overlap. In the implementation (UnsupervisedAssociator in src/tbp/monty/frameworks/models/unsupervised_association.py), we address this by maintaining a decayed confidence history (AssociationData.update_confidence() / get_average_confidence()), computing spatial and temporal consistency scores, and combining all of these into a weighted strength via get_association_strength(). An association is only used once its strength clears min_association_threshold, and it keeps decaying if new evidence doesn’t arrive. Instead of locking onto the first co-occurrence, we continuously re-score the link and only forward votes when the accumulated evidence is sufficiently strong.
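
For readers following along, here is a minimal sketch of the gating idea described above. It is not the code from the PR: `ToyAssociation`, `weighted_vote`, the decay factor, the weights, and the default threshold are all hypothetical stand-ins chosen for illustration, and the actual scoring in `UnsupervisedAssociator` may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAssociation:
    """Hypothetical link between a remote object ID and a local object ID."""

    decay: float = 0.9  # assumed per-step decay factor
    confidence: float = 0.0
    history: list = field(default_factory=list)

    def step(self, evidence: float = 0.0) -> None:
        # Decay the old confidence, then add any new evidence (capped at 1.0).
        self.confidence = min(1.0, self.confidence * self.decay + evidence)
        self.history.append(self.confidence)

    def strength(self, spatial: float, temporal: float,
                 weights=(0.5, 0.25, 0.25)) -> float:
        # Weighted sum of confidence and two consistency scores, each in [0, 1].
        w_conf, w_spatial, w_temporal = weights
        return (w_conf * self.confidence
                + w_spatial * spatial
                + w_temporal * temporal)


def weighted_vote(vote: float, assoc: ToyAssociation, spatial: float,
                  temporal: float, min_threshold: float = 0.3):
    """Scale a vote by association strength, or drop it if the link is weak."""
    s = assoc.strength(spatial, temporal)
    return vote * s if s >= min_threshold else None
```

In this sketch a link that stops receiving evidence drifts back below the threshold after enough calls to `step()`, so its votes stop being forwarded until new co-occurrences arrive.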

Contributor

Thanks for clarifying.


1. **Multi-Modal Hypothesis Clustering**
- Group hypotheses from different LMs based on spatial/temporal consistency
- Use clustering to identify likely same-object hypotheses
Contributor

question: Can you please clarify what you mean by clustering here? This is not a term we've used before, so I'm wondering whether the proposal is to change how votes are processed by LMs.

Contributor Author

The term "clustering" in the RFC was a bit of an overstatement. It refers to how we group temporally related associations in the `_calculate_temporal_clustering()` function. Instead of traditional clustering, we analyze how densely associations occur in time. For example, if multiple LMs report the same object ID in quick succession, we consider that a stronger signal than sporadic reports. The implementation in `UnsupervisedAssociator` uses a simple density metric (number of associations per time unit) to score this temporal grouping, which is then factored into the overall association strength. Would you prefer we update the RFC to use "temporal grouping" instead of "clustering" to avoid confusion?
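
To make the density idea concrete, here is a rough sketch under stated assumptions. `temporal_density` and its `window` parameter are hypothetical names for illustration only; the real `_calculate_temporal_clustering()` may normalize or window the events differently.

```python
def temporal_density(timestamps, window):
    """Association events per time unit within the most recent window.

    `timestamps` is any iterable of event times; `window` is the length of
    the look-back interval in the same units (both are assumed inputs).
    """
    timestamps = list(timestamps)
    if not timestamps or window <= 0:
        return 0.0
    latest = max(timestamps)
    # Keep only the events that fall inside the look-back window.
    recent = [t for t in timestamps if latest - t <= window]
    return len(recent) / window
```

Under this metric, four reports at times 0, 1, 2, and 3 score higher with a window of 4 than three reports spread over times 0, 50, and 100, which matches the "quick succession beats sporadic reports" intuition.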

Contributor

Thanks, that makes sense. Yes, I think "temporal consistency" is already clear enough, so I don't think you need this line.

Contributor Author

OgnjenX commented Sep 24, 2025

> Thanks again @OgnjenX for putting this together. The majority of my high-level comments are in PR #358, but I've added a few comments here for points that were only in the RFC.

I made changes in this commit that try to improve the RFC's clarity in response to the questions you had.

2. **Temporal Sequence Learning**: Dynamic object and scene understanding
3. **Language Grounding**: Associate learned words with grounded objects
-4. **Advanced Clustering**: More sophisticated hypothesis grouping algorithms
+4. **Richer Hypothesis Grouping**: Explore graph-based or probabilistic grouping atop the learned association strengths once the foundational pipeline is validated
Contributor

question: Can you please provide a sentence or two on what you mean by graph-based or probabilistic grouping here?

Contributor Author

The idea is that if objects A and B appear at the same time, and B and C appear at the same time, it may make sense to keep a (small) possibility that A and C belong to the same 'scene' even though they never appeared together.
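
One way to picture this, purely as an illustration and not something the RFC specifies: treat the learned association strengths as a weighted graph and give A and C a damped two-hop score through any shared neighbor. The function name `indirect_strength`, the nested-dict layout, and the `damping` value are all assumptions.

```python
def indirect_strength(direct, a, c, damping=0.5):
    """Best damped two-hop association strength between `a` and `c`.

    `direct` maps each object ID to a dict of {neighbor_id: strength} for
    objects that were actually observed together (hypothetical layout).
    """
    best = 0.0
    for b, s_ab in direct.get(a, {}).items():
        s_bc = direct.get(b, {}).get(c, 0.0)
        # Damping keeps inferred links weaker than directly observed ones.
        best = max(best, damping * s_ab * s_bc)
    return best


# A and C never co-occurred, but both co-occurred with B.
direct = {
    "A": {"B": 0.8},
    "B": {"A": 0.8, "C": 0.6},
    "C": {"B": 0.6},
}
small_link = indirect_strength(direct, "A", "C")
```

Here `small_link` is nonzero but smaller than either observed strength, which is the "small possibility" described above; objects with no shared neighbor get 0.0.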
