

@ramyamounir ramyamounir commented Jan 15, 2026

This is the second PR in the implementation of the burst sampling hypothesis updater. It introduces two main features:

  • Decoupling the resampling parameters: Previously, the parameters were coupled so that the hypothesis space size stayed constant, i.e., we resampled the same number of hypotheses that we deleted every step. Here we decouple this behavior so that the hypothesis space size can change over time. Additionally, a HypothesesSelection container class is introduced to refer to a subset of hypotheses across different parts of the code.
  • Burst sampling: Decoupling the parameters allowed for redesigning the heuristics that decide which hypotheses are deleted, resampled, or maintained. We delete hypotheses that fail to accumulate enough evidence over a few steps (as measured by their smoothed evidence slope). Sampling happens in bursts with a set step duration: if the best hypothesis is not able to accumulate enough evidence, we trigger a burst of hypothesis sampling for all objects. Since this mechanism can leave some unlikely object hypothesis spaces with zero hypotheses, the LM, GSG, and channel mapper were modified to handle that scenario.

Additional, more detailed context can be found in this PR description.
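The deletion and burst-trigger heuristics described above can be sketched roughly as follows. All names and thresholds here are illustrative, not this PR's actual API:

```python
import numpy as np

# Illustrative constant; the real updater derives its thresholds from config.
SLOPE_THRESHOLD = 0.01  # minimum smoothed evidence slope to keep a hypothesis

def select_hypotheses(smoothed_slopes: np.ndarray) -> dict:
    """Split a hypothesis space into ids to maintain and ids to delete,
    based on each hypothesis's smoothed evidence slope."""
    maintain_ids = np.flatnonzero(smoothed_slopes >= SLOPE_THRESHOLD)
    delete_ids = np.flatnonzero(smoothed_slopes < SLOPE_THRESHOLD)
    return {"maintain": maintain_ids, "delete": delete_ids}

def should_trigger_burst(best_hypothesis_slope: float) -> bool:
    """Trigger a burst of sampling for all objects when even the best
    hypothesis is not accumulating enough evidence."""
    return best_hypothesis_slope < SLOPE_THRESHOLD
```

Note that because deletion and resampling are decoupled, an object whose hypotheses are all deleted ends up with an empty hypothesis space until the next burst, which is the scenario the LM/GSG/channel-mapper changes handle.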

Benchmarks

We will not be updating any benchmark results (other than unsupervised inference) because this PR doesn't set ResamplingHypothesesUpdater. That said, I still ran the benchmarks to make sure we are getting the same benefits we saw in the feature repo. Results of the benchmark runs are tagged with PR#700.

base_config_10distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 99.29 | 100 | 0.71 |
| used_mlh (%) | 0.71 | 0 | -0.71 |
| match_steps | 35 | 31 | -4 |
| rotation_error (deg) | 10.69 | 8.67 | -2.02 |
| runtime (min) | 2 | 2 | 0 |
| episode_runtime (sec) | 9 | 7 | -2 |

base_config_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 99.29 | 100 | 0.71 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 27 | 28 | 1 |
| rotation_error (deg) | 11.96 | 3.65 | -8.31 |
| runtime (min) | 2 | 2 | 0 |
| episode_runtime (sec) | 10 | 12 | 2 |

randrot_noise_10distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 97 | 99 | 2 |
| used_mlh (%) | 5 | 1 | -4 |
| match_steps | 55 | 37 | -18 |
| rotation_error (deg) | 25.14 | 12.83 | -12.31 |
| runtime (min) | 5 | 3 | -2 |
| episode_runtime (sec) | 31 | 22 | -9 |

randrot_noise_10distinctobj_dist_on_distm

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 98 | 100 | 2 |
| used_mlh (%) | 3 | 1 | -2 |
| match_steps | 38 | 35 | -3 |
| rotation_error (deg) | 15.11 | 17.65 | 2.54 |
| runtime (min) | 4 | 3 | -1 |
| episode_runtime (sec) | 24 | 19 | -5 |

randrot_noise_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 100 | 100 | 0 |
| used_mlh (%) | 1 | 1 | 0 |
| match_steps | 28 | 29 | 1 |
| rotation_error (deg) | 25.14 | 12.35 | -12.79 |
| runtime (min) | 3 | 4 | 1 |
| episode_runtime (sec) | 21 | 24 | 3 |

randrot_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 100 | 100 | 0 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 28 | 28 | 0 |
| rotation_error (deg) | 13.46 | 9.27 | -4.19 |
| runtime (min) | 2 | 2 | 0 |
| episode_runtime (sec) | 12 | 13 | 1 |

randrot_noise_10distinctobj_5lms_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 100 | 100 | 0 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 52 | 55 | 3 |
| rotation_error (deg) | 49.54 | 44.85 | -4.69 |
| runtime (min) | 5 | 5 | 0 |
| episode_runtime (sec) | 42 | 41 | -1 |

base_10simobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 92.14 | 98.57 | 6.43 |
| used_mlh (%) | 10.71 | 3.57 | -7.14 |
| match_steps | 83 | 56 | -27 |
| rotation_error (deg) | 11 | 3.96 | -7.04 |
| runtime (min) | 6 | 5 | -1 |
| episode_runtime (sec) | 31 | 24 | -7 |

randrot_noise_10simobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 78 | 85 | 7 |
| used_mlh (%) | 36 | 24 | -12 |
| match_steps | 201 | 176 | -25 |
| rotation_error (deg) | 21.46 | 30.88 | 9.42 |
| runtime (min) | 10 | 10 | 0 |
| episode_runtime (sec) | 85 | 83 | -2 |

randrot_noise_10simobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 90 | 97 | 7 |
| used_mlh (%) | 32 | 32 | 0 |
| match_steps | 166 | 165 | -1 |
| rotation_error (deg) | 27.15 | 19.94 | -7.21 |
| runtime (min) | 15 | 19 | 4 |
| episode_runtime (sec) | 127 | 149 | 22 |

randomrot_rawnoise_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 65 | 70 | 5 |
| used_mlh (%) | 83 | 75 | -8 |
| match_steps | 13 | 13 | 0 |
| rotation_error (deg) | 101.6 | 83.67 | -17.93 |
| runtime (min) | 4 | 4 | 0 |
| episode_runtime (sec) | 6 | 7 | 1 |

base_10multi_distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 85 | 71.43 | -13.57 |
| used_mlh (%) | 12.86 | 21.43 | 8.57 |
| match_steps | 27 | 28 | 1 |
| rotation_error (deg) | 25.3 | 12.13 | -13.17 |
| runtime (min) | 3 | 4 | 1 |
| episode_runtime (sec) | 1 | 2 | 1 |

base_77obj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 90.91 | 96.54 | 5.63 |
| used_mlh (%) | 12.99 | 5.63 | -7.36 |
| match_steps | 102 | 73 | -29 |
| rotation_error (deg) | 14.89 | 10.24 | -4.65 |
| runtime (min) | 21 | 15 | -6 |
| episode_runtime (sec) | 61 | 41 | -20 |

base_77obj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 98.7 | 99.57 | 0.87 |
| used_mlh (%) | 5.19 | 4.33 | -0.86 |
| match_steps | 53 | 41 | -12 |
| rotation_error (deg) | 6.7 | 1.56 | -5.14 |
| runtime (min) | 15 | 12 | -3 |
| episode_runtime (sec) | 37 | 29 | -8 |

randrot_noise_77obj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 92.21 | 93.51 | 1.3 |
| used_mlh (%) | 26.84 | 17.75 | -9.09 |
| match_steps | 174 | 140 | -34 |
| rotation_error (deg) | 34.01 | 30.58 | -3.43 |
| runtime (min) | 41 | 34 | -7 |
| episode_runtime (sec) | 137 | 112 | -25 |

randrot_noise_77obj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 94.37 | 96.54 | 2.17 |
| used_mlh (%) | 21.65 | 21.21 | -0.44 |
| match_steps | 111 | 111 | 0 |
| rotation_error (deg) | 33.9 | 23.37 | -10.53 |
| runtime (min) | 34 | 33 | -1 |
| episode_runtime (sec) | 114 | 110 | -4 |

randrot_noise_77obj_5lms_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 93.51 | 96.1 | 2.59 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 72 | 68 | -4 |
| rotation_error (deg) | 56.34 | 58.43 | 2.09 |
| runtime (min) | 13 | 15 | 2 |
| episode_runtime (sec) | 126 | 150 | 24 |

unsupervised_inference_distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 97 | 99 | 2 |
| match_steps | 99 | 99 | 0 |
| runtime (min) | 14 | 11 | -3 |
| episode_runtime (sec) | 8 | 6 | -2 |

unsupervised_inference_distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 97 | 100 | 3 |
| match_steps | 99 | 99 | 0 |
| runtime (min) | 25 | 19 | -6 |
| episode_runtime (sec) | 15 | 11 | -4 |

fix: type hinting fix (optional None)

@jeremyshoemaker jeremyshoemaker left a comment


I know this is currently a Draft, but I started reading through it and had some comments that I didn't want to lose, so I just reviewed it anyway.

We can go through some of these when we pair later.

```python
class HypothesesUpdater(Protocol):
    def pre_step(self) -> None:
        """Runs once per step before updating the hypotheses."""
        ...
```

This is really more of a style thing, so it's not critical. The Python docs aren't helpful on this point either, because they show this style inconsistently.

The usual way to have a method or function that does nothing is to use `pass`, not `...`. That said, a method or function that has a docstring doesn't need a body at all. You only need `pass` when there's no docstring, because a block has to contain at least one statement. So these `...` could just be removed completely.
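For illustration, the three variants side by side (names are hypothetical):

```python
class WithPass:
    def pre_step(self) -> None:
        pass  # needed: no docstring, and a block must contain a statement

class WithEllipsis:
    def pre_step(self) -> None:
        ...  # works, but `pass` is the conventional no-op statement

class DocstringOnly:
    def pre_step(self) -> None:
        """A docstring is itself a statement, so no extra body is needed."""
```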

```python
    available_graph_ids.append(graph_id)
    available_graph_evidences.append(np.max(self.evidence[graph_id]))

return available_graph_ids, np.array(available_graph_evidences)
```

I noticed this inconsistency when I loaded up this branch in PyCharm. The type of the second value in this tuple is inconsistent in this function. Up on line 706, we return a list, but here we return an np.array. I think the one on 706 should be changed to match, since lists and np.arrays behave slightly differently.
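As a quick illustration of why the inconsistency matters (hypothetical values, not from this function):

```python
import numpy as np

evidences_list = [1.0, 2.0]
evidences_array = np.array([1.0, 2.0])

# `+` concatenates lists but adds element-wise for arrays, so a caller
# that receives one type when expecting the other can silently misbehave.
print(evidences_list + evidences_list)    # [1.0, 2.0, 1.0, 2.0]
print(evidences_array + evidences_array)  # [2. 4.]
```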

"""Return evidence for each pose on each graph (pointer)."""
return self.evidence

def hyp_evidences_for_object(self, object_id):

I don't think this is a good method to add, since it makes every place it's used longer than it would be otherwise, without really adding anything. It also adds an extra function call, which isn't free.

```python
self.evidence[object_id]
# vs
self.hyp_evidences_for_object(object_id)
```

If this method name is more descriptive, then it might make sense to rename `self.evidence` instead, but I don't know that it adds much.


See my notes below in the `goal_state_generation` changes, since that makes things more complicated.

```python
self.previous_mlh = self.current_mlh
self.current_mlh = self._calculate_most_likely_hypothesis()

self.hypotheses_updater.post_step()
```
@jeremyshoemaker jeremyshoemaker Jan 16, 2026


I know we use this pre_ and post_ pattern in a lot of places already, but this would really be better as a context manager using `__enter__` and `__exit__`. The `__exit__` method also has the advantage of being called when an exception is raised, allowing action to be taken if needed.

Then the body of this method would be something like:

```python
with self.hypotheses_updater:
    thread_list = []
    # The rest of the body
    ...
```

```diff
 # We only displace existing hypotheses since the newly resampled hypotheses
 # should not be affected by the displacement from the last sensory input.
-if existing_count > 0:
+if len(hypotheses_selection.maintain_ids) > 0:
```

You did this elsewhere, so let's be consistent.

Suggested change:

```diff
-if len(hypotheses_selection.maintain_ids) > 0:
+if len(hypotheses_selection.maintain_ids):
```

```python
).inv()
object_possible_poses = self.possible_poses[self.primary_target]
if not len(object_possible_poses):
    return -1
```

Is this a sentinel value we're checking for elsewhere? Shouldn't the minimum possible pose error be 0?


```python
def __len__(self) -> int:
    """Returns the total number of hypotheses in the selection."""
    return int(self._maintain_mask.size)
```

Doesn't `ndarray.size` already return an `int`?

> a.size returns a standard arbitrary precision Python integer.
>
> — Docs for `ndarray.size`
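A quick check (with a hypothetical boolean mask) confirms the quoted docs, so the `int(...)` wrapper is redundant:

```python
import numpy as np

mask = np.zeros((2, 3), dtype=bool)
print(type(mask.size))  # <class 'int'>
print(mask.size)        # 6
```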

```python
# put in range(-1, 1)
scaled_evidences[graph_id] = (scaled_evidences[graph_id] - 0.5) * 2
if len(evidences[graph_id]):
    graph_evidences = evidences[graph_id]
```

Why not change this for loop to iterate over `evidences.items()`? That way we don't have to look up the value multiple times, but we still have both the key and the value.

```python
    max_evidence = max(max_evidence, np.max(graph_evidences))

for graph_id in evidences.keys():
    graph_evidences = evidences[graph_id]
```

Same thing. If we iterate over `evidences.items()`, we get the graph_id and the graph_evidences at the same time.
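A sketch of the suggested refactor, with a stand-in `evidences` dict (values are illustrative, not from the codebase):

```python
import numpy as np

# Stand-in for the per-graph evidence arrays.
evidences = {"mug": np.array([0.1, 0.9]), "bowl": np.array([0.4])}

# Before: a second dict lookup inside the loop body.
maxes_before = {}
for graph_id in evidences.keys():
    graph_evidences = evidences[graph_id]
    maxes_before[graph_id] = float(np.max(graph_evidences))

# After: key and value unpacked together, no extra lookup.
maxes_after = {}
for graph_id, graph_evidences in evidences.items():
    maxes_after[graph_id] = float(np.max(graph_evidences))
```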

Comment on lines 171 to 173:

```python
self._resampling_multiplier(rlm)
self._resampling_multiplier_maximum(rlm, pose_defined=True)
self._resampling_multiplier_maximum(rlm, pose_defined=False)
```

You mentioned that the only reason we're doing all of these in a single test is concern about the setup cost of the experiment.

I think we should refactor this class to make these individual methods into `test_` methods, and move the contents of `setUp` and `get_pretrained_resampling_lm` to the `setUpClass` classmethod. That way, the experiment gets set up only once for this whole class, and each individual test can check what it needs to.

This will shake out any unintended dependencies between these tests, and make it easier to run them in isolation if we're ever debugging a specific issue.
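A minimal sketch of that refactor. The assertions are placeholders; the real tests would exercise the rlm as the current single test does:

```python
import unittest

class TestResamplingMultiplier(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Expensive experiment setup runs once for the whole class. In the
        # real suite this would hold the contents of setUp and
        # get_pretrained_resampling_lm.
        cls.rlm = object()  # stand-in for the pretrained resampling LM

    def test_resampling_multiplier(self):
        self.assertIsNotNone(self.rlm)

    def test_resampling_multiplier_maximum_pose_defined(self):
        self.assertIsNotNone(self.rlm)

    def test_resampling_multiplier_maximum_pose_undefined(self):
        self.assertIsNotNone(self.rlm)
```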


@ramyamounir

Thanks @jeremyshoemaker, I addressed the comments we discussed, except for the additional tests. I'll be working on those next.

@ramyamounir

@jeremyshoemaker, I've added a few unit tests that specifically cover the burst sampling behavior. We can discuss if we need to add more when we pair next week.

@ramyamounir ramyamounir marked this pull request as ready for review January 17, 2026 15:18
@ramyamounir ramyamounir added the rfc:implementation and triaged labels Jan 17, 2026