

@ramyamounir ramyamounir commented Jan 15, 2026

This is the second PR in the implementation of the burst sampling hypothesis updater. It introduces two main features:

  • Decoupling the resampling parameters: Previously, the parameters were coupled so that the hypothesis space size stayed constant, i.e., we resampled the same number of hypotheses that we deleted every step. Here we decouple this behavior so that the hypothesis space size can change over time. Additionally, a HypothesesSelection container class is introduced to refer to a subset of hypotheses across different parts of the code.
  • Burst sampling: Decoupling the parameters allowed for redesigning the heuristics that decide which hypotheses are deleted, resampled, or maintained. We delete hypotheses that fail to accumulate enough evidence over a few steps (as measured by their smoothed evidence slope). Sampling happens in bursts with a set step duration: if the best hypothesis is not able to accumulate enough evidence, we trigger a burst of hypothesis sampling for all objects. Since this mechanism can leave some unlikely object hypothesis spaces with zero hypotheses, the LM, GSG, and channel mapper were modified to handle that scenario.

Additional, more detailed context can be found in this PR description.
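The deletion and burst-trigger heuristics described above can be sketched roughly as follows. All names and thresholds here are illustrative, not this PR's actual API:

```python
import numpy as np

# Illustrative constant; the real updater derives its thresholds from config.
SLOPE_THRESHOLD = 0.01  # minimum smoothed evidence slope to keep a hypothesis

def select_hypotheses(smoothed_slopes: np.ndarray) -> dict:
    """Split a hypothesis space into ids to maintain and ids to delete,
    based on each hypothesis's smoothed evidence slope."""
    maintain_ids = np.flatnonzero(smoothed_slopes >= SLOPE_THRESHOLD)
    delete_ids = np.flatnonzero(smoothed_slopes < SLOPE_THRESHOLD)
    return {"maintain": maintain_ids, "delete": delete_ids}

def should_trigger_burst(best_hypothesis_slope: float) -> bool:
    """Trigger a burst of sampling for all objects when even the best
    hypothesis is not accumulating enough evidence."""
    return best_hypothesis_slope < SLOPE_THRESHOLD
```

Note that because deletion and resampling are decoupled, an object whose hypotheses are all deleted ends up with an empty hypothesis space until the next burst, which is the scenario the LM/GSG/channel-mapper changes handle.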

Benchmarks

We will not be updating any benchmark results (other than unsupervised inference) because this PR doesn't set ResamplingHypothesesUpdater. That said, I still ran the benchmarks to make sure we are getting the same benefits we saw in the feature repo. Results of the benchmark runs are tagged with PR#700.

base_config_10distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 99.29 | 100 | 0.71 |
| used_mlh (%) | 0.71 | 0 | -0.71 |
| match_steps | 35 | 31 | -4 |
| rotation_error (deg) | 10.69 | 8.67 | -2.02 |
| runtime (min) | 2 | 2 | 0 |
| episode_runtime (sec) | 9 | 7 | -2 |

base_config_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 99.29 | 100 | 0.71 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 27 | 28 | 1 |
| rotation_error (deg) | 11.96 | 3.65 | -8.31 |
| runtime (min) | 2 | 2 | 0 |
| episode_runtime (sec) | 10 | 12 | 2 |

randrot_noise_10distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 97 | 99 | 2 |
| used_mlh (%) | 5 | 1 | -4 |
| match_steps | 55 | 37 | -18 |
| rotation_error (deg) | 25.14 | 12.83 | -12.31 |
| runtime (min) | 5 | 3 | -2 |
| episode_runtime (sec) | 31 | 22 | -9 |

randrot_noise_10distinctobj_dist_on_distm

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 98 | 100 | 2 |
| used_mlh (%) | 3 | 1 | -2 |
| match_steps | 38 | 35 | -3 |
| rotation_error (deg) | 15.11 | 17.65 | 2.54 |
| runtime (min) | 4 | 3 | -1 |
| episode_runtime (sec) | 24 | 19 | -5 |

randrot_noise_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 100 | 100 | 0 |
| used_mlh (%) | 1 | 1 | 0 |
| match_steps | 28 | 29 | 1 |
| rotation_error (deg) | 25.14 | 12.35 | -12.79 |
| runtime (min) | 3 | 4 | 1 |
| episode_runtime (sec) | 21 | 24 | 3 |

randrot_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 100 | 100 | 0 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 28 | 28 | 0 |
| rotation_error (deg) | 13.46 | 9.27 | -4.19 |
| runtime (min) | 2 | 2 | 0 |
| episode_runtime (sec) | 12 | 13 | 1 |

randrot_noise_10distinctobj_5lms_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 100 | 100 | 0 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 52 | 55 | 3 |
| rotation_error (deg) | 49.54 | 44.85 | -4.69 |
| runtime (min) | 5 | 5 | 0 |
| episode_runtime (sec) | 42 | 41 | -1 |

base_10simobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 92.14 | 98.57 | 6.43 |
| used_mlh (%) | 10.71 | 3.57 | -7.14 |
| match_steps | 83 | 56 | -27 |
| rotation_error (deg) | 11 | 3.96 | -7.04 |
| runtime (min) | 6 | 5 | -1 |
| episode_runtime (sec) | 31 | 24 | -7 |

randrot_noise_10simobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 78 | 85 | 7 |
| used_mlh (%) | 36 | 24 | -12 |
| match_steps | 201 | 176 | -25 |
| rotation_error (deg) | 21.46 | 30.88 | 9.42 |
| runtime (min) | 10 | 10 | 0 |
| episode_runtime (sec) | 85 | 83 | -2 |

randrot_noise_10simobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 90 | 97 | 7 |
| used_mlh (%) | 32 | 32 | 0 |
| match_steps | 166 | 165 | -1 |
| rotation_error (deg) | 27.15 | 19.94 | -7.21 |
| runtime (min) | 15 | 19 | 4 |
| episode_runtime (sec) | 127 | 149 | 22 |

randomrot_rawnoise_10distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 65 | 70 | 5 |
| used_mlh (%) | 83 | 75 | -8 |
| match_steps | 13 | 13 | 0 |
| rotation_error (deg) | 101.6 | 83.67 | -17.93 |
| runtime (min) | 4 | 4 | 0 |
| episode_runtime (sec) | 6 | 7 | 1 |

base_10multi_distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 85 | 71.43 | -13.57 |
| used_mlh (%) | 12.86 | 21.43 | 8.57 |
| match_steps | 27 | 28 | 1 |
| rotation_error (deg) | 25.3 | 12.13 | -13.17 |
| runtime (min) | 3 | 4 | 1 |
| episode_runtime (sec) | 1 | 2 | 1 |

base_77obj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 90.91 | 96.54 | 5.63 |
| used_mlh (%) | 12.99 | 5.63 | -7.36 |
| match_steps | 102 | 73 | -29 |
| rotation_error (deg) | 14.89 | 10.24 | -4.65 |
| runtime (min) | 21 | 15 | -6 |
| episode_runtime (sec) | 61 | 41 | -20 |

base_77obj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 98.7 | 99.57 | 0.87 |
| used_mlh (%) | 5.19 | 4.33 | -0.86 |
| match_steps | 53 | 41 | -12 |
| rotation_error (deg) | 6.7 | 1.56 | -5.14 |
| runtime (min) | 15 | 12 | -3 |
| episode_runtime (sec) | 37 | 29 | -8 |

randrot_noise_77obj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 92.21 | 93.51 | 1.3 |
| used_mlh (%) | 26.84 | 17.75 | -9.09 |
| match_steps | 174 | 140 | -34 |
| rotation_error (deg) | 34.01 | 30.58 | -3.43 |
| runtime (min) | 41 | 34 | -7 |
| episode_runtime (sec) | 137 | 112 | -25 |

randrot_noise_77obj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 94.37 | 96.54 | 2.17 |
| used_mlh (%) | 21.65 | 21.21 | -0.44 |
| match_steps | 111 | 111 | 0 |
| rotation_error (deg) | 33.9 | 23.37 | -10.53 |
| runtime (min) | 34 | 33 | -1 |
| episode_runtime (sec) | 114 | 110 | -4 |

randrot_noise_77obj_5lms_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 93.51 | 96.1 | 2.59 |
| used_mlh (%) | 0 | 0 | 0 |
| match_steps | 72 | 68 | -4 |
| rotation_error (deg) | 56.34 | 58.43 | 2.09 |
| runtime (min) | 13 | 15 | 2 |
| episode_runtime (sec) | 126 | 150 | 24 |

unsupervised_inference_distinctobj_dist_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 97 | 99 | 2 |
| match_steps | 99 | 99 | 0 |
| runtime (min) | 14 | 11 | -3 |
| episode_runtime (sec) | 8 | 6 | -2 |

unsupervised_inference_distinctobj_surf_agent

| Metric | Baseline | Proposed | Δ |
| --- | --- | --- | --- |
| percent_correct (%) | 97 | 100 | 3 |
| match_steps | 99 | 99 | 0 |
| runtime (min) | 25 | 19 | -6 |
| episode_runtime (sec) | 15 | 11 | -4 |

fix: type hinting fix (optional None)

@jeremyshoemaker jeremyshoemaker left a comment


I know this is currently a Draft, but I started reading through it and had some comments that I didn't want to lose, so I just reviewed it anyway.

We can go through some of these when we pair later.

```python
class HypothesesUpdater(Protocol):
    def pre_step(self) -> None:
        """Runs once per step before updating the hypotheses."""
        ...
```

This is really more of a style thing, so it's not critical. The Python docs aren't helpful on this point either, because they show this style inconsistently.

The usual way to have a method or function that does nothing is to use `pass`, not `...`. That said, a method or function that has a docstring doesn't need a body at all. You only need `pass` when there's no docstring, because a block has to contain at least one statement. So these `...` could just be removed completely.
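For illustration, the three variants side by side (names are hypothetical):

```python
class WithPass:
    def pre_step(self) -> None:
        pass  # needed: no docstring, and a block must contain a statement

class WithEllipsis:
    def pre_step(self) -> None:
        ...  # works, but `pass` is the conventional no-op statement

class DocstringOnly:
    def pre_step(self) -> None:
        """A docstring is itself a statement, so no extra body is needed."""
```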

```python
    available_graph_ids.append(graph_id)
    available_graph_evidences.append(np.max(self.evidence[graph_id]))

return available_graph_ids, np.array(available_graph_evidences)
```

I noticed this inconsistency when I loaded up this branch in PyCharm. The type of the second value in this tuple is inconsistent in this function. Up on line 706, we return a list, but here we return an np.array. I think the one on 706 should be changed to match, since lists and np.arrays behave slightly differently.
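As a quick illustration of why the inconsistency matters (hypothetical values, not from this function):

```python
import numpy as np

evidences_list = [1.0, 2.0]
evidences_array = np.array([1.0, 2.0])

# `+` concatenates lists but adds element-wise for arrays, so a caller
# that receives one type when expecting the other can silently misbehave.
print(evidences_list + evidences_list)    # [1.0, 2.0, 1.0, 2.0]
print(evidences_array + evidences_array)  # [2. 4.]
```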

"""Return evidence for each pose on each graph (pointer)."""
return self.evidence

def hyp_evidences_for_object(self, object_id):

I don't think this is a good method to add, since it makes every place it's used longer than it would be otherwise, without really adding anything. It also adds an extra function call, which isn't free.

```python
self.evidence[object_id]
# vs
self.hyp_evidences_for_object(object_id)
```

If this method name is more descriptive, then it might make sense to rename `self.evidence` instead, but I don't know that it adds much.


See my notes below in the `goal_state_generation` changes, since that makes things more complicated.

```python
self.previous_mlh = self.current_mlh
self.current_mlh = self._calculate_most_likely_hypothesis()

self.hypotheses_updater.post_step()
```
@jeremyshoemaker jeremyshoemaker Jan 16, 2026


I know we use this pre_ and post_ pattern in a lot of places already, but this would really be better as a context manager using `__enter__` and `__exit__`. The `__exit__` method also has the advantage of being called when an exception is raised, allowing action to be taken if needed.

Then the body of this method would be something like:

```python
with self.hypotheses_updater:
    thread_list = []
    # The rest of the body
    ...
```

```diff
 # We only displace existing hypotheses since the newly resampled hypotheses
 # should not be affected by the displacement from the last sensory input.
-if existing_count > 0:
+if len(hypotheses_selection.maintain_ids) > 0:
```

You did this elsewhere, so let's be consistent.

Suggested change:

```diff
-if len(hypotheses_selection.maintain_ids) > 0:
+if len(hypotheses_selection.maintain_ids):
```

```python
).inv()
object_possible_poses = self.possible_poses[self.primary_target]
if not len(object_possible_poses):
    return -1
```

Is this a sentinel value we're checking for elsewhere? Shouldn't the minimum possible pose error be 0?


```python
def __len__(self) -> int:
    """Returns the total number of hypotheses in the selection."""
    return int(self._maintain_mask.size)
```

Doesn't `ndarray.size` already return an `int`?

> a.size returns a standard arbitrary precision Python integer.
>
> — Docs for `ndarray.size`
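A quick check (with a hypothetical boolean mask) confirms the quoted docs, so the `int(...)` wrapper is redundant:

```python
import numpy as np

mask = np.zeros((2, 3), dtype=bool)
print(type(mask.size))  # <class 'int'>
print(mask.size)        # 6
```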

```python
# put in range(-1, 1)
scaled_evidences[graph_id] = (scaled_evidences[graph_id] - 0.5) * 2
if len(evidences[graph_id]):
    graph_evidences = evidences[graph_id]
```

Why not change this for loop to iterate over `evidences.items()`? That way we don't have to look up the value multiple times, but we still have both the key and the value.

```python
    max_evidence = max(max_evidence, np.max(graph_evidences))

for graph_id in evidences.keys():
    graph_evidences = evidences[graph_id]
```

Same thing. If we iterate over `evidences.items()`, we get the graph_id and the graph_evidences at the same time.
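A sketch of the suggested refactor, with a stand-in `evidences` dict (values are illustrative, not from the codebase):

```python
import numpy as np

# Stand-in for the per-graph evidence arrays.
evidences = {"mug": np.array([0.1, 0.9]), "bowl": np.array([0.4])}

# Before: a second dict lookup inside the loop body.
maxes_before = {}
for graph_id in evidences.keys():
    graph_evidences = evidences[graph_id]
    maxes_before[graph_id] = float(np.max(graph_evidences))

# After: key and value unpacked together, no extra lookup.
maxes_after = {}
for graph_id, graph_evidences in evidences.items():
    maxes_after[graph_id] = float(np.max(graph_evidences))
```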

Comment on lines 171 to 173:

```python
self._resampling_multiplier(rlm)
self._resampling_multiplier_maximum(rlm, pose_defined=True)
self._resampling_multiplier_maximum(rlm, pose_defined=False)
```

You mentioned that the only reason we're doing all of these in a single test is concern about the setup cost of the experiment.

I think we should refactor this class to make these individual methods into `test_` methods, and move the contents of `setUp` and `get_pretrained_resampling_lm` to the `setUpClass` classmethod. That way, the experiment gets set up only once for this whole class, and each individual test can check what it needs to.

This will shake out any unintended dependencies between these tests, and make it easier to run them in isolation if we're ever debugging a specific issue.
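A minimal sketch of that refactor. The assertions are placeholders; the real tests would exercise the rlm as the current single test does:

```python
import unittest

class TestResamplingMultiplier(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Expensive experiment setup runs once for the whole class. In the
        # real suite this would hold the contents of setUp and
        # get_pretrained_resampling_lm.
        cls.rlm = object()  # stand-in for the pretrained resampling LM

    def test_resampling_multiplier(self):
        self.assertIsNotNone(self.rlm)

    def test_resampling_multiplier_maximum_pose_defined(self):
        self.assertIsNotNone(self.rlm)

    def test_resampling_multiplier_maximum_pose_undefined(self):
        self.assertIsNotNone(self.rlm)
```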


@ramyamounir

Thanks @jeremyshoemaker, I addressed the comments we discussed, except for the additional tests. I'll be working on those next.

@ramyamounir

@jeremyshoemaker, I've added a few unit tests that specifically cover the burst sampling behavior. We can discuss if we need to add more when we pair next week.

@ramyamounir ramyamounir marked this pull request as ready for review January 17, 2026 15:18
@ramyamounir ramyamounir added the rfc:implementation and triaged labels Jan 17, 2026