-
Notifications
You must be signed in to change notification settings - Fork 272
refactor!: extract TouchObject positioning procedure harder #356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor!: extract TouchObject positioning procedure harder #356
Conversation
|
📚 Documentation Preview ✅ A preview of the documentation changes in this PR is available for maintainers at: Last updated: 2025-06-28 00:19 UTC |
|
Thanks for opening this @tristanls , just to check, have you added a catch for the infinite while loop? Is it possible to highlight what has changed vs. #339 , if anything? I feel like it would be worth introducing that fix in this PR, otherwise our experiments will take much longer to run. It can be something relatively simple, like not allowing more than 100 steps for any particular attempt to touch object again. |
vkakerbeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the logic looks good to me. I just left one through on incrementing the counter.
@nielsleadholm I think the termination check is the PositioningProcedureTruncated exception raised after 32 steps. @tristanls what's the reason you chose 32 steps? It looks like we now have significantly more time out episodes (% used mlh, second column). Could it be that 32 steps is not enough? Also, why are the runtimes still almost double or at least 1.5x in many of the experiments?
| Useful for positioning surface agents that "touch" the object. | ||
| Called at the beginning of each episode and after "jump" has been initiated | ||
| by a model-based policy. In addition, it can be called when the surface agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? I thought if after a jump we are not on the object, the jump is reversed instead of trying to find the object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, this docstring does not contradict what you said. TouchObject is called at the beginning of each episode. If the jump is successful, the SurfacePolicy takes over, but after that, it can fall off the object and invoke TouchObject.
I think and after "jump" has been initiated by a model-based policy could be removed, since SurfacePolicy takes over and has to fall off object before TouchObject is invoked again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was referring to the and after "jump" has been initiated by a model-based policy part which, as far as I understand, doesn't invoke TouchObject by itself.
| forward_distance=move_forward_amount, | ||
| ) | ||
| ) | ||
| self._step_count += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this counter only be incremented when we are searching the object (not if we already found it and are just moving forward)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the detail you are highlighting. What would the alternative be?
This counter is not incremented when we are done and MoveForward
if depth_at_center < 1.0:
distance = (
depth_at_center
- self._desired_object_distance
- state[self.agent_id]["sensors"][f"{self._sensor_id}.depth"][
"position"
][2]
)
logger.debug(f"Move to touch visible object, forward by {distance}")
self._terminated_and_succeeded = True
return PositioningProcedureResult(
actions=[MoveForward(agent_id=self.agent_id, distance=distance)]
)I don't think it would make a difference to increment self._step_count if we are done and MoveForward, as success is checked before the step count.
if self._terminated_and_succeeded:
return PositioningProcedureResult(success=True, terminated=True)
if self._step_count >= self._max_steps:
return PositioningProcedureResult(terminated=True, truncated=True)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, you are right, the counter is already not incremented if we are just moving towards the object
| # Check if we have poor visualization of the object | ||
| if ( | ||
| self.processed_observations.get_feature_by_name("object_coverage") < 0.1 | ||
| or self.attempting_to_find_object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason we can remove this second check? As far as I understand self.attempting_to_find_object is not just set based on the object coverage but also the depth at the center of the patch. Is this now already accounted for somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this now replaced with _terminated_and_succeeded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.attempting_to_find_object was always going to be removed and only served to implement TouchObject logic outside of TouchObject.
No, it is not replaced by _terminated_and_succeeded.
self.attempting_to_find_object was introduced back in #310. Its purpose was to ensure that the touch_object actions sequence always ends in MoveForward. This check ensured that SurfacePolicy would not interrupt the touch_object code from running to completion. SurfacePolicy can no longer interrupt TouchObject as their execution paths no longer interweave, so this check has no purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah okay, thanks
|
Thanks Viviane, yeah it is strange that we are both getting more time-outs, but also that the run-times are still so much longer. These are essentially opposing outcomes, i.e. setting One possible sanity check to consider is, make all the changes as they are, except that the touch-object positioning procedure reaches in and temporarily iterates the internal counter of Monty. This would replicate the same counter (and therefore termination) process as before the separation out of this policy. That would help determine to what degree the issue is solely due to the counter / infinite while loop, vs. there being an additional issue. |
Yes, @vkakerbeck is correct, the The reason for 32: In
We already ran the above experiment. This is the #351 benchmark results, where the runtimes are unaffected. The #353 is a bit closer, so I'm running those benchmarks, but I expect the runtimes to be unaffected since all steps are being counted. #353 benchmarks (no runtime change)base_config_10distinctobj_surf_agent,100.00,2.14,27,18.04,3,11 - https://wandb.ai/thousand-brains-project/Monty/runs/785fi9ty/overview -base_config_10distinctobj_surf_agent,100.00,0.00,28,14.01,2,11
+base_config_10distinctobj_surf_agent,100.00,2.14,27,18.04,3,11randrot_noise_10distinctobj_surf_agent,99.00,4.00,27,29.18,3,17 - https://wandb.ai/thousand-brains-project/Monty/runs/cs0rgzxc/overview -randrot_noise_10distinctobj_surf_agent,100.00,1.00,29,20.42,3,21
+randrot_noise_10distinctobj_surf_agent,99.00,4.00,27,29.18,3,17randrot_10distinctobj_surf_agent,100.00,2.00,27,19.11,2,9 - https://wandb.ai/thousand-brains-project/Monty/runs/8uokgnpq/overview -randrot_10distinctobj_surf_agent,100.00,0.00,28,17.23,2,10
+randrot_10distinctobj_surf_agent,100.00,2.00,27,19.11,2,9base_10simobj_surf_agent,94.29,10.71,53,14.32,5,19 -base_10simobj_surf_agent,93.57,8.57,70,13.98,6,27
+base_10simobj_surf_agent,94.29,10.71,53,14.32,5,19randrot_noise_10simobj_surf_agent,90.00,41.00,147,27.18,14,100 - https://wandb.ai/thousand-brains-project/Monty/runs/b67ufq3n/overview -randrot_noise_10simobj_surf_agent,93.00,34.00,172,20.94,16,137
+randrot_noise_10simobj_surf_agent,90.00,41.00,147,27.18,14,100randomrot_rawnoise_10distinctobj_surf_agent,TBD - https://wandb.ai/thousand-brains-project/Monty/runs/ovqaz05c/overview -base_77obj_surf_agent,98.70,6.49,56,12.94,12,34
+base_77obj_surf_agent,98.27,8.66,43,9.20,10,27randrot_noise_77obj_surf_agent,TBD - https://wandb.ai/thousand-brains-project/Monty/runs/eh8zah6j/overview Here's what I'm thinking... The Solutions: I think there are at least three possible solutions to the increased runtime and not counting all the steps.
# TODO: Note that a PositioningProcedure is not a MotorPolicy, so
# using it here to keep the agent in touch with the surface is
# questionable. However, in order for the SurfacePolicy to
# take over this responsibility, the main MontyExperiment loop
# between Monty, MotorSystem, and DataLoaders needs to be updated
# first. Once invoking the MotorSystem includes `observation` as
# an argument, e.g. motor_system(observation), then the
# SurfacePolicy will be capable of implementing logic similar to
# the TouchObject positioning procedure instead of raising
# ObjectNotVisible and relying on the dataloader to handle
# this.
|
|
Thanks for all the details. I have a couple of clarifying questions to make sure I understand:
|
|
@vkakerbeck randomrot_rawnoise_10distinctobj_surf_agent improved accuracy and decreased number of time outs. To me, this indictes random variation and not something being wrong. The formula for |
|
Re. the results of Re. the sanity check I suggested, it's not clear to me that that is the exact experiment I'm suggesting. To clarify more what I mean, the following step counters are iterated in the old code due to the use of MontyBase
...
pass_features_directly_to_motor_system()
...
self.total_steps += 1
self.episode_steps += 1and MontyObjectRecognitionExperiment
...
run_episode_steps()
...
# loader_step increments everytime enumerate is called
for loader_step, observation in enumerate(self.dataloader):
...
if loader_step >= (self.max_total_steps):
logger.info(f"Terminated due to maximum episode steps : {loader_step}")
self.model.deal_with_time_out()
return loader_stepBelow is an outline of what I'm suggesting we try as a temporary replication of the old step counting, but with everything else using the new code: Firstly, set your new MontyObjectRecognitionExperiment
# Initialize a temp_touch_object_counter at start of episode
self.temp_touch_object_counter = 0
...
run_episode_steps()
...
if self.temp_touch_object_counter + loader_step >= (self.max_total_steps):
logger.info(f"Terminated due to maximum episode steps : {loader_step}")
self.model.deal_with_time_out()
return loader_stepTouchObject
...
positioning_call()
...
# Reach into and modify step counters manually
monty_class.total_steps += 1
monty_class.episode_steps += 1
monty_experiment.temp_touch_object_counter += 1With the above setup, and if there are no other outstanding issues with the code, then we should see similar results (episode run time and time-out percentages) to our older experiments. Otherwise, I agree with your description that the fact that these meta-counters no longer consider the touch-object steps is the reason the runtime is increasing. That said, I'm still concerned about the increase in time-outs. As with Friday, happy to jump on a call and discuss if that's helpful! |

This pull request completes the extraction of the
TouchObjectpositioning procedure fromSurfacePolicy.Note
This pull request replaces #339, now that we've identified the infinite loop introduced in that pull request.
Additionally, it cleans up incremental changes made in #347, #348, #349, and #350.
Now, when
SurfacePolicy.dynamic_callcannot touch/see the object, and raisesObjectNotVisible, the data loader will invoke theTouchObjectpositioning procedure, placing the agent back on the object and return the observation from the object.Notably, this doesn't feel right as we use a privileged (due to
"view_finder"use) positioning procedure to keep a sensor in touch with the object. The future solution here will be to implement a similar touch object logic internally to theSurfacePolicy. This may be easier after refactoring the main Monty loop between the data loader, model, and the motor system to be more reinforcement-learning-like.The algorithmic change is that all the positioning actions within the
TouchObjectpositioning procedure are not passed to Monty for processing (except for the last observation when the procedure finds the object). This means Monty is not internally incrementing any steps while theTouchObjectpositioning procedure is running. This also means that Monty will not give up on an episode if too many steps are taken without seeing the object. TheTouchObjectpositioning procedure has amax_stepsparameter (defaults to32) that, when exceeded, will truncate the positioning procedure. The data loader'sself.touch_object(...)will check if the result is truncated, and if it is, it will raisePositioningProcedureTruncated. This error will be caught by theMontyObjectRecognitionExperiment, which will then give up on the episode and proceed to the next one:I skipped most of the typing changes from #339 in this pull request. I'll make a follow-up PR with those separately. (I usually would keep them together as new code in this pull request isn't typed well, but I want to get this pull request merged sooner rather than later, so choosing to have less to review this time 😄).
Benchmarks
compared with 7e57ee9
base_config_10distinctobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/wm24o9i6/overview
base_config_10distinctobj_surf_agent,100.00,7.86,27,22.91,5,10 - https://wandb.ai/thousand-brains-project/Monty/runs/n81dwsph/overview
randrot_noise_10distinctobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/d857a5ka/overview
randrot_noise_10distinctobj_dist_on_distm ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/yqpmth97/overview
randrot_noise_10distinctobj_surf_agent,100.00,14.00,26,42.34,5,16 - https://wandb.ai/thousand-brains-project/Monty/runs/d9v25e0r/overview
randrot_10distinctobj_surf_agent,100.00,9.00,26,24.50,4,9 - https://wandb.ai/thousand-brains-project/Monty/runs/o2579e1f/overview
randrot_noise_10distinctobj_5lms_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/r5mpkuqd/overview
base_10simobj_surf_agent,93.57,17.86,42,15.98,11,18 - https://wandb.ai/thousand-brains-project/Monty/runs/fgcmqcfh/overview
randrot_noise_10simobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/ivgji1tw/overview
randrot_noise_10simobj_surf_agent,91.00,44.00,121,24.26,22,93 - https://wandb.ai/thousand-brains-project/Monty/runs/kdmoe5hw/overview
randomrot_rawnoise_10distinctobj_surf_agent,68.00,72.00,15,89.35,6,7 - https://wandb.ai/thousand-brains-project/Monty/runs/v7ut2z2b/overview
base_10multi_distinctobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/qm83e74l/overview
base_77obj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/auvyhgx2/overview
base_77obj_surf_agent,97.40,11.69,35,9.23,17,23 - https://wandb.ai/thousand-brains-project/Monty/runs/x6kvmbqs/overview
randrot_noise_77obj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/85jd2gdb/overview
randrot_noise_77obj_surf_agent,91.77,33.77,90,41.58,34,82 - https://wandb.ai/thousand-brains-project/Monty/runs/moxm8tki/overview
randrot_noise_77obj_5lms_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/u7xofaow/overview