Conversation

@tristanls
Contributor

@tristanls tristanls commented Jun 27, 2025

This pull request completes the extraction of the TouchObject positioning procedure from SurfacePolicy.

Note

This pull request replaces #339, now that we've identified the infinite loop introduced in that pull request.

Additionally, it cleans up incremental changes made in #347, #348, #349, and #350.

Now, when SurfacePolicy.dynamic_call cannot touch or see the object and raises ObjectNotVisible, the data loader invokes the TouchObject positioning procedure, places the agent back on the object, and returns the observation from the object.

# DataLoader
def __next__(self):
    # ...
    try:
        self._action = self.motor_system()
    except ObjectNotVisible:
        # Run the TouchObject positioning procedure to get back onto the object.
        self.touch_object(
            sensor_id="view_finder",
            desired_object_distance=self.motor_system._policy.desired_object_distance,
        )
        # Flag the resulting step as motor-only.
        self.motor_system._state[self.motor_system._policy.agent_id][
            "motor_only_step"
        ] = True
    else:
        # ...
    return self._observation

Notably, this doesn't feel right, as we use a privileged positioning procedure (privileged because of its "view_finder" use) to keep a sensor in touch with the object. The future solution here will be to implement similar touch-object logic inside the SurfacePolicy itself. This may be easier after refactoring the main Monty loop between the data loader, model, and motor system to be more reinforcement-learning-like.

The algorithmic change is that the positioning actions within the TouchObject positioning procedure are not passed to Monty for processing (except for the final observation, when the procedure finds the object). This means Monty does not internally increment any steps while the TouchObject positioning procedure is running, and therefore will not give up on an episode because too many steps were taken without seeing the object. The TouchObject positioning procedure has a max_steps parameter (defaulting to 32) that, when exceeded, truncates the positioning procedure. The data loader's self.touch_object(...) checks whether the result is truncated and, if so, raises PositioningProcedureTruncated. This error is caught by the MontyObjectRecognitionExperiment, which then gives up on the episode and proceeds to the next one:

# TouchObject
    def positioning_call(
        self, observation: Observations, state: MotorSystemState
    ) -> PositioningProcedureResult:
        if self._terminated_and_succeeded:
            return PositioningProcedureResult(success=True, terminated=True)

        if self._step_count >= self._max_steps: # **truncating condition**
            return PositioningProcedureResult(terminated=True, truncated=True)
        
        # ...

# DataLoader
def touch_object(...) -> None:
    # ...
    result = positioning_procedure.positioning_call(
        self._observation, self.motor_system._state
    )
    while not result.terminated and not result.truncated:
        for action in result.actions:
            self._observation, proprio_state = self.dataset[action]
            self.motor_system._state = MotorSystemState(proprio_state)

        result = positioning_procedure.positioning_call(
            self._observation, self.motor_system._state
        )

    if result.truncated: # **truncation check and raise**
        raise PositioningProcedureTruncated

    if not result.success:
        raise ObjectNotVisible

# MontyObjectRecognitionExperiment
    def run_episode_steps(self) -> int:
        loader_step = 0
        try:
            for loader_step, observation in enumerate(self.dataloader):
                # ...
        except PositioningProcedureTruncated: # **catch error from enumerate(self.dataloader)**
            logger.info("Terminated due to PositioningProcedureTruncated")
            self.model.deal_with_time_out()
        else:
            # handle case where spiral policy calls StopIterator in motor policy
            self.model.set_is_done()

        return loader_step

I skipped most of the typing changes from #339 in this pull request. I'll make a follow-up PR with those separately. (I would usually keep them together, since without them the new code in this pull request isn't typed well, but I want to get this pull request merged sooner rather than later, so I'm choosing to have less to review this time 😄).

Benchmarks

compared with 7e57ee9

base_config_10distinctobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/wm24o9i6/overview
base_config_10distinctobj_surf_agent,100.00,7.86,27,22.91,5,10 - https://wandb.ai/thousand-brains-project/Monty/runs/n81dwsph/overview

-base_config_10distinctobj_surf_agent,100.00,0.00,28,14.01,2,11
+base_config_10distinctobj_surf_agent,100.00,7.86,27,22.91,5,10

randrot_noise_10distinctobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/d857a5ka/overview
randrot_noise_10distinctobj_dist_on_distm ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/yqpmth97/overview
randrot_noise_10distinctobj_surf_agent,100.00,14.00,26,42.34,5,16 - https://wandb.ai/thousand-brains-project/Monty/runs/d9v25e0r/overview

-randrot_noise_10distinctobj_surf_agent,100.00,1.00,29,20.42,3,21
+randrot_noise_10distinctobj_surf_agent,100.00,14.00,26,42.34,5,16

randrot_10distinctobj_surf_agent,100.00,9.00,26,24.50,4,9 - https://wandb.ai/thousand-brains-project/Monty/runs/o2579e1f/overview

-randrot_10distinctobj_surf_agent,100.00,0.00,28,17.23,2,10
+randrot_10distinctobj_surf_agent,100.00,9.00,26,24.50,4,9

randrot_noise_10distinctobj_5lms_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/r5mpkuqd/overview
base_10simobj_surf_agent,93.57,17.86,42,15.98,11,18 - https://wandb.ai/thousand-brains-project/Monty/runs/fgcmqcfh/overview

-base_10simobj_surf_agent,93.57,8.57,70,13.98,6,27
+base_10simobj_surf_agent,93.57,17.86,42,15.98,11,18

randrot_noise_10simobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/ivgji1tw/overview
randrot_noise_10simobj_surf_agent,91.00,44.00,121,24.26,22,93 - https://wandb.ai/thousand-brains-project/Monty/runs/kdmoe5hw/overview

-randrot_noise_10simobj_surf_agent,93.00,34.00,172,20.94,16,137
+randrot_noise_10simobj_surf_agent,91.00,44.00,121,24.26,22,93

randomrot_rawnoise_10distinctobj_surf_agent,68.00,72.00,15,89.35,6,7 - https://wandb.ai/thousand-brains-project/Monty/runs/v7ut2z2b/overview

-randomrot_rawnoise_10distinctobj_surf_agent,64.00,77.00,16,106.51,4,7
+randomrot_rawnoise_10distinctobj_surf_agent,68.00,72.00,15,89.35,6,7

base_10multi_distinctobj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/qm83e74l/overview

base_77obj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/auvyhgx2/overview
base_77obj_surf_agent,97.40,11.69,35,9.23,17,23 - https://wandb.ai/thousand-brains-project/Monty/runs/x6kvmbqs/overview

-base_77obj_surf_agent,98.70,6.49,56,12.94,12,34
+base_77obj_surf_agent,97.40,11.69,35,9.23,17,23

randrot_noise_77obj_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/85jd2gdb/overview
randrot_noise_77obj_surf_agent,91.77,33.77,90,41.58,34,82 - https://wandb.ai/thousand-brains-project/Monty/runs/moxm8tki/overview

-randrot_noise_77obj_surf_agent,93.94,23.38,115,34.80,32,113
+randrot_noise_77obj_surf_agent,91.77,33.77,90,41.58,34,82

randrot_noise_77obj_5lms_dist_agent ✅ - https://wandb.ai/thousand-brains-project/Monty/runs/u7xofaow/overview

@github-actions

github-actions bot commented Jun 28, 2025

📚 Documentation Preview

✅ A preview of the documentation changes in this PR is available for maintainers at:
https://thousandbrainsproject.readme.io/v0.7-tristanls-touch-object-again

Last updated: 2025-06-28 00:19 UTC

@nielsleadholm
Contributor

Thanks for opening this @tristanls, just to check, have you added a catch for the infinite while loop? Is it possible to highlight what has changed vs. #339, if anything? I feel like it would be worth introducing that fix in this PR; otherwise our experiments will take much longer to run. It could be something relatively simple, like not allowing more than 100 steps for any particular attempt to touch the object again.

Contributor

@vkakerbeck vkakerbeck left a comment

Overall the logic looks good to me. I just left one thought on incrementing the counter.

@nielsleadholm I think the termination check is the PositioningProcedureTruncated exception raised after 32 steps. @tristanls what's the reason you chose 32 steps? It looks like we now have significantly more time-out episodes (% used MLH, second column). Could it be that 32 steps is not enough? Also, why are the runtimes still almost double, or at least 1.5x, in many of the experiments?

Useful for positioning surface agents that "touch" the object.
Called at the beginning of each episode and after "jump" has been initiated
by a model-based policy. In addition, it can be called when the surface agent
Contributor

Is this correct? I thought if after a jump we are not on the object, the jump is reversed instead of trying to find the object?

Contributor Author

As far as I can see, this docstring does not contradict what you said. TouchObject is called at the beginning of each episode. If the jump is successful, the SurfacePolicy takes over, but after that it can fall off the object and invoke TouchObject.

I think the phrase "and after "jump" has been initiated by a model-based policy" could be removed, since SurfacePolicy takes over and has to fall off the object before TouchObject is invoked again.

Contributor

Yes, I was referring to the "and after "jump" has been initiated by a model-based policy" part, which, as far as I understand, doesn't invoke TouchObject by itself.

forward_distance=move_forward_amount,
)
)
self._step_count += 1
Contributor

Should this counter only be incremented while we are searching for the object (not if we have already found it and are just moving forward)?

Contributor Author

I don't understand the detail you are highlighting. What would the alternative be?

This counter is not incremented when we are done and returning MoveForward:

        if depth_at_center < 1.0:
            distance = (
                depth_at_center
                - self._desired_object_distance
                - state[self.agent_id]["sensors"][f"{self._sensor_id}.depth"][
                    "position"
                ][2]
            )
            logger.debug(f"Move to touch visible object, forward by {distance}")

            self._terminated_and_succeeded = True
            return PositioningProcedureResult(
                actions=[MoveForward(agent_id=self.agent_id, distance=distance)]
            )

I don't think it would make a difference to increment self._step_count when we are done and returning MoveForward, as success is checked before the step count:

        if self._terminated_and_succeeded:
            return PositioningProcedureResult(success=True, terminated=True)

        if self._step_count >= self._max_steps:
            return PositioningProcedureResult(terminated=True, truncated=True)

Contributor

sorry, you are right, the counter is already not incremented if we are just moving towards the object

# Check if we have poor visualization of the object
if (
    self.processed_observations.get_feature_by_name("object_coverage") < 0.1
    or self.attempting_to_find_object
Contributor

What's the reason we can remove this second check? As far as I understand, self.attempting_to_find_object is not set based just on the object coverage but also on the depth at the center of the patch. Is this now already accounted for somewhere else?

Contributor

Is this now replaced with _terminated_and_succeeded?

Contributor Author

self.attempting_to_find_object was always going to be removed and only served to implement TouchObject logic outside of TouchObject.

Screenshot 2025-06-30 at 17 34 37

No, it is not replaced by _terminated_and_succeeded.

self.attempting_to_find_object was introduced back in #310. Its purpose was to ensure that the touch_object action sequence always ends in MoveForward. This check ensured that SurfacePolicy would not interrupt the touch_object code before it ran to completion. SurfacePolicy can no longer interrupt TouchObject, as their execution paths no longer interleave, so this check has no purpose.

Contributor

ah okay, thanks

@nielsleadholm
Contributor

nielsleadholm commented Jun 30, 2025

Thanks Viviane, yeah it is strange that we are getting both more time-outs and run-times that are still so much longer. These are essentially opposing outcomes, i.e. setting max_steps higher should reduce the number of time-outs but would make the run-times even longer. It feels like there might be a second issue we are missing...

One possible sanity check to consider: keep all the changes as they are, except that the touch-object positioning procedure reaches in and temporarily increments Monty's internal counter. This would replicate the same counting (and therefore termination) process as before this policy was separated out. That would help determine to what degree the issue is solely due to the counter / infinite while loop, vs. there being an additional issue.

@tristanls
Contributor Author

tristanls commented Jun 30, 2025

> @nielsleadholm I think the termination check is the PositioningProcedureTruncated exception raised after 32 steps.

Yes, @vkakerbeck is correct, the PositioningProcedureTruncated error is the infinite-loop terminating mechanism.

The reason for 32: in policy_test.py I observed that it takes ~14 steps to get back on the object, as the OrientHorizontal actions all miss the object. The closest power of two was 16, but I felt that a margin of only two steps was not enough, so I selected the next power of two: 32.

> One possible sanity check to consider: keep all the changes as they are, except that the touch-object positioning procedure reaches in and temporarily increments Monty's internal counter. This would replicate the same counting (and therefore termination) process as before this policy was separated out. That would help determine to what degree the issue is solely due to the counter / infinite while loop, vs. there being an additional issue.

We already ran the above experiment. Those are the #351 benchmark results, where the runtimes are unaffected. #353 is a bit closer to this suggestion, so I'm running those benchmarks, but I expect the runtimes to be unaffected since all steps are being counted.

#353 benchmarks (no runtime change)

base_config_10distinctobj_surf_agent,100.00,2.14,27,18.04,3,11 - https://wandb.ai/thousand-brains-project/Monty/runs/785fi9ty/overview

-base_config_10distinctobj_surf_agent,100.00,0.00,28,14.01,2,11
+base_config_10distinctobj_surf_agent,100.00,2.14,27,18.04,3,11

randrot_noise_10distinctobj_surf_agent,99.00,4.00,27,29.18,3,17 - https://wandb.ai/thousand-brains-project/Monty/runs/cs0rgzxc/overview

-randrot_noise_10distinctobj_surf_agent,100.00,1.00,29,20.42,3,21
+randrot_noise_10distinctobj_surf_agent,99.00,4.00,27,29.18,3,17

randrot_10distinctobj_surf_agent,100.00,2.00,27,19.11,2,9 - https://wandb.ai/thousand-brains-project/Monty/runs/8uokgnpq/overview

-randrot_10distinctobj_surf_agent,100.00,0.00,28,17.23,2,10
+randrot_10distinctobj_surf_agent,100.00,2.00,27,19.11,2,9

base_10simobj_surf_agent,94.29,10.71,53,14.32,5,19

-base_10simobj_surf_agent,93.57,8.57,70,13.98,6,27
+base_10simobj_surf_agent,94.29,10.71,53,14.32,5,19

randrot_noise_10simobj_surf_agent,90.00,41.00,147,27.18,14,100 - https://wandb.ai/thousand-brains-project/Monty/runs/b67ufq3n/overview

-randrot_noise_10simobj_surf_agent,93.00,34.00,172,20.94,16,137
+randrot_noise_10simobj_surf_agent,90.00,41.00,147,27.18,14,100

randomrot_rawnoise_10distinctobj_surf_agent,TBD - https://wandb.ai/thousand-brains-project/Monty/runs/ovqaz05c/overview
base_77obj_surf_agent,98.27,8.66,43,9.20,10,27 - https://wandb.ai/thousand-brains-project/Monty/runs/oru77rl7/overview

-base_77obj_surf_agent,98.70,6.49,56,12.94,12,34
+base_77obj_surf_agent,98.27,8.66,43,9.20,10,27

randrot_noise_77obj_surf_agent,TBD - https://wandb.ai/thousand-brains-project/Monty/runs/eh8zah6j/overview

Here's what I'm thinking...

The policy_test.py change is a good intuition pump for what I think is happening.
Screenshot 2025-06-30 at 16 46 18
Previously, all steps were counted, so when we had max_total_steps = 5_000 (or max_eval_steps; the argument applies to any step limit), we ran ~5_000 total steps. Now, we could run ~20_000 total steps (~5_000 counted). The TouchObject positioning procedure can run many uncounted steps. For example, in the policy_test.py change, we see that for every ~10 steps there are ~40 uncounted steps. So, in cases where we end up running for a long time (because we don't definitively identify the object and keep falling off of it, maybe caused by the random sequence change demonstrated in #351), we could run many more uncounted steps than counted ones, bloating the runtime.
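
To make the step inflation concrete, here is a tiny back-of-envelope sketch. It is only an illustration: the counted budget is the 5_000 figure above, and the inflation ratio is an assumed value chosen to reproduce the ~20_000 estimate, not a measurement.

# Back-of-envelope: uncounted positioning steps inflate the real work behind
# a counted-step budget. The ratio below is an assumed illustrative value.
counted_budget = 5_000           # max_total_steps / max_eval_steps, counted by Monty
uncounted_per_counted = 3        # assumed: ~3 uncounted TouchObject steps per counted step
total_env_steps = counted_budget * (1 + uncounted_per_counted)
print(total_env_steps)  # 20_000 environment steps for a 5_000-step counted budget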

Solutions: I think there are at least three possible solutions to the increased runtime and not counting all the steps.

  • In the long term, improve the SurfacePolicy to not rely on a positioning procedure, as highlighted in the TODO below (a rough sketch of such a loop also appears after this list):
                # TODO: Note that a PositioningProcedure is not a MotorPolicy, so
                #       using it here to keep the agent in touch with the surface is
                #       questionable. However, in order for the SurfacePolicy to
                #       take over this responsibility, the main MontyExperiment loop
                #       between Monty, MotorSystem, and DataLoaders needs to be updated
                #       first. Once invoking the MotorSystem includes `observation` as
                #       an argument, e.g. motor_system(observation), then the
                #       SurfacePolicy will be capable of implementing logic similar to
                #       the TouchObject positioning procedure instead of raising
                #       ObjectNotVisible and relying on the dataloader to handle
                #       this.
  • Implement a TouchObject policy (not procedure) that can be swapped in for SurfacePolicy when it falls off the object. However, I'm uncertain what part of Monty would do this.
  • Improve SurfacePolicy itself so that it does not fall off the object.
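
As a rough illustration of the first bullet, here is a hypothetical sketch of the more reinforcement-learning-like loop the TODO hints at. None of these names (dataset.reset, model.step) are the repository's actual API; the point is only that the motor system receives the observation directly, so a SurfacePolicy could run TouchObject-style logic itself instead of raising ObjectNotVisible for the data loader to handle.

# Hypothetical RL-like loop (names are illustrative, not the actual API).
observation = dataset.reset()
done = False
while not done:
    action = motor_system(observation)            # policy sees the observation directly
    observation, proprio_state = dataset[action]  # environment step
    done = model.step(observation)                # Monty processes the observation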

@vkakerbeck
Copy link
Contributor

Thanks for all the details. I have a couple of clarifying questions to make sure I understand:

  • For the selection of 32 steps, it sounds like you based that on the number of steps needed to get back on the object in the unit test. Why don't you base it on 720/rotation_degrees (hardcoded here) = 24, plus n_random_steps? That would correspond to the 24 steps it takes to go through the vertical and horizontal search, plus some buffer for the random search (see the sketch after this list). 32 seems like it could still be a good overall number, given that rotation_degrees is set to 30. I just wanted to check whether I understand it right and why you don't use a formula dependent on that variable.
  • In the #351 (chore: demonstrate initializing TouchObject alters benchmarks) benchmark results, I see some small random variation but no consistent increase in runtimes or in time-outs of episodes. But in this PR, both the number of time-outs and the runtimes increased, which, as Niels pointed out, seems counterintuitive. I understand that runtimes would increase if we are making a bunch of steps that are not counted towards max_total_steps. However, I don't understand why we would get significantly more time-out episodes (reaching max_total_steps). How do you explain the increase in time-out episodes (second column)? It is also much higher than in the #353 (chore: demonstrate inline TouchObject.positioning_call) benchmark results you shared.
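
For reference, here is a minimal sketch of the formula proposed in the first bullet above. The parameter names (rotation_degrees, n_random_steps) are illustrative, not names taken from the codebase.

def touch_object_max_steps(rotation_degrees: int = 30, n_random_steps: int = 8) -> int:
    # 720 / rotation_degrees covers the two 360-degree sweeps (vertical and
    # horizontal search); n_random_steps adds a buffer for the random search.
    return 720 // rotation_degrees + n_random_steps

# With rotation_degrees = 30 and, e.g., n_random_steps = 8, this gives 24 + 8 = 32,
# matching the current default of 32.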

@tristanls
Contributor Author

tristanls commented Jul 1, 2025

@vkakerbeck randomrot_rawnoise_10distinctobj_surf_agent improved accuracy and decreased the number of time-outs. To me, this indicates random variation rather than something being wrong.

The formula for max_steps is a good idea 👍.

@nielsleadholm
Contributor

nielsleadholm commented Jul 1, 2025

Re. the results of randomrot_rawnoise_10distinctobj_surf_agent, this is a very noisy experiment with a high number of time-outs (nearly all objects) due to the nature of the raw noise, so I'm not surprised that it could be susceptible to some noise variation. However, it is very strange that all the other experiments would see a significant increase in time-outs, generally by around 10 percentage points.

Re. the sanity check I suggested, it's not clear to me that that is the exact experiment I'm suggesting. To clarify what I mean, the following step counters are incremented in the old code due to the use of motor_only_step:

MontyBase
...
pass_features_directly_to_motor_system()
...
self.total_steps += 1
self.episode_steps += 1

and

MontyObjectRecognitionExperiment
...
run_episode_steps()
...
# loader_step increments every time enumerate is called
for loader_step, observation in enumerate(self.dataloader):
    ...
    if loader_step >= (self.max_total_steps):
        logger.info(f"Terminated due to maximum episode steps : {loader_step}")
        self.model.deal_with_time_out()
        return loader_step

Below is an outline of what I'm suggesting we try as a temporary replication of the old step counting, but with everything else using the new code:

Firstly, set your new max_steps to infinity/take it out of the equation, so that only the MontyExperiment will terminate the episode. Then, implement the following:

MontyObjectRecognitionExperiment
# Initialize a temp_touch_object_counter at start of episode
self.temp_touch_object_counter = 0
...
run_episode_steps()
...
if self.temp_touch_object_counter + loader_step >= (self.max_total_steps):
    logger.info(f"Terminated due to maximum episode steps : {loader_step}")
    self.model.deal_with_time_out()
    return loader_step
TouchObject
...
positioning_call()
...
# Reach into and modify step counters manually
monty_class.total_steps += 1
monty_class.episode_steps += 1
monty_experiment.temp_touch_object_counter += 1

With the above setup, and if there are no other outstanding issues with the code, we should see results (episode run time and time-out percentages) similar to our older experiments.

Otherwise, I agree with your description: the runtime is increasing because these meta-counters no longer account for the touch-object steps. That said, I'm still concerned about the increase in time-outs.

As with Friday, happy to jump on a call and discuss if that's helpful!
