Hello, my name is Eran and I am a Research Engineer (https://www.linkedin.com/in/eran-ben-artzy/).
I wonder why you used an autoregressive model in stage 2 and trained it on a closed codebook of skill tokens,
instead of training it directly on the actions themselves, somewhat like the Genie 2 model or V-JEPA.
You could still use the discrete property of a codebook while outputting a continuous action space (robot state),
so that your loss would be computed against the action states and not in the skill-token space.
I would really love to understand.
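
To make the question concrete, here is a minimal sketch of what I have in mind. It is not your method, only an illustration under my own assumptions: the sizes `K`, `D`, `A`, the linear decoder `W_dec`, and the helper names `quantize` and `action_loss` are all hypothetical. A latent is snapped to its nearest codebook entry (the discrete part), decoded to a continuous action, and the loss is taken against the target actions rather than against token indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: codebook entries, latent dim, continuous action dim.
K, D, A = 8, 4, 3
codebook = rng.normal(size=(K, D))  # discrete bottleneck: K code vectors
W_dec = rng.normal(size=(D, A))     # assumed linear decoder: code vector -> action


def quantize(z):
    """Return the index and vector of the nearest codebook entry per row of z."""
    # Squared Euclidean distance from each latent to each code, shape (N, K).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]


def action_loss(z, target_actions):
    """MSE between decoded continuous actions and the target actions."""
    idx, z_q = quantize(z)
    pred = z_q @ W_dec  # continuous action prediction, shape (N, A)
    loss = float(((pred - target_actions) ** 2).mean())
    return idx, pred, loss


# Toy batch of latents and target robot actions (random placeholders).
z = rng.normal(size=(5, D))
targets = rng.normal(size=(5, A))
idx, pred, loss = action_loss(z, targets)
print(idx.shape, pred.shape, loss >= 0.0)
```

In a real training setup the nearest-neighbor lookup is not differentiable, so something like a straight-through estimator would be needed to pass gradients from the action loss back through the codebook; the sketch above only shows the forward pass.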