Thanks for sharing this great work!
We are currently working on a related direction in the language domain with a framework we call Continuous Autoregressive Language Models (CALM, https://arxiv.org/abs/2510.27688). In CALM, an autoencoder learns a mapping from a chunk of discrete tokens to a single continuous vector. A subsequent generative model then learns to autoregressively predict the sequence of these vectors.
A key difference is the model input: the predicted vector is first passed back through the autoencoder to be reconstructed into discrete tokens, and the embeddings of those tokens form the input for the next step (sketched below).
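For concreteness, here is a minimal, purely illustrative sketch of the loop I mean. The module names, shapes, and architecture choices (`ChunkAutoencoder`, `LatentPredictor`, a GRU stand-in for the generator) are placeholders of my own, not the actual CALM implementation:

```python
import torch
import torch.nn as nn

# Illustrative constants, not CALM's real configuration.
CHUNK, VOCAB, D_TOK, D_LAT = 4, 1000, 64, 128

class ChunkAutoencoder(nn.Module):
    """Maps a chunk of tokens to one continuous latent vector and back."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_TOK)
        self.enc = nn.Linear(CHUNK * D_TOK, D_LAT)
        self.dec = nn.Linear(D_LAT, CHUNK * VOCAB)

    def encode(self, tokens):                      # tokens: (B, CHUNK)
        x = self.embed(tokens).flatten(1)          # (B, CHUNK * D_TOK)
        return self.enc(x)                         # (B, D_LAT)

    def decode(self, z):                           # z: (B, D_LAT)
        logits = self.dec(z).view(-1, CHUNK, VOCAB)
        return logits.argmax(-1)                   # (B, CHUNK) discrete tokens

class LatentPredictor(nn.Module):
    """Stand-in autoregressive model over the continuous latent sequence."""
    def __init__(self):
        super().__init__()
        self.proj_in = nn.Linear(CHUNK * D_TOK, D_LAT)
        self.rnn = nn.GRU(D_LAT, D_LAT, batch_first=True)
        self.head = nn.Linear(D_LAT, D_LAT)

    def step(self, step_input, h=None):            # step_input: (B, D_LAT)
        out, h = self.rnn(step_input.unsqueeze(1), h)
        return self.head(out.squeeze(1)), h        # predicted next latent, state

ae, lm = ChunkAutoencoder(), LatentPredictor()
tokens = torch.randint(0, VOCAB, (1, CHUNK))       # prompt chunk
h = None
step_input = lm.proj_in(ae.embed(tokens).flatten(1))
for _ in range(3):                                 # generate three chunks
    z_pred, h = lm.step(step_input, h)             # predict the next latent vector
    tokens = ae.decode(z_pred)                     # reconstruct discrete tokens from it
    # The point I'm asking about: the next-step input is built from the
    # re-embedded decoded tokens, not from the raw predicted latent.
    step_input = lm.proj_in(ae.embed(tokens).flatten(1))
```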
My question is: in a setup like this, do you think the benefits of SphereAR would still apply? In other words, do SphereAR's improvements come primarily from providing a more stable latent vector as input to the next step, or does it also make the latent vectors themselves easier for a generative model to learn to predict?
Any thoughts would be very helpful. Thanks!