ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

News

[2025.5.21] We released the ALToLLM-8B, available here.
[2025.5.22] We released the paper.

Abstract

While humans effortlessly draw visual objects and shapes by adaptively allocating attention based on their complexity, existing multimodal large language models (MLLMs) remain constrained by rigid token representations. Bridging this gap, we propose ALTo, an adaptive length tokenizer for autoregressive mask generation. To achieve this, a novel token length predictor is designed, along with a length regularization term and a differentiable token chunking strategy. We further build ALToLLM that seamlessly integrates ALTo into MLLM. Preferences on the trade-offs between mask quality and efficiency is implemented by group relative policy optimization (GRPO). Experiments demonstrate that ALToLLM achieves state-of-the-art performance with adaptive token cost on popular segmentation benchmarks.

Installation

conda env create -f environment.yml

Demo

Run inference_altollm.py to generate a segmentation mask for an object in an image.

Training

You can train your own models based on our ALTo and ALToLLM Hugging Face models.

First, run:

export PYTHONPATH="${PYTHONPATH}:$(pwd)"
mkdir -p "runs"

To train ALTo, download the SA1B dataset from here and prepare the data in the same format as example/sa1b.jsonl. You can either download our pretrained ALTo model and continue training from it, or start training from scratch using the TiTok and SAM models.

For stage 1 training, run:

torchrun \
    --nproc_per_node=8 \
    --master_port=29501 \
    trainers/main_multi_nodes.py \
    config=config/config_alto_stage1.py

For stage 1.5 training, run:

torchrun \
    --nproc_per_node=8 \
    --master_port=29501 \
    trainers/main_multi_nodes.py \
    config=config/config_alto_stage1_5.py

To train ALToLLM, prepare your data in the same format as example/anns/seg_data_with_mask.jsonl.

Important keys contained in the JSONL files:

- "image": Source image.
- "mask": Mask image.
- "conversations": Conversations between human and GPT. The mask placeholder is <ALTo_Start><TOK_0>...<ALTo_End> for full-length mask generation and <ALTo_Start><TOK_1>...<ALTo_End> for adaptive-length mask generation.

For stage 2 training, run bash scripts/train_altollm_stage2_sft.sh to train ALToLLM.

For stage 3 training, run bash scripts/train_altollm_stage3_grpo.sh to train ALToLLM using GRPO.

Evaluation

Follow the evaluation pipeline in EVALUATE.md.

Citation

If you find this project useful in your research, please consider citing:

@article{wang2025alto,
  title={ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation},
  author={Wang, Lingfeng and Lin, Hualing and Chen, Senda and Wang, Tao and Cheng, Changxu and Zhong, Yangyang and Zheng, Dong and Zhao, Wuyue},
  journal={arXiv preprint arXiv:2505.16495},
  year={2025}
}

Acknowledgement

This project is built with reference to InternVL, TiTok and HiMTok.

License

Copyright 2025-UniUbi.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

News

Abstract

Installation

Demo

Training

Evaluation

Citation

Acknowledgement

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
config		config
eval		eval
example		example
imgs		imgs
internvl		internvl
net		net
scripts		scripts
trainers		trainers
.gitignore		.gitignore
EVALUATE.md		EVALUATE.md
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
inference_altollm.py		inference_altollm.py

Folders and files

Latest commit

History

Repository files navigation

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

News

Abstract

Installation

Demo

Training

Evaluation

Citation

Acknowledgement

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages