This repo contains the code for the paper *True Multimodal In-Context Learning Needs Attention to the Visual Context* (COLM 2025).
```bash
git clone https://github.com/chenxshuo/true-micl.git
cd true-micl
# this script installs torch 2.5.1 + cu121; adjust it to your desired version
bash setup_venv.sh
```
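To verify the environment before moving on, a quick check (assuming the virtual environment created by `setup_venv.sh` is activated) confirms that the PyTorch and CUDA versions match the configuration below:

```python
# Quick sanity check for the environment created by setup_venv.sh.
import torch

print("torch:", torch.__version__)           # expected: 2.5.1+cu121
print("cuda available:", torch.cuda.is_available())
print("cuda version:", torch.version.cuda)   # expected: 12.1
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))
```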
Unsloth must be installed with the tag matching your hardware and environment.
✅ Our working configuration:
- GPU: A100 (Ampere)
- CUDA: 12.1
- PyTorch: 2.5.1
Use the following:
```bash
uv pip install --no-deps 'unsloth[cu121-ampere-torch251] @ git+https://github.com/unslothai/unsloth.git'
```

If you use a different GPU or CUDA version, refer to the Unsloth installation guide and adjust the tag accordingly.
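A minimal import check, assuming a recent Unsloth release that ships `FastVisionModel` (adjust the import if your version differs):

```python
# Confirm that Unsloth installed cleanly against this torch/CUDA build.
try:
    from unsloth import FastVisionModel  # vision entry point in recent Unsloth releases
    print("Unsloth OK:", FastVisionModel.__name__)
except ImportError as exc:
    print("Unsloth import failed; re-check the install tag:", exc)
```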
Log in to Hugging Face and download the required datasets from our data release:

```bash
huggingface-cli login
hf download ShuoChen99/TrueMICL --repo-type dataset --local-dir dataset --quiet
```

This places the datasets under a local `dataset/` folder; the expected layout is shown below.
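If you prefer staying in Python, `huggingface_hub`'s `snapshot_download` mirrors the CLI call above (same repo id and target directory):

```python
# Download the TrueMICL datasets into ./dataset, equivalent to the CLI call above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="ShuoChen99/TrueMICL",
    repo_type="dataset",
    local_dir="dataset",
)
```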
```
dataset/
├── operator_induction/
│   ├── support.json
│   └── query.json
├── sudoku/
│   ├── ...
├── shapes_count/
│   ├── ...
...
```
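As a quick smoke test, the splits can be inspected with the standard `json` module; the per-record schema is task-specific, so this sketch only reports example counts (it assumes each file is either a JSON list or a dict of records):

```python
# Inspect one task's support/query splits; field names vary per task,
# so we only print how many examples each split contains.
import json
from pathlib import Path

task_dir = Path("dataset/operator_induction")
for split in ("support.json", "query.json"):
    data = json.loads((task_dir / split).read_text())
    # Handle either a top-level list of records or a dict keyed by example id.
    records = data if isinstance(data, list) else list(data.values())
    print(split, "->", len(records), "examples")
```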
## Running Training and Inference

This project provides a unified shell script, `run.sh`, for both training and inference. For example, to run model inference on Clock Math:
```bash
source .venv/bin/activate
bash run.sh infer clock        # baseline inference
bash run.sh dara_infer clock   # load pre-trained DARA
bash run.sh lora_infer clock   # load pre-trained LoRA
```
| Mode | Description |
|---|---|
| `infer` | Run inference with the base model using a randomly chosen 4-shot context. |
| `dara_infer` | Run inference with a DARA model. |
| `lora_infer` | Run inference with a LoRA fine-tuned model. |
| `dara_finetune` | DARA training mode. |
| `lora_finetune` | Fine-tune with LoRA; only part of the parameters are trainable. |
After inference, results are saved under `./results/` in JSON format.
To calculate accuracy:

```bash
python check_accuracy.py ./results/your_result_file.json
```
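For reference, a minimal sketch of such an accuracy computation is shown below; the keys `prediction` and `answer` are illustrative assumptions, and `check_accuracy.py` itself is the source of truth for the actual result format:

```python
# Hypothetical sketch of what an accuracy check over the results JSON might do.
# The keys "prediction" and "answer" are assumptions for illustration only.
import json
import sys

data = json.load(open(sys.argv[1]))
records = data if isinstance(data, list) else list(data.values())
correct = sum(
    str(r.get("prediction", "")).strip() == str(r.get("answer", "")).strip()
    for r in records
)
print(f"accuracy: {correct}/{len(records)} = {correct / max(len(records), 1):.4f}")
```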
Repository structure:

```
├── ckpt/                          # Your LoRA checkpoints
│   ├── ...
├── dataset/                       # Place downloaded datasets here
│   ├── operator_induction/
│   ├── sudoku/
│   └── ...
├── qwen2_vl_replacement/          # model override modules
├── check_accuracy.py
├── data_processing.py
├── modeling_qwen2_vl.py
├── pip_requirements.txt
├── qwen2_finetune_new_model.py
├── run.sh                         # Entry script for training/inference
└── samples_of_training_data.json  # sample JSON for training
```
If you find this work useful, please cite:

```bibtex
@article{chen2025true,
  title={True Multimodal In-Context Learning Needs Attention to the Visual Context},
  author={Chen, Shuo and Liu, Jianzhe and Han, Zhen and Xia, Yan and Cremers, Daniel and Torr, Philip and Tresp, Volker and Gu, Jindong},
  journal={arXiv preprint arXiv:2507.15807},
  year={2025}
}
```