STaR (Slow-Thinking Table Reasoning) is a novel slow-thinking model that can achieve effective and stable table reasoning. It enables effective multi-step reasoning through a two-stage training framework (SFT + RFT) and improves reasoning stability via trajectory-level uncertainty quantification.
- 🧠 Effective Multi-Step Reasoning: Two-stage training framework with SFT warm-up and reinforced fine-tuning (RFT)
- 📈 Difficulty-Aware RL: Reinforcement learning mechanism that progressively handles complex reasoning
- 🎯 Stable Reasoning: Trajectory-level uncertainty quantification fusing token-level confidence with answer-level consistency
- 🚀 Strong Generalization: State-of-the-art in-domain performance and excellent out-of-domain generalization
Table reasoning with large language models (LLMs) plays a critical role in building intelligent systems capable of understanding and analyzing tabular data. Despite recent progress, existing methods still face key limitations: their reasoning processes lack depth and explicit multi-step reasoning, often relying solely on the language model's implicit understanding. In addition, their reasoning processes suffer from instability, primarily caused by model uncertainty.
In this work, we propose STaR, a novel slow-thinking model that can achieve effective and stable table reasoning. To enable effective multi-step reasoning, we design a two-stage training framework consisting of supervised fine-tuning (SFT) warm-up followed by reinforced fine-tuning (RFT). Specifically, in the SFT stage, we construct a high-quality dataset through automatic self-verification. In the RFT stage, we introduce a difficulty-aware reinforcement learning mechanism to further enhance reasoning capabilities. Furthermore, to improve reasoning stability, we introduce trajectory-level uncertainty quantification, which fuses token-level confidence with answer-level consistency, enabling the selection of better reasoning trajectories. Extensive experiments demonstrate that STaR-8B achieves state-of-the-art performance on in-domain benchmarks and exhibits strong generalization to out-of-domain datasets, highlighting its potential for enhancing both effectiveness and stability in table reasoning.
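As a rough illustration of the trajectory-selection idea described above, the sketch below fuses token-level confidence with answer-level consistency to rank sampled trajectories. The helper names, the mean-probability confidence, and the linear fusion rule with weight `alpha` are all assumptions for exposition, not the paper's exact formulation:

```python
# Minimal sketch of trajectory-level uncertainty quantification.
# ASSUMPTIONS: confidence = mean token probability; fused score is a
# linear combination controlled by alpha. The paper's rule may differ.
import math
from collections import Counter

def token_confidence(token_logprobs):
    """Mean per-token probability of one trajectory (assumes a non-empty list)."""
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

def select_trajectory(trajectories, alpha=0.5):
    """Pick the answer of the highest-scoring trajectory.

    Each trajectory is (answer, token_logprobs). The score fuses
    token-level confidence with answer-level consistency, i.e. the
    fraction of sampled trajectories that agree on the same answer.
    """
    votes = Counter(ans for ans, _ in trajectories)
    n = len(trajectories)
    best = max(
        trajectories,
        key=lambda t: alpha * token_confidence(t[1])
        + (1 - alpha) * votes[t[0]] / n,
    )
    return best[0]
```

With several sampled trajectories, a confidently-generated answer that also agrees with the majority wins; setting `alpha=0` reduces the rule to plain self-consistency voting.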
STaR/
├── 📂 data/ # Datasets
│ ├── STaR-sft.parquet # SFT training data
│ ├── STaR-train-easy.parquet # Easy training samples
│ ├── STaR-train-hard.parquet # Hard training samples
│ ├── STaR-train-all.parquet # All training samples
│ ├── STaR-eval.parquet # Evaluation data
│ └── STaR-test.parquet # Test data
├── 📂 model/ # Pre-trained models
├── 📂 sh/ # Training & evaluation scripts
├── 📂 verl/ # VERL framework
├── 📂 checkpoints/ # Model checkpoints
├── 📄 reward.py # Reward function
├── 📄 eval-by-trajectory.py # Evaluation script
└── 📄 requirements.txt # Dependencies
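`reward.py` provides the reward function used during RFT. As a hedged sketch of what an answer-matching reward for table QA can look like (the `\boxed{...}` answer format, the helper names, and the exact-match rule below are illustrative assumptions; the repository's implementation may differ):

```python
# Illustrative answer-matching reward -- NOT the repository's reward.py.
# ASSUMPTION: the final answer appears in a \boxed{...} span; otherwise
# we fall back to the last non-empty line of the trajectory.
import re

def extract_answer(trajectory: str) -> str:
    """Return the last \\boxed{...} content, else the last non-empty line."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", trajectory)
    if boxed:
        return boxed[-1].strip()
    lines = [ln for ln in trajectory.splitlines() if ln.strip()]
    return lines[-1].strip() if lines else ""

def compute_reward(trajectory: str, gold: str) -> float:
    """1.0 for a whitespace/case-normalized exact match, else 0.0."""
    norm = lambda s: re.sub(r"\s+", " ", s.lower().strip())
    return 1.0 if norm(extract_answer(trajectory)) == norm(gold) else 0.0
```

Binary exact-match rewards like this are a common starting point for RFT; task-specific normalization (numbers, dates, cell references) is usually layered on top.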
Requirements: Python 3.10+ and CUDA-capable GPUs
# 1️⃣ Clone the repository
git clone https://github.com/zhjai/STaR.git
cd STaR
# 2️⃣ Install Python dependencies
pip install -r requirements.txt
# 3️⃣ Install verl in editable mode
cd verl
pip install -e .
cd ..
Download the datasets from Hugging Face and place them in the data/ folder:
| Dataset | Description | Link |
|---|---|---|
| STaR-Datasets | Full training & evaluation data | |
Download the base models and place them in the model/ folder:
| Model | Parameters | Link |
|---|---|---|
| Qwen3-0.6B | 0.6B | |
| Qwen3-8B | 8B | |
Our trained model weights are available on Hugging Face:
| Model | Parameters | Link |
|---|---|---|
| STaR-0.6B | 0.6B | |
| STaR-8B | 8B | |
Training scripts are located in the sh/ directory. Adjust paths and hyperparameters as needed.
# Qwen3-0.6B
bash sh/STaR-sft-qwen3-0.6b.sh
# Qwen3-8B
bash sh/STaR-sft-qwen3-8b.sh
# Qwen3-0.6B
bash sh/STaR-sft-stage1-qwen3-0.6b.sh
# Qwen3-8B
bash sh/STaR-sft-stage1-qwen3-8b.sh
# Qwen3-0.6B
bash sh/STaR-sft-stage1-stage2-qwen3-0.6b.sh
# Qwen3-8B
bash sh/STaR-sft-stage1-stage2-qwen3-8b.sh
bash sh/STaR-eval.sh
python eval-by-trajectory.py
If you find this work useful, please cite our paper:
Note: The citation on Google Scholar may still display the old title. The correct title is: STaR: Towards Effective and Stable Table Reasoning via Slow-Thinking Large Language Models
- This work builds on the excellent VERL framework
- Base models from Qwen team at Alibaba
- Thanks to the open-source community for tools and datasets
⭐ Star this repo if you find it helpful! ⭐
Made with ❤️ for the research community

