
English | 中文

STaR Logo

⭐ STaR: Towards Effective and Stable Table Reasoning via Slow-Thinking Large Language Models

arXiv · Hugging Face Datasets · License

A Novel Slow-Thinking Model for Effective and Stable Table Reasoning

📄 Paper | 🤗 Datasets | 🏠 GitHub


📌 Overview

STaR Framework

STaR (Slow-Thinking Table Reasoning) is a novel slow-thinking model that can achieve effective and stable table reasoning. It enables effective multi-step reasoning through a two-stage training framework (SFT + RFT) and improves reasoning stability via trajectory-level uncertainty quantification.

✨ Key Features

  • 🧠 Effective Multi-Step Reasoning: Two-stage training framework with SFT warm-up and reinforced fine-tuning (RFT)
  • 📈 Difficulty-Aware RL: A reinforcement learning mechanism that progressively advances to harder reasoning problems
  • 🎯 Stable Reasoning: Trajectory-level uncertainty quantification fusing token-level confidence with answer-level consistency
  • 🚀 Strong Generalization: State-of-the-art in-domain performance and excellent out-of-domain generalization

📋 Abstract

Table reasoning with large language models (LLMs) plays a critical role in building intelligent systems capable of understanding and analyzing tabular data. Despite recent progress, existing methods still face key limitations: their reasoning processes lack depth and explicit multi-step reasoning, often relying solely on the model's implicit understanding, and they suffer from instability caused primarily by model uncertainty.

In this work, we propose STaR, a novel slow-thinking model that can achieve effective and stable table reasoning. To enable effective multi-step reasoning, we design a two-stage training framework consisting of supervised fine-tuning (SFT) warm-up followed by reinforced fine-tuning (RFT). Specifically, in the SFT stage, we construct a high-quality dataset through automatic self-verification. In the RFT stage, we introduce a difficulty-aware reinforcement learning mechanism to further enhance reasoning capabilities. Furthermore, to improve reasoning stability, we introduce trajectory-level uncertainty quantification, which fuses token-level confidence with answer-level consistency, enabling the selection of better reasoning trajectories. Extensive experiments demonstrate that STaR-8B achieves state-of-the-art performance on in-domain benchmarks and exhibits strong generalization to out-of-domain datasets, highlighting its potential for enhancing both effectiveness and stability in table reasoning.
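To make the stability mechanism concrete, below is a minimal sketch of how trajectory-level uncertainty quantification can fuse the two signals named above: token-level confidence and answer-level consistency. The data structures, field names, and the weighting scheme are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch: fuse token-level confidence (mean token probability) with
# answer-level consistency (agreement among sampled trajectories), then pick
# the most reliable answer. Names and weighting are assumptions for illustration.
from collections import Counter
from dataclasses import dataclass
from typing import List
import math

@dataclass
class Trajectory:
    answer: str                   # final answer extracted from the reasoning trace
    token_logprobs: List[float]   # per-token log-probabilities of the trace

def select_answer(trajectories: List[Trajectory], alpha: float = 0.5) -> str:
    """Score each trajectory by alpha * confidence + (1 - alpha) * consistency."""
    # Token-level confidence: mean token probability of the trajectory.
    confidences = [
        math.exp(sum(t.token_logprobs) / max(len(t.token_logprobs), 1))
        for t in trajectories
    ]
    # Answer-level consistency: fraction of trajectories agreeing with this answer.
    counts = Counter(t.answer for t in trajectories)
    consistencies = [counts[t.answer] / len(trajectories) for t in trajectories]

    scores = [alpha * c + (1 - alpha) * s for c, s in zip(confidences, consistencies)]
    best = max(range(len(trajectories)), key=lambda i: scores[i])
    return trajectories[best].answer
```

In this picture, one would sample several trajectories per question and call `select_answer` on them to obtain the final prediction used for scoring.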


📁 Project Structure

STaR/
├── 📂 data/                    # Datasets
│   ├── STaR-sft.parquet        # SFT training data
│   ├── STaR-train-easy.parquet # Easy training samples
│   ├── STaR-train-hard.parquet # Hard training samples
│   ├── STaR-train-all.parquet  # All training samples
│   ├── STaR-eval.parquet       # Evaluation data
│   └── STaR-test.parquet       # Test data
├── 📂 model/                   # Pre-trained models
├── 📂 sh/                      # Training & evaluation scripts
├── 📂 verl/                    # VERL framework
├── 📂 checkpoints/             # Model checkpoints
├── 📄 reward.py                # Reward function
├── 📄 eval-by-trajectory.py    # Evaluation script
└── 📄 requirements.txt         # Dependencies
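All splits ship as Parquet files, so any standard reader can inspect them. A quick look with pandas (column names are not guaranteed; check `df.columns` for the real schema):

```python
# Peek at the SFT split; assumes pandas and pyarrow are installed (pip install pandas pyarrow).
import pandas as pd

df = pd.read_parquet("data/STaR-sft.parquet")
print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # actual schema of the file
print(df.head(1))           # one sample record
```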

🛠️ Installation

Requirements: Python 3.10+ and CUDA GPUs

# 1️⃣ Clone the repository
git clone https://github.com/zhjai/STaR.git
cd STaR

# 2️⃣ Install Python dependencies
pip install -r requirements.txt

# 3️⃣ Install verl in editable mode
cd verl
pip install -e .
cd ..
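Before launching training, a quick sanity check that PyTorch can see the GPUs (assuming PyTorch is pulled in by requirements.txt):

```python
# Optional environment check; run inside the activated environment.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```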

📦 Data & Models

🤗 Datasets

Download the datasets from Hugging Face and place them in the data/ folder:

| Dataset | Description | Link |
| --- | --- | --- |
| STaR-Datasets | Full training & evaluation data | Hugging Face |

🤖 Base Models

Download the base models and place them in the model/ folder:

| Model | Parameters | Link |
| --- | --- | --- |
| Qwen3-0.6B | 0.6B | Hugging Face |
| Qwen3-8B | 8B | Hugging Face |

🏆 Trained Checkpoints

Our trained model weights are available on Hugging Face:

| Model | Parameters | Link |
| --- | --- | --- |
| STaR-0.6B | 0.6B | Hugging Face |
| STaR-8B | 8B | Hugging Face |
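If you prefer to fetch assets programmatically, `huggingface_hub`'s `snapshot_download` works for both datasets and models. The repo IDs below are placeholders; substitute the ones linked in the tables above:

```python
# Illustrative download helper (pip install huggingface_hub).
# Replace the placeholder repo IDs with the ones linked above.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="<org>/STaR-Datasets", repo_type="dataset", local_dir="data")
snapshot_download(repo_id="Qwen/Qwen3-8B", local_dir="model/Qwen3-8B")
```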

🚀 Training

Training scripts are located in the sh/ directory. Adjust paths and hyperparameters as needed.

📚 Stage 1: Supervised Fine-Tuning (SFT)

# Qwen3-0.6B
bash sh/STaR-sft-qwen3-0.6b.sh

# Qwen3-8B
bash sh/STaR-sft-qwen3-8b.sh

🎯 Stage 2: Reinforced Fine-Tuning (RFT) - Foundational Training

# Qwen3-0.6B
bash sh/STaR-sft-stage1-qwen3-0.6b.sh

# Qwen3-8B
bash sh/STaR-sft-stage1-qwen3-8b.sh

🔥 Stage 2: Reinforced Fine-Tuning (RFT) - Progressive Training

# Qwen3-0.6B
bash sh/STaR-sft-stage1-stage2-qwen3-0.6b.sh

# Qwen3-8B
bash sh/STaR-sft-stage1-stage2-qwen3-8b.sh
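The RFT stage scores rollouts with reward.py. As a rough, hypothetical illustration of an answer-matching reward (the actual interface and logic of reward.py are not reproduced here):

```python
# Hypothetical exact-match reward; reward.py's real signature and normalization may differ.
import re

def extract_answer(text: str) -> str:
    """Pull the final answer from a reasoning trace (the <answer> tag format is an assumption)."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

def compute_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 for a normalized exact match, 0.0 otherwise."""
    return float(extract_answer(response).lower() == ground_truth.strip().lower())
```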

📊 Evaluation

1️⃣ Generate Trajectories

bash sh/STaR-eval.sh

2️⃣ Compute Metrics

python eval-by-trajectory.py
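eval-by-trajectory.py aggregates the generated trajectories into final metrics. As a rough picture only (file names, columns, and the selection logic below are assumptions, not what the script actually does):

```python
# Hypothetical outline of the metric step.
import pandas as pd

gold = pd.read_parquet("data/STaR-test.parquet")                # gold answers
runs = pd.read_json("outputs/trajectories.jsonl", lines=True)   # generated trajectories (assumed path)

# Keep the highest-scoring trajectory per question, then compute exact-match accuracy.
best = runs.sort_values("score", ascending=False).drop_duplicates("question_id")
merged = gold.merge(best, on="question_id")                     # assumed join key
acc = (merged["prediction"].str.strip() == merged["answer"].str.strip()).mean()
print(f"Exact-match accuracy: {acc:.4f}")
```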

📖 Citation

If you find this work useful, please cite our paper:

Note: The citation on Google Scholar may still display the old title. The correct title is: STaR: Towards Effective and Stable Table Reasoning via Slow-Thinking Large Language Models


🙏 Acknowledgements

  • This work builds on the excellent VERL framework
  • Base models from Qwen team at Alibaba
  • Thanks to the open-source community for tools and datasets

⭐ Star this repo if you find it helpful! ⭐

Made with ❤️ for the research community
