- We propose WebDancer, a novel end-to-end agentic training framework designed to enhance the multi-step information-seeking capabilities of web-based agents.
- We introduce a four-stage training paradigm comprising browsing data construction, trajectory sampling, supervised fine-tuning for effective cold start, and reinforcement learning for improved generalization, enabling the agent to autonomously acquire robust search and reasoning skills.
- Our data-centric approach integrates trajectory-level supervision and online learning to develop a scalable pipeline for training agentic systems.
- We instantiate this framework in a ReAct-based agent and conduct extensive experiments on the GAIA and WebWalkerQA benchmarks. The results show that WebDancer achieves strong performance across diverse tasks, validating the effectiveness of the proposed paradigm and providing systematic insights for future agent development.
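
For readers unfamiliar with ReAct, the Thought, Action, Observation loop that such an agent runs can be sketched as below. This is a minimal illustration under stated assumptions, not WebDancer's implementation: the `search` and `visit` tools are stubs, `scripted_policy` is a hypothetical stand-in for the trained language model, and all names are invented for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One agent step: (thought, action name, action input).
Step = Tuple[str, str, str]
# History of executed steps paired with the observation each one produced.
History = List[Tuple[Step, str]]


# Hypothetical stub tools; a real agent would call a search API and a page fetcher.
def search(query: str) -> str:
    return f"results for: {query}"


def visit(url: str) -> str:
    return f"content of: {url}"


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "visit": visit}


def scripted_policy(question: str, history: History) -> Step:
    """Stand-in for the language model: replays a fixed plan, one step per call."""
    plan: List[Step] = [
        ("I should search first.", "search", question),
        ("I should open the top result.", "visit", "https://example.com"),
        ("I have enough information.", "answer", "final answer (placeholder)"),
    ]
    return plan[min(len(history), len(plan) - 1)]


def react_loop(
    question: str,
    policy: Callable[[str, History], Step],
    max_steps: int = 5,
) -> Tuple[Optional[str], History]:
    """Alternate between policy decisions and tool calls until an answer is given."""
    history: History = []
    for _ in range(max_steps):
        step = policy(question, history)
        _thought, action, action_input = step
        if action == "answer":
            # The policy decided to stop and return its final answer.
            return action_input, history
        # Execute the chosen tool and record what it returned.
        observation = TOOLS[action](action_input)
        history.append((step, observation))
    # Step budget exhausted without an answer.
    return None, history
```

Calling `react_loop("who wrote X?", scripted_policy)` runs two tool calls and then stops at the `answer` action. In a trained agent, the policy would be the model itself, conditioned on the question and the accumulated history.
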
We provide demos for WebWalkerQA, GAIA and Daily Use. Our model can execute long-horizon tasks that require multiple steps and complex reasoning, such as web traversal, information seeking, and question answering.
Deployment details for the models and demos will be updated soon.
This work is built on LLaMA-Factory and verl. We greatly appreciate their valuable contributions to the community, with special thanks to WebThinker.
If you find this work helpful, please cite it as:
@misc{wu2025webdancer,
title={WebDancer: Towards Autonomous Information Seeking Agency},
author={Jialong Wu and Baixuan Li and Runnan Fang and Wenbiao Yin and Liwen Zhang and Zhengwei Tao and Dingchu Zhang and Zekun Xi and Yong Jiang and Pengjun Xie and Fei Huang and Jingren Zhou},
year={2025},
eprint={2505.22648},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.22648},
}



