🤖 Machine Learning @ Amap (Alibaba), Alibaba Group
We are the Machine Learning team at Amap (Alibaba), focusing on delivering AI products and cutting-edge research in large language models, computer vision, generative AI, agents, world models, generative recommendation, and intelligent mobility. Our work has been published at top-tier venues including ICLR, AAAI, ICCV, EMNLP, ACM MM, and WWW.
We are always looking for talented interns and full-time researchers with strong coding skills and research experience. Please email us at cxxgtxy@gmail.com if you are interested.
🔥 News
2026.02.06 💻 We open-sourced -- A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
2026.02.06 🎉 is accepted by ICLR 2026 -- Benchmarking Spatial Intelligence of Text-to-Image Models.
2026.02.06 🎉 is accepted by ICLR 2026 -- Tree Search for LLM Agent Reinforcement Learning.
2026.02.06 🎉 is accepted by ICLR 2026 -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
2026.02.05 🎉 is accepted by ICLR 2026 -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
2026.02.04 💻 We open-sourced -- A GUI World Model via Renderable Code Generation.
2026.02.04 🎉 is accepted by ICLR 2026 -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
2026.02.04 🎉 is accepted by ICLR 2026 -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
2026.02.04 🎉 is accepted by ICLR 2026 -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
2026.02.04 🎉 is accepted by AAAI 2026.
2026.02.04 🎉 is accepted by AAAI 2026 -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
2026.02.04 🎉 is accepted by EMNLP 2025 -- Position Bias Mitigated via Inter-Position Knowledge Distillation.
2026.02.04 🎉 is accepted by WWW 2025 -- Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
2026.02.04 🎉 is accepted by ICCV 2025 -- Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
2026.02.03 🎉 is accepted by ICCV 2025.
2026.02.02 💻 We open-sourced .
2026.02.02 🎉 is accepted by ICCV 2025 -- A Benchmark for Perception-Aligned Video Motion Generation.
2026.01.31 🎉 is accepted by ICLR 2026 -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
2026.01.29 💻 We open-sourced -- Benchmarking Spatial Intelligence of Text-to-Image Models.
2026.01.28 💻 We open-sourced -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
2026.01.23 💻 We open-sourced -- Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning.
2026.01.21 🎉 is accepted by ICCV 2025 -- Unified Self-Supervised Pretraining for Image Generation and Understanding.
2026.01.15 💻 We open-sourced -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
2026.01.13 🎉 is accepted by ACM MM 2025 -- Content-Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
2026.01.13 🎉 is accepted by EMNLP 2025 -- Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
2026.01.13 🎉 is accepted by AAAI 2026 -- Scale-wise Controllable Visual Autoregressive Learning.
2026.01.07 💻 We open-sourced -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
2025.11.25 💻 We open-sourced -- Scale-wise Controllable Visual Autoregressive Learning.
2025.11.18 💻 We open-sourced -- Towards Close-up High-resolution Video-based Virtual Try-on.
2025.10.22 💻 We open-sourced .
2025.10.14 💻 We open-sourced -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
2025.09.27 💻 We open-sourced -- Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
2025.09.25 💻 We open-sourced -- Tree Search for LLM Agent Reinforcement Learning.
2025.09.09 💻 We open-sourced -- Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
2025.08.29 💻 We open-sourced -- Position Bias Mitigated via Inter-Position Knowledge Distillation.
2025.08.28 💻 We open-sourced .
2025.08.18 💻 We open-sourced -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
2025.08.15 💻 We open-sourced -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
2025.08.11 💻 We open-sourced .
2025.07.16 💻 We open-sourced -- Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
2025.07.03 💻 We open-sourced .
2025.06.20 💻 We open-sourced -- A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
2025.05.28 💻 We open-sourced -- Content-Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
2025.05.21 💻 We open-sourced -- Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
2025.05.09 💻 We open-sourced -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
2025.04.07 💻 We open-sourced -- Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.
2025.04.03 💻 We open-sourced -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
2025.03.12 💻 We open-sourced -- A Benchmark for Perception-Aligned Video Motion Generation.
2025.03.11 💻 We open-sourced -- Unified Self-Supervised Pretraining for Image Generation and Understanding.
Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, outperforming chain-based rollouts while using only a quarter of the rollout budget.
A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO.
Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives.
A hierarchical sampling framework that identifies boundary-level problems and dynamically reallocates sampling budget toward high-utility problems for self-taught reasoners.
A novel text editing framework for multi-line scene text in complex visual scenarios, with a Condition Injection LoRA module and a regional text perceptual loss.
Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation.
Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality.
A prompt-guided adaptive test-time search strategy that dynamically adjusts the search space and reward for imaginative video generation with long-distance semantic dependencies.
A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning.
A 14,715-image UGC dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring; achieves SOTA on 5 public IQA/IAA benchmarks using next-token prediction.
A sensitivity-aware task vector insertion framework that identifies context-sensitive heads and selects task vectors via RL for many-shot multimodal in-context learning.
Disentangled scenario factorization for multi-scenario route ranking, released with the first large-scale public MSDR dataset; deployed in AMap to serve online traffic.
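The Group Policy Gradient description above relies on computing advantages from the rewards of a group of responses sampled for the same prompt, so no critic, reference model, or KL penalty is needed. The sketch below shows that general idea as a group-normalized REINFORCE surrogate. It is an illustrative assumption, not the published GPG objective, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Illustrative assumption: advantage = (r - group mean) / (group std + eps).
    No learned value function (critic) is involved.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_gradient_loss(logprobs, rewards):
    """Group-relative REINFORCE surrogate for one prompt: -mean(A_i * log pi(o_i | q)).

    `logprobs` holds the summed log-probability of each sampled response.
    There is no reference model and no KL penalty term in this sketch.
    """
    advs = group_advantages(rewards)
    return -mean(a * lp for a, lp in zip(advs, logprobs))


# Two responses to one prompt: the first is rewarded, the second is not.
loss = policy_gradient_loss(logprobs=[-1.0, -2.0], rewards=[1, 0])
```

A group in which every response receives the same reward yields all-zero advantages and therefore no gradient, so problems that are always solved or never solved contribute no learning signal under this kind of objective.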
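The hierarchical sampling description above reallocates sampling budget toward boundary-level problems. One simple way to express that idea, assuming per-problem pass rates have already been estimated from a few probe samples, is to split the remaining budget in proportion to p(1 - p), which is zero for problems that are always or never solved and largest at p = 0.5. The utility function, the largest-remainder rounding, and the name `reallocate_budget` are hypothetical choices for illustration, not the paper's allocation rule.

```python
def reallocate_budget(pass_rates, total_budget):
    """Split `total_budget` extra samples across problems.

    Assumption for illustration: utility(p) = p * (1 - p), so problems with
    an estimated pass rate of 0 or 1 receive nothing and 'boundary' problems
    near p = 0.5 receive the most. Returns one integer count per problem.
    """
    utils = [p * (1.0 - p) for p in pass_rates]
    z = sum(utils)
    if z == 0:
        # Every problem is trivially solved or unsolved: nothing to reallocate.
        return [0] * len(pass_rates)

    # Proportional (fractional) share of the budget for each problem.
    raw = [total_budget * u / z for u in utils]
    alloc = [int(r) for r in raw]

    # Largest-remainder rounding: give leftover samples to the problems
    # with the biggest fractional parts so the counts sum to total_budget.
    leftover = total_budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Probe pass rates for four problems; 14 extra samples to distribute.
plan = reallocate_budget([0.0, 0.5, 1.0, 0.25], total_budget=14)
```

Here the always-failed and always-solved problems get no extra samples, and the remaining budget is split between the two partially solved ones, with the p = 0.5 problem receiving the larger share.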