AMAP-ML

🤖 Machine Learning @ Amap (Alibaba), Alibaba Group

We are the Machine Learning team at Amap (Alibaba), focused on delivering AI products and cutting-edge research in large language models, computer vision, generative AI, agents, world models, generative recommendation, and intelligent mobility. Our work has been published at top-tier venues including ICLR, AAAI, ICCV, EMNLP, ACM MM, and WWW.

We are always looking for talented interns and full-time researchers with strong coding skills and research experience. If you are interested, please email us at cxxgtxy@gmail.com.

🔥 News

2026.02.06 💻 We open-sourced MobilityBench -- A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
2026.02.06 🎉 SpatialGenEval is accepted by ICLR 2026 -- Benchmarking Spatial Intelligence of Text-to-Image Models.
2026.02.06 🎉 Tree-GRPO is accepted by ICLR 2026 -- Tree Search for LLM Agent Reinforcement Learning.
2026.02.06 🎉 S2-Guidance is accepted by ICLR 2026 -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
2026.02.05 🎉 MathForge is accepted by ICLR 2026 -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
2026.02.04 💻 We open-sourced Code2World -- A GUI World Model via Renderable Code Generation.
2026.02.04 🎉 GPG is accepted by ICLR 2026 -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
2026.02.04 🎉 NarrLV is accepted by ICLR 2026 -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
2026.02.04 🎉 EPG is accepted by ICLR 2026 -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
2026.02.04 🎉 is accepted by AAAI 2026.
2026.02.04 🎉 ImagerySearch is accepted by AAAI 2026 -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
2026.02.04 🎉 Pos2Distill is accepted by EMNLP 2025 -- Position Bias Mitigated via Inter-Position Knowledge Distillation.
2026.02.04 🎉 DSFNet is accepted by WWW 2025 -- Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
2026.02.04 🎉 UPRE is accepted by ICCV 2025 -- Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
2026.02.03 🎉 is accepted by ICCV 2025.
2026.02.02 💻 We open-sourced .
2026.02.02 🎉 VMBench is accepted by ICCV 2025 -- A Benchmark for Perception-Aligned Video Motion Generation.
2026.01.31 🎉 SocioReasoner is accepted by ICLR 2026 -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
2026.01.29 💻 We open-sourced SpatialGenEval -- Benchmarking Spatial Intelligence of Text-to-Image Models.
2026.01.28 💻 We open-sourced MathForge -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
2026.01.23 💻 We open-sourced STV -- Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning.
2026.01.21 🎉 USP is accepted by ICCV 2025 -- Unified Self-Supervised Pretraining for Image Generation and Understanding.
2026.01.15 💻 We open-sourced SocioReasoner -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
2026.01.13 🎉 FingER is accepted by ACM MM 2025 -- Content-Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
2026.01.13 🎉 HS-STaR is accepted by EMNLP 2025 -- Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
2026.01.13 🎉 SCALAR is accepted by AAAI 2026 -- Scale-wise Controllable Visual Autoregressive Learning.
2026.01.07 💻 We open-sourced Thinking-with-Map -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
2025.11.25 💻 We open-sourced SCALAR -- Scale-wise Controllable Visual Autoregressive Learning.
2025.11.18 💻 We open-sourced Eevee -- Towards Close-up High-resolution Video-based Virtual Try-on.
2025.10.22 💻 We open-sourced .
2025.10.14 💻 We open-sourced EPG -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
2025.09.27 💻 We open-sourced HS-STaR -- Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
2025.09.25 💻 We open-sourced Tree-GRPO -- Tree Search for LLM Agent Reinforcement Learning.
2025.09.09 💻 We open-sourced DSFNet -- Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
2025.08.29 💻 We open-sourced Pos2Distill -- Position Bias Mitigated via Inter-Position Knowledge Distillation.
2025.08.28 💻 We open-sourced .
2025.08.18 💻 We open-sourced ImagerySearch -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
2025.08.15 💻 We open-sourced S2-Guidance -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
2025.08.11 💻 We open-sourced .
2025.07.16 💻 We open-sourced UPRE -- Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
2025.07.03 💻 We open-sourced .
2025.06.20 💻 We open-sourced FluxText -- A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
2025.05.28 💻 We open-sourced FingER -- Content-Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
2025.05.21 💻 We open-sourced UniVG-R1 -- Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
2025.05.09 💻 We open-sourced NarrLV -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
2025.04.07 💻 We open-sourced RealQA -- Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.
2025.04.03 💻 We open-sourced GPG -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
2025.03.12 💻 We open-sourced VMBench -- A Benchmark for Perception-Aligned Video Motion Generation.
2025.03.11 💻 We open-sourced USP -- Unified Self-Supervised Pretraining for Image Generation and Understanding.

📚 Research Areas

🧠 LLM Reasoning & Reinforcement Learning

Repository	Description	Venue
Tree-GRPO	Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, achieving superior performance with only a quarter of the rollout budget.	ICLR 2026
GPG	A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO.	ICLR 2026
MathForge	Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives.	ICLR 2026
HS-STaR	A hierarchical sampling framework that identifies boundary-level problems and dynamically reallocates sampling budget toward high-utility problems for self-taught reasoners.	EMNLP 2025
Pos2Distill	A position-to-position knowledge distillation framework that transfers knowledge from advantageous positions to mitigate position bias in LLMs.	EMNLP 2025

🎨 Image Generation & Editing

Repository	Description	Venue
FluxText	A text editing framework for multi-line scene text in complex visual scenarios, with a Condition Injection LoRA module and a regional text perceptual loss.	-
S2-Guidance	Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation.	ICLR 2026
EPG	Advancing end-to-end pixel-space generative modeling via self-supervised pre-training.	ICLR 2026
Omni-Effects	A unified framework for prompt-guided and spatially controllable composite visual effects generation, using LoRA-MoE and spatial-aware prompts.	AAAI 2026
SCALAR	Scale-wise controllable visual autoregressive learning for image generation.	AAAI 2026
USP	Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality.	ICCV 2025
SpatialGenEval	A benchmark with 1,230 information-dense prompts and 12,300 multi-choice questions to evaluate complex spatial intelligence in text-to-image models.	ICLR 2026

🎬 Video Generation & Understanding

Repository	Description	Venue
NarrLV	The first benchmark to comprehensively evaluate narrative expression capabilities of long video generation models, inspired by film narrative theory.	ICLR 2026
ImagerySearch	A prompt-guided adaptive test-time search strategy that dynamically adjusts search space and reward for imaginative video generation with long-distance semantic dependencies.	AAAI 2026
VMBench	A perception-aligned video motion benchmark whose human-aligned metrics achieve a 35.3% improvement in Spearman's correlation over baselines.	ICCV 2025
Eevee	A high-resolution dataset and benchmark for video-based virtual try-on, supporting both full-shot and close-up garment detail views.	-
FingER	Content-aware fine-grained evaluation with reasoning for AI-generated videos.	ACM MM 2025
FE2E	-	-

👁️ Multimodal & Vision-Language Models

Repository	Description	Venue
UniVG-R1	Reasoning guided universal visual grounding with reinforcement learning.	-
SocioReasoner	A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning.	ICLR 2026
RealQA	A 14,715-image UGC dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring; achieves SOTA on 5 public IQA/IAA benchmarks using next-token prediction.	-
Code2World	A GUI world model via renderable code generation.	-
Taming-Hallucinations	-	-
STV	A sensitivity-aware task vector insertion framework that identifies context-sensitive heads and selects task vectors via RL for many-shot multimodal in-context learning.	AAAI 2026

🗺️ Maps, Mobility & Spatial Intelligence

Repository	Description	Venue
Thinking-with-Map	A map-augmented agent that conducts reasoning with real-world maps for geolocalization, trained via reinforcement learning.	-
MobilityBench	A scalable benchmark for evaluating route-planning agents in real-world mobility scenarios.	-
DSFNet	Disentangled scenario factorization for multi-scenario route ranking with the first large-scale public MSDR dataset; deployed in Amap for online traffic.	WWW 2025
AR-MAP	-	-

🔍 Object Detection & Segmentation

Repository	Description	Venue
UPRE	Zero-shot domain adaptation for object detection via unified prompt and representation enhancement.	ICCV 2025
LD-RPS	LD-RPS.	ICCV 2025
