AMAP-ML

🤖 Machine Learning @ Amap (Alibaba), Alibaba Group

We are the Machine Learning team at Amap (Alibaba). We build AI products and conduct research in large language models, computer vision, generative AI, agents, world models, generative recommendation, and intelligent mobility. Our work has been published at top-tier venues including ICLR, AAAI, ICCV, EMNLP, ACM MM, and WWW.

We are always looking for talented interns and full-time researchers with strong coding skills and research experience. If you are interested, please email us at cxxgtxy@gmail.com.


🔥 News

  • 2026.02.06 💻 We open-sourced MobilityBench -- A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
  • 2026.02.06 🎉 SpatialGenEval is accepted by ICLR 2026 -- Benchmarking Spatial Intelligence of Text-to-Image Models.
  • 2026.02.06 🎉 Tree-GRPO is accepted by ICLR 2026 -- Tree Search for LLM Agent Reinforcement Learning.
  • 2026.02.06 🎉 S2-Guidance is accepted by ICLR 2026 -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
  • 2026.02.05 🎉 MathForge is accepted by ICLR 2026 -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
  • 2026.02.04 💻 We open-sourced Code2World -- A GUI World Model via Renderable Code Generation.
  • 2026.02.04 🎉 GPG is accepted by ICLR 2026 -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
  • 2026.02.04 🎉 NarrLV is accepted by ICLR 2026 -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
  • 2026.02.04 🎉 EPG is accepted by ICLR 2026 -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
  • 2026.02.04 🎉 Omni-Effects is accepted by AAAI 2026.
  • 2026.02.04 🎉 ImagerySearch is accepted by AAAI 2026 -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
  • 2026.02.04 🎉 Pos2Distill is accepted by EMNLP 2025 -- Position Bias Mitigated via Inter-Position Knowledge Distillation.
  • 2026.02.04 🎉 DSFNet is accepted by WWW 2025 -- Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
  • 2026.02.04 🎉 UPRE is accepted by ICCV 2025 -- Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
  • 2026.02.03 🎉 LD-RPS is accepted by ICCV 2025.
  • 2026.02.02 💻 We open-sourced AR-MAP.
  • 2026.02.02 🎉 VMBench is accepted by ICCV 2025 -- A Benchmark for Perception-Aligned Video Motion Generation.
  • 2026.01.31 🎉 SocioReasoner is accepted by ICLR 2026 -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
  • 2026.01.29 💻 We open-sourced SpatialGenEval -- Benchmarking Spatial Intelligence of Text-to-Image Models.
  • 2026.01.28 💻 We open-sourced MathForge -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
  • 2026.01.23 💻 We open-sourced STV -- Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning.
  • 2026.01.21 🎉 USP is accepted by ICCV 2025 -- Unified Self-Supervised Pretraining for Image Generation and Understanding.
  • 2026.01.15 💻 We open-sourced SocioReasoner -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
  • 2026.01.13 🎉 FingER is accepted by ACM MM 2025 -- Content-Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
  • 2026.01.13 🎉 HS-STaR is accepted by EMNLP 2025 -- Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
  • 2026.01.13 🎉 SCALAR is accepted by AAAI 2026 -- Scale-wise Controllable Visual Autoregressive Learning.
  • 2026.01.07 💻 We open-sourced Thinking-with-Map -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
  • 2025.11.25 💻 We open-sourced SCALAR -- Scale-wise Controllable Visual Autoregressive Learning.
  • 2025.11.18 💻 We open-sourced Eevee -- Towards Close-up High-resolution Video-based Virtual Try-on.
  • 2025.10.22 💻 We open-sourced Taming-Hallucinations.
  • 2025.10.14 💻 We open-sourced EPG -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
  • 2025.09.27 💻 We open-sourced HS-STaR -- Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
  • 2025.09.25 💻 We open-sourced Tree-GRPO -- Tree Search for LLM Agent Reinforcement Learning.
  • 2025.09.09 💻 We open-sourced DSFNet -- Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
  • 2025.08.29 💻 We open-sourced Pos2Distill -- Position Bias Mitigated via Inter-Position Knowledge Distillation.
  • 2025.08.28 💻 We open-sourced FE2E.
  • 2025.08.18 💻 We open-sourced ImagerySearch -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
  • 2025.08.15 💻 We open-sourced S2-Guidance -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
  • 2025.08.11 💻 We open-sourced Omni-Effects.
  • 2025.07.16 💻 We open-sourced UPRE -- Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
  • 2025.07.03 💻 We open-sourced LD-RPS.
  • 2025.06.20 💻 We open-sourced FluxText -- A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
  • 2025.05.28 💻 We open-sourced FingER -- Content-Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
  • 2025.05.21 💻 We open-sourced UniVG-R1 -- Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
  • 2025.05.09 💻 We open-sourced NarrLV -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
  • 2025.04.07 💻 We open-sourced RealQA -- Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.
  • 2025.04.03 💻 We open-sourced GPG -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
  • 2025.03.12 💻 We open-sourced VMBench -- A Benchmark for Perception-Aligned Video Motion Generation.
  • 2025.03.11 💻 We open-sourced USP -- Unified Self-Supervised Pretraining for Image Generation and Understanding.

📚 Research Areas

🧠 LLM Reasoning & Reinforcement Learning

Repository | Description | Venue
Tree-GRPO | Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, achieving superior performance with only a quarter of the rollout budget. | ICLR 2026
GPG | A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO. | ICLR 2026
MathForge | Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives. | ICLR 2026
HS-STaR | A hierarchical sampling framework that identifies boundary-level problems and dynamically reallocates sampling budget toward high-utility problems for self-taught reasoners. | EMNLP 2025
Pos2Distill | A position-to-position knowledge distillation framework that transfers knowledge from advantageous positions to mitigate position bias in LLMs. | EMNLP 2025
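
Several entries above are described relative to GRPO. As background only, here is a minimal sketch of the group-relative advantage commonly used in GRPO-style training: each rollout's reward is normalized against the other rollouts sampled for the same prompt. This is not code from any of the repositories listed here; the function name `group_relative_advantages`, the `eps` parameter, and the reward values are hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group.

    `rewards` holds one scalar reward per rollout for a single prompt.
    `eps` (an illustrative choice) avoids division by zero when every
    rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four rollouts of one prompt (made-up values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that score above the group mean receive a positive advantage and the rest a negative one, so no separate critic model is needed to produce a baseline.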

🎨 Image Generation & Editing

Repository | Description | Venue
FluxText | A novel text editing framework for multi-line scene text in complex visual scenarios, with a Condition Injection LoRA module and a regional text perceptual loss. | -
S2-Guidance | Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation. | ICLR 2026
EPG | Advancing end-to-end pixel-space generative modeling via self-supervised pre-training. | ICLR 2026
Omni-Effects | A unified framework for prompt-guided and spatially controllable composite visual effects generation, using LoRA-MoE and spatial-aware prompts. | AAAI 2026
SCALAR | Scale-wise controllable visual autoregressive learning for image generation. | AAAI 2026
USP | Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality. | ICCV 2025
SpatialGenEval | A benchmark with 1,230 information-dense prompts and 12,300 multi-choice questions to evaluate complex spatial intelligence in text-to-image models. | ICLR 2026

🎬 Video Generation & Understanding

Repository | Description | Venue
NarrLV | The first benchmark to comprehensively evaluate narrative expression capabilities of long video generation models, inspired by film narrative theory. | ICLR 2026
ImagerySearch | A prompt-guided adaptive test-time search strategy that dynamically adjusts search space and reward for imaginative video generation with long-distance semantic dependencies. | AAAI 2026
VMBench | A perception-aligned video motion benchmark with human-aligned metrics, achieving a 35.3% improvement in Spearman's correlation over baselines. | ICCV 2025
Eevee | A high-resolution dataset and benchmark for video-based virtual try-on, supporting both full-shot and close-up garment detail views. | -
FingER | Content-aware fine-grained evaluation with reasoning for AI-generated videos. | ACM MM 2025
FE2E | - | -
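
The VMBench entry above cites Spearman's correlation. As a reminder of what that statistic measures, here is a minimal, dependency-free sketch that ranks two score lists (ties get their average rank) and computes the Pearson correlation of the ranks. The helper names and the sample scores are hypothetical illustrations, not VMBench code or data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: human ratings vs. an automatic metric for five videos.
human_scores = [4.5, 3.0, 2.0, 5.0, 1.0]
metric_scores = [0.8, 0.6, 0.65, 0.9, 0.1]
rho = spearman(human_scores, metric_scores)
print(round(rho, 3))
```

Because only the rank order matters, a metric can correlate well with human judgment under this statistic even when its raw values sit on a different scale.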

👁️ Multimodal & Vision-Language Models

Repository | Description | Venue
UniVG-R1 | Reasoning guided universal visual grounding with reinforcement learning. | -
SocioReasoner | A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning. | ICLR 2026
RealQA | A 14,715-image UGC dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring; achieves SOTA on 5 public IQA/IAA benchmarks using next-token prediction. | -
Code2World | A GUI world model via renderable code generation. | -
Taming-Hallucinations | - | -
STV | A sensitivity-aware task vector insertion framework that identifies context-sensitive heads and selects task vectors via RL for many-shot multimodal in-context learning. | AAAI 2026

🗺️ Maps, Mobility & Spatial Intelligence

Repository | Description | Venue
Thinking-with-Map | A map-augmented agent that conducts reasoning with real-world maps for geolocalization, trained via reinforcement learning. | -
MobilityBench | A scalable benchmark for evaluating route-planning agents in real-world mobility scenarios. | -
DSFNet | Disentangled scenario factorization for multi-scenario route ranking with the first large-scale public MSDR dataset; deployed in Amap for online traffic. | WWW 2025
AR-MAP | - | -

🔍 Object Detection & Segmentation

Repository | Description | Venue
UPRE | Zero-shot domain adaptation for object detection via unified prompt and representation enhancement. | ICCV 2025
LD-RPS | LD-RPS. | ICCV 2025

Popular repositories

  1. FluxText (Public)

    Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing"

    Python · 435 stars · 30 forks

  2. Tree-GRPO (Public)

    [ICLR 2026] Tree Search for LLM Agent Reinforcement Learning

    Python · 283 stars · 24 forks

  3. FE2E (Public)

    192 stars · 7 forks

  4. GPG (Public)

    [ICLR26]GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

    Python · 173 stars · 5 forks

  5. Omni-Effects (Public)

    [AAAI2026] Implementation Code for Omni-Effects

    Python · 173 stars · 6 forks

  6. UniVG-R1 (Public)

    UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

    Python · 157 stars · 7 forks
