hu-zijing/Awesome-Multimodal-RL
Reinforcement Learning in Generative Multimodal AI

Introduction

Generative multimodal artificial intelligence (AI) has achieved remarkable progress in recent years, driven by large-scale pre-training and the emergence of powerful foundation models. While these models have demonstrated strong capabilities in perception, reasoning, and content synthesis, their training is predominantly based on supervised objectives, which are often insufficient to capture task-specific goals and user intent. Reinforcement learning (RL) has therefore emerged as a critical training framework for improving generative multimodal models.
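To make the idea of RL as a training framework concrete, here is a minimal, self-contained sketch (a toy illustration, not the method of any particular paper in this list): REINFORCE with a moving-average baseline applied to a tiny categorical "token policy", where a hypothetical reward simply counts occurrences of a target token in the sampled sequence. The vocabulary, reward, and hyperparameters are all invented for illustration.

```python
import math
import random

random.seed(0)

VOCAB = ["cat", "dog", "tree"]
TARGET = "cat"      # hypothetical reward: count of TARGET tokens in a sample
SEQ_LEN = 5
LR = 0.5

# One shared categorical policy over tokens, parameterized by logits.
logits = [0.0] * len(VOCAB)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_sequence():
    probs = softmax(logits)
    return [random.choices(range(len(VOCAB)), weights=probs)[0]
            for _ in range(SEQ_LEN)]

def reward(seq):
    return sum(1.0 for t in seq if VOCAB[t] == TARGET)

baseline = 0.0
for step in range(200):
    seq = sample_sequence()
    r = reward(seq)
    baseline = 0.9 * baseline + 0.1 * r   # moving-average baseline reduces variance
    probs = softmax(logits)
    adv = r - baseline
    for t in seq:
        # REINFORCE: grad of log pi(t) w.r.t. logits is one_hot(t) - probs
        for i in range(len(VOCAB)):
            grad = (1.0 if i == t else 0.0) - probs[i]
            logits[i] += LR * adv * grad / SEQ_LEN

probs = softmax(logits)
# The reward-maximizing token should have gained most of the probability mass.
print(probs[VOCAB.index(TARGET)])
```

In real multimodal systems the policy is a large autoregressive or diffusion model and the reward comes from human preferences or a learned reward model, but the update structure (sample, score, baseline-corrected policy-gradient step) is the same basic shape.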

This repository collects research papers on reinforcement learning in generative multimodal AI. We primarily focus on three categories of models:

  • Multimodal understanding models, which perceive and reason over visual inputs and produce corresponding natural language responses.
  • Visual generation models, which synthesize visual content conditioned on textual prompts or inputs from other modalities.
  • Unified models, which adopt a single framework to jointly support visual understanding and visual generation, accepting multimodal inputs and flexibly producing visual or textual outputs.

Papers

Autoregression-based RL

Diffusion-based RL
