diffuseR

Lifecycle: experimental

Overview

diffuseR is a functional R implementation of diffusion models, inspired by Hugging Face's Python diffusers library. The package provides a simple, idiomatic R interface to state-of-the-art generative AI models for image generation and manipulation, using base R and the torch package. No Python dependencies. It currently supports CPU and CUDA devices on Windows and Linux.

Example output

Installation

First install torch. As per this comment, using the pre-built binaries from https://torch.mlverse.org/docs/articles/installation#pre-built is heavily recommended: "The pre-built binaries bundle the necessary CUDA and cudnn versions, so you don't need a global compatible system version of CUDA":

options(timeout = 600) # increasing the timeout is recommended since we will be downloading a ~2GB file
# For Windows and Linux, "cpu" and "cu128" are the only kinds currently supported.
# For macOS, the supported kinds are "cpu-intel" and "cpu-m1".
kind <- "cu128"
version <- available.packages()["torch","Version"]
options(repos = c(
  torch = sprintf("https://torch-cdn.mlverse.org/packages/%s/%s/", kind, version),
  CRAN = "https://cloud.r-project.org" # or any other from which you want to install the other R dependencies.
))
install.packages("torch")

You can install the development version of diffuseR from GitHub:

# install.packages("devtools")
devtools::install_github("cornball-ai/diffuseR")
# Or
# install.packages("remotes")
remotes::install_github("cornball-ai/diffuseR")

Features

  • Text-to-Image Generation: Create images from textual descriptions
  • Image-to-Image Generation: Modify existing images based on text prompts
  • Two Models: Stable Diffusion 2.1 and SDXL (fully native R torch implementation)
  • Scheduler Options: DDIM (more coming soon)
  • Device Support: CPU and CUDA GPUs (including Blackwell RTX 50xx)
  • R-native Interface: Functional programming approach that feels natural in R

Quick Start

Basic Usage

Warning: The first time you run the code below, it will download ~5.3GB of Stable Diffusion 2.1 CPU-only model files from Hugging Face and load them into memory. Ensure you have enough RAM, disk space, and a stable internet connection. Memory management is crucial with deep learning models; ~8GB of free RAM is recommended for running Stable Diffusion 2.1 on CPU only.

options(timeout = 600) # increasing the timeout is recommended since we will be downloading several large model files
library(diffuseR)
torch::local_no_grad()

# Generate an image from text
cat_img <- txt2img(
  prompt = "a photorealistic cat wearing sunglasses",
  model = "sd21", # Specify the model to use, e.g., "sd21" for Stable Diffusion 2.1
  download_models = TRUE, # Automatically download the model if not already present
  steps = 30,
  seed = 42,
  filename = "cat.png",
)

# Free memory when finished (only needed if you loaded a pipeline on a GPU)
pipeline <- NULL
torch::cuda_empty_cache()

Advanced Usage with GPU

The UNet is the most computationally intensive part of the model, so run it on a GPU if possible. The decoder and text encoder can run on CPU if GPU memory is limited. SDXL's UNet requires a minimum of 6GB of GPU memory (VRAM), while Stable Diffusion 2.1's requires a minimum of 2GB.

# Increasing the timeout is recommended since we will be downloading 5.1GB and 2.8GB model files, among others.
options(timeout = 1200)

library(diffuseR)
torch::local_no_grad() # Prevents torch from tracking gradients, which is not needed for inference

# Assign the various deep learning models to devices
model_name = "sdxl"
devices = list(unet = "cuda", decoder = "cpu",
               text_encoder = "cpu", encoder = "cpu")

m2d <- models2devices(model_name = model_name, devices = devices,
                      unet_dtype_str = "float16", download_models = TRUE)

pipeline <- load_pipeline(model_name = model_name, m2d = m2d, i2i = TRUE,
                          unet_dtype_str = "float16")

# Generate an image from text
cat_img <- txt2img(
  prompt = "a photorealistic cat wearing sunglasses",
  model_name = model_name,
  devices = devices,
  pipeline = pipeline,
  num_inference_steps = 30,
  guidance_scale = 7.5,
  seed = 42,
  filename = "cat2.png",
)

gambling_cat <- img2img(
  input_image = "cat2.png",
  prompt = "a photorealistic cat throwing dice",
  img_dim = 1024,
  model_name = model_name,
  devices = devices,
  pipeline = pipeline,
  num_inference_steps = 30,
  strength = 0.75,
  guidance_scale = 7.5,
  seed = 42,
  filename = "gambling_cat.png"
)

# Clear out pipeline to free up GPU memory
pipeline <- NULL
torch::cuda_empty_cache()

Supported Models

Currently supported models:

  • Stable Diffusion 2.1
  • Stable Diffusion XL (SDXL)
  • More coming soon!

Roadmap

Future plans for diffuseR include:

  • Inpainting support
  • Additional schedulers (PNDM, DPMSolverMultistep, Euler ancestral)
  • Text-to-video generation

How It Works

diffuseR supports two execution modes:

Native R torch (recommended for SDXL): Pure R implementations of the VAE decoder, text encoders, and UNet. Required for Blackwell GPUs (RTX 50xx series) and recommended for best compatibility. Enable it with the use_native_* flags, or use txt2img_sdxl(), which defaults to native.

TorchScript (legacy): Pre-exported models from PyTorch. Still available for SD21 and older GPUs. Scripts to build TorchScript files are at diffuseR-TS.

No Python dependencies required for either mode.
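
As a minimal sketch of the native mode (txt2img_sdxl() is named above; the argument names here are assumptions based on the Quick Start examples and may differ in your installed version):

library(diffuseR)
torch::local_no_grad()

# txt2img_sdxl() defaults to the pure R torch implementation,
# so no TorchScript files are needed
img <- txt2img_sdxl(
  prompt = "a lighthouse at dusk",
  steps = 30, # assumed argument name, as in the Basic Usage example
  seed = 1,
  filename = "lighthouse.png"
)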

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgments

  • Hugging Face for the original diffusers library
  • Stability AI for Stable Diffusion
  • The R and torch communities for their excellent tooling
