# ARIA: Artistic Rendering of Images into Audio

An AI model that generates MIDI music from the emotional content of artwork, using a CLIP-based image encoder.


## Dependencies

This project uses the following external dependency:

- **midi-emotion**: MIDI music generation conditioned on emotions

## Setup

1. Clone the repository with submodules:

   ```bash
   git clone --recursive https://github.com/vincentamato/aria.git
   cd aria
   ```

2. Initialize and update the submodules (only needed if you cloned without `--recursive`):

   ```bash
   git submodule init
   git submodule update
   ```

3. Install the package and its dependencies (this also installs the local midi-emotion package):

   ```bash
   pip install -e .
   ```

4. Download the model files:
   - Visit ARIA on Hugging Face.
   - Download the following files and place them in the corresponding directories:

     ```
     models/
     ├── continuous_concat/     # For continuous vector concatenation
     │   ├── model.pt
     │   ├── mappings.pt
     │   └── model_config.pt
     ├── continuous_token/      # For continuous vector prepending
     │   ├── model.pt
     │   ├── mappings.pt
     │   └── model_config.pt
     └── discrete_token/        # For discrete emotion tokens
         ├── model.pt
         ├── mappings.pt
         └── model_config.pt
     ```

   - Also download `image_encoder.pt` for the CLIP-based image emotion model (a quick way to verify the layout is sketched below).
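After downloading, a quick sanity check can confirm everything landed in the right place. This is a minimal sketch: it assumes `models/` sits at the repository root as in the tree above, and that `image_encoder.pt` is placed inside `models/` (the README does not pin down its exact location).

```python
from pathlib import Path

# Assumed layout: models/ at the repository root, matching the tree above.
MODELS_DIR = Path("models")
CONDITIONING_DIRS = ["continuous_concat", "continuous_token", "discrete_token"]
EXPECTED_FILES = ["model.pt", "mappings.pt", "model_config.pt"]

missing = []
for sub in CONDITIONING_DIRS:
    for name in EXPECTED_FILES:
        path = MODELS_DIR / sub / name
        if not path.is_file():
            missing.append(path)

# The CLIP-based image emotion checkpoint, downloaded separately;
# its location here is an assumption.
if not (MODELS_DIR / "image_encoder.pt").is_file():
    missing.append(MODELS_DIR / "image_encoder.pt")

if missing:
    print("Missing files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All model files are in place.")
```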

## How It Works

ARIA uses two main components (sketched below):

1. A CLIP-based image encoder that extracts emotional content (valence and arousal) from the artwork
2. A music generation model (midi-emotion) that creates MIDI music conditioned on those emotions
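Conceptually, the two stages chain together as in the sketch below. The class and function names are hypothetical stand-ins, not ARIA's real API (the actual entry point is `src/models/aria/generate.py`); only the image → (valence, arousal) → MIDI flow is taken from this README.

```python
from dataclasses import dataclass

@dataclass
class EmotionEstimate:
    valence: float  # pleasantness; the numeric range is not specified in this README
    arousal: float  # intensity/energy

class ImageEmotionEncoder:
    """Stub standing in for the CLIP-based image encoder (image_encoder.pt)."""
    def predict(self, image_path: str) -> EmotionEstimate:
        # A real implementation would embed the image with CLIP and
        # regress valence/arousal from the embedding.
        return EmotionEstimate(valence=0.3, arousal=-0.1)

class MidiEmotionGenerator:
    """Stub standing in for the midi-emotion generation model."""
    def generate(self, emotion: EmotionEstimate, out_path: str) -> str:
        # A real implementation would decode a token sequence conditioned
        # on the emotion and write it out as a MIDI file.
        return out_path

def generate_music_from_image(image_path: str, out_path: str) -> str:
    emotion = ImageEmotionEncoder().predict(image_path)        # stage 1: image -> emotion
    return MidiEmotionGenerator().generate(emotion, out_path)  # stage 2: emotion -> MIDI
```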

### Emotion Conditioning Modes

ARIA supports different ways of incorporating emotional information into the music generation process (a tensor-level sketch follows the list):

- **Continuous Concat** (default): Embeds emotions as continuous vectors and concatenates them to all tokens in the sequence. This provides consistent emotional influence throughout the generation.
- **Continuous Token**: Embeds emotions as continuous vectors and prepends them to the sequence. The emotional information is provided once, at the start of generation.
- **Discrete Token**: Quantizes emotions into discrete bins and uses them as special tokens. Useful when you want more distinct emotional categories.
- **None**: Generates music without emotional conditioning. Use this for baseline comparison or when you want purely structural music generation.
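To make the differences concrete, here is a minimal tensor-level sketch of the three conditioning modes. The dimensions, projection, and bin count are illustrative assumptions; midi-emotion's actual implementation details may differ.

```python
import torch

batch, seq_len, d_model, d_emotion = 1, 8, 16, 4
tokens = torch.randn(batch, seq_len, d_model)   # token embeddings
emotion = torch.randn(batch, d_emotion)         # continuous (valence, arousal) embedding

# continuous_concat: append the emotion vector to every token embedding,
# so the conditioning influences each generation step.
e_broadcast = emotion.unsqueeze(1).expand(batch, seq_len, d_emotion)
continuous_concat = torch.cat([tokens, e_broadcast], dim=-1)   # shape (1, 8, 20)

# continuous_token: project the emotion vector to d_model and prepend it
# as an extra "token" at the start of the sequence.
project = torch.nn.Linear(d_emotion, d_model)
emotion_token = project(emotion).unsqueeze(1)                  # shape (1, 1, 16)
continuous_token = torch.cat([emotion_token, tokens], dim=1)   # shape (1, 9, 16)

# discrete_token: quantize valence/arousal into bins and look up special
# token embeddings (5 bins per axis and the [-1, 1] range are assumptions).
valence, arousal = 0.3, -0.1
n_bins = 5
valence_bin = min(int((valence + 1) / 2 * n_bins), n_bins - 1)
arousal_bin = min(int((arousal + 1) / 2 * n_bins), n_bins - 1)
special_tokens = torch.nn.Embedding(2 * n_bins, d_model)
emotion_tokens = special_tokens(torch.tensor([[valence_bin, n_bins + arousal_bin]]))
discrete_token = torch.cat([emotion_tokens, tokens], dim=1)    # shape (1, 10, 16)
```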

## Usage

Generate music from an image with the following command:

```bash
python src/models/aria/generate.py \
    --image path/to/your/image.jpg \
    --image_model_checkpoint path/to/image/model.pt \
    --midi_model_dir path/to/midi_emotion/model \
    --conditioning continuous_token \
    --out_dir output
```

### Required Arguments

- `--image`: Path to the input image
- `--image_model_checkpoint`: Path to the CLIP-based image emotion model checkpoint
- `--midi_model_dir`: Path to the midi-emotion model directory
- `--conditioning`: Type of emotion conditioning (choices: `none`, `discrete_token`, `continuous_token`, `continuous_concat`)

### Optional Arguments

- `--out_dir`: Directory to save generated MIDI (default: `output`)
- `--gen_len`: Length of generation in tokens (default: 512)
- `--temperature`: Sampling temperatures as a `[note_temp, rest_temp]` pair (default: `[1.2, 1.2]`)
- `--top_k`: Top-k sampling; `-1` disables it (default: -1)
- `--top_p`: Top-p (nucleus) sampling threshold (default: 0.7)
- `--min_instruments`: Minimum number of instruments required (default: 1)
- `--cpu`: Force CPU inference
- `--batch_size`: Number of samples to generate (default: 1)

The model will output:

1. The predicted emotional values (valence and arousal)
2. The path to the generated MIDI file (a quick way to inspect it is sketched below)
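Once generation finishes, the resulting file can be inspected with any MIDI library. Here is a minimal sketch using `pretty_midi` (not a dependency of this project, and the output filename is illustrative; use the path that `generate.py` reports):

```python
import pretty_midi  # pip install pretty_midi

# Hypothetical output path; substitute the file reported by generate.py.
midi = pretty_midi.PrettyMIDI("output/generated.mid")
print(f"Duration: {midi.get_end_time():.1f}s")
for instrument in midi.instruments:
    name = pretty_midi.program_to_instrument_name(instrument.program)
    print(f"{name}: {len(instrument.notes)} notes")
```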

## Attribution

This project incorporates the following open-source work:

- midi-emotion, which provides the emotion-conditioned MIDI generation model
