ZeroMeOut/ImageFromAudio


Audio to Image using various models

Recently I have been interested in audio-to-image generation, that is, using the audio of something (let's say the sound of being at a beach) to generate an image of it (a beach). I didn't find a dataset that pairs audio with images, so I made one based on this paper: basically an MNIST-FSDD dataset, where spoken-digit recordings from FSDD are paired with handwritten-digit images from MNIST. I also made branches for each step of the process while I was figuring things out. First was a WGAN, for which I used a YouTube tutorial. Then I went on to learn about and code up a VAE (specifically a CVAE) with conv layers. I also coded up a VAEGAN and used the loss from the first paper I linked (I am still not sure it's correct lol). CGAN added 5 months later, future me typing. Done with my beloved, Python.
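To make the dataset idea a bit more concrete, here is a minimal sketch of pairing audio clips with images by digit label. It is not the code from this repo: the function name `pair_by_label`, the `(item_id, label)` tuples, and the toy ids are all assumptions for illustration, and it only pairs ids, it doesn't load real MNIST or FSDD files.

```python
import random
from collections import defaultdict


def pair_by_label(images, audios, seed=0):
    """Pair each audio clip with one unused image of the same digit label.

    images, audios: lists of (item_id, label) tuples (hypothetical format).
    Returns a list of (audio_id, image_id, label) triples. Every item is
    used at most once, so the pairing is one-to-one.
    """
    rng = random.Random(seed)

    # Group image ids by their digit label.
    images_by_label = defaultdict(list)
    for image_id, label in images:
        images_by_label[label].append(image_id)

    # Shuffle so the pairing does not depend on the original file order.
    for ids in images_by_label.values():
        rng.shuffle(ids)

    pairs = []
    for audio_id, label in audios:
        candidates = images_by_label.get(label)
        if candidates:  # skip clips whose label has no unused image left
            pairs.append((audio_id, candidates.pop(), label))
    return pairs


# Toy stand-ins for MNIST images and FSDD recordings (made-up ids).
images = [(f"img_{i}", i % 10) for i in range(50)]
audios = [(f"wav_{i}", i % 10) for i in range(30)]
pairs = pair_by_label(images, audios)
```

Since there are usually far more images than recordings, a one-to-one pairing like this leaves most images unused; sampling a different image each epoch would be another option to try.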

Results after 100 epochs

CVAE
VAEGAN
CGAN

I know I am supposed to show some charts for the loss, but I will probably do that later (or you can help me with that :3). (5 months later and I still haven't done it lmao.)
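If anyone wants to help with the loss charts, a rough starting point could be a tiny logger like the one below. It is only a sketch using the standard library: the `LossLog` class and the loss names `g_loss` and `d_loss` are hypothetical and don't come from the training scripts here. It collects per-batch losses, averages them per epoch, and writes CSV rows that any plotting tool can read.

```python
import csv
import io
from collections import defaultdict


class LossLog:
    """Collects per-batch loss values and averages them per epoch."""

    def __init__(self):
        # loss name -> list of (epoch, value) records
        self.history = defaultdict(list)

    def record(self, epoch, **losses):
        """Store one batch worth of losses, e.g. record(0, g_loss=1.2)."""
        for name, value in losses.items():
            self.history[name].append((epoch, float(value)))

    def epoch_means(self, name):
        """Return {epoch: mean loss} for one loss name, sorted by epoch."""
        totals = defaultdict(lambda: [0.0, 0])
        for epoch, value in self.history[name]:
            totals[epoch][0] += value
            totals[epoch][1] += 1
        return {epoch: s / n for epoch, (s, n) in sorted(totals.items())}

    def write_csv(self, fileobj):
        """Write rows of (name, epoch, mean_loss) to an open text file."""
        writer = csv.writer(fileobj)
        writer.writerow(["name", "epoch", "mean_loss"])
        for name in sorted(self.history):
            for epoch, mean in self.epoch_means(name).items():
                writer.writerow([name, epoch, mean])


# Made-up numbers standing in for values produced by a training loop.
log = LossLog()
log.record(0, g_loss=1.0, d_loss=0.5)
log.record(0, g_loss=3.0, d_loss=1.5)
log.record(1, g_loss=2.0, d_loss=1.0)

buffer = io.StringIO()  # swap for open("losses.csv", "w", newline="")
log.write_csv(buffer)
csv_text = buffer.getvalue()
```

In a real training loop, `record` would be called once per batch with the loss converted to a plain float first.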

Thoughts

The models seem to struggle to reconstruct the image after a certain point. I honestly don't know if it's because of the loss function I used, the learning rate, the one-to-one pairing I used to make the dataset, or some other thing I can't think of. But I will figure it out eventually; this was just an attempt after all. Feel free to yoink it and do whatever :) orrr help figure it out, idk, just a thought.

5 months later edit:

Hello there, I finally came back to this because every time I opened the repo it bothered me a bit lol. I used a CGAN, as you can see above, and it blows the other two out of the water, which was pretty exciting to see in real time while I was training it. The paper and GitHub repo that I totally didn't collect from are in the comments right at the top of the CGAN.py file in the models folder. Till next time :3
