banerjeepragyan/Voice-Based-Virtual-Trial-Room

Voice-Based-Virtual-Trial-Room

The project takes voice input from the user, retrieves the most relevant clothing item from its database, and overlays it on an image of the user. Our goal was to create an automated virtual try-on system operated by voice.

Our model aims to generate photo-realistic try-on results, driven by speech input from the user, while preserving both the characteristics of the clothes and the details of the person's identity (posture, body parts, bottom clothes). This has the potential to change how users shop for clothes online.

Voice to Text

We use a Wav2Vec model to generate a transcript of the user's voice input.
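To illustrate how a transcript can come out of such a model, here is a minimal sketch of greedy CTC decoding. It assumes the speech model emits per-frame scores over a character vocabulary with a CTC blank token, as CTC-fine-tuned Wav2Vec2 checkpoints do. The toy vocabulary, the `one_hot` helper, and the hand-built frame scores are hypothetical stand-ins for real model output, not this project's code.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary. Index 0 is the CTC blank; "|" marks a word
# boundary, a convention used by common Wav2Vec2 character vocabularies.
VOCAB = ["<pad>", "|", "a", "d", "e", "r", "s"]
BLANK_ID = 0


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]], vocab: List[str], blank_id: int = 0
) -> str:
    """Turn per-frame scores into text: argmax, collapse repeats, drop blanks."""
    # Most likely token id for every audio frame.
    frame_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]

    tokens = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            tokens.append(vocab[token_id])
        previous = token_id

    # Word-boundary tokens become spaces.
    return "".join(tokens).replace("|", " ").strip()


def one_hot(index: int, size: int) -> List[float]:
    """Hypothetical helper: fake a confident model output for one frame."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Frames spelling "r r _ e e d | d r e s _ s" (where "_" is the blank).
frame_tokens = [5, 5, 0, 4, 4, 3, 1, 3, 5, 4, 6, 0, 6]
demo_logits = [one_hot(i, len(VOCAB)) for i in frame_tokens]

transcript = greedy_ctc_decode(demo_logits, VOCAB, BLANK_ID)
print(transcript)  # prints "red dress"
```

The blank between the two `s` frames is what lets the decoder keep the double letter. In practice a speech library would handle both the model forward pass and this decoding step.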


Cloth Selection

● We use CLIP encoders to generate embeddings for the clothing images and for the user's text description of the desired clothing.

● The most similar clothing item is selected using cosine similarity between the image embeddings and the text embedding.
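The selection step described above can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: the three-dimensional vectors are toy stand-ins for real CLIP embeddings, and the file names and the `select_cloth` function are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_cloth(
    text_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Return the catalog key whose image embedding best matches the text."""
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    )


# Toy vectors standing in for CLIP image embeddings (hypothetical values).
catalog = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "blue_jeans.jpg": [0.1, 0.9, 0.2],
    "white_shirt.jpg": [0.2, 0.2, 0.9],
}

# Toy vector standing in for the CLIP text embedding of the transcript.
query = [0.8, 0.2, 0.1]

best = select_cloth(query, catalog)
print(best)  # prints "red_dress.jpg"
```

Because cosine similarity ignores vector length, the image and text embeddings do not need to be normalized beforehand. With a large catalog, the image embeddings would typically be computed once and cached, so only the text embedding is computed per query.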


Overlaying Cloth Image

We overlay the clothing image using three modules:

○ Semantic Generation Module (SGM)

○ Clothes Warping Module (CWM)

○ Content Fusion Module (CFM)


Results

The following are some of the results we achieved

[Image: sample try-on results]

References

● Wav2Vec2: Unsupervised pre-training for speech recognition

● CLIP: Learning Transferable Visual Models From Natural Language Supervision

● Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content
