Voice Chimera is a research project focused on redefining voice identity through novel synthesis techniques. This repository contains the data, research paper, and Jupyter notebook for our experiments on voice blending using four methods: Linear Interpolation, Spherical Linear Interpolation, a Genetic Algorithm-inspired approach, and Principal Component Analysis.
This repository includes the following files:
- `Voice Chimera Experiment.xlsx` - Contains the experimental data and results from our voice blending methods.
- `Voice Chimera: Redefining Voice Identity Through Diverse Synthesis Techniques.pdf` - A detailed research paper explaining the methodologies, results, and implications of our experiments.
- `Voice_Chimera_Research.ipynb` - A Jupyter notebook with the code for our experiments, data analysis, and visualization of results.
The Voice Chimera project explores the creation of unique vocal identities by blending characteristics from multiple speakers. Using the VITS model (Variational Inference with adversarial learning for end-to-end Text-to-Speech), this research investigates four voice blending techniques for generating natural-sounding, diverse synthetic voices:
- Linear Interpolation (LERP)
- Spherical Linear Interpolation (SLERP)
- Genetic Algorithm-Inspired Method
- Principal Component Analysis (PCA)
The objective is to enhance the naturalness and diversity of synthetic voices, which could have significant applications in entertainment, assistive technologies, and personalized AI interfaces.
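The four blending techniques above can be sketched as operations on speaker embedding vectors. This is an illustrative outline only, not the notebook's actual code: the embedding dimensionality, random stand-in vectors, and function names (`lerp`, `slerp`, `crossover`, `pca_blend`) are assumptions for the sake of the example; the real experiments operate on learned VITS speaker embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two speaker embeddings (the real project
# would use embeddings extracted from a trained VITS model).
a = rng.standard_normal(256)
b = rng.standard_normal(256)

def lerp(a, b, t):
    """Linear interpolation (LERP) between two embeddings."""
    return (1.0 - t) * a + t * b

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation (SLERP): follow the great circle
    between the directions of a and b instead of the straight line."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < eps:  # nearly parallel: SLERP degenerates to LERP
        return lerp(a, b, t)
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * a + (np.sin(t * omega) / s) * b

def crossover(a, b, p=0.5):
    """Genetic-algorithm-inspired uniform crossover: each embedding
    dimension is inherited from a or b at random."""
    mask = rng.random(a.shape) < p
    return np.where(mask, a, b)

def pca_blend(embeddings, weights, n_components=8):
    """PCA-based blend: average several embeddings in a low-dimensional
    principal subspace, then project back to the original space."""
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(embeddings)  # shape (n_speakers, n_components)
    blended = np.average(coords, axis=0, weights=weights)
    return pca.inverse_transform(blended[None, :])[0]
```

Any of these blended vectors could then be fed back into the synthesizer's speaker-conditioning input to produce a voice that mixes the source speakers.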
To run the Jupyter notebook:
- Ensure you have Python installed, along with Jupyter.
- Install the necessary libraries with `pip install numpy matplotlib scikit-learn torch`.
- Launch Jupyter Notebook in the directory containing `Voice_Chimera_Research.ipynb` and run the cells sequentially.
For any inquiries, please reach out to Ju Lee, Viterbi School of Engineering, University of Southern California.
This research builds on insights and frameworks developed by the researchers cited in the accompanying paper. Special thanks to the developers of the VITS model and to all participants in our voice synthesis studies.