- Add the following to your `.env` file (if using the default implementations):

  ```bash
  # .env
  VITE_DEEPGRAM_KEY=
  VITE_11LABS_KEY=
  VITE_OPENAI_KEY=
  ```
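  These keys are only needed by the default Deepgram, OpenAI, and ElevenLabs implementations. Since the `VITE_` prefix implies a Vite project, the defaults can read them at runtime via `import.meta.env`; a minimal sketch:

  ```ts
  // Vite exposes VITE_-prefixed variables to client code via import.meta.env.
  // The variable names below come from the .env snippet above.
  const deepgramKey = import.meta.env.VITE_DEEPGRAM_KEY;
  const elevenLabsKey = import.meta.env.VITE_11LABS_KEY;
  const openAiKey = import.meta.env.VITE_OPENAI_KEY;
  ```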
- Install the required dependencies:

  ```bash
  npm install "@mantine/core@^7.11.2" "@mantine/hooks@^7.11.2" "@ricky0123/vad-react@^0.0.24" "@ricky0123/vad-web@^0.0.18" "axios@^1.7.2" "elevenlabs@^0.9.1" "howler@^2.2.4" "onnxruntime-web@^1.18.0"
  ```

  Or, if you use Yarn:

  ```bash
  yarn add "@mantine/core@^7.11.2" "@mantine/hooks@^7.11.2" "@ricky0123/vad-react@^0.0.24" "@ricky0123/vad-web@^0.0.18" "axios@^1.7.2" "elevenlabs@^0.9.1" "howler@^2.2.4" "onnxruntime-web@^1.18.0"
  ```
- Copy all files from the `public` folder in the `VoiceMode` package to your project's `public` folder.
- Copy/paste the `VoiceMode` folder into your repository.
- Import the component in your project:

  ```tsx
  // example import
  import VoiceMode from "@components/VoiceMode/VoiceMode";
  ```
The `VoiceMode` component accepts the following props:

- `fillerAudioFileSrcList` (required): List of filler audio files to play when the user stops speaking.
- `fillerTriggerTimeMs` (optional): Time in milliseconds to wait before playing filler audio. Default is `2500`.
- `playbackRate` (optional): Playback speed of the audio, ranging from `0.5` to `2.0`. Default is `0.92`.
- `containerStyles` (optional): Custom styles for the container, used to change the background, border, blur, etc. Default styles are applied.
- `showTranscription` (optional): Whether to show transcription messages. Default is `true`.
- `transcribeAudio` (optional): Function to transcribe audio. Default is `transcribeAudioDeepgram`.
- `getCompletion` (optional): Function to get completion text. Default is `getCompletionOpenAI`.
- `getAudioFromText` (optional): Function to generate audio from text. Default is `getAudioFromTextElevenLabs`.
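For instance, a minimal render that overrides a couple of the optional props might look like this (the filler file path is a placeholder; everything else falls back to the defaults):

```tsx
// Minimal usage sketch: only fillerAudioFileSrcList is required.
// "/fillers/hmm.mp3" is a hypothetical file served from your public folder.
<VoiceMode
  fillerAudioFileSrcList={[{ src: "/fillers/hmm.mp3", format: "mp3" }]}
  playbackRate={1.0}
  showTranscription={false}
/>
```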
The `VoiceMode` component comes with the following default values:

```tsx
const VoiceMode: React.FC<{
  fillerAudioFileSrcList: AudioToHowl[];
  fillerTriggerTimeMs?: number;
  playbackRate?: number;
  containerStyles?: React.CSSProperties;
  showTranscription?: boolean;
  transcribeAudio?: (audioBlob: Blob) => Promise<string>;
  getCompletion?: (text: string, history?: Chat[]) => Promise<string>;
  getAudioFromText?: (text: string) => Promise<{ src: string; format: string } | null>;
}> = ({
  fillerAudioFileSrcList,
  fillerTriggerTimeMs = 2500,
  playbackRate = 0.92,
  containerStyles = {
    background:
      "linear-gradient(0deg, rgba(2, 1, 19, 0.4), rgba(2, 1, 19, 0.4)), rgba(240, 1, 246, 0.1)",
    border: "1px solid rgba(255, 255, 255, 0.1)",
    backdropFilter: "blur(150px)",
  },
  showTranscription = true,
  transcribeAudio = transcribeAudioDeepgram,
  getCompletion = getCompletionOpenAI,
  getAudioFromText = getAudioFromTextElevenLabs,
}) => {
  // Component implementation here
};
```
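The `AudioToHowl` and `Chat` types are defined inside the package; judging from the signature above, they presumably look roughly like this (a sketch inferred from the defaults, not the exact definitions):

```ts
// Inferred: getAudioFromText resolves to { src, format }, and
// fillerAudioFileSrcList shares that shape. The Chat shape is an assumption.
type AudioToHowl = { src: string; format: string };
type Chat = { role: "user" | "assistant"; content: string };
```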
Implement the following functions to customize the behavior of the `VoiceMode` component:

```tsx
const getFillerAudioFileSrcList = (persona) => {
  return [
    { src: "https://example.com/audio.mp3", format: "mp3" },
  ];
};

const getTranscribeAudioFn = (persona) => {
  return async (audioBlob) => {
    return "transcribed text";
  };
};

const getCompletionFn = (persona) => {
  return async (text, history) => {
    return "completion text";
  };
};

const getAudioFromTextFn = (persona) => {
  return async (text) => {
    return { src: "https://example.com/audio.mp3", format: "mp3" };
  };
};
```

Then pass them in:

```tsx
<VoiceMode
  fillerAudioFileSrcList={getFillerAudioFileSrcList("persona-1")}
  transcribeAudio={getTranscribeAudioFn("persona-1")}
  getCompletion={getCompletionFn("persona-1")}
  getAudioFromText={getAudioFromTextFn("persona-1")}
/>
```

Feel free to adjust the default values and implement the required functions to suit your needs.
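As a concrete illustration, here is one way `getCompletionFn` could be fleshed out against the OpenAI chat completions REST API using the `axios` dependency installed earlier. This is a sketch under assumptions: the `VITE_OPENAI_KEY` variable from `.env`, the `gpt-4o-mini` model choice, and a `Chat` history shaped as `{ role, content }` are all illustrative, not the package's actual default implementation.

```tsx
import axios from "axios";

// Sketch of a persona-aware completion function. Assumes Chat items carry
// { role, content }; adjust the mapping to the package's real Chat type.
const getCompletionFn = (persona: string) => {
  return async (
    text: string,
    history: { role: string; content: string }[] = []
  ) => {
    const response = await axios.post(
      "https://api.openai.com/v1/chat/completions",
      {
        model: "gpt-4o-mini", // illustrative model choice
        messages: [
          { role: "system", content: `You are ${persona}. Keep replies short.` },
          ...history,
          { role: "user", content: text },
        ],
      },
      { headers: { Authorization: `Bearer ${import.meta.env.VITE_OPENAI_KEY}` } }
    );
    return response.data.choices[0].message.content as string;
  };
};
```

The other factories can be filled in the same way against Deepgram and ElevenLabs using the remaining keys from `.env`.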
Author: Sameer Pashikanti @spashii