This repository was archived by the owner on May 26, 2025. It is now read-only.

A React component for voice conversations with AI, featuring real-time speech-to-text, intelligent audio responses, and natural conversation flow with filler audio support.

spashii/voice-mode

VoiceMode Component

Installation (Yank and Paste Method!)

  1. Add the following keys to your .env file. They are only needed if you use the default Deepgram, ElevenLabs, and OpenAI implementations:

    # .env
    VITE_DEEPGRAM_KEY=
    VITE_11LABS_KEY=
    VITE_OPENAI_KEY=
  2. Install the required dependencies:

    npm install "@mantine/core@^7.11.2" "@mantine/hooks@^7.11.2" "@ricky0123/vad-react@^0.0.24" "@ricky0123/vad-web@^0.0.18" "axios@^1.7.2" "elevenlabs@^0.9.1" "howler@^2.2.4" "onnxruntime-web@^1.18.0"

    Or if you use Yarn:

    yarn add "@mantine/core@^7.11.2" "@mantine/hooks@^7.11.2" "@ricky0123/vad-react@^0.0.24" "@ricky0123/vad-web@^0.0.18" "axios@^1.7.2" "elevenlabs@^0.9.1" "howler@^2.2.4" "onnxruntime-web@^1.18.0"
  3. Copy all files from the public folder in the VoiceMode package to your project's public folder.

  4. Copy/paste the VoiceMode folder into your repository.

  5. Import the component in your project:

    // example import
    import VoiceMode from "@components/VoiceMode/VoiceMode";
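
Because the default implementations depend on the three keys from step 1, a small startup check can catch a blank or missing key early. The following is a minimal sketch, not part of the package: `findMissingKeys` is a hypothetical helper, and in a Vite app you would pass `import.meta.env` instead of the plain object used here.

```typescript
// Key names taken from the .env listing in step 1.
const REQUIRED_KEYS = ["VITE_DEEPGRAM_KEY", "VITE_11LABS_KEY", "VITE_OPENAI_KEY"] as const;

// Hypothetical helper: returns the required keys that are missing or blank.
function findMissingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// A plain object stands in for import.meta.env so the sketch runs anywhere.
const missing = findMissingKeys({ VITE_DEEPGRAM_KEY: "abc", VITE_11LABS_KEY: "" });
if (missing.length > 0) {
  console.warn(`VoiceMode: missing env keys: ${missing.join(", ")}`);
}
```

Keep in mind that Vite exposes every `VITE_`-prefixed variable to the client bundle, so these keys are visible in the browser.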

VoiceMode Component Props

The VoiceMode component accepts the following props:

  • fillerAudioFileSrcList (required): List of filler audio files to play after the user stops speaking, while the response is still pending.
  • fillerTriggerTimeMs (optional): Time in milliseconds to wait before playing filler audio. Default is 2500.
  • playbackRate (optional): Playback speed of the audio, ranging from 0.5 to 2.0. Default is 0.92.
  • containerStyles (optional): Custom styles for the container, used to change background, border, blur, etc. Default styles are applied.
  • showTranscription (optional): Whether to show transcription messages. Default is true.
  • transcribeAudio (optional): Function to transcribe audio. Default is transcribeAudioDeepgram.
  • getCompletion (optional): Function to get completion text. Default is getCompletionOpenAI.
  • getAudioFromText (optional): Function to generate audio from text. Default is getAudioFromTextElevenLabs.
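
The numeric props above can be read as two small rules, sketched below. Both helpers are hypothetical and only restate the documented range and defaults. Whether the component itself clamps `playbackRate`, or exactly when it decides to play filler audio, is an assumption here rather than something this README confirms.

```typescript
// Hypothetical helper: keep a requested rate inside the documented 0.5 to 2.0 range,
// falling back to the documented default of 0.92 when the input is not a finite number.
function clampPlaybackRate(rate: number = 0.92): number {
  if (!Number.isFinite(rate)) return 0.92;
  return Math.min(2.0, Math.max(0.5, rate));
}

// Hypothetical helper: filler audio is due once the wait has reached
// fillerTriggerTimeMs and no response audio is ready yet (assumed behaviour).
function isFillerDue(
  waitedMs: number,
  responseReady: boolean,
  fillerTriggerTimeMs: number = 2500
): boolean {
  return !responseReady && waitedMs >= fillerTriggerTimeMs;
}

console.log(clampPlaybackRate(3)); // 2
console.log(clampPlaybackRate()); // 0.92
console.log(isFillerDue(3000, false)); // true
```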

Default Values

The component signature below shows each prop's type and its default value:

const VoiceMode: React.FC<{
  fillerAudioFileSrcList: AudioToHowl[];
  fillerTriggerTimeMs?: number;
  playbackRate?: number;
  containerStyles?: React.CSSProperties;
  showTranscription?: boolean;
  transcribeAudio?: (audioBlob: Blob) => Promise<string>;
  getCompletion?: (text: string, history?: Chat[]) => Promise<string>;
  getAudioFromText?: (text: string) => Promise<{ src: string; format: string } | null>;
}> = ({
  fillerAudioFileSrcList,
  fillerTriggerTimeMs = 2500,
  playbackRate = 0.92,
  containerStyles = {
    background: "linear-gradient(0deg, rgba(2, 1, 19, 0.4), rgba(2, 1, 19, 0.4)), rgba(240, 1, 246, 0.1)",
    border: "1px solid rgba(255, 255, 255, 0.1)",
    backdropFilter: "blur(150px)",
  },
  showTranscription = true,
  transcribeAudio = transcribeAudioDeepgram,
  getCompletion = getCompletionOpenAI,
  getAudioFromText = getAudioFromTextElevenLabs,
}) => {
  // Component implementation here
}
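
The signature refers to a `Chat` type that this README does not define. As a sketch of how a custom `getCompletion` could use the optional `history` argument, the block below assumes a simple role/content shape for `Chat`; that shape, `buildPrompt`, and `getCompletionStub` are all hypothetical, so check the type exported by the VoiceMode folder before relying on it.

```typescript
// Hypothetical shape: the real Chat type lives in the VoiceMode folder.
type Chat = { role: "user" | "assistant"; content: string };

// Flatten prior turns plus the new utterance into a single prompt string.
function buildPrompt(text: string, history: Chat[] = []): string {
  const lines = history.map((turn) => `${turn.role}: ${turn.content}`);
  return [...lines, `user: ${text}`].join("\n");
}

// Matches the getCompletion prop signature; replace the body with a real model call.
const getCompletionStub = async (text: string, history?: Chat[]): Promise<string> => {
  const prompt = buildPrompt(text, history);
  return `(received ${prompt.split("\n").length} turns)`;
};
```

Passing `getCompletion={getCompletionStub}` would then swap out the OpenAI default without touching the other props.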

Function Implementations

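To customize the behavior of the VoiceMode component, implement factory functions like the ones below. Each takes a persona and returns the value for the matching prop; the bodies shown are placeholders: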

getFillerAudioFileSrcList

// Returns the filler clips for a persona (placeholder URL; persona is unused in this stub).
const getFillerAudioFileSrcList = (persona) => {
  return [
    { src: "https://example.com/audio.mp3", format: "mp3" },
  ];
};

getTranscribeAudioFn

// Returns a transcribeAudio function for a persona (placeholder result).
const getTranscribeAudioFn = (persona) => {
  return async (audioBlob) => {
    return "transcribed text";
  };
};

getCompletionFn

// Returns a getCompletion function for a persona (placeholder result).
const getCompletionFn = (persona) => {
  return async (text, history) => {
    return "completion text";
  };
};

getAudioFromTextFn

// Returns a getAudioFromText function for a persona (placeholder URL).
const getAudioFromTextFn = (persona) => {
  return async (text) => {
    return { src: "https://example.com/audio.mp3", format: "mp3" };
  };
};
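
The stubs above ignore their `persona` argument. One way to make it matter is a lookup table with a fallback, sketched here for the filler list. The persona names, file paths, and the `AudioToHowl` shape (inferred from the objects the stubs return) are assumptions for illustration only.

```typescript
// Assumed shape, inferred from the { src, format } objects used in this README.
type AudioToHowl = { src: string; format: string };

// Hypothetical persona table; names and paths are placeholders.
const FILLERS_BY_PERSONA: Record<string, AudioToHowl[]> = {
  "persona-1": [
    { src: "/fillers/persona-1/hmm.mp3", format: "mp3" },
    { src: "/fillers/persona-1/let-me-think.mp3", format: "mp3" },
  ],
  "persona-2": [{ src: "/fillers/persona-2/one-moment.mp3", format: "mp3" }],
};

const DEFAULT_PERSONA = "persona-1";

// Look up the clips for a persona, falling back to the default for unknown names.
const getFillerAudioFileSrcList = (persona: string): AudioToHowl[] =>
  FILLERS_BY_PERSONA[persona] ?? FILLERS_BY_PERSONA[DEFAULT_PERSONA] ?? [];
```

The same table pattern could back the other three factories, with each persona mapping to its own transcription, completion, or text-to-speech function.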

Example Usage

<VoiceMode
   fillerAudioFileSrcList={getFillerAudioFileSrcList("persona-1")}
   transcribeAudio={getTranscribeAudioFn("persona-1")}
   getCompletion={getCompletionFn("persona-1")}
   getAudioFromText={getAudioFromTextFn("persona-1")}
/>

Feel free to adjust the default values and supply your own implementations of these functions to suit your needs.

Author: Sameer Pashikanti @spashii
