Skip to content

ronin207/fun-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Voice Agent

A voice-powered conversational AI agent with session memory capabilities. This implementation provides basic functionality for voice input/output and conversation memory as specified in the technical requirements.

Features

Voice Input/Output: Speech-to-text and text-to-speech capabilities
Session Memory: Remembers conversation history within the current session
LLM Integration: Uses OpenAI GPT-4 for intelligent responses
Conversation Loop: Continuous voice interaction until user exits

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set OpenAI API Key

export OPENAI_API_KEY="your-api-key-here"

Or create a .env file:

OPENAI_API_KEY=your-api-key-here

3. Run the Demo

python demo.py

Choose between:

  • Full voice mode: Complete voice interaction (requires microphone + speakers)
  • Text-only mode: Test conversation logic without audio

4. Direct Usage

python voice_agent.py

How It Works

Audio Pipeline

  1. ASR (Speech-to-Text): Captures audio via microphone using Google Speech Recognition
  2. LLM Processing: Sends text to OpenAI GPT-4 with conversation context
  3. TTS (Text-to-Speech): Converts response to audio using pyttsx3
  4. Memory Update: Stores conversation turn in session memory

Session Memory

  • Maintains a sliding window of recent conversation turns (default: 6 turns)
  • Provides context to the LLM for coherent, contextual responses
  • Tracks session statistics and conversation history

Configuration

Environment variables:

Variable Default Description
OPENAI_API_KEY Required OpenAI API key
VOICE_AGENT_MODEL gpt-4 OpenAI model to use
CONVERSATION_WINDOW_SIZE 6 Number of turns to remember
TTS_VOICE_ID 0 Voice ID for text-to-speech
ASR_LANGUAGE en-US Speech recognition language

Usage Examples

Basic Conversation

You: "Hello, what's the weather like?"
Agent: "I don't have access to real-time weather data, but I'd be happy to help you with other questions or have a conversation!"

You: "What did I just ask about?"
Agent: "You just asked about the weather. I remember our conversation history in this session."

Exit Commands

Say any of these to end the conversation:

  • "exit"
  • "quit"
  • "goodbye"
  • "stop"

Requirements

System Requirements

  • Python 3.8+
  • Microphone (for voice input)
  • Speakers or headphones (for voice output)
  • Internet connection (for OpenAI API and speech recognition)

Dependencies

  • openai>=1.0.0 - LLM integration
  • speech_recognition>=3.10.0 - Speech-to-text
  • pyttsx3>=2.90 - Text-to-speech
  • pyaudio>=0.2.11 - Audio input/output
  • python-dotenv>=1.0.0 - Environment variable management

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Microphone    │───▶│   ASR (STT)      │───▶│  Session Memory │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐            ▼
│   Speakers      │◀───│   TTS            │    ┌─────────────────┐
└─────────────────┘    └──────────────────┘    │   LLM (GPT-4)   │
                                ▲               └─────────────────┘
                                │
                        ┌──────────────────┐
                        │  Response Gen.   │
                        └──────────────────┘

Troubleshooting

Audio Issues

  • Microphone not working: Check system permissions and microphone settings
  • No sound output: Verify speakers/headphones and system volume
  • Speech not recognized: Try speaking clearly and ensure internet connection

API Issues

  • OpenAI API errors: Verify API key and check account credits
  • Rate limiting: Wait a moment and try again

Installation Issues

  • PyAudio installation fails:
    • macOS: brew install portaudio then pip install pyaudio
    • Ubuntu: sudo apt-get install python3-pyaudio
    • Windows: Download wheel from PyAudio Windows

Development

Project Structure

voice_agent.py    # Main voice agent implementation
demo.py          # Demo script with multiple modes
requirements.txt # Python dependencies
README.md        # This documentation
rules.mdc        # Technical specifications

Key Classes

  • VoiceAgent: Main agent orchestrating audio pipeline and conversation
  • SessionMemory: Manages conversation history and context
  • ConversationTurn: Data structure for individual conversation exchanges

Next Steps

This implementation provides the basic functionality requested. The architecture is designed to be extensible for future enhancements like:

  • Long-term memory persistence across sessions
  • User profiles and personalization
  • RAG (Retrieval-Augmented Generation) capabilities
  • Safety and content moderation
  • Proactive conversation features
  • Multi-language support

License

This project is part of a hackathon implementation focusing on core voice agent functionality.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages