Voice Agent

A voice-powered conversational AI agent with session memory capabilities. This implementation provides basic functionality for voice input/output and conversation memory as specified in the technical requirements.

Features

✅ Voice Input/Output: Speech-to-text and text-to-speech capabilities
✅ Session Memory: Remembers conversation history within the current session
✅ LLM Integration: Uses OpenAI GPT-4 for intelligent responses
✅ Conversation Loop: Continuous voice interaction until user exits

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set OpenAI API Key

export OPENAI_API_KEY="your-api-key-here"

Or create a .env file:

OPENAI_API_KEY=your-api-key-here

3. Run the Demo

python demo.py

Choose between:

Full voice mode: Complete voice interaction (requires microphone + speakers)
Text-only mode: Test conversation logic without audio

4. Direct Usage

python voice_agent.py

How It Works

Audio Pipeline

ASR (Speech-to-Text): Captures audio via microphone using Google Speech Recognition
LLM Processing: Sends text to OpenAI GPT-4 with conversation context
TTS (Text-to-Speech): Converts response to audio using pyttsx3
Memory Update: Stores conversation turn in session memory

Session Memory

Maintains a sliding window of recent conversation turns (default: 6 turns)
Provides context to the LLM for coherent, contextual responses
Tracks session statistics and conversation history

Configuration

Environment variables:

Variable	Default	Description
`OPENAI_API_KEY`	Required	OpenAI API key
`VOICE_AGENT_MODEL`	`gpt-4`	OpenAI model to use
`CONVERSATION_WINDOW_SIZE`	`6`	Number of turns to remember
`TTS_VOICE_ID`	`0`	Voice ID for text-to-speech
`ASR_LANGUAGE`	`en-US`	Speech recognition language

Usage Examples

Basic Conversation

You: "Hello, what's the weather like?"
Agent: "I don't have access to real-time weather data, but I'd be happy to help you with other questions or have a conversation!"

You: "What did I just ask about?"
Agent: "You just asked about the weather. I remember our conversation history in this session."

Exit Commands

Say any of these to end the conversation:

"exit"
"quit"
"goodbye"
"stop"

Requirements

System Requirements

Python 3.8+
Microphone (for voice input)
Speakers or headphones (for voice output)
Internet connection (for OpenAI API and speech recognition)

Dependencies

openai>=1.0.0 - LLM integration
speech_recognition>=3.10.0 - Speech-to-text
pyttsx3>=2.90 - Text-to-speech
pyaudio>=0.2.11 - Audio input/output
python-dotenv>=1.0.0 - Environment variable management

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Microphone    │───▶│   ASR (STT)      │───▶│  Session Memory │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐            ▼
│   Speakers      │◀───│   TTS            │    ┌─────────────────┐
└─────────────────┘    └──────────────────┘    │   LLM (GPT-4)   │
                                ▲               └─────────────────┘
                                │
                        ┌──────────────────┐
                        │  Response Gen.   │
                        └──────────────────┘

Troubleshooting

Audio Issues

Microphone not working: Check system permissions and microphone settings
No sound output: Verify speakers/headphones and system volume
Speech not recognized: Try speaking clearly and ensure internet connection

API Issues

OpenAI API errors: Verify API key and check account credits
Rate limiting: Wait a moment and try again

Installation Issues

PyAudio installation fails:
- macOS: brew install portaudio then pip install pyaudio
- Ubuntu: sudo apt-get install python3-pyaudio
- Windows: Download wheel from PyAudio Windows

Development

Project Structure

voice_agent.py    # Main voice agent implementation
demo.py          # Demo script with multiple modes
requirements.txt # Python dependencies
README.md        # This documentation
rules.mdc        # Technical specifications

Key Classes

VoiceAgent: Main agent orchestrating audio pipeline and conversation
SessionMemory: Manages conversation history and context
ConversationTurn: Data structure for individual conversation exchanges

Next Steps

This implementation provides the basic functionality requested. The architecture is designed to be extensible for future enhancements like:

Long-term memory persistence across sessions
User profiles and personalization
RAG (Retrieval-Augmented Generation) capabilities
Safety and content moderation
Proactive conversation features
Multi-language support

License

This project is part of a hackathon implementation focusing on core voice agent functionality.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
chroma_db		chroma_db
.gitignore		.gitignore
ELDERLY_CARE_README.md		ELDERLY_CARE_README.md
IMPROVED_LISTENING_README.md		IMPROVED_LISTENING_README.md
README.md		README.md
analysis_partial_failures.md		analysis_partial_failures.md
better_voices.py		better_voices.py
demo.py		demo.py
demo_personas.py		demo_personas.py
elderly_care_profiles.py		elderly_care_profiles.py
elderly_chat_demo.py		elderly_chat_demo.py
enhanced_voice_demo.py		enhanced_voice_demo.py
fix_tts_voice.py		fix_tts_voice.py
improved_listening.py		improved_listening.py
install.py		install.py
long_term_memory.py		long_term_memory.py
openai_tts_upgrade.py		openai_tts_upgrade.py
proactive_agent.py		proactive_agent.py
requirements.txt		requirements.txt
rules.mdc		rules.mdc
test_ai_voice.py		test_ai_voice.py
test_improved_listening.py		test_improved_listening.py
user_profile.py		user_profile.py
user_profiles.db		user_profiles.db
voice_agent.py		voice_agent.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Agent

Features

Quick Start

1. Install Dependencies

2. Set OpenAI API Key

3. Run the Demo

4. Direct Usage

How It Works

Audio Pipeline

Session Memory

Configuration

Usage Examples

Basic Conversation

Exit Commands

Requirements

System Requirements

Dependencies

Architecture

Troubleshooting

Audio Issues

API Issues

Installation Issues

Development

Project Structure

Key Classes

Next Steps

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Voice Agent

Features

Quick Start

1. Install Dependencies

2. Set OpenAI API Key

3. Run the Demo

4. Direct Usage

How It Works

Audio Pipeline

Session Memory

Configuration

Usage Examples

Basic Conversation

Exit Commands

Requirements

System Requirements

Dependencies

Architecture

Troubleshooting

Audio Issues

API Issues

Installation Issues

Development

Project Structure

Key Classes

Next Steps

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages