A voice-powered conversational AI agent with session memory capabilities. This implementation provides basic functionality for voice input/output and conversation memory as specified in the technical requirements.
✅ Voice Input/Output: Speech-to-text and text-to-speech capabilities
✅ Session Memory: Remembers conversation history within the current session
✅ LLM Integration: Uses OpenAI GPT-4 for intelligent responses
✅ Conversation Loop: Continuous voice interaction until user exits
pip install -r requirements.txtexport OPENAI_API_KEY="your-api-key-here"Or create a .env file:
OPENAI_API_KEY=your-api-key-here
python demo.pyChoose between:
- Full voice mode: Complete voice interaction (requires microphone + speakers)
- Text-only mode: Test conversation logic without audio
python voice_agent.py- ASR (Speech-to-Text): Captures audio via microphone using Google Speech Recognition
- LLM Processing: Sends text to OpenAI GPT-4 with conversation context
- TTS (Text-to-Speech): Converts response to audio using pyttsx3
- Memory Update: Stores conversation turn in session memory
- Maintains a sliding window of recent conversation turns (default: 6 turns)
- Provides context to the LLM for coherent, contextual responses
- Tracks session statistics and conversation history
Environment variables:
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY |
Required | OpenAI API key |
VOICE_AGENT_MODEL |
gpt-4 |
OpenAI model to use |
CONVERSATION_WINDOW_SIZE |
6 |
Number of turns to remember |
TTS_VOICE_ID |
0 |
Voice ID for text-to-speech |
ASR_LANGUAGE |
en-US |
Speech recognition language |
You: "Hello, what's the weather like?"
Agent: "I don't have access to real-time weather data, but I'd be happy to help you with other questions or have a conversation!"
You: "What did I just ask about?"
Agent: "You just asked about the weather. I remember our conversation history in this session."
Say any of these to end the conversation:
- "exit"
- "quit"
- "goodbye"
- "stop"
- Python 3.8+
- Microphone (for voice input)
- Speakers or headphones (for voice output)
- Internet connection (for OpenAI API and speech recognition)
openai>=1.0.0- LLM integrationspeech_recognition>=3.10.0- Speech-to-textpyttsx3>=2.90- Text-to-speechpyaudio>=0.2.11- Audio input/outputpython-dotenv>=1.0.0- Environment variable management
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Microphone │───▶│ ASR (STT) │───▶│ Session Memory │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
┌─────────────────┐ ┌──────────────────┐ ▼
│ Speakers │◀───│ TTS │ ┌─────────────────┐
└─────────────────┘ └──────────────────┘ │ LLM (GPT-4) │
▲ └─────────────────┘
│
┌──────────────────┐
│ Response Gen. │
└──────────────────┘
- Microphone not working: Check system permissions and microphone settings
- No sound output: Verify speakers/headphones and system volume
- Speech not recognized: Try speaking clearly and ensure internet connection
- OpenAI API errors: Verify API key and check account credits
- Rate limiting: Wait a moment and try again
- PyAudio installation fails:
- macOS:
brew install portaudiothenpip install pyaudio - Ubuntu:
sudo apt-get install python3-pyaudio - Windows: Download wheel from PyAudio Windows
- macOS:
voice_agent.py # Main voice agent implementation
demo.py # Demo script with multiple modes
requirements.txt # Python dependencies
README.md # This documentation
rules.mdc # Technical specifications
VoiceAgent: Main agent orchestrating audio pipeline and conversationSessionMemory: Manages conversation history and contextConversationTurn: Data structure for individual conversation exchanges
This implementation provides the basic functionality requested. The architecture is designed to be extensible for future enhancements like:
- Long-term memory persistence across sessions
- User profiles and personalization
- RAG (Retrieval-Augmented Generation) capabilities
- Safety and content moderation
- Proactive conversation features
- Multi-language support
This project is part of a hackathon implementation focusing on core voice agent functionality.