A conversational AI assistant that helps users schedule meetings through natural language interactions, supporting both text and voice input. The agent integrates with Google Calendar to find available time slots and book meetings automatically.
- Natural Language Processing: Understands scheduling requests in plain English
- Voice & Text Support: Complete voice-to-voice conversations or traditional text chat
- Google Calendar Integration: Real-time availability checking and event creation
- Multi-turn Conversations: Maintains context throughout the scheduling process
- Smart Entity Extraction: Automatically extracts dates, times, durations, and meeting details
- Timezone Support: Configured for IST (Indian Standard Time) with proper timezone handling
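All slot times are produced and displayed in IST. A minimal sketch of the timezone handling, using the stdlib `zoneinfo` here for illustration (the project itself uses `pytz`):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ stdlib; the project uses pytz instead

# Attach the IST timezone to a slot time before sending it to the Calendar API
ist = ZoneInfo("Asia/Kolkata")
slot_start = datetime(2025, 6, 24, 15, 30, tzinfo=ist)
print(slot_start.isoformat())  # → 2025-06-24T15:30:00+05:30
```

Because IST has no daylight-saving transitions, the offset is always +05:30.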
Agent (agent.py): The brain of the system
- Uses Google's Gemini AI for natural language understanding
- Manages conversation state and flow
- Extracts entities (duration, date, time) from user input
- Coordinates between different services
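The real entity extraction is delegated to Gemini; as an illustration of the kind of structured data the agent expects back, a toy rule-based extractor (all names and patterns here are hypothetical, not the project's code) might look like:

```python
import re

def extract_entities(text):
    """Toy extractor for duration, date, and time-of-day keywords; the actual
    agent gets this structure back from Gemini rather than from regexes."""
    entities = {}
    m = re.search(r'(\d+)[\s-]*(hour|hr|minute|min)', text, re.I)
    if m:
        value, unit = int(m.group(1)), m.group(2).lower()
        entities['duration_min'] = value * 60 if unit.startswith('h') else value
    for word in ('today', 'tomorrow'):
        if word in text.lower():
            entities['date'] = word
    for part in ('morning', 'afternoon', 'evening'):
        if part in text.lower():
            entities['time_block'] = part
    return entities

print(extract_entities("1 hour tomorrow afternoon"))
# → {'duration_min': 60, 'date': 'tomorrow', 'time_block': 'afternoon'}
```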
Calendar Service (calendar_service.py): Google Calendar integration
- Authenticates with Google Calendar API
- Finds available time slots based on user preferences
- Creates calendar events with proper timezone handling
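The availability check itself goes through the Google Calendar API; the gap-finding logic can be sketched in plain Python (function name and the 30-minute step are assumptions, not the project's code):

```python
from datetime import datetime, timedelta

def find_free_slots(busy, day, duration_min, block=(12, 17)):
    """Return free (start, end) pairs of duration_min minutes inside the given
    business-hours block, stepping in 30-minute increments. busy is a list of
    (start, end) datetime pairs pulled from the calendar."""
    cur = day.replace(hour=block[0], minute=0)
    end_of_block = day.replace(hour=block[1], minute=0)
    need, step = timedelta(minutes=duration_min), timedelta(minutes=30)
    slots = []
    while cur + need <= end_of_block:
        # A candidate is free if it ends before every busy interval starts,
        # or begins at or after the interval ends
        if all(cur + need <= b_start or cur >= b_end for b_start, b_end in busy):
            slots.append((cur, cur + need))
        cur += step
    return slots

day = datetime(2025, 6, 24)
busy = [(day.replace(hour=13), day.replace(hour=14))]  # 1-2 PM already booked
free = find_free_slots(busy, day, 60)
print(free[0][0].hour, free[1][0].hour)  # → 12 14
```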
Audio Processing (audio_utils.py): Voice interaction capabilities
- Uses OpenAI Whisper for speech-to-text (local processing)
- Integrates ElevenLabs API for high-quality text-to-speech
- Optimized for low latency with async processing
Flask API (main.py): Web service layer
- Provides REST endpoints for text and voice interactions
- Manages user sessions
- Handles file uploads and audio processing
Conversational Flow: The agent follows a natural conversation pattern:
- Understand user intent
- Collect required information (duration, date, time)
- Search calendar for availability
- Present options to user
- Confirm selection and book meeting
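Steps 1-2 above amount to asking for whichever required field is still missing; a minimal sketch of that decision (the prompt wording here is illustrative, since the real agent phrases its replies via Gemini):

```python
# Hypothetical prompt table; the real agent generates questions with Gemini.
PROMPTS = {
    'duration': "How long should the meeting be?",
    'date': "What day works for you?",
    'time': "What time of day do you prefer?",
}

def next_prompt(state):
    """Ask for the first required field that is still missing."""
    for field in ('duration', 'date', 'time'):
        if state.get(field) is None:
            return PROMPTS[field]
    return None  # everything collected; move on to the calendar search

print(next_prompt({'duration': 60, 'date': None, 'time': None}))
# → What day works for you?
```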
State Management: Each user session maintains conversation state including:
- Required fields (duration, date, time)
- Available time slots
- Selected slot and confirmation status
- Conversation history for context
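The session state listed above could be modeled as a small dataclass like the following (field names are assumptions; the real structure lives in agent.py):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SessionState:
    """Per-session conversation state; field names are illustrative."""
    duration_min: Optional[int] = None
    date: Optional[str] = None
    time: Optional[str] = None
    available_slots: List[dict] = field(default_factory=list)
    selected_slot: Optional[dict] = None
    confirmed: bool = False
    history: List[str] = field(default_factory=list)

    def missing_fields(self):
        """Required fields the agent still has to ask for."""
        return [name for name in ('duration_min', 'date', 'time')
                if getattr(self, name) is None]

state = SessionState(duration_min=60)
print(state.missing_fields())  # → ['date', 'time']
```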
Async Processing: Voice interactions use async processing to handle transcription and TTS simultaneously for better performance.
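A rough sketch of that async pipeline, with stub functions standing in for the real Whisper and ElevenLabs calls in audio_utils.py:

```python
import asyncio

# Stubs standing in for the real Whisper / ElevenLabs calls.
def transcribe_blocking(audio_bytes):
    return "schedule a meeting"  # pretend Whisper output

async def synthesize(sentence):
    await asyncio.sleep(0)  # placeholder for the TTS HTTP round-trip
    return sentence.encode()

async def handle_voice_turn(audio_bytes):
    loop = asyncio.get_running_loop()
    # Blocking STT runs in a thread pool so the event loop stays responsive
    transcript = await loop.run_in_executor(None, transcribe_blocking, audio_bytes)
    reply_sentences = ["Sure!", "How long should the meeting be?"]
    # Sentences are synthesized concurrently rather than one after another
    chunks = await asyncio.gather(*(synthesize(s) for s in reply_sentences))
    return transcript, b" ".join(chunks)

transcript, audio = asyncio.run(handle_voice_turn(b"..."))
print(transcript)  # → schedule a meeting
```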
- Python 3.8 or higher
- Google Cloud Console account
- ElevenLabs API account (for voice features)
- Google AI Studio API key (for Gemini)
```bash
git clone <repository-url>
cd smart-scheduler-ai
pip install -r requirements.txt
```
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Google Calendar API
- Create OAuth credentials (Desktop Application)
- Download the credentials file and save it as `credentials.json` in the project root
Create a `.env` file in the project root:

```
# Google AI Studio (Gemini)
GEMINI_API_KEY=your_gemini_api_key_here

# ElevenLabs (for voice synthesis)
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
VOICE_ID=EXAVITQu4vr4xnSDxMaL

# Optional: Customize voice settings
# VOICE_ID=your_preferred_voice_id
```

Gemini API Key:
- Visit Google AI Studio
- Create a new API key
- Copy the key to your `.env` file
ElevenLabs API Key:
- Sign up at ElevenLabs
- Go to your profile settings
- Generate an API key
- Copy the key to your `.env` file
Install system audio dependencies:

Ubuntu/Debian:
```bash
sudo apt-get install ffmpeg portaudio19-dev
```
macOS:
```bash
brew install ffmpeg portaudio
```
Windows:
- Download FFmpeg from ffmpeg.org
- Add it to your system PATH
Run the application:
```bash
python main.py
```
- On first run, you'll be redirected to the Google OAuth consent screen
- Grant calendar access permissions
- A `token.pickle` file will be created for future authentication
- Web Interface: Open http://localhost:5000 in your browser
- API Endpoints:
  - POST `/chat` - Text-based conversations
  - POST `/voice-chat` - Voice-based conversations
  - POST `/reset` - Reset conversation state
User: "I need to schedule a meeting"
Agent: "Sure! How long should the meeting be?"
User: "1 hour tomorrow afternoon"
Agent: "Got it - 1 hour meeting tomorrow afternoon. Let me check your calendar..."
Agent: "I found these available 60-minute slots on Tuesday, June 26:
1. 2:00 PM - 3:00 PM IST
2. 3:30 PM - 4:30 PM IST
3. 4:00 PM - 5:00 PM IST
Which slot would you prefer?"
User: "I'll take option 2"
Agent: "Great choice! What would you like to name this meeting?"
User: "Team standup"
Agent: "Perfect! Let me confirm:
Title: Team standup
Date: Tuesday, June 26
Time: 3:30 PM - 4:30 PM IST
Duration: 60 minutes
Should I book this meeting?"
User: "Yes"
Agent: "Meeting booked successfully! Event ID: abc123..."
- Click the microphone button
- Speak your request: "Schedule a 30-minute meeting for tomorrow morning"
- The agent will transcribe, process, and respond with voice
- Continue the conversation naturally until booking is complete
```python
# Whisper model size (speed vs accuracy trade-off)
WHISPER_MODEL = 'tiny'  # Options: tiny, base, small, medium, large

# TTS voice settings
TTS_VOICE_SETTINGS = {
    'stability': 0.5,          # Voice consistency
    'similarity_boost': 0.5,   # Voice similarity to original
    'style': 0.0,              # Style exaggeration
    'use_speaker_boost': True
}

# Performance settings
TTS_TIMEOUT = 8         # API timeout in seconds
MAX_AUDIO_SIZE_MB = 25  # Maximum audio file size
SAMPLE_RATE = 16000     # Audio sample rate
```

```python
# Business hours for slot generation
time_blocks = {
    'morning': (9, 12),     # 9 AM - 12 PM
    'afternoon': (12, 17),  # 12 PM - 5 PM
    'evening': (17, 20),    # 5 PM - 8 PM
    'any': (9, 18)          # 9 AM - 6 PM
}

# Timezone
default_timezone = pytz.timezone('Asia/Kolkata')  # IST
```

1. "Gemini API Key not found"
- Ensure `GEMINI_API_KEY` is set in your `.env` file
- Verify the API key is valid and has quota remaining
2. "Google Calendar authentication failed"
- Delete `token.pickle` and re-authenticate
- Ensure `credentials.json` is in the project root
- Check that the Calendar API is enabled in Google Cloud Console
3. "ElevenLabs API error"
- Verify `ELEVENLABS_API_KEY` in your `.env` file
- Check your ElevenLabs account quota
- Ensure the `VOICE_ID` exists in your account
4. "Whisper model loading failed"
- Install missing audio dependencies (ffmpeg, portaudio)
- Try a different model size in `voice_config.py`
- Check available disk space (models range from ~40 MB to ~3 GB)
5. "No available slots found"
- Check if the requested date is in the past
- Verify business hours configuration
- Ensure your calendar has free time during requested periods
For Faster Voice Processing:
- Use `WHISPER_MODEL = 'tiny'` (fastest, less accurate)
- Reduce `TTS_TIMEOUT` for quicker failures
- Enable `USE_MEMORY_PROCESSING = True`

For Better Accuracy:
- Use `WHISPER_MODEL = 'base'` or `'small'`
- Increase `TTS_TIMEOUT` for more reliable API calls
- Enable `ENABLE_AUDIO_PREPROCESSING = True`
POST /chat
Content-Type: application/json

```json
{
  "message": "I need to schedule a 1-hour meeting tomorrow"
}
```

Response:
```json
{
  "response": "Sure! What time tomorrow works best for you?",
  "success": true
}
```

POST /voice-chat
Content-Type: multipart/form-data
audio: <audio-file>

Response:
```json
{
  "transcript": "I need to schedule a meeting",
  "response": "How long should the meeting be?",
  "audio_data": "hex-encoded-audio-data",
  "success": true
}
```

POST /reset

Response:
```json
{
  "success": true
}
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google AI Studio for Gemini API
- ElevenLabs for high-quality text-to-speech
- OpenAI for Whisper speech recognition
- Google Calendar API for scheduling integration
Need help? Open an issue on GitHub or check the troubleshooting section above.