An intelligent robotic bartender system that combines advanced computer vision, natural language processing, and precise robotic manipulation to create cocktails through voice commands. The system integrates real-time speech recognition, AI-powered conversation, color-based bottle detection, and robotic arm control to deliver a seamless bartending experience.
The Conversational Robo Bartender is a sophisticated ROS2-based system that enables users to order cocktails through natural conversation. The system uses computer vision to detect colored bottles, processes voice commands through AI, and executes precise robotic movements to create drinks.
- Real-time Voice Interaction: Natural conversation using ElevenLabs STT/TTS and Groq LLM
- Intelligent Bottle Detection: YOLO-based computer vision for multi-color bottle recognition
- Precision Robotics: xArm robotic arm control for accurate bottle manipulation and pouring
- Smart Recipe Management: Cocktail recipes mapped to bottle color combinations
- Multi-Modal Integration: Seamless coordination between speech, vision, and motion systems
The system consists of three main integrated nodes that communicate via ROS2 topics:
┌──────────────────────────────────────────────────────────────┐
│ ROS2 Communication Layer │
├──────────────────┬─────────────────┬─────────────────────────┤
│ Conversational │ Computer │ Robotic Arm │
│ TTS Node │ Vision Node │ Node │
│ │ │ │
│ • STT/TTS │ • Bottle │ • xArm Control │
│ • AI Conversation│ Detection │ • Motion Planning │
│ • Recipe Logic │ • Color ID │ • Pour Operations │
│ • Order Status │ • Multi-bottle │ • Safety Management │
└──────────────────┴─────────────────┴─────────────────────────┘
- User speaks → STT converts to text
- AI processes request and identifies cocktail
- Vision system detects available bottles
- Robot arm executes picking, pouring, and serving sequence
- TTS confirms completion to user
📚 Detailed Documentation: For comprehensive setup and usage instructions for individual components, see:
Primary Functions:
- Real-time speech-to-text using ElevenLabs WebSocket API
- AI-powered conversation with Groq LLM (Moonshot Kimi model)
- Text-to-speech synthesis for natural responses
- Cocktail recipe management and order processing
Key Technologies:
- ElevenLabs Realtime STT with silence detection
- Groq API for conversational AI
- WebSocket-based audio streaming
- ROS2 message coordination
Cocktail Recipes:
COCKTAILS = {
'martini': ['red', 'blue'],
'mojito': ['green', 'blue'],
'bloody mary': ['red'],
'blue lagoon': ['blue'],
'cosmopolitan': ['red', 'green']
}Current LLM System Prompt: The AI bartender uses the following system prompt to ensure accurate, context-aware responses:
You are a friendly robot bartender.
Your camera detects bottle colors in real time. The currently visible
bottle colors are provided in THIS LIST - {{current_bottle}}, which is always a Python-style list.
There are only three valid bottle colors: red, green, blue.
NEVER hallucinate bottles. ONLY trust {{current_bottle}}.
Cocktail mapping (each cocktail requires ALL listed colors to be visible):
If you see red and blue in {{current_bottle}}, tell martini is available
If you see green and blue in {{current_bottle}}, tell mojito is available
If you see red in {{current_bottle}}, tell bloody mary is available
If you see blue in {{current_bottle}}, tell blue lagoon is available
If you see red and green in {{current_bottle}}, tell cosmopolitan is available
Keep your replies within 30 words.
This prompt ensures the AI only recommends cocktails based on actually detected bottles and maintains concise, natural interactions.
Primary Functions:
- Real-time bottle detection using YOLO11
- HSV-based color classification (red, green, blue)
- Multi-bottle tracking and validation
- Integration with order requirements
Detection Capabilities:
- Multiple bottle types (classes: 39, 41, 40, 45, 75)
- Robust color classification with HSV thresholding
- Real-time video processing with OpenCV
- Confidence-based filtering
Color Ranges (HSV):
- Red: (0-10, 170-180) hue with 50+ saturation
- Green: (30-90) hue with 40+ saturation
- Blue: (80-140) hue with 40+ saturation
Primary Functions:
- Precise xArm robotic arm control
- Multi-step bottle manipulation sequences
- Safety management and error handling
- Coordinated pick-pour-return operations
Workspace Configuration:
BOTTLES = {
"green": {"pos": (418.9, 30.8, 153), "rpy": (90.7, -68.2, 88.2)},
"red": {"pos": (380, -140.2, 153), "rpy": (90.7, -68.2, 88.2)},
"blue": {"pos": (225.5, -140.2, 153), "rpy": (90.7, -68.2, 88.2)}
}
CUP_POS = (240.4, 19.1, 153)Movement Sequence:
- Approach bottle at safe height (220mm)
- Pick bottle with gripper engagement
- Transport to cup position with tilt control
- Pour with precise angle adjustment
- Return bottle to original position
- Operating System: Windows/Linux with ROS2 Humble
- Python: 3.8+
- Hardware:
- USB Camera for bottle detection
- Microphone for voice input
- Speakers for audio output
- xArm robotic arm (IP: 192.168.1.152)
# Clone the repository
git clone https://github.com/Vikranth3140/Robo-Bartender.git
cd Robo-Bartender
# Install dependencies
pip install -r requirements.txt
# Source ROS2 environment
source /opt/ros/humble/setup.bashCreate environment variables or update the configuration in conversational_tts_node.py:
ELEVEN_API_KEY = "your_elevenlabs_api_key"
GROQ_API_KEY = "your_groq_api_key"
TTS_VOICE_ID = "your_elevenlabs_voice_id"Camera Setup:
- Connect USB camera as device 0
- Ensure proper lighting for color detection
- Test bottle visibility in camera frame
Robot Arm Setup:
- Connect xArm to network (IP: 192.168.1.152)
- Calibrate workspace positions
- Verify safety boundaries
# Setup ROS2 workspace and package
cd ~/ros2_ws
ros2 pkg create --build-type ament_python robo-bartender
colcon build --symlink-install
source install/setup.bash
# Terminal 1: Launch Conversational Node
ros2 run robo-bartender conversational_tts_node
# Terminal 2: Launch Computer Vision Node
ros2 run robo-bartender cv_bottle_detector_node
# Terminal 3: Launch Robotic Arm Node
ros2 run robo-bartender xarm_node- Vision System: Verify bottle detection in OpenCV window
- Audio System: Test microphone input and speaker output
- Robot Arm: Confirm connection and home position
- Integration: Place bottles and attempt voice order
| Topic | Publisher | Subscriber | Message Type | Purpose |
|---|---|---|---|---|
/bottle_detection_result |
CV Node | TTS Node | String | Detected bottle colors |
/bottle_detection_stream |
CV Node | TTS Node | String | Real-time detection stream |
/requested_bottle_color |
TTS Node | CV/Arm Nodes | String | Required colors for cocktail |
/order_status |
Arm Node | TTS Node | String | Order completion status |
Bottle Detection:
"visible: red blue" # Multiple bottles detected
"detected: red blue" # Order-specific detection
Order Status:
"done" # Order completed successfully
- "I'd like a martini please" → Requests red and blue bottles
- "Can I get a mojito?" → Requests green and blue bottles
- "What drinks can you make?" → Lists available cocktails based on visible bottles
- "Make me a blue lagoon" → Requests blue bottle only
The AI bartender provides contextual responses based on:
- Currently visible bottles detected by camera
- Available cocktail recipes
- Order processing status
- Error conditions or missing ingredients
STT_SAMPLE_RATE = 16000 # Speech recognition sample rate
TTS_OUTPUT_SR = 22050 # Text-to-speech output rate
STT_CHUNK_SEC = 0.5 # Audio processing chunk size
SILENCE_DURATION_SEC = 2.0 # Silence detection thresholdconfidence_threshold = 0.1 # YOLO detection confidence
color_threshold = 0.02 # HSV color matching thresholdZ_APPROACH = 220 # Safe approach height (mm)
Z_PICK = 150 # Bottle picking height (mm)
Z_POUR = 180 # Pouring height (mm)
speed = 50 # Movement speed (mm/s)Audio Problems:
- Check microphone permissions and device selection
- Verify ElevenLabs API key and quota
- Test audio devices with
sounddevice
Vision Issues:
- Ensure adequate lighting for color detection
- Calibrate HSV color ranges for your bottles
- Check camera device index (usually 0)
Robot Communication:
- Verify xArm IP address and network connectivity
- Check arm initialization and safety status
- Calibrate workspace coordinates
ROS2 Integration:
- Confirm all nodes are running with
ros2 node list - Monitor topics with
ros2 topic echo <topic_name> - Check for error messages in node logs
This project is developed for research and educational purposes under MIT License.