WhatsApp Assistant

A WhatsApp bot that uses AI to help manage conversations. It transcribes voice messages with OpenAI's Whisper, completes conversations with OpenAI's language models, and generates voice responses with ElevenLabs.

Features

🎙️ Voice message transcription using OpenAI's Whisper
💬 AI-powered conversation completion
🔊 Text-to-speech responses using ElevenLabs
🗣️ Voice cloning and management capabilities
🤖 Telegram bot integration for message management
🔄 Audio format conversion between MP3, OGG, and Opus

Prerequisites

Python 3.12+
ffmpeg installed on your system
Valid API keys for:
- OpenAI
- ElevenLabs
- Telegram Bot
- LogFire (optional, for logging)
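Before installing, it can help to confirm the two local prerequisites listed above. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and it only checks the Python version and whether an `ffmpeg` executable is on the PATH. It does not validate any API key.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # taken from the Prerequisites list


def missing_prerequisites(version=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `version` and `which` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```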

Installation

Clone the repository:

git clone https://github.com/BonifacioCalindoro/whatsapp-AI-assistant.git
cd whatsapp-AI-assistant

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

Install the required packages:

pip install -r requirements.txt

Install ffmpeg:

Ubuntu/Debian: sudo apt-get install ffmpeg libmp3lame0
macOS: brew install ffmpeg
Windows: Download from the official ffmpeg website

Copy the example environment file and fill in your credentials:

cp .env.example .env

Configuration

Edit the .env file with your API keys and settings:

LOGIFRE_TOKEN=your_logfire_token
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
TELEGRAM_CHAT_ID=your_telegram_chat_id (create a group, add the bot, and get the chat ID with the /chatid command)
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
OPENAI_MODEL=your_openai_model (the message format is adapted to o1-preview; to use a non-o model, change the first message role to "system")
MY_PHONE_NUMBER=your_phone_number (include the country code, without the +)
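The settings above can be checked at startup, and the note on OPENAI_MODEL can be made concrete. The following is a sketch under stated assumptions, not the code in api.py: the helper names (`missing_vars`, `first_message_role`, `build_messages`) are hypothetical, an "o model" is assumed to be any model whose name starts with "o" followed by a digit, and the first message role for such models is assumed to be "user".

```python
import os
import re

# Required settings from the list above; the LogFire token is optional.
REQUIRED_VARS = [
    "TELEGRAM_BOT_TOKEN",
    "TELEGRAM_CHAT_ID",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
    "OPENAI_MODEL",
    "MY_PHONE_NUMBER",
]


def missing_vars(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def first_message_role(model: str) -> str:
    """Pick the role of the instruction message for a given model.

    Assumption: names like "o1-preview" (an "o" followed by a digit)
    get "user"; every other model gets "system".
    """
    return "user" if re.match(r"o\d", model) else "system"


def build_messages(model: str, instructions: str, history: list[dict]) -> list[dict]:
    """Prepend the instruction message to the conversation history."""
    first = {"role": first_message_role(model), "content": instructions}
    return [first, *history]
```

Note that "gpt-4o" does not match the pattern, so it is treated as a non-o model and gets a "system" message.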

Usage

The application consists of three components that must all be running. You can use the "screen" utility to run some of them in the background:

Start the API server:

python api.py

Start the WhatsApp client (it needs an actual display session to run; the "screen" utility is not enough):

python whatsapp.py

Then scan the QR code with WhatsApp on your phone.

Start the Telegram bot:

python bot.py

How it Works

When a WhatsApp message is received, it's processed by the WhatsApp client
Voice messages are automatically transcribed using OpenAI's Whisper API
Messages are forwarded to a Telegram bot for management
Users can choose to:
- Complete the conversation using AI
- Send text responses
- Generate and send voice responses using ElevenLabs
- Clone voices from audio samples
- Manage voice settings and profiles
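The first three steps of this flow can be sketched as follows. This is an illustration only, not the logic in whatsapp.py or api.py: `IncomingMessage`, `handle_incoming`, and the injected `transcribe` and `forward` callables are hypothetical stand-ins for the Whisper call and the Telegram forwarding step.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    sender: str
    text: Optional[str] = None
    audio_path: Optional[str] = None  # set only for voice messages


def handle_incoming(
    message: IncomingMessage,
    transcribe: Callable[[str], str],
    forward: Callable[[str], None],
) -> str:
    """Transcribe voice messages, then forward a one-line summary."""
    if message.audio_path is not None:
        # Voice message: turn the audio into text first.
        text = transcribe(message.audio_path)
        label = "voice"
    else:
        text = message.text or ""
        label = "text"
    summary = f"[{label}] {message.sender}: {text}"
    forward(summary)  # e.g. send to the Telegram management chat
    return summary
```

Passing the transcription and forwarding steps in as callables keeps the sketch free of network calls, so the routing can be tried with simple stubs.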

Voice Management Features

The assistant includes comprehensive voice management capabilities:

Clone voices from audio samples
Edit voice settings and profiles
List available voices
Delete voices
Customize voice parameters

Project Structure

api.py: FastAPI server handling message processing, AI completions, and voice management
whatsapp.py: WhatsApp client integration with message handling
bot.py: Telegram bot for message management and voice control
utils.py: Utility functions for audio processing, transcription, and voice synthesis

Key functions:

Audio conversion between formats (MP3, OGG, Opus)
Voice transcription with OpenAI Whisper
Text-to-speech with ElevenLabs
Voice cloning and management
Conversation management and completion
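As an illustration of the audio conversion step, the sketch below builds an ffmpeg command line based on the output file extension. It is an assumption about how such a conversion could be done, not the code in utils.py: the function names, the codec mapping, and the default bitrate are all hypothetical. The ffmpeg options used (`-y`, `-i`, `-c:a`, `-b:a`) and the `libopus` and `libmp3lame` encoders are standard, provided the local ffmpeg build includes them.

```python
import subprocess
from pathlib import Path

# Audio encoder chosen by output extension (assumed mapping).
CODECS = {
    ".ogg": "libopus",
    ".opus": "libopus",
    ".mp3": "libmp3lame",
}


def build_ffmpeg_command(src: str, dst: str, bitrate: str = "32k") -> list[str]:
    """Return the ffmpeg argument list that converts src into dst."""
    suffix = Path(dst).suffix.lower()
    if suffix not in CODECS:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file
        "-c:a", CODECS[suffix],   # audio encoder
        "-b:a", bitrate,          # audio bitrate
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Keeping command construction separate from execution makes the mapping easy to inspect without having ffmpeg installed.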

Telegram Bot Commands

The Telegram bot provides several commands for managing the assistant:

/start - Dummy command
/clone - Clone a voice from samples
/voices - List available voices
/chatid - Get your Telegram chat ID
/setvoiceid - Set the voice ID to use for audio responses
/editvoicesettings - Edit the settings for the voice model
/deletevoice - Delete a voice
Voice editing and management commands
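To show how command text of this kind can be interpreted, here is a small parsing sketch. It is not how bot.py handles commands, which would normally be done by the Telegram library in use. The function name `parse_command` is hypothetical. In group chats Telegram may append the bot's username to a command (for example /chatid@my_bot), so the sketch strips that suffix.

```python
from typing import Optional

# Commands listed in this section.
KNOWN_COMMANDS = {
    "start",
    "clone",
    "voices",
    "chatid",
    "setvoiceid",
    "editvoicesettings",
    "deletevoice",
}


def parse_command(text: str) -> Optional[tuple[str, list[str]]]:
    """Return (command, args) for a known command, otherwise None."""
    if not text.startswith("/"):
        return None
    head, _, rest = text.partition(" ")
    # Drop the leading "/" and any "@botname" suffix.
    name = head[1:].split("@", 1)[0].lower()
    if name not in KNOWN_COMMANDS:
        return None
    return name, rest.split()
```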

Limitations

The WhatsApp implementation depends on the WhatsApp Web client, so it may stop working if WhatsApp changes that client in a future version.
The ElevenLabs and OpenAI APIs are not free, so take their costs into account.
Long conversations have not been tested yet, so it is unknown how well the assistant handles them.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
