A Discord bot that allows users to upload CSV files and query them using natural language. Built with FastAPI, DuckDB, Celery, and discord.py.
- Upload CSV files through Discord slash commands
- Query data using natural language questions
- Secure multi-tenant data isolation
- Asynchronous file processing
- Versioning support for datasets
- Tabular results displayed in Discord
- FastAPI: Main backend API handling file uploads and queries
- DuckDB: Per-user database for efficient CSV querying
- Celery: Async task processing for file ingestion
- PostgreSQL: Metadata storage (users, datasets, versions)
- Redis: Task queue for Celery
- discord.py: Discord bot interface using slash commands
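A minimal Docker Compose sketch of how these services could fit together (the service names, image tags, and volume paths here are assumptions for illustration, not the project's actual compose file; the commands match those used in the development section below):

```yaml
# docker-compose.yml (illustrative sketch, not the project's actual file)
services:
  api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    env_file: .env
    volumes:
      - ./data:/data
    depends_on: [postgres, redis]
  worker:
    build: .
    command: celery -A app.celery_app worker --loglevel=info
    env_file: .env
    volumes:
      - ./data:/data
    depends_on: [postgres, redis]
  bot:
    build: .
    command: python discord_bot/bot.py
    env_file: .env
    depends_on: [api]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
  redis:
    image: redis:7
```

Note that `api` and `worker` share the `./data` volume so the Celery worker can write DuckDB files that the API later queries.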
- Docker and Docker Compose
- A Discord bot token
- Python 3.10+
- pip (Python package installer)
- Clone the repository:

```bash
git clone <repository-url>
cd wobby-new
```

- Create a virtual environment and install dependencies:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
pip install -r requirements.txt
```

- Create a `.env` file:

```
DISCORD_TOKEN=your_discord_bot_token_here
OPENAI_API_KEY=your_openai_api_key
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
```

- Build and start the services:

```bash
docker compose up --build
```

The application is configured for deployment on Railway with a GitHub Actions CI/CD pipeline.
- Create a Railway account
- Install the Railway CLI:

```bash
npm i -g @railway/cli
```

- Login to Railway:

```bash
railway login
```

- Set up GitHub repository secrets:
  - `RAILWAY_TOKEN`: Your Railway API token (get it from the Railway dashboard)
  - `RAILWAY_SERVICE_NAME`: The name of your Railway service
- Push your code to GitHub:

```bash
git add .
git commit -m "Your commit message"
git push origin main
```

- The GitHub Actions workflow will automatically:
  - Run tests
  - Deploy to Railway if the tests pass
  - Set up all required services (API, Celery, Discord bot, PostgreSQL, Redis)
- Configure environment variables in Railway:
  - Go to your project settings in Railway
  - Add the following variables: `DISCORD_TOKEN`, `OPENAI_API_KEY`, `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_HOST`
- Monitor the deployment:
  - Check the GitHub Actions tab for deployment status
  - View logs in the Railway dashboard
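The CI/CD pipeline described above might look roughly like this (an illustrative sketch; the workflow filename, job names, and exact Railway CLI invocation are assumptions, not the repository's actual workflow):

```yaml
# .github/workflows/deploy.yml (illustrative sketch)
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: pytest
  deploy:
    needs: test  # deploy only runs when the test job passes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm i -g @railway/cli
      - run: railway up --service "$RAILWAY_SERVICE_NAME"
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
          RAILWAY_SERVICE_NAME: ${{ secrets.RAILWAY_SERVICE_NAME }}
```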
You can also deploy manually using the Railway CLI:

```bash
railway up
```

```
/dataset upload [dataset_id] <attach CSV file>
```

- `dataset_id` is optional; if not provided, one will be generated
- Attach your CSV file to the command

```
/dataset query <dataset_id> <question>
```

- `dataset_id`: The ID of your dataset
- `question`: Your natural language question about the data
- Set up the development environment:

```bash
# Make the setup script executable
chmod +x dev-setup.sh
# Run the setup script
./dev-setup.sh
```

- Run services locally:

```bash
# Terminal 1: FastAPI
source venv/bin/activate
uvicorn app.main:app --reload

# Terminal 2: Celery worker
source venv/bin/activate
celery -A app.celery_app worker --loglevel=info

# Terminal 3: Discord bot
source venv/bin/activate
python discord_bot/bot.py
```

```
.
├── app/
│   ├── routers/        # FastAPI route handlers
│   ├── schemas/        # Pydantic models
│   ├── db/             # Database connections
│   ├── tasks/          # Celery tasks
│   ├── main.py         # FastAPI entry point
│   └── celery_app.py   # Celery configuration
├── discord_bot/        # Discord bot code
├── data/               # Mounted volume for user data
└── tests/              # Test files
```
- CSV files are stored in `/data/<user_id>/`
- Each user gets their own DuckDB file at `/data/<user_id>/db.duckdb`
- Metadata (versions, query logs) is stored in PostgreSQL
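The per-user layout above can be sketched with a small path helper (illustrative only; the project's actual code for this is not shown here):

```python
from pathlib import Path

DATA_ROOT = Path("/data")  # mounted volume, per the project layout

def user_paths(user_id: str) -> tuple[Path, Path]:
    """Return (user directory, DuckDB file path) for a Discord user ID."""
    # Discord user IDs are numeric snowflakes; rejecting anything else
    # also prevents path traversal via a crafted ID.
    if not user_id.isdigit():
        raise ValueError(f"unexpected user_id: {user_id!r}")
    user_dir = DATA_ROOT / user_id
    return user_dir, user_dir / "db.duckdb"
```

Validating the ID before joining it onto `DATA_ROOT` is what keeps one tenant from reaching another tenant's directory.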
- Each user's data is isolated in separate directories and DuckDB files
- File access is controlled through user authentication
- All Discord interactions are ephemeral (private to the user)
Run the test suite:

```bash
source venv/bin/activate
pytest
```

- Currently supports CSV files up to Discord's file size limit
- Basic text-to-SQL conversion (can be enhanced with LLMs)
- Single-node deployment (can be scaled with modifications)
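To illustrate what "basic text-to-SQL conversion" can mean, here is a deliberately naive keyword-based mapper (purely illustrative, not the project's implementation; the column-naming convention below is an assumption):

```python
def naive_text_to_sql(question: str, table: str = "dataset") -> str:
    """Map a few common question shapes to SQL; fall back to SELECT *."""
    q = question.lower()
    if "how many" in q or "count" in q:
        return f"SELECT COUNT(*) FROM {table}"
    if "average" in q or "mean" in q:
        # Hypothetical convention: the question's last word names the column.
        column = q.rstrip("?").split()[-1]
        return f"SELECT AVG({column}) FROM {table}"
    # Unrecognized question: return a bounded preview of the data.
    return f"SELECT * FROM {table} LIMIT 100"
```

An LLM-backed converter would replace this pattern matching with a prompt containing the table schema, which is presumably what "can be enhanced with LLMs" refers to.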
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License - See LICENSE file for details