CSV Query Bot

A Discord bot that allows users to upload CSV files and query them using natural language. Built with FastAPI, DuckDB, Celery, and discord.py.

Features

Upload CSV files through Discord slash commands
Query data using natural language questions
Secure multi-tenant data isolation
Asynchronous file processing
Versioning support for datasets
Tabular results displayed in Discord

Architecture

FastAPI: Main backend API handling file uploads and queries
DuckDB: Per-user database for efficient CSV querying
Celery: Async task processing for file ingestion
PostgreSQL: Metadata storage (users, datasets, versions)
Redis: Task queue for Celery
discord.py: Discord bot interface using slash commands

Prerequisites

Docker and Docker Compose
A Discord bot token
Python 3.10+
pip (Python package installer)

Setup

Clone the repository:

git clone <repository-url>
cd wobby-new

Create a virtual environment and install dependencies:

python3 -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
pip install -r requirements.txt

Create a .env file:

DISCORD_TOKEN=your_discord_bot_token_here
OPENAI_API_KEY=your_openai_api_key
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
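
Before starting the services, it can help to confirm that none of these values is missing. The following is a minimal sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and it only inspects the process environment (or a mapping you pass in), so it assumes the .env values have already been loaded by Docker Compose or your shell.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "DISCORD_TOKEN",
    "OPENAI_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` at startup and aborting when the returned list is non-empty gives a clearer error than a failed API call later on.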

Build and start the services:

docker compose up --build

Deployment

The application is configured for deployment on Railway with a GitHub Actions CI/CD pipeline.

Prerequisites for Deployment

Create a Railway account
Install the Railway CLI:

npm i -g @railway/cli

Login to Railway:

railway login

Set up GitHub repository secrets:

RAILWAY_TOKEN: Your Railway API token (available from the Railway dashboard)
RAILWAY_SERVICE_NAME: The name of your Railway service

Deployment Process

Push your code to GitHub:

git add .
git commit -m "Your commit message"
git push origin main

The GitHub Actions workflow will automatically:

- Run tests
- Deploy to Railway if tests pass
- Set up all required services (API, Celery, Discord bot, PostgreSQL, Redis)

Configure environment variables in Railway:

- Go to your project settings in Railway
- Add the following variables:
  - DISCORD_TOKEN
  - OPENAI_API_KEY
  - LANGFUSE_PUBLIC_KEY
  - LANGFUSE_SECRET_KEY
  - LANGFUSE_HOST

Monitor deployment:

- Check GitHub Actions tab for deployment status
- View logs in Railway dashboard

Manual Deployment

You can also deploy manually using the Railway CLI:

railway up

Discord Commands

Upload a CSV

/dataset upload [dataset_id] <attach CSV file>

dataset_id is optional; if not provided, one will be generated
Attach your CSV file to the command

Query Data

/dataset query <dataset_id> <question>

dataset_id: The ID of your dataset
question: Your natural language question about the data
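
Query results are shown as a table in Discord. One way to render rows as a monospace table that fits in a single message is sketched below. The function `format_table` and its truncation strategy are hypothetical, not the bot's actual code; the 2000-character figure is Discord's limit for regular message content, and each row is assumed to have as many values as there are columns.

```python
from typing import Sequence

DISCORD_MESSAGE_LIMIT = 2000  # character limit for regular message content
FENCE = "`" * 3  # Discord renders text between triple backticks in monospace


def format_table(columns: Sequence[str], rows: Sequence[Sequence[object]],
                 limit: int = DISCORD_MESSAGE_LIMIT) -> str:
    """Render rows as a fixed-width table inside a Discord code block."""
    cells = [[str(value) for value in row] for row in rows]
    widths = [len(name) for name in columns]
    for row in cells:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(value))

    def render(values):
        return " | ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    header = [render(columns), "-+-".join("-" * w for w in widths)]
    body = [render(row) for row in cells]

    # Drop trailing rows until the message fits, and report how many were cut.
    shown = len(body)
    while True:
        lines = header + body[:shown]
        hidden = len(body) - shown
        if hidden:
            lines.append(f"... {hidden} more row(s)")
        message = "\n".join([FENCE, *lines, FENCE])
        if len(message) <= limit or shown == 0:
            return message
        shown -= 1
```

For large results it is usually cheaper to cap the row count in the SQL query itself and use this truncation only as a safety net.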

Development

Set up the development environment:

# Make the setup script executable
chmod +x dev-setup.sh
# Run the setup script
./dev-setup.sh

Run services locally:

# Terminal 1: FastAPI
source venv/bin/activate
uvicorn app.main:app --reload

# Terminal 2: Celery Worker
source venv/bin/activate
celery -A app.celery_app worker --loglevel=info

# Terminal 3: Discord Bot
source venv/bin/activate
python discord_bot/bot.py

Project Structure

.
├── app/
│   ├── routers/         # FastAPI route handlers
│   ├── schemas/         # Pydantic models
│   ├── db/              # Database connections
│   ├── tasks/           # Celery tasks
│   ├── main.py          # FastAPI entry point
│   └── celery_app.py    # Celery configuration
├── discord_bot/         # Discord bot code
├── data/                # Mounted volume for user data
└── tests/               # Test files

Data Storage

CSV files are stored in /data/<user_id>/
Each user gets their own DuckDB file at /data/<user_id>/db.duckdb
Metadata (versions, query logs) is stored in PostgreSQL
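
The layout above can be expressed as two small path helpers. This is a sketch only: the function names are hypothetical, and it assumes that `<user_id>` is the numeric Discord user ID, which the README does not state explicitly.

```python
from pathlib import Path

DATA_ROOT = Path("/data")  # mount point named in this README


def user_dir(user_id: int, root: Path = DATA_ROOT) -> Path:
    """Directory holding one user's uploaded CSV files."""
    # Assumption: user IDs are integers, so int() rejects path-like input
    # such as "../other" before it can reach the filesystem.
    return root / str(int(user_id))


def user_db_path(user_id: int, root: Path = DATA_ROOT) -> Path:
    """Path of the user's DuckDB file inside their directory."""
    return user_dir(user_id, root) / "db.duckdb"
```

Passing `root` explicitly makes it easy to point the helpers at a temporary directory in tests.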

Security

Each user's data is isolated in separate directories and DuckDB files
File access is controlled through user authentication
All Discord interactions are ephemeral (private to the user)
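
Directory-level isolation only holds if user-supplied file names cannot escape the user's directory. A minimal sketch of such a check follows; the function `resolve_inside` is hypothetical and is not taken from the repository.

```python
from pathlib import Path


def resolve_inside(base: Path, relative_name: str) -> Path:
    """Resolve relative_name under base, refusing paths that escape base."""
    base = base.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below sees the real target location.
    candidate = (base / relative_name).resolve()
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"{relative_name!r} is outside {base}")
    return candidate
```

Running every uploaded file name through a check like this rejects names such as `../other_user/data.csv`, as well as absolute paths.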

Testing

Run the test suite:

source venv/bin/activate
pytest

Limitations

Currently supports CSV files up to Discord's file size limit
Basic text-to-SQL conversion (can be enhanced with LLMs)
Single-node deployment (can be scaled with modifications)

Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

License

MIT License - See LICENSE file for details

