Skip to content

meghaditya13/olist_agentic_chatbot

Repository files navigation

Olist Agentic Data Chatbot

An intelligent chatbot that answers computational queries across Olist's 9 relational CSV datasets using natural language. Built with ReAct agent architecture, DuckDB, and Gemini AI.

🚀 Features

  • Natural Language Queries: Ask questions in plain English about the Olist e-commerce data
  • Multi-table Analytics: Performs joins, filters, aggregations across all 9 CSV files
  • Interactive Charts: Generates visualizations (line, bar, histogram, pie, scatter) on request
  • Session Memory: Remembers context within the same chat session for follow-up queries
  • ReAct Agent Loop: Intelligent planning and execution with validation (max 4 steps)
  • Security Guardrails: SQL injection prevention and query validation
  • Modern Web UI: Beautiful, responsive chat interface with TailwindCSS

🏗️ Architecture

  • Orchestrator: ReAct-style agent loop (interpret → plan → validate → execute → verify → respond)
  • LLM Brain: Gemini Flash 2.5 for intent parsing and response generation
  • Tools: Schema, Query, Chart, and Memory tools with function calling
  • Data Layer: DuckDB for efficient relational operations on CSV files
  • Interface: FastAPI web application with real-time chat UI
  • Guardrails: Security validation, performance limits, and error handling

📋 Prerequisites

  • Python 3.8 or higher
  • Gemini API key

🛠️ Quick Start

1. Clone and Setup

# Install dependencies
pip install -r requirements.txt

2. Get Gemini API Key

  1. Visit Google AI Studio
  2. Create a free API key
  3. Create a .env file and add
set GEMINI_API_KEY=your-api-key-here

4. Start the Application

# Automated startup with checks
python start.py

# Or manual startup
python main.py

5. Open in Browser

Navigate to http://localhost:8000

📊 Dataset Overview

The Olist dataset contains 100k orders from 2016-2018 across Brazil with:

Tables & Relationships

orders ⟷ order_items ⟷ products ⟷ category_translation
   ↓         ↓              
customers   sellers
   ↓         ↓
geolocation ←→ geolocation

orders ⟷ payments
orders ⟷ reviews

Key Metrics Available

  • Revenue Analysis: Total sales, average order value, revenue by state/category
  • Customer Insights: Geographic distribution, purchase patterns, loyalty
  • Product Performance: Top categories, pricing analysis, inventory trends
  • Seller Analytics: Performance metrics, geographic coverage, ratings
  • Operational KPIs: Delivery times, payment methods, review scores

💬 Example Queries

Basic Analytics

  • "What's the total revenue for 2017?"
  • "How many customers are from São Paulo?"
  • "Show me the average delivery time by state"

Advanced Queries

  • "Which product categories have the highest profit margins?"
  • "Create a chart showing monthly revenue trends"
  • "Compare average review scores between different payment methods"
  • "Find the top 10 sellers by total sales volume"

Multi-table Analysis

  • "Show revenue by customer state with delivery performance"
  • "Analyze the relationship between product price and review scores"
  • "Which geographic regions have the fastest delivery times?"

🔒 Security Features

  • SQL Injection Prevention: Query validation and sanitization
  • Allowed Operations: Only SELECT and WITH statements permitted
  • Resource Limits: Row limits, query complexity scoring
  • Input Sanitization: Dangerous character removal
  • Rate Limiting: Configurable query limits per session

🧪 Logging

set the TEST_FLAG in .env file

set TEST_FLAG=true

⚙️ Configuration

Key settings in config.py:

  • MAX_QUERY_ROWS = 1000: Maximum rows returned
  • MAX_CHART_POINTS = 500: Chart data point limit
  • MAX_SESSION_HISTORY = 10: Queries remembered per session
  • SUPPORTED_CHART_TYPES: Available visualization types

Performance Tips

  • Use specific filters to reduce result size
  • Limit date ranges for time-based queries
  • Use LIMIT clause for large datasets
  • Group by categorical columns for aggregations

🙏 Acknowledgments

  • Olist: For providing the Brazilian e-commerce dataset
  • Google: For Gemini AI API
  • DuckDB: For fast analytical processing
  • Plotly: For interactive visualizations

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors