yashkk07/LLM-driven-data-analysis-executor

LLM-Powered Data Analyst

Upload a CSV and describe the analysis you want. The LLM generates Python code (pandas + matplotlib), and the app validates and executes it to answer your question.

Features

  • Upload any CSV at runtime (no hardcoded data.csv).
  • LLM-generated analysis code with safety checks before execution.
  • Preview data, see generated code, view resulting table and chart.

Prerequisites

  • Python 3.10+
  • A Groq API key stored in .env as GROQ_API_KEY=<your_key>.

Setup

  1. Create and activate a virtual environment:
    python -m venv venv
    . venv/Scripts/activate    # Windows (Git Bash); in cmd use venv\Scripts\activate
    source venv/bin/activate   # macOS/Linux
  2. Install dependencies:
    pip install -r requirements.txt
  3. Create .env with your Groq key:
    echo GROQ_API_KEY=your_key_here > .env
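
For reference, the key written in step 3 could be read back roughly as follows. This is a minimal standard-library sketch, not the repository's code: the helper names load_env_file and get_groq_key are hypothetical, and the project may well use a library such as python-dotenv instead.

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip whitespace and optional surrounding quotes from the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_key(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GROQ_API_KEY")
    if not key and os.path.exists(env_path):
        key = load_env_file(env_path).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env")
    return key
```

Stripping whitespace means a trailing space left by the echo command does not end up in the key.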

Run

streamlit run app.py

Then upload a CSV and enter your analysis prompt (e.g., "Show total sales per region as a bar chart").

How it Works

  • app.py: Streamlit UI for upload, preview, LLM invocation, and result display.
  • prompt.py: Builds the LLM prompt using the uploaded file path and schema.
  • llm.py: Calls Groq to generate analysis code.
  • executor.py: Validates and executes generated code with restricted imports.
  • schema.py: Reads column names from the uploaded CSV to inform the prompt.
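
The schema-to-prompt flow described above can be sketched as follows. This is an illustration under assumptions, not the code in schema.py or prompt.py: the function names read_columns and build_prompt and the prompt wording are hypothetical, and the header is read with the standard csv module so the sketch runs without pandas.

```python
import csv


def read_columns(csv_path):
    """Return the header row of a CSV file as a list of column names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # next() with a default returns [] for an empty file.
        return next(csv.reader(f), [])


def build_prompt(csv_path, columns, question):
    """Combine the file path, its columns, and the user's question into one prompt."""
    return (
        "You are a data analyst. Write Python code using only pandas and matplotlib.\n"
        f"Load the data with pd.read_csv({csv_path!r}).\n"
        f"Available columns: {', '.join(columns)}\n"
        "If a chart is requested, save it to output.png.\n"
        f"Question: {question}\n"
    )
```

Giving the model the real column names reduces the chance that the generated code refers to columns that do not exist in the uploaded file.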

Notes

  • Only pandas and matplotlib are allowed in generated code; writing files is limited to output.png.
  • Uploaded CSV is saved to a temporary path for the duration of the Streamlit session.
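
One way such restrictions could be checked before execution is a static pass over the generated code's syntax tree. The sketch below is an assumption about the approach, not the logic in executor.py: the name find_violations and the list of banned builtins are hypothetical, and the check is deliberately conservative (for example, it rejects savefig calls whose first positional argument is not the literal 'output.png').

```python
import ast

ALLOWED_MODULES = {"pandas", "matplotlib"}
BANNED_NAMES = {"open", "exec", "eval", "compile", "__import__"}
ALLOWED_OUTPUT = "output.png"


def find_violations(source):
    """Return a list of human-readable rule violations found in the source."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Compare only the top-level package, e.g. matplotlib.pyplot -> matplotlib.
                if alias.name.split(".")[0] not in ALLOWED_MODULES:
                    violations.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if node.level > 0 or root not in ALLOWED_MODULES:
                violations.append(f"import from '{node.module}' is not allowed")
        elif isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            violations.append(f"use of '{node.id}' is not allowed")
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "savefig"
        ):
            target = node.args[0] if node.args else None
            if not (isinstance(target, ast.Constant) and target.value == ALLOWED_OUTPUT):
                violations.append("savefig may only write to 'output.png'")
    return violations
```

A static check like this catches obvious problems but is not a sandbox. It does not, for instance, stop pandas methods such as to_csv from writing files, so it would need to be combined with other restrictions at execution time.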

About

A system that converts natural-language queries into executable Python code, runs that code safely on structured datasets, and returns the results as visualizations and dataframes. It implements runtime execution, error recovery, and conditional plotting for automated exploratory data analysis.
