Interact with your CSV data just like you're chatting with a data analyst!
This project lets you upload any CSV file and ask questions about it in plain English. Behind the scenes, it uses Gemini (Google's LLM) with LangChain agents to understand your query, reason through it, run Python code, and give you the result instantly!
At its core, a Large Language Model like Gemini processes a prompt, which is a mix of instructions, examples, and the actual input, and generates a meaningful response. As shown below:

- **Prompt = Instruction + Test Input:** The prompt can contain a task description (e.g., "Summarize the data"), a few examples, and your actual input, such as a question about the CSV.
- **LLM Reasoning:** The model uses its pre-trained knowledge and the provided prompt to predict what comes next, effectively "thinking" about how to answer.
- **Generated Output:** The output might be plain text (a summary or explanation) or executable code (as in this project). The code is then executed to retrieve the final answer.

This mechanism allows the LLM to understand your intent, dynamically generate logic (like Python code), and provide relevant, accurate results from your data.
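The prompt assembly described above can be sketched in plain Python. The strings and variable names here are illustrative assumptions, not the project's actual prompt:

```python
# Sketch of prompt assembly: instruction + examples + the user's input.
# All strings below are made-up placeholders for illustration.
instruction = "You are a data analyst. Answer questions about the CSV below."
examples = "Q: How many rows does the file have?\nA: df.shape[0]"
user_question = "What is the mean of the 'area' column?"

# The final prompt the LLM sees is simply these parts concatenated.
prompt = f"{instruction}\n\n{examples}\n\nQ: {user_question}\nA:"
print(prompt)
```

The trailing `A:` nudges the model to continue with an answer rather than restating the question.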
| Component | Description |
|---|---|
| Google Gemini (gemini-1.5-flash) | The Large Language Model that understands and answers your questions |
| LangChain CSV Agent | Translates your questions into Python code and runs it |
| Pandas (indirect) | Used internally to analyze CSV files |
| Streamlit | Frontend to upload files and interact with the model |
| dotenv | Securely loads your API key from `.env` |
- Upload a CSV file through the UI.
- Ask a natural-language question (e.g., "What is the mean area of the malignant tumors?").
- LangChain wraps the Gemini model with a Python code executor.
- Gemini thinks in steps: it figures out what to calculate, writes code, runs it, and returns the answer.
- The answer is shown on your screen with an optional explanation.
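For the tumor question above, the agent might generate and run pandas code along these lines. The column names and values here are invented for illustration; the real data comes from your uploaded CSV:

```python
import pandas as pd

# A tiny synthetic frame standing in for the uploaded CSV.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M", "B"],  # M = malignant, B = benign
    "area": [1001.0, 520.0, 1203.0, 480.0],
})

# Filter to malignant rows, then average the 'area' column.
mean_area = df.loc[df["diagnosis"] == "M", "area"].mean()
print(mean_area)  # 1102.0
```

The agent's final answer is then built from this computed value rather than from the model's memory, which is what keeps the results grounded in your data.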
It goes through:
- Thought
- Action (Python code)
- Input (code to run)
- Observation (what the code returns)
- Final Answer
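That loop can be sketched with a stubbed "model". This is purely illustrative, with hard-coded responses; LangChain's agent implements the real loop internally:

```python
# Minimal ReAct-style loop with a fake, hard-coded model (illustrative only).
def fake_model(question, observation=None):
    if observation is None:
        # First pass: Thought + Action (code to run).
        return {"thought": "I need the row count.", "action": "len(rows)"}
    # Second pass: turn the Observation into a Final Answer.
    return {"final_answer": f"The file has {observation} rows."}

rows = [{"a": 1}, {"a": 2}, {"a": 3}]  # stand-in for the CSV rows

step = fake_model("How many rows?")
observation = eval(step["action"], {"rows": rows, "len": len})  # Action + Input
answer = fake_model("How many rows?", observation)["final_answer"]
print(answer)  # The file has 3 rows.
```

The real agent repeats this Thought/Action/Observation cycle as many times as needed before committing to a Final Answer.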
Here's an actual screenshot showing the thought process of Gemini through LangChain:
The user sees a simple Streamlit UI where they can upload a CSV and ask questions interactively:
Streamlit auto-refreshes results when the question is submitted. Great for rapid exploration.
Follow these steps to get the app running locally:

1. Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/AsmitaMishra24/Chat_with_CSV_Using_Gemini.git
cd Chat_with_CSV_Using_Gemini
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

2. Create a `.env` file by copying the example:

```bash
cp .env.example .env
```

Then paste your actual Google API Key in `.env`:

```
GOOGLE_API_KEY=your_google_api_key_here
```

Get your key from Google AI Studio.

3. Run the app:

```bash
streamlit run app.py
```

Then go to: http://localhost:8501
```
# .env.example
GOOGLE_API_KEY=your_google_api_key_here
```

- What is the total profit by category?
- Which month had the highest revenue?
- What's the average number of orders per region?
- Show top 5 performing products.
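To make one of these concrete, here is the kind of pandas code the agent would generate for "What is the total profit by category?", shown here on a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical sales data standing in for an uploaded CSV.
df = pd.DataFrame({
    "category": ["Tech", "Office", "Tech", "Office"],
    "profit": [120.0, 30.0, 80.0, 50.0],
})

# Group by category and sum the profit column.
profit_by_category = df.groupby("category")["profit"].sum()
print(profit_by_category.to_dict())  # {'Office': 80.0, 'Tech': 200.0}
```

Each of the other example questions maps to a similarly short pandas expression (`idxmax`, `mean`, `nlargest`, and so on).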
Want to improve the project? Fix bugs? Add features?
- Fork the repo
- Create a new branch (`git checkout -b feature-name`)
- Commit changes (`git commit -am 'Add feature'`)
- Push and open a PR
Made by Asmita Mishra
Have a suggestion or found a bug?
- Feel free to raise an issue in this repository.

Want to connect or collaborate?
- Feel free to reach out via LinkedIn.





