This project demonstrates how to build a lightweight AI-assisted data analysis pipeline using Python and the OpenAI API.
Instead of sending the full dataset to the AI model, the system generates a structured metadata summary (schema, statistics, missing values, etc.) and sends only that summary for analysis. This approach:
- Reduces token usage
- Improves performance
- Minimizes unnecessary data exposure
- Maintains scalability
The project simulates an AI-powered data analyst that reviews dataset structure and produces business insights.
Source: Kaggle
Dataset: Synthetic Mobile Sales 2025
The dataset contains simulated mobile device sales data, including:
- Product information
- Sales quantities
- Revenue
- Transaction dates
- Regions
- Additional transactional attributes
The workflow follows this structure:
- Install dependencies
- Download dataset from Kaggle
- Load dataset into Pandas
- Generate automated metadata summary
- Send structured summary to OpenAI
- Receive AI-generated business insights
Only summarized metadata is transmitted to the model.
Install the required dependencies (the extras specifier is quoted so shells like zsh do not expand the brackets):

```shell
pip install "kagglehub[pandas-datasets]" openai pandas numpy
```

Import the required libraries:
```python
import os
import json
import pandas as pd
import numpy as np
import kagglehub
from getpass import getpass
from openai import OpenAI
```

Securely load your OpenAI API key:
```python
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
client = OpenAI()
```

Important: never hard-code API keys into a public repository.
Programmatically download the dataset:

```python
path = kagglehub.dataset_download("syedaeman2212/mobile-sales-data")
```

Load the CSV file into a Pandas DataFrame:

```python
df = pd.read_csv(os.path.join(path, "synthetic_mobile_sales_2025.csv"))
```

Preview the dataset:

```python
df.head()
df.info()
```

Instead of sending raw records, build a structured metadata summary:
```python
summary = {
    "shape": df.shape,
    "columns": df.columns.tolist(),
    "dtypes": df.dtypes.astype(str).to_dict(),
    "missing": df.isnull().sum().to_dict(),
    "numeric_summary": df.describe().to_dict(),
}
```

Convert the summary to JSON for API transmission. The `default=str` argument is needed because NumPy scalars (such as the `int64` missing-value counts) are not natively JSON-serializable:

```python
summary_json = json.dumps(summary, indent=2, default=str)
```

This ensures:
- Reduced token usage
- Faster API responses
- Improved security
- Scalability
Construct a prompt that instructs the AI to analyze the dataset summary:

```python
prompt = f"""
You are a senior data analyst. Based on the following dataset metadata, provide:
1. Key insights
2. Observed trends
3. Data quality concerns
4. Potential business recommendations
Dataset Summary:
{summary_json}
"""
```

Send the request to the API:
```python
response = client.responses.create(
    model="gpt-4.1-mini",
    input=prompt,
)
analysis_output = response.output_text
print(analysis_output)
```

The AI returns:
- Business insights
- Identified patterns
- Anomaly detection suggestions
- Data quality observations
- Strategic recommendations
This simulates an AI-powered analytical review.
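The prompt construction and API call can be combined into a reusable helper. This is a minimal sketch, not part of the original code: the names `build_prompt` and `analyze_metadata` are hypothetical, and the client is injected as a parameter so the functions work with `openai.OpenAI()` or any stub exposing the same `responses.create(...)` interface:

```python
PROMPT_TEMPLATE = """\
You are a senior data analyst. Based on the following dataset metadata, provide:
1. Key insights
2. Observed trends
3. Data quality concerns
4. Potential business recommendations
Dataset Summary:
{summary_json}
"""


def build_prompt(summary_json: str) -> str:
    """Fill the analyst prompt with the serialized metadata summary."""
    return PROMPT_TEMPLATE.format(summary_json=summary_json)


def analyze_metadata(client, summary_json: str, model: str = "gpt-4.1-mini") -> str:
    """Send the metadata summary to the model and return its text output."""
    response = client.responses.create(model=model, input=build_prompt(summary_json))
    return response.output_text
```

Injecting the client also makes the pipeline unit-testable without real API calls, which matters once this runs in production.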
Sending only metadata rather than full datasets:
- Reduces API token costs
- Prevents unnecessary data sharing
- Improves performance
- Maintains governance standards
- Enables production scalability
This architecture is well-suited for enterprise AI analytics pipelines.
The project can be expanded to include:
- Automated visualizations (Matplotlib, Seaborn, Plotly)
- Correlation matrix analysis
- Outlier detection
- Feature engineering
- Machine learning model training
- Automated report generation (PDF or HTML)
- Interactive dashboard (Streamlit)
- User-uploaded dataset support
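One of the listed extensions, outlier detection, could be sketched with Tukey's IQR fences. The helper name and the sample revenue values are illustrative, not from the project:

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical revenue column with one obvious outlier.
revenue = pd.Series([100, 120, 110, 105, 115, 5000])
print(revenue[iqr_outliers(revenue)].tolist())  # → [5000]
```

Flagged rows could then be reported in the metadata summary (e.g. an outlier count per numeric column) so the model sees data-quality signals without seeing the raw records.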
Key concepts demonstrated by the project:
- Data ingestion and preprocessing
- Automated exploratory data analysis
- Structured metadata engineering
- API integration with OpenAI
- Efficient token management
- AI-assisted analytics pipeline design
- Reproducible Python workflows
For real-world deployment:
- Add environment variable management
- Implement error handling
- Add structured logging
- Implement API rate limit handling
- Containerize with Docker
- Deploy via a cloud platform (AWS, Azure, GCP)
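Rate-limit handling is typically implemented as retries with exponential backoff. A minimal, dependency-free sketch follows; in production the `retry_on` tuple would be narrowed to transient errors such as `openai.RateLimitError` rather than the bare `Exception` used here:

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on retry_on with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base, 2x, 4x, ... with proportional jitter
            # so concurrent clients do not retry in lockstep.
            time.sleep(base_delay * 2 ** attempt * (1 + random.random()))


# Usage (hypothetical):
# with_retries(lambda: client.responses.create(model="gpt-4.1-mini", input=prompt))
```

Keeping the retry policy in one wrapper also gives structured logging a single place to record attempt counts and delays.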
This project demonstrates how to build an AI-assisted analytics system capable of reviewing datasets and generating actionable insights without exposing raw data.
It is suitable for:
- AI-driven reporting systems
- Internal analytics assistants
- Enterprise data governance workflows
- Portfolio demonstration of applied AI integration