A retail store needs to analyze its daily transactions and track customer behavior across various locations. This includes analyzing purchases and returns across multiple product categories. The goal is to derive business insights that can help improve customer understanding, optimize product offerings, and enhance store operations.
The objective is to perform a comprehensive data analysis by:
- Merging and transforming multiple datasets
- Generating descriptive statistics and visualizations
- Extracting key business metrics and actionable insights
- Customers.csv: Contains customer demographics and city codes
- Product_cat_info.csv: Contains information about product categories and subcategories
- Transactions.csv: Contains detailed transaction data including date, store type, and transaction amount
- Merged Customers, Product_cat_info, and Transactions into a single dataset, Customer_Final
- Used inner join to include only customers with valid transactions
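
The merge step could look roughly like the sketch below. The column names (`customer_Id`, `cust_id`, `prod_cat_code`, `prod_subcat_code`, and so on) and the tiny in-memory frames are assumptions made for illustration; the real CSV headers may differ.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the CSVs.
customers = pd.DataFrame({
    "customer_Id": [1, 2, 3],
    "Gender": ["M", "F", "F"],
    "city_code": [5, 3, 5],
})
prod_cat_info = pd.DataFrame({
    "prod_cat_code": [1, 1, 2],
    "prod_subcat_code": [10, 11, 10],
    "prod_cat": ["Electronics", "Electronics", "Books"],
    "prod_subcat": ["Mobiles", "Cameras", "Fiction"],
})
transactions = pd.DataFrame({
    "transaction_id": [100, 101, 102],
    "cust_id": [1, 1, 2],
    "prod_cat_code": [1, 2, 1],
    "prod_subcat_code": [10, 10, 11],
    "total_amt": [250.0, 40.0, -90.0],
})

# Attach category and subcategory names to each transaction.
tx_with_cat = transactions.merge(
    prod_cat_info, on=["prod_cat_code", "prod_subcat_code"], how="inner"
)

# Inner join: customers without any transaction (customer 3 here) drop out.
customer_final = tx_with_cat.merge(
    customers, left_on="cust_id", right_on="customer_Id", how="inner"
)
print(customer_final)
```

An inner join silently drops unmatched rows on both sides, so comparing row counts before and after the merge is a cheap sanity check.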
- a. Columns Overview: Listed column names and data types
- b. Data Preview: Displayed top 10 and bottom 10 rows
- c. Five-number Summary: Calculated min, Q1, median, Q3, and max for all continuous variables
- d. Frequency Tables: Generated frequency distributions for all categorical variables
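
As a minimal sketch of the summary steps above, the following uses a small made-up frame in place of Customer_Final. The column names `Qty`, `total_amt`, and `Store_type` and the store-type labels are assumptions.

```python
import pandas as pd

# Stand-in for the merged dataset; values are invented for illustration.
df = pd.DataFrame({
    "Qty": [1, 2, 3, 4, 5],
    "total_amt": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Store_type": ["e-Shop", "MBR", "e-Shop", "Flagship store", "e-Shop"],
})

# a. Column names and data types
print(df.dtypes)

# b. Top 10 and bottom 10 rows (fewer if the frame is shorter)
print(df.head(10))
print(df.tail(10))

# c. Five-number summary for the numeric (continuous) columns
continuous = df.select_dtypes(include="number")
five_num = continuous.describe().loc[["min", "25%", "50%", "75%", "max"]]
print(five_num)

# d. Frequency table for every non-numeric (categorical) column
categorical = df.select_dtypes(exclude="number")
freq_tables = {col: categorical[col].value_counts() for col in categorical.columns}
print(freq_tables["Store_type"])
```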
- Histograms for continuous variables
- Time Period: Identified the start and end dates of available transaction data
- Negative Transactions: Counted transactions with negative total amounts
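
One way to obtain the time period and the negative-transaction count is sketched below. It assumes a `tran_date` column stored as day-first strings in a single format and a `total_amt` column; both names and the sample rows are hypothetical.

```python
import pandas as pd

# Invented sample rows; real data may mix several date formats.
tx = pd.DataFrame({
    "tran_date": ["28-02-2014", "15-01-2013", "02-12-2011"],
    "total_amt": [120.0, -45.5, 300.0],
})

# Parse the date strings so min/max compare dates rather than text.
tx["tran_date"] = pd.to_datetime(tx["tran_date"], format="%d-%m-%Y")

start_date = tx["tran_date"].min()
end_date = tx["tran_date"].max()

# A boolean mask sums to the number of True entries.
negative_count = int((tx["total_amt"] < 0).sum())

print(start_date.date(), end_date.date(), negative_count)
```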
- Compared product category popularity between female and male customers
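
A cross-tabulation is one plausible way to make this comparison. The sketch below counts transactions per category and gender; the `Gender` and `prod_cat` column names and the rows are assumptions, and popularity could equally be measured by quantity or spend.

```python
import pandas as pd

# Invented rows standing in for the merged dataset.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M", "F", "M"],
    "prod_cat": ["Books", "Books", "Clothing", "Electronics", "Books", "Electronics"],
})

# Rows are categories, columns are genders, cells are transaction counts.
popularity = pd.crosstab(df["prod_cat"], df["Gender"])

# Most frequently bought category for each gender.
top_by_gender = popularity.idxmax()

print(popularity)
print(top_by_gender)
```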
- Found the city code with the maximum number of customers
- Calculated the percentage share of customers from that city
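
The city-level figures can be sketched as a group-and-count, assuming `customer_Id` and `city_code` columns (hypothetical names) and made-up rows:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_Id": [1, 2, 3, 4, 5],
    "city_code": [3, 5, 3, 7, 3],
})

# Distinct customers per city, so repeated rows for one customer count once.
per_city = customers.groupby("city_code")["customer_Id"].nunique()

top_city = per_city.idxmax()
top_share_pct = 100 * per_city.max() / per_city.sum()

print(top_city, round(top_share_pct, 2))
```

Counting distinct customer IDs matters when this runs on the merged dataset, where each customer appears once per transaction.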
- Determined the store type that sold the most products by:
  - Total transaction value
  - Total quantity sold
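
Ranking store types by both measures might look like the sketch below. The `Store_type`, `Qty`, and `total_amt` names, the store labels, and the numbers are assumptions chosen so the two rankings differ.

```python
import pandas as pd

tx = pd.DataFrame({
    "Store_type": ["e-Shop", "MBR", "e-Shop", "Flagship store"],
    "Qty": [2, 5, 1, 4],
    "total_amt": [200.0, 150.0, 100.0, 500.0],
})

# Sum value and quantity per store type in one pass.
by_store = tx.groupby("Store_type")[["total_amt", "Qty"]].sum()

top_by_value = by_store["total_amt"].idxmax()
top_by_qty = by_store["Qty"].idxmax()

print(by_store)
print(top_by_value, top_by_qty)
```

The two measures need not agree, which is why the analysis reports them separately.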
- Calculated total revenue from Electronics and Clothing categories in Flagship Stores
- Computed total amount spent by male customers in the Electronics category
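
Both revenue figures reduce to a boolean filter followed by a sum. In this sketch the column names, the "Flagship store" label, the gender code "M", and the data are all assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Store_type": ["Flagship store", "Flagship store", "e-Shop", "Flagship store", "MBR"],
    "prod_cat": ["Electronics", "Clothing", "Electronics", "Books", "Electronics"],
    "Gender": ["M", "F", "M", "M", "F"],
    "total_amt": [100.0, 50.0, 70.0, 30.0, 20.0],
})

# Electronics and Clothing revenue in flagship stores only.
flagship_mask = (df["Store_type"] == "Flagship store") & df["prod_cat"].isin(
    ["Electronics", "Clothing"]
)
flagship_revenue = df.loc[flagship_mask, "total_amt"].sum()

# Spend by male customers in Electronics, across all store types.
male_mask = (df["Gender"] == "M") & (df["prod_cat"] == "Electronics")
male_electronics_spend = df.loc[male_mask, "total_amt"].sum()

print(flagship_revenue, male_electronics_spend)
```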
- Identified customers with more than 10 unique transactions
- Filtered out all transactions with negative amounts
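
A rough sketch of these two steps: drop rows with negative amounts, then count distinct transaction IDs per customer. The `cust_id`, `transaction_id`, and `total_amt` names are assumed, and the data is invented so that exactly one customer passes the threshold.

```python
import pandas as pd

# Customer 1 has 11 distinct transactions; customer 2 has a purchase that
# was later returned (same transaction_id, negative amount).
tx = pd.DataFrame({
    "cust_id": [1] * 11 + [2] * 3,
    "transaction_id": list(range(100, 111)) + [200, 201, 201],
    "total_amt": [10.0] * 11 + [25.0, 40.0, -40.0],
})

# Keep only rows whose amount is not negative.
positive = tx[tx["total_amt"] >= 0]

# Distinct transactions per customer, then apply the "more than 10" cut.
tx_per_customer = positive.groupby("cust_id")["transaction_id"].nunique()
frequent = tx_per_customer[tx_per_customer > 10]

print(frequent)
```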
- a. Category Spending: Total spent on Electronics and Books
- b. Time-Range Spending: Total spent between 1st Jan 2014 and 1st Mar 2014
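
The two spending totals can be sketched as follows. The column names and rows are hypothetical, and treating both end dates as inclusive is an assumption; the source does not say how the boundaries were handled.

```python
import pandas as pd

df = pd.DataFrame({
    "prod_cat": ["Electronics", "Books", "Clothing", "Books"],
    "tran_date": pd.to_datetime(
        ["2014-01-15", "2013-12-30", "2014-02-10", "2014-03-01"]
    ),
    "total_amt": [200.0, 50.0, 80.0, 30.0],
})

# a. Total spent on Electronics and Books
category_total = df.loc[
    df["prod_cat"].isin(["Electronics", "Books"]), "total_amt"
].sum()

# b. Total spent from 1 Jan 2014 to 1 Mar 2014 (both ends inclusive here)
in_range = df["tran_date"].between(
    pd.Timestamp("2014-01-01"), pd.Timestamp("2014-03-01")
)
range_total = df.loc[in_range, "total_amt"].sum()

print(category_total, range_total)
```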
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Jupyter Notebook
- CSV Files for input data
- Identified high-performing store types and product categories
- Derived customer preferences across gender and age groups
- Located regions with high customer concentration
- Highlighted opportunities to reduce return rates and increase transaction value
Retail_Transaction_Analysis/
├── data/
│ ├── Customers.csv
│ ├── Product_cat_info.csv
│ └── Transactions.csv
├── notebooks/
│ └── Retail_Store_Analysis.ipynb
└── README.md
- Apply machine learning for customer segmentation and lifetime value prediction
- Build recommendation systems for cross-sell and up-sell strategies
- Perform real-time analytics using streaming data solutions
Thanks to the data science and open-source communities for enabling powerful data analysis using Python and its libraries.