This repository contains a Python project developed as part of the Kiwilytics Data Engineering Course.
The project demonstrates practical data analysis and manipulation techniques using Python and Pandas.
The goal of this project is to perform essential data engineering tasks, including:
- Handling missing values
- Calculating total prices and revenues
- Grouping and aggregating data
- Identifying insights such as top-selling products and highest-spending customers
All tasks were implemented in a Jupyter Notebook to demonstrate the workflow clearly.
- Data Cleaning and Handling Missing Values
  - Filled missing `unit_price` values using the average price per product.
- Feature Engineering
  - Created a new column `total_price` by multiplying `unit_price` and `quantity`.
- Revenue Calculation
  - Calculated total revenue across all orders.
- Data Analysis and Insights
  - Identified which product has the highest total quantity sold.
  - Determined which customer has the highest total spending.
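
The tasks above can be sketched in a few lines of Pandas. This is a minimal illustration on made-up data, not the notebook's actual code: `unit_price`, `quantity`, and `total_price` come from the project description, while the `product` and `customer` column names and all values are assumptions chosen for the example.

```python
import pandas as pd

# Toy stand-in for the orders dataset. The `product` and `customer`
# column names and every value here are hypothetical.
orders = pd.DataFrame({
    "customer": ["alice", "bob", "alice", "carol", "bob"],
    "product": ["kiwi", "kiwi", "mango", "mango", "kiwi"],
    "unit_price": [2.0, None, 5.0, None, 4.0],
    "quantity": [3, 2, 1, 4, 5],
})

# Data cleaning: fill each missing unit_price with the mean price
# of the same product. transform("mean") returns a Series aligned
# with the original rows, so fillna() can use it directly.
product_mean = orders.groupby("product")["unit_price"].transform("mean")
orders["unit_price"] = orders["unit_price"].fillna(product_mean)

# Feature engineering: price per order line.
orders["total_price"] = orders["unit_price"] * orders["quantity"]

# Revenue calculation: sum over all order lines.
total_revenue = orders["total_price"].sum()

# Insights: product with the highest total quantity sold and
# customer with the highest total spending.
top_product = orders.groupby("product")["quantity"].sum().idxmax()
top_customer = orders.groupby("customer")["total_price"].sum().idxmax()

print(total_revenue, top_product, top_customer)
```

Using `transform("mean")` rather than a plain `groupby().mean()` keeps the result the same length as the DataFrame, which is what lets the per-product average be applied row by row.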
- Python 3
- Pandas
- Jupyter Notebook
- Data cleaning and transformation
- Use of `groupby()`, `fillna()`, and `transform()` in Pandas
- Logical problem solving for real-world data engineering tasks
The dataset used in this project is stored in the `/data` folder:
`data/kiwilytics_orders.csv`
📝 The dataset was provided by the course instructors for educational purposes and does not contain any real or sensitive information.
| File/Folder | Description |
|---|---|
| `Kiwilytics-Python-Project.ipynb` | Main Jupyter Notebook containing all Python code, data cleaning, and analysis |
| `data/kiwilytics_orders.csv` | Dataset used for the analysis (sample data provided by the course) |
| `README.md` | Project documentation, objectives, and instructions |
- Clone or download this repository.
- Open the Jupyter Notebook file: `Kiwilytics-Python-Project.ipynb`
- Make sure the CSV file exists in the `/data` folder: `data/kiwilytics_orders.csv`
- Run all notebook cells sequentially to reproduce the results.
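
Before running the notebook, it can help to confirm that the dataset is where the code expects it. The sketch below is one possible check, assuming it is run from the repository root. `load_orders` is a hypothetical helper written for this example and is not part of the notebook.

```python
from pathlib import Path

import pandas as pd

# Path taken from the README; assumed to be relative to the repository root.
DATA_PATH = Path("data") / "kiwilytics_orders.csv"


def load_orders(path=DATA_PATH):
    """Hypothetical helper: load the orders CSV or fail with a clear message."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {path}; check that the data folder was "
            "downloaded along with the notebook."
        )
    return pd.read_csv(path)


if DATA_PATH.is_file():
    orders = load_orders()
    # Count missing values per column, a useful first look before cleaning.
    print(orders.isna().sum())
else:
    print(f"Dataset not found at {DATA_PATH}")
```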