The primary objective of this project is to demonstrate the application of various Azure data engineering services to manage, process, and analyze car sales data.
The goal is to showcase efficient handling of large datasets, complex transformations, and the derivation of meaningful business insights.
About the Dataset:
The dataset comprises 300,000+ rows and 16 columns.
Columns include: year, make, model, trim, body, transmission, vin, state, condition, odometer, color, interior, seller, mmr, selling price, saledate.
Overview:
We will complete this project in six major steps:
1) Data Cleaning and Transformation:
Extract, transform, and load (ETL) the car sales data into a structured format using Databricks and Azure Data Factory.
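In the project this cleaning logic would run as PySpark in a Databricks notebook; as a minimal local sketch of the same rules (the sample values and the exact quality rules are illustrative assumptions), using pandas:

```python
import pandas as pd

# Sample raw rows shaped like the car sales dataset (illustrative values).
raw = pd.DataFrame({
    "year": [2015, 2014, None],
    "make": ["Kia", "BMW", "Ford"],
    "model": ["Sorento", "3 Series", "F-150"],
    "odometer": [16639.0, 1331.0, -5.0],       # negative mileage is invalid
    "sellingprice": [21500.0, 30000.0, None],  # missing price cannot be analyzed
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic quality rules: drop rows missing key fields
    and rows with impossible odometer readings."""
    df = df.dropna(subset=["year", "sellingprice"])
    df = df[df["odometer"] >= 0]
    df["year"] = df["year"].astype(int)
    return df.reset_index(drop=True)

cleaned = clean(raw)
print(len(cleaned))  # two valid rows remain
```

The same rules translate directly to PySpark (`dropna`, `filter`, `cast`) when run at full scale on Databricks.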
2)Data Modelling:
Create a robust data model by designing and implementing fact and dimension tables to facilitate efficient data querying and reporting.
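One way to sketch the star schema: a vehicle dimension keyed by a surrogate id, and a sales fact that carries the measures plus foreign keys. Table and column names here are illustrative, not the project's actual model:

```python
import pandas as pd

sales = pd.DataFrame({
    "make": ["Kia", "Kia", "BMW"],
    "model": ["Sorento", "Sorento", "3 Series"],
    "sellingprice": [21500, 20800, 30000],
})

# Dimension: one row per distinct vehicle, with a surrogate key.
dim_vehicle = (sales[["make", "model"]]
               .drop_duplicates()
               .reset_index(drop=True))
dim_vehicle["vehicle_key"] = dim_vehicle.index + 1

# Fact: measures plus the foreign key into the dimension.
fact_sales = sales.merge(dim_vehicle, on=["make", "model"])[
    ["vehicle_key", "sellingprice"]
]
print(len(dim_vehicle), len(fact_sales))  # 2 3
```

Splitting descriptive attributes into dimensions keeps the fact table narrow, which speeds up aggregations over the sales measures.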
3) Data Storage and Accessibility:
Convert and store the processed data in Delta tables and Parquet files for optimized storage and accessibility.
4)Data Loading:
Load the processed data into an Azure SQL Database to enable advanced data analysis and integration with other business intelligence tools.
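The actual load would typically go through an ADF Copy activity or a Spark JDBC write; the shape of the load and a follow-up query can be sketched locally with the stdlib `sqlite3` module as a stand-in for Azure SQL (table and column names are illustrative):

```python
import sqlite3

rows = [("Kia", "Sorento", 21500.0), ("BMW", "3 Series", 30000.0)]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE car_sales (
    make TEXT, model TEXT, sellingprice REAL)""")
conn.executemany("INSERT INTO car_sales VALUES (?, ?, ?)", rows)

# Once loaded, BI tools can run ordinary SQL aggregations.
count, avg_price = conn.execute(
    "SELECT COUNT(*), AVG(sellingprice) FROM car_sales").fetchone()
print(count, avg_price)  # 2 25750.0
```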
5)Data Handling:
Implement dynamic data paths and pipeline variables in Azure Data Factory to handle data flexibly and efficiently.
6)Data Insights:
Generate actionable insights from the transformed data to support business decisions and strategy formulation.
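As one example of the kind of insight this step produces: comparing average selling price against the MMR market benchmark per make (toy numbers, pandas as a local stand-in for the Spark/SQL aggregation):

```python
import pandas as pd

sales = pd.DataFrame({
    "make": ["Kia", "Kia", "BMW"],
    "sellingprice": [21500, 20800, 30000],
    "mmr": [20500, 21000, 31000],
})

# Average sale price vs. market benchmark (mmr) per manufacturer;
# a positive gap suggests the make sells above its estimated value.
insight = sales.groupby("make")[["sellingprice", "mmr"]].mean()
insight["gap"] = insight["sellingprice"] - insight["mmr"]
print(insight)
```

Here Kia sells 400 above benchmark on average while BMW sells 1,000 below it, the kind of gap a pricing or sourcing team could act on.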