
Atharva-Bodhankar/Azure-Project


Azure Data Engineering Project

Project Overview

The primary goal of this project was to build a reliable data pipeline using Azure services, Google Cloud Platform (GCP), and Power BI for efficient data processing and visualization. The key components are:

Azure Portal Setup:

Storage accounts, storage containers, and an Azure Data Factory instance were configured in the Azure Portal to form the foundation of the data integration pipeline.
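Azure enforces naming rules on these resources, and a name that breaks them is rejected at creation time. The sketch below is a small, hypothetical pre-check for two of those rules: storage account names must be 3 to 24 characters of lowercase letters and digits, and container names must be 3 to 63 characters of lowercase letters, digits, and hyphens, starting with a letter or digit, with every hyphen followed by a letter or digit. It does not check global uniqueness of account names, which only Azure can confirm. The example names are placeholders, not the ones used in this project.

```python
import re

# Storage account: 3-24 chars, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

# Container: 3-63 chars of lowercase letters, digits and hyphens.
# First char must be a letter/digit; each hyphen must be followed by a
# letter/digit (so no consecutive hyphens and no trailing hyphen).
_CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_account_name(name: str) -> bool:
    """Return True if `name` satisfies the storage account naming rules."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` satisfies the blob container naming rules."""
    return _CONTAINER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for candidate in ["financedatastore", "Finance_Store"]:
        print(candidate, is_valid_account_name(candidate))
    for candidate in ["raw-sheets", "raw--sheets", "ab"]:
        print(candidate, is_valid_container_name(candidate))
```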

Google Cloud Platform Integration:

A custom API was created on GCP to securely pass an access token to Azure Data Factory, enabling a secure, smooth flow of data from Google Sheets into Azure Storage.
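The README does not describe exactly how the token is used. As a rough illustration of what such a request looks like, the sketch below assumes the Google Sheets API v4 `values.get` endpoint, authenticated with an API key passed as the `key` query parameter, and a sheet whose first row holds column headers. `SHEET_ID`, `API_KEY`, and the range are placeholders. The sketch only builds the URL and parses a response body; it makes no network call.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint: Google Sheets API v4, spreadsheets.values.get.
SHEETS_API_BASE = "https://sheets.googleapis.com/v4/spreadsheets"


def build_values_url(spreadsheet_id: str, a1_range: str, api_key: str) -> str:
    """Build a values.get URL; the A1 range is percent-encoded in the path."""
    sheet = quote(spreadsheet_id, safe="")
    rng = quote(a1_range, safe="")
    return f"{SHEETS_API_BASE}/{sheet}/values/{rng}?{urlencode({'key': api_key})}"


def rows_to_records(response: dict) -> list[dict]:
    """Turn a values.get JSON body into dicts, using the first row as headers.

    The API omits trailing empty cells, so short rows are padded with "".
    """
    values = response.get("values", [])
    if not values:
        return []
    header, *body = values
    return [
        {col: (row[i] if i < len(row) else "") for i, col in enumerate(header)}
        for row in body
    ]


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    print(build_values_url("SHEET_ID", "Sheet1!A1:D10", "API_KEY"))
```

In the pipeline itself, Data Factory makes this call on the project's behalf; the sketch only shows the shape of the underlying REST request.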

ETL Process:

The core of the project is the Extract, Transform, Load (ETL) process. Source columns were mapped explicitly to destination columns, and pipeline parameters were configured so that each run ingests exactly the intended data. This careful setup laid the groundwork for a successful transformation.
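In Data Factory, explicit column mapping for a Copy activity is expressed as a `TabularTranslator` with a list of `source`/`sink` name pairs. The sketch below shows a mapping in that shape and a small Python function that applies it to rows, to make the mapping step concrete. The column names (`Date`, `Category`, `Amount` and their renamed targets) are hypothetical, not taken from this project. In this sketch, unmapped columns are dropped and a missing source column raises an error.

```python
from typing import Any

# Hypothetical mapping, shaped like an ADF Copy activity "translator" block.
TRANSLATOR: dict[str, Any] = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Date"}, "sink": {"name": "transaction_date"}},
        {"source": {"name": "Category"}, "sink": {"name": "category"}},
        {"source": {"name": "Amount"}, "sink": {"name": "amount"}},
    ],
}


def apply_mappings(rows: list[dict], translator: dict) -> list[dict]:
    """Rename columns according to the translator's source -> sink pairs.

    Columns not listed in the mapping are dropped; a row missing a mapped
    source column raises KeyError so bad input fails loudly.
    """
    pairs = [(m["source"]["name"], m["sink"]["name"]) for m in translator["mappings"]]
    mapped = []
    for row in rows:
        missing = [src for src, _ in pairs if src not in row]
        if missing:
            raise KeyError(f"source columns missing: {missing}")
        mapped.append({sink: row[src] for src, sink in pairs})
    return mapped


if __name__ == "__main__":
    sample = [{"Date": "2024-01-05", "Category": "Rent", "Amount": "1200", "Notes": "x"}]
    print(apply_mappings(sample, TRANSLATOR))
```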

Data Output Choice:

A notable design decision was to store the output as Parquet files. Parquet is a columnar, compressed file format that stores its schema alongside the data, so it is compact on disk and efficient to query, which benefits later processing steps.
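Part of Parquet's efficiency comes from its columnar layout: values of the same column are stored together, so repetitive columns compress well with encodings such as dictionary encoding. The sketch below is a toy, pure-Python illustration of those two ideas (pivoting rows into columns, then dictionary-encoding one column). It is not the Parquet format itself; a real pipeline would write Parquet through a library or, as here, through Data Factory's Parquet sink. The sample data is hypothetical.

```python
def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout."""
    columns: dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values: list) -> tuple[list, list[int]]:
    """Replace each value with an index into a list of distinct values."""
    dictionary: list = []
    index_of: dict = {}
    indices: list[int] = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary: list, indices: list[int]) -> list:
    """Invert dictionary_encode."""
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    rows = [
        {"category": "Rent", "amount": 1200},
        {"category": "Food", "amount": 40},
        {"category": "Rent", "amount": 1200},
    ]
    cols = to_columns(rows)
    print(cols)
    print(dictionary_encode(cols["category"]))
```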

ETL Process Visualization:

Data Factory's Pipeline section provided a visual representation of the ETL process in action. Near-real-time views of each run's progress and performance made the transformation steps transparent.

Power BI Integration:

The final phase involved connecting Power BI to the transformed data. Tables were loaded from the transformed output, and a finance dashboard was designed to present the data through insightful visualizations.
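Most finance visuals boil down to aggregating an amount column by some category, which in Power BI is typically a DAX measure such as `Total Amount = SUM(Transactions[amount])` sliced by a category field. The sketch below reproduces that idea in Python so the numbers behind a dashboard can be sanity-checked outside Power BI. The table and column names (`Transactions`, `category`, `amount`) are hypothetical, and amounts are summed as `Decimal` to avoid floating-point rounding on currency values.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by(rows: list[dict], key: str, value: str = "amount") -> dict[str, Decimal]:
    """Sum `value` per distinct `key`, like a SUM measure sliced by a field."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        # str() first so floats like 0.1 become Decimal("0.1"), not binary noise.
        totals[row[key]] += Decimal(str(row[value]))
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    transactions = [
        {"category": "Rent", "amount": "1200"},
        {"category": "Food", "amount": "85.50"},
        {"category": "Food", "amount": "14.50"},
    ]
    print(totals_by(transactions, "category"))
```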

About

Azure ETL Pipeline
