The primary goal of this project was to establish a reliable data pipeline using Azure services, GCP, and Power BI for efficient data processing and visualization. The key components are described below.
Storage accounts with blob containers and a Data Factory instance were configured in the Azure Portal, creating a solid foundation for the data integration pipeline.
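For illustration, a comparable setup can be scripted with the Azure SDK for Python rather than clicked through the Portal; this is a minimal sketch, and the account URL and container names here are placeholders, not the project's actual resources.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder account URL; the real storage account was created in the Azure Portal
ACCOUNT_URL = "https://<storage-account>.blob.core.windows.net"

credential = DefaultAzureCredential()  # picks up az login / managed identity
service = BlobServiceClient(account_url=ACCOUNT_URL, credential=credential)

# Hypothetical containers: one for raw Google Sheets extracts, one for transformed output
for name in ("raw-data", "curated-data"):
    service.create_container(name)
```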
A custom API was built on GCP to supply authentication tokens to Azure Data Factory, enabling a secure and smooth flow of data from Google Sheets to Azure storage.
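A minimal sketch of such an endpoint, written as a GCP Cloud Function using functions-framework; the function name, OAuth scope, and response shape are assumptions rather than the project's actual implementation, and a production version would also authenticate the caller before issuing anything.

```python
import functions_framework
import google.auth
from google.auth.transport.requests import Request

@functions_framework.http
def issue_sheets_token(request):
    # Acquire a short-lived OAuth token scoped to Google Sheets (scope assumed)
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"]
    )
    credentials.refresh(Request())  # fetch a fresh access token
    # functions-framework (Flask) serializes the dict to a JSON response
    return {
        "access_token": credentials.token,
        "expires_at": credentials.expiry.isoformat(),
    }
```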
The heart of the project was the Extract, Transform, Load (ETL) process. Source columns were mapped to the target schema, and pipeline parameters were configured for precise data ingestion, laying the groundwork for a reliable transformation.
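The actual mapping lives in Data Factory's copy-activity configuration; the pandas sketch below only illustrates the kind of column mapping and type coercion involved, with entirely hypothetical column names.

```python
import pandas as pd

# Hypothetical mapping from Google Sheets headers to target column names
COLUMN_MAP = {
    "Transaction Date": "transaction_date",
    "Amount (USD)": "amount_usd",
    "Category": "category",
}

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Rename and keep only the mapped columns
    df = raw.rename(columns=COLUMN_MAP)[list(COLUMN_MAP.values())]
    # Coerce types so downstream consumers see a consistent schema
    df["transaction_date"] = pd.to_datetime(df["transaction_date"])
    df["amount_usd"] = pd.to_numeric(df["amount_usd"], errors="coerce")
    return df
```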
An interesting aspect of this project was the decision to store the data output in Parquet files. Parquet, a columnar format, was chosen for its compression and query efficiency, supporting future data processing work.
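As a quick illustration of the format choice (the sample data here is made up), pandas with pyarrow writes and reads Parquet with the schema preserved:

```python
import pandas as pd

df = pd.DataFrame(
    {"category": ["rent", "groceries"], "amount_usd": [1200.00, 310.45]}
)

# Snappy-compressed and columnar on disk; column types survive the round trip
df.to_parquet("finance.parquet", engine="pyarrow", compression="snappy")
restored = pd.read_parquet("finance.parquet")
```

Because Parquet stores data column by column, analytical queries that touch only a few columns read far less data than they would from a row-oriented format like CSV.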
The Data Factory's Pipeline section provided a visual representation of the ETL process in action, with real-time views of run progress and performance that added transparency to the data transformation.
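Beyond the Portal view, run status can also be polled programmatically. The sketch below uses the azure-mgmt-datafactory package; the subscription, resource group, factory, and run identifiers are all placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# All identifiers below are placeholders
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = client.pipeline_runs.get("<resource-group>", "<factory-name>", "<run-id>")
print(run.status, run.duration_in_ms)  # e.g. "Succeeded" and elapsed milliseconds
```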
The final phase involved integrating the transformed data with Power BI. The Parquet output was loaded as tables, and a finance data dashboard was designed to provide insightful visualizations.
