End-to-End Data Pipeline in Google Cloud Platform
This project demonstrates the creation of an end-to-end data pipeline in Google Cloud Platform (GCP). The pipeline involves uploading a CSV file to Cloud Storage, performing transformations using Dataflow, and loading the transformed data into BigQuery. Finally, we validate the data by executing queries in BigQuery.
**Overview**

The goal of this project is to showcase the seamless integration of various GCP services in building a robust data pipeline. It covers every step from data ingestion through transformation to data validation.
**Project Setup**
- **Cloud Services Configuration** - Use Cloud Shell to configure the essentials: a Cloud Storage bucket, a Python virtual environment with Apache Beam installed, and a BigQuery dataset.
- **Enable Cloud Components** - Run the commands that enable every cloud component the project requires (see the sketch after this list).
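A minimal Cloud Shell sketch of this setup, assuming placeholder names throughout: project `my-project`, bucket `my-pipeline-bucket`, dataset `pipeline_ds`, and region `us-central1`:

```sh
# Enable the cloud components the pipeline relies on
# (Dataflow also needs Compute Engine).
gcloud services enable dataflow.googleapis.com \
    compute.googleapis.com \
    storage.googleapis.com \
    bigquery.googleapis.com

# Create a Cloud Storage bucket for the source CSV and Dataflow staging files.
gsutil mb -l us-central1 gs://my-pipeline-bucket

# Create and activate a Python virtual environment, then install Apache Beam
# with the GCP extras (Dataflow runner plus GCS and BigQuery I/O).
python3 -m venv beam-env
source beam-env/bin/activate
pip install 'apache-beam[gcp]'

# Create the BigQuery dataset that will receive the transformed data.
bq mk --dataset my-project:pipeline_ds
```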
**Usage**
- **Data Ingestion** - Upload the source CSV file to a Google Cloud Storage bucket (see the command sketch after this list).
- **Data Transformation** - Run the provided Python script to transform the data with Apache Beam on Dataflow.
- **Data Loading** - Load the transformed data into BigQuery.
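From Cloud Shell, these steps might look like the sketch below. The file name `sales.csv`, the script path `data_transformation/transform.py`, and the `--input`/`--output` flags are assumptions about this repository; `--runner`, `--project`, `--region`, and `--temp_location` are standard Beam pipeline options.

```sh
# Data ingestion: upload the source CSV to the bucket created during setup.
gsutil cp sales.csv gs://my-pipeline-bucket/input/sales.csv

# Data transformation and loading: launch the Beam pipeline on Dataflow.
# It reads the CSV, applies the transformations, and writes to BigQuery.
python data_transformation/transform.py \
    --runner DataflowRunner \
    --project my-project \
    --region us-central1 \
    --input gs://my-pipeline-bucket/input/sales.csv \
    --output my-project:pipeline_ds.sales \
    --temp_location gs://my-pipeline-bucket/tmp/
```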
**Components**

The project leverages the following GCP components:

- Google Cloud Storage
- Apache Beam (Dataflow)
- BigQuery

**Sample Code**

Find the Python script for data transformation in the data_transformation/ directory. The script utilizes Apache Beam to perform the necessary transformations.
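The actual script lives in that directory; the condensed sketch below only illustrates the shape such a pipeline typically takes. The CSV columns, the `add_total` transformation, and the table schema are hypothetical stand-ins, and the bucket, project, and dataset names are the placeholders used above.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_line(line):
    """Turn one CSV line into a dict; the columns here are hypothetical."""
    name, quantity, price = line.split(",")
    return {"name": name, "quantity": int(quantity), "price": float(price)}


def add_total(row):
    """Example transformation: derive a total column from quantity and price."""
    row["total"] = row["quantity"] * row["price"]
    return row


def run():
    # Picks up --runner, --project, --region, --temp_location from the CLI.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read CSV" >> beam.io.ReadFromText(
                "gs://my-pipeline-bucket/input/sales.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse_csv_line)
            | "Transform" >> beam.Map(add_total)
            | "Load to BigQuery" >> beam.io.WriteToBigQuery(
                "my-project:pipeline_ds.sales",
                schema="name:STRING,quantity:INTEGER,price:FLOAT,total:FLOAT",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )


if __name__ == "__main__":
    run()
```

Here `WRITE_TRUNCATE` keeps reruns idempotent by replacing the table contents each time; `WRITE_APPEND` would suit incremental loads instead.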
**Validation**

To validate the data loading, execute SQL queries in BigQuery and confirm that the transformed data was loaded accurately.
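Such a check can be run in the BigQuery console, or scripted with the `google-cloud-bigquery` client library as in the sketch below; the table and column names are the placeholders used earlier, not this project's actual schema.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Count the loaded rows and spot-check the derived column.
query = """
    SELECT COUNT(*) AS row_count, SUM(total) AS total_sum
    FROM `my-project.pipeline_ds.sales`
"""
for row in client.query(query).result():
    print(f"rows loaded: {row.row_count}, sum of totals: {row.total_sum}")
```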
**Additional Resources**
For further information and in-depth tutorials on GCP components, refer to the following resources:
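- Google Cloud Storage documentation: https://cloud.google.com/storage/docs
- Apache Beam documentation: https://beam.apache.org/documentation/
- Dataflow documentation: https://cloud.google.com/dataflow/docs
- BigQuery documentation: https://cloud.google.com/bigquery/docs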
