
Data Modeling, Cloud Data Warehouses, Spark, Data Pipelines / Apache Cassandra, PostgreSQL, Amazon Redshift, EC2, S3, EMR, ETL, PySpark, Spark SQL, DataFrame, Apache Airflow


JuliaHong21/Data_Engineering_Udacity


Data Engineering

This is the repo for Udacity's Data Engineering Nanodegree Program. Program syllabus:

https://d20vrrgs8k4bvw.cloudfront.net/documents/en-US/Data+Engineering+Nanodegree+Program+Syllabus.pdf

Data Modeling

Learn to create relational and NoSQL data models to fit the diverse needs of data consumers. Use ETL to build databases in PostgreSQL and Apache Cassandra.
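As a rough illustration of the ETL step, the sketch below parses a few JSON event records and loads them into one dimension table and one fact table. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server. The table names (`users`, `songplays`), columns, and sample records are hypothetical. Against PostgreSQL you would typically connect through a driver such as `psycopg2`, which uses `%s` placeholders instead of `?`, and write `ON CONFLICT DO NOTHING` where SQLite uses `INSERT OR IGNORE`.

```python
import json
import sqlite3

# Hypothetical raw event log: one JSON object per line.
RAW_EVENTS = """\
{"user_id": 1, "first_name": "Ada", "level": "paid", "song": "Song A", "ts": 1541990000}
{"user_id": 2, "first_name": "Bob", "level": "free", "song": "Song B", "ts": 1541990100}
{"user_id": 1, "first_name": "Ada", "level": "paid", "song": "Song C", "ts": 1541990200}
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create one dimension table and one fact table (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS users (
            user_id    INTEGER PRIMARY KEY,
            first_name TEXT,
            level      TEXT
        );
        CREATE TABLE IF NOT EXISTS songplays (
            songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id     INTEGER NOT NULL REFERENCES users (user_id),
            song        TEXT,
            start_time  INTEGER
        );
        """
    )


def run_etl(conn: sqlite3.Connection, raw: str) -> None:
    """Extract JSON lines, transform them into rows, and load both tables."""
    # Extract: parse each non-empty line as a JSON record.
    records = [json.loads(line) for line in raw.splitlines() if line.strip()]

    # Transform: split each record into a dimension row and a fact row.
    user_rows = [(r["user_id"], r["first_name"], r["level"]) for r in records]
    fact_rows = [(r["user_id"], r["song"], r["ts"]) for r in records]

    # Load: skip users whose primary key already exists instead of failing.
    conn.executemany(
        "INSERT OR IGNORE INTO users (user_id, first_name, level) VALUES (?, ?, ?)",
        user_rows,
    )
    conn.executemany(
        "INSERT INTO songplays (user_id, song, start_time) VALUES (?, ?, ?)",
        fact_rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    run_etl(conn, RAW_EVENTS)
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])      # prints 2
    print(conn.execute("SELECT COUNT(*) FROM songplays").fetchone()[0])  # prints 3
```

User 1 appears twice in the log, so the dimension table ends up with two rows while the fact table keeps all three events.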

Cloud Data Warehouses

Sharpen your data warehousing skills and deepen your understanding of data infrastructure. Create cloud-based data warehouses on Amazon Web Services (Amazon Redshift).
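A common pattern for loading a cloud warehouse such as Redshift is to bulk-load raw files into a staging table (on Redshift, with a `COPY` command reading from S3) and then build the analytics tables with set-based `INSERT ... SELECT` statements that run inside the warehouse. The sketch below mimics that flow with `sqlite3`, using a plain `executemany` as a stand-in for `COPY`. The table names, columns, and sample rows are hypothetical, and Redshift-specific features such as distribution and sort keys are not represented.

```python
import sqlite3

# Hypothetical raw rows: (artist_id, artist_name, song_id, title, duration_seconds)
STAGING_ROWS = [
    ("AR1", "Artist One", "SO1", "First Song", 210.5),
    ("AR1", "Artist One", "SO2", "Second Song", 185.0),
    ("AR2", "Artist Two", "SO3", "Third Song", 240.2),
]


def build_star_tables(conn: sqlite3.Connection, rows) -> None:
    """Load rows into a staging table, then populate dimension tables with SQL."""
    conn.executescript(
        """
        CREATE TABLE staging_songs (
            artist_id TEXT, artist_name TEXT,
            song_id TEXT, title TEXT, duration REAL
        );
        CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE songs (
            song_id TEXT PRIMARY KEY, title TEXT,
            artist_id TEXT REFERENCES artists (artist_id), duration REAL
        );
        """
    )
    # Stand-in for the bulk COPY step: load the raw rows unchanged.
    conn.executemany("INSERT INTO staging_songs VALUES (?, ?, ?, ?, ?)", rows)

    # Transform inside the database; DISTINCT collapses repeated artist rows.
    conn.execute(
        "INSERT INTO artists (artist_id, name) "
        "SELECT DISTINCT artist_id, artist_name FROM staging_songs"
    )
    conn.execute(
        "INSERT INTO songs (song_id, title, artist_id, duration) "
        "SELECT song_id, title, artist_id, duration FROM staging_songs"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_tables(conn, STAGING_ROWS)
    print(conn.execute("SELECT COUNT(*) FROM artists").fetchone()[0])  # prints 2
    print(conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0])    # prints 3
```

Keeping the transformation in SQL lets the warehouse do the heavy lifting on the full dataset instead of pulling rows back into Python.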

Spark and Data Lakes

Understand the big data ecosystem and how to use Spark to work with massive datasets. Store big data in a data lake and query it with Spark (PySpark, Spark SQL, DataFrames) on AWS (S3, EC2, EMR).
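When Spark writes a DataFrame to a data lake with `df.write.partitionBy("year", "month")`, it lays the output out in Hive-style directories such as `year=2018/month=11/`. Queries that filter on those columns can then skip whole directories (partition pruning). The sketch below reproduces only that directory layout and the pruning idea in plain Python with CSV files, so it runs without Spark or S3. The function names, columns, and sample rows are hypothetical, and real Spark jobs commonly write Parquet rather than CSV.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical event rows; "year" and "month" act as the partition columns.
ROWS = [
    {"year": 2018, "month": 11, "song": "Song A"},
    {"year": 2018, "month": 11, "song": "Song B"},
    {"year": 2018, "month": 12, "song": "Song C"},
]


def write_partitioned(rows, root: Path) -> None:
    """Write rows into year=YYYY/month=MM directories, one CSV file each."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["year"], row["month"])].append(row)
    for (year, month), group in groups.items():
        part_dir = root / f"year={year}" / f"month={month}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0000.csv", "w", newline="") as f:
            # Partition values live in the path, so only "song" is stored in the file.
            writer = csv.DictWriter(f, fieldnames=["song"])
            writer.writeheader()
            writer.writerows({"song": r["song"]} for r in group)


def read_month(root: Path, year: int, month: int) -> list:
    """Read one partition; directories for other months are never opened."""
    part_dir = root / f"year={year}" / f"month={month}"
    songs = []
    for path in sorted(part_dir.glob("*.csv")):
        with open(path, newline="") as f:
            songs.extend(r["song"] for r in csv.DictReader(f))
    return songs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_partitioned(ROWS, root)
        print(read_month(root, 2018, 11))  # prints ['Song A', 'Song B']
```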

Data Pipelines with Airflow

Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.
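A data quality check in a pipeline usually runs a query after a load step and fails the task if the result is unexpected, so that bad data does not flow downstream. The sketch below shows that logic as a plain Python function over `sqlite3`. In Airflow it would typically be wrapped in a task, for example as the callable of a `PythonOperator` or inside a custom operator, so that the raised exception marks the task as failed. The helper name `check_table_quality` and the two checks it performs (non-empty table, no NULLs in a key column) are assumptions chosen for illustration.

```python
import sqlite3


def check_table_quality(conn: sqlite3.Connection, table: str, key_column: str) -> int:
    """Raise ValueError if `table` is empty or `key_column` contains NULLs.

    Returns the row count when both checks pass. Table and column names are
    interpolated into the SQL directly, so they must come from trusted config.
    """
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if row_count == 0:
        raise ValueError(f"Data quality check failed: {table} has no rows")

    null_count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]
    if null_count > 0:
        raise ValueError(
            f"Data quality check failed: {table}.{key_column} has {null_count} NULL values"
        )
    return row_count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
    print(check_table_quality(conn, "users", "user_id"))  # prints 2

    # A NULL key now makes the check fail.
    conn.execute("INSERT INTO users VALUES (NULL, 'Eve')")
    try:
        check_table_quality(conn, "users", "user_id")
    except ValueError as err:
        print(err)
```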

Capstone Project

Combine what you've learned throughout the program to build your own data engineering portfolio project.
