- Data Engineering Toolkits
- Running Linux using Docker containers
- Linux CLI commands and Bash scripts
- Python basics
- Hadoop and MapReduce
- Big Data Overview
- HDFS
- YARN
- MapReduce
- MapReduce using MRJob 1
- MapReduce using MRJob
- Apache Hive 1
- Databases for Big Data
- HiveQL and Querying Data
- Joins
- UDFs in Hive
- Apache Hive 2
- Data Persistence in Hive
- Managed Tables and External Tables
- Partitions and Buckets
- Storage Formats
- Apache Pig 1
- Apache Pig 2
- Apache Spark - Spark core
- Apache Spark - Spark SQL
- Apache Spark - Spark ML
- Amazon Elastic MapReduce
- Project: Data Engineering Project