Syllabus

Data Engineering Toolkits

Running Linux using Docker containers
Linux CLI commands and Bash scripts
Python basics

Hadoop and MapReduce

Big Data Overview
HDFS
YARN
MapReduce

MapReduce using MRJob 1
MapReduce using MRJob 2
Apache Hive 1

Databases for Big Data
HiveQL and Querying Data
Joins
User-defined functions (UDFs) in Hive

Apache Hive 2

Data Persistence in Hive
Managed Tables and External Tables
Partitions and Buckets
Storage Formats

Apache Pig 1
Apache Pig 2
Apache Spark - Spark core
Apache Spark - Spark SQL
Apache Spark - Spark ML
Amazon Elastic MapReduce

Project: Data Engineering Project