Syllabus

  1. Data Engineering Toolkits
     • Running Linux using Docker containers
     • Linux CLI commands and Bash scripts
     • Python basics
  2. Hadoop and MapReduce
     • Big Data Overview
     • HDFS
     • YARN
     • MapReduce
  3. MapReduce using MRJob 1
  4. MapReduce using MRJob 2
  5. Apache Hive 1
     • Databases for Big Data
     • HiveQL and Querying Data
     • Joins
     • UDFs in Hive
  6. Apache Hive 2
     • Data Persistence in Hive
     • Managed Tables and External Tables
     • Partitions and Buckets
     • Storage Formats
  7. Apache Pig 1
  8. Apache Pig 2
  9. Apache Spark - Spark core
  10. Apache Spark - Spark SQL
  11. Apache Spark - Spark ML
  12. Amazon Elastic MapReduce

Project: Data Engineering Project