This directory contains the code for each chapter of "Data Algorithms with Spark":

- Chapter 01: Introduction to Data Algorithms
- Chapter 02: Transformations in Action
- Chapter 03: Mapper Transformations
- Chapter 04: Reductions in Spark
- Chapter 05: Partitioning Data
- Chapter 06: Graph Algorithms
- Chapter 07: Interacting with External Data Sources
- Chapter 08: Ranking Algorithms
- Chapter 09: Fundamental Data Design Patterns
- Chapter 10: Common Data Design Patterns
- Chapter 11: Join Design Patterns
- Chapter 12: Feature Engineering in PySpark

The following directories are bonus chapters:

| Bonus Chapter | Description |
|---|---|
| Data Design Patterns | Practical Data Design Patterns |
| Word Count | Multiple solutions to the word count problem using the reduceByKey() and groupByKey() reducers |
| Anagrams | Find words that are anagrams of one another: multiple solutions using the reduceByKey(), groupByKey(), and combineByKey() reducers |
| Lambda Expressions | How to use lambda expressions in PySpark programs |
| TF-IDF | Term Frequency - Inverse Document Frequency |
| K-mers | K-mers for DNA Sequences |
| Correlation | All vs. All Correlation |
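
As a rough illustration of the difference between the two reducer styles named in the Word Count entry, the sketch below mimics them in plain Python, without Spark. It is not taken from the chapter code: the helper names `reduce_by_key` and `group_by_key` and the sample `lines` are hypothetical, chosen only to show that one style merges values per key as they arrive while the other collects all values per key first.

```python
from collections import defaultdict


def reduce_by_key(pairs, func):
    """Merge values per key incrementally, in the style of reduceByKey()."""
    merged = {}
    for key, value in pairs:
        # Combine with the running result if the key was already seen.
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def group_by_key(pairs):
    """Collect every value per key first, in the style of groupByKey()."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical sample input; each word becomes a (word, 1) pair.
lines = ["fox jumped", "fox ran", "dog ran"]
pairs = [(word, 1) for line in lines for word in line.split()]

# Style 1: add the 1s together as they are encountered.
counts_reduce = reduce_by_key(pairs, lambda a, b: a + b)

# Style 2: gather the full list of 1s per word, then sum each list.
counts_group = {word: sum(ones) for word, ones in group_by_key(pairs).items()}

print(counts_reduce)
print(counts_group)
```

On a PySpark RDD of `(word, 1)` pairs, the corresponding calls would be `rdd.reduceByKey(lambda a, b: a + b)` and `rdd.groupByKey().mapValues(sum)`. Both produce the same counts; the first avoids materializing the full list of values per key.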