Learn the fundamentals of producing industrial-strength applications with the Hadoop ecosystem. In addition to the basics, we introduce advanced topics such as intelligent hashing, partition skew detection, Monte Carlo simulation, partition pruning, and predicate pushdown. A survey of emerging industry standards in data formats, messaging, and stream processing guides students toward future study.
What am I going to get from this course?
- Understand core Hadoop components, how they work together, and real-world industry best practices.
- How to produce industrial-strength MapReduce applications that meet high standards of quality and robustness.
- Learn to use the Hadoop APIs for basic data science tasks such as Monte Carlo simulation and data preparation.
- How to partition, reduce, sort, and join data using MapReduce to reproduce the results of common SQL queries.
- Leverage modern data storage formats to make MapReduce data processing faster and easier.
- Proper use of compression in large-scale environments.
- How to collect data using Flume and Sqoop.
- Data exploration using Hive, Pig, and Drill.
- How to create truly reusable user-defined functions (UDFs) that behave identically regardless of Hadoop distribution or version upgrade.
- Methods of exposing an API to enable Hadoop as a Service (HaaS).
- Future directions and trends in Big Data.