Course Description

Gartner, IBM, Accenture and many others have asserted that 80% or more of the world’s information is unstructured, and inherently hard to analyze. What does that mean? And what is required to extract insight from unstructured data? Unstructured data varies enormously in quality and format, because it is produced by humans who can be fastidious, unpredictable, ill-informed, or even cynical, but who are always unique and not standard in any way. Recent advances in natural language processing have made it possible to include unstructured content in data analysis.

Serious growth and value companies are committed to data. The exponential growth of Big Data has posed major challenges in data governance and data analysis, and good data governance is pivotal for business growth. It is therefore essential to slice and dice Big Data in ways that address both data governance and data analysis issues. To support high-quality business decision making, it is important to fully harness the potential of Big Data by implementing proper Data Migration, Data Ingestion, Data Management, Data Analysis, Data Visualization and Data Virtualization tools.

What am I going to get from this course?

On completion of this course, students will have an in-depth understanding that will help them to:

Architect Big Data Infrastructure
Use Big Data (Hadoop) tools and techniques
Design and Deploy MapReduce processing
Perform Big Data Migration from Oracle to HIVE
Plan & Implement Big Data ETL
Manage Big Data
Analyze and Visualize Big Data

These skills will be applied to projects that are either storage-driven or application-driven. These projects may serve the following end goals:

Big Data Ingestion (ETL)
Big Data Management (Apache HIVE Datawarehouse)
Big Data Visualization and Analytics (Tableau / 3D - Dashboard)
Big Data Migration
Big Data Integration

Prerequisites and Target Audience

What will students need to know or do before starting this course?

Good understanding of recent developments and trends in IT hardware and software
Good understanding of the Java Virtual Machine and of Compute, Network, and Storage hardware
Good understanding of distributed processing and cloud computing

Who should take this course? Who should not?

IT middle- and senior-level managers
BI Architects
BI Leads
Senior Java / system programmers
Big Data Landscape enthusiasts

Curriculum

Module 1: Introduction to Big Data

Lecture 1 Business Value of Big Data

This class will focus on: (1) Why Big Data is a big leap forward from the Business Intelligence world of the past, and (2) Various ways to slice and dice Big Data to extract maximum value from it.

Lecture 2 Rapid Growth of Big Data, Big Data Definition, and Big Data Projects

This class will focus on: (1) Understanding the primary drivers for the growth of Big Data and why the health care industry is the most involved in Big Data analytics, (2) Understanding what Big Data is, the hidden value in it, and how new architectures, algorithms, and techniques can be used to extract that hidden value, and (3) Understanding the broad characteristics of Big Data projects.

Module 2: Big Data Implementation

Lecture 3 Hadoop Eco System, Hadoop Infrastructure, and Hadoop JVM Framework

This class will focus on: (1) How the Hadoop Eco System and Hadoop Infrastructure exploit the latest technologies to support efficient, distributed processing of massive amounts of data, (2) How to harness the capability of Virtual Machines, which enables the use of a large number of inexpensive commodity servers, and (3) How Hadoop Infrastructure capitalizes on Compute, Network and Storage technologies.

Lecture 4 Hadoop Distributed File System (HDFS) and associated tutorials

This class will focus on: (1) How Hadoop Version 2 manages the cluster of Virtual Machines, (2) How HDFS incorporates a fast, efficient and fault-tolerant design, and (3) File and directory manipulation commands that are used on HDFS.

Lecture 5 MapReduce Software, MapReduce Processing and associated tutorial

This class will focus on: (1) How the components of the Hadoop Eco System are packaged in the Cloudera distribution bundle, which is designed to run on Virtual Machine clusters, (2) How MapReduce splits the input dataset into independent chunks that are processed by MapReduce tasks in a completely parallel manner, and (3) Pseudo code for MapReduce Java classes such as Mapper, Reducer, etc.
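
To make the split / map / reduce flow concrete, here is a minimal word-count sketch in plain Python. It is only an in-process illustration of the idea, not Hadoop code and not the Java Mapper and Reducer classes covered in the lecture. All function names (`split_input`, `map_task`, `shuffle`, `reduce_task`, `word_count`) are hypothetical, and a thread pool stands in for the cluster's parallel map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_splits):
    """Divide the input into independent chunks (the "input splits")."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_task(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group emitted values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines, num_splits=3):
    chunks = split_input(lines, num_splits)
    # Each chunk is mapped independently, so the map tasks can run in parallel.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(map_task, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


# Illustrative input only; a real job would read its splits from HDFS.
sample = ["big data is big", "hadoop processes big data", "data is valuable"]
counts = word_count(sample)
print(counts)
```

Because no map task depends on another chunk's data, the same logic scales from three threads on one machine to many tasks spread across a cluster; only the shuffle step needs to bring values for the same key together.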

Module 3: Big Data Migration

Lecture 6 Apache SQOOP - Data Migration, SQOOP commands, and HIVE arguments

This class will focus on: (1) Apache SQOOP as a powerful data exchange tool, and (2) SQOOP command line interface commands for migrating data from an Oracle RDBMS to Cloudera Hive.
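
As a rough idea of what such a migration command can look like, the Python sketch below assembles a Sqoop 1 `import` command line for an Oracle-to-Hive transfer and prints it. It does not run Sqoop. The helper name `build_sqoop_import`, the host, service name, user, password file path, and table names are all placeholder assumptions, not values from the course; the options shown (`--connect`, `--username`, `--password-file`, `--table`, `--hive-import`, `--hive-table`, `--num-mappers`) are standard Sqoop 1 import arguments.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file,
                       source_table, hive_table, num_mappers=4):
    """Return the argument list for a Sqoop import that loads a table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # avoids putting the password on the command line
        "--table", source_table,            # source table to read
        "--hive-import",                    # load the imported data into Hive
        "--hive-table", hive_table,         # target Hive table
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
command = build_sqoop_import(
    jdbc_url="jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCLPDB",
    username="etl_user",
    password_file="/user/etl_user/.oracle_password",
    source_table="ORDERS",
    hive_table="orders",
)
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument separate, which makes it straightforward to pass to `subprocess.run` on a machine where Sqoop is installed.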

Lecture 7 SQOOP Architecture and associated tutorial

This class will focus on: (1) Salient features of Apache SQOOP, such as connectors for all major RDBMSs to load data into Apache HIVE, and (2) SQOOP Architecture and how its components interact to facilitate data transfer between legacy Enterprise Data Warehouses / RDBMSs and HDFS / Apache HIVE.

Module 4: Big Data Ingestion / Big Data Management

Lecture 8 Tools & Techniques - Informatica BDM and HIVE

This class will focus on: (1) The importance of Informatica BDM for Data Ingestion and HIVE for Data Management as an effective way to build a Big Data repository for data analytics, and (2) HIVE architecture and how it supports the HIVE Web Interface and the HIVE Command Line Interface.

Lecture 9 High Level Tasks to set up Big Data Business Intelligence Application

This class will focus on: (1) The sequence of tasks required to build a Big Data business intelligence application that will be instrumental in extracting business value from Big Data, and (2) The technical architecture of a Big Data business intelligence application.

Module 5: Big Data Visualization

Lecture 10 Success Factors for Big Data Analytics and TABLEAU

This class will focus on: (1) Implications of scale, velocity and scope of Big Data, (2) Characteristics of a great Data Visualization tool such as TABLEAU, and (3) Importance of Type 3 data that provides actionable insights.

Lecture 11 3-D Dashboards (Fast, Wide and Deep) and TABLEAU Architecture

This class will focus on: (1) Fast, Wide and Deep (3-D) dashboards that are the result of (i) streaming analytics of clickstream data, (ii) analysis of real-time data, and (iii) machine learning, and (2) TABLEAU Architecture, which uses a proprietary technology that makes interactive data visualization an integral part of understanding data.

Module 6: Cloud Computing

Lecture 12 Cloud Computing versus Hadoop Processing and effective use of Cloud Computing in Big Data

This class will focus on: (1) Difference between Cloud Computing and Hadoop Processing, (2) Why Big Data is converging towards Cloud Computing, and (3) Why IaaS is the preferred cloud type for Big Data applications.

Quiz 1

March Towards Big Data - Big Data Implementation, Migration, Ingestion, Management, & Visualization

Instructor

Nasir Raheem

Learn how to slice and dice big data

Duration: 2h 3m
