An industry-recognized certification lets you add this credential to your resume upon completion of all courses.

Abdallah Bari, Instructor - Why Big Data is so important today?

Abdallah Bari

Dr. Abdallah Bari holds a Ph.D. in Genetics (imaging techniques) from the University of Cordoba (Spain, 2005), with Ph.D. equivalency (Canada, 2010). He is a researcher, founder and author with a focus on applied mathematics and networking in research and development, working collaboratively to develop practical solutions. He collaborates with academia, researchers, international research organizations, governmental institutions, communities, municipalities and development agencies. He advocates the use of mathematics in research and development to address complexity and global challenges, including climate change and sustainability. He has helped with capacity development in a number of countries, including training on applied mathematics spanning data capture, descriptive analytics, and predictive and prescriptive analytics, leading to results considered breakthroughs. He has developed several databases and tailored applications used by national and international organizations; some of these applications have been translated from English into French and Russian. He has published peer-reviewed articles and books, and some of his work has been reported by international media such as BBC News and Life Sciences News.

Big Data is a game-changing opportunity today with unprecedented challenges

  • Implement techniques and tools using the most powerful IDEs combining R and Spark to integrate, prepare and analyze Big Data.
  • The course helps you develop pertinent strategies to better leverage Big Data and understand the challenges it poses across sectors.
  • The instructor is a researcher and author whose work has been reported by news media outlets such as BBC News and Life Sciences News.

Course Description

In less than a decade, Big Data has grown at an unprecedented pace with the spread of cloud infrastructure. Organizations that adopted a Big Data strategy are now at the forefront of research and development. This course aims to help you develop strategies to better leverage Big Data in today’s data-driven economy. It covers a wide range of techniques for addressing Big Data’s challenges, with the aim of paving the way to new opportunities. The overall objective is to help you apply different techniques and tools to address Big Data challenges and to scale Big Data analytics with originality. The course is intended for Data Engineers working on data integration and data preparation, including ETL processes; Data Scientists working on scaling Big Data analytics; Researchers working on Big Data discovery; Policy makers working on Big Data to address today’s challenges across sectors; and anyone who would like to learn different techniques to address Big Data challenges and become a Big Data savvy professional.

What am I going to get from this course?

By the end of this course, attendees will be able to develop pertinent strategies to better leverage Big Data. They will gain a thorough and intuitive understanding of the new opportunities and challenges of Big Data, and learn the much-needed skills to address these challenges across data integration, data preparation and analytics, including emerging analytics. They will also implement different techniques and tools using one of the most powerful integrated development environments (IDEs), combining R and Spark, to integrate, prepare and analyze Big Data.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

The following would help, but none is a strict prerequisite for this course:
  • Knowledge of relational database management systems (RDBMS) and/or SQL (Structured Query Language) or NoSQL (Not only SQL),
  • Concepts and notions of algebra (inverse problems) and probability,
  • Some familiarity with developing algorithms and coding (MATLAB or R).

Who should take this course? Who should not?

This course is intended for people working on Big Data across sectors and for people who want to become Big Data savvy professionals. It is specifically designed for:
  • Data Engineers working on data integration and data preparation, including ETL/ELT (Extract, Transform and Load, or Extract, Load and Transform) processes,
  • Data Scientists working on scaling Big Data processing and Big Data analytics and developing Machine Learning (ML or AI) applications,
  • Researchers and Scientists working on Big Data for Discovery (drug or gene discovery) or for testing new algorithms,
  • Policy makers working on Big Data to address today’s challenges across sectors including education, 
  • People who would like to learn different techniques to address Big Data challenges spanning data integration, data preparation and data analytics, including emerging analytics.


Module 1: Big Data - Opportunities & Challenges

Lecture 1 Opportunities and challenges of Big Data

With the arrival of Big Data, there are new opportunities as well as new challenges that are more subtle than they appear. Big Data has brought radical changes: not only paradigm shifts but also shifts in jobs, which now require new skills ranging from data engineering to data science. Some of these newly required skills have been described as inquiry-type or “soft” skills.

Quiz 1 Financial Stock Market Trends

This quiz is meant to help you explore the tools and techniques introduced throughout the course. A code file is attached to help extract data on financial stock market trends. Use this file to display the trends of companies such as Amazon, Apple and IBM.

Module 2: Big Data Integration & Preparation

Lecture 2 Big Data Integration - Processing & Streaming of Big Data

Big Data has grown in volume, velocity and variety, requiring its integration and processing in real time. Processing such large, streaming data is a key operational challenge for major industries today. Tools such as the Hadoop ecosystem can help with Big Data integration. The processing can be scaled horizontally, by distributing the work across additional machines (scale-out, as in a Hadoop cluster), or vertically, by adding capacity to a single machine (scale-up, for example more CPU cores or GPUs that process instructions in parallel).

Lecture 3 Big Data Preparation - Extract, Transform and Load (ETL or ELT)

Big Data is not only growing rapidly but also expanding to include other types of mostly unstructured and "un-formatted" data, such as image data, sensor data, text, web-related data and JavaScript Object Notation (JSON) data. Beyond integration, these data require laborious preparation prior to analysis, a process estimated to take up to 80% of the time needed to carry out an in-depth data analysis. Data preparation is as decisive as data integration, since it puts the data in the right format for data analytics and for feeding ML models. Data preparation includes ETL processes; ETL stands for “extract, transform and load”. Data preparation may overlap with data integration, and its tools can sit on top of data integration tools: Hive, for example, is built on top of Hadoop to carry out ETL and scale data preparation.
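The extract, transform and load steps can be illustrated with a minimal sketch. It is written in Python with only the standard library purely for illustration (the course itself works in R and Spark), and the sensor records, field names and the `readings` table are hypothetical, not taken from the course material.

```python
import json
import sqlite3

# Hypothetical raw sensor readings in JSON form (illustrative data only).
RAW_JSON = """[
  {"sensor": "a1", "temp_c": "21.5", "ts": "2020-01-01T00:00:00"},
  {"sensor": "a1", "temp_c": null, "ts": "2020-01-01T00:10:00"},
  {"sensor": "B2", "temp_c": "19.0", "ts": "2020-01-01T00:00:00"}
]"""


def extract(raw):
    """Extract: parse the raw JSON text into a list of dictionaries."""
    return json.loads(raw)


def transform(records):
    """Transform: drop incomplete records, normalize names, cast types."""
    rows = []
    for rec in records:
        if rec.get("temp_c") is None:
            continue  # skip readings with a missing temperature
        rows.append((rec["sensor"].lower(), float(rec["temp_c"]), rec["ts"]))
    return rows


def load(rows, conn):
    """Load: write the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor TEXT, temp_c REAL, ts TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw=RAW_JSON):
    """Run the three steps against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw)), conn)
    return conn
```

In an ELT variant, the raw records would be loaded first and the transformation would run inside the target system, for example as SQL.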

Quiz 2 Data Integration - Parallel Processing

This quiz can be carried out either by connecting to data sets over the internet (cloud) or locally. A case is attached for practice and as a quiz on parallel processing: first find the number of processors (cores) in your computer, then apply parallel processing accordingly.
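As a rough illustration of the idea (not the attached quiz file, which is not reproduced here), the sketch below detects the number of cores and splits a computation across that many worker processes. It uses Python's standard library for illustration; the function names and the sum-of-squares workload are assumptions chosen for the example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_cores():
    """Return the number of logical cores, falling back to 1 if unknown."""
    return os.cpu_count() or 1


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal parts."""
    n_chunks = max(1, min(n_chunks, len(values)))
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by each process: sum the squares of one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=None):
    """Distribute the chunks over worker processes and combine the results."""
    workers = workers or count_cores()
    chunks = split_into_chunks(values, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

When run as a script, the call to `parallel_sum_of_squares` should sit under an `if __name__ == "__main__":` guard, which process pools require on platforms that start workers by spawning a new interpreter.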

Module 3: Big Data Analytics & At Work

Lecture 4 Big Data Analytics

As with big data integration and big data preparation, big data analytics will continue to be challenging, as it is expanding to include not only predictive analytics but also prescriptive analytics and emerging analytics such as edge analytics. This lecture introduces different types of analytics, with a focus on machine learning, in particular supervised and unsupervised learning. It presents an effective strategy for carrying out big data analytics, along with ways to measure the performance and accuracy of different ML techniques, with demos and examples to test them.
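One common way to measure the performance of a supervised classifier is to compare its predictions with the known labels and compute accuracy, precision and recall. The sketch below shows this in plain Python for illustration; the choice of these three metrics and the function names are assumptions, and the lecture may use other measures.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall (0.0 where undefined)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Accuracy alone can be misleading when one class is rare, which is why precision and recall are usually reported alongside it.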

Lecture 5 Big Data at Work

This lecture focuses on implementing the different techniques presented throughout this course, from data integration to data analytics. The implementation is carried out using R, via its IDE RStudio, with Apache Spark. R, as a functional programming language, uses two key concepts, functions and objects, to prepare and analyze any type of data. Spark likewise builds on two key concepts, a distributed file system (DFS) and MapReduce-style processing, to scale the processing of Big Data. Spark is similar to Hadoop; unlike Hadoop, however, it processes data in memory, which makes streaming faster and closer to real time. Combined, R and Spark allow streaming analytics to scale, mainly through horizontal scaling. Spark also provides SQL and Machine Learning capabilities. In this lecture, you will learn how to apply the combined R/Spark magic to merge, prepare, query and analyze Big Data and to keep up with Big Data's rapid growth.
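The MapReduce idea mentioned above can be sketched on a single machine as three phases: map, shuffle and reduce. The example below is plain Python for illustration only; it does not use Spark or R, and the word-count task and function names are assumptions chosen because they are the customary teaching example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases into a word-count job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In a distributed engine, the map and reduce phases run on many machines at once and the shuffle moves the grouped data between them; here everything runs in one process to keep the logic visible.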

Quiz 3 Data Analytics and Data at Work

This quiz consists of two parts. The first part relates to the use of R and Spark as platforms to develop functions (instructions) and data sets as data frames (objects). The second part covers the use of different types of data sets, including a large data set (flight data set), and different techniques for preparing and analyzing data.