Tidy Your Data Before Using It in Machine Learning Algorithms

Understand what data preprocessing is and why it is needed as part of an overall data science and machine learning methodology.
Be able to summarize your data by using some statistics and data visualization.
The instructor has 25 years of experience with data design, data architecture, and analytics. He holds two graduate degrees in Information Systems & Management with a Ph.D. in IT.

Course Description

We know that data is very messy and comes in a variety of forms. As part of the overall data mining and machine learning process, we must take the time to preprocess our data. This means ensuring that the data is structured and cleansed, and addressing any problems it may have. Preprocessing includes gaining a better understanding of the data through descriptive statistics and data visualization techniques. It also includes ensuring that missing data and outliers are handled appropriately.
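To make the steps in this description concrete, here is a minimal sketch of summarizing a column, flagging outliers, and filling a missing value. It is not taken from the course materials: the `ages` sample is invented, the 1.5 * IQR rule and median imputation are just one common choice among many, and only the Python standard library is used (`statistics.quantiles` needs Python 3.8 or later).

```python
import statistics

# Hypothetical sample: one missing value (None) and one suspicious entry (240).
ages = [23, 31, 27, None, 35, 29, 240, 33]

# Step 1: separate observed values from missing ones.
observed = [a for a in ages if a is not None]
n_missing = len(ages) - len(observed)

# Step 2: descriptive statistics on the observed values.
summary = {
    "count": len(observed),
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "stdev": statistics.stdev(observed),
}

# Step 3: flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in observed if a < low or a > high]

# Step 4: drop the outliers, then fill the missing value with the
# median of the remaining values (one simple imputation strategy).
clean = [a for a in observed if low <= a <= high]
fill = statistics.median(clean)
imputed = [fill if a is None else a
           for a in ages
           if a is None or low <= a <= high]

print("missing:", n_missing)
print("summary:", summary)
print("outliers:", outliers)
print("imputed:", imputed)
```

Whether an outlier should be dropped, capped, or kept depends on the data set and the question being asked; the sketch drops it only to keep the example short.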

What am I going to get from this course?

Understand what data preprocessing is and why it is needed as part of an overall data science and machine learning methodology
Review and understand data quality issues and how to address them
Apply specific Python functions to assist in cleansing and transforming your data
Be able to summarize your data by using some statistics and data visualization

Prerequisites and Target Audience

What will students need to know or do before starting this course?

Programming Knowledge in Python

Lists, variables, loops, etc.

Basic Statistics Knowledge

Inferential and Descriptive Statistics

Python loaded onto your computer.

I use Spyder IDE and the Anaconda distribution.
I have Python 3.6.1 on my machine, so version 3.6 or any later version will work.
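If you are unsure which Python version is installed, a quick check like the following can help. This is only an illustrative sketch; the `MINIMUM` tuple is an assumption derived from the version mentioned above.

```python
import sys

# Assumed minimum, based on the version the instructor mentions.
MINIMUM = (3, 6)

# sys.version_info compares element-wise against a plain tuple.
meets_minimum = sys.version_info >= MINIMUM

print("Python", sys.version.split()[0])
print("OK for this course" if meets_minimum else "Please upgrade Python")
```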

Who should take this course? Who should not?

Individuals with basic Python & statistics knowledge can take this course.

Curriculum

Module 1: Introduction to Data Preprocessing

Lecture 1 What is data preprocessing?

Lecture 2 What is dirty data?

Lecture 3 Structuring Data

Lecture 4 Overview of Data Cleansing

Module 2: Data Quality

Lecture 5 Data Quality

Lecture 6 Data Quality Challenges

Lecture 7 Raw Files and File Formats

Lecture 8 Structured Data

Lecture 9 Finding Data Sets

Lecture 10 Loading Data into Python

Lecture 11 Loading Data Into Python Part 2

Module 3: Summarizing Data with Statistics

Lecture 12 Review of Basic Statistics

Lecture 13 Summarizing Data with Python

Module 4: Data Visualization

Lecture 14 Introduction to Data Visualization

Lecture 15 EDA and CDA

Lecture 16 Creating a Histogram

Lecture 17 Box Plots

Lecture 18 Bar Graphs

Lecture 19 Other Graphs

Module 5: Data Cleansing

Lecture 20 Missing Data Part 1

Lecture 21 Missing Data Part 2

Lecture 22 Outlier Detection Part 1

Lecture 23 High-Dimensional Data

Lecture 24 Outlier Detection Part 2

Module 6: Feature Scaling

Lecture 25 Introduction to Feature Scaling

Lecture 26 Final Thoughts
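The outline does not say which scaling methods Module 6 covers, so the following is only a sketch of two widely used approaches, min-max scaling and standardization (z-scores). The function names and the `incomes` sample are hypothetical, and only the standard library is used.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical feature on a much larger scale than, say, age in years.
incomes = [30_000, 45_000, 60_000, 90_000]

scaled = min_max_scale(incomes)
z_scores = standardize(incomes)

print("min-max:", scaled)
print("z-scores:", z_scores)
```

Min-max scaling is sensitive to outliers because a single extreme value stretches the range, which is one reason outlier handling usually comes before scaling.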

Instructor

Dr. Rich Huebner

Duration: 2h 04m
