Self-Paced Online Learning

Unlimited access and Capstone project reviewed by the Instructor

USD 149
Instructor Led Online Classes

Four 2-hour workshops. Personal walkthrough of 3 mini projects and a capstone project.

Sep 03 Sat

Extended - Late Summer (Saturday)

Sep 03 to Sep 24

Sep
  • 3 Sat

    Session 1

    11:00 AM to 01:00 PM (EST)

    This session will cover the following modules in detail:
    • Module 1 - Introduction
    • Module 2 - Data Sources
    • Module 3 - Obtaining Data
    We will end with an introduction to the first Mini-Project, which students should complete on their own time after this session.

  • 10 Sat

    Session 1

    11:00 AM to 01:00 PM (EST)

    We will begin this session with a detailed review of the first Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following modules in detail:
    • Module 4 - Cleaning Data
    • Module 5 - Shaping Data
    We will end with an introduction to the second Mini-Project, which students should complete on their own time after this session.

  • 17 Sat

    Session 1

    11:00 AM to 01:00 PM (EST)

    We will begin this session with a detailed review of the second Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following module in detail:
    • Module 6 - Features/Variables
    We will end with an introduction to the third Mini-Project, which students should complete on their own time after this session.

  • 24 Sat

    Session 1

    11:00 AM to 01:00 PM (EST)

    We will begin this session with a detailed review of the third Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following modules in detail:
    • Module 7 - Exporting & Saving
    • Module 8 - Data Pipeline
    • Module 9 - Conclusion & Capstone
    We will end with an introduction to the Capstone and instructions on how to submit it for personalized feedback. The Capstone should be completed on the student's own time after this session and submitted directly to the instructor.

Oct 19 Wed

Extended - Fall (Wednesday)

Oct 19 to Nov 09

Oct
  • 19 Wed

    Session 1

    05:00 PM to 07:00 PM (EST)

    This session will cover the following modules in detail:
    • Module 1 - Introduction
    • Module 2 - Data Sources
    • Module 3 - Obtaining Data
    We will end with an introduction to the first Mini-Project, which students should complete on their own time after this session.

  • 26 Wed

    Session 1

    05:00 PM to 07:00 PM (EST)

    We will begin this session with a detailed review of the first Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following modules in detail:
    • Module 4 - Cleaning Data
    • Module 5 - Shaping Data
    We will end with an introduction to the second Mini-Project, which students should complete on their own time after this session.

Nov
  • 2 Wed

    Session 1

    05:00 PM to 07:00 PM (EST)

    We will begin this session with a detailed review of the second Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following module in detail:
    • Module 6 - Features/Variables
    We will end with an introduction to the third Mini-Project, which students should complete on their own time after this session.

  • 9 Wed

    Session 1

    05:00 PM to 07:00 PM (EST)

    We will begin this session with a detailed review of the third Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following modules in detail:
    • Module 7 - Exporting & Saving
    • Module 8 - Data Pipeline
    • Module 9 - Conclusion & Capstone
    We will end with an introduction to the Capstone and instructions on how to submit it for personalized feedback. The Capstone should be completed on the student's own time after this session and submitted directly to the instructor.

Jan 24 Tue

Compact - January (Weekdays)

Jan 24 to Feb 02

Jan
  • 24 Tue

    Session 1

    05:00 PM to 07:00 PM (CST)

    This session will cover the following modules in detail:
    • Module 1 - Introduction
    • Module 2 - Data Sources
    • Module 3 - Obtaining Data
    We will end with an introduction to the first Mini-Project, which students should complete on their own time after this session.

  • 26 Thu

    Session 1

    05:00 PM to 07:00 PM (CST)

    We will begin this session with a detailed review of the first Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following modules in detail:
    • Module 4 - Cleaning Data
    • Module 5 - Shaping Data
    We will end with an introduction to the second Mini-Project, which students should complete on their own time after this session.

  • 31 Tue

    Session 1

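    05:00 PM to 07:00 PM (CST)

    We will begin this session with a detailed review of the second Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following module in detail:
    • Module 6 - Features/Variables
    We will end with an introduction to the third Mini-Project, which students should complete on their own time after this session.
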
Feb
  • 2 Thu

    Session 1

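    05:00 PM to 07:00 PM (CST)

    We will begin this session with a detailed review of the third Mini-Project, which was introduced at the end of the previous session.

    The session will then cover the following modules in detail:
    • Module 7 - Exporting & Saving
    • Module 8 - Data Pipeline
    • Module 9 - Conclusion & Capstone
    We will end with an introduction to the Capstone and instructions on how to submit it for personalized feedback. The Capstone should be completed on the student's own time after this session and submitted directly to the instructor.
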
Need Custom Training for Your Team?

Call Us

Toll Free (844) 397-3739

Instructor

Dr. Connie Brett

Dr. Connie Brett is a successful Data Scientist, Entrepreneur and Educator who has spent the past 15+ years implementing and coaching analytics teams across the entire SDLC. She brings a unique perspective to the problems faced in all phases of planning, developing, and using online products and solutions, which helps her teach you how to use analytics tools in the most effective way. With an M.S. and Ph.D. in Computational Chemistry from The Ohio State University, she worked in the quagmire of data problems, preparation, and analysis long before the terms "Data Science" and "Big Data" were coined. She has been published in peer-reviewed journals and recently filed for a US Patent on a Data Visualization Framework.

Data Wrangling in R

Instructor: Dr. Connie Brett

Real-world data preparation for further analysis using R

  • Learn from start to finish how to get your data into R efficiently and polish it up so that it is as good as it can be.
  • The instructor is the founder of the Analytics Incubation Center at Cisco and has 15 years of analytics development experience.
  • Capstone project reviewed by the instructor.

Course Description

R is an extraordinarily powerful language with a vast community of great resources, but where should you start when all you want to do is get your data into a usable format? How do you know when your data is ready? What are the pitfalls you should watch for so that you don’t perform an analysis on bad data? This course will teach you from start to finish how to get your data into R efficiently and polish it up so that it is as good as it can be. After this step, you or your team can focus on statistical modeling, visualization, reporting, sharing, or any other post-processing task you wish to perform. Confidence, reliability, and reproducibility in your data acquisition and preparation are the keys to maximizing your data’s value.

This course uses a variety of real-world data sets that contain real-world data quality, formatting, and other issues. It will ensure that you understand not just the R syntax to perform a task, but also the sources of quality issues, how to recognize hidden data problems, and the benefits and adverse effects of the most common data manipulations. This course will give you real experience in the art and science of data preparation that you can carry forward to your next real project with confidence.

The capstone project utilizes open agricultural industry data in preparation for a future statistical analysis of the products and brands of the companies. Like a real project, the project goals and background are provided but the step-by-step data preparation is not given; the course will have provided the methods and insights needed to prepare this data for future statistical analysis. The capstone project is reviewed by the instructor, and feedback is individually provided to each student in the course along with a full project solution.

What am I going to get from this course?

  • Understand the R syntax to perform a task
  • Identify sources of quality issues
  • Recognize hidden data problems
  • Understand benefits/detriments of the most common data manipulations
  • Prepare a real-world dataset for future statistical analysis and utilize the capstone project as a portfolio piece.

Prerequisites and Target Audience

What will students need to know or do before starting this course?
  • RStudio installed (optional, but strongly suggested)
  • R installed
  • Basic R programming knowledge
Who should take this course? Who should not?
  • Students do not need to be R experts to take this course, but should have a basic knowledge of how to use R.
  • Students should be people who work with data in R and want to better understand how to prepare data for analysis correctly and efficiently.

Curriculum

Module 1: Introduction
Lecture 1 Introduction to the Course
05:30

Course Objectives, Audience and Instructor Information.

Lecture 2 Course Slides

Download the entire set of Course Slides as a PDF so you can take notes as you work through this course.

Module 2: Data Sources
Lecture 3 Importance of Metadata
02:11

An overview and understanding of why metadata is important.

Lecture 4 Collection Bias
02:05

Understanding Collection Bias and why it is critical to keep in mind during data collection and analysis.

Lecture 5 Public Data Sources
03:24

Using public data sources including best practices.

Lecture 6 Private Data
10:28

Defining and understanding private data.

Module 3: Obtaining Data
Lecture 7 Database Connections
02:40

Connecting to and querying data directly from databases in R

Lecture 8 Files
07:00

Obtaining data from various file types and formats

Lecture 9 Hadoop
03:51

Interacting with Hadoop data stores in R

Lecture 10 Mini-Project 1

In this project we are going to obtain the data used in the mini-projects. Complete this project before the quiz!

Quiz 1 Mini-Project 1

Questions related to Mini-Project 1.

Module 4: Cleaning Data
Lecture 11 HTML
04:31

Dealing with HTML encoding in fields

Lecture 12 JSON
03:43

Dealing with JSON formatted data

Lecture 13 Excel
06:19

Excel-specific data cleaning issues and tips.

Lecture 14 Whitespace/Languages
02:05

Handling whitespace and multi-language issues in R

Lecture 15 Units and Conversions
01:48

Handling unit conversions

Lecture 16 Data Type Issues
01:55

Common data type issues

Lecture 17 Categorical Creep
03:47

Recognizing and solving categorical "creep" or spread

Lecture 18 Minor Corrections
01:36

Best practices for minor corrections

Lecture 19 Completeness
03:50

Overview of detecting and handling completeness issues during data cleaning

Lecture 20 Accuracy
01:31

Notes on accuracy considerations while cleaning data

Module 5: Shaping Data
Lecture 21 Long vs. Wide Formats
03:09

Understanding and converting between these commonly referenced data shapes

Lecture 22 Combined Data
02:22

Separating combined data in a single field

Lecture 23 Column & Row Names
03:31

Capturing data contained in column and row names

Lecture 24 Internally Structured Data
03:53

Flattening data with embedded structured data

Lecture 25 Internal Lists
04:22

Handling lists inside fields

Lecture 26 Naming Columns
00:58

Quick best practices and considerations when naming columns

Lecture 27 OLAP Cubes
01:59

Using OLAP cube data in R

Lecture 28 Mini-Project 2

In this project we are going to prepare the data from Mini-Project 1 for analysis. Complete this project before the quiz!

Quiz 2 Mini-Project 2

Questions related to Mini-Project 2

Module 6: Features/Variables
Lecture 29 Introduction
01:14

Introducing Feature/Variable Selection

Lecture 30 Elimination - Variance
05:06

Eliminating features with zero or near-zero variance

Lecture 31 Elimination - Correlation
05:39

Eliminating features using correlation

Lecture 32 Feature Creation
03:35

Finding and creating features

Lecture 33 Examining Distributions
02:26

Examining variable distributions - continuous data

Lecture 34 Finding Rare Events
02:32

Finding rare events in data that may signal an issue

Lecture 35 Normalization
03:52

Normalizing and rescaling data

Lecture 36 Advanced Preprocessing
02:28

Handling less-common data preprocessing scenarios such as baseline removal.

Lecture 37 Wrap-Up
00:53

Comments on selecting features/variables

Lecture 38 Mini-Project 3

In this project we are going to refine the dataset by feature manipulation. Complete this project before the quiz!

Quiz 3 Mini-Project 3

Questions related to Mini-Project 3

Module 7: Exporting & Saving
Lecture 39 Exporting & Saving Prepared Data
05:07

Tips, tricks and notes about exporting and saving your prepared data

Module 8: Data Pipeline
Lecture 40 Working with R in a Data Pipeline
06:25

Considerations when wrangling data as part of a data pipeline.

Module 9: Conclusion & Capstone
Lecture 41 Next Steps and Additional Resources
03:52

Course Wrap-up

Lecture 42 Capstone Project

Instructions for the Capstone Project. The capstone project utilizes open agricultural industry data in preparation for a future statistical analysis of the products and brands of the companies. Like a real project, the project goals and background are provided but the step-by-step data preparation is not given; you will use the methods you learned in the class to prepare this data for the project's future statistical analysis.

Reviews

1 Review

Xiao X

December, 2016