### Course Description

This course begins with a basic introduction that describes why decision trees are useful tools and how they differ from more traditional analytical tools like linear regression. We’ll also cover some basics of R, as the examples in this course will use the R programming language to analyze data.
The second module then dives into so-called “regression trees”, or decision trees for continuous target variables (i.e., variables that take on numeric values, like sales amounts or number of purchases). It provides a theoretical basis for these models as well as practical examples and use cases.
The third module is very similar to the second, except that it treats categorical target variables (e.g., product type of next purchase) instead of continuous ones.
In the fourth module, we’ll talk about random forests and the idea of combining many individual classification or regression trees to make one final, improved prediction.
Module 5 builds on the idea of random forests, but presents a slightly different framework with boosted trees. You’ll learn about an implementation of boosted trees, XGBoost, which is one of the most popular tree algorithms and has been used extensively for machine learning problems.

#### What am I going to get from this course?

Learn how decision trees, random forests, and boosted tree models work, and how to interpret their results to drive business decisions.

### Prerequisites and Target Audience

#### What will students need to know or do before starting this course?

We will be using R for this course, but little prior knowledge of R is required (as long as you’re willing to learn a bit along the way). A basic understanding of mathematical functions and algorithms is also important.

#### Who should take this course? Who should not?

You should take this course if you want to get started with tree-based models.

You should take this course if you want to implement tree-based models in your daily work.

You should take this course if you are curious about the theoretical ideas behind machine learning models.

You should not take this course if you have successfully implemented and used random forests and/or XGBoost on datasets in the past, unless you didn’t understand what you were doing.

### Curriculum

Lecture 1
R Installation and Basics

10:11

Since we'll be using R for this course, this lecture gives a quick overview of how to install R and RStudio. We'll also look at some basics of R, without going into much depth.

Lecture 2
Interpreting Linear Models

03:24

In this lecture, we'll examine how linear models handle interactions between features and, in particular, the challenges of interpreting linear models. In later lectures, we'll compare these models to tree-based models.

Lecture 3
Basics of Decision Trees

06:40

We'll discuss some basic terms and definitions of decision trees, and we'll look at a few very simple examples to see what tree models can look like.

Lecture 4
Modeling Non-Linearity

04:23

We'll discuss how trees can handle input features that have highly non-linear relationships with the target variable, and compare this to linear regression, where every form of non-linearity must be specified explicitly via transformations of the input features.

#### Module 2: Regression Trees

01:36:32

Lecture 5
Dataset Introduction

17:54

We'll explore a sample dataset containing sales of 11 different orange juice products at 83 stores over the course of 120 weeks. This dataset will be used extensively in the rest of the course.

Lecture 6
Boolean Features

06:05

We'll discuss Boolean features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single Boolean feature.

Lecture 7
Categorical Features

03:59

We'll discuss categorical features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single categorical feature.

Lecture 8
Continuous Features

05:49

We'll discuss continuous features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single continuous feature.

Lecture 9
Split Finding

10:52

So far, we've seen examples of regression trees built from various types of features. In this lecture, we'll explore the rationale behind how these splits are chosen, and work through this theory with an example of one particular split.
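
As a rough sketch of the idea (not the course's own code), the function below scores every candidate threshold on one continuous feature by the total sum of squared errors of the two resulting child nodes, each of which predicts its own mean, and keeps the lowest-scoring threshold. The function name `best_split` and the toy data are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: exhaustive search for the best binary split on one
# continuous feature x, scoring each threshold by the summed squared error
# (SSE) of the left and right child nodes around their own means.
best_split <- function(x, y) {
  sse <- function(v) sum((v - mean(v))^2)
  xs <- sort(unique(x))
  # Candidate thresholds: midpoints between consecutive distinct values
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  scores <- vapply(thresholds, function(t) {
    sse(y[x <= t]) + sse(y[x > t])
  }, numeric(1))
  list(threshold = thresholds[which.min(scores)], sse = min(scores))
}

# Made-up toy data: the target jumps once x exceeds 3
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 11, 10, 20, 21, 20)
res <- best_split(x, y)
res  # threshold = 3.5, sse = 4/3
```

A full regression tree repeats this search over every feature, takes the best split overall, and then recurses into each child node.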

Lecture 10
Advantages and Limitations

10:19

We've already discussed several properties of decision trees, but in this lecture, we'll look more closely at some of the strengths and weaknesses of regression trees, particularly as compared to linear regression.

Lecture 11
Full Model

09:01

In this lecture, we'll examine a more realistic regression tree model using many different features.

Lecture 12
Visual Interpretation

12:26

After fitting a tree-based model, we usually first look at a plot of the decision tree showing its splits and the rules used to create them. However, it's also sometimes interesting to explore the relationships between individual features and the target, and in this lecture, we'll examine ways of understanding the relationship between one or two features (at a time) and the target.

Lecture 13
Complexity Parameters

09:49

Complexity parameters are used to control how complex or simple a regression tree is. We'll see some examples of various complexity parameters, and learn how they control the complexity of the tree.

Lecture 14
Optimizing Parameters

10:18

In order to create a good decision tree, we must find a good value for the complexity parameter. We'll discuss how we can pick good parameters for the model via a method known as cross-validation.
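
One common way to do this in R is with the `rpart` package (a recommended package bundled with standard R installations), which cross-validates a sequence of `cp` values while growing the tree. The sketch below assumes `rpart` and uses the built-in `mtcars` data as a stand-in for the course dataset. It picks the `cp` with the lowest cross-validated error (`xerror`) and prunes the tree back to it.

```r
library(rpart)

set.seed(42)  # cross-validation folds are random; fix the seed to reproduce
# Grow a deliberately large tree (small cp) with 10-fold cross-validation
fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(cp = 0.001, minsplit = 5, xval = 10))

# cptable has one row per candidate cp; "xerror" is the cross-validated error
cp_tab <- fit$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]

# Prune back to the subtree with the lowest cross-validated error
pruned <- prune(fit, cp = best_cp)
print(best_cp)
```

A common alternative is the one-standard-error rule: choose the simplest tree whose `xerror` lies within one `xstd` of the minimum, trading a little fit for a smaller tree.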

Estimate a model for `logmove` using `price_over_min`, `price_under_max`, `price_per_oz_over_min`, `price_per_oz_under_max`, and `classification` as features. Estimate a second model for `logmove` using `exp(price_over_min)`, `exp(price_under_max)`, `exp(price_per_oz_over_min)`, `exp(price_per_oz_under_max)`, and `classification`. Estimate a third model with `exp(logmove)` as the target and the same features as the first model. Compare the performance of the three models, and describe which one you think is best.

#### Module 3: Classification Trees

45:30

Lecture 15
Comparison with Regression Trees

07:29

We'll begin this module by discussing a new problem: how can we estimate a target that takes on one of many different categories? We'll look at why the techniques we used for regression trees can't be directly applied here, but we'll start to investigate how such models could be constructed.

Lecture 16
Goodness of Fit

14:04

We saw in the previous lecture why RMSE can't be applied to classification trees. In this lecture, we'll explore alternative error metrics and use them to measure how good a particular split of a classification tree is.
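
Two standard impurity measures for classification splits are Gini impurity and entropy (they are the two options `rpart` offers via `parms = list(split = "gini")` or `"information"`). Whether these match the lecture's exact metrics is an assumption. The base-R sketch below computes both, plus the size-weighted impurity of a candidate split; the helper names and labels are hypothetical.

```r
# Gini impurity: 1 minus the sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Entropy in bits; zero-probability classes are dropped to avoid 0 * log(0)
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Impurity of a split: child impurities weighted by child size (lower is better)
split_impurity <- function(left, right, impurity = gini) {
  n <- length(left) + length(right)
  length(left) / n * impurity(left) + length(right) / n * impurity(right)
}

parent <- c("A", "A", "A", "B", "B", "B")
gini(parent)                                        # 0.5
entropy(parent)                                     # 1
split_impurity(c("A", "A", "A"), c("B", "B", "B"))  # 0, a perfect split
split_impurity(c("A", "A", "B"), c("A", "B", "B"))  # 4/9, about 0.444
```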

Lecture 17
Boolean Targets

09:37

The simplest type of categorical variable is one that takes on only two values. This lecture presents an example of a classification tree with such a Boolean target.

Lecture 18
Targets with 3+ Categories

14:20

We'll generalize from the Boolean targets of the previous lecture to targets with three or more categories, and look at an example of this type of classification tree using the Orange Juice dataset.

Quiz 5
Module 3 Mini-Project

#### Module 4: Random Forests

38:44

Lecture 19
Introduction

05:59

We'll discuss the main idea behind random forests and develop some intuition around the basics of how they work with a simple example.

Lecture 20
Theory of Random Forests

08:22

We'll discuss the basic theory of random forests: why they work and why randomness is important. We'll also introduce the concept of out-of-bag observations.
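
To see where out-of-bag observations come from, the base-R sketch below draws one bootstrap sample of row indices (sampling with replacement, as a standard random forest does for each tree) and counts the rows that were never drawn. For large n, roughly (1 - 1/n)^n ≈ e^(-1) ≈ 36.8% of the rows end up out of bag for any given tree. The sample size here is arbitrary.

```r
set.seed(1)
n <- 1000

# One bootstrap sample: n row indices drawn with replacement
boot_idx <- sample(n, n, replace = TRUE)

# Rows never drawn are "out of bag" for the tree fit on this sample,
# so they can serve as a built-in test set for that tree
oob <- setdiff(seq_len(n), boot_idx)
oob_frac <- length(oob) / n
oob_frac  # close to (1 - 1/n)^n, roughly 0.368
```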

Lecture 21
Tuning the Number of Trees

10:00

Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of trees to fit in a random forest model.

Lecture 22
Tuning mtry

05:38

Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of features to try at each split in a random forest model.

Lecture 23
Interpretation/Importance

08:45

One advantage of tree-based models is their interpretability, and we lose much of that interpretability when we average 500 trees. However, there are still some measures we can look at to understand why a random forest makes the predictions it does, and we'll explore those in this lecture.

#### Module 5: Gradient Boosting

01:03:12

Lecture 24
Motivation

05:15

We'll look at an example of tree-based models with updated weights to motivate the idea of gradient boosting algorithms.

Lecture 25
Optimizing Loss Functions

09:42

We'll first look at the AdaBoost algorithm, one of the first boosting algorithms and a precursor of gradient boosting. We'll then discuss the idea of a loss function, and use it to understand how gradient boosting works.

Lecture 26
Tuning Gradient Boosting

06:28

We'll discuss many of the parameters available for tuning a gradient boosting model. In Lecture 28, we'll look at how to optimize these parameters while fitting a model.

Lecture 27
Simple Example

08:23

In this lecture, we'll see how to use XGBoost (a popular, efficient implementation of gradient boosting) within R.

Lecture 28
Finding the Optimal Model

20:26

We'll dive into the details of the XGBoost model and learn how to tune its various parameters efficiently in order to arrive at an optimized final model.

Lecture 29
Categorical Features in XGBoost

12:58

We've seen that the implementations of decision trees and random forests we've used handle categorical features directly, but the XGBoost implementation does not accept categorical features. We'll look at two ways of converting categorical features into numerical ones.
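
The lecture's two methods aren't named here, so as an assumption, the sketch below shows two common options in base R: one-hot (dummy) encoding with `model.matrix()` and integer (label) encoding with `as.integer()` on a factor. The data frame, its column names, and its levels are made up. A numeric matrix like `onehot` is the kind of input the `xgboost` R package expects, for example when building an `xgb.DMatrix()`.

```r
# Made-up example data with one categorical and one numeric feature
df <- data.frame(
  tier  = factor(c("premium", "store", "value", "premium")),
  price = c(3.5, 2.9, 2.2, 3.6)
)

# Approach 1: one-hot encoding, one 0/1 column per level.
# "- 1" drops the intercept so every level of tier gets its own column.
onehot <- model.matrix(~ tier + price - 1, data = df)

# Approach 2: integer (label) encoding, replacing each level by its code
# (levels are sorted alphabetically: premium = 1, store = 2, value = 3)
df_int <- transform(df, tier = as.integer(tier))

print(onehot)
print(df_int)
```

Integer encoding keeps a single column but imposes an arbitrary order on the levels, which trees can still split on but may need extra splits to exploit; one-hot encoding avoids the artificial order at the cost of one column per level.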

Quiz 8
Module 5 Mini-Project