This course begins with a basic introduction describing why decision trees are useful and how they differ from more traditional analytical tools like linear regression. We'll also cover some basics of R, as the examples in this course will use the R programming language to analyze data.
The second module then dives into so-called “regression trees”, or decision trees for continuous variables (i.e. variables that take on numeric values, like sales amounts or number of purchases). It provides a theoretical basis for these models as well as practical examples and use cases.
The third module is very similar to the second, except that it treats categorical variables (e.g. the product type of a customer's next purchase) instead of continuous ones.
In the fourth module, we’ll talk about random forests and the idea of combining many individual classification or regression trees to make one final, improved prediction.
Module 5 builds on the idea of random forests, but presents a slightly different framework with boosted trees. You’ll learn about an implementation of boosted trees, XGBoost, which is one of the most popular tree algorithms and has been used extensively for machine learning problems.
What am I going to get from this course?
You'll learn to understand decision trees, random forests, and boosted tree models, and to interpret their results to drive business decisions.
Prerequisites and Target Audience
What will students need to know or do before starting this course?
We will be using R for this course, but little prior knowledge is required (as long as you’re willing to learn a bit along the way). Some basic understanding of mathematical functions and algorithms is also important.
Who should take this course? Who should not?
You should take this course if you want to learn how to get started with tree-based models.
You should take this course if you want to implement tree-based models in your daily work.
You should take this course if you are curious about the theoretical ideas behind machine learning models.
You should not take this course if you have successfully implemented and used random forests and/or XGBoost on datasets in the past, unless you didn’t understand what you were doing.
R Installation and Basics
Since we'll be using R for this course, this lecture gives a quick overview of how to install R and RStudio. We'll also look at some basics of R, though we won't go into too much depth.
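As a preview, here's a minimal sketch of the kind of basics we'll cover, assuming R and RStudio are already installed (the variable names below are just illustrative):

```r
# Install and load a package (the install step only needs to happen once)
install.packages("rpart")
library(rpart)

# Core data structures: vectors and data frames
sales  <- c(120, 95, 143, 88)
stores <- data.frame(store_id = 1:4, weekly_sales = sales)

# Simple summaries and filtering
mean(stores$weekly_sales)
stores[stores$weekly_sales > 100, ]
```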
Interpreting Linear Models
In this lecture, we'll examine how linear models handle interactions and, in particular, the challenges that come with interpreting them. In later lectures, we'll compare these models to tree-based models.
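To see the interpretation problem concretely, here is a small sketch with simulated data (all names and coefficients are made up for illustration): once an interaction term is present, no single coefficient tells you "the effect of price" on its own.

```r
set.seed(1)
# Simulated data where the effect of price depends on a promotion flag
n     <- 200
price <- runif(n, 1, 5)
promo <- rbinom(n, 1, 0.5)
sales <- 50 - 4 * price - 10 * promo + 3 * price * promo + rnorm(n)

# price * promo expands to price + promo + price:promo; the slope of price
# is -4 when promo == 0 but -4 + 3 = -1 when promo == 1
fit <- lm(sales ~ price * promo)
summary(fit)
```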
Basics of Decision Trees
We'll discuss some basic terms and definitions of decision trees, and we'll look at a few very simple examples to see what tree models can look like.
We'll discuss how trees can handle input features that have highly non-linear relationships with the target variable, and compare this to linear regression, where any non-linearity must be specified explicitly via transformations of the input features.
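A quick illustration of the contrast, on simulated data (a sketch, not course material):

```r
library(rpart)
set.seed(2)
x <- runif(300, 0, 10)
y <- 5 * sin(x) + rnorm(300, sd = 0.5)   # a highly non-linear relationship
d <- data.frame(x, y)

lm_fit   <- lm(y ~ x, data = d)     # misses the pattern without hand-made transforms
tree_fit <- rpart(y ~ x, data = d)  # approximates it with piecewise-constant splits

ox <- order(x)
plot(x, y, col = "grey")
lines(x[ox], predict(tree_fit)[ox], col = "red")  # the tree's step function
abline(lm_fit, col = "blue")                      # the straight-line fit
```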
Module 2: Regression Trees
We'll explore a sample dataset containing sales of 11 different orange juice products at 83 stores over the course of 120 weeks. This dataset will be used extensively in the rest of the course.
We'll discuss Boolean features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single Boolean feature.
We'll discuss categorical features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single categorical feature.
We'll discuss continuous features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single continuous feature.
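A sketch of what these three single-feature trees look like in R, using simulated stand-ins for the data (all column names here are illustrative, not necessarily those of the course dataset):

```r
library(rpart)
set.seed(3)
n <- 500
oj <- data.frame(
  on_promo = sample(0:1, n, replace = TRUE),                       # Boolean (0/1)
  brand    = factor(sample(c("A", "B", "C"), n, replace = TRUE)),  # categorical
  price    = runif(n, 1, 4)                                        # continuous
)
oj$logmove <- 8 + 0.6 * oj$on_promo - 0.9 * oj$price +
  0.4 * (oj$brand == "A") + rnorm(n, sd = 0.3)

rpart(logmove ~ on_promo, data = oj)  # splits on promo vs. no promo
rpart(logmove ~ brand,    data = oj)  # splits on subsets of the levels
rpart(logmove ~ price,    data = oj)  # splits on a learned cutoff value
```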
Previously, we've seen examples of regression trees using various types of features to construct models. In this lecture, we'll explore the rationale behind how these splits are chosen, and illustrate the theory with an example of one particular split.
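As a preview of the idea, a regression tree typically scores a candidate split by how much it reduces the sum of squared errors (SSE) around the group means. A hand-rolled sketch of evaluating one candidate split:

```r
set.seed(4)
x <- runif(100, 0, 10)
y <- ifelse(x < 5, 2, 7) + rnorm(100)

sse <- function(v) sum((v - mean(v))^2)

# SSE before splitting vs. after splitting at a candidate cutoff
cutoff <- 5
sse_parent <- sse(y)
sse_split  <- sse(y[x < cutoff]) + sse(y[x >= cutoff])
sse_parent - sse_split   # the reduction in SSE achieved by this split
```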
Advantages and Limitations
We've discussed several different properties of decision trees already, but in this lecture, we'll dive in a bit deeper to some of the strengths and weaknesses of regression trees, particularly as compared to linear regression.
In this lecture, we'll examine a more realistic regression tree model using many different features.
When fitting a tree-based model, we usually first look at a plot of the decision tree in terms of the splits and the rules used to create them. However, it's sometimes also interesting to explore the relationships between the features and the target, and in this lecture, we'll examine ways of understanding the relationship between one or two features (at a time) and the target.
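One common tool for this is a partial-dependence-style plot: vary one feature over a grid while averaging predictions over the observed values of the others. A rough sketch on simulated data (names are illustrative):

```r
library(rpart)
set.seed(11)
d <- data.frame(price = runif(400, 1, 4), promo = rbinom(400, 1, 0.5))
d$logmove <- 9 - 1.2 * d$price + 0.8 * d$promo + rnorm(400, sd = 0.3)
fit <- rpart(logmove ~ price + promo, data = d)

# For each grid value of price, replace the column and average the predictions
grid <- seq(1, 4, by = 0.1)
pd <- sapply(grid, function(p) mean(predict(fit, transform(d, price = p))))
plot(grid, pd, type = "l", xlab = "price", ylab = "avg predicted logmove")
```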
Complexity parameters are used to control how complex or simple a regression tree is. We'll see some examples of various complexity parameters, and learn how they control the complexity of the tree.
In order to create a good decision tree, we must find a good value for the complexity parameter. We'll discuss how we can pick good parameters for the model via a method known as cross-validation.
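As a preview of how this works in practice, rpart cross-validates internally as it grows a tree, and its cp table lets us pick a complexity parameter and prune back to it (a sketch on simulated data):

```r
library(rpart)
set.seed(5)
x <- runif(400, 0, 10)
y <- 3 * sin(x) + rnorm(400)
fit <- rpart(y ~ x, data = data.frame(x, y),
             control = rpart.control(cp = 0.001))  # grow a deliberately large tree

# printcp/plotcp show cross-validated error at each complexity level
printcp(fit)
plotcp(fit)

# Prune back to the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```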
Module 2 Mini-Project
Estimate a model for logmove vs. price_over_min, price_under_max, price_per_oz_over_min, price_per_oz_under_max, and classification. Estimate a new model for logmove using exp(price_over_min), exp(price_under_max), exp(price_per_oz_over_min), exp(price_per_oz_under_max), and classification. Estimate a third model using exp(logmove) and the same features as the first model. Compare the performance of the three models, and describe which one you think is best.
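If you'd like a starting point, here is a bare skeleton of how the three models might be set up, assuming the course data are loaded in a data frame called oj (that object name is an assumption; the column names come from the project statement):

```r
library(rpart)
# Skeleton only: assumes the course data are already loaded as `oj`
m1 <- rpart(logmove ~ price_over_min + price_under_max +
              price_per_oz_over_min + price_per_oz_under_max + classification,
            data = oj)

m2 <- rpart(logmove ~ exp(price_over_min) + exp(price_under_max) +
              exp(price_per_oz_over_min) + exp(price_per_oz_under_max) +
              classification, data = oj)

m3 <- rpart(exp(logmove) ~ price_over_min + price_under_max +
              price_per_oz_over_min + price_per_oz_under_max + classification,
            data = oj)

# One way to compare: RMSE on the logmove scale
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse(oj$logmove, predict(m1, oj))
rmse(oj$logmove, predict(m2, oj))
rmse(oj$logmove, log(predict(m3, oj)))  # map m3's predictions back to log scale
```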
Module 3: Classification Trees
Comparison with Regression Trees
We'll begin this module by discussing a new problem: how can we estimate a target that takes on one of many different categories? We'll look at why the techniques we used for regression trees can't be directly applied here, but we'll start to investigate how such models could be constructed.
Goodness of Fit
We saw in the previous lecture that RMSE can't be applied to classification trees, so we'll explore alternative error metrics and seek to understand how to measure how good a particular split of a classification tree is.
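One widely used option (whether or not it's the exact metric this lecture settles on) is Gini impurity. A small hand-rolled sketch of scoring a split:

```r
# Gini impurity of a set of class labels: 1 - sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

labels <- c("A", "A", "A", "B", "B", "C")
gini(labels)                 # impurity before splitting

# Weighted impurity after a candidate split into left/right groups
left  <- c("A", "A", "A")
right <- c("B", "B", "C")
w <- c(length(left), length(right)) / length(labels)
w[1] * gini(left) + w[2] * gini(right)   # lower is a better split
```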
The simplest type of categorical variable is one taking on only two values. This lecture presents an example of a classification tree with such a Boolean target.
Targets with 3+ Categories
We'll generalize from the Boolean targets of the previous lecture to targets that take on multiple categories. We'll look at an example of these classification trees within the Orange Juice dataset.
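A sketch of a multi-category classification tree on simulated data (the names are illustrative, not the course dataset):

```r
library(rpart)
set.seed(6)
n <- 600
# Simulated stand-in for a multi-category target such as the brand purchased
d <- data.frame(price_a = runif(n, 1, 4), price_b = runif(n, 1, 4))
d$choice <- factor(ifelse(d$price_a < d$price_b - 0.5, "brand_a",
                   ifelse(d$price_b < d$price_a - 0.5, "brand_b", "other")))

fit <- rpart(choice ~ price_a + price_b, data = d, method = "class")
fit
predict(fit, d[1:3, ], type = "class")   # predicted category for each row
```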
Module 3 Mini-Project
Module 4: Random Forests
We'll discuss the main idea behind random forests and develop some intuition around the basics of how they work with a simple example.
Theory of Random Forests
We'll discuss the basic theory of random forests: why they work and why randomness is important. We'll also introduce the concept of out-of-bag observations.
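A minimal sketch using the randomForest package on simulated data (the package choice here is an assumption about tooling, and the data are made up):

```r
# install.packages("randomForest")  # if not already installed
library(randomForest)
set.seed(7)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 2 * d$x1 - d$x2 + rnorm(n, sd = 0.2)

rf <- randomForest(y ~ ., data = d, ntree = 300)
rf   # printing the fit reports the out-of-bag (OOB) estimate of error
```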
Tuning the Number of Trees
Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of trees to fit in a random forest model.
Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of features to try at every split in a random forest model.
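A sketch of both knobs, the number of trees from the previous lecture and mtry from this one, using the randomForest package on simulated data:

```r
library(randomForest)
set.seed(8)
d <- data.frame(matrix(runif(500 * 6), ncol = 6))   # columns X1..X6
d$y <- 3 * d$X1 - 2 * d$X2 + rnorm(500, sd = 0.3)

rf <- randomForest(y ~ ., data = d, ntree = 500)
plot(rf)   # OOB error vs. number of trees: look for where the curve flattens

# tuneRF searches over mtry (features tried per split) using OOB error
tuned <- tuneRF(d[, 1:6], d$y, ntreeTry = 200, stepFactor = 1.5, improve = 0.01)
```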
One advantage of using tree-based models is their interpretability, and we lose some of that interpretability when we average 500 trees. However, there are still measures we can look at to understand why a random forest predicts what it does, and we'll explore those in this lecture.
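Continuing the sketch above, the usual starting points are permutation importance and the variable-importance plot:

```r
library(randomForest)
# Reusing the simulated data frame d from the previous sketch
rf <- randomForest(y ~ ., data = d, ntree = 500, importance = TRUE)
importance(rf)   # e.g. %IncMSE: how much OOB error grows when a feature is shuffled
varImpPlot(rf)
```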
Module 5: Gradient Boosting
We'll look at an example of tree-based models with updated weights to motivate the idea of gradient boosting algorithms.
Optimizing Loss Functions
We'll first look at the algorithm for AdaBoost, one of the first implementations of the boosting idea. We'll then discuss the idea of a loss function, and use it to understand how gradient boosting works.
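To make the reweighting idea concrete, here is one round of the AdaBoost weight update, hand-rolled on a toy example (labels coded as -1/+1; the numbers are made up):

```r
y    <- c(1, 1, 1, -1, -1, 1, -1, 1, -1, -1)   # true labels
pred <- c(1, 1, -1, -1, -1, 1, 1, 1, -1, -1)   # a weak learner, wrong on 2 points
w    <- rep(1 / 10, 10)                        # start with uniform weights

err   <- sum(w * (pred != y))         # weighted error rate (0.2 here)
alpha <- 0.5 * log((1 - err) / err)   # this learner's vote weight
w     <- w * exp(-alpha * y * pred)   # upweight the misclassified points
w     <- w / sum(w)                   # renormalize; the next learner uses these
round(w, 3)
```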
Tuning Gradient Boosting
We'll discuss all (or at least many) of the parameters available for tuning a gradient boosting model. In Lecture 28, we'll look at how to optimize these tuning parameters while fitting a model.
In this lecture, we'll see how to use the XGBoost algorithm (a nice implementation of gradient boosting) within R.
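A minimal sketch of fitting XGBoost in R (simulated data; the parameter values are illustrative, not recommendations):

```r
# install.packages("xgboost")  # if needed
library(xgboost)
set.seed(9)
X <- matrix(runif(500 * 4), ncol = 4)
y <- 2 * X[, 1] - X[, 2] + rnorm(500, sd = 0.2)

# xgboost expects a numeric matrix (or xgb.DMatrix), not a data frame of factors
fit <- xgboost(data = X, label = y, nrounds = 100,
               objective = "reg:squarederror",
               max_depth = 3, eta = 0.1, verbose = 0)
head(predict(fit, X))
```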
Finding the Optimal Model
We'll dive into the details of the XGBoost model. We'll learn about how to tune the various parameters in an efficient way in order to get an optimized final model.
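As a preview, one standard approach is cross-validation with early stopping via xgb.cv (reusing X and y from the sketch above; the parameter values are illustrative):

```r
library(xgboost)
dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(objective = "reg:squarederror", max_depth = 3, eta = 0.1,
               subsample = 0.8, colsample_bytree = 0.8)

# Cross-validation with early stopping picks a good number of boosting rounds
cv <- xgb.cv(params = params, data = dtrain, nrounds = 500, nfold = 5,
             early_stopping_rounds = 20, verbose = 0)
cv$best_iteration
```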
Categorical Features in XGBoost
We've seen that the random forest and decision tree implementations handle categorical features directly, but the XGBoost implementation does not accept them. We'll look at two ways of converting categorical features into numerical ones.
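Two sketches of such conversions in base R (the column names are illustrative):

```r
d <- data.frame(brand = factor(c("A", "B", "C", "A")),
                price = c(2.5, 3.1, 2.8, 2.6))

# 1. One-hot (dummy) encoding: one 0/1 column per level
model.matrix(~ brand - 1, data = d)

# 2. Integer encoding: map each level to a number (use with care, since it
#    imposes an ordering on the levels that may not be meaningful)
as.integer(d$brand)
```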
Module 5 Mini-Project