This course begins with a basic introduction, and examines some of the strengths and weaknesses of traditional linear regression. We’ll also cover some basics of R, as the examples in this course will use the R programming language to analyze data.
The second module then dives into LASSO models. We see how the LASSO model can solve many of the challenges we face with linear regression, and how it can be a very useful tool for fitting linear models. We also look at a real-world use case: forecasting sales at 83 different stores.
The third and final module looks at two additional regularized regression models: Ridge and ElasticNet. We then compare these models, both theoretically and by examining their performance on the forecasting problem from module 2.
What am I going to get from this course?
You will learn to implement LASSO, Ridge, and Elastic Net models so that you can better analyze your data. These models help you capture relationships in your data, avoid overfitting, and often predict better than traditional linear regression.
Prerequisites and Target Audience
What will students need to know or do before starting this course?
This course is taught with the programming language R. Students not familiar with R should be prepared to spend a bit of extra time catching up on some of the basics. Additionally, exposure to linear regression (for example, in an introductory statistics course) would be highly useful.
Who should take this course? Who should not?
- You want to learn how to get started with regularized regression models.
- You currently use linear regression and want to implement better models.
- You are curious about the ideas of machine learning.
- You should not take this course if you have successfully implemented and used LASSO, Ridge and Elastic Net models in the past (unless you didn’t understand what you were doing).
Module 1: Strengths and Weaknesses of Linear Regression
We'll look at how to install R and examine some of the basics of R: comments, plotting, packages, help, etc. Also, we'll install the free RStudio and examine how this will help us when using R.
Linear Regression Review
In this lecture, we briefly review the concept of linear regression. How do linear models work? How do they choose one specific line to fit the data? What are the "sums of squares" and how are they used?
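As a small illustration of the "sums of squares" idea, here is a minimal sketch in base R. The simulated data and the comparison line are assumptions made for illustration, not course material. It fits a line with `lm()`, computes the residual and total sums of squares by hand, and checks that a different line has a larger residual sum of squares.

```r
set.seed(1)
x <- runif(50, -10, 10)
y <- 2 + 3 * x + rnorm(50)              # simulated data (illustrative only)

fit <- lm(y ~ x)                        # least-squares line

rss <- sum(residuals(fit)^2)            # residual sum of squares
tss <- sum((y - mean(y))^2)             # total sum of squares
r_squared <- 1 - rss / tss              # share of variation explained

# Any other line, e.g. intercept 2 and slope 2.5, has a larger RSS
rss_other <- sum((y - (2 + 2.5 * x))^2)
```

The least-squares line is, by definition, the one that makes `rss` as small as possible, which is why `rss_other` cannot be smaller.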
The Problem with Multicollinearity
Multicollinearity can cause huge problems when fitting linear regression models. In this lecture, we'll explore what multicollinearity means and examine its impact on an example dataset.
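To make the effect concrete, the following sketch uses simulated data (an assumption; it is not the course's example dataset). It fits the same kind of model twice, once with two independent predictors and once with two nearly identical predictors, and compares the standard error of the first coefficient.

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2_indep <- rnorm(n)                    # unrelated to x1
x2_corr  <- x1 + rnorm(n, sd = 0.05)    # almost a copy of x1

y_indep <- 1 + x1 + x2_indep + rnorm(n)
y_corr  <- 1 + x1 + x2_corr  + rnorm(n)

# Standard error of the x1 coefficient in each model
se_indep <- summary(lm(y_indep ~ x1 + x2_indep))$coefficients["x1", "Std. Error"]
se_corr  <- summary(lm(y_corr  ~ x1 + x2_corr))$coefficients["x1", "Std. Error"]

se_corr / se_indep                      # how much less precise the estimate becomes
```

With correlated predictors the model cannot tell the two effects apart, so the individual coefficient estimates become far less precise even though the overall fit is similar.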
Detecting Multicollinearity: VIFs
In this lecture, we'll explore how to detect multicollinearity using the Variance Inflation Factors, or VIFs. This statistic can be very useful for measuring the impact of correlated features.
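The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. Below is a minimal base R sketch of that definition. The helper name `vif_manual` and the simulated data are hypothetical, and in practice a packaged implementation may be preferable.

```r
# Hypothetical helper: VIF for each column of a data frame of predictors
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    # Regress predictor v on all the other predictors
    f  <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(42)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)         # strongly correlated with x1
x3 <- rnorm(100)                        # independent of the others

vifs <- vif_manual(data.frame(x1, x2, x3))
vifs
```

Here `x1` and `x2` should show very large VIFs, while `x3` should stay close to 1.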
The p>n Problem
Linear regression models also run into problems when the number of predictor variables (commonly written as "p") is more than the number of observations ("n"). We'll investigate what happens in these scenarios.
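A quick way to see the problem is to fit `lm()` on simulated data with more predictors than observations. The sketch below uses made-up dimensions (n = 10, p = 20) and a response that is pure noise.

```r
set.seed(7)
n <- 10
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                           # pure noise, unrelated to X

fit <- lm(y ~ X)

# lm() cannot estimate all p + 1 coefficients from n observations
n_undefined   <- sum(is.na(coef(fit)))
max_abs_resid <- max(abs(residuals(fit)))

n_undefined
max_abs_resid                           # the noise is "fit" essentially perfectly
```

The model reproduces the training responses almost exactly even though there is no real relationship, and the remaining coefficients are reported as `NA`.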
Best Linear Unbiased Estimator
We've examined some shortcomings of linear regression, but in this lecture we'll discuss some of its strengths. In particular, under the usual Gauss-Markov assumptions, ordinary least squares is the best linear unbiased estimator: among all unbiased estimators that are linear functions of the observed responses, it has the lowest variance.
Stepwise Variable Selection
Stepwise regression is an alternative to fitting a linear regression on every feature: we select a subset of variables out of all possible features. This approach offers some improvements over simple linear regression, and we'll explore how well it performs on some example datasets.
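One way to try this in base R is the `step()` function, which adds or drops variables based on AIC. The sketch below is an assumption about how one might set this up (the course may use a different selection procedure). It simulates eight candidate features of which only two matter, then runs backward selection.

```r
set.seed(3)
n <- 200
d <- data.frame(matrix(rnorm(n * 8), n, 8))   # columns are named X1..X8
d$y <- 2 * d$X1 - 3 * d$X2 + rnorm(n)         # only X1 and X2 affect y

full     <- lm(y ~ ., data = d)               # start from the full model
selected <- step(full, direction = "backward", trace = 0)

kept <- names(coef(selected))                 # terms that survived selection
kept
```

The strong predictors should survive, while some noise variables may or may not be dropped depending on the sample.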
We'll examine the penalty functions which are optimized by linear and stepwise regression, and then we'll introduce a new penalty corresponding to the ridge regression model. We'll discuss how this penalty works and gain insight into why this improves linear regression models.
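To see how the ridge penalty acts, here is a minimal sketch that solves the penalized least-squares problem directly. It assumes centered data with no intercept, and the helper name `ridge_coef` is hypothetical. Note that packaged implementations may scale lambda differently.

```r
# Minimizes sum((y - X %*% b)^2) + lambda * sum(b^2).
# Assumes X and y are centered, so no intercept is needed.
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

set.seed(5)
X <- scale(matrix(rnorm(100 * 3), 100, 3))    # centered, unit-variance columns
y <- drop(X %*% c(2, -1, 0.5) + rnorm(100))
y <- y - mean(y)

b_ols   <- ridge_coef(X, y, lambda = 0)       # lambda = 0 is ordinary least squares
b_ridge <- ridge_coef(X, y, lambda = 50)      # larger lambda shrinks coefficients

cbind(ols = drop(b_ols), ridge = drop(b_ridge))
```

Increasing `lambda` pulls the coefficients toward zero, trading a little bias for lower variance.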
We'll fit our first ridge regression model in R! We'll examine how to fit the model, how to predict with the model, and how to extract the coefficients estimated by the model.
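The description above does not name the R package used. The sketch below assumes the widely used `glmnet` package, where `alpha = 0` selects the ridge penalty. The data are simulated and the lambda value `s = 0.5` is arbitrary.

```r
library(glmnet)   # assumption: the package is not named in the course text

set.seed(10)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(x %*% c(3, -2, 0, 0, 1) + rnorm(100))

fit <- glmnet(x, y, alpha = 0)                  # alpha = 0: ridge penalty

# Coefficients at one lambda value (intercept in the first row)
betas <- coef(fit, s = 0.5)

# Predictions for the first five rows at the same lambda
preds <- predict(fit, newx = x[1:5, ], s = 0.5)

betas
preds
```

Both `coef()` and `predict()` take the lambda value through the `s` argument, because a single `glmnet` fit contains many models.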
Estimation Along a Path
We'll learn a bit about the underlying algorithm which is used to fit ridge regression models. This will help us to understand how these models work, and how we can efficiently fit ridge regression models.
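Assuming the `glmnet` package again (the text does not name one), the path idea can be inspected directly: the fit stores a decreasing sequence of lambda values and one coefficient vector for each. The data here are simulated for illustration.

```r
library(glmnet)   # assumption: the package is not named in the course text

set.seed(11)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(x %*% c(3, -2, 0, 0, 1) + rnorm(100))

fit <- glmnet(x, y, alpha = 0)

lambdas  <- fit$lambda                 # decreasing sequence of penalties
path     <- as.matrix(fit$beta)        # one column of coefficients per lambda
l2_norms <- sqrt(colSums(path^2))      # overall coefficient size along the path

plot(fit, xvar = "lambda")             # coefficient paths against log(lambda)
```

Because neighboring lambda values have similar solutions, each fit can start from the previous one, which is what makes computing the whole path cheap.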
We'll examine the differences between error on the training data and new data (i.e. the "test" set). We'll learn about how important it is to evaluate models on their ability to fit new data, and we'll compare ridge regression models to linear regression models.
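A minimal sketch of this comparison, under several assumptions: simulated data with many predictors relative to the training size, the `glmnet` package for ridge, and an arbitrary penalty `s = 1`. The helper `mse` is hypothetical.

```r
library(glmnet)   # assumption: the package is not named in the course text

set.seed(20)
n <- 80
p <- 30
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -2, 1) + rnorm(n))   # only 3 predictors matter

train <- 1:40
test  <- 41:80
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Ordinary linear regression
ols <- lm(y ~ ., data = data.frame(y = y[train], x[train, ]))
ols_train <- mse(y[train], predict(ols))
ols_test  <- mse(y[test], predict(ols, newdata = data.frame(x[test, ])))

# Ridge regression at one (arbitrary) lambda
ridge <- glmnet(x[train, ], y[train], alpha = 0)
ridge_train <- mse(y[train], predict(ridge, newx = x[train, ], s = 1))
ridge_test  <- mse(y[test],  predict(ridge, newx = x[test, ],  s = 1))

c(ols_train = ols_train, ols_test = ols_test,
  ridge_train = ridge_train, ridge_test = ridge_test)
```

The training error of the linear model looks deceptively small here. Only the test error shows how each model handles new data.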
When running a ridge regression model, many different ridge regression models are generated corresponding to different lambda values. We'll talk a bit about how lambda influences the fit, and we'll see how to select a reasonable value of lambda using an example dataset.
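One common way to pick lambda is k-fold cross-validation. The sketch below assumes the `glmnet` package and its `cv.glmnet()` function, with simulated data. The course's own selection procedure may differ.

```r
library(glmnet)   # assumption: the package is not named in the course text

set.seed(30)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(x %*% c(3, -2, 0, 0, 1) + rnorm(100))

cv <- cv.glmnet(x, y, alpha = 0, nfolds = 10)   # ridge with 10-fold CV

cv$lambda.min    # lambda with the lowest cross-validated error
cv$lambda.1se    # largest lambda within one standard error of that minimum

coef(cv, s = "lambda.1se")                      # coefficients at the chosen lambda
plot(cv)                                        # CV error against log(lambda)
```

`lambda.1se` is a more conservative choice than `lambda.min`: it accepts slightly higher estimated error in exchange for stronger regularization.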
We'll look at some of the parameters available in the R implementation of ridge regression models, and learn when we should use them.
We'll take a look at the example dataset that we'll be using in this course. In this lecture, we'll just explore the dataset and create some variables which will be used in later models.
Forecasting: Building Features
The real-world application in this course will be to use the dataset described in the previous lecture and generate forecasts at the product level. In this lecture, we'll develop some new features which will be useful for this forecasting.
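The course's actual features are not listed here. As one plausible example, lagged values of the series are a common forecasting feature. The sketch below builds them in base R on a made-up sales vector; the helper `lag_k` and the column names are hypothetical.

```r
sales <- c(12, 15, 11, 18, 20, 17, 22, 25)      # hypothetical weekly sales

# Shift a vector back by k periods, padding the start with NA
lag_k <- function(v, k) c(rep(NA, k), head(v, -k))

features <- data.frame(
  week  = seq_along(sales),
  sales = sales,
  lag1  = lag_k(sales, 1),                      # sales one week earlier
  lag2  = lag_k(sales, 2)                       # sales two weeks earlier
)

features <- na.omit(features)                   # drop rows without full history
features
```

Each remaining row pairs a week's sales with values that would already be known at forecast time.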
Forecasting: Using Ridge
In this lecture, we'll use the dataset and features we've generated so far and generate a forecasting model! We'll fit both linear regression and ridge regression models.
In order to evaluate the models fit in the previous lecture, we'll introduce the concept of cross-validation for time series. We'll examine the particular product from the previous lecture, and compare the linear and ridge regression models.
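Cross-validation for time series has to respect time order: the model is trained on the past and tested on the next point. Here is a minimal rolling-origin sketch in base R, using a simulated trend series and a plain `lm()` as a stand-in model. Both are assumptions; the course's scheme may differ in its details.

```r
set.seed(40)
n <- 60
d <- data.frame(week = 1:n)
d$y <- 10 + 0.5 * d$week + rnorm(n)             # simulated trending series

initial <- 30                                   # size of the first training window

# For each origin, train on weeks 1..end and forecast week end + 1
errors <- sapply(initial:(n - 1), function(end) {
  fit  <- lm(y ~ week, data = d[1:end, ])
  pred <- predict(fit, newdata = d[end + 1, ])
  d$y[end + 1] - pred
})

rmse <- sqrt(mean(errors^2))                    # one-step-ahead forecast error
rmse
```

Unlike ordinary k-fold cross-validation, no future observation is ever used to predict an earlier one.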
Forecasting: Comparing Results
In this lecture, we'll generalize the results of the previous lecture to the time series for all products. We'll compare the performance of our two models for all these products, and seek to understand when certain models perform better than others.
Module 3: Ridge and ElasticNet
In this lecture, we'll introduce the ElasticNet penalty and examine the differences between this model and the Ridge/LASSO. We'll also discuss the reason for having so many different models.
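Assuming the `glmnet` package (not named in the text), the three models differ only in the mixing parameter `alpha`: its penalty is lambda * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(abs(b))). The sketch below counts nonzero coefficients at one arbitrary lambda on simulated data.

```r
library(glmnet)   # assumption: the package is not named in the course text

set.seed(50)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- drop(x[, 1:2] %*% c(3, -2) + rnorm(100))   # only 2 of 10 predictors matter

# alpha = 0 is ridge, alpha = 1 is LASSO, values in between are elastic net
alphas <- c(ridge = 0, elasticnet = 0.5, lasso = 1)

nonzero <- sapply(alphas, function(a) {
  fit <- glmnet(x, y, alpha = a)
  b <- as.matrix(coef(fit, s = 0.3))[-1, 1]     # drop the intercept
  sum(b != 0)
})

nonzero
```

Ridge shrinks coefficients but keeps all of them in the model, while the LASSO and elastic net penalties can set some exactly to zero.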
Comparing LASSO and Ridge
We'll use our new models, and apply them to the orange juice sales dataset from the previous module. We'll explore the different results we get between the three models.
Selecting LASSO vs Ridge vs ElasticNet
Now that we have several different models to choose from, we need a way to determine the best model. In this lecture, we'll explore how to use cross-validation to select a model, and we'll see which models are the best performers on the orange juice dataset.
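A minimal sketch of model selection by cross-validation, assuming the `glmnet` package and simulated data rather than the orange juice dataset. Reusing the same fold assignment for every model keeps the comparison fair. For forecasting data, a time-ordered scheme would replace the random folds used here.

```r
library(glmnet)   # assumption: the package is not named in the course text

set.seed(60)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- drop(x[, 1:2] %*% c(3, -2) + rnorm(100))

# Same folds for every model so the errors are comparable
foldid <- sample(rep(1:10, length.out = nrow(x)))

alphas <- c(ridge = 0, elasticnet = 0.5, lasso = 1)

# Best cross-validated mean squared error for each model
cv_error <- sapply(alphas, function(a) {
  min(cv.glmnet(x, y, alpha = a, foldid = foldid)$cvm)
})

best <- names(which.min(cv_error))
cv_error
best
```

Which model wins depends on the data, which is one reason for having several penalties to choose from.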
In this project, we'll examine how the size of the dataset impacts linear, ridge, and LASSO regression models. So, simulate the following data:
x = random uniform variable between -10 and 10 (R: x = runif(n, -10, 10))
y1 = linear function of x plus noise (R: y1 = x + rnorm(n))
y2 = cubic function of x plus noise (R: y2 = .01*(x-5)*(x+7)*x + rnorm(n))
y3 = sine function of x plus noise (R: y3 = sin(x/2) + rnorm(n))
Vary n to get a good range of dataset sizes; I used 10, 30, 100, 1000, 10000, 100000.
For each simulated dataset, examine how well linear, ridge, and LASSO fit the data. Given that we have a non-linear relationship between x and y, let's also include polynomial terms in the model (i.e. x^2, x^3, ..., x^10).
Solution: I've uploaded my solution as an R file. Of course, your results may be different depending on how you chose your models, but hopefully they are similar to mine.
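As a starting point for the project (this is not the uploaded solution), the simulation step and the polynomial terms could be set up as in the sketch below. The helper name `simulate_data` is hypothetical, and the formulas are the ones listed above.

```r
# Hypothetical helper wrapping the simulation formulas from the project text
simulate_data <- function(n) {
  x <- runif(n, -10, 10)
  data.frame(
    x  = x,
    y1 = x + rnorm(n),                                # linear
    y2 = .01 * (x - 5) * (x + 7) * x + rnorm(n),      # cubic
    y3 = sin(x / 2) + rnorm(n)                        # sine
  )
}

set.seed(70)
d <- simulate_data(100)

# Polynomial terms x, x^2, ..., x^10 as a 10-column matrix
X_poly <- poly(d$x, degree = 10, raw = TRUE)

fit_lm <- lm(d$y2 ~ X_poly)                           # plain linear regression
```

Repeating this for each value of n and each response, and fitting the ridge and LASSO models on the same polynomial terms, gives the comparison the project asks for.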