My name is Peter Chen and I am the instructor for this course. I want to introduce you to the wonderful world of Machine Learning through practical examples and code. The course covers Supervised Learning algorithms and models in machine learning. More importantly, it will get you up and running quickly with practical, and at times funny, applications of Supervised Learning algorithms. The course has code & sample data for you to run and learn from, and it encourages you to explore your own datasets using Supervised Learning algorithms.
Beginner-level knowledge of Python and R, which are used mostly for expository reasons; you do NOT need to be a Python or R expert to understand this course. You should also be comfortable with basic math, probability, and statistics.
What am I going to get from this course?
* Understand the two major types of Supervised Machine Learning
* Know when to apply a prediction machine learning algorithm
* Know when to apply a classification machine learning algorithm
* Gain an intuition behind the math of the underlying algorithms and be able to explain it
* Learn how to use Python scikit-learn library and R libraries to build supervised machine learning models and algorithms
* Apply Python & R code to your own data sets to solve prediction and classification problems
* Evaluate the effectiveness of your machine learning models
* Develop a taste for tinkering with the model to improve results
Prerequisites and Target Audience
What will students need to know or do before starting this course?
Basic Python and R; you do not need to be an expert programmer, since we use these languages mainly for expository reasons. Basic probability and statistics.
Who should take this course? Who should not?
Students who are interested in a practical introduction to supervised machine learning: less theory, more hands-on application that gets you up and running. You should like to play with data and code. Enthusiasm is more important than expertise.
Module 1: Introduction
Introduction to Supervised Machine Learning
Module 2: Regressions
We will cover the basics of linear regression and work through example code in Python.
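As a taste of what the code looks like, here is a minimal sketch of a linear regression with scikit-learn. The numbers are invented toy data (engine size vs. price), not the course's car price data set.

```python
# Minimal linear regression sketch with scikit-learn.
# The data below is a made-up toy example, not the course dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: engine size (litres) vs. price (thousands of dollars)
X = np.array([[1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([15.0, 18.0, 22.0, 25.0, 29.0])

model = LinearRegression()
model.fit(X, y)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("R^2:", model.score(X, y))
```

Note that scikit-learn expects the feature matrix `X` to be two-dimensional (rows are samples, columns are features), even when there is only one feature.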
Linear Regression using Statsmodels Module
Another approach using the statsmodels module, which provides more statistical tests and parameters than scikit-learn.
Multiple Linear Regression
This lecture extends the basic concept of linear regression to include more variables, using the same car price prediction data set. The exercise encourages students to turn different variables on and off to see the effect on model performance.
Multiple Linear Regression Project: Turning Features On/Off
In our lecture example, we used ALL of the independent variables/features and got an R^2 of 0.83. Try turning off some of the features and rerunning the multiple linear regression model to see what kinds of R^2 you can get. Do more features/predictors always yield a better R^2, or can adding features sometimes decrease it?
Multiple Linear Regression Addendum: Using Statsmodels Module
Another approach using the statsmodels module. It gives more detailed statistical output than scikit-learn.
You'll learn a type of regression that is not frequently discussed, polynomial regression, and see how to fit nonlinear relationships.
Polynomial Regression Project
Find a data set that exhibits quadratic behavior, run a multiple linear regression on it, and compare that R^2 with the R^2 from a quadratic regression.
P.S. Bonus points to those who understood the math joke at the end of Lecture 4.
Polynomial Regression Addendum: Using NumPy
Another approach to polynomial regression using NumPy.
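A minimal sketch of the NumPy approach using `np.polyfit`, on synthetic quadratic data (the true coefficients below are made up for illustration).

```python
# Polynomial regression via NumPy's polyfit on synthetic quadratic data.
import numpy as np

x = np.linspace(-3, 3, 50)
# True relationship: y = 2x^2 - x + 5, plus a little noise
y = 2 * x**2 - x + 5 + np.random.default_rng(1).normal(0, 0.5, x.size)

coeffs = np.polyfit(x, y, deg=2)   # highest power first: [a, b, c]
poly = np.poly1d(coeffs)           # convenient callable polynomial

print("fitted coefficients:", np.round(coeffs, 2))
print("prediction at x=2:", poly(2.0))
```

`np.polyfit` returns the coefficients from highest degree to lowest, and `np.poly1d` wraps them in an object you can call like a function for predictions.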
Module 3: Neural Networks
Neural networks are machine learning algorithms modeled after the way the brain learns. You'll learn what they are and use a neural network to predict continuous-value problems, such as the earlier car price prediction and many others.
Project: Playing with Different Neural Network Architectures
In this assignment, try changing the number of hidden layers, the type of activation function, and the number of input nodes to see whether this changes the effectiveness of the model's predictions. This is a more open-ended exploration of neural networks and their various components.
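One possible way to structure this exploration, sketched with scikit-learn's `MLPRegressor` on synthetic data (the architectures and the sine-curve data are illustrative assumptions, not the course's setup):

```python
# Loop over a few hidden-layer architectures and activation functions
# and compare training R^2 on a synthetic regression problem.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(X).ravel()              # a smooth nonlinear target

scores = {}
for layers in [(8,), (32,), (32, 32)]:
    for act in ["relu", "tanh"]:
        m = MLPRegressor(hidden_layer_sizes=layers, activation=act,
                         max_iter=3000, random_state=0).fit(X, y)
        scores[(layers, act)] = m.score(X, y)
        print(layers, act, round(scores[(layers, act)], 3))
```

Fixing `random_state` makes the comparison fair: otherwise differences between runs could come from random weight initialization rather than the architecture.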
Module 4: Regression Trees
Understand what regression trees are and how flexible they can be for predicting the often nonlinear problems on which typical linear regression can fail.
Play with the max_depth parameter of the regression tree. Does increasing max_depth improve accuracy? At what point do diminishing returns occur, that is, as you keep increasing max_depth, when does the model stop getting better? Play with that parameter to see how it behaves. Also pick another nonlinear problem you would like to predict using a regression tree.
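A sketch of how this experiment might look, using a synthetic nonlinear data set (a noisy sine curve, an assumption for illustration) and comparing train vs. test R^2 at several depths:

```python
# Vary max_depth of a DecisionTreeRegressor and watch train/test R^2.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 400)   # noisy nonlinear target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

train_scores, test_scores = {}, {}
for depth in [1, 2, 4, 8, 16]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_scores[depth] = tree.score(X_tr, y_tr)
    test_scores[depth] = tree.score(X_te, y_te)
    print(depth, round(train_scores[depth], 3), round(test_scores[depth], 3))
```

Typically the training R^2 keeps climbing with depth while the test R^2 plateaus or drops, which is exactly the diminishing-returns (and eventually overfitting) behavior the project asks you to find.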
Module 5: Classification
Introduction to Classification
The second type of Supervised Learning is classification. This lecture introduces the concept of classification and its applications.
Logistic Regression: Part 1
Logistic Regression, despite its name, is used for classification problems rather than prediction problems. We examine the concept of using logistic regression for binary classification of whether an email is spam or not. Armed with this knowledge and code, students can easily modify the code to classify other binary problems.
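A minimal sketch of binary classification with scikit-learn's `LogisticRegression`. The two features here (count of the word "free", number of exclamation marks) and all the numbers are invented stand-ins for the course's spam features.

```python
# Toy spam classifier: logistic regression on two invented features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: [count of word "free", number of '!' characters]
X = np.array([[0, 0], [0, 1], [1, 0], [3, 4], [4, 2], [5, 5],
              [0, 0], [1, 1], [4, 4], [2, 3]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1])   # 1 = spam, 0 = not spam

clf = LogisticRegression().fit(X, y)

# Classify one clean-looking and one spammy-looking message
print(clf.predict([[0, 0], [5, 5]]))
print("spam probability:", clf.predict_proba([[5, 5]])[0, 1])
```

The `predict_proba` output is the part that makes logistic regression "regression-like": it models the probability of the positive class, which `predict` then thresholds at 0.5.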
Logistic Regression: Part 2
How to measure performance of binary classification
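The standard metrics can be sketched with scikit-learn's `metrics` module; the label vectors below are invented purely to make the arithmetic visible.

```python
# Common binary-classification metrics on made-up labels.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score)

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

print(confusion_matrix(y_true, y_pred))   # rows: actual class, cols: predicted
print("accuracy:", accuracy_score(y_true, y_pred))     # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall:", recall_score(y_true, y_pred))          # TP / (TP + FN)
```

Here there are 4 true positives, 3 true negatives, 1 false positive, and 2 false negatives, giving accuracy 0.7, precision 0.8, and recall about 0.667 — a concrete reminder that accuracy alone can hide an imbalance between the two error types.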
Logistic Regression: Part 3
Practical example of spam detection using logistic regression
Naive Bayes: Introduction & Probability Review
Develop an intuitive understanding of Bayes' rule through a worked example, and see how it can be used to classify things.
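To make Bayes' rule concrete, here is a small worked example in plain Python. All the probabilities are invented for illustration; they are not from the course.

```python
# Worked Bayes' rule example with invented numbers:
# P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.2                  # prior: 20% of all mail is spam
p_word_given_spam = 0.6       # "free" appears in 60% of spam
p_word_given_ham = 0.05       # ...and in 5% of legitimate mail

# Total probability of seeing the word at all
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: probability the message is spam, given that it contains "free"
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # 0.75
```

So observing a single spam-associated word lifts the spam probability from the 20% prior to 75% — this per-feature updating, applied independently to each word, is exactly the "naive" assumption behind the Naive Bayes classifier.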
Understand how tree models can be used for classification in addition to regression. Learn to use the Graphviz application to visualize beautiful classification trees, and apply this powerful modern machine learning algorithm to the classic Iris data set.
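A sketch of a classification tree on the Iris data with scikit-learn; `export_graphviz` produces the DOT text that the Graphviz tool then renders as a diagram. The `max_depth=3` choice is an illustrative assumption.

```python
# Classification tree on the classic Iris data set, with Graphviz export.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

print("training accuracy:", round(clf.score(iris.data, iris.target), 3))

# DOT-format description of the tree; feed this to Graphviz to draw it
dot_text = export_graphviz(clf, feature_names=iris.feature_names,
                           class_names=iris.target_names, filled=True)
print(dot_text[:80])
```

To actually see the picture, save `dot_text` to a file and run Graphviz on it (e.g. `dot -Tpng tree.dot -o tree.png`), or paste the text into an online Graphviz viewer.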
Classification Tree Quiz