Course Description

In this course you will be introduced to the classification problem and several approaches used to solve it. Each approach is presented with its underlying intuition as well as the necessary mathematical foundations. We discuss the learning algorithms and illustrate the available Python tools with examples. You will learn the relative strengths and weaknesses of each approach. The focus of the course is on learning to find the right model for the problem at hand through the available tools and experimentation. Exercises are provided throughout the course to reinforce the ideas.

What am I going to get from this course?

Learn several widely used classification models.

Gain the knowledge and skills to effectively apply existing classification algorithms and tools to solve real-world problems.

Evaluate multiple models and select the most appropriate one for the task at hand.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

Students will benefit from prior exposure to probability and statistics, basic algebra, and calculus. Familiarity with the Python programming language is required: students should be able to use Python 3.x and Python notebooks.

Who should take this course? Who should not?

Industry professionals and college students who want to learn about the algorithms and tools available for machine learning problems in general, and for the classification problem in particular.

Curriculum

Module 1:

Lecture 1 Introduction & Overview

In this lecture, we look at what machine learning is and at some examples of its application in everyday life.

Lecture 2 What is Supervised Learning

In this lecture, we define the classification problem within machine learning, look at some examples, and briefly outline the models covered in the course.

Lecture 3 Measuring Classifier Performance

In this lecture, we review the metrics commonly used to measure classifier performance and provide pointers to additional resources that complement the course material.
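The sketch below shows the kind of metrics this lecture covers. It computes accuracy, precision and recall for a binary problem directly from lists of true and predicted labels. The function name `binary_metrics` and the choice of `1` as the positive label are illustrative assumptions. The lecture itself may rely on a library implementation instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of actual positives that the model found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Accuracy alone can be misleading when one class dominates. That is one reason precision and recall are usually reported alongside it.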

Module 2:

Lecture 4 Data Sets & Visualization

In this lecture, we become familiar with the programming language and environment used in the course, as well as some of the data sets used to illustrate the classification models we explore.

Quiz 1 Seeds Data Exploration

Observe the relationships between the features in the Seeds data and identify two features that you consider most likely to provide good separation between the classes.
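Visual inspection is the point of this exercise. A simple numeric check can back up that impression: for each feature, compare how far apart the class means are with how spread out the values are within each class.

The sketch below uses the ratio of between-class to within-class sum of squares, which is the one-way ANOVA F statistic without its degrees-of-freedom scaling. This scoring rule is a swapped-in illustration, not the quiz's prescribed method. The toy rows are hypothetical and are not the Seeds data.

```python
from collections import defaultdict


def separation_score(values, labels):
    """Ratio of between-class to within-class sum of squares for one feature."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    overall_mean = sum(values) / len(values)
    between, within = 0.0, 0.0
    for members in groups.values():
        mean_k = sum(members) / len(members)
        between += len(members) * (mean_k - overall_mean) ** 2
        within += sum((v - mean_k) ** 2 for v in members)
    return between / within if within else float("inf")


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least separating."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, separation_score(column, labels)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical rows with two features: "a" separates the three classes well,
# "b" barely does.
rows = [(1.0, 5.0), (1.2, 4.0), (3.0, 5.1), (3.1, 4.2), (5.0, 4.9), (5.2, 4.1)]
labels = [0, 0, 1, 1, 2, 2]
print(rank_features(rows, labels, ["a", "b"]))
```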

Module 3:

Lecture 5 K Nearest Neighbors

In this lecture, we explore the K Nearest Neighbors classifier. We look at the underlying intuition and the parameters needed to obtain the model. We present the classification algorithm, illustrate it on the Iris data set with different parameter values, and observe the results. We then review the pros and cons of this approach.
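A from-scratch sketch of the K Nearest Neighbors idea follows. To classify a query point, it measures the Euclidean distance to every training point, keeps the `k` closest, and takes a majority vote of their labels.

The function name `knn_predict` and the 2-D toy points are hypothetical; the lecture works with the Iris data and likely a library implementation. Euclidean distance is one common choice of metric, assumed here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote. Counter breaks ties by first occurrence, so among tied
    # labels the one with the closer neighbour wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training points forming two clusters.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.8, 1.7), k=3))
print(knn_predict(train_X, train_y, (6.2, 6.4), k=3))
```

Small values of `k` follow the training data closely. Larger values smooth the decision boundary, which is the parameter effect the lecture explores.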

Quiz 2 KNN - Creating Test Data and Measuring Test Accuracy

Create test data by using a portion of the Iris data set for testing. Use the remaining data for training. Measure test accuracy for the models created in class and compare them to the training accuracy.
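One way to hold out test data is to shuffle the row indices with a fixed seed and set a fraction aside. Shuffling matters because files such as Iris are commonly stored sorted by class, so taking the last rows unshuffled would put a single class in the test set.

In the sketch below, the 150-row size matches Iris. The function names and the 20% test fraction are illustrative assumptions.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) index lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    n_test = max(1, round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, test_idx = train_test_split_indices(150, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

Compute `accuracy` separately on the training rows and the test rows. Test accuracy well below training accuracy suggests the model is overfitting.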

Module 4:

Lecture 6 Decision Trees for Classification

In this lecture, we explore another classification model: the decision tree. Decision trees are used for regression as well as classification. You will learn the basic characteristics of decision trees and how they can be applied to classification problems.
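At each node, a classification tree chooses the split that makes the resulting groups as pure as possible. The sketch below scores candidate thresholds on a single feature by weighted Gini impurity. Gini impurity is one common splitting criterion; entropy is another, and whether the lecture uses Gini is an assumption. The function names and toy values are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= t` on one feature.

    Returns (None, inf) if the feature has fewer than two distinct values.
    """
    n = len(values)
    best_t, best_score = None, float("inf")
    # Candidate thresholds: midpoints between consecutive distinct sorted values.
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


values = [2.0, 2.5, 3.0, 7.0, 7.5, 8.0]
labels = ["x", "x", "x", "y", "y", "y"]
print(best_threshold(values, labels))
```

A full tree repeats this search over every feature, splits on the best one, and recurses on each side until a stopping rule is reached, such as a maximum depth.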

Quiz 3

Module 5:

Lecture 7 Parametric Models: Bayes' Theorem & Naive Bayes Classifier

We now move on to a number of parametric models for classification. First, we review Bayes' theorem and how it is used to arrive at the conditional probabilities our classifiers attempt to estimate. We then look at the Naive Bayes classifier, reviewing its underlying assumptions, the model itself, and its application to the Iris data set.
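Bayes' theorem gives the posterior as P(C | x) ∝ P(x | C) P(C). The evidence P(x) is the same for every class, so it can be dropped when choosing the most probable class.

The sketch below is a from-scratch Gaussian Naive Bayes. The "naive" step treats features as conditionally independent given the class, so the log likelihood becomes a sum over features. Modelling each continuous feature with a Gaussian is an assumption; it is a common choice for data like Iris, but the lecture's exact variant is not specified here. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # A small floor on the variance avoids division by zero for constant features.
        variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows), 1e-9)
                     for j in range(d)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # log P(C) + sum over features of log N(x_j; mean_j, var_j)
        score = math.log(prior)
        for xj, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2), (4.0, 5.0), (4.2, 5.3), (3.8, 4.9)]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.1, 2.0)), predict_gaussian_nb(model, (4.1, 5.1)))
```

Working with log probabilities rather than multiplying raw densities avoids numerical underflow when there are many features.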

Lecture 8 Parametric Models: Linear & Quadratic Discriminant Analysis

In this lecture, we look at two parametric models: Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). We see how these models differ from Naive Bayes by reviewing their underlying assumptions. We then apply them to a sample data set and review their pros and cons.
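The formulas below are the standard textbook discriminant functions, not necessarily the lecture's notation. Both models treat each class k as a multivariate Gaussian with mean μ_k and prior π_k, and assign x to the class with the largest discriminant δ_k(x).

- LDA assumes one covariance matrix Σ shared by all classes, which makes δ_k linear in x.
- QDA gives each class its own Σ_k, which makes δ_k quadratic in x.
- Gaussian Naive Bayes is the special case of QDA with diagonal Σ_k.

```latex
\begin{aligned}
\delta_k^{\mathrm{LDA}}(x) &= x^\top \Sigma^{-1}\mu_k - \tfrac{1}{2}\,\mu_k^\top \Sigma^{-1}\mu_k + \log \pi_k \\
\delta_k^{\mathrm{QDA}}(x) &= -\tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{1}{2}\,(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k) + \log \pi_k \\
\hat{y}(x) &= \arg\max_k \, \delta_k(x)
\end{aligned}
```

Terms that are the same for every class, such as x^T Σ^{-1} x in LDA, have been dropped because they do not change the arg max.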

Quiz 4 Comparing Probabilistic Classifiers

In this exercise, we compare the performance of the three probabilistic classifiers introduced in this module (Naive Bayes, LDA, and QDA) on the Seeds data set.

Module 6:

Lecture 9 Logistic Regression

In this lecture, we explore logistic regression, another probabilistic classification model that is widely applied to binary classification problems. We learn about the logistic function and the role it plays in the logistic regression model, examine what is involved in learning the model from training data, and apply it to a sample data set.
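The logistic function σ(z) = 1 / (1 + e^(-z)) turns a linear score w·x + b into a probability between 0 and 1. Learning the model means choosing w and b to minimize the log loss on the training data.

The sketch below does this with plain batch gradient descent. The optimizer, learning rate, epoch count, function names and toy points are all illustrative assumptions; library implementations typically use more sophisticated solvers.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Learn weights w and bias b by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For the log loss, the derivative with respect to the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical 2-D training points with labels 0 and 1.
X = [(-2.0, -1.0), (-1.5, -2.0), (-1.0, -1.5), (1.0, 1.5), (1.5, 2.0), (2.0, 1.0)]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
print(w, b, predict_proba(w, b, (1.5, 2.0)))
```

Thresholding `predict_proba` at 0.5 gives a class label. The decision boundary w·x + b = 0 is therefore linear.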

Quiz 5 Binary Classification of Seeds Data using Logistic Regression

In this exercise, you will build and evaluate a logistic regression model for the Seeds data set. Two of the three target classes are combined, reducing the problem to binary classification. You will build and evaluate the model for different values of the parameter C.
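The description mentions only "the parameter C". Assume the exercise uses scikit-learn's `LogisticRegression`, which is an assumption. In that case C is the inverse of the regularization strength. With an L2 penalty, the binary objective can be written as follows, with labels y_i ∈ {−1, +1}:

```latex
\min_{w,\,c}\; \frac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{n} \log\!\Bigl(1 + \exp\bigl(-y_i\,(x_i^\top w + c)\bigr)\Bigr)
```

A smaller C gives the penalty term more weight and shrinks w toward zero, which yields a simpler model. A larger C lets the model fit the training data more closely, at the risk of overfitting.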

Module 7:

Lecture 10 Support Vector Machines

We look at three closely related classifier models: the maximum margin classifier, the support vector classifier, and the support vector machine. We examine the underlying concepts and how these models are learned from training data. We then discuss basis expansion and kernel functions, and how they extend the model to handle non-linear decision boundaries.
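A kernel computes an inner product in an expanded feature space without building that space explicitly. The sketch below checks this for the degree-2 homogeneous polynomial kernel on 2-D inputs. (x·z)² equals φ(x)·φ(z) for the explicit map φ(x) = (x1², √2·x1·x2, x2²).

The Gaussian (RBF) kernel is included as another commonly used kernel. Its `gamma` value here is an arbitrary assumption, and the function names are illustrative.

```python
import math


def phi(x):
    """Explicit degree-2 basis expansion of a 2-D input (no linear or constant terms)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * math.dist(x, z) ** 2)


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), explicit)
```

The two printed values agree up to floating-point rounding. Because a support vector machine needs only inner products between points, swapping in a kernel gives it non-linear decision boundaries at roughly the cost of the original dot product.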

Quiz 6

Module 8:

Lecture 11 Classification Summary

In this final lecture, we will look at what it means to arrive at the best model for the problem at hand. Specifically, we will review the concepts of model capacity, generalization, capacity control, overfitting, underfitting, bias, variance, and the bias-variance tradeoff. We then examine hyperparameters and regularization, and their role in capacity control.
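The bias-variance tradeoff is usually stated through the decomposition below. It assumes data generated as y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², where f̂ is a model fitted to a random training set, and expectations are taken over training sets and noise. The decomposition holds in this additive form for squared-error loss. For classification with 0-1 loss there is no equally simple additive split, so treat it as intuition in the classification setting.

```latex
\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr]
= \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\Bigl[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\Bigr]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Raising model capacity typically lowers bias and raises variance. Regularization and other hyperparameters are the controls used to find a good balance between the two.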

Quiz 7 Final Mini Project: Evaluating Classifier Models

In this short project, you will work with the Income data set, which contains information about adults, including age, level of education, work class, marital status, occupation, and nationality. The objective is to predict whether income is > $50K or <= $50K. You will compare three different classifier models in terms of their test accuracy, training time, and test time.
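The sketch below is a small harness for comparing models on all three criteria. It times `fit` and `predict` separately with `time.perf_counter` and reports test accuracy. It assumes each model exposes `fit(X, y)` and `predict(X)` methods, mirroring the scikit-learn estimator interface.

`MajorityClassBaseline` is a hypothetical stand-in, not one of the project's three models. It is still a useful floor that any real classifier should beat. The toy rows are made up and are not the Income data.

```python
import time
from collections import Counter


class MajorityClassBaseline:
    """Hypothetical stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return test accuracy plus training and prediction times in seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    test_time = time.perf_counter() - start

    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return {"accuracy": accuracy, "train_time": train_time, "test_time": test_time}


# Hypothetical toy rows labelled with the project's two income categories.
X_train = [[0], [1], [2], [3]]
y_train = ["<=50K", "<=50K", "<=50K", ">50K"]
X_test = [[4], [5]]
y_test = ["<=50K", ">50K"]
print(evaluate(MajorityClassBaseline(), X_train, y_train, X_test, y_test))
```

Timings for very fast operations are noisy. Repeating each measurement and taking the median gives a steadier comparison.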

Supervised Learning: Classification

Instructor

Dr. Rukmini Vijaykumar

Classification Methods, Algorithms and Tools

Duration: 2h 30m
