{"id":22613,"date":"2021-02-09T10:48:47","date_gmt":"2021-02-09T10:48:47","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/learning-decision-trees\/"},"modified":"2023-09-05T07:44:06","modified_gmt":"2023-09-05T07:44:06","slug":"learning-decision-trees","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/learning-decision-trees\/","title":{"rendered":"Learning Decision Trees"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22613\" class=\"elementor elementor-22613\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-9835808 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9835808\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cd34e37\" data-id=\"cd34e37\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cf7569d elementor-widget elementor-widget-text-editor\" data-id=\"cf7569d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4680\">In the context of supervised learning, a decision tree is a tree for predicting the output for a given input. We start from the root of the tree and ask a particular question about the input. Depending on the answer, we go down to one or another of its children. The child we visit is the root of another tree. So we repeat the process, i.e. ask another question here. Eventually, we reach a leaf, i.e. a node with no children. 
This node contains the final answer which we output and stop.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4da0916 elementor-widget elementor-widget-heading\" data-id=\"4da0916\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">This process is depicted below.<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-102c9a6 elementor-widget elementor-widget-image\" data-id=\"102c9a6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"408\" height=\"337\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0WL4V7w9oHVk2hMsq.png\" class=\"attachment-large size-large wp-image-18639\" alt=\"Learning Decision Trees\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0WL4V7w9oHVk2hMsq.png 408w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0WL4V7w9oHVk2hMsq-300x248.png 300w\" sizes=\"(max-width: 408px) 100vw, 408px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f967cca elementor-widget elementor-widget-text-editor\" data-id=\"f967cca\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"919f\"><strong>Let\u2019s see an example.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d868855 elementor-widget elementor-widget-image\" data-id=\"d868855\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"655\" height=\"530\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08CVi5x_39SKi1wrc.png\" class=\"attachment-large size-large wp-image-18640\" alt=\"A decision tree with categorical predictor variables\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08CVi5x_39SKi1wrc.png 655w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08CVi5x_39SKi1wrc-300x243.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08CVi5x_39SKi1wrc-610x494.png 610w\" sizes=\"(max-width: 655px) 100vw, 655px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">A decision tree with categorical predictor variables<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3077acd elementor-widget elementor-widget-text-editor\" data-id=\"3077acd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fe53\">In machine learning, decision trees are of interest because they can be learned automatically from labeled data. A labeled data set is a set of pairs (<strong>x<\/strong>,&nbsp;<em>y<\/em>). 
Here&nbsp;<strong>x<\/strong>&nbsp;is the input vector and&nbsp;<em>y<\/em>&nbsp;the target output.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-44bfb2d elementor-widget elementor-widget-heading\" data-id=\"44bfb2d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Below is a labeled data set for our example.<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dbe76b1 elementor-widget elementor-widget-text-editor\" data-id=\"dbe76b1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\"><strong>Weather Temperature Action<\/strong><br>Sunny   Warm        Go to park<br>Sunny   Cold        Stay at home<br>Sunny   Hot         Stay at home<br>Rainy   Warm        Stay at home<br>Rainy   Cold        Stay at home<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8bf2ec1 elementor-widget elementor-widget-heading\" data-id=\"8bf2ec1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Why Decision Trees<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6f1ef1e elementor-widget elementor-widget-text-editor\" data-id=\"6f1ef1e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"47be\">Learned decision trees often produce good predictors. 
Because they operate in a tree structure, they can capture interactions among the predictor variables. (This will become clear as we see more examples.)<\/p>\n<p id=\"6af3\">An added benefit is that the learned models are transparent. That is, we can inspect them and deduce how they predict. By contrast, neural networks are opaque. Deep ones even more so.<\/p>\n<p id=\"a4be\">Of course, when prediction accuracy is paramount, opaqueness can be tolerated. Decision trees have this covered too: ensembles of decision trees (specifically Random Forests) deliver state-of-the-art accuracy.<\/p>\n<p id=\"e60b\">So either way, it\u2019s good to learn about decision tree learning.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c64cebe elementor-widget elementor-widget-heading\" data-id=\"c64cebe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning Decision Trees<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c57c8b5 elementor-widget elementor-widget-text-editor\" data-id=\"c57c8b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"12f8\">Okay, let\u2019s get to it. We\u2019ll focus on binary classification as this suffices to bring out the key ideas in learning. For completeness, we will also discuss how to morph a binary classifier into a multi-class classifier or a regressor.<\/p>\n<p id=\"f35b\">The pedagogical approach we take below mirrors the process of induction. 
We\u2019ll start with learning base cases, then build out to more elaborate ones.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e1049ff elementor-widget elementor-widget-heading\" data-id=\"e1049ff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning Base Case 1: Single Numeric Predictor<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e465dc0 elementor-widget elementor-widget-text-editor\" data-id=\"e465dc0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5577\">Consider the following problem. The input is a temperature. The output is a subjective assessment by an individual or a collective of whether the temperature is HOT or NOT.<\/p>\n<p id=\"38c2\">Let\u2019s depict our labeled data as follows, with &#8211; denoting NOT and + denoting HOT. 
The temperatures are implicit in the order in the horizontal line.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c06c9d2 elementor-widget elementor-widget-text-editor\" data-id=\"c06c9d2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">- - - - - + + + + +<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8fdb41a elementor-widget elementor-widget-text-editor\" data-id=\"8fdb41a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"ab18\">This data is linearly separable. All the -s come before the +s.<\/p>\n<p id=\"1e28\">In general, it need not be, as depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-98b3557 elementor-widget elementor-widget-text-editor\" data-id=\"98b3557\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">- - - - - + - + - - - + - + + - + + - + + + + + + + +<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b55c4a elementor-widget elementor-widget-text-editor\" data-id=\"7b55c4a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"e161\">Perhaps the labels are aggregated from the opinions of multiple people. 
There might be some disagreement, especially near the boundary separating most of the -s from most of the +s.<\/p>\n<p id=\"f048\">Our job is to learn a threshold that yields the best decision rule. What do we mean by a \u201cdecision rule\u201d? For any threshold T, we define the rule as<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-778d322 elementor-widget elementor-widget-text-editor\" data-id=\"778d322\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">IF temperature &lt; T<br>  return NOT<br>ELSE<br>  return HOT<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-903e2de elementor-widget elementor-widget-text-editor\" data-id=\"903e2de\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"e8e9\">The accuracy of this decision rule on the training set depends on T.<\/p>\n<p id=\"babd\">The objective of learning is to find the T that gives us the most accurate decision rule. Such a T is called an optimal split.<\/p>\n<p id=\"0836\">In fact, we have just seen our first example of learning a decision tree. 
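The search for an optimal split just described can be sketched in code. This is a minimal illustration, not the article's exact procedure; the function name and the choice of candidate thresholds (midpoints between consecutive distinct values) are assumptions of the sketch.

```python
def best_threshold(temperatures, labels):
    """Find the threshold T whose rule (NOT below T, HOT at or above T)
    is most accurate on the training set."""
    pairs = sorted(zip(temperatures, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    # Only midpoints between consecutive distinct values can change the
    # rule's behavior on the training set, so those are the candidates.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    best_T, best_acc = None, -1.0
    for T in candidates:
        correct = sum((y == "HOT") == (x >= T) for x, y in zip(xs, ys))
        acc = correct / len(xs)
        if acc > best_acc:
            best_T, best_acc = T, acc
    return best_T, best_acc
```

Because accuracy can only change where the sorted sequence of labels changes, a linear scan over these midpoints suffices; no other thresholds need be tried.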
The decision tree is depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1f9335d elementor-widget elementor-widget-image\" data-id=\"1f9335d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"415\" height=\"319\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0NmpgMq3Yrns7z44Q.png\" class=\"attachment-large size-large wp-image-18641\" alt=\"A decision-tree with one internal node\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0NmpgMq3Yrns7z44Q.png 415w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0NmpgMq3Yrns7z44Q-300x231.png 300w\" sizes=\"(max-width: 415px) 100vw, 415px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">A decision tree with one internal node<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-35b48c5 elementor-widget elementor-widget-heading\" data-id=\"35b48c5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning Base Case 2: Single Categorical Predictor<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fb50306 elementor-widget elementor-widget-text-editor\" data-id=\"fb50306\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"488e\">Consider&nbsp;<em>season<\/em>&nbsp;as a predictor and&nbsp;<em>sunny<\/em>&nbsp;or&nbsp;<em>rainy<\/em>&nbsp;as the 
binary outcome. Say we have a training set of daily recordings. For each day, whether the day was sunny or rainy is recorded as the outcome to predict. The season the day was in is recorded as the predictor.<\/p>\n<p id=\"a12f\">This problem is simpler than Learning Base Case 1. The predictor has only a few values. The four seasons. No optimal split to be learned. It\u2019s as if all we need to do is to fill in the \u2018predict\u2019 portions of the case statement.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fd80b0e elementor-widget elementor-widget-text-editor\" data-id=\"fd80b0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">case season<br>when summer: Predict sunny<br>when winter: Predict rainy<br>when fall: ??<br>when spring: ??<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-917ecaa elementor-widget elementor-widget-text-editor\" data-id=\"917ecaa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"80c4\">That said, we do have the issue of \u201cnoisy labels\u201d. This just means that the outcome cannot be determined with certainty. Summer can have rainy days.<\/p>\n<p id=\"aad4\">This issue is easy to take care of. At a leaf of the tree, we store the distribution over the counts of the two outcomes we observed in the training set. 
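Such a leaf table can be realized as follows. The function names and category values are illustrative, and the counts would come from whatever training data is at hand.

```python
from collections import Counter

def fit_leaves(seasons, outcomes):
    """Store, per season, the counts of each outcome seen in training."""
    leaves = {}
    for season, outcome in zip(seasons, outcomes):
        leaves.setdefault(season, Counter())[outcome] += 1
    return leaves

def predict(leaves, season):
    """Return the majority outcome and its empirical confidence."""
    counts = leaves[season]
    outcome, n = counts.most_common(1)[0]
    return outcome, n / sum(counts.values())
```

Storing full counts rather than just the majority label preserves the confidence information alongside the prediction.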
This is depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5ec7038 elementor-widget elementor-widget-image\" data-id=\"5ec7038\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"953\" height=\"376\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0x77qdLF5p6-3tKXZ.png\" class=\"attachment-large size-large wp-image-18642\" alt=\"Learning Decision Trees\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0x77qdLF5p6-3tKXZ.png 953w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0x77qdLF5p6-3tKXZ-300x118.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0x77qdLF5p6-3tKXZ-768x303.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0x77qdLF5p6-3tKXZ-610x241.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0x77qdLF5p6-3tKXZ-750x296.png 750w\" sizes=\"(max-width: 953px) 100vw, 953px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9fca83f elementor-widget elementor-widget-text-editor\" data-id=\"9fca83f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3747\">Say the season was summer. The relevant leaf shows 80: sunny and 5: rainy. 
So we would predict sunny with a confidence of 80\/85.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-27c4b8b elementor-widget elementor-widget-heading\" data-id=\"27c4b8b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning General Case 1: Multiple Numeric Predictors<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-61d4e48 elementor-widget elementor-widget-text-editor\" data-id=\"61d4e48\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f21d\">Not surprisingly, this is more involved.<\/p>\n<p id=\"ae98\">We start by imposing the simplifying constraint that the decision rule at any node of the tree tests only for a single dimension of the input.<\/p>\n<p id=\"034a\">Call our predictor variables&nbsp;<em>X<\/em>1, \u2026,&nbsp;<em>X<\/em>n. This means that at the tree\u2019s root we can test for exactly one of these. The question is, which one?<\/p>\n<p id=\"5dbb\">We answer this as follows. For each of the&nbsp;<em>n<\/em>&nbsp;predictor variables, we consider the problem of predicting the outcome solely from that predictor variable. This gives us&nbsp;<em>n<\/em>&nbsp;one-dimensional predictor problems to solve. We compute the optimal splits&nbsp;<em>T<\/em>1, \u2026,&nbsp;<em>T<\/em>n for these, in the manner described in the first base case. While doing so we also record the accuracies on the training set that each of these splits delivers.<\/p>\n<p id=\"1095\">At the root of the tree, we test for that&nbsp;<em>X<\/em>i whose optimal split&nbsp;<em>T<\/em>i yields the most accurate (one-dimensional) predictor. 
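The root selection just described (solve n one-dimensional problems, keep the winner) might be sketched as follows. The function name, the majority-label rule on each side of a split, and the tie-breaking by lower index are assumptions of this sketch, not part of the article.

```python
from collections import Counter

def choose_root(X, y):
    """Solve one univariate split problem per dimension and return
    (accuracy, feature index, threshold) of the most accurate one.
    X is a list of numeric feature vectors, y a list of labels."""
    def split_accuracy(values, labels, T):
        # Predict the majority label on each side of the threshold.
        left = [l for v, l in zip(values, labels) if v < T]
        right = [l for v, l in zip(values, labels) if v >= T]
        correct = sum(max(Counter(side).values())
                      for side in (left, right) if side)
        return correct / len(labels)
    best = (-1.0, None, None)
    for i in range(len(X[0])):
        vals = [row[i] for row in X]
        distinct = sorted(set(vals))
        for a, b in zip(distinct, distinct[1:]):
            T = (a + b) / 2
            acc = split_accuracy(vals, y, T)
            if acc > best[0]:
                best = (acc, i, T)
    return best
```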
This is depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a8c7dfc elementor-widget elementor-widget-image\" data-id=\"a8c7dfc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"381\" height=\"302\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ScB1gS1aPXsKw3gs.png\" class=\"attachment-large size-large wp-image-18643\" alt=\"The best predictor of the outcome from Xi alone\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ScB1gS1aPXsKw3gs.png 381w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ScB1gS1aPXsKw3gs-300x238.png 300w\" sizes=\"(max-width: 381px) 100vw, 381px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">The best predictor of the outcome from Xi alone<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3dead6a elementor-widget elementor-widget-text-editor\" data-id=\"3dead6a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"0425\">So now we need to repeat this process for the two children A and B of this root. To do this, we first derive training sets for A and B as follows. The training set for A (B) is the restriction of the parent\u2019s training set to those instances in which&nbsp;<em>X<\/em>i is less than&nbsp;<em>T<\/em>i&nbsp;(&gt;=&nbsp;<em>T<\/em>i). Let\u2019s also delete the&nbsp;<em>X<\/em>i dimension from each of the training sets.<\/p>\n<p id=\"448d\">Now we have two instances of exactly the same learning problem. 
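Deriving the two child training sets — restricting by the test on the chosen dimension and dropping that dimension — can be sketched as below; the function name and calling convention are illustrative, not the article's.

```python
def derive_child_sets(X, y, i, T):
    """Restrict the parent's training set by the test X_i < T,
    dropping dimension i from each instance as we go."""
    def drop(row):
        return row[:i] + row[i + 1:]
    A = [(drop(r), label) for r, label in zip(X, y) if r[i] < T]
    B = [(drop(r), label) for r, label in zip(X, y) if r[i] >= T]
    return A, B
```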
So we recurse.<\/p>\n<p id=\"4a27\">And so it goes until our training set has no predictors. Only binary outcomes. The node to which such a training set is attached is a leaf. Nothing to test. The data on the leaf are the proportions of the two outcomes in the training set. This suffices to predict both the best outcome at the leaf and the confidence in it.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6146874 elementor-widget elementor-widget-heading\" data-id=\"6146874\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning General Case 2: Multiple Categorical Predictors<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-01c728a elementor-widget elementor-widget-text-editor\" data-id=\"01c728a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"37ff\">Here we have&nbsp;<em>n<\/em>&nbsp;categorical predictor variables&nbsp;<em>X<\/em>1, \u2026,&nbsp;<em>X<\/em>n. As we did for multiple numeric predictors, we derive&nbsp;<em>n<\/em>&nbsp;univariate prediction problems from this, solve each of them, and compute their accuracies to determine the most accurate univariate classifier. The predictor variable of this classifier is the one we place at the decision tree\u2019s root.<\/p>\n<p id=\"1673\">Next, we set up the training sets for this root\u2019s children.<\/p>\n<p id=\"0bbe\">There is one child for each value&nbsp;<em>v<\/em>&nbsp;of the root\u2019s predictor variable&nbsp;<em>X<\/em>i. The training set at this child is the restriction of the root\u2019s training set to those instances in which&nbsp;<em>X<\/em>i equals&nbsp;<em>v<\/em>. 
We also delete attribute&nbsp;<em>X<\/em>i from this training set.<\/p>\n<p id=\"2e8e\">Now we recurse as we did with multiple numeric predictors.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-38adaf5 elementor-widget elementor-widget-heading\" data-id=\"38adaf5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Example<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ad9b97e elementor-widget elementor-widget-text-editor\" data-id=\"ad9b97e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8547\">Let\u2019s illustrate this learning on a slightly enhanced version of our first example, below. We\u2019ve named the two outcomes&nbsp;<em>O<\/em>&nbsp;and&nbsp;<em>I<\/em>, to denote outdoors and indoors respectively. We\u2019ve also attached counts to these two outcomes. 
A row with a count of&nbsp;<em>o&nbsp;<\/em>for&nbsp;<em>O<\/em>&nbsp;and&nbsp;<em>i<\/em>&nbsp;for&nbsp;<em>I<\/em>&nbsp;denotes&nbsp;<em>o<\/em>&nbsp;instances labeled&nbsp;<em>O<\/em>&nbsp;and&nbsp;<em>i<\/em>&nbsp;instances labeled&nbsp;<em>I<\/em>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3959fc9 elementor-widget elementor-widget-text-editor\" data-id=\"3959fc9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\"><strong>Weather Temperature Action<br><\/strong>Sunny   Warm        O (50), I (10)<br>Sunny   Cold        O (10), I (40)<br>Sunny   Hot         O (5), I (30)<br>Rainy   Warm        O (10), I (25)<br>Rainy   Cold        O (1), I (35)<br>Rainy   Hot         O (2), I (7)<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4dd763c elementor-widget elementor-widget-text-editor\" data-id=\"4dd763c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"94d4\">So what predictor variable should we test at the tree\u2019s root? Well,&nbsp;<em>weather<\/em>&nbsp;being&nbsp;<em>rainy<\/em>&nbsp;predicts&nbsp;<em>I<\/em>. (That is, we stay indoors.) Weather being sunny is not predictive on its own. Now consider&nbsp;<em>Temperature<\/em>. Not surprisingly, the temperature being hot or cold also predicts&nbsp;<em>I<\/em>.<\/p>\n<p id=\"d6eb\">Which variable is the winner? Not clear. Let\u2019s give the nod to&nbsp;<em>Temperature<\/em>&nbsp;since two of its three values predict the outcome. (This is a subjective preference. 
Don\u2019t take it too literally.)<\/p>\n<p id=\"880b\">Now we recurse.<\/p>\n<p id=\"7bdd\">The final tree that results is below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9b0f16f elementor-widget elementor-widget-image\" data-id=\"9b0f16f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"931\" height=\"694\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Eo4vzK_6pV3t8omd.png\" class=\"attachment-large size-large wp-image-18644\" alt=\"A learned decision tree with two categorical predictor variables and a binary response\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Eo4vzK_6pV3t8omd.png 931w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Eo4vzK_6pV3t8omd-300x224.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Eo4vzK_6pV3t8omd-768x572.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Eo4vzK_6pV3t8omd-610x455.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Eo4vzK_6pV3t8omd-750x559.png 750w\" sizes=\"(max-width: 931px) 100vw, 931px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">A learned decision tree with two categorical predictor variables and a binary response<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7734cf5 elementor-widget elementor-widget-heading\" data-id=\"7734cf5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Mixed Predictor 
Variables<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5cf35e3 elementor-widget elementor-widget-text-editor\" data-id=\"5cf35e3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fd2d\">What if we have both numeric and categorical predictor variables? A reasonable approach is to ignore the difference. To figure out which variable to test for at a node, just determine, as before, which of the available predictor variables predicts the outcome the best. For a numeric predictor, this will involve finding an optimal split first.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7739ee5 elementor-widget elementor-widget-heading\" data-id=\"7739ee5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">The Learning Algorithm: Abstracting Out The Key Operations<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c588bb8 elementor-widget elementor-widget-text-editor\" data-id=\"c588bb8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9b0f\">Let\u2019s abstract out the key operations in our learning algorithm. 
These abstractions will help us in describing its extension to the multi-class case and to the regression case.<\/p>\n<p id=\"263e\"><strong>The key operations are<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-707ad60 elementor-widget elementor-widget-text-editor\" data-id=\"707ad60\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li>Evaluate how accurately any one variable predicts the response.<\/li><li>Derive child training sets from those of the parent.<\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-afe0e9e elementor-widget elementor-widget-heading\" data-id=\"afe0e9e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Multi-class Case<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4c6809e elementor-widget elementor-widget-text-editor\" data-id=\"4c6809e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"ae61\">What if our response variable has more than two outcomes? As an example, say on the problem of deciding what to do based on the weather and the temperature we add one more option:&nbsp;<em>go to the Mall<\/em>.<\/p>\n<p id=\"0f55\">Fundamentally nothing changes. Adding more outcomes to the response variable does not affect our ability to do operation 1. (The evaluation metric might differ though.) 
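As an illustration, here is a sketch of operation 1 for a categorical predictor that works unchanged whether the response has two outcomes or ten. The majority-vote accuracy metric is one reasonable choice among several; the function name and data are made up.

```python
from collections import Counter

def univariate_accuracy(values, responses):
    """How accurately does this one variable predict the response?
    Predict the majority response per value; nothing here depends
    on the response having only two classes."""
    by_value = {}
    for v, r in zip(values, responses):
        by_value.setdefault(v, Counter())[r] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(responses)
```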
Operation 2 is not affected either, as it doesn\u2019t even look at the response.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7728a37 elementor-widget elementor-widget-heading\" data-id=\"7728a37\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Regression<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c64ade3 elementor-widget elementor-widget-text-editor\" data-id=\"c64ade3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"945b\">What if our response variable is numeric? Here is one example. Predict the day\u2019s high temperature from the month of the year and the latitude.<\/p>\n<p id=\"d599\">Can we still evaluate the accuracy with which any single predictor variable predicts the response? This raises a question. How do we even predict a numeric response if any of the predictor variables are categorical?<\/p>\n<p id=\"d6b7\">Let\u2019s start by discussing this. 
First, we look at two base cases.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1f71aa9 elementor-widget elementor-widget-heading\" data-id=\"1f71aa9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Base Case 1: Single Categorical Predictor Variable<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0485423 elementor-widget elementor-widget-text-editor\" data-id=\"0485423\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3d8c\">For each value of this predictor, we can record the values of the response variable we see in the training set. A sensible prediction is the mean of these responses.<\/p>\n<p id=\"01c2\">Let\u2019s write this out formally. Let&nbsp;<em>X<\/em>&nbsp;denote our categorical predictor and&nbsp;<em>y<\/em>&nbsp;the numeric response. Our prediction of&nbsp;<em>y<\/em>&nbsp;when&nbsp;<em>X<\/em>&nbsp;equals&nbsp;<em>v<\/em>&nbsp;is an estimate of the value we expect in this situation, i.e. E[<em>y<\/em>|<em>X<\/em>=<em>v<\/em>].<\/p>\n<p id=\"d4d2\">Let\u2019s see a numeric example. 
Consider the training set<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7131be2 elementor-widget elementor-widget-text-editor\" data-id=\"7131be2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">X A A B B<br>y 1 2 4 5<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c0f944e elementor-widget elementor-widget-text-editor\" data-id=\"c0f944e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7eb7\">Our predicted&nbsp;<em>y<\/em>s for&nbsp;<em>X<\/em>&nbsp;= A and&nbsp;<em>X<\/em>&nbsp;= B are 1.5 and 4.5 respectively.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-89d72fd elementor-widget elementor-widget-heading\" data-id=\"89d72fd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Base Case 2: Single Numeric Predictor Variable<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1ba769c elementor-widget elementor-widget-text-editor\" data-id=\"1ba769c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9cca\">For any particular split&nbsp;<em>T<\/em>, a numeric predictor operates as a boolean categorical variable. 
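As a sketch of both base cases (plain Python with hypothetical names, not code from the post): the prediction for each value of a categorical predictor is just the mean response observed for that value, and a numeric predictor, once reduced to two categories by a split T, is handled by the very same function.

```python
from statistics import mean

# Training set from Base Case 1.
X = ["A", "A", "B", "B"]
y = [1, 2, 4, 5]

def group_means(X, y):
    """Mean response for each value of a categorical predictor."""
    groups = {}
    for value, target in zip(X, y):
        groups.setdefault(value, []).append(target)
    return {value: mean(targets) for value, targets in groups.items()}

print(group_means(X, y))  # {'A': 1.5, 'B': 4.5}

# Base Case 2: a numeric predictor with split T acts as a
# boolean categorical variable, so the same function applies.
X_num = [3.1, 4.0, 6.2, 7.5]
T = 5.0
X_as_cat = ["left" if x <= T else "right" for x in X_num]
print(group_means(X_as_cat, y))  # {'left': 1.5, 'right': 4.5}
```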
So the previous section covers this case as well.<\/p>\n<p id=\"ba9e\">The only addition is an extra loop to evaluate the various candidate&nbsp;<em>T<\/em>\u2019s and pick the one that works best. Speaking of \u201cworks best\u201d, we haven\u2019t covered this yet; we do so below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0cad0fa elementor-widget elementor-widget-heading\" data-id=\"0cad0fa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Evaluation Metric<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8808c56 elementor-widget elementor-widget-text-editor\" data-id=\"8808c56\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"07c8\">Both the response and its predictions are numeric. We just need a metric that quantifies how close to the target response the predicted one is. 
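To make the loop over candidate T's concrete, here is a sketch (plain Python, hypothetical helper names), assuming each side of a split predicts its mean and a candidate split is scored by its sum of squared errors, consistent with the metric discussed in this section:

```python
from statistics import mean

def sse_for_split(x, y, T):
    """Sum of squared errors when each side of split T predicts its mean."""
    left = [t for xv, t in zip(x, y) if xv <= T]
    right = [t for xv, t in zip(x, y) if xv > T]
    sse = 0.0
    for side in (left, right):
        if side:  # a side may be empty for an extreme T
            m = mean(side)
            sse += sum((t - m) ** 2 for t in side)
    return sse

def best_split(x, y):
    """Try midpoints between consecutive sorted x values; return the
    split with the lowest SSE. Assumes at least two distinct x values."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda T: sse_for_split(x, y, T))

x = [1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 5]
print(best_split(x, y))  # 2.5
```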
A sensible metric may be derived from the sum of squares of the discrepancies between the target response and the predicted response.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3437a9c elementor-widget elementor-widget-heading\" data-id=\"3437a9c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">General Case<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f8784d elementor-widget elementor-widget-text-editor\" data-id=\"4f8784d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"1819\">We have covered operation 1, i.e. evaluating the quality of a predictor variable towards a numeric response. Operation 2, deriving child training sets from a parent\u2019s, needs no change. 
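Operation 2 can be sketched as a simple partition of the training rows on the tested variable (hypothetical names, plain Python dicts as rows):

```python
def derive_children(rows, column):
    """Partition training rows by the value of the tested categorical
    column; the response column is carried along but never read."""
    children = {}
    for row in rows:
        children.setdefault(row[column], []).append(row)
    return children

rows = [
    {"weather": "sunny", "action": "hike"},
    {"weather": "rainy", "action": "read"},
    {"weather": "sunny", "action": "hike"},
]
children = derive_children(rows, "weather")
print(sorted(children))  # ['rainy', 'sunny']
```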
As noted earlier, this derivation process does not use the response at all.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5cd2bff elementor-widget elementor-widget-heading\" data-id=\"5cd2bff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">The Response Predicted At A Leaf<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4fe51da elementor-widget elementor-widget-text-editor\" data-id=\"4fe51da\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6211\">As in the classification case, the training set attached at a leaf has no predictor variables, only a collection of outcomes. As noted earlier, a sensible prediction at the leaf would be the mean of these outcomes. 
So this is what we should do when we arrive at a leaf.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d7b9d34 elementor-widget elementor-widget-heading\" data-id=\"d7b9d34\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Modeling Tradeoffs<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a200898 elementor-widget elementor-widget-text-editor\" data-id=\"a200898\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"02d6\">Consider our regression example: predict the day\u2019s high temperature from the month of the year and the latitude.<\/p>\n<p id=\"b9d4\">Consider the month of the year. We could treat it as a categorical predictor with values&nbsp;<em>January<\/em>,&nbsp;<em>February<\/em>,&nbsp;<em>March<\/em>, \u2026 Or as a numeric predictor with values&nbsp;<em>1<\/em>,&nbsp;<em>2<\/em>,&nbsp;<em>3<\/em>, \u2026<\/p>\n<p id=\"0445\">What are the tradeoffs? Treating it as a numeric predictor lets us leverage the order in the months.&nbsp;<em>February<\/em>&nbsp;is near&nbsp;<em>January<\/em>&nbsp;and far away from&nbsp;<em>August<\/em>. That said, how do we capture that&nbsp;<em>December<\/em>&nbsp;and&nbsp;<em>January<\/em>&nbsp;are neighboring months?&nbsp;<em>12<\/em>&nbsp;and&nbsp;<em>1<\/em>&nbsp;as numbers are far apart.<\/p>\n<p id=\"418f\">Perhaps more importantly, decision tree learning with a numeric predictor operates only via splits. That would mean that a node on a tree that tests for this variable can only make binary decisions. By contrast, using the categorical predictor gives us 12 children. 
In principle, this is capable of making finer-grained decisions.<\/p>\n<p id=\"51c3\">Now consider latitude. We can treat it as a numeric predictor, or as a categorical one induced by binning, e.g. into intervals of 10 degrees. The latter enables finer-grained decisions at a single node of a decision tree.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a075e49 elementor-widget elementor-widget-heading\" data-id=\"a075e49\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Summary<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-669468f elementor-widget elementor-widget-text-editor\" data-id=\"669468f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4432\">In this post, we have described learning decision trees with intuition, examples, and pictures. We have covered decision trees for both classification and regression problems. 
We have also covered both numeric and categorical predictor variables.<\/p>\n<p id=\"7e71\"><strong>Further Reading<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-48ed85c elementor-widget elementor-widget-text-editor\" data-id=\"48ed85c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Decision_tree_learning\" target=\"_blank\" rel=\"noreferrer noopener\">Decision tree learning<\/a><\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>In the context of supervised learning, a decision tree is a tree for predicting the output for a given input. Learned decision trees often produce good predictors.<\/p>\n","protected":false},"author":1044,"featured_media":18645,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[1324,92,1325,664],"ppma_author":[3691],"class_list":["post-22613","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-decision-trees","tag-machine-learning","tag-regression-tree","tag-supervised-learning"],"authors":[{"term_id":3691,"user_id":1044,"is_guest":0,"slug":"arun-jagota","display_name":"Arun Jagota","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Arun-Jagota-150x150.jpeg","user_url":"https:\/\/www.salesforce.com\/in\/?ir=1","last_name":"Jagota","first_name":"Arun","job_title":"","description":"Arun Jagota is Director of Data Science at Salesforce.com. 
A PhD in computer science, he has taught undergraduate, graduate, and continuing education courses in Computer Science at many US Universities from 1992 through 2006. He has written a number of books, most available at Amazon.com, 50 academic publications and has 17+ patents issued."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1044"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22613"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22613\/revisions"}],"predecessor-version":[{"id":32255,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22613\/revisions\/32255"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18645"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22613"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}