{"id":1393,"date":"2019-02-15T10:32:07","date_gmt":"2019-02-15T10:32:07","guid":{"rendered":"http:\/\/kusuaks7\/?p=998"},"modified":"2023-09-13T09:29:26","modified_gmt":"2023-09-13T09:29:26","slug":"statistical-learning-for-data-science","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/statistical-learning-for-data-science\/","title":{"rendered":"Statistical Learning for Data Science"},"content":{"rendered":"<p><strong><em>Ready to learn Data Science? Browse\u00a0<a href=\"https:\/\/www.experfy.com\/training\/tracks\/data-science-training-certification\">Data Science Training and Certification<\/a> courses developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<section>\n<p id=\"79b6\"><em>This post covers the following topics related to Statistical Learning and their significance in data science.<\/em><\/p>\n<ul>\n<li id=\"3784\"><em>Introduction<\/em><\/li>\n<li id=\"881c\"><em>Prediction &amp; Inference<\/em><\/li>\n<li id=\"456c\"><em>Parametric &amp; Non-parametric methods<\/em><\/li>\n<li id=\"a043\"><em>Prediction Accuracy and Model Interpretability<\/em><\/li>\n<li><em>Assessing Model Accuracy<\/em><\/li>\n<li><em>Bias &amp; Variance<\/em><\/li>\n<li id=\"a96c\"><em>Bias-Variance Trade-Off<\/em><\/li>\n<\/ul>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"2fd1\"><strong>Introduction<\/strong><\/h3>\n<p id=\"5f95\"><em>Statistical learning<\/em>\u00a0is a framework for understanding data based on statistics, which can be classified as\u00a0<em>supervised<\/em>\u00a0or\u00a0<em>unsupervised<\/em>. 
<em>Supervised statistical learning<\/em>\u00a0involves building a statistical model for predicting, or estimating, an output based on one or more inputs, while in\u00a0<em>unsupervised statistical learning<\/em>, there are inputs but no supervising output; nevertheless, we can learn relationships and structure from such data.<\/p>\n<p id=\"34f0\">One simple way to understand statistical learning is to determine the association between\u00a0<em>predictors (independent variables, features)<\/em>\u00a0and the\u00a0<em>response (dependent variable)<\/em>, and to develop an accurate model that can predict the\u00a0<em>response variable (Y)<\/em>\u00a0on the basis of the\u00a0<em>predictor variables (X)<\/em>.<\/p>\n<p id=\"db88\"><em>Y = f(X) + \u025b where X = (X1,X2,\u00a0.\u00a0.\u00a0.,Xp)<\/em>, f is an\u00a0<em>unknown function<\/em>\u00a0&amp; \u025b is a\u00a0<em>random error term<\/em>; the error \u025b contributes is\u00a0<em>irreducible<\/em>, while the error from imperfectly estimating f is\u00a0<em>reducible<\/em>.<\/p>\n<p>&nbsp;<\/p>\n<h3 id=\"cd3c\"><strong>Prediction &amp; Inference<\/strong><\/h3>\n<p id=\"f056\">In situations where a set of inputs X is readily available, but the output Y is not known, we often treat f as a black box (not concerned with the exact form of f), as long as it yields accurate predictions for Y. This is\u00a0<em>prediction<\/em>.<\/p>\n<p id=\"b88c\">There are situations where we are interested in understanding the way that Y is affected as X changes. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y. Here we are more interested in understanding the relationship between X and Y. Now f cannot be treated as a black box, because we need to know its exact form. 
This is\u00a0<em>inference<\/em>.<\/p>\n<p id=\"8952\">In real life, we will see a number of problems that fall into the\u00a0<em>prediction<\/em>\u00a0setting, the\u00a0<em>inference<\/em>\u00a0setting, or a combination of the two.<\/p>\n<figure id=\"f53f\"><canvas width=\"75\" height=\"56\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*Xd3qFalBR59aGzXH.jpg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*Xd3qFalBR59aGzXH.jpg\" \/><\/figure>\n<p style=\"text-align: center;\">Courtesy:\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=w09Ifi62p8k\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=w09Ifi62p8k\" data->https:\/\/www.youtube.com\/watch?v=w09Ifi62p8k<\/a><\/p>\n<h3 id=\"8eac\"><strong>Parametric &amp; Non-parametric methods<\/strong><\/h3>\n<p id=\"9e23\">When we make an assumption about the functional form of f and try to estimate f by estimating the set of parameters, these methods are called\u00a0<em>parametric methods<\/em>.<\/p>\n<p id=\"523d\"><em>f(X) = \u03b20 + \u03b21X1 + \u03b22X2 +\u00a0.\u00a0.\u00a0. 
+ \u03b2pXp<\/em><\/p>\n<p id=\"b726\"><em>Non-parametric methods<\/em>\u00a0do not make explicit assumptions about the form of f; instead, they seek an estimate of f that gets as close to the data points as possible.<\/p>\n<figure id=\"820f\"><canvas width=\"75\" height=\"56\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*8EwFq6wejo9qv-vF\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*8EwFq6wejo9qv-vF\" \/><\/figure>\n<p style=\"text-align: center;\">Courtesy:\u00a0<a href=\"https:\/\/www.slideshare.net\/zukun\/icml2004-tutorial-on-bayesian-methods-for-machine-learning\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.slideshare.net\/zukun\/icml2004-tutorial-on-bayesian-methods-for-machine-learning\" data->https:\/\/www.slideshare.net\/zukun\/icml2004-tutorial-on-bayesian-methods-for-machine-learning<\/a><\/p>\n<h3 id=\"22de\"><strong>Prediction Accuracy and Model Interpretability<\/strong><\/h3>\n<p id=\"edf4\">Of the many methods that we use for statistical learning, some are less flexible, or more restrictive, than others. When inference is the goal, there are clear advantages to using simple and relatively inflexible statistical learning methods. 
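For instance, the parametric linear form above reduces estimating f to estimating a handful of coefficients, while a non-parametric k-nearest-neighbours estimate simply averages nearby observations. A minimal NumPy sketch on simulated data (all data and names are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: Y = f(X) + eps, with true f(x) = 2 + 3x (illustrative)
x = rng.uniform(0, 1, 50)
y = 2 + 3 * x + rng.normal(0, 0.2, 50)

# Parametric: assume f(X) = b0 + b1*X, so estimating f means estimating two numbers
b1, b0 = np.polyfit(x, y, deg=1)  # least-squares slope and intercept

# Non-parametric: no assumed form for f, just average the k nearest neighbours
def knn_predict(x0, k=5):
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()

print(f"parametric estimate:  f(0.5) ~ {b0 + b1 * 0.5:.2f}")
print(f"non-parametric (kNN): f(0.5) ~ {knn_predict(0.5):.2f}")
```

Both estimates should land near the true value f(0.5) = 3.5; the parametric fit is directly interpretable via its coefficients, while the kNN estimate follows the data with no fixed functional form.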
When we are only interested in prediction, we can use the most flexible models available.<\/p>\n<figure id=\"6d9b\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*8oazi9lCF_MB2jnU.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*8oazi9lCF_MB2jnU.png\" \/><\/figure>\n<p style=\"text-align: center;\">Courtesy:\u00a0<a href=\"https:\/\/towardsdatascience.com\/a-complete-machine-learning-walk-through-in-python-part-three-388834e8804b\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"broken_link\">https:\/\/towardsdatascience.com\/a-complete-machine-learning-walk-through-in-python-part-three-388834e8804b<\/a><\/p>\n<h3 id=\"7c8f\"><strong>Assessing Model Accuracy<\/strong><\/h3>\n<p id=\"2991\"><em>There is no free lunch in statistics<\/em>, which means no one method dominates all others over all possible data sets. In the regression setting, the most commonly-used measure is the\u00a0<em>mean squared error (MSE)<\/em>. In the classification setting, the most commonly-used measure is the\u00a0<em>confusion matrix<\/em>. A fundamental property of statistical learning is that, as model flexibility increases,\u00a0<em>training error<\/em>\u00a0will decrease, but the\u00a0<em>test error<\/em>\u00a0may not.<\/p>\n<figure id=\"e38c\"><canvas width=\"75\" height=\"31\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*HEY1-wkfMdnI0aaR.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*HEY1-wkfMdnI0aaR.png\" \/><\/figure>\n<p style=\"text-align: center;\">Courtesy:\u00a0https:\/\/www.researchgate.net\/figure\/Basic-steps-to-assess-model-accuracy-Assuming-a-technically-approved-model-the-next-step_fig1_276365016<\/p>\n<h3 id=\"23fc\"><strong>Bias &amp; Variance<\/strong><\/h3>\n<p id=\"10d2\"><em>Bias<\/em>\u00a0refers to the simplifying assumptions made by a model to make the target function easier to learn.\u00a0<em>Parametric models<\/em>\u00a0have a\u00a0<em>high bias<\/em>, making them fast to learn and easier to understand but generally\u00a0<em>less flexible<\/em>.\u00a0<em>Decision Trees, k-Nearest Neighbors and Support Vector Machines<\/em>\u00a0are\u00a0low-bias machine learning algorithms.\u00a0<em>Linear Regression, Linear Discriminant Analysis and Logistic Regression<\/em>\u00a0are\u00a0high-bias machine learning algorithms.<\/p>\n<p id=\"2933\"><em>Variance<\/em>\u00a0is the amount by which the estimate of the target function would change if different training data were used.\u00a0<em>Non-parametric models<\/em>\u00a0that have a lot of\u00a0<em>flexibility<\/em>\u00a0have a\u00a0<em>high variance<\/em>.\u00a0<em>Linear Regression, Linear Discriminant Analysis and Logistic Regression<\/em>\u00a0are\u00a0low-variance machine learning algorithms.\u00a0<em>Decision Trees, k-Nearest Neighbors and Support Vector Machines<\/em>\u00a0are\u00a0high-variance machine learning algorithms.<\/p>\n<figure id=\"9f8e\"><canvas width=\"75\" height=\"65\"><\/canvas><img decoding=\"async\" 
src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*GiWb2klJvfJBqqe9AG4fjQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*GiWb2klJvfJBqqe9AG4fjQ.png\" \/><\/figure>\n<p style=\"text-align: center;\">Courtesy:\u00a0<a href=\"http:\/\/scott.fortmann-roe.com\/docs\/BiasVariance.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"http:\/\/scott.fortmann-roe.com\/docs\/BiasVariance.html\" data->http:\/\/scott.fortmann-roe.com\/docs\/BiasVariance.html<\/a><\/p>\n<h3 id=\"4c7a\"><strong>Bias-Variance Trade-Off<\/strong><\/h3>\n<p id=\"78a5\">The relationship between bias and variance in statistical learning is such that:<\/p>\n<ul>\n<li id=\"d1a9\"><em>Increasing bias<\/em>\u00a0will\u00a0<em>decrease variance<\/em>.<\/li>\n<li id=\"3f6e\"><em>Increasing variance<\/em>\u00a0will\u00a0<em>decrease bias<\/em>.<\/li>\n<\/ul>\n<p id=\"14ec\">There is a\u00a0<em>trade-off<\/em>\u00a0at play between these two concerns; the\u00a0<em>models<\/em>\u00a0we choose, and the way we choose to\u00a0<em>configure<\/em>\u00a0them, strike different balances in this trade-off for our problem.<\/p>\n<p id=\"2686\">In both the\u00a0<em>regression<\/em>\u00a0and\u00a0<em>classification<\/em>\u00a0settings, choosing the correct level of\u00a0<em>flexibility<\/em>\u00a0is critical to the success of any statistical learning method. 
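The trade-off can be seen in a small simulation (illustrative only; data, names and degrees are assumptions, not from the original post): fitting polynomials of increasing degree to noisy data, the training MSE keeps falling as flexibility grows, while the test MSE typically falls and then rises again.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Illustrative data: Y = f(X) + eps, with true f(x) = 2*sin(2x)
    x = rng.uniform(-1, 1, n)
    return x, 2 * np.sin(2 * x) + rng.normal(0, 0.3, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

train_mse, test_mse = {}, {}
for deg in (1, 3, 12):  # flexibility grows with polynomial degree
    coefs = np.polyfit(x_train, y_train, deg)
    train_mse[deg] = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse[deg] = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse[deg]:.3f}, "
          f"test MSE {test_mse[deg]:.3f}")

# Training MSE only decreases with flexibility; test MSE need not,
# which is the U-shape discussed in the text.
```

Degree 1 underfits (high bias), while a very high degree chases the noise in the 30 training points (high variance); an intermediate degree usually gives the lowest test error.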
The\u00a0<em>bias-variance trade-off<\/em>, and the resulting\u00a0<em>U-shape<\/em>\u00a0in the\u00a0<em>test error<\/em>, can make this a difficult task.<\/p>\n<figure id=\"2d9f\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*nwzwSPT2QNkypRKE.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/0*nwzwSPT2QNkypRKE.png\" \/><\/figure>\n<p style=\"text-align: center;\"><em>Courtesy:\u00a0<\/em><a href=\"http:\/\/scott.fortmann-roe.com\/docs\/BiasVariance.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"http:\/\/scott.fortmann-roe.com\/docs\/BiasVariance.html\" data-><em>http:\/\/scott.fortmann-roe.com\/docs\/BiasVariance.html<\/em><\/a><\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Statistical learning&nbsp;is a framework for understanding data based on statistics, which can be classified as&nbsp;supervised&nbsp;or&nbsp;unsupervised. Supervised statistical learning&nbsp;involves building a statistical model for predicting, or estimating, an output based on one or more inputs, while in&nbsp;unsupervised statistical learning, there are inputs but no supervising output; nevertheless, we can learn relationships and structure from such data. 
One simple way to understand statistical learning is to determine the association between&nbsp;predictors &amp; response and to develop an accurate model that can predict the&nbsp;response variable on the basis of the&nbsp;predictor variables.<\/p>\n","protected":false},"author":280,"featured_media":3258,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[1811],"class_list":["post-1393","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":1811,"user_id":280,"is_guest":0,"slug":"ankit-rathi","display_name":"Ankit Rathi","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Rathi","first_name":"Ankit","job_title":"","description":"Ankit Rathi is Lead Architect at SITA, the leading &amp; innovative IT organization in ATI, delivering end-to-end analytics platforms using Data Science, Big Data &amp; Cloud. 
He is a Data Science Architect with extensive experience in designing &amp; developing data-intensive technology solutions including data architecture, data science, big data &amp; cloud."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1393","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/280"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1393"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1393\/revisions"}],"predecessor-version":[{"id":30046,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1393\/revisions\/30046"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3258"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1393"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1393"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1393"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1393"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}