{"id":2254,"date":"2020-02-12T05:18:55","date_gmt":"2020-02-12T02:18:55","guid":{"rendered":"http:\/\/kusuaks7\/?p=1859"},"modified":"2024-01-11T07:53:30","modified_gmt":"2024-01-11T07:53:30","slug":"understanding-dataset-shift","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/understanding-dataset-shift\/","title":{"rendered":"Understanding Dataset Shift"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2254\" class=\"elementor elementor-2254\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-25e428a9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"25e428a9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6a3b22bb\" data-id=\"6a3b22bb\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4859daf elementor-widget elementor-widget-heading\" data-id=\"4859daf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3 id=\"8ff3\" style=\"color: #aaa;font-style: italic\" data-selectable-paragraph=\"\">How to not be fooled by the tricks data plays on you.<\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5abcfca elementor-widget elementor-widget-text-editor\" data-id=\"5abcfca\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote>\n<p id=\"a705\" data-selectable-paragraph=\"\">Dataset shift is a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages.\u00a0<strong>\u2014\u00a0<\/strong><a href=\"https:\/\/cs.nyu.edu\/~roweis\/papers\/invar-chapter.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\" class=\"broken_link\"><strong><em>Dataset Shift, The MIT Press.<\/em><\/strong><\/a><\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-74d6b4c elementor-widget elementor-widget-text-editor\" data-id=\"74d6b4c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"33bf\" data-selectable-paragraph=\"\"><a href=\"https:\/\/cs.nyu.edu\/~roweis\/papers\/invar-chapter.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\" class=\"broken_link\">Dataset shifting<\/a>\u00a0is one of those topics which is simple, perhaps so simple that it is considered trivially obvious. 
In my own data science classes the idea was discussed only briefly; however, I think a deeper discussion of the causes and manifestations of dataset shift is of benefit to the data science community.<\/p>\n<p id=\"b521\" data-selectable-paragraph=\"\">The key theme of this article can be summarized in a single sentence:<\/p>\n<p id=\"94c4\" data-selectable-paragraph=\"\"><strong>Dataset shift is when the training and test distributions are different.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-707b18b elementor-widget elementor-widget-image\" data-id=\"707b18b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1102\/1*0giRszXm8eg951HKxzvWZA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-73d4ea5 elementor-widget elementor-widget-text-editor\" data-id=\"73d4ea5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAn example of differing training and test distributions.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8723b8b elementor-widget elementor-widget-image\" data-id=\"8723b8b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1570\/1*FaWL_bpexYoPmg9mQJCThA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1e3cfae elementor-widget elementor-widget-text-editor\" data-id=\"1e3cfae\" data-element_type=\"widget\" 
data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"08d8\" data-selectable-paragraph=\"\">Whilst you may scoff at the triviality of such a statement, this is possibly the most common problem I see when viewing solutions to Kaggle challenges. In some ways, a deep understanding of dataset shifting is key to winning Kaggle competitions.<\/p>\n<p id=\"85e2\" data-selectable-paragraph=\"\">Dataset shift is not a standard term and is sometimes referred to as\u00a0<strong>concept shift<\/strong>\u00a0or\u00a0<strong>concept drift<\/strong>,\u00a0<strong>changes of classification<\/strong>,\u00a0<strong>changing environments<\/strong>,\u00a0<strong>contrast mining in classification learning<\/strong>,\u00a0<strong>fracture points<\/strong>\u00a0and\u00a0<strong>fractures between data.<\/strong><\/p>\n<p id=\"616e\" data-selectable-paragraph=\"\">Dataset shifting occurs predominantly within the machine learning paradigm of supervised and the hybrid paradigm of semi-supervised learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2e8fe07 elementor-widget elementor-widget-text-editor\" data-id=\"2e8fe07\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8762\" data-selectable-paragraph=\"\">The problem of dataset shift can stem from the way input features are utilized, the way training and test sets are selected, data sparsity, shifts in the data distribution due to non-stationary environments, and also from changes in the activation patterns within layers of deep neural networks.<\/p>\n<p id=\"e7a5\" data-selectable-paragraph=\"\">Why is dataset shift important?<\/p>\n<p id=\"bb76\" data-selectable-paragraph=\"\">It is application-dependent and thus relies largely on the skill of the data scientist to 
examine and resolve. For example, how does one determine when the dataset has shifted sufficiently to pose a problem to our algorithms? If only certain features begin to diverge, how do we determine the trade-off between the loss of accuracy by removing features and the loss of accuracy by a misrepresented data distribution?<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-115d5e4 elementor-widget elementor-widget-text-editor\" data-id=\"115d5e4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"03b9\" data-selectable-paragraph=\"\">In this article, I will discuss the different types of dataset shift, problems that can arise from their presence, and current best practices that one can use to avoid them. This article contains no code examples and is purely conceptual. Classification examples will be used for ease of demonstration.<\/p>\n<p id=\"89c0\" data-selectable-paragraph=\"\">There are multiple manifestations of dataset shift that we will examine:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d1556d8 elementor-widget elementor-widget-text-editor\" data-id=\"d1556d8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n \t<li id=\"9f83\" data-selectable-paragraph=\"\">Covariate shift<\/li>\n \t<li id=\"f617\" data-selectable-paragraph=\"\">Prior probability shift<\/li>\n \t<li id=\"6ec7\" data-selectable-paragraph=\"\">Concept shift<\/li>\n \t<li id=\"d655\" data-selectable-paragraph=\"\">Internal covariate shift (an important subtype of covariate shift)<\/li>\n<\/ul>\n<p id=\"3156\" data-selectable-paragraph=\"\">This is a huge and important topic in machine learning so do not expect a comprehensive overview of 
this area. If the reader is interested in this subject, there is a plethora of research articles on the topic, the vast majority of which focus on covariate shift.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b3692d7 elementor-widget elementor-widget-heading\" data-id=\"b3692d7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"de77\" data-selectable-paragraph=\"\">Covariate shift<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-faceef3 elementor-widget elementor-widget-text-editor\" data-id=\"faceef3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"a428\" data-selectable-paragraph=\"\">Of all the manifestations of dataset shift, the simplest to understand is covariate shift.<\/p>\n\n<blockquote>\n<p id=\"e46a\" data-selectable-paragraph=\"\">Covariate shift is the change in the distribution of the\u00a0<em>covariates\u00a0<\/em>specifically, that is, the independent variables. This is normally due to changes in state of latent variables, which could be temporal (even changes to the stationarity of a temporal process), or spatial, or less obvious. \u2014\u00a0<a href=\"https:\/\/www.quora.com\/What-is-Covariate-shift\" target=\"_blank\" rel=\"noopener nofollow noreferrer\" class=\"broken_link\"><strong>Quora<\/strong><\/a><\/p>\n<\/blockquote>\n<p id=\"74a7\" data-selectable-paragraph=\"\">Covariate shift is the scholarly term for when the distribution of the data (i.e. 
our input features) changes.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-65b5cc5 elementor-widget elementor-widget-image\" data-id=\"65b5cc5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1638\/1*vWURgfF6EJ0QpWaVm5grhw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-426ef19 elementor-widget elementor-widget-image\" data-id=\"426ef19\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1296\/1*3MLwY2rMHziKHPfZEkIuzA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e51ddbd elementor-widget elementor-widget-text-editor\" data-id=\"e51ddbd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8af9\" data-selectable-paragraph=\"\">Here are some examples where covariate shift is likely to cause problems:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bc442ba elementor-widget elementor-widget-text-editor\" data-id=\"bc442ba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<ul>\n \t<li id=\"6325\" data-selectable-paragraph=\"\">Face recognition algorithms that are trained predominantly on younger faces, while the test set contains a much larger proportion of older faces.<\/li>\n \t<li id=\"f318\" 
data-selectable-paragraph=\"\">Predicting life expectancy with very few smokers in the training set but many more in the test set.<\/li>\n \t<li id=\"787a\" data-selectable-paragraph=\"\">Classifying images as either cats or dogs while omitting from the training set certain breeds that appear in the test set.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d3365d1 elementor-widget elementor-widget-text-editor\" data-id=\"d3365d1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2006\" data-selectable-paragraph=\"\">In this case, there is no change in the underlying relationship between the input and output (the regression line is still the same), yet part of that relationship is data-sparse, omitted, or misrepresented such that the test set and training set do not reflect the same distribution.<\/p>\n<p id=\"f4fa\" data-selectable-paragraph=\"\">Covariate shift can cause a lot of problems when performing cross-validation. 
Cross-validation is almost unbiased without covariate shift but it is heavily biased under covariate shift!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bf289ba elementor-widget elementor-widget-heading\" data-id=\"bf289ba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"1e81\" data-selectable-paragraph=\"\">Prior Probability Shift<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bc9f9da elementor-widget elementor-widget-text-editor\" data-id=\"bc9f9da\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fde4\" data-selectable-paragraph=\"\">Whilst covariate shift focuses on changes in the feature (<strong><em>x<\/em><\/strong>) distribution, prior probability shift focuses on changes in the distribution of the class variable\u00a0<strong><em>y<\/em><\/strong>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-51e7fea elementor-widget elementor-widget-image\" data-id=\"51e7fea\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1724\/1*cB8oiGdLbvhMZhHqgCiXtw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-451979e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"451979e\" 
data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f79f118\" data-id=\"f79f118\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2264c6a elementor-widget elementor-widget-image\" data-id=\"2264c6a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1622\/1*a8N2VA7V0WeB3oGRFE6dIw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-83a5f47 elementor-widget elementor-widget-text-editor\" data-id=\"83a5f47\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2e1c\" data-selectable-paragraph=\"\">This type of shift may seem slightly more confusing, but it is essentially the reverse of covariate shift. An intuitive way to think about it might be to consider an unbalanced dataset.<\/p>\n<p id=\"0c76\" data-selectable-paragraph=\"\">If the training set assigns equal prior probabilities to spam and non-spam emails (i.e. 
the probability of an email being spam is 0.5), then we would expect 50% of the training set to be spam emails and 50% to be non-spam.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1ce3d3c elementor-widget elementor-widget-text-editor\" data-id=\"1ce3d3c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d3cf\" data-selectable-paragraph=\"\">If, in reality, 90% of our emails are spam (perhaps not unlikely), then the prior probability of the class variable has changed. This idea is related to data sparsity and biased feature selection, which are factors in causing covariate shift, but here they influence our output distribution rather than our input distribution.<\/p>\n<p id=\"6636\" data-selectable-paragraph=\"\">This problem only occurs in Y \u2192 X problems and is commonly associated with naive Bayes (hence the spam example, since naive Bayes is commonly used to filter spam emails).<\/p>\n<p id=\"697e\" data-selectable-paragraph=\"\">The below figure on prior probability shift is taken from the\u00a0<a href=\"http:\/\/www.acad.bg\/ebook\/ml\/The.MIT.Press.Dataset.Shift.in.Machine.Learning.Feb.2009.eBook-DDU.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Dataset Shift in Machine Learning<\/a>\u00a0book and illustrates this case nicely.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c84fd6a elementor-widget elementor-widget-image\" data-id=\"c84fd6a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1920\/1*WbFYon664l1jjqbYPsu4iw.png\" alt=\"\" 
\/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9201bdb elementor-widget elementor-widget-heading\" data-id=\"9201bdb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"0e29\" data-selectable-paragraph=\"\"><strong>Concept Drift<\/strong><\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-de65f19 elementor-widget elementor-widget-text-editor\" data-id=\"de65f19\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3d16\" data-selectable-paragraph=\"\">Concept drift is different from covariate and prior probability shift in that it is not related to the data distribution or the class distribution but instead is related to the relationship between the two variables.<\/p>\n<p id=\"8a14\" data-selectable-paragraph=\"\">An intuitive way to think about this idea is by looking at time series analysis.<\/p>\n<p id=\"e342\" data-selectable-paragraph=\"\">In time series analysis, it is common to examine whether the time series is stationary before performing any analysis, as stationary time series are much easier to analyze than non-stationary time series.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2e4926b elementor-widget elementor-widget-image\" data-id=\"2e4926b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/725\/0*5ods4McATY6DHK84.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-8da11d1 elementor-widget elementor-widget-text-editor\" data-id=\"8da11d1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"475a\" data-selectable-paragraph=\"\">Why is this the case?<\/p>\n<p id=\"fea7\" data-selectable-paragraph=\"\">This is easier because the relationship between the input and output is not consistently changing! There are ways of detrending a time series to make it stationary, but this does not always work (such as in the case of stock indices that generally contain little autocorrelation or secular variation).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f7ffa46 elementor-widget elementor-widget-image\" data-id=\"f7ffa46\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1574\/1*Q_zYnVXvHpD86BCcTENkNw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-28ee3d4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"28ee3d4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3b156b0\" data-id=\"3b156b0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element 
elementor-element-ece9233 elementor-widget elementor-widget-text-editor\" data-id=\"ece9233\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"239b\" data-selectable-paragraph=\"\">To give a more concrete example, let\u2019s say we examined the profits of companies before the 2008 financial crisis and made an algorithm to predict the profit based on factors such as the industry, number of employees, information about products, and so on. If our algorithm is trained on data from 2000\u20132007 and we then use it to predict the same information after the financial crisis, it is likely to perform poorly.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-05816c7 elementor-widget elementor-widget-text-editor\" data-id=\"05816c7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"ba35\" data-selectable-paragraph=\"\">So what changed? 
Clearly, the overall relationship between the inputs and outputs changed due to the new socio-economic environment, and, if this change is not reflected in our variables (for example, through a dummy variable marking the date of the financial crisis, with training data from both before and after that date), then our model is going to suffer the consequences of concept shift.<\/p>\n<p id=\"33e8\" data-selectable-paragraph=\"\">In our specific case, we would expect to see profits change markedly in the years after the financial crisis (this is an example of an\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Interrupted_time_series\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">interrupted time series<\/a>).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-869e60d elementor-widget elementor-widget-heading\" data-id=\"869e60d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"9a04\" data-selectable-paragraph=\"\">Internal Covariate Shift<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6f8c29d elementor-widget elementor-widget-text-editor\" data-id=\"6f8c29d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6290\" data-selectable-paragraph=\"\">One reason this topic has gained interest recently is the suspected influence of covariate shift in the hidden layers of deep neural networks (hence the word \u2018internal\u2019).<\/p>\n<p id=\"f630\" data-selectable-paragraph=\"\">Researchers found that due to the variation in the distribution of activations from the output of a given hidden layer, which are used as the input to a subsequent layer, the network layers can 
suffer from covariate shift, which can impede the training of deep neural networks.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c2e3a25 elementor-widget elementor-widget-image\" data-id=\"c2e3a25\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/proxy\/1*X76xBHMpT4rzws_-L2s1TA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-db82643 elementor-widget elementor-widget-text-editor\" data-id=\"db82643\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Without batch normalization, network activations are exposed to varying input distributions that propagate through the network and distort the learned distributions.<\/p>\n<p id=\"40ad\" data-selectable-paragraph=\"\">This idea is the motivation for\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Batch_normalization\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">batch normalization<\/a>, proposed by Sergey Ioffe and Christian Szegedy in their 2015 paper\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1502.03167.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><em>\u201cBatch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift\u201d<\/em><\/a><em>.<\/em><\/p>\n<p id=\"3467\" data-selectable-paragraph=\"\">The authors propose that internal covariate shift in the hidden layers slows down training and requires lower learning rates and careful parameter initialization. 
They address this by adding batch normalization layers that normalize the inputs to hidden layers.<\/p>\n<p id=\"e0b2\" data-selectable-paragraph=\"\">This batch norm layer takes the mean and standard deviation of a batch of samples and uses them to standardize the input. This also adds some noise to the inputs (because of the noise inherent in the mean and standard deviation between different batches) which helps to regularize the network.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f1f677 elementor-widget elementor-widget-image\" data-id=\"2f1f677\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1276\/1*y-hdspSxBxe9EL8vz76Drw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a65cbaf elementor-widget elementor-widget-text-editor\" data-id=\"a65cbaf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">How batch normalization fits within the network architecture of deep neural networks.<\/p>\n<p id=\"8d04\" data-selectable-paragraph=\"\">This process translates the varying distributions into more stable internal data distributions (less drift\/oscillation), which helps to stabilize learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5379cbe elementor-widget elementor-widget-image\" data-id=\"5379cbe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" 
src=\"https:\/\/miro.medium.com\/max\/638\/0*q68JPcHskgSUBAgr\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7e59a10 elementor-widget elementor-widget-text-editor\" data-id=\"7e59a10\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Varying data distributions across batches are normalized via a batch normalization layer in order to stabilize the data distribution used as input to subsequent layers in a deep neural network.<\/p>\n<p id=\"a3c5\" data-selectable-paragraph=\"\">Batch normalization is now widely adopted in the deep learning community, although a\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1805.11604.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">recent paper<\/a>\u00a0suggested that the improved results obtained from this technique may not be purely due to the suppression of internal covariate shift, and may instead be a result of smoothing the loss landscape of the network.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5248764 elementor-widget elementor-widget-text-editor\" data-id=\"5248764\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"0662\" data-selectable-paragraph=\"\">For those unfamiliar with batch normalization, its purpose, and its implementation, I recommend looking at the relevant YouTube videos of Andrew Ng, one of which is linked below.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-13cc2fc elementor-widget elementor-widget-text-editor\" data-id=\"13cc2fc\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure><iframe title=\"Normalizing Activations in a Network (C2W3L04)\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FtNIpEZLv_eg%3Ffeature%3Doembed&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DtNIpEZLv_eg&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FtNIpEZLv_eg%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube\" width=\"700\" height=\"380\" frameborder=\"0\" scrolling=\"no\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-7cf1562 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7cf1562\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0f54a99\" data-id=\"0f54a99\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7e77ecd elementor-widget elementor-widget-heading\" data-id=\"7e77ecd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"3e67\" data-selectable-paragraph=\"\">Major Causes of Dataset Shift<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a6b2387 elementor-widget 
elementor-widget-text-editor\" data-id=\"a6b2387\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7a83\" data-selectable-paragraph=\"\">The two most common causes of dataset shift are (1)\u00a0<strong>sample selection bias<\/strong>\u00a0and (2)\u00a0<strong>non-stationary environments<\/strong>.<\/p>\n<p id=\"5de8\" data-selectable-paragraph=\"\">It is important to note that these are not types of dataset shift, and do not always result in dataset shift. They are merely potential reasons that dataset shift can occur in our data.<\/p>\n<p id=\"2444\" data-selectable-paragraph=\"\"><strong>Sample selection bias:\u00a0<\/strong>the discrepancy in distribution arises because the training data were obtained through a biased method and thus do not reliably represent the operating environment where the classifier is to be deployed (which, in machine learning terms, would constitute the test set).<\/p>\n<p id=\"1276\" data-selectable-paragraph=\"\"><strong>Non-stationary environments:\u00a0<\/strong>the training environment differs from the test environment, whether due to a temporal or a spatial change.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d309d01 elementor-widget elementor-widget-heading\" data-id=\"d309d01\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"a3fe\" data-selectable-paragraph=\"\">Sample Selection Bias<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-00dc3ee elementor-widget elementor-widget-text-editor\" data-id=\"00dc3ee\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f0c8\" data-selectable-paragraph=\"\">Sample selection bias is not a flaw in any algorithm or in the handling of the data. It is purely a systematic flaw in the process of data collection or labeling that leads to nonuniform selection of training examples from a population, which in turn causes biases to form during training.<\/p>\n<p id=\"a290\" data-selectable-paragraph=\"\">Sample selection bias is a form of covariate shift since we are influencing our data distribution.<\/p>\n<p id=\"c1f8\" data-selectable-paragraph=\"\">This can be thought of as a misrepresentation of the operating environment: our model is optimized for a factitious or cherry-picked environment rather than the real one.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0c1741b elementor-widget elementor-widget-image\" data-id=\"0c1741b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1724\/1*u4noYjd10_Yjz6hCJHFMJA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-82f7010 elementor-widget elementor-widget-text-editor\" data-id=\"82f7010\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"707d\" data-selectable-paragraph=\"\">Dataset shift resulting from sample selection bias is especially relevant when dealing with imbalanced classification because, in highly imbalanced domains, the minority class is particularly sensitive to individual classification errors, since it typically has so few samples.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-a9f655c elementor-widget elementor-widget-image\" data-id=\"a9f655c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1542\/1*TO_p8KEzl8325wp69q0rIg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-13c49b4 elementor-widget elementor-widget-text-editor\" data-id=\"13c49b4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Example of the impact of dataset shift in imbalanced domains.<\/p>\n<p id=\"4bcf\" data-selectable-paragraph=\"\">In the most extreme cases, a single misclassified example of the minority class can create a significant drop in performance.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d87e76c elementor-widget elementor-widget-heading\" data-id=\"d87e76c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"fd1b\" data-selectable-paragraph=\"\">Non-stationary Environments<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-da8aea2 elementor-widget elementor-widget-text-editor\" data-id=\"da8aea2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5063\" data-selectable-paragraph=\"\">In real-world applications, it is often the case that the data is not 
(time- or space-) stationary.<\/p>\n<p id=\"634c\" data-selectable-paragraph=\"\">One of the most relevant non-stationary scenarios involves adversarial classification problems, such as spam filtering and network intrusion detection.<\/p>\n<p id=\"0516\" data-selectable-paragraph=\"\">This type of problem is receiving an increasing amount of attention in the machine learning field and usually has to cope with non-stationary environments due to the existence of an adversary that tries to work around the existing classifier\u2019s learned concepts. In terms of the machine learning task, this adversary warps the test set so that it becomes different from the training set, thus introducing any possible kind of dataset shift.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a7c0ff elementor-widget elementor-widget-heading\" data-id=\"2a7c0ff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"ea17\" data-selectable-paragraph=\"\">Identifying Dataset Shift<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dd0d585 elementor-widget elementor-widget-text-editor\" data-id=\"dd0d585\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"bd78\" data-selectable-paragraph=\"\">There are several methods that can be used to determine whether shifting is present in a dataset and its severity.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ab9403c elementor-widget elementor-widget-image\" data-id=\"ab9403c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1760\/1*uwp8xm8T9Hney1WZtHWwWQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b9a4674 elementor-widget elementor-widget-text-editor\" data-id=\"b9a4674\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Tree diagram showing the methods of identifying dataset shift.<\/p>\n<p id=\"9d8e\" data-selectable-paragraph=\"\">Unsupervised methods are perhaps the most useful ways of identifying dataset shift, as they do not require post-hoc analysis to be done, the latency of which cannot be afforded in some production systems. Supervised methods also exist; these essentially monitor growing errors as the model runs, together with performance on an external holdout (validation) set.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-35bd176 elementor-widget elementor-widget-text-editor\" data-id=\"35bd176\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8249\" data-selectable-paragraph=\"\"><strong>1) Statistical Distance<\/strong>\nThe\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Statistical_distance\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><em>statistical distance<\/em><\/a>\u00a0method is useful for detecting whether your model predictions change over time. It works by building and comparing histograms, which let you detect not only whether your model predictions change over time, but also whether your most important features do. 
Simply put, you form histograms of your training data, keep track of them over time, and compare them to see any changes. This method is commonly used by financial institutions on credit-scoring models.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b919966 elementor-widget elementor-widget-image\" data-id=\"b919966\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/800\/0*8pehAal_Mwbgaw-S.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0446459 elementor-widget elementor-widget-text-editor\" data-id=\"0446459\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Two distributions and their KL-divergence (effectively the \u2018distance\u2019 between the two distributions). If the two distributions overlap exactly, they are effectively the same distribution and the KL-divergence is zero.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7573a8a elementor-widget elementor-widget-text-editor\" data-id=\"7573a8a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9857\" data-selectable-paragraph=\"\">There are several metrics which can be used to monitor the change in model predictions over time. 
These include the\u00a0<a href=\"https:\/\/www.quora.com\/What-is-population-stability-index\" target=\"_blank\" rel=\"noopener nofollow noreferrer\" class=\"broken_link\"><strong>Population Stability Index<\/strong><\/a><strong>\u00a0(PSI)<\/strong>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Kolmogorov%E2%80%93Smirnov_test\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><strong>Kolmogorov-Smirnov statistic<\/strong><\/a>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Kullback%E2%80%93Leibler_divergence\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><strong>Kullback-Leibler divergence<\/strong><\/a>\u00a0(or other\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/F-divergence\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><em>f-<\/em>divergences<\/a>), and\u00a0<a href=\"http:\/\/blog.datadive.net\/histogram-intersection-for-change-detection\/\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><strong>histogram intersection<\/strong><\/a>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-ba376b2 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ba376b2\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-da9d574\" data-id=\"da9d574\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5163291 elementor-widget elementor-widget-image\" data-id=\"5163291\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/640\/0*5xvLZOGQZilh5U_S.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-1ac2fdc elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1ac2fdc\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-862e55c\" data-id=\"862e55c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b574010 elementor-widget elementor-widget-text-editor\" data-id=\"b574010\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Data plotted along one feature axis for a training and test set. There is ~72% intersection of the distributions, which indicates a reasonable level of covariate shift between the distributions.<\/p>\n<p id=\"0520\" data-selectable-paragraph=\"\">The major disadvantage of this method is that it is not well suited to high-dimensional or sparse features. 
However, it can be very useful and in my opinion should be the first thing to try when dealing with this issue.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1d60f56 elementor-widget elementor-widget-image\" data-id=\"1d60f56\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1672\/1*ys9UIdNFJmmbMsphe-U2wg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3d5c7e7 elementor-widget elementor-widget-text-editor\" data-id=\"3d5c7e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">A comparison between KL-divergence, KS statistic, PSI, and histogram intersection for two examples. The left example shows little to no covariate shift, whilst the right example shows a substantial covariate shift. Notice how it affects the expected values of the statistical distances.<\/p>\n<p id=\"8c66\" data-selectable-paragraph=\"\"><strong>2) Novelty Detection<\/strong>\nA method that is more amenable to fairly complex domains, such as computer vision, is\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Novelty_detection\" target=\"_blank\" rel=\"noopener nofollow noreferrer\"><em>novelty detection<\/em><\/a>. The idea is to build a model of the source distribution. Given a new data point, you then test how likely it is that this data point was drawn from the source distribution. 
For this method, you can use various techniques such as a one-class support vector machine, available in most common libraries.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e5918cf elementor-widget elementor-widget-image\" data-id=\"e5918cf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/640\/0*u_NPdjKgwRSg1qAJ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b1aaa3 elementor-widget elementor-widget-text-editor\" data-id=\"7b1aaa3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2c98\" data-selectable-paragraph=\"\">If you are in a regime of homogeneous but very complex interactions (e.g. 
visual, audio, or remote sensing), then this is a method you should look into, because in that case the statistical distance (histogram) method won\u2019t be as effective.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a25f320 elementor-widget elementor-widget-text-editor\" data-id=\"a25f320\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"96ff\" data-selectable-paragraph=\"\">The major disadvantage of this method is that it cannot tell you explicitly what has changed, only that there has been a change.<\/p>\n<p id=\"3798\" data-selectable-paragraph=\"\"><strong>3) Discriminative Distance<\/strong>\nThe\u00a0<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0031320313000307\" target=\"_blank\" rel=\"noopener nofollow noreferrer\" class=\"broken_link\"><em>discriminative distance<\/em><\/a>\u00a0method is less common; nonetheless, it can be effective. The intuition is that you train a classifier to detect whether an example comes from the source or the target domain. You can use the training error as a proxy for the distance between those two distributions. The higher the error, the closer they are (i.e. the classifier cannot discriminate between the source and target domain).<\/p>\n<p id=\"f642\" data-selectable-paragraph=\"\">Discriminative distance is widely applicable and works on high-dimensional data. Though it takes time and can be very complicated, this method is a useful technique if you are doing domain adaptation (and for some deep learning methods, this may be the only feasible technique that exists).<\/p>\n<p id=\"9754\" data-selectable-paragraph=\"\">This method is good for high-dimensional and sparse data, and is widely applicable. 
However, it can only be done offline and is more complicated to implement than the previous methods.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-466c453 elementor-widget elementor-widget-heading\" data-id=\"466c453\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"fb7a\" data-selectable-paragraph=\"\">Handling Dataset Shift<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2cb2fd9 elementor-widget elementor-widget-text-editor\" data-id=\"2cb2fd9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3995\" data-selectable-paragraph=\"\">How do you correct dataset shift? If possible, you should always retrain. Of course, in some situations this may not be possible, for example if there are latency problems with retraining. In such cases, there are several techniques for correcting dataset shift.<\/p>\n<p id=\"b200\" data-selectable-paragraph=\"\"><strong>1) Feature Removal<\/strong><\/p>\n<p id=\"400b\" data-selectable-paragraph=\"\">The statistical distance methods discussed above, which are used to identify covariate shift, can also serve as measures of the extent of the shift. 
We can set a boundary on what is deemed an acceptable level of shift and, by analyzing individual features or through an ablation study, determine which features are most responsible for the shift and remove these from the dataset.<\/p>\n<p id=\"726c\" data-selectable-paragraph=\"\">As you may expect, there is a trade-off between removing features that contribute to the covariate shift and having additional features and tolerating some covariate shift. 
This trade-off is something that the data scientist would need to assess on a case-by-case basis.<\/p>\n<p id=\"eb3c\" data-selectable-paragraph=\"\">A feature whose distribution differs a lot between training and test, but which gives you little predictive power, should always be dropped.<\/p>\n<p id=\"6db3\" data-selectable-paragraph=\"\">As an example, PSI is used in risk management, where an arbitrary value of 0.25 is used as the limit above which the shift is deemed major.<\/p>\n<p id=\"5ec9\" data-selectable-paragraph=\"\"><strong>2) Importance Reweighting<\/strong>\nThe main idea with importance reweighting is that you want to upweight training instances that are very similar to your test instances. Essentially, you try to change your training data set such that it looks like it was drawn from the test data set. The only requirement for this method is unlabeled examples from the test domain. Note that this may result in data leakage from the test set.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a3091c9 elementor-widget elementor-widget-image\" data-id=\"a3091c9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/2408\/1*e8jBqBE9E6Dq9I1WaeK_TQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-91d27a7 elementor-widget elementor-widget-text-editor\" data-id=\"91d27a7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">On the left, we have our typical training set and in the center our test set. 
We estimate the data probability of the training and test sets and use this to rescale our training set to produce the training set on the right (notice that the points have grown larger; their size represents the \u2018weight\u2019 of each training example).<\/p>\n<p id=\"8727\" data-selectable-paragraph=\"\">To make it clear how this works: we reweight each training example by the ratio of its probability under the test distribution to its probability under the training distribution. We can do this by density estimation, through kernel methods such as kernel mean matching, or through discriminative reweighting.<\/p>\n<p id=\"a8e6\" data-selectable-paragraph=\"\"><strong>3) Adversarial Search<\/strong><\/p>\n<p id=\"655c\" data-selectable-paragraph=\"\">The\u00a0<em>adversarial search<\/em>\u00a0method uses an adversarial model where the learning algorithm attempts to construct a predictor that is robust to the deletion of features at test time.<\/p>\n<p id=\"70fd\" data-selectable-paragraph=\"\">The problem is formulated as finding the optimal minimax strategy with respect to an adversary that deletes features; the optimal strategy can be found either by solving a quadratic program or by using efficient bundle methods for optimization.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f859685 elementor-widget elementor-widget-text-editor\" data-id=\"f859685\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section>\n\n<hr \/>\n\n<section>\n<p id=\"cd9e\" data-selectable-paragraph=\"\">Covariate shift has been extensively studied in the literature, and a number of proposals to work under it have been published. 
Some of the most important ones include:<\/p>\n\n<ul>\n \t<li id=\"510a\" data-selectable-paragraph=\"\">Weighting the log-likelihood function (Shimodaira, 2000)<\/li>\n \t<li id=\"8aec\" data-selectable-paragraph=\"\">Importance weighted cross-validation (Sugiyama et al., 2007 JMLR)<\/li>\n \t<li id=\"f00e\" data-selectable-paragraph=\"\">Integrated optimization problem. Discriminative learning. (Bickel et al., 2009 JMLR)<\/li>\n \t<li id=\"466c\" data-selectable-paragraph=\"\">Kernel mean matching (<a href=\"http:\/\/www.gatsby.ucl.ac.uk\/~gretton\/papers\/covariateShiftChapter.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Gretton et al., 2009<\/a>)<\/li>\n \t<li id=\"bad0\" data-selectable-paragraph=\"\">Adversarial search (<a href=\"http:\/\/www.acad.bg\/ebook\/ml\/The.MIT.Press.Dataset.Shift.in.Machine.Learning.Feb.2009.eBook-DDU.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Globerson et al., 2009<\/a>)<\/li>\n \t<li id=\"0d1d\" data-selectable-paragraph=\"\">Frank-Wolfe algorithm (<a href=\"https:\/\/webdocs.cs.ualberta.ca\/~dale\/papers\/ijcai15.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Wen et al., 2015<\/a>)<\/li>\n<\/ul>\n<\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7e4d3c5 elementor-widget elementor-widget-heading\" data-id=\"7e4d3c5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"40b4\" data-selectable-paragraph=\"\">Final Comments<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ff099af elementor-widget elementor-widget-text-editor\" data-id=\"ff099af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p 
id=\"6532\" data-selectable-paragraph=\"\">Dataset shift is a topic that is, in my estimation, extremely important and yet undervalued by people in the field of data science and machine learning.<\/p>\n<p id=\"057f\" data-selectable-paragraph=\"\">Given the impact it can have on the performance of our algorithms, I suggest spending some time working out how to handle data properly in order to give you more confidence in your models, and, hopefully, superior performance.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9f4c203 elementor-widget elementor-widget-heading\" data-id=\"9f4c203\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"439c\" data-selectable-paragraph=\"\">References<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-96d1b7e elementor-widget elementor-widget-text-editor\" data-id=\"96d1b7e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"13ab\" data-selectable-paragraph=\"\">[1]\u00a0<a href=\"http:\/\/iwann.ugr.es\/2011\/pdf\/InvitedTalk-FHerrera-IWANN11.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">http:\/\/iwann.ugr.es\/2011\/pdf\/InvitedTalk-FHerrera-IWANN11.pdf<\/a><\/p>\n<p id=\"a564\" data-selectable-paragraph=\"\">[2] J.G. Moreno-Torres, T. Raeder, R. Alaiz-Rodr\u00edguez, N.V. Chawla, F. Herrera. A unifying view on dataset shift in classification. Pattern Recognition, 2011, In press.<\/p>\n<p id=\"4d99\" data-selectable-paragraph=\"\">[3] J. Qui\u00f1onero Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence. Dataset Shift in Machine Learning. 
The MIT Press, 2009.<\/p>\n<p id=\"6b11\" data-selectable-paragraph=\"\">[4] Raeder, Hoens &amp; Chawla. Consequences of Variability in Classifier Performance Estimates. ICDM \u201910: Proceedings of the 2010 IEEE International Conference on Data Mining.<\/p>\n<p id=\"fc12\" data-selectable-paragraph=\"\">[5] Moreno-Torres, J. G., &amp; Herrera, F. (2010). A preliminary study on overlapping and data fracture in imbalanced domains by means of genetic programming-based feature extraction. In Proceedings of the 10th International Conference on Intelligent Systems Design and Applications (ISDA 2010) (pp. 501\u2013506).<\/p>\n<p id=\"d080\" data-selectable-paragraph=\"\">[6]\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5070592\/pdf\/f1000research-5-10228.pdf\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5070592\/pdf\/f1000research-5-10228.pdf<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Dataset shift is a topic that is extremely important and yet undervalued by people in the field of data science and machine learning. Given the impact it can have on the performance of our algorithms, spend some time working out how to handle data properly in order to give you more confidence in your models, and, hopefully, superior performance. 
The key theme of this article can be summarized in a single sentence: Dataset shift is when the training and test distributions are different.<\/p>\n","protected":false},"author":682,"featured_media":3658,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[3471],"class_list":["post-2254","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":3471,"user_id":682,"is_guest":0,"slug":"matthew-stewart","display_name":"Matthew Stewart","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_c57055f3-5301-4262-af65-4cc7d40cbf3d-150x150.jpg","user_url":"https:\/\/criticalfutureglobal.com\/","last_name":"Stewart","first_name":"Matthew","job_title":"","description":"Matthew Stewart is a Machine Learning consultant on AI for\u00a0<a href=\"https:\/\/www.criticalfutureglobal.com\/\" target=\"_blank\" rel=\"noopener\">Critical Future<\/a>, and machine learning engineer at Scalable Magic, an AI-based digital media startup. He is also a Graduate Teaching Assistant and a Ph.D. 
Candidate at Harvard University."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2254","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/682"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2254"}],"version-history":[{"count":10,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2254\/revisions"}],"predecessor-version":[{"id":35474,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2254\/revisions\/35474"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3658"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2254"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=2254"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2254"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2254"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}