{"id":1291,"date":"2019-02-15T10:32:03","date_gmt":"2019-02-15T07:32:03","guid":{"rendered":"http:\/\/kusuaks7\/?p=896"},"modified":"2023-07-31T10:16:46","modified_gmt":"2023-07-31T10:16:46","slug":"doubt-and-verify-data-science-power-tools","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/doubt-and-verify-data-science-power-tools\/","title":{"rendered":"Doubt and Verify: Data Science Power Tools"},"content":{"rendered":"<div>\n<p>Doubt everything. Use evidence-based methods to verify things that matter.<\/p>\n<p>Did Our Teachers Lie to Us? Do Doctors, Lawyers, and Other Professionals Lie?<\/p>\n<p>I had wonderful, engaged and engaging, deeply knowledgeable, and opinionated teachers in my rural Canadian high school, especially in history (\u201cGrannie\u201d Smith and Mr. Snider), chemistry (Mr. Fish and Mr. Thompson), and English (\u201cWild Bill\u201d Elliot). They were passionate about their topics, taught us amazing things, and made huge contributions to my \u201cworld model\u201d.<\/p>\n<p style=\"text-align: center;\">\n<p>To what extent were our teachers\u2019 facts true and their conclusions or \u201ctheories\u201d accurate reflections of the state of the world that they taught? In history, English, chemistry, and most other domains, substantial continuing research keeps changing our understanding of those aspects of the world. Was Waterloo fought at Waterloo? Were the Dark Ages truly dark in any significant way? 
(Ref: \u201cHistory is the unfolding of miscalculation.\u201d Barbara Tuchman, United States historian (1912\u20131989))<\/p>\n<\/div>\n<p>Similarly, in the 1980s Boston physicians voted Dr. Solomon, my doctor, the best doctor in Boston. He has been marvelous, providing me 33 years of wonderful healthcare. As in the humanities and sciences, there has been massive research and consequent progress in healthcare, fitness, medicine, and pharmacology, leading to radically new understandings in those areas.<\/p>\n<div>\n<p>The job of a professional \u2013 doctor, lawyer, teacher \u2013 is to do the best that they can to ensure good health, the best legal protection, and the best education, based on the most current information and knowledge in their area \u2013 the most current medical practices (avoiding proven bad practices), the most current laws, the most current understanding of history, chemistry, and English. What is the chance that any professional can satisfy these requirements? Secondly, to what extent can they do their job in a neutral or unbiased way? While having a bias is highly likely, do professionals understand that they have a bias, and do they make clear to their \u201ccustomers\u201d that they have a bias and what its nature is, so that customers can choose either the biased information or information that best suits their needs? While doctors, lawyers, and teachers may follow a specific orientation (e.g., bias for or against using certain drugs or procedures, specific interpretations of the law or of history), in principle, they should present their material in an unbiased way to allow customers to formulate and select amongst alternative views.<\/p>\n<p>In all fields new facts and knowledge are constantly being produced based on new data, discoveries, experience, and research \u2013 far more than a single individual can absorb, let alone put into practice. 
So how do professionals \u2013 how does anyone:<\/p>\n<ul>\n<li>Understand that they have a bias, its nature, and limitations?<\/li>\n<li>Weigh their bias and those of others in their thinking and actions?<\/li>\n<li>Re-evaluate their knowledge (worldview) in light of new facts (\u201cground truth\u201d) and conclusions?<\/li>\n<\/ul>\n<p>Dealing with these issues is a massive undertaking both conceptually (changing their world model) and practically (keeping up to date), in addition to and often separate from their day job of patient care, legal cases, and teaching. Since facts and conclusions (theories) change so fast, it is unlikely that most professionals are \u201cup to date\u201d, notwithstanding the fact that there is no consensus as to what it means to be \u201cup to date\u201d. The Higgs boson was not discovered overnight (proven within five sigma) in 2012 \u2013 a marvelous year for high-energy particle physics \u2013 but over 40 years and indeed to ten sigma only in 2014.<\/p>\n<p style=\"text-align: center;\">\n<p>Since knowledge is the cumulative result of learning and forgetting \u2013 verifying constantly emerging facts and deducing (empirically or theoretically) new conclusions \u2013 our knowledge (world model) is necessarily incomplete, hence highly likely to be faulty and biased (oriented towards specific beliefs). It is highly unlikely that a professional operates in an unbiased way in full knowledge of empirically proven facts and theories. 
Hence, as history has often proven (e.g., models of \u201celementary particle\u201d physics), much education and professional activity is based on \u201cfaulty learning\u201d.<\/p>\n<h3><strong>Conclusions<\/strong><\/h3>\n<ul>\n<li>All knowledge (e.g., professional world models) is necessarily incomplete, faulty, and biased; hence students, patients, and clients receive faulty knowledge and work products from their teachers, doctors, and lawyers. Errors in legal and some medical matters can be arbitrated in the courts. Errors in medical matters may result in harm to patients. Errors in education are more insidious (despite the Scopes Trial).<\/li>\n<li>Of course, this is also true of our personal world models. We should accept that our world models are best effort \u2013 based on a long history of knowledge and experience but faulty and biased \u2013 and worthy of re-examination based on meaningful new evidence.<\/li>\n<li>The best we can do is to recognize the limitations of our facts and knowledge and question or investigate those things that are most vital to the well-being of those that we care for and to our planet.<\/li>\n<\/ul>\n<p>The good news is that evidence-based means are emerging to support the Doubt and Verify approach for most domains.<\/p>\n<p>Doubt Everything: Enter Evidence-Based Reasoning<\/p>\n<p>How does one verify (i.e., prove the probabilistic likelihood of) a potentially questionable fact or conclusion (theory), e.g., information from teachers or professionals? The answer was empiricism, used to establish causality, or WHY a phenomenon occurs. Following the erstwhile modern Scientific Method (a.k.a. 
the Third Paradigm of Scientific and Engineering Discovery) and appealing to community-accepted norms, you conduct an experiment or a Randomized Clinical Trial (RCT) to verify the hypothesis in question by establishing what circumstances, in all probability, produce the hypothesized phenomenon. This too is changing, in two steps. First, in 2007 Jim Gray and others identified the emerging Fourth Paradigm [of Scientific and Engineering Discovery] (a.k.a. eScience) [2] in which massive amounts of data and computational power are used to identify highly probable hypotheses or trends. This is data-driven or evidence-based scientific and engineering discovery that identifies significant evidence that some phenomenon, i.e., WHAT, has occurred. Second, in the intervening years, there has been an evolution of Jim\u2019s great insight. The emerging answer is a combination of evidence-based analysis (i.e., WHAT established by Data-Intensive Analysis a.k.a. Big Data Analytics) used to identify highly probable hypotheses followed by \u201cold-fashioned\u201d empiricism to establish highly probable causality (i.e., WHY the phenomenon occurred). Just as the Scientific Method has been applied to domains outside of conventional science and engineering, e.g., the social sciences and the humanities, the emerging Fourth Paradigm is being applied to every human endeavor. Due to its likely pervasive impact, we had better get this right. We have yet to establish that results of this emerging Fourth Paradigm \u2013 Data-Intensive Analysis \u2013 are (probabilistically) accurate within measures of correctness and completeness. This often overlooked verification and the associated correctness measures will take a decade or more. 
Recall that the Third Paradigm evolved over hundreds of years and still has significant issues (e.g., p-values, reproducibility).<\/p>\n<\/div>\n<div>\n<p><strong>With a focus on correctness and completeness I define:<\/strong><\/p>\n<p>Data Science is a body of principles and techniques for applying data-intensive analysis to investigate phenomena, acquire new knowledge, and correct and integrate previous knowledge with measures of correctness, completeness, and efficiency of the derived results.<\/p>\n<h3><strong>Do Teachers and Professionals Have to Lie?<\/strong><\/h3>\n<p>For some time there has been a movement to apply evidence-based analysis to education [4] to evaluate outcomes of educational methods. Happily, evidence-based methods can go much further. Treat all knowledge \u2013 all facts and theories \u2013 as hypotheses requiring evidence-based evaluation relative to specific hypotheses or models \u2013 What: does the phenomenon really occur? and Why: what factors lead to the phenomenon?<\/p>\n<p>Just as science has been taught empirically using experiments to explore scientific phenomena, so too can all topics for which data is available be verified using the emerging Fourth Paradigm, in which hypotheses can be deduced from data then investigated empirically where conditions (i.e., ground truth) allow. However, the scale and variety of Big Data and the complexity of analytical models may require new measures of significance, correctness, and completeness, a 21st-century statistics.<\/p>\n<p>When a serious medical condition arises, it might be helpful for my doctor to find significant evidence for the prognosis and then significant evidence for successful treatment plans. When a serious scientific question arises, e.g., man\u2019s impact on global warming, significant evidence for What and Why is critical. 
When serious historical or political issues arise, e.g., the social and economic impact of immigration laws such as those currently discussed worldwide, significant evidence for hypothesized outcomes could lead to more informed public debate and action. These are examples of how Data-Intensive Analysis can transform education, medicine, and law, or more broadly our world.<\/p>\n<p>In the end, there is no truth, no ultimate ground truth, no lie-free utterances, as everything is contextual, based on incomplete facts and knowledge. All world models are flawed. The best we can do is to recognize their limitations and search, where resources are warranted and available, for sufficient evidence for those things most critical to the well-being of those we love and of our planet.<\/p>\n<h3><strong>Epilogue<\/strong><\/h3>\n<p>The only novelty in these ideas is in their application to Big Data and Data-Intensive Analysis. These ideas have been at the heart of epistemology for hundreds of years and have been studied by scientists under many terms, e.g., confirmation bias, and by psychologists also under many terms, including Family of Origin. Yet, the outcome \u2013 the Fourth Paradigm \u2013 is novel, a new paradigm [3] that we do not yet fully understand.<\/p>\n<p>While we may believe that a tree is real, can we say that our reality is the true reality when we can sense less than one ten-trillionth of the electromagnetic spectrum (see figure below) [1]? If we sense it, does that make it real or true? If we cannot sense it, does it exist?<\/p>\n<\/div>\n<p>Ultimately, the message here is how the Fourth Paradigm, supported by Data-Intensive Analysis, may change our world \u2013 most human endeavors. 
To get Data Science right we need to establish its fundamentals, hence the above definition.<\/p>\n<div>\n<p style=\"text-align: center;\">\n<h3><strong>Personal Note<\/strong><\/h3>\n<p>For the past few years, I have had the luxury of being able to think about Data Science in my professional and personal lives. Initially, I was disillusioned to realize that while my mother, my teachers, and I believed that we had the greatest integrity and best intentions, i.e., we were right, my world model \u2013 my beliefs \u2013 was based, in part, on faulty learning and imperfect knowledge. This transformed, in my professional and personal lives, into the excitement of exploration and discovery as I diligently applied doubt followed by verification of those things that matter to me. Therein lies the power of the Fourth Paradigm, of Data-Intensive Analysis and Data Science. When it matters, be open to questioning \u201cknown\u201d facts and theories in your world model, use emerging evidence-based methods to meaningfully verify them, and if need be seek new facts and theories with more significant supporting evidence.<\/p>\n<p style=\"text-align: center;\">Doubt and Verify \u2013 power tools for an amazing, emerging new paradigm for understanding our world.<\/p>\n<ul>\n<li>[1] David Eagleman: Can we create new senses for humans? TED2015, March 2015<\/li>\n<li>[2] Jim Gray on eScience: a transformed scientific method, in Tony Hey, Stewart Tansley, Kristin M. Tolle (Eds.): The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research 2009 ISBN 978-0982544204<\/li>\n<li>[3] Kuhn, Thomas S. The Structure of Scientific Revolutions. 3rd ed. Chicago, IL: University of Chicago Press, 1996.<\/li>\n<li>[4] M. Guzdial. 2015. Bringing evidence-based education to CS. Commun. ACM 58, 6 (May 2015). 
DOI: http:\/\/dx.doi.org\/10.1145\/2783419.2754947<\/li>\n<\/ul>\n<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In all fields new facts and knowledge are constantly being produced based on new data, discoveries, experience, and research \u2013 far more than a single individual can absorb let alone put into practice. So how do professionals or how does anyone understand that they have a bias, its nature and limitations? And re-evaluate their knowledge (world view) in light of new facts (&ldquo;ground truth&rdquo;) and conclusions?<\/p>\n","protected":false},"author":218,"featured_media":2855,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[2832],"class_list":["post-1291","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":2832,"user_id":218,"is_guest":0,"slug":"michael-brodie","display_name":"Michael Brodie","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Brodie","first_name":"Michael","job_title":"","description":"Dr. Michael Brodie is a Research Scientist, Computer Science, and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. He serves on Advisory Boards of national and international research organizations and is an adjunct professor at the National University of Ireland, Galway and at the University of Technology, Sydney. He has served on several National Academy of Science committees. 
His current interests include Big Data, Data Science, and Information Systems evolution."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/218"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1291"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1291\/revisions"}],"predecessor-version":[{"id":29811,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1291\/revisions\/29811"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/2855"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1291"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}