{"id":1054,"date":"2019-01-08T04:44:41","date_gmt":"2019-01-08T04:44:41","guid":{"rendered":"http:\/\/kusuaks7\/?p=659"},"modified":"2023-08-23T14:40:47","modified_gmt":"2023-08-23T14:40:47","slug":"data-science-and-data-analysis","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/data-science-and-data-analysis\/","title":{"rendered":"Data Science and Data Analysis"},"content":{"rendered":"<p><strong><em>Ready to learn Data Analytics? Browse <a href=\"https:\/\/www.experfy.com\/training\/tracks\/data-analyst-training-certification\">Data Analyst Training and Certification courses<\/a> developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<p>I met up with a grad school friend of 40 years the other day. While he earned a doctorate and became an academic luminary, I departed the program with a master\u2019s in statistical science and went on first to the not-for-profit and then to the business worlds. Both of us recently retired from full-time employment and now satisfy our work cravings with contract consulting.<\/p>\n<p>My friend recently read the seminal paper\u00a0<a href=\"https:\/\/courses.csail.mit.edu\/18.337\/2015\/docs\/50YearsDataScience.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">50 years of Data Science<\/a>\u00a0by Stanford professor David Donoho and wanted my take on how this interpretation of DS history mapped to my career in data and analytics. Thanks for asking, I responded. Quite well, as it turns out.<\/p>\n<p>The CliffsNotes version of Donoho\u2019s thesis is that the current field of Data Science has been in the works for a long time, born of frustration with the narrow purview of academic statistics in the 1960s. \u2018More than 50 years ago, John Tukey called for a reformation of academic statistics. 
In \u2018The Future of Data Analysis\u2019, he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or \u2018Data Analysis\u2019. Ten to twenty years ago, John Chambers, Bill Cleveland and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name \u201cData Science\u201d for his envisioned field.\u2019 In short, Data Science advances statistics from its mathematical roots to a more balanced focus on math, data, and computation. I\u2019d encourage investing the hour or so it takes to read this important article.<\/p>\n<p>I sensed the beginnings of this divide in my grad school years, recognizing both a concern among some professors about the \u201cover-mathematization\u201d of statistical science and the emergence of significant computational progress that lifted every analytic boat. In 1979, just about all my computer work was on mainframes with FORTRAN and PL\/I; by 1982 most was on minicomputers with Unix\/C\/Ingres and PCs with MS-DOS. SAS, originally written for IBM mainframes and the statistical software of choice at the time, was ported to minicomputers and PCs in the early 1980s. At that time as well, resampling techniques like the bootstrap, fueled by computation, were starting to come of age in the statistical world.<\/p>\n<p>I well remember one of my first assignments as an internal hospital consultant: to forecast the prevalence of cerebrovascular disease in the hospital network. Assembling the data and applying regression\/time series techniques was a piece of cake \u2013 just the kind of work I\u2019d done as a research assistant in grad school. Life was good.<\/p>\n<p>Not so fast, though. 
Next up was designing and implementing a perinatal registry accumulating 500 attributes in over 10,000 birth records per year. The challenges were foremost those of data management and computation \u2013 assembling, wrangling, cleaning, reporting, and managing the data were\u00a0<em>my<\/em>\u00a0jobs. So I developed database and programming expertise by necessity, becoming in time a capable data programmer. Alas, the statistical work was far downstream from implementation of the then-new relational database system to manage the data.<\/p>\n<p>Those evolving Data Management and wrangling skills drove my business consulting work from 1985 to 2005, with the initiatives at first called decision support and then ultimately Data Warehousing\/Business Intelligence (DW\/BI). Data was pre-eminent, followed by the computational processes of munging, cleaning, and managing. More often than not, BI tools like BusinessObjects and Cognos were superimposed on a data repository implemented with database software such as Oracle or Microsoft SQL Server. SAS software connected to the DW for statistical analysis. Occasionally, full-blown analytic apps were delivered.<\/p>\n<p>The ascendance of open source changed the analytics landscape fifteen years ago, with databases like PostgreSQL and MySQL, agile languages such as Python and Ruby, and the R statistical computing platform encouraging an even greater commitment to analytics and facilitating the emergence of companies whose\u00a0<em>products<\/em>\u00a0were data and analytics. Add proprietary Self-service Analytics\/Visualization tools like Tableau to the Data Analysis mix as well. Over this period and up to the present, I\u2019ve done much more of both data exploration and statistical analysis than in the early years. In many cases, exploratory data analysis (EDA) suffices. 
When statistical modeling does occur, however, the emphasis is much more on pure prediction\/forecasting than on the inference-generating models of classical statistics \u2013 another contrast noted by Donoho.<\/p>\n<p>When I size up my career against Donoho\u2019s Six Divisions of Greater Data Science, I feel I\u2019ve worked fairly intimately with the first five: 1. Data Exploration and Preparation; 2. Data Representation and Transformation; 3. Computing with Data; 4. Data Modeling; and 5. Data Visualization and Presentation. Only 6. Science about Data Science, a much more academic pursuit now growing exponentially, has gone unaddressed.<\/p>\n<p>My professor friend absorbed my chronology, opining that his career was primarily about deep dives into 4. Data Modeling and 6. Science about Data Science. While we both expressed overall satisfaction with our careers, we acknowledged a bit of melancholy for not having had extensive opportunities to touch all six. Perhaps today\u2019s data scientists will be presented with challenges in each.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ready to learn Data Analytics? Browse Data Analyst Training and Certification courses developed by industry thought leaders and Experfy in Harvard Innovation Lab. I met up with a grad school friend of 40 years the other day. 
While he earned a doctorate and became an academic luminary, I departed the program with a master\u2019s in<\/p>\n","protected":false},"author":430,"featured_media":3819,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[2310],"class_list":["post-1054","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":2310,"user_id":430,"is_guest":0,"slug":"steve-miller","display_name":"Steve Miller","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Miller","first_name":"Steve","job_title":"","description":"Steve Miller is Co-founder and President of&nbsp;<a href=\"http:\/\/www.inquidia.com\/\" target=\"_blank\" rel=\"noopener\">Inquidia Consulting<\/a>. He has over 35 years of experience in business intelligence and statistics, the last 25 revolving around the delivery of analytics technology 
services.&nbsp;"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1054","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/430"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1054"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1054\/revisions"}],"predecessor-version":[{"id":31326,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1054\/revisions\/31326"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3819"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1054"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1054"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1054"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1054"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}