Data Science and Data Analysis

Steve Miller
January 8, 2019 Big Data, Cloud & DevOps


I met up with a grad school friend of 40 years the other day. While he earned a doctorate and became an academic luminary, I departed the program with a master’s in statistical science and went on first to the not-for-profit and then to the business worlds. Both of us recently retired from full-time employment and now satisfy our work cravings with contract consulting.

My friend recently read the seminal paper 50 Years of Data Science by Stanford professor David Donoho and wanted my take on how this interpretation of DS history mapped to my career in data and analytics. Thanks for asking, I responded. It turns out it maps quite well.

The CliffsNotes version of Donoho’s thesis is that development of the current field of Data Science has been in the works for a long time, born of frustration with the narrow purview of academic statistics in the 1960s. ‘More than 50 years ago, John Tukey called for a reformation of academic statistics. In ‘The Future of Data Analysis’, he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or ‘Data Analysis’. Ten to twenty years ago, John Chambers, Bill Cleveland and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name “Data Science” for his envisioned field.’ In short, Data Science advances statistics from its mathematical roots to a more balanced focus on math, data, and computation. I’d encourage investing the hour or so it takes to read this important article.

I sensed the beginnings of this divide in my grad school years, recognizing both a concern among some professors about the “over-mathematization” of statistical science and the emergence of significant computational progress that lifted every analytic boat. In 1979, just about all my computer work was on mainframes with FORTRAN and PL/I; by 1982 most was on minicomputers with Unix/C/Ingres and PCs with MS-DOS. SAS, originally written for IBM mainframes and the statistical software of choice at the time, was ported to minicomputers and PCs in the early 1980s. At that time as well, resampling techniques like the bootstrap, fueled by computation, were starting to come of age in the statistical world.

I well remember one of my first assignments as an internal hospital consultant: forecasting the prevalence of cerebrovascular disease in the hospital network. It was a piece of cake to assemble the data and apply regression/time series techniques, just the kind of work I’d done as a research assistant in grad school. Life was good.

Not so fast, though. Next up was designing and implementing a perinatal registry that accumulated 500 attributes in over 10,000 birth records/year. The foremost challenges were data management and computation: assembling, wrangling, cleaning, reporting, and managing the data were my jobs. So I developed database and programming expertise by necessity, becoming in time a capable data programmer. Alas, the statistical work was far downstream from implementation of the then-new relational database system to manage the data.

Those evolving data management and wrangling skills drove my business consulting work from 1985 to 2005, with the initiatives at first called decision support and then ultimately Data Warehousing/Business Intelligence (DW/BI). Data was pre-eminent, followed by the computational processes of munging, cleaning, and managing. More often than not, BI tools like BusinessObjects and Cognos were superimposed on a data repository implemented with database software such as Oracle or Microsoft SQL Server. SAS software connected to the DW for statistical analysis. Occasionally, full-blown analytic apps were delivered.

The ascendance of open source changed the analytics landscape fifteen years ago, with databases like PostgreSQL and MySQL, agile languages such as Python and Ruby, and the R statistical computing platform encouraging an even greater commitment to analytics and facilitating the emergence of companies whose products were data and analytics. Add proprietary Self-service Analytics/Visualization tools like Tableau to the Data Analysis mix as well. Since then, I’ve done much more of both data exploration and statistical analysis than in the early years. In many cases, EDA suffices. When statistical analysis does occur, however, the emphasis is much more on pure prediction/forecasting than on the inference-generating models of classical statistics, another contrast noted by Donoho.

When I size up my career against Donoho’s Six Divisions of Greater Data Science, I feel I’ve worked fairly intimately with the first five: 1. Data Exploration and Preparation 2. Data Representation and Transformation 3. Computing with Data 4. Data Modeling and 5. Data Visualization and Presentation. Only 6. Science about Data Science, a much more academic pursuit now growing exponentially, has been unaddressed.

My professor friend absorbed my chronology, opining that his career was primarily about deep dives into 4. Data Modeling and 6. Science about Data Science. While we both expressed overall satisfaction with our careers, we acknowledged a bit of melancholy for not having had extensive opportunities to touch all six. Perhaps today’s data scientists will be presented challenges in each.
