The Proliferation of Data Science Tools & Technology

Matthew Mahowald
November 20, 2017 Big Data, Cloud & DevOps
The history of predictive analytics might be said to begin with Bayes’ famous theorem relating the conditional probabilities of two events. Even today, the importance of foundational work like Bayes’ theorem cannot be overstated: it is both a foundation of statistical inference across the experimental sciences and a useful tool in its own right for assessing how strongly one event bears on another.

P(A|B)=P(B|A)P(A)/P(B)
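
As a small worked illustration of the formula, the sketch below computes P(A|B) in Python. The function name and the numbers (a 1% base rate, a 95% hit rate, a 5% false-alarm rate) are invented for the example and do not come from any real data set; P(B) is expanded with the law of total probability.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) = P(B|A) P(A) / P(B) for a binary event A."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
p = posterior(0.01, 0.95, 0.05)
print(round(p, 3))  # prints 0.161
```

Even with a 95% hit rate, the posterior is only about 16% because the event A is rare to begin with, which is exactly the kind of reasoning the theorem makes explicit.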

In recent years, as the sultans of Silicon Valley have pressed both computation speeds and data storage capacities to dizzying heights, researchers and analysts working at the intersection of statistics and computer science have leveraged new tools to pursue increasingly sophisticated modeling techniques. This dramatic expansion in software tools and, especially, in the quantity and quality of available data led to the emergence of data science as a discipline and, most importantly, of the assets that data science teams create: predictive analytic models.
Historically, however, when it was time to deploy a new predictive analytic model into production, the burden of deployment on IT and the production pipeline was fairly minor. Long lead times meant that each model could be manually restructured (and sometimes even translated into another programming language). Moreover, the comparative simplicity of the models themselves meant that this recoding was not unreasonably labor-intensive.
The proliferation of tools and techniques in data science has not changed the fundamental deployment problem. However, the complexity of the models strains the feasibility of the traditional deployment methodology. There are now more than 10,000 open-source packages on CRAN (the global R package repository). With open-source projects like scikit-learn and pandas, Python offers similarly comprehensive support. In today’s vast data science environment, teams can construct a wider variety of models faster, at lower cost, and with more data than ever before.
The same trend shows up in the pace at which analytic models are built. What used to be a leisurely process, producing a small number of fairly simple rules-based or linear-regression models each year, has turned into the creation of dozens of complex models built with the latest gradient boosting machine or convolutional neural net toolsets. As a result, the traditional model deployment process is simply unsustainable.
So, what’s the solution? It’s imperative that everyone, from IT professionals to data scientists, understand and address the challenges of analytic deployment in the modern era. One way for an enterprise to make analytic deployment a core competency is to adopt an analytic deployment engine. To succeed, such an engine should have the following properties:
It is a software component that sits in the production data pipeline, where it receives and executes models.
It provides native support (without recoding) for any modeling language or package; that is, the engine is language agnostic.
It can connect to any data source or sink used in the production data pipeline.
This engine should be both easy enough to use that the data science team can validate and deploy models without requiring IT involvement, and robust and scalable enough that it can be used with confidence in the production pipeline.
Finally (and most importantly), an analytic deployment engine should be future-proof: new libraries and packages in R and Python shouldn’t require upgrading the engine, nor should the emergence of other new techniques and tools.
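
To make these properties a little more concrete, here is a deliberately minimal sketch in Python of what the core of such an engine might look like. Everything in it is hypothetical: the `DeploymentEngine` class, its `deploy` and `run` methods, and the toy `linear_model` are illustrative names, not the API of any real product. The sketch assumes a model can be treated as any callable that maps one record to one scored record, a source as any iterable of records, and a sink as any callable that accepts a record.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]
Model = Callable[[Record], Record]   # hypothetical contract: score one record
Sink = Callable[[Record], None]      # hypothetical contract: consume one result


class DeploymentEngine:
    """Toy engine that receives named models and executes them in a pipeline."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def deploy(self, name: str, model: Model) -> None:
        # The engine only relies on the callable contract, not on how the
        # model was built, so deploying a model needs no recoding here.
        self._models[name] = model

    def run(self, name: str, source: Iterable[Record], sink: Sink) -> int:
        """Score every record from the source, push results to the sink,
        and return the number of records processed."""
        model = self._models[name]
        count = 0
        for record in source:
            sink(model(record))
            count += 1
        return count


def linear_model(record: Record) -> Record:
    # Stand-in for a trained model; the coefficients are made up.
    return {"id": record["id"], "score": 0.5 * record["x"] + 1.0}


engine = DeploymentEngine()
engine.deploy("linear-v1", linear_model)

results = []  # an in-memory list stands in for a real data sink
n = engine.run(
    "linear-v1",
    [{"id": 1, "x": 2.0}, {"id": 2, "x": 4.0}],  # in-memory stand-in for a source
    results.append,
)
print(n, results)
```

Because the engine depends only on these narrow contracts, a model written in another language could in principle be wrapped behind the same callable (for example, by calling out to a separate process), and a new library would not require changes to the engine itself. A production engine would also need validation, monitoring, and scaling, none of which this sketch attempts.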
As organizations continue to gather massive data sets and develop more advanced analytic models to extract value, the barriers they encounter continue to pile up. With the right set of data science tools focused on analytic deployment, IT and analytics teams can find the sweet spot that drives ROI for their businesses.
Originally published at insideBIGDATA