Automated Inspiration

Cassie Kozyrkov
August 27, 2020 AI & Machine Learning

In the 19th century, doctors might have prescribed mercury for mood swings and arsenic for asthma. It might not have occurred to them to wash their hands before your surgery. They weren’t trying to kill you, of course—they just didn’t know any better.

These early doctors had valuable data scribbled in their notebooks, but each held only one piece in a grand jigsaw puzzle. Without modern tools for sharing and analyzing information—as well as a science for making sense of that data—there wasn’t much to stop superstition from influencing what could be seen through a keyhole of observable facts.

Humans have come a long way with technology since then, but today’s boom in machine learning (ML) and artificial intelligence (AI) isn’t really a break with the past. It’s the continuation of the basic human instinct to make sense of the world around us so that we can make smarter decisions. We simply have dramatically better technology than we’ve ever had before.


One way to think of this pattern through the ages is as a revolution of data sets, not data points. The difference isn’t trivial. Data sets helped shape the modern world. Consider the scribes of Sumer (modern-day Iraq), who pressed their styluses to tablets of clay more than 5,000 years ago. When they did so, they invented not just the first system of writing, but the first data storage and sharing technology.

If you’re inspired by the promise of AI’s better-than-human abilities, consider that stationery gives us superhuman memory. Though it’s easy to take writing for granted today, the ability to store data sets reliably represents a ground-breaking first step on the path to higher intelligence.

Unfortunately, retrieving information from clay tablets and their pre-electronic cousins is a pain. You can’t snap your fingers at a book to get its word count. Instead, you’d have to upload every word into your brain to process it. This made early data analysis time-consuming, so initial forays into it stuck to the essentials. While a kingdom might analyze how much gold it raised in taxes, only an intrepid soul would try the same line of effortful reasoning on an application like, say, medicine, where millennia of tradition encouraged just winging it.

Luckily, our species produced some incredible pioneers. For example, John Snow’s map of deaths during the 1854 cholera outbreak in London inspired the medical profession to reconsider the superstition that the disease was caused by miasma (toxic air) and to start taking a closer look at the drinking water.

If you know “The Lady With The Lamp,” Florence Nightingale, for her heroic compassion as a nurse, you might be surprised to learn that she was also an analytics pioneer. Her inventive infographics during the Crimean War saved many lives by identifying poor hygiene as a leading cause of hospital deaths and inspiring her government to take sanitation seriously.

The one-data-set era took off as the value of information began to assert itself in a growing number of fields, leading to the invention of the computer. No, not the electronic buddy you’re used to today. “Computer” started out as a human profession, with its practitioners performing computations and processing data manually to extract its value.

The beauty of data is that it allows you to form an opinion out of something better than thin air. By taking a look at information, you’re inspired to ask new questions, following in the footsteps of Florence Nightingale and John Snow. That’s what the discipline of analytics is all about: inspiring models and hypotheses through exploration.

From Data Sets To Data Splitting

In the early 20th century, a desire to make better decisions under uncertainty led to the birth of a parallel profession: statistics. Statisticians help you test whether it’s sensible to behave as though the phenomenon an analyst found in the current data set also applies beyond it.

A famous example comes from Ronald A. Fisher, who wrote one of the first modern statistics textbooks. Fisher describes performing a hypothesis test in response to his friend’s claim that she could taste whether milk was poured into a cup before or after the tea. Hoping to prove her wrong, he was instead forced by the data to conclude that she could.
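
The arithmetic behind Fisher’s famous test is easy to reproduce today. Here’s a minimal sketch using scipy, assuming the usual telling of the story (eight cups, four prepared each way, all eight identified correctly — details not spelled out in this article):

```python
# The "lady tasting tea": eight cups, four with milk poured first,
# four with tea poured first. As the story is usually told, she
# classified all eight correctly.
from scipy.stats import fisher_exact

# Rows: her call (milk-first, tea-first); columns: the truth.
table = [[4, 0],
         [0, 4]]

odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"p = {p_value:.4f}")  # ~0.0143, i.e. 1-in-70 odds of guessing her way there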

Analytics and statistics have a major Achilles’ heel: If you use the same data for hypothesis generation and for hypothesis testing, you’re cheating. Statistical rigor requires you to call your shots before you take them; analytics is more a game of advanced hindsight. The two were almost tragicomically incompatible, until the next major revolution—data splitting—changed everything.

Data splitting is a simple idea, but to a data scientist like myself, it’s one of the most profound. If you have only one data set, you must choose between analytics (untestable inspiration) and statistics (rigorous conclusions). The hack? Split your data set into two pieces, then have your cake and eat it too!
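
In code, the hack can be a one-liner. Here’s a minimal sketch (the pandas DataFrame, file name and 50/50 ratio are my own illustrative assumptions): one half is fair game for exploration, while the other stays locked away until you’ve called your shot.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("my_data.csv")  # hypothetical file standing in for your data set

# Split once, up front. Explore freely in one half; save the other
# for a single, pre-registered statistical test.
explore_df, confirm_df = train_test_split(df, test_size=0.5, random_state=42)
```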

The two-data-set era replaces the analytics-statistics tension with coordinated teamwork between two different breeds of data specialist. Analysts use one data set to help you frame your questions, then statisticians use the other data set to bring you rigorous answers.

Such luxury comes with a hefty price tag: quantity. Splitting is easier said than done if you’ve struggled to scrape together enough information for even one respectable data set. The two-data-set era is a fairly new development that goes hand-in-hand with better processing hardware, lower storage costs and the ability to share collected information over the internet.

In fact, the technological innovations that led to the two-data-set era rapidly ushered in the next phase, a three-data-set era of automated inspiration. There’s a more familiar word for it: machine learning.

Using a data set destroys its purity as a source of statistical rigor. You only get one shot, so how do you know which “insight” from analytics is most worthy of testing? Well, if you had a third data set, you could use it to take your inspiration for a test drive. This screening process is called validation; it’s at the heart of what makes machine learning tick.

Once you’re free to throw everything at the validation wall and see what sticks, you can safely let everyone have a go at coming up with a solution: seasoned analyst, intern, tea leaves and even algorithms with no context about your business problem. Whichever solution works best in validation becomes a candidate for the proper statistical test. You’ve just empowered yourself to automate inspiration!
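
As a rough sketch of that workflow (the models, synthetic data and split ratios below are illustrative assumptions, not a recipe), you might train every candidate on one set, screen them all on the validation set, and give only the winner a single shot at the test set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2_000, random_state=0)  # stand-in data

# Three-way split: 60% train, 20% validation, 20% test.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Let everyone have a go: baselines, classics, whatever the intern dreams up.
candidates = {
    "coin flip": DummyClassifier(strategy="uniform", random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
}

# Throw everything at the validation wall and see what sticks.
scores = {name: model.fit(X_train, y_train).score(X_val, y_val)
          for name, model in candidates.items()}
winner = max(scores, key=scores.get)

# Only the winner earns its one shot at the untouched test set.
print(winner, candidates[winner].score(X_test, y_test))
```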

Automated Inspiration

This is why machine learning is a revolution of data sets, not just data. It depends on the luxury of having enough data for a three-way split.

Where does AI fit into the picture? Machine learning with deep neural networks is technically called deep learning, but it got another nickname that stuck: AI. Although AI once had a different meaning, today you’re most likely to find it used as a synonym for deep learning.

Deep neural networks earned their hype by outclassing less sophisticated ML algorithms on many complex tasks. But they require much more data to train, and their processing requirements exceed those of a typical laptop. That’s why the rise of modern AI is a cloud story: the cloud lets you rent someone else’s data center instead of committing to building your own deep learning rig, making AI a try-before-you-buy proposition.

With this puzzle piece in place, we have the full complement of professions: ML/AI, analytics and statistics. The umbrella term that encompasses all of them is data science, the discipline of making data useful.

Modern data science is the product of our three-data-set era, but many industries routinely generate more than enough data. So is there a case for four data sets?

Well, what’s your next move if the model you just trained gets a low validation score? If you’re like most people, you’ll immediately demand to know why! Unfortunately, there’s no data set you can ask. You might be tempted to go sleuthing in your validation data set, but debugging there breaks its ability to screen your models effectively.

By subjecting your validation data set to analytics, you’re effectively turning your three data sets back into two. Instead of finding help, you’ve unwittingly gone back an era!

The solution lies outside the three data sets you’re already using. To unlock smarter training iteration and hyperparameter tuning, you’ll want to join the cutting edge: an era of four data sets.
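
What might a four-way split look like? Here’s one hedged sketch (the set names and ratios are my own illustration, not an established standard): a dedicated debugging set absorbs all the sleuthing between rounds, so the validation set keeps its power to screen.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, random_state=0)  # stand-in data

# One four-way split, done once up front:
# 60% train, 20% debug, 10% validation, 10% test (illustrative ratios).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=0)
X_rest, X_val, y_rest, y_val = train_test_split(X_rest, y_rest, test_size=1/9, random_state=0)
X_train, X_debug, y_train, y_debug = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Per training round:
#   1. Fit on X_train.
#   2. Sleuth freely in X_debug: inspect errors, form hunches, tune hyperparameters.
#   3. Screen the resulting candidate on X_val -- look at the score, don't debug there.
#   4. When you finally commit to one model, test it exactly once on X_test.
```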

If you think of the other three data sets as giving you inspiration, iteration and rigorous testing, then the fourth fuels acceleration, shortening your AI development cycle with analytics techniques geared toward suggesting what to try on each round. By embracing four-way data splitting, you’ll be in the best position to take advantage of data abundance! Welcome to the future.
