The Machine Learning Race Is Really a Data Race

Barry Libert Barry Libert
May 13, 2019 AI & Machine Learning

Organizations that hope to make AI a differentiator need to draw from alternative data sets — ones they may have to create themselves.

Machine learning — or artificial intelligence, if you prefer — is already becoming a commodity. Companies racing to simultaneously define and implement machine learning are finding, to their surprise, that implementing the algorithms used to make machines intelligent about a data set or problem is the easy part. There is a robust cohort of plug-and-play solutions to painlessly accomplish the heavy programmatic lifting, from the open-source machine learning framework of Google’s TensorFlow to Microsoft’s Azure Machine Learning and Amazon’s SageMaker. 

What’s not becoming commoditized, though, is data. Instead, data is emerging as the key differentiator in the machine learning race. This is because good data is uncommon.

Useful Data: Both Valuable and Rare

Data is becoming a differentiator because many companies don’t have the data they need. Although companies have measured themselves in systematic ways using generally accepted accounting principles for decades, this measurement has long been focused on physical and financial assets — things and money. A Nobel Prize was even awarded on capital asset pricing in 2013, reinforcing these well-established priorities.

Rowing crew of business people race through a sea of data in a four person row boat

But today’s most valuable companies trade in software and networks, not just physical goods and capital assets. Over the past 40 years, the asset focus has completely flipped, from the market being dominated by 83% tangible assets in 1975 to 84% intangible assets in 2015. Instead of manufacturing coffeepots and selling washing machines, today’s corporate giants offer apps and connect people. This shift has created a drastic mismatch between what we measure and what actually drives value.

The result is that useful data is problematically rare. There is a growing gap between market and book values. Because of this gap, companies are racing to apply machine learning to important business decisions, even replacing some of their expensive consultants, only to realize that the data they need doesn’t even exist yet. In essence, the fancy new AI systems are being asked to apply new techniques to the same old material.

Just like people, a machine learning system is not going to be smart about any topic until it has been taught. Machines need a lot more data than humans do in order to get smart — although, granted, they do read that data a lot faster. So, while there is a visible arms race as companies bring on machine learning coders and kick off AI initiatives, there is also a behind-the-scenes, panicked race for new and different data.

In finance, for instance, alternative data reaches beyond the traditional Securities and Exchange Commission reports and investor presentations that influence investment decisions. Alternative data, such as social media sentiment or number of patents awarded, is essential for two important reasons. First, traditional data focuses on traditional assets, and that isn’t expansive enough in the age of intangible assets. Second, there’s no reason to bother using machine learning to study the same data sets that everyone else in the market is analyzing. Everyone who is interested has already tried to correlate industry trends, profit margins, growth rates, earnings before interest and taxes, asset turnover, and return on assets — along with the more than 1,000 other commonly reported variables with shareholder return.

Looking for connections among the same sets of material that everyone else has isn’t going to help companies win. Instead, organizations that want to use AI as a differentiator are going to have to find relationships between new data sets — data sets they may have to create themselves to measure intangible assets. 

Curate Carefully: What Do You Want to Know?

Data creation is more complex than simply aggregating point-of-sale or customer information and dumping it into a database: Most organizations mistakenly believe that an expedient path involves gathering every scrap of possible data and painstakingly combing through it all hoping to find a glimmer of insight — the elusive feature that predicts or categorizes something they care about.

While machine learning can occasionally surprise us with a flash of rare brilliance that no one has yet to discover, the technology isn’t capable of providing these insights with consistency. This doesn’t mean the tool is broken. It means we have to apply it wisely. This is easier said than done: For instance, in our research of the alternative data market, we found that more than half of new data providers are still focused on measuring physical and financial assets.

The step that many organizations omit is creating a hypothesis about what matters. Where machine learning really excels is taking an insight that humans have — one based on rules of thumb, broad perceptions, or poorly understood relationships — and developing a faster, better understood, more scalable (and less error-prone) method for applying the insight.

In order to use machine learning in this way, you don’t feed the system every known data point in any related field. You feed it a carefully curated set of knowledge, hoping it can learn, and perhaps extend, at the margins, knowledge that people already have.

Insightful Machine Learning Comes From Different Data

All this has three specific implications for companies wanting to create impactful and valuable machine learning applications:

  • Differentiated data is key to a successful AI play. You won’t uncover anything new working with the same data your competitors have. Look internally and identify what your organization uniquely knows and understands, and create a distinctive data set using those insights. Machine learning applications do require a large number of data points, but this doesn’t mean the model has to consider a wide range of features. Focus your data efforts where your organization is already differentiated.
  • Meaningful data is better than comprehensive data. You may possess rich, detailed data on a topic that simply isn’t very useful. If your company wouldn’t use that information to help inform decision-making on an ad hoc basis, then that data likely won’t be valuable from a machine learning perspective. An expert machine learning architect will ask you tough questions about which fields really matter, and how those fields will likely matter to your application of the insights you get. If these questions are difficult to answer, then you haven’t put in the thought needed to produce practical value.
  • What you know should be the starting point. Companies that best use machine learning begin with a unique insight about what matters most to them for making important decisions. This guides them about what data to amass, as well as what technologies to use. An easy place to begin is to scale and grow a piece of knowledge that your team already has and that could create more value for the organization.
  • Experfy Insights

    Top articles, research, podcasts, webinars and more delivered to you monthly.

  • Barry Libert

    Tags
    Machine Learning
    © 2021, Experfy Inc. All rights reserved.
    Leave a Comment
    Next Post
    Key Kubernetes Commands

    Key Kubernetes Commands

    Leave a Reply Cancel reply

    Your email address will not be published. Required fields are marked *

    More in AI & Machine Learning
    AI & Machine Learning,Future of Work
    AI’s Role in the Future of Work

    Artificial intelligence is shaping the future of work around the world in virtually every field. The role AI will play in employment in the years ahead is dynamic and collaborative. Rather than eliminating jobs altogether, AI will augment the capabilities and resources of employees and businesses, allowing them to do more with less. In more

    5 MINUTES READ Continue Reading »
    AI & Machine Learning
    How Can AI Help Improve Legal Services Delivery?

    Everybody is discussing Artificial Intelligence (AI) and machine learning, and some legal professionals are already leveraging these technological capabilities.  AI is not the future expectation; it is the present reality.  Aside from law, AI is widely used in various fields such as transportation and manufacturing, education, employment, defense, health care, business intelligence, robotics, and so

    5 MINUTES READ Continue Reading »
    AI & Machine Learning
    5 AI Applications Changing the Energy Industry

    The energy industry faces some significant challenges, but AI applications could help. Increasing demand, population expansion, and climate change necessitate creative solutions that could fundamentally alter how businesses generate and utilize electricity. Industry researchers looking for ways to solve these problems have turned to data and new data-processing technology. Artificial intelligence, in particular — and

    3 MINUTES READ Continue Reading »

    About Us

    Incubated in Harvard Innovation Lab, Experfy specializes in pipelining and deploying the world's best AI and engineering talent at breakneck speed, with exceptional focus on quality and compliance. Enterprises and governments also leverage our award-winning SaaS platform to build their own customized future of work solutions such as talent clouds.

    Join Us At

    Contact Us

    1700 West Park Drive, Suite 190
    Westborough, MA 01581

    Email: [email protected]

    Toll Free: (844) EXPERFY or
    (844) 397-3739

    © 2025, Experfy Inc. All rights reserved.