The role of the data curator: Make data scientists more productive

Kelly Stirman
February 15, 2019 | Big Data, Cloud & DevOps


The ability to harness data to solve critical business challenges is an essential skill for every organization today. Two primary roles are responsible for this function: data scientists and data analysts. Unfortunately, these people spend the majority of their time on tasks that are not core to their high-value responsibilities, such as finding data, preparing data, and optimizing data for their analysis.

As with many high-value roles, complementary specialized roles emerge that allow each participant to be as efficient as possible. Consider healthcare, for example, where doctors work with nurses, allied health professionals, physician assistants, office managers, and others so that each doctor can spend as much time as possible practicing medicine.

Today, our data scientists and data analysts are more like doctors who still perform many of these supporting functions themselves.

We are still early in the evolution of roles that fulfill the end-to-end process of data analytics, and there is tremendous opportunity to improve efficiency through better specialization. Today we see the emergence of a new role: the data curator.


If we look at all the different roles involved in data analytics, we can first group their responsibilities by the two larger organizations they belong to: IT and the business. Today some individuals reside in IT (data engineers and data custodians), while others sit in the business (data analysts and data scientists).

In 2018, we will increasingly see data curators, a new role within the business that focuses on bridging the worlds of business and IT for data analytics. Let’s take a look at these roles and their responsibilities:

  • Data scientists use tools like Python and R to build models that provide predictions, recommendations, and visualizations based on data inputs. They work within the business and rely on IT to provision their data.
  • Data analysts use BI tools to develop visualizations, reports, and dashboards that help to tell a story about business data. They work within the business and rely on IT to provide access to data from different applications and systems.
  • Data custodians are responsible for defining and documenting technical controls that safeguard the data across many different systems. They work within IT using security access controls and data management tools to standardize provenance and access to the data.
  • Data engineers have a deep understanding of the systems and infrastructure that generate and store the business data. They work in SQL, Python, Java, and other languages to query, transform, aggregate, and move data between systems for different end-user needs. They work within IT.
  • Data curators sit within the business, using self-service data platforms to curate data for different analytical tasks, to allocate computational resources for accelerating data analysis, to add semantic meaning to a data catalog, to accelerate high-value datasets, to blend datasets together, and to organize project areas for teams of data analysts and data scientists to work together more effectively.

Data analysts and data scientists understand the meaning of the data, but they rely on IT to source the data they need and to apply any changes necessary to reshape and transform it for their needs. More often than not, these individuals find themselves waiting on IT to perform these tasks, so they take matters into their own hands, making copies of the data that are no longer governed by the organization’s central controls. More importantly, all of this waiting and workaround effort means these high-value individuals are not performing the work that is essential to their role.

As companies embrace more of a self-service model for their data scientists and data analysts, they are using data curators to make these individuals more productive and more impactful to the business. Data curators streamline the process of sourcing, organizing, and accelerating data for analysis. Because they are closer to the business units, they know the data and understand the analytical workloads better than data engineers do.

The data curator has a good understanding of the types of systems that store the data and the types of tools that can be used to process it, even if they are not practitioners of these technologies themselves. They have up-to-date knowledge about datasets, their provenance, and what curation each dataset needs. They also understand the different types of analysis that need to be performed on specific datasets, as well as the latency and availability expectations set by diverse business users.

By working with data engineers, data custodians, data analysts, and data scientists, the data curator develops a deep understanding of how the business uses data and how IT applies technology to make that data available. Data curators make data analysts and data scientists more productive by allowing them to focus on what they do best.

© 2021, Experfy Inc. All rights reserved.