Good Data Scientists Don’t Gather Project Requirements. They Dig For Them

Sergey Mastitsky
December 10, 2020 AI & Machine Learning

“Requirements rarely lie on the surface”

The majority of Data Science projects fail. I will not even provide any references in support of this statement — the Internet is full of examples. The reasons for the high failure rate are many and varied. However, as surprising as this may sound, one of the main reasons is the lack of clearly defined project goal(s) and the associated requirements.

Problem understanding and requirements gathering make up an initial phase in pretty much any project management framework, including the widely used “Cross-Industry Standard Process for Data Mining” (CRISP-DM). This implies that the project goals and requirements are already there, waiting to be “gathered”. However, as Andrew Hunt and David Thomas say in their famous book “The Pragmatic Programmer”:

“It doesn’t quite work that way. Requirements rarely lie on the surface. Normally, they’re buried deep beneath layers of assumptions, misconceptions, and politics.”

The advice the authors then give is simple:

“Don’t gather requirements — dig for them.”

Wise words, indeed. But what does this advice mean exactly in the context of Data Science projects? There are several crucial aspects to consider, and I will cover some of them in this article. To make things a bit more concrete, let us assume that a Data Science team is tasked with building a recommender system for products sold by an online shop.

Study your stakeholders

Defining the goals of a Data Science project is objectively hard because of the number of stakeholders involved and the different goals they pursue (Hulten 2018). A marketing team may want to have a recommender system to keep customers engaged and cross-sell as many products to them as possible. However, a user experience team may want to use it to make customer journeys on the website as smooth as possible. Finally, Data Scientists mainly care about the predictive accuracy of the Machine Learning model powering their recommender system.

Although related, these goals differ in terms of their measures of success. Marketers will want to see as many conversions as possible. UX experts will care about the time it takes to complete a purchase. Data Scientists will spend days trying to beef up that second digit in the nDCG metric. Oh, and the customers? They will never come back to the website again if they cannot find what they need fast enough or have trouble placing an order.
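For readers who have not met nDCG (normalized discounted cumulative gain), here is a minimal sketch of how it is commonly computed for a single ranked list of recommendations. The function names are illustrative, and the sketch assumes graded relevance scores with a linear gain and an ideal ranking built from the same list of items; real evaluation pipelines may differ on both points.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of each recommended product, in the order shown to the customer
# (made-up numbers, higher means more relevant).
perfect_order = [3, 2, 1]
swapped_order = [0, 1]

print(ndcg_at_k(perfect_order, k=3))
print(ndcg_at_k(swapped_order, k=2))
```

A list that is already in the best possible order scores 1, and the score drops as relevant products slide down the ranking. Note that nothing in this number reflects conversions or time to purchase, which is exactly why the other stakeholders measure success differently.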

How does one make sure that all participants of a project are heard? There is only one way, really: get yourself out there and talk to the people you as a Data Scientist work with to understand “what keeps them awake at night”. When done in a structured and empathetic manner, this will help with building up a story around the project that all participants get behind. And do not hesitate to spend as much time on this as you need: your future self will thank you.

Maintain a project glossary

As Data Scientists, we get to work on various problems and often even in different industries. Personally, I think this is the best thing about Data Science and what makes it so interesting and attractive. However, this also implies that with every new project one has to learn a lot of new concepts and terms. Maintaining a project-specific glossary of terms helps you understand the problem at hand and makes communication with stakeholders much smoother.
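A glossary does not need special tooling. As a rough sketch, it can live next to the project code as a plain mapping that anyone can extend. The terms below come from the running example used in this article, while the definitions and the `lookup` helper are illustrative assumptions that would have to be agreed with the actual stakeholders.

```python
# Illustrative definitions only: replace them with the wording your stakeholders use.
GLOSSARY = {
    "conversion": "A website visit that ends with a completed purchase.",
    "cross-sell rate": "Share of orders that include an additional recommended product.",
    "ndcg": "Normalized discounted cumulative gain, a ranking quality score between 0 and 1.",
}


def lookup(term):
    """Return the agreed definition of a term, ignoring case and extra spaces."""
    key = " ".join(term.lower().split())
    if key not in GLOSSARY:
        raise KeyError(f"'{term}' is not in the glossary yet; ask a stakeholder and add it")
    return GLOSSARY[key]


print(lookup("nDCG"))
print(lookup("Cross-sell  rate"))
```

Failing loudly on an unknown term is a deliberate choice here: a missing entry is a prompt to go and ask someone, not something to guess at.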

Capture requirements with “user stories”

Software developers use several techniques to capture requirements, and I believe these techniques are directly applicable to Data Science projects as well. In my consulting work, I found “user stories” to be particularly useful.

A user story is an informal way of describing a feature of an application following a simple pre-defined template, e.g.:

"As a <role> I want to <capability>, so that <receive benefit>"

User stories are great for the following reasons:

  • they can be written in plain English (on post-it notes or using programs like Jira) by any project participant, expressing her domain-specific needs;
  • they are high-level descriptions that allow project participants to focus on discussing the desired functionality rather than the implementation details;
  • they are well-suited for project planning as they can be given an estimate of how time- and resource-consuming they will be to develop.

In the context of our running example, user stories might look like this:

"As a marketer, I want our website visitors to see products that they are likely to buy, so that I can increase our overall cross-sell rate."

"As a UX specialist, I want our website visitors to quickly find the products they are interested in, so that they spend minimal time to complete a purchase."

Having user stories written by all project participants helps with defining the goals and success metrics, understanding the scope of work, and prioritising individual deliverables.
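If the stories end up in a spreadsheet or a script rather than on post-it notes, the template maps naturally onto a small record type. The sketch below is one possible representation, not a standard: the `UserStory` class, its field names, and the effort estimates in days are all hypothetical and only serve to show how the two example stories could be stored, rendered, and used for planning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    role: str
    capability: str
    benefit: str
    estimate_days: float  # hypothetical effort estimate used for planning

    def render(self):
        """Fill in the 'As a <role> I want ..., so that ...' template."""
        return f"As a {self.role}, I want {self.capability}, so that {self.benefit}."


stories = [
    UserStory(
        role="marketer",
        capability="our website visitors to see products that they are likely to buy",
        benefit="I can increase our overall cross-sell rate",
        estimate_days=10,  # made-up number
    ),
    UserStory(
        role="UX specialist",
        capability="our website visitors to quickly find the products they are interested in",
        benefit="they spend minimal time to complete a purchase",
        estimate_days=15,  # made-up number
    ),
]

# Print the cheapest stories first and the total effort they add up to.
for story in sorted(stories, key=lambda s: s.estimate_days):
    print(f"[{story.estimate_days} days] {story.render()}")
print("Total estimate:", sum(s.estimate_days for s in stories), "days")
```

Keeping role, capability, and benefit as separate fields makes it easy to group stories by stakeholder later on, for example to check that every team has at least one story in the backlog.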

Keep the requirements documented

Once you are done with defining project goals and requirements, get them documented. This can be done in many different ways (I like the “project poster” format developed by Atlassian). Irrespective of the format that makes sense for your organisation, try to avoid unnecessary details. Requirements do not describe the design or architecture of a system; they only capture what needs to be accomplished. Never waste your time writing detailed project charters because 1) they become obsolete the moment you save the file and 2) nobody will ever read them anyway due to their bloated size.

Having goals and requirements documented is important not only for kick-starting a project. It is also a mechanism to protect the project against continuous changes in requirements that, if undocumented, result in scope creep. Although it is natural for goals to evolve, having an easy-to-grasp requirements document will help all parties stay informed and prevent the project from getting out of control.

Conclusions

Data Science projects are somewhat unique in that they involve many stakeholders, who have their own agendas and definitions of success. This calls for an extra effort from Data Scientists to define and properly document project goals and requirements. Luckily, Data Scientists can borrow project management techniques from software developers, who often operate under similarly complex conditions.
