A Common Data Science Mistake: Prediction/Recommendation by Manipulating Model Inputs

Amin Sadri
February 18, 2019 AI & Machine Learning

“We trained a machine learning model with high performance. However, it did not work and was not useful in practice.” I have heard this sentence several times, and each time I was eager to find out the reason. A model can fail in practice for many reasons. Because these issues are rarely covered in data science courses, this article addresses one common mistake in designing and deploying a machine learning model.

In the rest of this article, I first discuss the confusion between correlation and causation that leads to the misuse of machine learning models, and illustrate it with an example. I then show the different possible relationships between a model's inputs and outputs. Finally, I offer some suggestions for avoiding this mistake.

Correlation not Causation

Mistaking correlation for causation can lead to wrong conclusions. An example discussed in Freakonomics is the case in which Illinois sent books to students because an analysis had revealed that the number of books available at home is directly correlated with high test marks. In reality, homes in which parents buy books usually have a stimulating learning environment. Further analysis revealed that students from homes with several books performed better academically even if they had never read the books. Higher marks were not an effect of the books; both resulted from the environment.

Back to our topic: after you develop a model, you cannot simply manipulate the input parameters (features) and read off the effect on the output. An input feature could be an effect of the output rather than its cause. What a high-performance machine learning model tells you is only that the inputs and the output are correlated. You therefore cannot adjust the inputs until the model returns the desired output and then make recommendations based on the adjusted inputs.
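The books example can be sketched with a small simulation. This is only an illustration on synthetic numbers, not the data behind the Freakonomics analysis: the hidden `environment` variable, every coefficient, and the helper names `fit_line` and `simulate_homes` are assumptions made up for the sketch. A common cause drives both the number of books and the marks, so a model fitted on books "predicts" a gain that an actual book delivery does not produce.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_homes(extra_books, n=2000, seed=42):
    """Hypothetical world: the home environment causes both books and marks."""
    rng = random.Random(seed)
    books, marks = [], []
    for _ in range(n):
        environment = rng.gauss(0.0, 1.0)  # hidden common cause (assumed)
        books.append(50.0 + 10.0 * environment + rng.gauss(0.0, 2.0) + extra_books)
        # Marks depend on the environment only; there is no books term.
        marks.append(60.0 + 8.0 * environment + rng.gauss(0.0, 3.0))
    return books, marks


# Observational data: books and marks are strongly correlated.
books, marks = simulate_homes(extra_books=0.0)
slope, intercept = fit_line(books, marks)

# Manipulating the model input: "send 10 books to every home".
model_gain = slope * 10.0

# Intervening in the simulated world: same homes, 10 extra books each.
books_after, marks_after = simulate_homes(extra_books=10.0)
actual_gain = sum(marks_after) / len(marks_after) - sum(marks) / len(marks)

print(f"gain promised by the model: {model_gain:.2f} marks")
print(f"gain after the intervention: {actual_gain:.2f} marks")
```

The fitted slope is positive because both variables follow the environment, yet the simulated delivery leaves the marks unchanged.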

Example

Here is an example in which we develop a regression model, but the model provides a false prediction/recommendation. Assume we have measurements of the outside temperature and of the temperature of a room. We can fit a linear regression model that estimates the outside temperature from the room temperature.

T(Outside)= C1*T(Inside)+C2

where C1 and C2 are constant coefficients estimated from the data. Assume this model has very high performance (e.g. it explains more than 99% of the variance in the outside temperature).

Working with the model, we find that if the inside temperature increases by 5°C, the predicted outside temperature increases by 10°C. Can we buy a heater for the room and raise the inside temperature to enjoy a warm day? Of course not. The inside temperature is the effect, not the cause. The same thing happens when a data scientist manipulates the inputs of a model (e.g. inside temperature) to get the desired output (e.g. outside temperature). Recommendations based on manipulating the inputs are usually useless in practice.
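A minimal sketch of the temperature example follows. The data are synthetic, and the generating rule (inside = 15 + 0.5 × outside plus a little noise), the heater boost, and the helper names `fit_line` and `simulate_day` are assumptions chosen so that the fitted C1 comes out close to 2. The point is only to contrast what the fitted model says with what happens when the inside temperature is actually changed.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_day(heater_boost, n=500, seed=0):
    """Hypothetical world: the outside temperature drives the inside one."""
    rng = random.Random(seed)
    outside = [rng.uniform(0.0, 30.0) for _ in range(n)]
    # The heater only affects the room; nothing feeds back to the outside.
    inside = [15.0 + 0.5 * t + heater_boost + rng.gauss(0.0, 0.2) for t in outside]
    return inside, outside


# Fit T(Outside) = C1 * T(Inside) + C2 on observations without a heater.
inside, outside = simulate_day(heater_boost=0.0)
c1, c2 = fit_line(inside, outside)

# Share of variance explained by the model (R squared).
mean_out = sum(outside) / len(outside)
ss_res = sum((o - (c1 * i + c2)) ** 2 for i, o in zip(inside, outside))
ss_tot = sum((o - mean_out) ** 2 for o in outside)
r_squared = 1.0 - ss_res / ss_tot

# Manipulating the model input: 5 degrees warmer inside.
predicted_change = c1 * 5.0

# Switching on a heater in the simulated world: 5 degrees warmer inside.
_, outside_heated = simulate_day(heater_boost=5.0)
actual_change = sum(outside_heated) / len(outside_heated) - mean_out

print(f"R squared: {r_squared:.4f}")
print(f"model: outside changes by {predicted_change:.1f} degrees")
print(f"world: outside changes by {actual_change:.1f} degrees")
```

The model fits almost perfectly and still gives the wrong answer to the heater question, because the fit only captures the correlation.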

Input and Output Relationship

Now, let’s look at the different cases that can arise when one of the features, A, is correlated with the output, B. The following figures show different cases.

It is clear that in cases 2, 3, and 4, the output of the model for a manipulated value of A differs from what we would see in the real world. Note that even in case 1 the output might differ, because A may be correlated with other inputs of the model: when the value of A changes, the other inputs change as well. It is therefore not correct to change only one input feature and investigate its effect in isolation.
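The caveat about correlated inputs can be illustrated with a sketch in which A really does cause B. Everything here is assumed for illustration: a second input D that itself follows A, the coefficients, and the helper names `fit_two_features` and `simulate_world`. The model is fitted on both inputs, and then only A is changed in the model while D is held fixed.

```python
import random


def fit_two_features(a, d, y):
    """Least squares for y = coef_a * a + coef_d * d + intercept."""
    n = len(y)
    ma, md, my = sum(a) / n, sum(d) / n, sum(y) / n
    saa = sum((x - ma) ** 2 for x in a)
    sdd = sum((z - md) ** 2 for z in d)
    sad = sum((x - ma) * (z - md) for x, z in zip(a, d))
    say = sum((x - ma) * (v - my) for x, v in zip(a, y))
    sdy = sum((z - md) * (v - my) for z, v in zip(d, y))
    det = saa * sdd - sad ** 2
    coef_a = (sdd * say - sad * sdy) / det
    coef_d = (saa * sdy - sad * say) / det
    return coef_a, coef_d, my - coef_a * ma - coef_d * md


def simulate_world(shift_a, n=2000, seed=7):
    """Hypothetical world: A causes B directly and also through D."""
    rng = random.Random(seed)
    a_vals, d_vals, b_vals = [], [], []
    for _ in range(n):
        a = rng.uniform(0.0, 10.0) + shift_a
        d = 0.5 * a + rng.gauss(0.0, 1.0)            # D follows A (assumed)
        b = 2.0 * a + 3.0 * d + rng.gauss(0.0, 0.5)  # B depends on both
        a_vals.append(a)
        d_vals.append(d)
        b_vals.append(b)
    return a_vals, d_vals, b_vals


a_vals, d_vals, b_vals = simulate_world(shift_a=0.0)
coef_a, coef_d, intercept = fit_two_features(a_vals, d_vals, b_vals)

# Changing only A in the model, with D held at its old value.
model_change = coef_a * 1.0

# Changing A in the simulated world: D moves with it.
_, _, b_shifted = simulate_world(shift_a=1.0)
real_change = sum(b_shifted) / len(b_shifted) - sum(b_vals) / len(b_vals)

print(f"model, A + 1 with D fixed: B changes by {model_change:.2f}")
print(f"world, A + 1:              B changes by {real_change:.2f}")
```

Under these assumptions the model reports a change of about 2, while the simulated world moves by 3.5, because the other input changed along with A.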

How to Avoid It?

First, be aware of this issue: you cannot predict the output by manipulating the inputs. Keeping this in mind affects how you design your model and how you choose the features.

Second, if you want a model that predicts the effect of changing an input, you need historical data that shows the model what happened when that input changed. Snapshots alone cannot tell you what will happen if an input changes. With such historical data, you can train the model on the recorded changes and their effects. In our example, to see the effect of the room temperature on the outside temperature, we need samples that include changes of the inside temperature and their effects on the outside temperature (e.g. after 1 hour). In this case, the model learns that the room temperature has no effect on the outside temperature.

Third, use your domain knowledge or talk to experts to check whether your prediction/recommendation results make sense. This helps you avoid not only this mistake but other logical mistakes as well. For example, there might be bugs in your code that you are not aware of. A sense check helps you validate the model in general.
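The difference between snapshots and historical change data can be sketched as follows. The data are synthetic and every number is an assumption: the snapshot rule (inside = 15 + 0.5 × outside plus noise), the range of deliberate inside-temperature changes, and the size of the hourly weather drift. `fit_line` is a small helper written for the sketch.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(1)
n = 1000

# Snapshots: one (inside, outside) reading per moment, no record of changes.
snap_outside = [rng.uniform(0.0, 30.0) for _ in range(n)]
snap_inside = [15.0 + 0.5 * t + rng.gauss(0.0, 0.2) for t in snap_outside]
snapshot_slope, _ = fit_line(snap_inside, snap_outside)

# Historical data: each sample records a deliberate change of the inside
# temperature and the change of the outside temperature one hour later.
# The outside change is plain weather drift, independent of the room.
inside_change = [rng.uniform(-5.0, 5.0) for _ in range(n)]
outside_change_1h = [rng.gauss(0.0, 1.0) for _ in range(n)]
history_slope, _ = fit_line(inside_change, outside_change_1h)

print(f"slope learned from snapshots:    {snapshot_slope:.2f}")
print(f"slope learned from change data:  {history_slope:.2f}")
```

The snapshot fit suggests about 2 degrees outside per degree inside, while the fit on recorded changes gives a slope close to zero, which is the answer to the "what if I heat the room" question.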

Conclusion

Designing a machine learning model is a tricky task. A model may not work in practice even though it performs well on the training data. In this article, I discussed a misuse of machine learning models that causes predictions to fail in real-world situations. Other reasons include overfitting, duplicated samples, and biased data. It is always good to use your domain knowledge or talk to experts to check whether your prediction/recommendation results make sense.
