# Rahul Agarwal

Rahul Agarwal is a Data Scientist at Walmart Labs.

### Six bits of advice for Data Scientists

A data scientist needs to be critical and always on the lookout for things that others miss. But in the day-to-day job, and in coding per se, a data scientist can get lost in thought and fail to look at the overall picture. In the end, business partners have hired the data scientist to generate value, and that is not possible without developing business-critical thinking. So here is some advice that one can include in day-to-day data science work to be more diligent and more impactful at the same time.

### The 5 Classification Evaluation metrics every Data Scientist must know

An important step in creating a machine learning pipeline is evaluating different models against each other. A bad choice of evaluation metric could wreak havoc on your whole system. So always be watchful of what you are predicting and how the choice of evaluation metric might alter your final predictions. The choice of an evaluation metric should also be well aligned with the business objective, and hence it is somewhat subjective. You can come up with your own evaluation metric as well.
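To make the point about metric choice concrete, here is a minimal sketch in plain Python. The abstract does not name the five metrics the post covers, so accuracy, precision, recall, and F1 are used here as assumed, commonly used examples. The helper names and the toy data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 as a dict.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch, not a universal rule).
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the always-negative model scores high on accuracy while its recall is zero, which is the kind of mismatch between metric and objective that the paragraph above warns about.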

### P-value Explained Simply for Data Scientists

P-values are always a headache to explain, even to someone who knows about them, let alone someone who doesn’t understand statistics. In statistical hypothesis testing, the p-value (or probability value) is, for a given statistical model, the probability that, when the null hypothesis is true, a statistical summary such as the sample mean difference between two groups would be equal to, or more extreme than, the actually observed result. This post explains p-values in an easy-to-understand way, without all the pretentiousness of statisticians.
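The definition above can be illustrated with a small simulation. The sketch below uses a permutation test, which is one possible way to approximate a p-value for the difference in means between two groups and is not necessarily the approach the post takes. The function name, the sample values, and the number of permutations are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate a two-sided p-value for the difference in group means.

    Under the null hypothesis the group labels are exchangeable, so we
    shuffle the pooled data and count how often the shuffled difference
    is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so float rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / n_permutations


# Hypothetical measurements for two groups.
group_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
group_b = [12.0, 11.8, 12.3, 12.1, 11.9, 12.2]

p_value = permutation_p_value(group_a, group_b)
print(f"approximate p-value: {p_value:.4f}")
```

A small value here means that a mean difference this large rarely appears when the group labels are assigned at random, which is the "equal to, or more extreme than" part of the definition.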

### The Simple Math behind 3 Decision Tree Splitting criterions

How exactly do decision trees work? This is one of the most frequently asked questions in ML/DS interviews. We generally know that they work in a stepwise manner and have a tree structure in which a node is split using some feature according to some criterion. But how do these features get selected, and how does a particular threshold or value get chosen for a feature? This post talks about three of the main splitting criteria used in decision trees and why they work.
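As a rough sketch of how a threshold might be chosen, the code below scores every candidate split on one numeric feature and keeps the one with the lowest weighted impurity. The abstract does not name the three criteria, so Gini impurity, entropy, and variance are assumed here as commonly used examples. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance, often used for regression targets."""
    n = len(values)
    if n == 0:
        return 0.0
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n


def best_threshold(feature, target, impurity):
    """Return (threshold, weighted impurity) of the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct
    feature values; samples below the threshold go left, the rest go right.
    """
    pairs = sorted(zip(feature, target))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: the classes separate cleanly around 6.5.
feature = [1, 2, 3, 10, 11, 12]
target = [0, 0, 0, 1, 1, 1]

threshold, score = best_threshold(feature, target, gini)
print(threshold, score)
```

Passing `entropy` or `variance` as the `impurity` argument reuses the same search with a different criterion. A real tree learner would repeat this search over every feature and recurse into each child node.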