Hello, I'm a data analyst by trade, trying to implement more Data Science techniques at my job. I have never found a clear answer to this:
When is it more appropriate to use regression vs. a machine learning technique like random forests? Both take many variables and predict a single variable on a test set. What are the advantages of using one vs. the other?
Thanks in advance.
Actually, Regression is by definition the task of predicting a continuous variable based on some input. So if you use Decision Trees or Random Forests to predict a continuous target, you are doing Regression.
If you mean Linear Regression vs. Random Forests (or Neural Nets, Nearest Neighbors, etc.), you can just compare their performance on held-out data from a given dataset. You can also combine the predictions of different models by some kind of averaging (that's what a random forest actually does with its individual decision trees; see Ensemble Learning for more).
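To make the comparison concrete, here is a minimal sketch using scikit-learn. The synthetic dataset from make_regression, the 5-fold cross-validation, the R^2 metric, and the hyperparameters are all assumptions chosen for illustration; swap in your own data and whatever metric matters for your problem. VotingRegressor is used here as one simple way to average the predictions of the two models.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # Averages the predictions of a linear model and a random forest.
    "average": VotingRegressor([
        ("linear", LinearRegression()),
        ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ]),
}

# Mean cross-validated R^2 per model; higher is better.
scores = {}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    scores[name] = r2.mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

Because this synthetic target is a linear function of the inputs plus noise, the linear model should do very well here. On real data with nonlinear effects or interactions the ranking can easily flip, which is why comparing on your own dataset is the practical test.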
Interesting. I didn't know Random Forests, etc. could be classified as regression.
So it sounds like there's no single protocol for choosing a method. You test each and decide what works best.
I will check out Ensemble Learning. Thanks, that was very helpful.
You're welcome. In this answer you can find some additional info.
There is no strict protocol, but there are some best practices. The scikit-learn docs have a nice visualization of this process (the "Choosing the right estimator" flowchart).