Let us take a moment to appreciate them:
The Random Forest™ is my shepherd; I shall not want.
He makes me watch the mean squared error decrease rapidly.
He leads me beside classification problems.
He restores my soul.
He leads me in paths of the power of ensembles
for his name's sake.
Even though I walk through the valley of the curse of dimensionality,
I will fear no overfitting,
for you are with me;
your bootstrap and your randomness,
they comfort me.
You prepare a prediction before me
in the presence of complex interactions;
you anoint me data scientist;
my wallet overflows.
Surely goodness of fit and money shall follow me
all the days of my life,
and I shall use Random Forests™
forever.
One thing I learned the hard way is that you should not get too attached to one prediction algorithm. This probably applies to other areas as well. When I participated in the Observing Dark Worlds challenge, I fell into this trap by sticking with Random Forests. My model performed poorly, but instead of considering another algorithm, I only thought about better features. The winner of that competition used a Bayesian approach.
You can find implementations in R (the randomForest package) and in Python (the scikit-learn library).
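As a minimal sketch of the scikit-learn route, here is how fitting a Random Forest classifier typically looks; the synthetic dataset, parameter values, and train/test split are illustrative choices, not recommendations from this post:

```python
# Fit a Random Forest on a synthetic classification problem (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate a toy dataset: 500 samples, 20 features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "bootstrap and randomness" of the psalm: each tree is grown on a
# bootstrap sample, considering a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Mean accuracy on held-out data.
print(forest.score(X_test, y_test))
```

The R equivalent is a one-liner along the lines of `randomForest(y ~ ., data = train)`.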
Excellent poem! Absolutely love it. However... (!) I think there's a good thread or two on the Heritage Health Prize forum where lots of folks, including myself, ran into overfitting problems with Random Forests in R. To be more precise: running more iterations of a Random Forest will not by itself cause a bad fit, but that does not mean the output of a Random Forest cannot be overly tied to predictions that are too specific (fitted) to the input cases, and then fail on the general or predicted case, even with reasonable cross-validation. It may overfit differently than other algorithms, but it certainly does happen. Good discussion here: https://en.wikipedia.org/wiki/Talk%3ARandom_forest#Overfitting
Thanks for the comment. Yes, Random Forests can overfit, but I think they are more robust and easier to handle than other algorithms. The discussion on the Wikipedia Random Forest talk page was interesting.
Thank you for this! Leo Breiman was my father, and I sometimes dig around for reflections of his work to best understand what made him tick. Having my own child who never met him, I collect the things that I can find on him wherever they may be. Best, Rebecca Breiman
You never know who reads your blog =)
I am glad you liked the post!