Skip to main content


Showing posts from August, 2012

PCA or Polluting your Clever Analysis

Predictive analytics: Some ways to waste time

I am starting to take part at different competitions at kaggle and crowdanalytics. The goal of most competitions is to predict a certain outcome given some covariables.  It is a lot of fun trying out different methods like random forests, boosted trees, gam boosting, elastic net and other models. Although I still feel like not being very good with my predictions, I already learned a lot.

As I am a student I have quite some time for these competitions, which is really great, but the time I can spend is nonetheless limited. So it is in my interest to work efficiently. At least the technical part like pre processing of the data or creation of the folder structure is often the same and can be done quickly. I rather want to spent most of the time learning new methods.  In my first competitions I wasted some time with different technical things.
As an exemplary statistician I naturally use R.  So the short code snippets are in R.

Here are my tops of how I wasted time:

1.Forget to remove the…