Get the party started

Have you already used trees or random forests to model the relationship between a response and some covariates? Then you might like conditional trees, which are implemented in the party package.

In contrast to the CART (Classification and Regression Trees) algorithm, the conditional trees algorithm uses statistical hypothesis tests to determine the next split. At each splitting step, every variable is tested for association with the response, and the variable with the lowest p-value is chosen for the split. This is repeated until the global null hypothesis of independence between the response and all covariates can no longer be rejected.
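As a small teaser, here is a minimal sketch of what fitting a conditional inference tree looks like in R (assuming the party package is installed; the iris data and the default settings are just for illustration):

library(party)

# Fit a conditional inference tree: at each node, ctree() tests the
# association of every covariate with the response and splits on the
# variable with the lowest p-value. Splitting stops once the global
# null hypothesis of independence can no longer be rejected
# (the default mincriterion = 0.95 corresponds to alpha = 0.05).
ct <- ctree(Species ~ ., data = iris)

print(ct)  # prints the splits together with their test statistics
plot(ct)   # visualizes the tree with node-level distributions

The same package also offers cforest(), which grows a random forest of such conditional inference trees.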

Conditional trees are the subject of my university seminar this semester. Here are my slides explaining how conditional trees work, which I wanted to share with you. They cover the theory and two short examples in R.


Comments

  1. Thanks for the great presentation! Now I have another modelling tool in my belt :)

  2. I saw your post. Good job! Nice to see a comparison with other model approaches.

  3. I found your seminar paper "Recursive partitioning by conditional inference" very helpful to understand Hothorn 2006.

    By the way, I am using ctree (and possibly cforest) to analyze magnetoencephalographic data. Thank you very much for this. I will cite it in the methods paper I'm writing.

    Cheers,

    Antoine Tremblay

