Advances in Regression Tree Modelling 
Organizer and Chair: Matthew Pratola (Ohio State University) 

HUGH CHIPMAN, Acadia University
Dispersion Modelling with an Ensemble of Trees
 
Bayesian additive regression trees (BART) is a flexible and scalable supervised learning model that offers accurate assessment of uncertainty via credible intervals. It makes the strong assumption of iid errors. Even when error variance is nonconstant, BART can still give accurate point predictions. However, credible intervals are unlikely to remain accurate or useful. We develop a novel heteroscedastic BART model to alleviate these concerns. This is achieved through the introduction of Bayesian Multiplicative Trees, which model the variance component of BART as a function of the predictors. We implement the approach and demonstrate it in several examples. 
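
As a rough sketch of the model structure described in this abstract, one way to write a heteroscedastic BART model is shown below. The notation ($g$, $h$, $T_j$, $M_j$, $m$, $m'$) is an assumption introduced here for illustration and is not taken from the abstract: the mean is a sum of trees, as in standard BART, while the variance is a product of trees.

```latex
\begin{align*}
  y &= f(x) + s(x)\,Z, \qquad Z \sim N(0, 1), \\
  f(x) &= \sum_{j=1}^{m} g(x;\, T_j, M_j), \\
  s^2(x) &= \prod_{l=1}^{m'} h(x;\, T'_l, M'_l).
\end{align*}
```

Setting $s(x) \equiv \sigma$ recovers the usual homoscedastic BART model with iid errors.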
 
TOM LOUGHIN, Simon Fraser University
Robust Adaptively Pruned Random Forests Using Likelihood-Based Trees
 
Random forests for regression are typically constructed using standard regression trees. These trees make splits that minimize squared error, which implicitly assumes homoscedasticity. They are not robust against heteroscedasticity, and this weakness can be passed on to the forests. As heteroscedasticity is prevalent in a substantial fraction of real datasets, random forests may frequently underperform. Furthermore, they are inefficient at fitting mean functions that are partially flat. We present likelihood-based regression trees that explicitly model both the mean and the variance, and we use these trees as base learners in our random forest. We also develop a fast pruning algorithm based on information criteria that improves the fit to partially flat mean functions.
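
To illustrate the idea of a likelihood-based split, here is a minimal sketch, not the authors' algorithm. It assumes a Gaussian model with a separate mean and variance in each child node, and picks the threshold on a single predictor that minimizes the summed negative log-likelihood. The names `gaussian_nll`, `best_split`, and `min_leaf` are hypothetical.

```python
import math


def gaussian_nll(ys):
    """Negative log-likelihood of ys under a normal with MLE mean and variance."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    var = max(var, 1e-12)  # guard against a zero-variance node
    # At the MLE the quadratic term equals n / 2.
    return 0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(xs, ys, min_leaf=2):
    """Return (threshold, nll) minimizing the summed NLL of the two children.

    The threshold is None when no split beats the unsplit node.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs_sorted = [xs[i] for i in order]
    ys_sorted = [ys[i] for i in order]
    best = (None, gaussian_nll(ys_sorted))
    for k in range(min_leaf, len(xs_sorted) - min_leaf + 1):
        if xs_sorted[k - 1] == xs_sorted[k]:
            continue  # cannot split between tied predictor values
        nll = gaussian_nll(ys_sorted[:k]) + gaussian_nll(ys_sorted[k:])
        if nll < best[1]:
            best = ((xs_sorted[k - 1] + xs_sorted[k]) / 2, nll)
    return best
```

Because the criterion includes the variance, it can detect a split where only the spread of the response changes, a case in which a squared-error criterion sees little or no gain.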
 
DANIEL ROY, University of Toronto
Mondrian Processes and their Statistical Applications
 
Mondrian processes are a class of continuous-time stochastic processes that induce a random hierarchical partition of a product space. In this talk, I will survey Mondrian processes and their application to a number of statistical problems, including network modeling and efficient online classification and regression via Mondrian forests.
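
For readers unfamiliar with the construction, the following is a minimal sketch of the standard recursive sampler for a Mondrian process on an axis-aligned box, written under the usual textbook description rather than taken from the talk. The function name `sample_mondrian` and its parameters are hypothetical. The waiting time to the next cut is exponential with rate equal to the sum of the box's side lengths, the cut dimension is chosen with probability proportional to side length, and the cut location is uniform along that side.

```python
import random


def sample_mondrian(box, budget, rng):
    """Recursively sample a Mondrian partition of an axis-aligned box.

    box:    list of (low, high) intervals, one per dimension.
    budget: remaining lifetime; larger values tend to give finer partitions.
    rng:    a random.Random instance.
    Returns the list of leaf boxes.
    """
    lengths = [hi - lo for lo, hi in box]
    total = sum(lengths)
    if total <= 0:
        return [box]  # degenerate box, nothing left to cut
    cost = rng.expovariate(total)  # time to the next cut
    if cost > budget:
        return [box]  # lifetime exhausted: this box is a leaf
    # Choose the dimension proportionally to side length, then cut uniformly.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[dim]
    cut = rng.uniform(lo, hi)
    left = box[:dim] + [(lo, cut)] + box[dim + 1:]
    right = box[:dim] + [(cut, hi)] + box[dim + 1:]
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))
```

A call such as `sample_mondrian([(0.0, 1.0), (0.0, 1.0)], 2.0, random.Random(0))` returns a list of rectangles that tile the unit square. Each recursive call refines one cell independently, which is what makes the partition hierarchical.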