How to speed up your R computation by vectorization and parallel programming

Sunday, May 29, 2016 from 9 am to 4 pm, lunch included — Thistle 256

Hao Yu, Western University

Abstract

Running Monte Carlo simulations and analyzing large data sets can take a very long time: days, weeks, or even months in some cases. It is therefore essential to think carefully about your code and make it as efficient as possible. On the one hand, methodological advances in statistics have inevitably led to more computationally intensive solutions; on the other, your R code may not take full advantage of the R language itself. This workshop aims to improve the efficiency of your R code and computations through three basic approaches. The first is vectorization of R code, which can greatly reduce computation time. The second is the use of C to speed up computations such as loops. The third is parallelization: breaking a large job into many small tasks that can run concurrently on multiple CPUs or cores.

In this workshop, the main topics are:

  1. How to vectorize R code
  2. How to interface with C and use many of R's built-in C functions
  3. An introduction to parallel programming, including embarrassingly parallel problems
  4. How to use R's default package parallel to parallelize computation on a single PC
  5. An introduction to MPI (Message Passing Interface) and the Rmpi package
  6. How to use Rmpi to parallelize computation on a Beowulf cluster such as SHARCNET

This workshop will be particularly useful for graduate students and professionals who analyze large data sets and who use computationally intensive techniques such as simulation and resampling.

Presenter

Hao Yu is Professor of Statistical and Actuarial Sciences at Western University. He is a recognized expert in statistical computing, particularly parallel statistical computing. His other research interests include limit theory, empirical and quantile processes, nonparametric time series, stationarity, and probabilistic statistics.