
Parallel Computing in R

June 12, 9:00 am – 12:00 pm, 1:30 – 4:00 pm
Hao Yu, University of Western Ontario 
 

Analyzing data sets, which are increasingly large in many fields such as bioinformatics and financial mathematics, requires a large amount of computing power. At the same time, methodological advances in statistics have led to more computationally demanding solutions. Both larger data sets and heavier simulation demands can be addressed with parallel computing, a technique that breaks a large computation into many small tasks that run concurrently on multiple CPUs. This workshop focuses on how to do parallel computing in R via explicit parallelism. The main R package used is Rmpi, a wrapper to MPI (Message Passing Interface), the de facto standard in parallel computing. Rmpi can run on anything from a desktop or laptop with multicore CPU(s) to a cluster with hundreds or thousands of CPUs. For example, reducing simulation or computation time from half a year to days is achievable on a cluster with a hundred CPUs. The main topics are: (1) what parallel computing is and why it is useful in statistical computing; (2) how to set up or use a parallel computer or cluster; (3) basic MPI operations; (4) parallel apply functions for so-called "embarrassingly parallel" computations; (5) advanced MPI programming; (6) other parallel tools in R, mostly the snow package. Please check http://www.stats.uwo.ca/faculty/yu/Rmpi for related information.
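
As a rough illustration of the "embarrassingly parallel" pattern in topic (4), the sketch below runs independent simulation replicates through a parallel apply function. It is not taken from the workshop materials. It uses the parallel package that ships with R (its cluster functions derive from snow) rather than Rmpi, so that it runs without an MPI installation. The task function sim_mean, the worker count of 2, and the 8 replicates are all assumptions chosen for illustration.

```r
library(parallel)  # ships with base R; its cluster functions derive from snow

# Hypothetical task: one independent simulation replicate per seed.
sim_mean <- function(seed) {
  set.seed(seed)       # fixed seed per task makes each replicate reproducible
  mean(rnorm(1000))
}

cl <- makeCluster(2)                      # two local worker processes (assumed count)
par_res <- parSapply(cl, 1:8, sim_mean)   # replicates are split across the workers
stopCluster(cl)                           # always shut the workers down

seq_res <- sapply(1:8, sim_mean)          # same computation, run sequentially
stopifnot(isTRUE(all.equal(par_res, seq_res)))
```

Because each replicate depends only on its own seed, the tasks need no communication with one another, which is what makes the problem embarrassingly parallel. The final check confirms that the parallel and sequential runs agree.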