
Hao Yu, Western University
 

One-Day Short Course, 25 May 2014, 9:00am - 4:30pm (lunch break 12:00 - 1:30) Room FG 129, near the Medical Sciences building, at the University of Toronto
 

Monte Carlo simulation is now an essential part of research in probability and statistics. Simulating a complex stochastic model or analyzing "Big Data" requires enormous computing power. Although advances in hardware and algorithms continue to reduce computation time, some tasks still take weeks or months to finish. Many of these problems can be tackled with distributed or parallel computing, a technique that breaks a large computation into many small tasks that can run concurrently on multiple nodes or CPUs.
 

In this workshop, we focus on how to do parallel computing in R. Several high-performance parallel computing R packages will be introduced, including the built-in parallel package and Rmpi. The parallel package is suitable for running small to medium scale computing problems on a single desktop or laptop; for large scale problems, Rmpi or a similar package is required. Rmpi is a wrapper for MPI (Message Passing Interface), the de facto standard in parallel computing. With Rmpi, reducing simulation or computation time from half a year to a few days is achievable on a proper cluster such as SHARCNET.
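To give a flavour of what the parallel package looks like in practice, here is a minimal sketch of running many independent simulation replicates on local worker processes. The simulation function (`sim_max`), the number of workers, and the replicate count are illustrative assumptions, not part of the workshop material.

```r
library(parallel)

# Hypothetical task: the maximum of n standard normal draws
sim_max <- function(i, n = 100) max(rnorm(n))

cl <- makeCluster(2)                    # start 2 local worker processes
res <- parSapply(cl, 1:1000, sim_max)   # run 1000 replicates across workers
stopCluster(cl)                         # always release the workers

mean(res)                               # combine the replicate results
```

On a cluster, the same apply-style pattern carries over to Rmpi (e.g. its `mpi.parSapply`), with MPI handling communication between nodes instead of forked or socket-based local workers.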
 

Our main topics include:
 

1. What is parallel computing and why is it useful in statistical computing?

2. How do we set up or utilize a parallel computer/cluster?

3. Some parallel R packages.

4. How to use parallel apply functions for "embarrassingly parallel" computations.

5. Advanced programming via stochastic simulation examples.
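As a taste of topics 4 and 5, the sketch below uses a parallel apply function for an embarrassingly parallel Monte Carlo estimate of pi. The chunk size, worker count, and seed are illustrative assumptions; `clusterSetRNGStream` gives each worker an independent, reproducible random number stream, which matters whenever parallel workers generate random numbers.

```r
library(parallel)

# One chunk of the Monte Carlo: the fraction of m uniform points in the
# unit square that land inside the quarter circle, scaled to estimate pi
est_pi <- function(chunk, m = 1e5) {
  x <- runif(m)
  y <- runif(m)
  4 * mean(x^2 + y^2 <= 1)
}

cl <- makeCluster(2)
clusterSetRNGStream(cl, 2014)        # independent RNG streams per worker
pis <- parSapply(cl, 1:10, est_pi)   # 10 chunks of 1e5 points each
stopCluster(cl)

mean(pis)                            # combined estimate of pi
```

Because the chunks are completely independent, the same code scales from a laptop to a cluster simply by raising the worker and chunk counts.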