
Date: Sunday May 29, 2022
Time: 11:00 - 18:00 (EDT)

Title: Delivering applied statistics from concept to production

Instructors:

Peter Solymos
Peter is a senior data scientist with 20 years of experience in the wildlife, environmental, and utilities sectors. He holds a PhD in biology and has authored 70 peer-reviewed publications and several statistical software packages. He is passionate about using statistics and data science to bridge the gap between data and decision making. His focus is on enabling this by helping organizations adopt cloud-native practices in their operations.




Khalid Lemzouji
Khalid is a senior statistician and data scientist with 15 years of experience in environmental, public health, and pipeline reliability risk decision making. He is skilled in using statistical and machine learning tools to transform data into knowledge that environmental scientists, cardiac surgeons, and pipeline engineers use for informed decision making. Khalid is a professional statistician accredited by the Statistical Society of Canada (P.Stat.) and the American Statistical Association (PStat® (ASA)). He holds a double bachelor's degree in chemical engineering and statistics and a master's degree in statistics from the University of Alberta.


Description:

Modern applied statistics involves communicating results to various audiences. This communication increasingly takes place in interactive media rather than static reports. Traditional education for statisticians does not adequately prepare applied scientists to handle such requirements effectively. However, healthy exposure to software engineering skills and practices can greatly facilitate the timely delivery of results, thanks to shorter time to working prototypes, shorter feedback loops with stakeholders, and easier communication with IT/engineering about scale and performance.

 

Our 1-day course will introduce the thought process behind writing modular and reusable code. Such code lays the foundation for quickly building prototypes and interfaces. We will use free and open-source software with a focus on the R statistical programming language, and will introduce cloud-native technologies such as Docker. R supports building full-featured web applications with the Shiny framework, as well as developing web interfaces. The workshop organizers will pre-configure cloud instances for the participants, cutting down on preparation and installation time and avoiding setup-related issues.

 

Participants are expected to be familiar with R, but extensive knowledge is not required. We will use the RStudio integrated development environment for programming and for accessing servers. Participants will need their own laptop with a modern web browser and internet access. AV equipment needs depend on whether the workshop is in-person, remote, or hybrid (for example, a projector, or conferencing software for remote participants).

Outline:

The 6-hour workshop will be structured into 4 blocks, each approximately 1.5 hours long.

  • Introductions
  • Shaping the concept into a data modelling workflow
  • Coffee break
  • Developing command-line interfaces and application prototypes
  • Lunch break
  • Sharing results with stakeholders
  • Deploying interactive applications to the cloud
  • Coffee break
  • Performance and scale: decoupling business logic from presentation