Data Science and Analytics Workshop 2022

Date: Saturday, June 4, 2022
Time: 13:00 - 16:30 (EDT)

Title: Intro to Databases in Industry: Data Cleaning, Querying, and Modeling at Scale


Rodolfo Lourenzutti, University of British Columbia
Arman Seyed-Ahmadi, University of British Columbia
Diego Ardila, Shopify


This workshop walks participants through the journey of data from a “raw” to an “analysis-ready” state. Using R, we will explore the basic workflow for cleaning and organizing raw data so that the result is error-free, consistent, and accurate. Participants will then be introduced to relational databases, the most widely used option for storing clean, well-structured data. We will explore how to interact with relational databases and retrieve data from them efficiently using SQL, their well-known and powerful query language. Finally, we will show how to connect R to SQL databases to read and write data.