Biostatistics Workshop 2022

Date: Sunday, June 5, 2022
Time: 12:00 - 15:30 (EDT)

Title: Electronic Health Records Phenotyping

Instructor: Jessica Gronsbell, University of Toronto


The widespread adoption of electronic health records (EHRs) has created an unprecedented opportunity to leverage routinely collected medical data for purposes beyond patient care and billing. Vast amounts of longitudinal, patient-level information that were once locked away in paper records are now being tapped for epidemiological research, clinical decision making, disease surveillance, and real-world predictive modeling of disease risk factors. The first step in nearly every EHR-based application is phenotyping: identifying, among the hundreds of thousands of patients in the database, the subset who have the disease, condition, or characteristic that qualifies them for analysis. Although phenotyping is a ubiquitous part of EHR research, it is time-consuming and financially demanding because of the expert knowledge required to translate a clinical condition precisely into criteria that describe its manifestation in the EHR. In this workshop, I will introduce statistical learning methods designed to expedite the phenotyping process and thereby improve the scalability of EHR research.

Topics Covered & Timetable

This will be a half-day workshop covering the following topics:

  1. EHR Data: What is it and what is it good for? (20 mins)
  2. A brief history of EHR phenotyping (30 mins)
    Break - 10 mins
  3. Semi-supervised learning methods to expedite phenotyping (90 mins)
    1. Background on statistical learning
    2. General approach
    3. Processing unstructured data with clinical natural language processing
    4. Feature selection methods
  4. Current research in phenotyping (15 mins)
    Break - 10 mins
  5. Hands-on phenotyping example (40 mins)

Learning Objectives

  • Understand the benefits and challenges of using EHR data for research
  • Understand the challenges of EHR phenotyping and basic approaches to address them
  • Implement a phenotyping algorithm using statistical learning methods

Prerequisites

  • R programming

Required Software

All attendees will need a laptop with R installed to participate in the hands-on session.