Data Source
Libin Cardiovascular Institute
Organizer
Dr. Dina Labib, Dr. James White; Libin Cardiovascular Institute, University of Calgary


Background

Atrial fibrillation (AF) is the most common heart rhythm disorder, currently recognized as the 21st century cardiovascular disease epidemic, with an estimated lifetime risk of up to 1 in 3 individuals above the age of 45 years [1,2]. Among the most serious complications of this disorder is the development of blood clots in the heart that can dislodge and cause stroke, with a 4- to 5-fold increased risk in these patients [1,3]. Accurate prediction of future AF is important for early initiation of anticoagulant medications to prevent such devastating complications. Several risk scores have been developed for the prediction of AF using traditional statistical models, such as the C2HEST [4] and CHARGE-AF [5] scores, with modest performance in validation datasets (C-index 0.59-0.73). Additionally, some of these scores have been derived for restricted ethnicity groups. Recently, machine learning (ML) algorithms have been explored for this task and have shown improved predictive performance. One such study incorporated patient-reported, electronic health record (EHR), and cardiac MRI derived features, using a survival-based ML approach over a follow-up of five years, achieving a C-index of 0.78 [6]. Another ML model, FIND-AF, was derived in a UK primary care setting using routinely collected EHR variables for short-term prediction of new-onset AF at 6 months, with an ROC-AUC of 0.82 [7]. There is an ongoing need to develop accurate AF prediction models that are generalizable to patients routinely encountered across all clinical practice environments. Whether routinely reported ECG markers can improve prediction accuracy beyond conventional EHR variables is currently unknown.

Research Question


Using a large repository of synthetic patient health data, inclusive of 12-lead ECG and EHR variables captured from patients with suspected or known cardiovascular disease in Southern Alberta, can you develop a risk prediction model that accurately predicts the future occurrence of new-onset AF for individual patients?

Study cohort: A synthetic cohort of ~100,000 patients who have no prior history of AF and had a baseline ECG performed between January 2010 and January 2023, followed by a minimum follow-up of 12 months. Patients with current or prior AF/flutter will be excluded on the basis of the baseline ECG plus review of prior continuous ambulatory ECG monitoring (Holter), ICD-10-CA codes, or procedural codes related to AF/flutter interventions. This synthetic dataset has been generated by training on a randomly selected subset of ~100,000 patients from the Cardiovascular Imaging Registry of Calgary (CIROC).

Outcome of interest: New-onset future AF/flutter detected by any follow-up ECG, continuous ambulatory ECG monitoring (Holter), ICD-10-CA code, or procedural code for AF/flutter intervention.
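The challenge description does not prescribe an evaluation metric. Because the risk scores cited in the Background are compared by C-index, the following is a minimal sketch of Harrell's concordance index for a time-to-event outcome such as new-onset AF/flutter. The function name, the argument names, and the toy values are illustrative assumptions and are not taken from the dataset or its data dictionary. For simplicity, pairs with identical follow-up times are skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times:       follow-up time per patient (e.g., months from baseline ECG)
    events:      1 if new-onset AF/flutter was observed, 0 if censored
    risk_scores: model output; a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an
        # event. Tied times are skipped (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example with made-up values (not challenge data).
times = [6, 14, 20, 30, 36]          # months of follow-up
events = [1, 1, 0, 1, 0]             # 1 = AF/flutter observed
risk = [0.9, 0.4, 0.5, 0.6, 0.1]     # hypothetical model scores
print(harrell_c_index(times, events, risk))  # prints 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of patients, so for a cohort of ~100,000 an established survival-analysis library would be a more practical choice than this sketch.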

Variables


Core demographics for each patient will be provided, inclusive of age and birth sex, followed by ICD-10-CA/CCI coded baseline comorbidities, cardiac history, and coding for which cardiac procedures each patient has undergone. Diagnostic testing variables will include routinely reported ECG variables, patient location at time of baseline ECG (in-patient versus out-patient), and a collection of laboratory (blood) test variables captured around the time of each ECG. Coding of all cardiac medications actively prescribed at time of baseline ECG will also be provided. Please see the data dictionary (ssc2025_study01_datadic) for a full list of variables and definitions.

Note: Raw vector data of ECGs are not being made available for this challenge.

Data access: 
A non-disclosure agreement will be signed by all participating teams, followed by granting access to the dataset hosted in a secure password-protected online environment. The dataset will be made available on January 15, 2025.

References

  1. Kornej, Jelena, Börschel, Christin S., Benjamin, Emelia J. & Schnabel, Renate B. Epidemiology of Atrial Fibrillation in the 21st Century. Circ. Res. 127, 4–20 (2020).
  2. Linz, Dominik et al. Atrial fibrillation: epidemiology, screening and digital health. Lancet Reg. Heal. - Eur. 37, 100786 (2024).
  3. Healey, Jeff S. et al. Subclinical Atrial Fibrillation and the Risk of Stroke. N. Engl. J. Med. 366, 120–129 (2012).
  4. Li, Yan-Guang et al. A Simple Clinical Risk Score (C2HEST) for Predicting Incident Atrial Fibrillation in Asian Subjects: Derivation in 471,446 Chinese Subjects, With Internal Validation and External Application in 451,199 Korean Subjects. Chest 155, 510–518 (2019).
  5. Alonso, Alvaro et al. Simple Risk Model Predicts Incidence of Atrial Fibrillation in a Racially and Geographically Diverse Population: the CHARGE‐AF Consortium. J. Am. Heart Assoc. 2, (2013).
  6. Dykstra, Steven et al. Machine learning prediction of atrial fibrillation in cardiovascular patients using cardiac magnetic resonance and electronic health information. Front. Cardiovasc. Med. 9, (2022).
  7. Nadarajah, Ramesh et al. Prediction of short-term atrial fibrillation risk using primary care electronic health records. Heart 109, 1072–1079 (2023).

Acknowledgment

This case study was prepared by Drs. James White, Dina Labib, and Jacqueline Flewitt, with help and guidance from the Case Study Committee of the Statistical Society of Canada. Any concerns and questions can be directed to the chair, Dr. Chel Hee Lee, via email, chelhee.lee@ucalgary.ca.