Distributed Kaplan-Meier Curves via the Influence Function
Large, multi-center observational datasets are needed to study rare events and exposures. However, sharing sensitive individual-level survival data, such as event times and patient characteristics, can require a lengthy approval process. Existing work on distributed survival analysis focuses on parametric and semi-parametric models rather than on non-parametric Kaplan-Meier (KM) curves. We develop a privacy-preserving, sequential, distributed method that approximates KM curves with splines updated via the influence function, adjusts for confounding via inverse probability weighting, and performs inference with the weighted log-rank test. Our method requires sharing only summary-level data (spline coefficients and knot locations), and in simulations we show inferential performance equivalent to that of KM analysis on pooled data. We apply our method to examine the incidence of blood clots after COVID-19 infection and after COVID-19 vaccination, using electronic health record data from Corewell Health and Michigan Medicine.
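For reference, the quantity the method approximates (an inverse-probability-weighted KM curve on pooled data) can be sketched in Python as follows. This is a minimal sketch of the standard weighted KM estimator, not the distributed spline and influence-function procedure described above. It assumes propensity scores have already been estimated elsewhere, and the function names `weighted_km` and `ipw_weights` are hypothetical.

```python
import numpy as np


def ipw_weights(treated, propensity):
    """Hypothetical helper: inverse probability of treatment weights.

    Assumes propensity scores P(treated | covariates) were estimated
    elsewhere (e.g. by logistic regression). Treated subjects get 1/p,
    untreated subjects get 1/(1 - p).
    """
    a = np.asarray(treated, dtype=float)
    p = np.asarray(propensity, dtype=float)
    return a / p + (1.0 - a) / (1.0 - p)


def weighted_km(times, events, weights=None):
    """Hypothetical helper: weighted Kaplan-Meier estimator on pooled data.

    times   : observed follow-up times
    events  : 1 if the event occurred, 0 if censored
    weights : per-subject weights (None means all ones, i.e. ordinary KM)
    Returns the distinct event times and the survival estimate at each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    w = np.ones_like(times) if weights is None else np.asarray(weights, dtype=float)

    event_times = np.unique(times[events == 1])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = w[times >= t].sum()              # weighted number still at risk at t
        d = w[(times == t) & (events == 1)].sum()  # weighted number of events at t
        s *= 1.0 - d / at_risk                     # product-limit update
        surv[i] = s
    return event_times, surv
```

With all weights equal to 1 the estimator reduces to the ordinary KM curve: follow-up times [1, 2, 2, 3, 4] with event indicators [1, 1, 0, 1, 0] give survival estimates 0.8, 0.6 and 0.3 at times 1, 2 and 3. Note that computing this pooled estimate directly requires every site's individual event times, which is the sharing that summary-level spline coefficients are meant to avoid.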
Language of Oral Presentation
English
Language of Visual Aids
English