Data Fusion, Network Meta-Analysis, and Causal Inference
Chair: Mireille E. Schnitzer and Russell Steele
Organizer: Mireille E. Schnitzer
Sponsor: Biostatistics Section
[Wednesday, June 14, 2017 10:20-11:50]
10:20-10:42
Elias Bareinboim (Purdue University)
Causal Inference from Big Data: Theoretical Foundations and the Data-Fusion Problem
In this paper, we summarize some of the latest results in the field of causal inference that are related to big data. In particular, we address the problem of data-fusion -- piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) so as to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts since the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, non-parametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data-fusion in causal inference tasks.
10:42-11:04
Russell Steele (McGill University), Mireille E. Schnitzer (Université de Montréal), Ian Shrier (Lady Davis Institute, Jewish General Hospital, Montreal and McGill University)
Breaking the Myth of Breaking Randomization: A Causal Examination of Arm-Based Meta-Analysis
In the analysis of multi-arm randomized trials, methods for pooling data across trials belong to one of two broad classes. The first class consists of contrast-based estimators, which estimate the contrast in treatment effect for each pair of treatment levels and then pool across the estimated contrasts. The second class encompasses arm-based methods, which contrast marginal estimates for each treatment arm. Leading researchers have assailed arm-based methods under the broad criticism of “breaking randomization”, implying biased estimation of causal effects of treatment. However, no one has established a formal causal definition of “breaking randomization”, nor critically examined the amount of bias that would result. In this talk, I characterize the conditions under which arm-based methods would be biased for population causal effects and discuss the advantages that arm-based methods have over contrast-based methods with regard to precision.
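The distinction between the two classes of pooling methods can be illustrated with a minimal simulation sketch (all data and the `iv_pool` helper are hypothetical and for illustration only; they are not taken from the talk). With complete two-arm data and fixed-effect inverse-variance weights, the two approaches coincide; they can diverge when trials contribute different arm subsets or weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-trial sample means and variances for arms A and B
# (all numbers illustrative). Arm B is shifted by roughly 0.5 relative to A.
n_trials = 5
mean_A = rng.normal(0.0, 0.3, n_trials)
mean_B = mean_A + 0.5 + rng.normal(0.0, 0.1, n_trials)
var_A = np.full(n_trials, 0.04)  # within-trial variances of the arm means
var_B = np.full(n_trials, 0.04)

def iv_pool(est, var):
    """Fixed-effect inverse-variance pooling of per-trial estimates."""
    w = 1.0 / var
    return np.sum(w * est) / np.sum(w)

# Contrast-based: form the within-trial contrast first, then pool contrasts.
contrast_pooled = iv_pool(mean_B - mean_A, var_A + var_B)

# Arm-based: pool each arm's marginal estimates, then contrast the pools.
arm_pooled = iv_pool(mean_B, var_B) - iv_pool(mean_A, var_A)

print(contrast_pooled, arm_pooled)
```

With equal weights across complete trials, as here, both estimators reduce to the same difference of means; the debate in the talk concerns settings (incomplete designs, heterogeneous populations) where this equivalence breaks down.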
11:04-11:26
Qinshu Lian (University of Minnesota-Twin Cities), Haitao Chu (University of Minnesota-Twin Cities)
A Bayesian HSROC Model for Network Meta-analysis of Diagnostic Tests
When evaluating the accuracy of diagnostic tests, three designs are commonly used: (1) the crossover design; (2) the randomized design; and (3) the non-comparative design. Existing methods on meta-analysis of diagnostic tests mainly consider the simple cases when the reference test in all or none of the studies can be considered as a gold standard test, and when all studies use either a randomized or non-comparative design. Yet the proliferation of diagnostic instruments and diversity of study designs being used have boosted the demand to develop more general methods. We extend the Bayesian hierarchical summary receiver operating characteristic model to network meta-analysis of diagnostic tests to simultaneously compare multiple tests under a missing data framework. Our model accounts for the potential correlations between multiple tests within a study and the heterogeneity across studies. It also allows different studies to perform different subsets of diagnostic tests. Our model is evaluated through simulations and illustrated using real data from deep vein thrombosis tests.
11:26-11:50
Christopher Schmid (Brown University), Youdan Wang (Brown University)
Hierarchical Models for Combining N-of-1 Trials
N-of-1 trials are single-patient multiple-crossover studies for determining the relative effectiveness of treatments for an individual participant. A series of N-of-1 trials assessing the same scientific question may be combined to make inferences about the average efficacy of the treatment as well as to borrow strength across the series to make improved inferences about individuals. Series that include more than two treatments may enable a network model that can simultaneously estimate and compare the different treatments. Such models are complex because each trial contributes data in the form of a time series with changing treatments. The data are therefore both highly correlated and potentially contaminated by carryover. We will use data from a series of 100 N-of-1 trials in an ongoing study assessing different treatments for chronic pain to illustrate different models that may be used to represent such data.
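The idea of borrowing strength across a series of N-of-1 trials can be sketched with a simple normal-normal shrinkage example (simulated data; all parameter values are hypothetical, and the talk's actual models for correlated time series with carryover are considerably richer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical series of N-of-1 trials: each patient i has an individual
# treatment effect theta_i drawn around a population mean mu; each trial
# yields a noisy summary estimate y_i of theta_i.
n_patients = 100
mu_true, tau_true, sigma = 1.0, 0.5, 0.8  # illustrative values
theta = rng.normal(mu_true, tau_true, n_patients)
y = rng.normal(theta, sigma)

# Empirical-Bayes shrinkage under a normal-normal hierarchical model:
# pool toward the series mean to improve individual-level inference.
mu_hat = y.mean()
tau2_hat = max(y.var(ddof=1) - sigma**2, 0.0)  # method-of-moments estimate
shrink = tau2_hat / (tau2_hat + sigma**2)
theta_hat = mu_hat + shrink * (y - mu_hat)     # shrunken individual effects

# Average squared error: shrunken vs. raw individual estimates.
print(np.mean((theta_hat - theta) ** 2), np.mean((y - theta) ** 2))
```

In this sketch the shrunken estimates have smaller average error than the raw per-patient estimates, which is the sense in which a combined analysis "borrows strength" to improve inference about individuals as well as the population average.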