2016-Biostatistics: Methodological Innovation 2


Biostatistics: Methodological Innovation 2 
Chair: Andrea Benedetti (McGill University) 

ASHLEY BONNER, McMaster University
Inference for Model Parameters in Sparse Canonical Correlation using Resampling Techniques
 
Sparse canonical correlation analysis (sCCA) is a multivariate method that uses latent variables to explore sparse and complex associations between high-dimensional data types. Currently, few inferential strategies accompany the estimated loading values that researchers depend on to determine whether, and to what extent, each variable contributes to the latent relationship. Through simulation, we evaluate the performance of resampling techniques, including the nonparametric bootstrap, to explore distributional properties and create confidence intervals for sCCA loading values. Under certain data conditions, resampling may offer a means to infer some level of confidence in the complex relationships returned from sCCA.
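
As a rough illustration of the resampling idea described in this abstract, the following Python sketch builds a percentile bootstrap confidence interval. It is not the authors' procedure: fitting sCCA is out of scope here, so a plain Pearson correlation stands in for a loading value, and the function names, the simulated data, and the settings (2000 replicates, 95% level) are all assumptions made for the example.

```python
import math
import random


def pearson(rows):
    """Pearson correlation of paired observations (a stand-in for an sCCA loading)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    syy = sum((y - mean_y) ** 2 for _, y in rows)
    return sxy / math.sqrt(sxx * syy)


def percentile_bootstrap_ci(rows, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Resample rows with replacement and return approximate percentile limits."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        statistic([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: 200 pairs whose population correlation is 0.6.
data_rng = random.Random(1)
rows = []
for _ in range(200):
    x = data_rng.gauss(0, 1)
    rows.append((x, 0.6 * x + 0.8 * data_rng.gauss(0, 1)))

estimate = pearson(rows)
low, high = percentile_bootstrap_ci(rows, pearson)
print(f"estimate={estimate:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Whole rows are resampled so that the pairing between variables is preserved. In an sCCA setting the analogous step would be to resample subjects and refit the model on each replicate.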
 
OSVALDO ESPIN-GARCIA, University of Toronto and Lunenfeld-Tanenbaum Research Institute
Two-Phase Designs for Joint Quantitative-Trait-Dependent and GWAS-SNP-Dependent Sampling in Post-GWAS Regional Sequencing
 
We evaluate two-phase designs to follow up findings from a genome-wide association study (GWAS) when regional sequencing in the entire cohort is too expensive. We develop a novel EM estimation procedure under a semiparametric maximum likelihood formulation tailored for post-GWAS inference. A GWAS-SNP serves as an auxiliary covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We perform simulations to quantify the efficiency and power of QT-SNP joint sampling under alternative sample allocations. A joint allocation balanced on GWAS-SNP genotype and extreme-QT strata yields power improvements compared to marginal QT or SNP-sampling counterparts.
 
JINGXIONG XU, University of Toronto
Bayesian Statistical Approaches for Genetic Association with Next Generation Sequencing (NGS) Data
 
The discovery of rare variants is becoming a major challenge in genetic association studies and could help elucidate the genetic basis of common diseases. Because rare variants occur too infrequently in the general population, single-variant association tests lack power in NGS analyses. We developed several joint test statistics using a Bayes factor approach to assess the evidence of association between a set of rare variants located in the same chromosomal region and a cancer outcome. Our simulation studies show the advantages of the Bayesian approach compared to two popular approaches: the burden test and the sequence kernel association test (SKAT).
 
QING YU, University of New Brunswick
A Bayesian Poisson Mixed Modelling Approach to Zero-Inflated Cox Proportional Hazard Models
 
In Cox proportional hazard models, frailty can be introduced to incorporate individual susceptibility to a disease or an event of interest; however, some individuals may be immune or insusceptible to certain diseases. We use compound Poisson distributed frailties, which have a spike at zero, to characterize the heterogeneity in susceptibility among individuals. This approach allows us to accommodate both zero and positive susceptibilities in an integral way. We extend the Poisson modelling approach proposed by Ma et al. (Biometrika, 2003) to the estimation of our model. Our proposed approach is illustrated with a simulated dataset and a real-life example.
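
To make the "spike at zero" concrete, here is a small Python sketch of one common compound Poisson construction: a frailty Z equal to the sum of N positive jumps, with N drawn from a Poisson distribution with rate lam, so that P(Z = 0) = exp(-lam). The gamma-distributed jumps, the parameter values, and all function names are assumptions made for illustration. The abstract does not specify these details, and this is not the authors' estimation method.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compound_poisson_frailty(rng, lam, shape, scale):
    """Sum of N gamma jumps with N ~ Poisson(lam); exactly 0.0 when N == 0."""
    n_jumps = poisson_draw(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_jumps))


rng = random.Random(42)
lam = 1.2  # hypothetical Poisson rate
frailties = [compound_poisson_frailty(rng, lam, 2.0, 0.5) for _ in range(20000)]

# Share of simulated individuals with zero frailty (the "immune" group).
zero_fraction = sum(1 for z in frailties if z == 0.0) / len(frailties)
print(f"simulated P(Z=0)={zero_fraction:.3f}, exp(-lam)={math.exp(-lam):.3f}")
```

Individuals with Z = 0 have zero hazard and never experience the event, which is how a single frailty distribution can cover both insusceptible and susceptible subjects.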
 
TIAN FENG, McMaster University
A Bayesian Method for Handling Missing Data in the Growth Curve Model
 
The growth curve model (GCM) is a generalized multivariate analysis of variance (GMANOVA) model. The standard GCM has several desirable properties when the data are complete, but is problematic when there are missing observations. We propose a Bayesian method for handling missing data in the GCM using the Gibbs sampling algorithm. An extensive simulation study is carried out to compare the proposed method with the expectation-maximization (EM) algorithm. The Bayesian method is found to be generally useful, and gives promising results compared with the EM algorithm. Classical datasets are used to illustrate the performance of the proposed method.