Advanced Statistical and Machine Learning Techniques for High Dimensional Data Analysis
Organizer and Chair: Abbas Khalili (McGill University)

RYAN TIBSHIRANI, Carnegie Mellon University
Recent Advances in Selective Inference
 
I will talk about new sets of tools for inference after model selection. The highlight will be inference along the steps of the forward stepwise, least angle regression, and lasso paths, though the framework applies well beyond these cases, and other relevant problems will be described briefly as well. I will also discuss some asymptotic results on the robustness of the proposed tests to non-Gaussian noise distributions and, if time permits, a completely different framework based on conformal inference, which is essentially distribution-free. This all represents joint work with many different authors.
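
As a rough illustration of the conformal idea mentioned above, the sketch below implements generic split-conformal prediction intervals in Python. It is a textbook version under placeholder data and model choices, not the specific methodology of the talk.

```python
# Minimal split-conformal sketch: distribution-free prediction intervals.
# Requires only exchangeable (X, y) pairs; the data and model here are
# placeholder assumptions for illustration.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

# Split the sample: fit on one half, calibrate residuals on the other.
train, calib = np.arange(0, n // 2), np.arange(n // 2, n)
model = LassoCV(cv=5).fit(X[train], y[train])

# Conformity scores: absolute residuals on the calibration half.
scores = np.abs(y[calib] - model.predict(X[calib]))
alpha = 0.1
# Finite-sample-valid cutoff: the ceil((1 - alpha)(n_cal + 1))-th order statistic.
k = int(np.ceil((1 - alpha) * (len(calib) + 1)))
q = np.sort(scores)[min(k, len(calib)) - 1]

# 90% prediction interval for a new point x0: [yhat - q, yhat + q].
x0 = rng.normal(size=(1, p))
yhat = model.predict(x0)[0]
print(f"interval: [{yhat - q:.2f}, {yhat + q:.2f}]")
```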
 
 
ANAND N. VIDYASHANKAR, George Mason University
Efficiency Considerations in Post Model Selection Inference
 
It is common practice in high-dimensional data analysis to first perform model selection and then carry out inference using the selected model as if it were the true model; that is, without accounting for model selection uncertainty. Recently, methods such as "clean and screen" have been used to account for model selection uncertainty. However, the efficiency properties of the resulting statistical procedures are largely unknown. In this presentation, we provide a systematic account of the efficiency properties of post-selection estimators. In the process, we address some foundational questions concerning the role of moderate deviation theory in the study of statistical efficiency.
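
One simple device in this literature is sample splitting: screen variables on one half of the data, then fit and test on the held-out half so that classical inference is not invalidated by selection. The Python sketch below illustrates only this generic device, not the specific clean-and-screen procedure or the efficiency analysis of the talk; the simulation settings are assumptions.

```python
# Sample-splitting sketch: select on one half, infer on the other.
# Placeholder data; illustrates the generic screen-then-clean device only.
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 300, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.5, -1.0, 0.5]
y = X @ beta + rng.normal(size=n)

# Screen: lasso on the first half chooses a candidate set of variables.
half = n // 2
sel = np.flatnonzero(LassoCV(cv=5).fit(X[:half], y[:half]).coef_)

# Clean: ordinary least squares on the held-out half, selected columns only.
Xs, ys = X[half:, sel], y[half:]
XtX_inv = np.linalg.inv(Xs.T @ Xs)
bhat = XtX_inv @ Xs.T @ ys
resid = ys - Xs @ bhat
df = len(ys) - len(sel)
sigma2 = resid @ resid / df
se = np.sqrt(np.diag(XtX_inv) * sigma2)

# t-based p-values that are valid conditional on the selected set.
pvals = 2 * stats.t.sf(np.abs(bhat / se), df)
for j, b, s, pv in zip(sel, bhat, se, pvals):
    print(f"x{j}: coef={b:+.3f}  se={s:.3f}  p={pv:.3g}")
```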
 
YI YANG, McGill University
Variable Selection Deviation for High-dimensional Classification Methods
 
Despite the wide application of regularization methods in many areas, the reliability of the resulting feature selection may be questioned, mainly for two reasons: (i) the outcome of selection depends heavily on the choice of the tuning parameter, which can be very unstable when only a small amount of data is available; (ii) although consistency in variable selection has been established for various regularization methods, it typically rests on the assumption that the p-dimensional coefficient vector is sparse, with many components being zero, which is unlikely to hold for observational data. We therefore develop variable selection deviation (VSD) measures in the high-dimensional classification setting to evaluate the reliability of a set of selected variables.
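
To convey the flavor of measuring selection reliability, the sketch below quantifies selection instability by refitting an l1-penalized logistic classifier on bootstrap resamples and counting how far each resampled selected set deviates from the full-data set. This symmetric-difference count is a generic stand-in of my own choosing, not the VSD measures defined in the talk, and all data and settings are placeholders.

```python
# Rough sketch: bootstrap instability of l1-penalized logistic selection.
# The symmetric-difference deviation below is a generic illustration, not
# the VSD measure of the talk; data and tuning are placeholder assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 200, 30
X = rng.normal(size=(n, p))
logit = X[:, 0] - X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def select(Xm, ym):
    """Selected feature set of an l1-penalized logistic regression."""
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Xm, ym)
    return set(np.flatnonzero(fit.coef_[0]))

base = select(X, y)
deviations = []
for _ in range(100):
    idx = rng.integers(0, n, size=n)                        # bootstrap resample
    deviations.append(len(base ^ select(X[idx], y[idx])))   # symmetric difference

print(f"full-data set: {sorted(base)}")
print(f"mean deviation: {np.mean(deviations):.2f} variables")
```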