- MARTIN BLOSTEIN, McMaster University
Robust High-Dimensional Modeling with the Contaminated Gaussian Distribution [PDF]
- The contaminated Gaussian is a robust elliptical distribution that allows for automatic detection of "bad points", i.e., outliers and noise. The contaminated Gaussian factor analysis model is proposed as an extension of the usual latent Gaussian factor analysis model. In turn, a mixture of these contaminated Gaussian factor analyzers is introduced, allowing robust data reduction and detection of bad points even with high-dimensional data. The number of free parameters is controlled by specifying several parsimonious models with different constraints on the covariance structure. For each model, a variant of the EM algorithm is implemented for parameter estimation.
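To illustrate how bad points can be flagged, here is a minimal sketch based on the standard two-component formulation of the contaminated Gaussian, in which a proportion `alpha` of points follow N(mean, cov) and the rest follow N(mean, eta * cov) with `eta` > 1. The abstract does not state the parameterization used in the talk, so the names `alpha`, `eta`, and `prob_good`, and the 0.5 cut-off, are assumptions made for illustration only.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, evaluated row-wise on x."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean)^T so that sum(z**2) is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def prob_good(x, mean, cov, alpha, eta):
    """Posterior probability that each row of x is a 'good' point.

    Assumed density: alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov).
    """
    log_good = np.log(alpha) + gaussian_logpdf(x, mean, cov)
    log_bad = np.log1p(-alpha) + gaussian_logpdf(x, mean, eta * cov)
    return 1.0 / (1.0 + np.exp(log_bad - log_good))


# Hypothetical parameter values, chosen only to exercise the functions.
points = np.array([[0.0, 0.0], [6.0, 6.0]])
p = prob_good(points, mean=np.zeros(2), cov=np.eye(2), alpha=0.9, eta=10.0)
is_bad = p < 0.5  # flag a point as bad when it is more likely to be contaminated
```

In a fitted model these probabilities would come out of the E-step of the EM algorithm; here the parameters are simply fixed by hand.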
- MICHAEL PATRICK BRIAN GALLAUGHER, McMaster University
Extending Fractionally Supervised Classification to Non-Gaussian Mixture Models [PDF]
- Fractionally supervised classification has recently been considered in the context of the Gaussian mixture model. The approach allows for differing degrees of supervision by increasing or decreasing the respective influence of labelled and unlabelled observations when building a classifier. Using both real and simulated data, the performance of fractionally supervised classification is considered in cases where the mixture component densities are not Gaussian.
- YANG TANG, McMaster University
Model-Based Clustering of Categorical Data with Extreme Patterns [PDF]
- We propose a mixture of latent trait models with the contaminated Gaussian distribution for the clustering of binary data. A mixture of contaminated Gaussian distributions is implemented to capture the outliers in the latent space in order to enhance the clustering performance. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Real and simulated data are used to demonstrate this approach.
- UTKARSH J. DANG, McMaster University
Power Exponential Mixtures and Skewed Extensions [PDF]
- A family of parsimonious mixtures of multivariate power exponential distributions is presented. The multivariate power exponential distribution is a flexible elliptical alternative to the Gaussian and Student t-distributions that accommodates both varying tail weight (light or heavy) and peakedness of data. For particular values of the shape parameter, special and limiting cases of this distribution include the double-exponential, Gaussian, and uniform distributions. Furthermore, an extension of these models is presented that can also model asymmetric data. Computational and inference challenges will be discussed. Lastly, the utility of the proposed models is illustrated using both toy and benchmark data.
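The role of the shape parameter can be made concrete with a short sketch of the multivariate power exponential log-density. It assumes the commonly used parameterization in which the exponent is -(1/2) times the squared Mahalanobis distance raised to the power `beta`, so that `beta = 1` gives the Gaussian and `beta = 0.5` a double-exponential; the abstract does not specify the parameterization, and the function name `mpe_logpdf` is hypothetical.

```python
import math

import numpy as np


def mpe_logpdf(x, mean, scale, beta):
    """Log-density of a multivariate power exponential distribution (assumed form).

    density = k * |scale|^(-1/2) * exp(-0.5 * maha**beta), where maha is the
    squared Mahalanobis distance and k is the normalizing constant below.
    """
    x = np.atleast_2d(x)
    p = mean.shape[0]
    chol = np.linalg.cholesky(scale)
    z = np.linalg.solve(chol, (x - mean).T)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    # log k = log p + log Gamma(p/2) - (p/2) log pi
    #         - log Gamma(1 + p/(2 beta)) - (1 + p/(2 beta)) log 2
    log_k = (
        math.log(p)
        + math.lgamma(p / 2.0)
        - (p / 2.0) * math.log(math.pi)
        - math.lgamma(1.0 + p / (2.0 * beta))
        - (1.0 + p / (2.0 * beta)) * math.log(2.0)
    )
    return log_k - 0.5 * logdet - 0.5 * maha ** beta


# With beta = 1 the value should agree with the standard bivariate Gaussian.
gauss_case = mpe_logpdf(np.array([[1.0, 2.0]]), np.zeros(2), np.eye(2), beta=1.0)
# With beta = 0.5 in one dimension the density at the mode is 1/4 (a double-exponential).
laplace_case = mpe_logpdf(np.array([[0.0]]), np.zeros(1), np.eye(1), beta=0.5)
```

Values of `beta` below 1 give heavier tails than the Gaussian and values above 1 give lighter tails, which is the flexibility the abstract refers to.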
- ANJALI SILVA, University of Guelph
Mixture Model Selection for Cluster Analysis of RNA Sequencing Data [PDF]
- Model-based clustering uses mixture models for clustering. It is a form of unsupervised learning in which the group memberships of the observations are unknown. Hence, mixture models are fitted for a range of possible numbers of components, and model selection is applied to determine the optimal number. Typically, each component corresponds to a cluster. Here we perform clustering of real and simulated RNA sequencing data, which are discrete, skewed, and high-dimensional. Model selection via information criteria (BIC, ICL, AIC, AIC3) and slope heuristics (Djump and DDSE) is explored, and the approaches are compared.
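As a rough sketch of how the information criteria named above can be computed from a fitted mixture, the function below takes a maximized log-likelihood, the number of free parameters, and the sample size. It uses the lower-is-better sign convention and the entropy-based form of the ICL; sign conventions differ across the literature, and the function name and arguments are assumptions for illustration rather than the speaker's implementation. The slope heuristics (Djump and DDSE) are not covered here.

```python
import numpy as np


def information_criteria(loglik, n_params, n_obs, posterior=None):
    """Return AIC, AIC3, BIC (and ICL if posteriors are given); lower is better.

    posterior: optional (n_obs, n_components) array of component membership
    probabilities, used for the entropy penalty in the ICL.
    """
    crit = {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "AIC3": -2.0 * loglik + 3.0 * n_params,
        "BIC": -2.0 * loglik + n_params * np.log(n_obs),
    }
    if posterior is not None:
        z = np.asarray(posterior, dtype=float)
        # Classification entropy; clipping avoids log(0) for hard assignments.
        entropy = -np.sum(z * np.log(np.clip(z, 1e-300, None)))
        crit["ICL"] = crit["BIC"] + 2.0 * entropy
    return crit


def select_components(fits, criterion="BIC"):
    """Pick the number of components minimizing the chosen criterion.

    fits: dict mapping number of components -> dict of criterion values.
    """
    return min(fits, key=lambda g: fits[g][criterion])
```

In practice one would call `information_criteria` once per fitted model and pass the collected results to `select_components`, repeating the selection for each criterion to see where they disagree.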