CJS editor's corner | Statistical Society of Canada

Sunday, October 6, 2024 - 12:00


Liaison Vol. 38.6 October 2024

As I write these lines, summer is nearing its end. As far as CJS is concerned, we’re saying farewell to the old design. The September issue comprises the last 14 articles typeset in the former style. They are no less interesting, however. Below is my digest of their content.

This past summer has seen many exciting sports competitions. If you have been following the UEFA Euro or Copa América soccer championships, I’m sure you’ll enjoy the opening article by Roberts and Rosenthal [1], who consider the design of football group draws that are not only uniform over all valid team assignments but also transparent and make for entertaining broadcasts. They explain why the 2022 FIFA World Cup draw wasn’t uniform and propose several practical ways to achieve uniformity which can be easily adapted to different sports competitions.

Studying internet behaviour through clickstream data creates many interesting statistical problems in the realm of clustering and classification. Gallaugher and McNicholas [2] develop a mixture of first-order continuous-time Markov models for unsupervised and semi-supervised learning. Their approach allows them to factor in the amount of time each user spends on each website.

The next four articles contribute to functional data analysis. Pan, Shen, and Hu [3] study the spatial heterogeneity effect for functional data observed at spatially correlated locations. They propose a Bayesian nonparametric clustering approach with a geographically weighted Chinese restaurant process and a conditional autoregressive prior. Do and Du [4] consider functional analysis of variance to compare groups of functional data. To relax the commonly made assumption of independent functional observations, they develop a functional contrast test that accounts for time dependence between functional group members. Zhou, Yan, and Zhou [5] develop a robust model for sparsely observed paired functional data, such as multiband light curves of supernovae. Their approach features functional principal components, joint modelling of the principal component scores and measurement errors, and splines. Hao, Liu, Su, and Zhao [6] investigate the potential effects of functional and scalar predictors on mortality risks. To this end, they propose a functional additive hazards model and develop a penalized least squares estimation method based on a pseudo-score estimating equation.

As is well known, measurement errors and incomplete data call for special statistical methodology. The work of Jin, Liu, Mao, Sun, and Wu [7] is motivated by clinical studies in which the observed onset of the disease relies on patient recall or chart review of electronic medical records. They propose a simulation extrapolation approach to account for measurement errors and use it to construct a nonparametric estimator of the survival function for interval-censored data. Incomplete data may also arise in environmental sciences, for example in chemical concentration measurements, which are subject to left censoring and may contain missing values. To model such data, Valeriano, Schumacher, Galarza, and Matos [8] propose a censored linear regression model with serially correlated errors and Student t innovations. To obtain maximum likelihood estimates, they use a stochastic approximation of the EM algorithm.

Missing data, such as item nonresponse, are also commonplace in surveys. Chen, Haziza, and Michal [9] consider a class of multiply robust imputation procedures that are insensitive to the violation of certain model assumptions, such as the presence of influential units. They then develop an efficient version of multiply robust estimators, in which the influence of a unit is measured through conditional bias.

The following two articles contribute to the modelling of discrete or mixed data. Yan and Ma [10] study longitudinal data that have a point mass at the origin. Their approach utilizes the Tweedie compound Poisson distribution with serially correlated nonparametric random effects and can achieve both population-averaged and subject-specific interpretations of covariate effects. Kang, Zhu, Wang, and Wang [11] propose a zero-modified geometric first-order integer-valued autoregressive model to handle potential zero inflation, zero deflation, and over- and under-dispersion in count time series.

The three closing contributions are concerned with data integration. Yu, Ye, and Wang [12] study high-dimensional linear regression models that aim to account for the heterogeneity of multiple data sources. Their contribution is an adaptive clustering penalty that achieves variable selection at the source level and coefficient clustering at the covariate level, and an alternating direction method of multipliers algorithm for parameter estimation. Zhang, Wu, and Gao [13] consider correlated data that have been collected from multiple platforms. They extend existing linear models by the inclusion of sub-Gaussian and sub-exponential random errors and propose a model selection criterion based on Bayesian composite posterior probabilities to identify important predictors across multiple platforms. Finally, Hector [14] aims to determine which data sources share the same mean model parameter in a setting in which multiple independent studies each collect multiple dependent vector outcomes. Her proposed technique specifies a quadratic inference function within each data source and combines mean model parameters using a new formulation of a pairwise fusion penalty.

Wishing you inspirational readings and a smooth fall semester!

Johanna G. Nešlehová, Editor-in-Chief

The Canadian Journal of Statistics

Table of Contents of the September 2024 Issue of The Canadian Journal of Statistics

[1] Football group draw probabilities and corrections, by Gareth O. Roberts & Jeffrey S. Rosenthal
[2] Clustering and semi-supervised classification for clickstream data via mixture models, by Michael P. B. Gallaugher & Paul D. McNicholas
[3] Clustering spatial functional data using a geographically weighted Dirichlet process, by Tianyu Pan, Weining Shen, & Guanyu Hu
[4] Contrast tests for groups of functional data, by Quyen Do & Pang Du
[5] Robust joint modelling of sparsely observed paired functional data, by Huiya Zhou, Xiaomeng Yan, & Lan Zhou
[6] Semiparametric estimation for the functional additive hazards model, by Meiling Hao, Kin-yat Liu, Wen Su, & Xingqiu Zhao
[7] Nonparametric estimation of a survival function in the presence of measurement errors on the failure time of interest, by Shaojia Jin, Yanyan Liu, Guangcai Mao, Jianguo Sun, & Yuanshan Wu
[8] Censored autoregressive regression models with Student-t innovations, by Katherine A. L. Valeriano, Fernanda L. Schumacher, Christian E. Galarza, & Larissa A. Matos
[9] Efficient multiply robust imputation in the presence of influential units in surveys, by Sixia Chen, David Haziza, & Victoire Michal
[10] Modelling occurrence and quantity of longitudinal semicontinuous data simultaneously with nonparametric unobserved heterogeneity, by Guohua Yan & Renjun Ma
[11] A zero-modified geometric INAR(1) model for analyzing count time series with multiple features, by Yao Kang, Fukang Zhu, Dehui Wang, & Shuhui Wang
[12] High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources, by Tingting Yu, Shangyuan Ye, & Rui Wang
[13] Bayesian model selection via composite likelihood for high-dimensional data integration, by Guanlin Zhang, Yuehua Wu, & Xin Gao
[14] Fused mean structure learning in data integration with dependence, by Emily C. Hector