Conformal Ordinal Prediction under Noisy Labels for Radiographic Knee Osteoarthritis Severity Grading

Knee osteoarthritis (KOA) is a prevalent and progressive disease. Deep learning methods have shown promise for automated Kellgren–Lawrence (KL) grade prediction from X-rays, but their performance may be limited by ambiguity and inter-reader variability in human grading, especially between adjacent grades. We treat annotations as noisy measurements of latent disease severity rather than as ground truth, and propose a noise-robust ordinal learning framework that models structured label noise through an image-conditional adjacent-grade noise mechanism. To quantify uncertainty with formal guarantees, we construct ordinal conformal prediction sets from nonconformity scores that respect the outcome ordering, guaranteeing finite-sample coverage. Simulations based on large-scale longitudinal cohorts evaluate robustness, interpretability, and uncertainty calibration, with the aim of supporting more reliable assessment and decision support in borderline KOA cases where multiple KL grades are plausible in clinical practice.
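
As a rough illustration of the conformal step only, the sketch below builds contiguous prediction sets over the five KL grades (0 to 4) with split conformal calibration. It is not the authors' method: the nonconformity score (predicted mass on the grades from the predicted mode out to a candidate grade), the function names `ordinal_scores`, `conformal_threshold`, and `prediction_sets`, and the synthetic Dirichlet probabilities are all assumptions made for this example, and the noise-robust learning component is not modelled.

```python
import numpy as np


def ordinal_scores(probs):
    """Score each grade by the predicted mass between the mode and that grade.

    Hypothetical ordinal nonconformity score: it grows as a grade moves away
    from the predicted mode, so thresholding it yields a contiguous set.
    """
    n, _ = probs.shape
    mode = probs.argmax(axis=1)
    scores = np.empty_like(probs, dtype=float)
    for i in range(n):
        m = mode[i]
        # Grades at or above the mode: accumulate mass moving upward.
        scores[i, m:] = np.cumsum(probs[i, m:])
        # Grades at or below the mode: accumulate mass moving downward.
        scores[i, : m + 1] = np.cumsum(probs[i, m::-1])[::-1]
    return scores


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_labels)
    s = ordinal_scores(cal_probs)[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every grade is kept.
        return np.inf
    return np.sort(s)[k - 1]


def prediction_sets(test_probs, qhat):
    """Boolean mask of shape (n, K); True marks grades kept in the set."""
    return ordinal_scores(test_probs) <= qhat


# Synthetic demo (assumption: labels are drawn from the predicted
# probabilities themselves, so calibration and test points are exchangeable).
rng = np.random.default_rng(0)
K, n_cal, n_test = 5, 1000, 2000
probs = rng.dirichlet(np.ones(K), size=n_cal + n_test)
u = rng.random(n_cal + n_test)
labels = np.minimum((probs.cumsum(axis=1) < u[:, None]).sum(axis=1), K - 1)

qhat = conformal_threshold(probs[:n_cal], labels[:n_cal], alpha=0.1)
sets = prediction_sets(probs[n_cal:], qhat)
coverage = sets[np.arange(n_test), labels[n_cal:]].mean()
print(f"threshold={qhat:.3f}, empirical coverage={coverage:.3f}")
```

Under exchangeability of calibration and test points, sets built this way contain the observed grade with probability at least 1 - alpha on average. With noisy annotations, that guarantee applies to the noisy label rather than to latent severity.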

Session

Competition for the Best Student Research Oral Presentation Award in Statistics III

Date and Time

Tue, 2 June 2026, 10:50 - 11:05

Co-authors (not including yourself)

Xinyi Wang

Department of Mathematics and Statistics, McMaster University

Ziqian Zhuang

Dalla Lana School of Public Health, University of Toronto

Divya Sharma

Department of Mathematics and Statistics, York University; Dalla Lana School of Public Health, University of Toronto; Department of Biostatistics, University Health Network

Osvaldo Espin-Garcia

Department of Epidemiology and Biostatistics, Western University; Dalla Lana School of Public Health and Department of Statistical Sciences, University of Toronto; Department of Biostatistics and Schroeder Arthritis Institute, University Health Network

Wei Xu

Dalla Lana School of Public Health, University of Toronto; Department of Biostatistics, University Health Network

Language of the oral presentation

English

Language of the visual aids

English

Speaker