Determining Adequate Sample Size for Studies Using Machine Learning
Studies that employ traditional statistical methods can rely on established techniques and guidelines for determining a sample size that is adequate for achieving sufficient power or precision, or for developing accurate prediction models. However, there are no well-developed methods for determining a sufficient sample size for machine learning (ML) applications. Previous simulation studies suffer from simplistic data generation mechanisms, unrealistic assumptions, a lack of model tuning, and a limited range of sample sizes and performance measures, among other limitations. We address this gap using a comprehensive simulation design focused on binary classification. We extend a published methodology for simulating benchmark population datasets, and use samples of varying size, nested cross-validation, and two performance measures (AUC and Brier score) to train, tune, and evaluate models from three ML methods: random forests, neural networks, and support vector machines (SVM). The results can guide sample size determination and the design of studies using ML methods.
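For readers unfamiliar with the two performance measures named above, the following is a minimal sketch of how AUC and the Brier score can be computed for a binary classifier. It is an illustration only, not the study's code: the function names `auc` and `brier_score` and the toy data are assumptions introduced here, and the pairwise AUC loop is chosen for clarity rather than speed.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means perfectly confident, correct predictions.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case.
    Tied scores count as half a win.

    This pairwise version costs O(n_pos * n_neg); it is a sketch for small data.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities of class 1.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

print("AUC:", auc(y, p))  # 0.75 (3 of 4 positive/negative pairs ranked correctly)
print("Brier score:", brier_score(y, p))
```

AUC measures discrimination (how well the model ranks cases), while the Brier score also reflects calibration of the predicted probabilities, which is one reason a study might report both.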
Co-authors
Samer El Kababji
Children's Hospital of Eastern Ontario Research Institute
Khaled El Emam
Children's Hospital of Eastern Ontario Research Institute
Language of the oral presentation
English
Language of the visual aids
English

Speaker

Nicholas Mitsakakis
Children's Hospital of Eastern Ontario Research Institute