How a Misclassified Binary Outcome Y in Training Data Affects Model Prediction Performance: a Simulation Study
Statistical models are used to explain and/or predict an outcome of interest. For explanation, the focus is on estimating parameters that describe an independent variable's effect. In this setting, the effect of a misclassified outcome variable, and how to correct for it, has been studied extensively, with MCSIMEX being one popular correction method. However, a relevant question that has yet to be addressed is how outcome misclassification affects predictive performance. We investigate this question through extensive simulation studies. Motivated by a real-world example, we generated a binary event status Y that is subject to misclassification. We fit a logistic regression model using the misclassified outcome Y* and assessed model performance on test data simulated from the same underlying model without misclassification. We show that predictive performance on the test data is similar regardless of whether or not the misclassification in Y* was corrected, and that it is always better than performance on the training data.
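The simulation design described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the two standard-normal covariates, the coefficients in `true_beta`, the sample sizes, the sensitivity and specificity values, and the use of AUC and Brier score as performance measures are all assumptions made for the example. It compares a model trained on the true Y with one trained on the misclassified Y*, both evaluated on test data without misclassification. It does not implement the MCSIMEX correction.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def simulate(n, beta, rng):
    """Draw n rows (x, y): two standard-normal covariates, Y from a logistic model."""
    rows = []
    for _ in range(n):
        x = (1.0, rng.gauss(0, 1), rng.gauss(0, 1))  # leading 1.0 is the intercept term
        p = sigmoid(linear_predictor(beta, x))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows

def misclassify(rows, sensitivity, specificity, rng):
    """Replace Y with an error-prone Y*: events are kept with probability
    `sensitivity`, non-events with probability `specificity`, otherwise flipped."""
    out = []
    for x, y in rows:
        keep = sensitivity if y == 1 else specificity
        out.append((x, y if rng.random() < keep else 1 - y))
    return out

def fit_logistic(rows, steps=300, lr=1.0):
    """Maximum-likelihood logistic regression by plain gradient ascent."""
    beta = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in rows:
            resid = y - sigmoid(linear_predictor(beta, x))
            for j in range(3):
                grad[j] += resid * x[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def auc(rows, beta):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve."""
    scored = sorted((linear_predictor(beta, x), y) for x, y in rows)
    n1 = sum(y for _, y in scored)
    n0 = len(scored) - n1
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(scored) if y == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)

def brier(rows, beta):
    """Mean squared difference between predicted probability and outcome."""
    return sum((sigmoid(linear_predictor(beta, x)) - y) ** 2 for x, y in rows) / len(rows)

rng = random.Random(2024)
true_beta = (-1.0, 1.0, 0.5)          # assumed values, for illustration only
train = simulate(1000, true_beta, rng)
test = simulate(1000, true_beta, rng)  # test outcomes are never misclassified
train_star = misclassify(train, sensitivity=0.8, specificity=0.9, rng=rng)

beta_clean = fit_logistic(train)       # trained on the true Y
beta_star = fit_logistic(train_star)   # trained on the misclassified Y*

for label, beta in (("true Y", beta_clean), ("misclassified Y*", beta_star)):
    print(f"trained on {label}: test AUC = {auc(test, beta):.3f}, "
          f"test Brier = {brier(test, beta):.3f}")
```

A full study along these lines would repeat the procedure over many simulated data sets and a range of misclassification rates, and would add a corrected fit for comparison.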
Date and Time
-
Additional Authors and Speakers
Yutong Han
University of Alberta
Language of Oral Presentation
English
Language of Visual Aids
English

Speaker

Yan Yuan
University of Alberta