How a Misclassified Binary Outcome Y in Training Data Affects Model Prediction Performance: A Simulation Study
Statistical models are used to explain and/or predict an outcome of interest. For explanation, the focus is on estimating parameters that describe an independent variable’s effect. In this setting, the effect of a misclassified outcome variable and how to correct for it have been studied extensively, with MCSIMEX being one popular method. However, how misclassification affects predictive performance has yet to be addressed. We investigate this question through extensive simulation studies. Motivated by a real-world example, we generate a binary event status Y that is subject to misclassification. We fit a logistic regression model to the misclassified outcome Y* and assess model performance on a test data set simulated from the same underlying model without misclassification. We show that predictive performance on the test data is similar whether or not the misclassification in Y* is corrected, and is always better than performance on the training data.
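The simulation design described above can be illustrated with a minimal Python sketch. It is not the study's actual code: the single standard-normal covariate, the coefficients (-1, 1), the sensitivity and specificity (0.8, 0.9), the sample sizes, and the use of AUC as the performance measure are all assumptions chosen for illustration, and no MCSIMEX correction is applied. All function names are hypothetical.

```python
import math
import random

def simulate(n, beta0, beta1, rng):
    """Draw x ~ N(0, 1) and a true binary outcome y from a logistic model."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys

def misclassify(ys, sens, spec, rng):
    """Keep a true 1 with probability sens and a true 0 with probability spec."""
    out = []
    for y in ys:
        keep = rng.random() < (sens if y == 1 else spec)
        out.append(y if keep else 1 - y)
    return out

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of a one-covariate logistic regression."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def auc(scores, ys):
    """Rank-based AUC (Mann-Whitney); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(r + 1 for r, i in enumerate(order) if ys[i] == 1)
    n1 = sum(ys)
    n0 = len(ys) - n1
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)

rng = random.Random(0)  # fixed seed for reproducibility

# Training data: true Y is generated, but only the misclassified Y* is used.
x_train, y_train = simulate(5000, -1.0, 1.0, rng)
y_star = misclassify(y_train, sens=0.8, spec=0.9, rng=rng)

# Test data: same underlying model, no misclassification.
x_test, y_test = simulate(5000, -1.0, 1.0, rng)

b0, b1 = fit_logistic(x_train, y_star)
auc_train = auc([b0 + b1 * x for x in x_train], y_star)  # scored against Y*
auc_test = auc([b0 + b1 * x for x in x_test], y_test)    # scored against true Y

print(f"slope fitted on Y*: {b1:.3f}")
print(f"AUC on training data (vs. Y*): {auc_train:.3f}")
print(f"AUC on test data (vs. true Y): {auc_test:.3f}")
```

With a single covariate, AUC depends only on the sign of the fitted slope, so this sketch mainly shows why test performance measured against error-free labels can exceed training performance measured against noisy labels. A fuller simulation would use several covariates, repeat over many replicates, and compare corrected and uncorrected fits.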
Session
Date and Time
-
Language of Oral Presentation
English
Language of Visual Aids
English