Boosting Methods for Imbalanced Data Classification
Data imbalance is an important consideration when working with real-world data. Oversampling and undersampling approaches let us extract more insight from the limited data available on the minority class; however, many such methods have been proposed. The goal of our study is to identify the best over/undersampling approach to use with Adaptive Boosting (AdaBoost). Based on a simulation study, we found that combining AdaBoost with various sampling techniques increases weighted accuracy across classes for progressively larger data imbalances. The three Synthetic Minority Oversampling Technique (SMOTE) variants performed best, with the SMOTE – Edited Nearest Neighbours (SMOTE-ENN) approach being the most accurate at all levels of data imbalance. We then applied the most effective over/undersampling methods to predict upsets (games where the lower-seeded team wins) in the March Madness college basketball tournament.
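For readers unfamiliar with SMOTE, the core oversampling idea can be illustrated with a minimal sketch. This is not the study's implementation, which the abstract does not describe: it shows only the basic interpolation step of plain SMOTE, without the Edited Nearest Neighbours cleaning used by SMOTE-ENN and without AdaBoost. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Hypothetical helper: a sketch of the basic SMOTE idea, not the study's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment between the two.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

Because each synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region the minority class already occupies. In practice the resampled data would then be passed to a classifier such as AdaBoost.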
Additional Authors and Speakers
William Marshall
Brock University
Mei Ling Huang
Brock University
Language of Oral Presentation
English
Language of Visual Aids
English

Speaker

Raymond Romaniuk
Brock University