Online Updating Method with New Variables for Big Data Streams

For big data arriving in streams, online updating is an important statistical method that overcomes storage and computational barriers under certain circumstances. In the regression context, online updating algorithms assume that the set of predictor variables does not change, and consequently they cannot incorporate new variables that become available midway through the data stream. A naive approach would be to discard all previous information and restart the updating from scratch with the new variables. We propose a method that retains the information from earlier data in the online updating algorithm, with bias corrections, to improve efficiency. The method is developed first for linear models and then extended to generalized linear models. We compare the performance of the proposed bias-correcting approach and the naive approach in simulation studies. The method is applied to a study of airline delays, where the reasons for delays have been available only since 2003.
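
To make the setting concrete, the following is a minimal sketch of standard online updating for a linear model, together with the naive restart described above. It is not the proposed bias-correcting method, whose details are not given in this abstract. The class name `OnlineOLS`, the simulated data, the block sizes, and the choice of block 5 as the point where the third predictor appears are all assumptions made for illustration.

```python
import numpy as np


class OnlineOLS:
    """Least squares from streaming blocks with a fixed set of predictors."""

    def __init__(self, p):
        self.xtx = np.zeros((p, p))  # running sum of X_k^T X_k
        self.xty = np.zeros(p)       # running sum of X_k^T y_k

    def update(self, X, y):
        # Only the p x p and p x 1 summaries are kept; the block can be discarded.
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def coef(self):
        return np.linalg.solve(self.xtx, self.xty)


# Simulated stream (hypothetical): 10 blocks of 200 rows, 3 predictors.
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5])
blocks = []
for _ in range(10):
    X = rng.normal(size=(200, 3))
    y = X @ beta + rng.normal(size=200)
    blocks.append((X, y))

# Case 1: all predictors observed throughout. The online estimate
# reproduces the least squares fit on the stacked data.
online = OnlineOLS(3)
for X, y in blocks:
    online.update(X, y)
X_all = np.vstack([X for X, _ in blocks])
y_all = np.concatenate([y for _, y in blocks])
batch_coef = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

# Case 2: the third predictor only becomes available from block 5 on.
# The naive approach discards blocks 0-4 and restarts with all three columns.
naive = OnlineOLS(3)
for X, y in blocks[5:]:
    naive.update(X, y)
naive_coef = naive.coef()
```

In this sketch the naive estimate uses only half of the stream, which is the loss of efficiency that motivates reusing the earlier blocks.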

Date and Time: 

Monday, June 4, 2018 - 13:30 to 14:00

Co-authors:

Chun Wang, Liberty Mutual Insurance
Ming-Hui Chen, University of Connecticut
Jing Wu, University of Rhode Island
Jun Yan, University of Connecticut
Yuping Zhang, University of Connecticut

Language of Oral Presentation: 

English

Language of Visual Aids: 

English

Type of Presentation: 

Invited

Session: 

Speaker

Elizabeth D. Schifano, University of Connecticut