Finite-Population Inference with ML-Based Predictions
Machine-learning (ML) methods are increasingly used in national statistical offices and survey organizations, mainly to produce predictions at different stages of a survey. This course covers how to use those predictions for valid finite-population inference. We discuss model-assisted estimation and imputation for item nonresponse. For unit nonresponse, we examine how ML changes inverse-probability weighting: what is currently justified, what is not, and why. Topics include standard and doubly robust estimators, variance estimation using cross-fitting, and asymptotically valid confidence intervals. Practical issues such as hyperparameter tuning and weight trimming will also be discussed. By the end, participants will have an up-to-date toolkit for valid inference when ML predictions are used in surveys, and a clear view of the key open problems for unit nonresponse.