Exploring Model-Assisted and Model-Based Survey Estimation Techniques in Relation to COVID-19 Impacts for the United States


This project is the culmination of months of work on my master's research project, in which my advisor and I used survey statistics to better understand how the COVID-19 pandemic affected household-level social indicators. In particular, I was interested in the loss of employment income and the delay in medical care among households across the continental United States. We began by constructing estimators of the corresponding finite population parameters and comparing how those estimators behave under different techniques.

We constructed estimators through a design-based approach and a model-based approach. Under the design-based approach we constructed a direct estimator and a model-assisted estimator. The direct estimator was the Horvitz-Thompson (HT) estimator, and the model-assisted estimator (MAE) was the difference estimator, as it is known in the literature, with several machine learning techniques explored as the ‘method’.
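As a rough illustration of the two design-based estimators, here is a minimal Python sketch. The function names, the toy numbers, and the idea of passing in predictions from an already fitted model are assumptions made for illustration; this is not the project's code. The difference estimator is written in its usual form: the sum of model predictions over the whole population plus the inverse-probability-weighted sample residuals.

```python
import numpy as np

def horvitz_thompson_total(y_s, pi_s):
    """HT estimate of a population total: sum of y_i / pi_i over the sample."""
    y_s = np.asarray(y_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    return float(np.sum(y_s / pi_s))

def difference_estimator_total(y_s, pi_s, m_s, m_U):
    """Difference estimate of a population total.

    m_s: model predictions for the sampled units.
    m_U: model predictions for every unit in the population.
    The predictions can come from any fitted 'method' (hypothetical here).
    """
    y_s = np.asarray(y_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    m_s = np.asarray(m_s, dtype=float)
    m_U = np.asarray(m_U, dtype=float)
    # Population sum of predictions, corrected by HT-weighted sample residuals.
    return float(np.sum(m_U) + np.sum((y_s - m_s) / pi_s))

# Toy example (made-up numbers): 3 sampled units from a population of 8.
y_s = [1, 0, 1]                    # binary outcome, e.g. "lost income"
pi_s = [0.5, 0.5, 0.25]            # inclusion probabilities
m_s = [0.8, 0.2, 0.6]              # predictions for the sampled units
m_U = [0.8, 0.2, 0.6, 0.7, 0.1, 0.9, 0.3, 0.4]  # predictions for all 8 units

N = len(m_U)
print("HT proportion:", horvitz_thompson_total(y_s, pi_s) / N)
print("Difference proportion:", difference_estimator_total(y_s, pi_s, m_s, m_U) / N)
```

Dividing either total by the population size N gives the corresponding proportion. When the predictions are all zero, the difference estimator reduces to the HT estimator, which is one way to see it as a model-assisted extension of the direct estimator.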

In the model-based approach, we constructed estimates using a Bayesian formulation of the Fay-Herriot (FH) model.

Model-assisted estimator

Fay-Herriot model
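For reference, the standard area-level Fay-Herriot model is usually written in two levels, as below. The notation is the conventional one and is an assumption here, not necessarily the notation used in the project: there are m areas, \hat{\theta}_i is the direct estimate for area i with sampling variance D_i treated as known, \mathbf{x}_i is a vector of area-level covariates, and v_i is an area-level random effect.

```latex
\begin{aligned}
\hat{\theta}_i &= \theta_i + e_i, & e_i &\sim N(0, D_i), \\
\theta_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i, & v_i &\sim N(0, \sigma_v^2),
\qquad i = 1, \dots, m.
\end{aligned}
```

The first line is the sampling model for the direct estimate and the second is the linking model that ties the areas together through the covariates.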

Fay-Herriot Bayesian formulation

Posterior distributions used for MCMC
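To give a sense of how such posterior distributions can drive an MCMC routine, here is a minimal Gibbs sampler sketch for a Bayesian area-level Fay-Herriot model. The priors are assumptions chosen for illustration (a flat prior on the regression coefficients and an inverse-gamma(a, b) prior on the random-effect variance), as are the function name, the argument names, and the synthetic data; the project's actual priors and full conditionals are not reproduced here.

```python
import numpy as np

def fay_herriot_gibbs(y, X, D, n_iter=2000, burn_in=500, a=0.01, b=0.01, seed=0):
    """Gibbs sampler for y_i ~ N(theta_i, D_i), theta_i ~ N(x_i' beta, sigma2).

    Assumed priors: flat on beta, inverse-gamma(a, b) on sigma2.
    Returns posterior draws of theta, shape (n_iter - burn_in, m).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    m = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)

    theta = y.copy()                       # start at the direct estimates
    sigma2 = max(float(np.var(y)), 1e-6)   # crude starting value
    draws = np.empty((n_iter - burn_in, m))

    for t in range(n_iter):
        # beta | theta, sigma2 ~ N((X'X)^{-1} X' theta, sigma2 (X'X)^{-1})
        beta_hat = XtX_inv @ X.T @ theta
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)

        # sigma2 | theta, beta ~ inverse-gamma(a + m/2, b + SSR/2),
        # drawn as the reciprocal of a gamma variate with scale = 1/rate.
        resid = theta - X @ beta
        sigma2 = 1.0 / rng.gamma(a + m / 2.0, 1.0 / (b + 0.5 * resid @ resid))

        # theta_i | rest ~ N(g_i y_i + (1 - g_i) x_i' beta, g_i D_i),
        # with shrinkage weight g_i = sigma2 / (sigma2 + D_i).
        g = sigma2 / (sigma2 + D)
        theta = rng.normal(g * y + (1.0 - g) * (X @ beta), np.sqrt(g * D))

        if t >= burn_in:
            draws[t - burn_in] = theta
    return draws

# Synthetic usage example (made-up data, 20 areas, one covariate).
rng = np.random.default_rng(42)
m = 20
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = np.full(m, 0.05)                       # sampling variances, treated as known
theta_true = X @ np.array([0.3, 0.1]) + rng.normal(scale=0.1, size=m)
y = theta_true + rng.normal(scale=np.sqrt(D))
posterior_mean = fay_herriot_gibbs(y, X, D).mean(axis=0)
print(posterior_mean.round(3))
```

The shrinkage weight g_i shows the mechanism at work: areas with a large sampling variance D_i are pulled toward the regression prediction, while areas with a small D_i stay close to their direct estimates.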

Empirical Simulation Study

We compared the performance of the estimators for ‘loss of employment income’ and ‘delay in medical care’, using mean squared error (MSE) as the primary metric of ‘performance’ and reporting bias alongside it.
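The general shape of such a comparison can be sketched as follows: draw repeated samples from a fixed population, apply an estimator to each sample, and summarize the estimates against the known population value. This sketch assumes simple random sampling without replacement and a synthetic binary population; the sampling design and data used in the actual study are not described here, and all names are hypothetical.

```python
import numpy as np

def bias_and_mse(estimates, truth):
    """Monte Carlo bias and mean squared error of a set of estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth
    mse = np.mean((estimates - truth) ** 2)
    return float(bias), float(mse)

def ht_proportion(y_s, pi_s, N):
    """HT estimate of a population proportion (HT total divided by N)."""
    return float(np.sum(np.asarray(y_s, dtype=float) / pi_s) / N)

def simulate_srs(y_pop, n, estimator, n_reps=1000, seed=0):
    """Repeatedly draw simple random samples and evaluate an estimator.

    estimator: callable taking (y_s, pi_s, N) and returning a proportion.
    """
    rng = np.random.default_rng(seed)
    y_pop = np.asarray(y_pop, dtype=float)
    N = y_pop.size
    pi_s = np.full(n, n / N)               # equal inclusion probabilities under SRS
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        idx = rng.choice(N, size=n, replace=False)
        estimates[r] = estimator(y_pop[idx], pi_s, N)
    return bias_and_mse(estimates, y_pop.mean())

# Synthetic population: roughly 30% of 1,000 units have the outcome.
pop_rng = np.random.default_rng(7)
y_pop = (pop_rng.random(1000) < 0.3).astype(float)
bias, mse = simulate_srs(y_pop, n=50, estimator=ht_proportion)
print("bias:", bias, "MSE:", mse)
```

Any other estimator with the same call signature can be passed in place of `ht_proportion`, which makes it straightforward to tabulate bias and MSE side by side across estimators.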

MSE and Bias across estimators when estimating ‘employment income loss’.

MSE and Bias across estimators when estimating ‘delay in medical care’.

Application Study

Estimated proportion of loss of employment income across the U.S., by estimator.

Variance of estimators when estimating loss of employment income

Estimated proportion of delay in medical care due to COVID-19 across the U.S., by estimator.

Variance of estimators when estimating delay in medical care due to COVID-19

Discussion

The main conclusion of this work is that the most appropriate methodology for estimating finite population quantities depends on what is being estimated and on how that quantity may benefit from a combination of techniques. Where sample sizes are large enough, it may be sufficient to rely on a direct estimator such as the HT estimator to compute population estimates. When sample sizes are questionably small, on the other hand, it may behoove the practitioner to rely on techniques that ‘borrow’ information from auxiliary data, such as the MAE, to compute a more robust estimate, or even to incorporate FH model-based estimates. However, as the analysis of ‘delay’ showed, even with sample sizes that are large enough, incorporating auxiliary information can still change the estimated values in an informative way. Importantly, the benefit of using auxiliary information through the MAE is situational; it is not always ‘better’. As in seemingly any area of statistical study, it is up to the practitioner to decide which approach is most appropriate for the problem at hand.

