A Two-Stage Approach to Missing Data: Theory and Application to Auxiliary Variables
Victoria Savalei, UBC
Abstract:
Covariance structure analysis is concerned with testing hypotheses about the structure of the population
covariance matrix. Applications include simultaneous equation models, factor models, and full structural
equation models. In the presence of missing data, a popular ad-hoc approach to conducting such an analysis
is to first obtain the saturated maximum likelihood (ML) estimate of the covariance matrix (sometimes called
the "EM covariance matrix"), and then to proceed to estimate the structured parameters treating this matrix
as if it were obtained from complete data. This two-stage (TS) approach is appealing because the first stage
is easily done, and the second stage reduces the problem to a familiar complete data problem. An additional
advantage of the TS approach is that auxiliary variables, which may be important in predicting missingness,
are easily incorporated in stage 1 yet can be ignored completely in stage 2, reducing the
dimensionality of the problem. The main disadvantage is that the standard errors and test statistics obtained in
stage 2 will not be correct. In this talk, I will describe how to obtain correct standard errors and test
statistics for the parameters obtained in stage 2 of this approach when the data are normally distributed and
missing either completely at random (MCAR) or at random (MAR). I will also compare this approach to a direct
maximum likelihood approach. While the TS approach is marginally less
efficient, it performs extremely well, and its test statistic outperforms the test statistic from the direct ML
approach. The TS method is recommended for use with missing data.
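
The two stages described above can be illustrated with a minimal sketch. It is not the speaker's implementation, and it assumes multivariate normal data with missing entries coded as NaN. The function names (em_saturated, one_factor_cov, ml_discrepancy) are hypothetical and chosen only for this illustration. Stage 1 runs EM for the saturated mean and covariance; stage 2 evaluates the usual complete-data normal-theory ML discrepancy at a model-implied covariance matrix, with the stage 1 estimate standing in for the sample covariance. The corrected standard errors and test statistics that the talk addresses are not implemented here.

```python
import numpy as np


def em_saturated(X, n_iter=500, tol=1e-8):
    """Stage 1: saturated ML mean and covariance via EM (NaN marks missing)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Start from available-case means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            xi = X[i].copy()
            C = np.zeros((p, p))  # conditional covariance of the missing part
            if m.any():
                if o.any():
                    # Regression of missing on observed variables.
                    B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                    xi[m] = mu[m] + B.T @ (X[i, o] - mu[o])
                    C[np.ix_(m, m)] = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ B
                else:
                    xi[m] = mu[m]
                    C[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += xi
            sum_xx += np.outer(xi, xi) + C
        # M-step: update from the expected sufficient statistics.
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        delta = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break
    return mu, sigma


def one_factor_cov(loadings, uniquenesses):
    """Model-implied covariance of a one-factor model with unit factor variance."""
    loadings = np.asarray(loadings, dtype=float)
    return np.outer(loadings, loadings) + np.diag(uniquenesses)


def ml_discrepancy(S, sigma_model):
    """Stage 2: normal-theory ML discrepancy, treating S as a complete-data covariance."""
    p = S.shape[0]
    _, logdet_model = np.linalg.slogdet(sigma_model)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_model + np.trace(S @ np.linalg.inv(sigma_model)) - logdet_s - p
```

In use, one would pass the stage 1 covariance estimate to ml_discrepancy and minimize over the structured parameters with any general-purpose optimizer. Auxiliary variables would enter as extra columns of X in stage 1, and the corresponding rows and columns of the estimate would be dropped before stage 2.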