Estimating Common Odds Ratio with Missing Data
Smith, Paul J.
We derive estimates of expected cell counts for $I\times J\times K$ contingency tables in which the stratum variable $C$ is always observed but the column variable $B$ and the row variable $A$ might be missing. In particular, we investigate cases where only the row variable $A$ might be missing, either randomly or informatively. For $2\times 2\times K$ tables, we use Taylor expansion to study the biases and variances of the Mantel-Haenszel estimator and of modified Mantel-Haenszel estimators of the common odds ratio that use one pair of pseudotables. We do this for data without missing values and for data with missing values, where the estimators are based either on the completely observed subsample or on estimated cell means when both the stratum and column variables are always observed. We examine both large-table and sparse-table asymptotics.

Analytic studies and simulation results show that the Mantel-Haenszel estimators overestimate the common odds ratio, but adding one pair of pseudotables reduces bias and variance. Mantel-Haenszel estimators with jackknifing also reduce the biases and variances. Estimates using only the complete subsample seem to have larger bias than those based on the full data, but the bias decreases as the total number of observations grows. With randomly missing data, estimators based on estimated cell means seem to have larger biases and variances than those based only on the complete subsample. With informative missingness, estimators based on the estimated cell means do not converge to the correct common odds ratio under sparse-table asymptotics and converge slowly under large-table asymptotics. When the variable $A$ is informatively missing, the Mantel-Haenszel estimators based on incorrectly estimated cell means behave similarly to those based only on the complete subsample. Variance estimates from the asymptotic variance formula of the ratio estimators had smaller biases and variances than those based on jackknifing or bootstrapping.
Bootstrapping may produce zero divisors and unstable estimates, but adding one pair of pseudotables eliminates these problems and reduces the variability.
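For readers unfamiliar with the estimators compared above, the following is a minimal Python sketch, not the thesis's code. It computes the standard Mantel-Haenszel estimate $\hat\psi_{MH} = \sum_k a_k d_k / n_k \big/ \sum_k b_k c_k / n_k$ for $2\times 2\times K$ tables. Several pieces are assumptions made for illustration only. The pseudotable pair $(1,0,0,1)$ and $(0,1,1,0)$ is a hypothetical choice that adds $1/2$ to both sums and may differ from the pair studied in the thesis. The jackknife is assumed to leave out one stratum at a time, and the bootstrap is assumed to resample whole strata. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One 2x2 stratum stored as (a, b, c, d): first row (a, b), second row (c, d).
Table = Tuple[float, float, float, float]
Estimator = Callable[[Sequence[Table]], float]


def mantel_haenszel(tables: Sequence[Table]) -> float:
    """Mantel-Haenszel common odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing to either sum
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ZeroDivisionError("sum of b*c/n is zero; the MH estimate is undefined")
    return num / den


# HYPOTHETICAL pseudotable pair (an assumption, not taken from the thesis):
# (1, 0, 0, 1) adds 1/2 to the numerator, (0, 1, 1, 0) adds 1/2 to the denominator.
PSEUDO_PAIR: List[Table] = [(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 1.0, 0.0)]


def mantel_haenszel_pseudo(tables: Sequence[Table]) -> float:
    """MH estimate after appending the hypothetical pseudotable pair."""
    return mantel_haenszel(list(tables) + PSEUDO_PAIR)


def jackknife_mh(tables: Sequence[Table],
                 estimator: Estimator = mantel_haenszel) -> Tuple[float, float]:
    """Leave-one-stratum-out jackknife: (bias-corrected estimate, variance estimate)."""
    k = len(tables)
    full = estimator(tables)
    loo = [estimator([t for j, t in enumerate(tables) if j != i]) for i in range(k)]
    mean_loo = sum(loo) / k
    corrected = k * full - (k - 1) * mean_loo
    variance = (k - 1) / k * sum((x - mean_loo) ** 2 for x in loo)
    return corrected, variance


def bootstrap_mh(tables: Sequence[Table], estimator: Estimator,
                 n_boot: int = 500, seed: int = 0) -> Tuple[float, int]:
    """Resample whole strata with replacement; return (variance, failed replicates)."""
    rng = random.Random(seed)
    k = len(tables)
    reps: List[float] = []
    failures = 0
    for _ in range(n_boot):
        sample = [tables[rng.randrange(k)] for _ in range(k)]
        try:
            reps.append(estimator(sample))
        except ZeroDivisionError:
            failures += 1  # zero divisor: this replicate has no finite estimate
    if len(reps) < 2:
        return float("nan"), failures
    mean = sum(reps) / len(reps)
    return sum((x - mean) ** 2 for x in reps) / (len(reps) - 1), failures


if __name__ == "__main__":
    strata = [(10, 5, 4, 8), (6, 3, 2, 7)]
    print(mantel_haenszel(strata))         # equals 143/29, about 4.931
    print(mantel_haenszel_pseudo(strata))  # equals 156.5/42.5, about 3.682
    print(jackknife_mh(strata))
```

Under this assumed pseudotable pair, the estimate is shrunk toward 1 and the denominator is always at least $1/2$. This is one simple way to see why pseudotables can remove the zero divisors that plain bootstrap replicates sometimes produce.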