Joint Program in Survey Methodology

Permanent URI for this community: http://hdl.handle.net/1903/2251

  • Understanding the Mechanism of Panel Attrition
    (2009) Lemay, Michael; Kreuter, Frauke; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Nonresponse is of particular concern in longitudinal surveys (panels) for several reasons. Cumulative nonresponse over several waves can substantially reduce the proportion of the original sample that remains in the panel. Reduced sample size increases the variance of the estimates and limits the possibilities for subgroup analysis. Also, the higher the attrition, the greater the concern that error (bias) will arise in the survey estimates. The fundamental purpose of most panel surveys is to allow analysts to estimate dynamic behavior. However, current research on attrition in panel surveys focuses on the characteristics of respondents at wave 1 to explain attrition in later waves, essentially ignoring the role of life events as determinants of panel attrition. If the dynamic behaviors that panel surveys are designed to examine are also prompting attrition, estimates of those behaviors and of their correlates may be biased. Also, current research on panel attrition generally does not differentiate between attrition through non-contacts and attrition through refusals. As these two sources of nonresponse have been shown to have different determinants, they can also be expected to have different impacts on data quality. The goal of this research is to examine these issues. Data for this research come from the Panel Study of Income Dynamics (PSID) conducted by the University of Michigan. The PSID is an ongoing longitudinal survey that began in 1968, with a focus on the core topics of income, employment, and health.
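The wave-on-wave dynamic described above can be made concrete with a discrete-time hazard sketch. This is an illustrative model, not the dissertation's: the coefficients and the single life-event covariate are hypothetical, and a logistic hazard is just one common way to let a time-varying event raise the attrition probability in the wave it occurs.

```python
import math

def attrition_hazard(intercept, beta_event, event):
    # Logistic hazard of dropping out at a given wave; `event` flags a
    # life event (e.g. a move) in that wave. Coefficients are made up.
    eta = intercept + beta_event * event
    return 1.0 / (1.0 + math.exp(-eta))

def retention_curve(intercept, beta_event, events):
    # Probability of still being in the panel after each wave,
    # given the per-wave life-event indicators.
    surviving, curve = 1.0, []
    for e in events:
        surviving *= 1.0 - attrition_hazard(intercept, beta_event, e)
        curve.append(surviving)
    return curve

# A life event at wave 3 produces a visibly larger retention drop there.
curve = retention_curve(-2.2, 0.9, [0, 0, 1, 0, 0])
```

If attrition responds to the same events the panel is meant to measure, the retained sample under-represents event-experiencers, which is exactly the bias mechanism the abstract describes.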
  • HIERARCHICAL BAYES ESTIMATION AND EMPIRICAL BEST PREDICTION OF SMALL-AREA PROPORTIONS
    (2009) Liu, Benmei; Lahiri, Partha; Survey Methodology
    Estimating proportions of units with a given characteristic for small areas using small area estimation (SAE) techniques is a common problem in survey research. The direct survey estimates, usually based on area-specific sample data, are very imprecise or even unavailable due to the small or zero sample sizes in the areas. In order to provide precise estimates, a variety of model-dependent techniques, using both Bayesian and frequentist approaches, have been developed. Among these, empirical best prediction (EBP) and hierarchical Bayes (HB) methods relying on mixed models have been considered for estimating small area proportions. Mixed models in SAE can be broadly classified as area level or unit level models. When an area level model is used to produce estimates of proportions for small areas, it is commonly assumed that the survey weighted proportion for each sampled small area has a normal distribution and that the sampling variance of this proportion is known. However, these assumptions are problematic when the small area sample size is small or when the true proportion is near 0 or 1. In addition, normality is commonly assumed for the random effects in area level and unit level mixed models. However, this assumption may be violated in some cases. To address these issues, in this dissertation we first explore some alternatives to the well-known Fay-Herriot area level model. The aim is to consider models that are appropriate for survey-weighted proportions and can capture different sources of uncertainty, including the uncertainty that arises from the estimation of the sampling variances of the design-based estimators. Then we develop an adaptive HB method for SAE using data from a simple stratified design. The main goal is to relax the usual normality assumption for the random effects and instead determine the distribution of the random effects adaptively from the survey data.
A Jiang-Lahiri-type frequentist alternative to the hierarchical Bayes methods is also developed. Finally, we propose a generalized linear mixed model that is suitable for binary data collected from a two-stage sampling design.
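For context, the Fay-Herriot model mentioned above combines each area's direct estimate with a synthetic value using a shrinkage weight gamma_i = A / (A + D_i). The sketch below is the textbook composite estimator under the very assumptions the dissertation questions (known sampling variances D_i, normality); for simplicity it shrinks toward the overall mean rather than a regression synthetic part.

```python
def fay_herriot_composite(p_direct, d_var, a_var):
    # Area-level composite: gamma_i * direct + (1 - gamma_i) * synthetic,
    # with gamma_i = A / (A + D_i). Here the synthetic part is simply the
    # mean of the direct estimates (no covariates), for illustration.
    synthetic = sum(p_direct) / len(p_direct)
    estimates = []
    for p, d in zip(p_direct, d_var):
        gamma = a_var / (a_var + d)  # less weight on noisy direct estimates
        estimates.append(gamma * p + (1.0 - gamma) * synthetic)
    return estimates

# An area with a tiny sampling variance keeps its direct estimate;
# noisier areas are pulled toward the overall mean.
est = fay_herriot_composite([0.10, 0.50, 0.90], [0.0001, 0.04, 0.04], a_var=0.01)
```

Nothing in this composite protects a proportion near 0 or 1 when D_i must itself be estimated from a handful of cases, which motivates the alternative models described above.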
  • The Bayesian and Approximate Bayesian Methods in Small Area Estimation
    (2008-11-20) Pramanik, Santanu; Lahiri, Partha; Survey Methodology
    For small area estimation, model based methods are preferred to the traditional design based methods because of their ability to borrow strength from related sources. The indirect estimates, obtained using mixed models, are usually more reliable than the direct survey estimates. To draw inferences from mixed models, one can use a Bayesian or a frequentist approach. We consider the Bayesian approach in this dissertation. The Bayesian approach is conceptually straightforward: the prior and likelihood produce the posterior, which is used for all inferential purposes. It overcomes some of the shortcomings of the empirical Bayes approach; for example, the posterior variance automatically captures all sources of uncertainty in estimating small area parameters. But this approach requires the specification of a subjective prior on the model parameters. Moreover, in almost all situations the posterior moments involve multi-dimensional integration, and consequently closed form expressions cannot be obtained. To overcome the computational difficulties one needs to apply computer intensive MCMC methods. We apply linear mixed normal models (area level and unit level) to draw inferences for small areas when the variable of interest is continuous. We propose and evaluate a new prior distribution for the variance component. We use the Laplace approximation to obtain accurate approximations to the posterior moments. The approximations present the Bayesian methodology in a transparent way, which facilitates the interpretation of the methodology for data users. Our simulation study shows that the proposed prior yields good frequentist properties for the Bayes estimators relative to some other popular choices. This frequentist validation brings an objective flavor to the so-called subjective Bayesian approach. Linear mixed models are usually not suitable for handling binary or count data, which are often encountered in surveys.
To estimate small area proportions, we propose a binomial-beta hierarchical model. Our formulation allows a regression specification and hence extends the usual exchangeability assumption at the second level. We carefully choose a prior for the shape parameter of the beta density. This new prior helps to avoid the extreme skewness present in the posterior distribution of the model parameters, so that the Laplace approximation performs well.
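In the conjugate special case with fixed shape parameters (no regression part, no hyperprior), the binomial-beta posterior mean has a closed form, which makes the shrinkage explicit. This is only a minimal sketch; the dissertation's full hierarchical model, with its prior on the shape parameter, is what requires the Laplace approximation.

```python
def beta_binomial_posterior_mean(y, n, a, b):
    # Beta(a, b) prior + Binomial(n, p) likelihood gives a
    # Beta(a + y, b + n - y) posterior with mean (a + y) / (a + b + n):
    # a compromise between the prior mean and the sample proportion.
    return (a + y) / (a + b + n)

# A small area with zero observed successes still gets a positive,
# stabilized estimate instead of an implausible exact zero.
small_area = beta_binomial_posterior_mean(0, 10, 0.5, 0.5)
```

As n grows, the data dominate and the estimate approaches the direct sample proportion, so the stabilization matters only where samples are small.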
  • Sampling Weight Calibration with Estimated Control Totals
    (2008-11-12) Dever, Jill A; Valliant, Richard; Survey Methodology
    Sample weight calibration, also referred to as calibration estimation, is a widely applied technique in the analysis of survey data. This method borrows strength from a set of auxiliary variables and can produce weighted estimates with smaller mean square errors than estimators that do not use the calibration adjustments. Poststratification is a well-known calibration method that forces weighted counts within cells, generated by cross-classifying the categorical (or categorized) auxiliary variables, to equal the corresponding population control totals. Several assumptions are critical to the theory developed to date for weight calibration. Two assumptions relevant to this research are: (i) the control totals are calculated from the population of interest and known without (sampling) error; and (ii) the sample units selected for the survey are taken from a sampling frame that completely covers the population of interest (i.e., no frame undercoverage). With a few exceptions, research to date generally has been conducted as if these assumptions hold, or as if any violation does not affect estimation. Our research directly examines the violation of these two assumptions by evaluating the theoretical and empirical properties of the mean square error for a set of calibration estimators, newly labeled estimated-control (EC) calibration estimators. Specifically, this dissertation addresses the use of control totals estimated from a relatively small survey to calibrate the sample weights of an independent survey suffering from undercoverage and sampling errors. The EC calibration estimators under review include estimated totals and ratios of two totals, both overall and within certain domains. The ultimate goal of this research is to provide survey statisticians with a sample variance estimator that accounts for the violated assumptions and has good theoretical and empirical properties.
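As a concrete baseline, poststratification under assumption (i) scales the weights in each cell so the weighted counts hit the known control totals. A minimal sketch, assuming known (not estimated) controls — exactly the setting the dissertation relaxes:

```python
from collections import defaultdict

def poststratify(weights, cells, controls):
    # Scale each unit's weight by N_c / (weighted sample count in cell c),
    # so weighted cell totals equal the population controls exactly.
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    return [w * controls[c] / cell_sums[c] for w, c in zip(weights, cells)]

# Two cells: force the weighted counts to 100 and 300.
adjusted = poststratify([1.0, 1.0, 2.0, 2.0], ["a", "a", "b", "b"],
                        {"a": 100.0, "b": 300.0})
```

When the controls are themselves survey estimates, the calibration still "hits" them exactly, but their sampling error propagates into the calibrated estimator — the extra variance component an EC calibration variance estimator has to account for.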
  • Regression Diagnostics for Complex Survey Data: Identification of Influential Observations
    (2007-09-13) Li, Jianzhu; Valliant, Richard; Survey Methodology
    Discussions of diagnostics for linear regression models have become indispensable chapters or sections in most statistical textbooks. However, the survey literature has given little attention to this problem. Examples from real surveys show that the inclusion or exclusion of a small number of sampled units can greatly change the regression parameter estimates, which indicates that techniques for identifying influential units are necessary. The goal of this research is to extend and adapt the conventional ordinary least squares influence diagnostics to complex survey data, and to determine how they should be justified. We assume that an analyst is looking for a linear regression model that fits reasonably well for the bulk of the finite population and chooses to use the survey weighted regression estimator. Diagnostic statistics such as DFBETAS, DFFITS, and modified Cook's Distance are constructed to evaluate the effect on the regression coefficients of deleting a single observation. As components of the diagnostic statistics, the estimated variances of the coefficients are obtained from design-consistent estimators which account for complex design features, e.g., clustering and stratification. For survey data, sample weights, which are computed with the primary goal of estimating finite population statistics, are a source of influence besides the response variable and the predictor variables, and therefore need to be incorporated into influence measurement. The forward search method is also adapted to identify influential observations as a group when there are possible masking effects among the outlying observations. Two case studies and simulations are conducted in this dissertation to test the performance of the adapted diagnostic statistics. We conclude that removing the identified influential observations from the model fitting yields less biased coefficient estimates. The standard errors of the coefficients may be underestimated, however, since the variation in the number of observations used in the regressions is not accounted for.
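The deletion logic behind statistics like DFBETAS can be sketched for a survey-weighted simple regression: refit the weighted slope with each unit removed and record the change. The example below is deliberately simplified (one predictor, raw slope changes rather than the standardized, design-consistent version developed in the dissertation); note how a heavily weighted outlier dominates.

```python
def wls_slope(x, y, w):
    # Survey-weighted least-squares slope for a single predictor.
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def dfbeta_slope(x, y, w):
    # Leave-one-out change in the weighted slope for each unit
    # (the unstandardized core of a DFBETAS-type diagnostic).
    b_full = wls_slope(x, y, w)
    changes = []
    for i in range(len(x)):
        xs, ys, ws = (v[:i] + v[i + 1:] for v in (x, y, w))
        changes.append(b_full - wls_slope(xs, ys, ws))
    return changes

# Unit 5 is both an outlier in y and carries a large sample weight,
# so deleting it moves the slope far more than deleting any other unit.
deltas = dfbeta_slope([1, 2, 3, 4, 5], [1, 2, 3, 4, 20], [1, 1, 1, 1, 3])
```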
  • The Relationship Between Response Propensity and Data Quality in the Current Population Survey and the American Time Use Survey
    (2007-04-26) Fricker, Scott; Tourangeau, Roger; Survey Methodology
    An important theoretical question in survey research over the past fifty years has been: How does bringing in late or reluctant respondents affect total survey error? Does the effort and expense of obtaining interviews from difficult-to-contact or reluctant respondents significantly decrease the nonresponse error of survey estimates? Or do these late respondents introduce enough measurement error to offset any reductions in nonresponse bias? This dissertation addressed these questions by examining nonresponse and data quality in two national household surveys--the Current Population Survey (CPS) and the American Time Use Survey (ATUS). Response propensity models were first developed for each survey, and busyness and social capital explanations of nonresponse were evaluated in light of the results. Using respondents' predicted probabilities of response, simulations were carried out to examine whether nonresponse bias was linked to response rates. Next, data quality in each survey was assessed by a variety of indirect indicators of response error--e.g., item missing data rates, round-value reports, and interview-reinterview response inconsistencies--and the causal roles of various household, respondent, and survey design attributes on the level of reporting error were explored. The principal analyses investigated the relationship between response propensity and the data quality indicators in each survey, and examined the effects of potential common causal factors where there was evidence of covariation. The implications of the findings for survey practitioners and for nonresponse and measurement error studies are discussed.
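The trade-off in the opening questions has a simple algebraic core: the nonresponse bias of the respondent mean equals the covariance between the survey variable and the response propensity, divided by the mean propensity. A hedged illustration with made-up values, not the CPS/ATUS analysis itself:

```python
def nonresponse_bias(y, p):
    # Expected respondent mean minus full-sample mean, which equals
    # cov(y, p) / mean(p): zero when propensity is unrelated to y.
    n = len(y)
    full_mean = sum(y) / n
    respondent_mean = sum(yi * pi for yi, pi in zip(y, p)) / sum(p)
    return respondent_mean - full_mean

# Propensity rising with y inflates the respondent mean.
bias = nonresponse_bias([1, 2, 3, 4], [0.2, 0.4, 0.6, 0.8])
```

The dissertation's twist is that pursuing low-propensity cases only pays off if the measurement error they bring is smaller than the nonresponse bias they remove.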
  • ANALYSIS OF COMPLEX SURVEY DATA USING ROBUST MODEL-BASED AND MODEL-ASSISTED METHODS
    (2006-09-19) Li, Yan; Lahiri, Partha; Survey Methodology
    Over the past few decades, major advances have taken place in both model-based and model-assisted approaches to inference in finite population sampling. In the standard model-based approach, the finite population is assumed to be a realization from a superpopulation characterized by a probability distribution, and the distribution of the sample is assumed to be identical to that of the finite population. The model-based method can lead to misleading inference if either assumption is violated. Model-assisted estimators typically are consistent, or at least approximately unbiased, with respect to the sampling design, and yet are more efficient than the customary randomization-based estimators--in the sense of achieving smaller variance with respect to the design--if the assumed model is appropriate. Since both approaches rely on the assumed model, there is a need to achieve robustness with respect to model selection. This is precisely the main theme of this dissertation. This study uses the well-known Box-Cox transformation of the dependent variable to generate robust model-based and model-assisted estimators of finite population totals. The robustness is achieved because the appropriate transformation of the dependent variable is determined by the data. Both a Monte Carlo simulation study and real data analyses are conducted to illustrate the robustness properties of the proposed estimation method in two different ways: (i) design-based and (ii) model-based, wherever appropriate. A few potential areas of future research within the context of transformations in linear regression models, as well as linear mixed models, for the analysis of complex survey data are identified.
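The Box-Cox family and its profile log-likelihood are standard, so the data-driven choice of the transformation can be sketched directly. The grid search shown here is a simplification; it is not necessarily the estimation method the dissertation uses.

```python
import math

def boxcox(y, lam):
    # Box-Cox transform: (y^lam - 1) / lam, with the log limit at lam = 0.
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def boxcox_loglik(y, lam):
    # Profile log-likelihood under normality of the transformed values,
    # including the Jacobian term (lam - 1) * sum(log y).
    z = boxcox(y, lam)
    n = len(z)
    m = sum(z) / n
    var = sum((v - m) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid):
    # Data-driven choice: the grid value maximizing the profile likelihood.
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))

# Log-normal-looking data should pick a lambda near zero (log transform).
y = [math.exp(v) for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
lam_hat = best_lambda(y, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Letting the data pick the transformation is what makes the resulting estimators of totals robust to misspecifying the working model's scale.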
  • GRICEAN EFFECTS IN SELF-ADMINISTERED SURVEYS
    (2005-10-31) Yan, Ting; Tourangeau, Roger; Survey Methodology
    Despite the best efforts of questionnaire designers, survey respondents don't always interpret questions as the question writers intended. Researchers have used Grice's conversational maxims to explain some of these discrepancies. This dissertation extends this work by reviewing studies on the use of Grice's maxims by survey respondents and describing six new experiments that looked for direct evidence that respondents apply Grice's maxims. The strongest evidence for respondents' use of the maxims came from an experiment that varied the numerical labels on a rating scale; the mean shift in responses to the right side of the rating scale induced by negative numerical labels was robust across items and fonts. Process measures indicated that respondents applied the maxim of relation in interpreting the questions. Other evidence supported use of the maxim of quantity -- as predicted, correlations between two highly similar items were lower when they were asked together. Reversing the wording of one of the items didn't prevent respondents from applying the maxim of quantity. Evidence was weaker for the application of Grice's maxim of manner; respondents still seemed to use definitions (as was apparent from the reduced variation in their answers), even though the definitions were designed to be uninformative. That direct questions without filters induced significantly more responses on the upper end of the scale -- presumably because of the presuppositions direct questions carried -- supported respondents' application of the maxim of quality. There was little support for respondents' use of the maxim of relation from an experiment on the physical layout of survey questions; the three different layouts didn't influence how respondents perceived the relation among items. 
These results provided some evidence that both survey "satisficers" and survey "optimizers" may draw automatic inferences based on Gricean maxims, but that only "optimizers" will carry out the more controlled processes requiring extra effort. Practical implications for survey practice include the need for continued attention to secondary features of survey questions in addition to traditional questionnaire development issues. Additional experiments that incorporate other techniques such as eye tracking or cognitive interviews may help to uncover other subtle mechanisms affecting survey responses.
  • STATISTICAL ESTIMATION METHODS IN VOLUNTEER PANEL WEB SURVEYS
    (2004-11-17) Lee, Sunghee; Valliant, Richard; Survey Methodology
    Data collected through Web surveys, in general, do not come from traditional probability-based sample designs. Therefore, the inferential techniques used for probability samples are not guaranteed to be correct for Web surveys without adjustment, and estimates from these surveys are likely to be biased. However, research on the statistical aspects of Web surveys is lacking relative to other aspects. Propensity score adjustment (PSA) has been suggested as an alternative for statistically surmounting the inherent problem of nonrandomized sample selection in volunteer Web surveys. However, there is minimal evidence for its applicability and performance, and the implications are not conclusive. Moreover, PSA does not take into account problems arising from the uncertain coverage of sampling frames in volunteer panel Web surveys. This study attempted to develop alternative statistical estimation methods for volunteer Web surveys and to evaluate their effectiveness in adjusting the biases arising from nonrandomized selection and unequal coverage. Specifically, the proposed adjustment used a two-step approach: first, PSA was utilized to correct for nonrandomized sample selection, and second, calibration adjustment was used to address the uncertain coverage of the sampling frames. The investigation found that the proposed estimation methods have the potential to reduce selection and coverage bias in estimates from volunteer panel Web surveys. The combined two-step adjustment reduced not only bias but also mean square error, to a greater degree than either individual adjustment. While the findings from this study may shed some light on Web survey data utilization, there are additional areas to be considered and explored. First, the proposed adjustment decreased bias but did not completely remove it. The adjusted estimates showed larger variability than the unadjusted ones. Moreover, the adjusted estimator is no longer linear, and an appropriate variance estimator has not yet been developed; naively applying the variance estimator for linear statistics greatly overestimated the variance, understating the efficiency of the survey estimates.
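The two-step adjustment can be sketched end to end: inverse-propensity base weights for volunteer selection, then cell-level calibration for frame undercoverage. The propensities, cells, and controls below are placeholders, and the actual estimator studied (e.g. propensity strata rather than direct inverse weighting) may differ:

```python
from collections import defaultdict

def two_step_weights(propensities, cells, controls):
    # Step 1 (PSA): weight each volunteer by the inverse of an estimated
    # propensity to join the panel (estimated, e.g., against a reference
    # probability sample; the propensities here are assumed given).
    base = [1.0 / p for p in propensities]
    # Step 2 (calibration): rescale within cells to population control
    # totals, addressing coverage error the propensity model cannot see.
    totals = defaultdict(float)
    for w, c in zip(base, cells):
        totals[c] += w
    return [w * controls[c] / totals[c] for w, c in zip(base, cells)]

final = two_step_weights([0.5, 0.25], ["a", "a"], {"a": 90.0})
```

Consistent with the abstract's caveat, weights built this way make the estimator nonlinear, so a variance estimator derived for linear statistics will misstate its precision.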
  • PANEL SURVEY ESTIMATION IN THE PRESENCE OF LATE REPORTING AND NONRESPONSE
    (2004-08-06) Copeland, Kennon R; Lahiri, Partha; Survey Methodology
    Estimates from economic panel surveys are generally required to be published soon after the survey reference period, resulting in missing data due to late reporting as well as nonresponse. Estimators currently in use make some attempt to correct for the impact of missing data. However, these approaches tend to oversimplify the assumed nature of the missing data and often ignore a portion of the reported data for the reference period. Discrepancies between preliminary and revised estimates highlight the inability of the estimation methodology to correct for all error due to late reporting. The current model for one economic panel survey, the Current Employment Statistics survey, is examined to identify factors related to potential model misspecification error, leading to the identification of an extended model. An approach is developed to utilize all reported data from the current and prior reference periods through missing data imputation. Two alternatives to the current model are developed that relate growth rates to recent reported data and reporting patterns: a simple proportional model and a hierarchical fixed effects model. Estimation under these models is carried out and performance is compared to that of the current estimator using historical data from the survey. Results, although not statistically significant, suggest the potential benefit of using reported data from recent time periods in the working model, especially for smaller establishments. A logistic model for predicting the likelihood of late reporting among sample units that did not report in time for preliminary estimates is also developed. The model uses a combination of operational, respondent, and environmental factors identified from a reporting pattern profile. Predicted conditional late reporting rates obtained under the model are compared to actual rates using historical information from the survey.
Results indicate the appropriateness of the parameters chosen and general ability of the model to predict final reporting status. Such a model has the potential to provide information to survey managers for addressing late reporting and nonresponse.
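The "simple proportional" idea can be illustrated with a link-relative growth sketch of the kind long used in establishment surveys: take the ratio of current to prior totals among units that reported in both periods and apply it to the prior period's level. The matched-sample definition and numbers here are hypothetical simplifications, not the CES estimator itself.

```python
def link_relative_estimate(prior_level, matched_prior, matched_current):
    # Growth among units reporting in both periods, applied to the prior
    # period's estimated level; late reporters enter subsequent revisions
    # as their data arrive, which is why preliminary estimates get revised.
    growth = sum(matched_current) / sum(matched_prior)
    return prior_level * growth

# Matched units grew 10%, so the preliminary estimate scales up by 10%.
prelim = link_relative_estimate(1000.0, [10.0, 20.0, 30.0], [11.0, 22.0, 33.0])
```

If late reporters systematically differ from on-time reporters, the matched-sample growth rate is biased, which is the gap the extended models and the late-reporting logistic model aim to close.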