Joint Program in Survey Methodology

Permanent URI for this community: http://hdl.handle.net/1903/2251

Search Results

  • Nonparticipation Issues Related to Passive Data Collection
    (2024) Breslin, Alexandra Marie Brown; Presser, Stanley; Antoun, Chris; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    New passive data collection techniques on smartphones allow for the direct observation of a participant’s behavior and environment in place of self-reported information. However, such studies are not appealing to all people, especially those with higher security concerns. The current work explores the mechanisms that impact a sample member’s decision to participate in a passive data collection using three different online panels. The first study explores nonparticipation bias in a financial tracking study and finds evidence of bias in the self-reported measures of financial behaviors, and that prior experience with the research organization positively impacts a sample member’s decision to participate. Studies two and three employ deception studies (i.e., the passive data collections were presented as real rather than hypothetical, but no data was passively collected) in which respondents received experimentally varied invitations to participate in a smartphone-based passive data collection. The second study varies the type of data requested and the study topic to understand better how these study components interact. The findings suggest that the type of data requested impacts participation while the study topic does not. The second study utilized video messages presented to all sample members who chose not to participate. These videos asked the sample member to reconsider, varying whether or not they reiterated the data and security measures of the study from the initial invitation. The results suggest that offering a follow-up video increased participation. Finally, the third study experimentally varied the level of control the sample member would have over what data is shared with researchers during a passive data collection. The findings suggest that an offer of control may not increase participation in app-based passive data collection. 
The three studies suggest that sample members are more likely to participate in a survey when they have prior experience with such a request and may be converted to participate with a video message, but that the type of data requested greatly impacts the decision to participate. Future work should include replicating these studies with different requested data types and shifting to samples not drawn from online panels.
  • Optimizing stratified sampling allocations to account for heteroscedasticity and nonresponse
    (2023) Mendelson, Jonathan; Elliott, Michael R; Lahiri, Partha; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Neyman's seminal paper in 1934 and subsequent developments of the next two decades transformed the practice of survey sampling and continue to provide the underpinnings of today's probability samples, including at the design stage. Although hugely useful, the assumptions underlying classic theory on optimal allocation, such as complete response and exact knowledge of strata variances, are not always met, nor is the design-based approach the only way to identify good sample allocations. This thesis develops new ways to allocate samples for stratified random sampling (STSRS) designs. In Papers 1 and 2, I provide a Bayesian approach for optimal STSRS allocation for estimating the finite population mean via a univariate regression model with heteroscedastic errors. I use Bayesian decision theory on optimal experimental design, which accommodates uncertainty in design parameters. By allowing for heteroscedasticity, I aim for improved realism in some establishment contexts, compared with some earlier Bayesian sample design work. Paper 1 assumes that the level of heteroscedasticity is known, which facilitates analytical results. Paper 2 relaxes this assumption, which results in an analytically intractable problem. Thus, I develop a computational approach that uses Monte Carlo sampling to estimate the loss for a given allocation, in conjunction with a stochastic optimization algorithm that accommodates noisy loss functions. In simulation, the proposed approaches performed as well as or better than the design-based and model-assisted strategies considered, while having clearer theoretical justification. Paper 3 changes focus toward addressing how to account for nonresponse when designing samples. Existing theory on optimal STSRS allocation generally assumes complete response. A common practice is to allocate sample under complete response, then to inflate the sample sizes by the inverse of the anticipated response rates.
I show that this practice overcorrects for nonresponse, leading to excessive costs per effective interview. I extend the existing design-based framework for STSRS allocation to accommodate scenarios with incomplete response. I provide theoretical comparisons between my allocation and common alternatives, which illustrate how response rates, population characteristics, and cost structure can affect the methods' relative efficiency. In an application to a self-administered survey of military personnel, the proposed allocation resulted in a 25% increase in effective sample size compared with common alternatives.
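The common practice this abstract critiques (allocate under complete response, then inflate by the inverse of the anticipated response rates) can be sketched roughly as follows. This is an illustration of that baseline only, not of the thesis's proposed allocation. It assumes the classic Neyman rule with equal per-unit costs, and the function names, stratum sizes, standard deviations, and response rates are all hypothetical.

```python
def neyman_allocation(n_total, strata_sizes, strata_sds):
    """Split n_total across strata in proportion to N_h * S_h (equal unit costs assumed)."""
    products = [size * sd for size, sd in zip(strata_sizes, strata_sds)]
    total = sum(products)
    return [n_total * p / total for p in products]


def inflate_for_nonresponse(allocation, response_rates):
    """Divide each stratum's sample size by its anticipated response rate."""
    return [n_h / rate for n_h, rate in zip(allocation, response_rates)]


# Hypothetical inputs, for illustration only.
alloc = neyman_allocation(1000, [5000, 3000, 2000], [10.0, 20.0, 40.0])
inflated = inflate_for_nonresponse(alloc, [0.8, 0.5, 0.4])
print([round(n_h) for n_h in alloc])
print([round(n_h) for n_h in inflated])
```

Strata with low anticipated response rates receive the largest inflation under this rule, which is the step the abstract argues overcorrects.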
  • Big Data and Official Statistics
    (Wiley, 2022-10-02) Abraham, Katharine G.
    The infrastructure and methods for developed countries' economic statistics, largely established in the mid-20th century, rest almost entirely on survey and administrative data. The increasing difficulty of obtaining survey responses threatens the sustainability of this model. Meanwhile, users of economic data are demanding ever more timely and granular information. “Big data” originally created for other purposes offer the promise of new approaches to the compilation of economic data. Drawing primarily on the U.S. experience, the paper considers the challenges to incorporating big data into the ongoing production of official economic statistics and provides examples of progress towards that goal to date. Beyond their value for the routine production of a standard set of official statistics, new sources of data create opportunities to respond more nimbly to emerging needs for information. The concluding section of the paper argues that national statistical offices should expand their mission to seize these opportunities.
  • Global trends and predictors of face mask usage during the COVID-19 pandemic
    (Springer Nature, 2021-11-15) Badillo-Goicoechea, Elena; Chang, Ting-Hsuan; Kim, Esther; LaRocca, Sarah; Morris, Katherine; Deng, Xiaoyi; Chiu, Samantha; Bradford, Adrianne; Garcia, Andres; Kern, Christoph; Cobb, Curtiss; Kreuter, Frauke; Stuart, Elizabeth A.
    Guidelines and recommendations from public health authorities related to face masks have been essential in containing the COVID-19 pandemic. We assessed the prevalence and correlates of mask usage during the pandemic. We examined a total of 13,723,810 responses to a daily cross-sectional online survey from people in 38 countries who completed the survey between April 23, 2020 and October 31, 2020 and reported having been in public at least once during the last 7 days. The outcome was individual face mask usage in public settings, and the predictors were country fixed effects, country-level mask policy stringency, calendar time, individual sociodemographic factors, and health prevention behaviors. Associations were modeled using survey-weighted multivariable logistic regression. Mask-wearing varied over time and across the 38 countries. While some countries consistently showed high prevalence throughout, in other countries mask usage increased gradually, and a few other countries remained at low prevalence. Controlling for time and country fixed effects, sociodemographic factors (older age, female gender, education, urbanicity) and stricter mask-related policies were significantly associated with higher mask usage in public settings. Crucially, social behaviors considered risky in the context of the pandemic (going out to large events, restaurants, shopping centers, and socializing outside of the household) were associated with lower mask use. The decision to wear a face mask in public settings is significantly associated with sociodemographic factors, risky social behaviors, and mask policies. This has important implications for health prevention policies and messaging, including the potential need for more targeted policy and messaging design.
  • BAYESIAN METHODS FOR PREDICTION OF SURVEY DATA COLLECTION PARAMETERS IN ADAPTIVE AND RESPONSIVE DESIGNS
    (2020) Coffey, Stephanie Michelle; Elliott, Michael R; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Adaptive and responsive survey designs rely on estimates of survey data collection parameters (SDCPs), such as response propensity, to make intervention decisions during data collection. These interventions are made with some data collection goal in mind, such as maximizing data quality for a fixed cost or minimizing costs for a fixed measure of data quality. Data quality may be defined by response rate, sample representativeness, or error in survey estimates. Therefore, the predictions of SDCPs are extremely important. Predictions within a data collection period are most commonly generated using fixed information about sample cases, and accumulating paradata and survey response data. Interventions occur during the data collection period, however, meaning they are applied based on predictions from incomplete accumulating data. There is evidence that the incomplete accumulating data can lead to biased and unstable predictions, particularly early in data collection. This dissertation explores the use of Bayesian methods to improve predictions of SDCPs during data collection, by providing a mathematical framework for combining priors, based on external data about covariates in the prediction models, with the current accumulating data to generate posterior predictions of SDCPs for use in intervention decisions. This dissertation includes three self-contained papers, each focused on the use of Bayesian methods to improve predictions of SDCPs for use in adaptive and responsive survey designs. The first paper predicts time to first contact, where priors are generated from historical survey data. The second paper implements expert elicitation, a method for prior construction when historical data is not available. The last paper describes a data collection experiment conducted using a Bayesian framework, which attempts to minimize data collection costs without reducing the quality of a key survey estimate.
In all three papers, the use of Bayesian methods introduces modest improvements in the predictions of SDCPs, especially early in data collection, when interventions would have the largest effect on survey outcomes. Additionally, the experiment in the last paper resulted in significant data collection cost savings without having a significant effect on a key survey estimate. This work suggests that Bayesian methods can improve predictions of SDCPs that are critical for adaptive and responsive data collection interventions.
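The general idea of combining a prior with incomplete accumulating data can be illustrated with a deliberately simple conjugate example. This is not the dissertation's model, which places priors on covariates in prediction models. It is a minimal Beta-Binomial sketch for a single response propensity, and the function name, prior counts, and early-field counts are hypothetical.

```python
def posterior_propensity(prior_a, prior_b, responses, attempts):
    """Posterior mean of a response propensity under a Beta(prior_a, prior_b) prior."""
    post_a = prior_a + responses
    post_b = prior_b + (attempts - responses)
    return post_a / (post_a + post_b)


# Hypothetical prior, e.g. built from historical data: mean 0.30, worth 100 cases.
prior_a, prior_b = 30.0, 70.0

# Hypothetical early-field data: 1 response in 10 attempts (raw rate 0.10).
early_estimate = posterior_propensity(prior_a, prior_b, responses=1, attempts=10)
print(round(early_estimate, 3))
```

With so few early cases the posterior mean stays close to the prior mean instead of following the noisy raw rate, and the data carry more weight as attempts accumulate.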
  • Design and Effectiveness of Multimodal Definitions in Online Surveys
    (2020) Spiegelman, Maura; Conrad, Frederick G; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    If survey respondents do not interpret a question as it was intended, they may, in effect, answer the wrong question, increasing the chances of inaccurate data. Researchers can bring respondents’ interpretations into alignment with what is intended by defining the terms that respondents might misunderstand. This dissertation explores strategies to increase response alignment with definitions in online surveys. In particular, I compare the impact of unimodal (either spoken or textual) to multimodal (both spoken and textual) definitions on question interpretation and, indirectly, response quality. These definitions can be further categorized as conventional or optimized for the mode in which they are presented (for textual definitions, fewer words than in conventional definitions with key information made visually salient and easier for respondents to grasp; for spoken definitions, a shorter, more colloquial style of speaking). The effectiveness of conventional and optimized definitions is compared, as well as the effectiveness of unimodal and multimodal definitions. Amazon MTurk workers were randomly assigned to one of six definition conditions in a 2x3 design: conventional or optimized definitions, presented in a spoken, textual, or multimodal (both spoken and textual) format. While responses for unimodal optimized and conventional definitions were similar, multimodal definitions, and particularly multimodal optimized definitions, resulted in responses with greater alignment with definitions. Although complementary information presented in different modes can increase comprehension and lead to increased data quality, redundant or otherwise untailored multimodal information may not have the same positive effects.
Even as not all respondents complied with instructions to read and/or listen to definitions, the compliance rates and effectiveness of multimodal presentation were sufficiently high to show improvements in data quality, and the effectiveness of multimodal definitions increased when only compliant observations were considered. Multimodal communication in a typically visual medium (such as web surveys) may increase the amount of time needed to complete a questionnaire, but respondents did not consider their use to be burdensome or otherwise unsatisfactory. While further techniques could be used to help increase respondent compliance with instructions, this study suggests that multimodal definitions, when thoughtfully designed, can improve data quality without negatively impacting respondents.
  • Improving External Validity of Epidemiologic Analyses by Incorporating Data from Population-Based Surveys
    (2020) Wang, Lingxiao; Li, Yan; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Many epidemiologic studies forgo probability sampling and turn to volunteer-based samples because of cost, confidentiality, response burden, and invasiveness of biological samples. However, the volunteers may not represent the underlying target population mainly due to self-selection bias. Therefore, standard epidemiologic analyses may not be generalizable to the target population, which is called lack of “external validity.” In survey research, propensity score (PS)-based approaches have been developed to improve representativeness of the nonprobability samples by using population-based surveys as references. These approaches create a set of “pseudo-weights” to weight the nonprobability sample up to the target population. There are two main types of PS-based approaches: (1) PS-based weighting methods using PSs to estimate participation rates of the nonprobability sample; for example, the inverse of PS weighting (IPSW); (2) PS-based matching methods using PSs to measure similarity between the units in the nonprobability sample and the reference survey sample, such as PS adjustment by subclassification (PSAS). Although the PS-based weighting methods reduce the bias, they are sensitive to propensity model misspecification and can be inefficient. The PS-based matching methods are more robust to the propensity model misspecification and can avoid extreme weights. However, matching methods such as PSAS are less effective at bias reduction. This dissertation proposes a novel PS-based matching method, named the kernel weighting (KW) approach, to improve the external validity of epidemiologic analyses that gain a better bias–variance tradeoff. A unifying framework is established for PS-based methods to provide three advances. First, the KW method is proved to provide consistent estimates, yet generally has smaller mean-square error than the IPSW. 
Second, the framework reveals a fundamental strong exchangeability assumption (SEA) underlying the existing PS-based matching methods that has previously been unknown. The SEA is relaxed to a weak exchangeability assumption that is more realistic for data analysis. Third, survey weights are scaled in propensity estimation to reduce the variance of the estimated PS and improve efficiency of all PS-based methods under the framework. The performance of the proposed PS-based methods is evaluated for estimating prevalence of diseases and associations between risk factors and disease in the finite population.
  • The Use of Email in Establishment Surveys
    (2019) Langeland, Joshua Lee; Abraham, Katharine; Wagner, James; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation evaluates the effectiveness of using Email for survey solicitation, nonresponse follow-up, and notifications for upcoming scheduled interviews in an establishment survey setting. Reasons for interest in the use of Email include the possibility that it could reduce printing and postage expenses, speed responses and encourage online reporting. To date, however, there has been limited research on the extent to which these benefits can in fact be realized in an establishment survey context. In order to send an Email for survey purposes, those administering a survey must have Email addresses for the units in the sample. One method for collecting Email addresses is to send a prenotification letter to sampled businesses prior to the initial survey invitation, informing respondents about the upcoming survey and requesting they provide contact information for someone within the organization who will have knowledge of the survey topic. Relatively little is known, however, about what makes a prenotification letter more or less effective. The first experiment on which this dissertation reports varies the content of prenotification letters sent to establishments selected for participation in a business survey in order to identify how different features affect the probability of obtaining a respondent's Email address. In this experiment, neither survey sponsorship, appeal type, nor a message about saving taxpayer dollars had a significant impact on response. The second experiment is a pilot study designed to compare the results of sending an initial Email invitation to participate in an establishment survey to the results of sending a standard postal invitation. Sampled businesses that provided an Email address were randomized into two groups. 
Half of the units in the experiment received the initial survey invitation by Email and the other half received the standard survey materials through postal mail; all units received the same nonresponse follow-up treatments. The analysis of this experiment focuses on response rates, timeliness of response, mode of response and cost per response. In this production environment, Email invitations achieved an equivalent response rate at reduced cost per response. Units receiving the Email invitation were more likely to report online, but it took them longer on average to respond. The third experiment built on the second and was an investigation into nonresponse follow-up procedures. In the second experiment, at the point when the cohort that received the initial survey invitation by Email received their first nonresponse follow-up, there was a large increase in response. The third experiment tests whether this large increase in response can be achieved by sending a follow-up Email instead of a postal reminder. Sampled units that provided an Email address were randomized into three groups. All units received the initial survey invitation by Email and all units also received nonresponse follow-ups by Email. The treatments varied in the point in the nonresponse follow-up period at which the Emails were augmented with a postal mailing. The analysis focuses on how this timing affects response rates and mode of response. The sequence that introduced postal mail early in nonresponse follow-up achieved the highest final response rate. All mode sequences were successful in encouraging online data reporting. The fourth and final experiment studies the use of Email in a monthly business panel survey conducted through Computer Assisted Telephone Interviewing (CATI). After the first month in which an interviewer in this survey collects data from a business, she schedules a date to call and collect data the following month. 
The current procedure is to send a postcard to the business a few days prior to the scheduled appointment to serve as a reminder of the upcoming interview. The fourth experiment investigates the effects of replacing this reminder postcard with an Email. Businesses in a sample that included both businesses for which the survey organization had an Email address and businesses for which no Email address was available were randomized into three groups. The first group acts as the control and received the standard postcard; the second group was designated to receive an Email reminder, provided an Email address was available, instead of the postcard; and the third group received an Email reminder with an iCalendar attachment instead of the postcard, again provided an Email address was available. Results focus on response rates, call length, percent of units reporting on time, and number of calls to respondents. The experiment found that the use of Email as a reminder for a scheduled interview significantly increased response rates and decreased the effort required to collect data.
  • A Unifying Parametric Framework for Estimating Finite Population Totals from Complex Samples
    (2019) Flores Cervantes, Ismael; Brick, J. Michael; Kreuter, Frauke; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    We propose a unifying framework for improving the efficiency of design-based estimators of finite population characteristics in the presence of full response. We call the framework a Parametric (PA) approach. The PA framework, an extension of the model-assisted theory, uses an algorithmic approach driven by the observed data. The algorithm identifies the relevant subset of auxiliary variables related to the outcome, and the known population totals of these variables are used to compute the PA estimator. We apply the PA framework to three important estimation problems: the identification of the functional form of a design-based estimator based on the observed data; the identification of the working or assisting model; and the development of the methodology for creating new design-based estimators. The PA estimators are theoretically justified and evaluated by simulations. This dissertation is limited to single-stage sample designs with full response, but the framework can be extended to other sample designs and for estimation with nonresponse.
  • Selection Bias in Nonprobability Surveys: A Causal Inference Approach
    (2018) Mercer, Andrew William; Kreuter, Frauke; Survey Methodology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Many in the survey research community have expressed concern at the growing popularity of nonprobability surveys. The absence of random selection prompts justified concerns about self-selection producing biased results and means that traditional, design-based estimation is inappropriate. The Total Survey Error (TSE) paradigm’s designations of selection bias as attributable to undercoverage or nonresponse are not especially helpful for nonprobability surveys as they are based on an implicit assumption that selection and inferences rely on randomization. This dissertation proposes an alternative classification for sources of selection bias for nonprobability surveys based on principles borrowed from the field of causal inference. The proposed typology describes selection bias in terms of the three conditions that are required for a statistical model to correct or explain systematic differences between a realized sample and the target population: exchangeability, positivity, and composition. We describe the parallels between causal and survey inference and explain how these three sources of bias operate in both probability and nonprobability survey samples. We then provide a critical review of current practices in nonprobability data collection and estimation viewed through the lens of the causal bias framework. Next, we show how net selection bias can be decomposed into separate additive components associated with exchangeability, positivity, and composition respectively. Using 10 parallel nonprobability surveys from different sources, we estimate these components for six measures of civic engagement using the 2013 Current Population Survey Civic Engagement Supplement as a reference sample. We find that a large majority of the bias can be attributed to a lack of exchangeability. Finally, using the same six measures of civic engagement, we compare the performance of four approaches to nonprobability estimation based on Bayesian additive regression trees. 
These are propensity weighting (PW), outcome regression (OR), and two types of doubly-robust estimators: outcome regression with a residual bias correction (OR-RBC) and outcome regression with a propensity score covariate (OR-PSC). We find that OR-RBC tends to have the lowest bias, variance, and RMSE, with PW only slightly worse on all three measures.