Human Development & Quantitative Methodology Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2779

Search Results

Now showing 1 - 10 of 34
  • Item
    Multivariate Multilevel Value-Added Modeling: Constructing a Teacher Effectiveness Composite
    (2019) Lissitz, Anna; Stapleton, Laura; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This simulation study presents a justification for evaluating teacher effectiveness with a multivariate multilevel value-added model (VAM). It was hypothesized that the multivariate model would yield more precise effectiveness estimates than separate univariate multilevel models. The study then investigated how to combine the multiple effectiveness estimates produced by each modeling approach into a single composite. Because the models could produce significantly different effectiveness estimates, it was further hypothesized that composites formed from the results of the multivariate multilevel model would differ in bias from composites formed from the results of the separate univariate models. In fact, the correlations between the composites from the different models were very high, providing no evidence that the model choice was impactful, and the differences in bias and fit were slight. While the findings alone do not support using the more complex multivariate model over the univariate models, the increased theoretical validity gained by adding outcomes to the VAM does.
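    As a rough illustration of the univariate side of this comparison, the following Python sketch (my construction, not the dissertation's code) fits a separate mixed model per outcome with statsmodels and averages the empirical Bayes teacher effects into an equal-weight composite; the simulated data, variable names, and weighting are all illustrative assumptions.

```python
# A minimal sketch, assuming two correlated outcomes and an equal-weight composite.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_teachers, n_students = 50, 20
teacher = np.repeat(np.arange(n_teachers), n_students)
# Correlated true teacher effects on two outcomes (e.g., math and reading)
effects = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], n_teachers)
pretest = rng.normal(size=teacher.size)
df = pd.DataFrame({
    "teacher": teacher,
    "pretest": pretest,
    "math": effects[teacher, 0] + 0.5 * pretest + rng.normal(size=teacher.size),
    "read": effects[teacher, 1] + 0.5 * pretest + rng.normal(size=teacher.size),
})

# Separate univariate multilevel models, one per outcome
blups = {}
for outcome in ["math", "read"]:
    fit = smf.mixedlm(f"{outcome} ~ pretest", df, groups=df["teacher"]).fit()
    # Empirical Bayes (BLUP) estimates of the teacher random intercepts
    blups[outcome] = np.array(
        [fit.random_effects[g].iloc[0] for g in range(n_teachers)]
    )

# Equal-weight composite across outcomes, checked against the true average effect
composite = (blups["math"] + blups["read"]) / 2
print(np.corrcoef(composite, effects.mean(axis=1))[0, 1])
```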
  • Item
    A Latent Factor Approach for Social Network Analysis
    (2019) Zheng, Qiwen; Sweet, Tracy M.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Social network data consist of entities and relational information between pairs of entities. Observations in a social network are dyadic and interdependent; therefore, making appropriate statistical inferences from a network requires a model that specifies these dependencies. Previous studies suggested that latent factor models (LFMs) for social network data can simultaneously account for stochastic equivalence and transitivity, the two primary dependency patterns observed in real-world social networks. One particular LFM, the additive and multiplicative effects network model (AME), accounts for the heterogeneity of second-order dependencies at the actor level. However, current latent variable models have not considered the heterogeneity of third-order dependencies, such as actor-level transitivity. Failure to model third-order dependency heterogeneity may result in a worse fit to local network structures, which in turn may bias parameter inferences and degrade a model's goodness of fit and prediction performance. Motivated by this gap in the literature, this dissertation proposes incorporating a correlation structure between the sender and receiver latent factors in the AME to account for the distribution of actor-level transitivity. The proposed model is compared with the existing AME in both simulation studies and real-world data analyses. Models are evaluated via multiple goodness-of-fit techniques, including mean squared error, parameter coverage rate, information criteria, receiver operating characteristic (ROC) curves based on K-fold cross-validation or the full data, and posterior predictive checking. This work may also contribute to the literature on goodness-of-fit methods for network models, an area that has yet to be unified. Both the simulation studies and the real-world data analyses showed that adding the correlation structure provides a better fit and higher prediction accuracy for network data. When the underlying correlation is zero, the proposed method performs comparably to the AME with regard to the mean squared error of tie probabilities and the widely applicable information criterion. The present study did not find any significant impact of the correlation term on the estimation of node-level covariate coefficients. Future studies include investigating more types of covariates, such as subgroup-related covariate effects.
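    The following Python sketch (an illustration, not the proposed estimator) generates a binary network from an AME-style latent factor model in which each actor's sender and receiver factors are drawn with a nonzero correlation, the structural addition motivated above; all parameter values and the transitivity check are assumptions of mine.

```python
# A minimal sketch, assuming a logistic link and illustrative parameter values.
import numpy as np

rng = np.random.default_rng(7)
n, k, rho = 100, 2, 0.6           # actors, latent dimensions, sender-receiver corr.

# Correlated sender/receiver latent factors per actor
cov = np.block([[np.eye(k), rho * np.eye(k)],
                [rho * np.eye(k), np.eye(k)]])
uv = rng.multivariate_normal(np.zeros(2 * k), cov, size=n)
u, v = uv[:, :k], uv[:, k:]

# Additive sender/receiver effects and a dyadic linear predictor
a = rng.normal(0, 0.5, n)         # sender (row) effects
b = rng.normal(0, 0.5, n)         # receiver (column) effects
eta = -1.0 + a[:, None] + b[None, :] + u @ v.T
p = 1 / (1 + np.exp(-eta))        # logistic link (illustrative)
Y = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(Y, 0)            # no self-ties

# Transitivity-flavored check: proportion of two-paths i->k->j that close
two_paths = (Y @ Y) * (1 - np.eye(n))
print((Y * two_paths).sum() / max(two_paths.sum(), 1))
```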
  • Item
    Handling of Missing Data with Growth Mixture Models
    (2019) Lee, Daniel Yangsup; Harring, Jeffrey R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The recent growth in applications of growth mixture models for inference with longitudinal data has spurred a wide range of research testing different aspects of the model. One area that has drawn little attention, however, is the performance of growth mixture models in the presence of missing data and under the various methods for handling it. Missing data are an inconvenience that must be addressed in nearly any data analysis, and growth mixture modeling is no exception. While the literature on other aspects of growth mixture models has grown, little research has examined the consequences of mishandling missing data. Although the missing data literature has generally endorsed modern missing data handling techniques, these techniques are not free of problems, nor have they been comprehensively tested in the context of growth mixture models. The purpose of this dissertation is to apply various missing data handling techniques to growth mixture models and, using Monte Carlo simulation, to provide guidance on the specific conditions under which particular methods will produce accurate and precise parameter estimates, which are typically compromised by simple, ad hoc, or incorrect missing data handling approaches.
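    The following Python sketch illustrates one Monte Carlo cell of the kind described, under assumed population values: two latent trajectory classes, monotone MAR dropout tied to the previous wave, and class recovery from listwise-complete rows only (a deliberately naive ad hoc method, with a generic Gaussian mixture standing in for a full growth mixture model; a real study would also use FIML or multiple imputation).

```python
# A minimal sketch, assuming two linear-growth classes and MAR dropout.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
n, waves = 500, 4
cls = rng.random(n) < 0.5                      # true class membership
intercept = np.where(cls, 0.0, 2.0) + rng.normal(0, 0.5, n)
slope = np.where(cls, 0.3, 1.0) + rng.normal(0, 0.2, n)
t = np.arange(waves)
Y = intercept[:, None] + slope[:, None] * t + rng.normal(0, 0.5, (n, waves))

# MAR dropout: missingness at wave w depends on the observed score at wave w-1;
# once a case drops out it stays missing (monotone pattern)
for w in range(1, waves):
    drop = rng.random(n) < 1 / (1 + np.exp(-(Y[:, w - 1] - 3)))
    Y[drop, w:] = np.nan

complete = ~np.isnan(Y).any(axis=1)            # listwise deletion
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(Y[complete])
agree = (labels == cls[complete]).mean()
print(complete.mean(), max(agree, 1 - agree))  # retained fraction, class recovery
```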
  • Item
    The Performance of Balance Diagnostics for Propensity-Score Matched Samples in Multilevel Settings
    (2019) Burnett, Alyson; Stapleton, Laura M; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The purpose of the study was to assess and demonstrate the use of covariate balance diagnostics for samples matched with propensity scores in multilevel settings. A Monte Carlo simulation assessed the ability of different balance measures to identify the correctly specified propensity score model and to predict bias in treatment effect estimates. The balance diagnostics included absolute standardized bias (ASB) and variance ratios calculated across the pooled sample (pooled balance measures), as well as the same measures calculated separately for each cluster and then summarized across the sample (within-cluster balance measures). The results indicated that, across conditions, the pooled ASB was most effective for predicting treatment effect bias, while the within-cluster ASB (summarized as a median across clusters) was most effective for identifying the correctly specified model. However, many of the within-cluster balance measures were not feasible with small cluster sizes. Empirical illustrations from two distinct datasets demonstrated different approaches to modeling, matching, and assessing balance in a multilevel setting depending on the cluster size. The dissertation concludes with a discussion of limitations, implications, and topics for further research.
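    The two pooled balance measures named above are straightforward to compute. The following Python sketch (my implementation, with illustrative variable names) shows the absolute standardized bias and variance ratio for one covariate, plus a median within-cluster ASB that presumes every cluster contains both treated and control units, which is exactly what fails with small clusters.

```python
# A minimal sketch of the balance diagnostics, assuming matched treated/control arrays.
import numpy as np

def pooled_asb(x_t, x_c):
    """Absolute standardized bias: |mean difference| / pooled SD."""
    sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return abs(x_t.mean() - x_c.mean()) / sd

def variance_ratio(x_t, x_c):
    """Ratio of treated to control covariate variances (ideal value: 1)."""
    return x_t.var(ddof=1) / x_c.var(ddof=1)

def median_within_cluster_asb(x, treat, cluster):
    """Within-cluster ASB summarized as a median; assumes every cluster
    has both treated and control units (the small-cluster problem above)."""
    vals = [pooled_asb(x[(cluster == c) & (treat == 1)],
                       x[(cluster == c) & (treat == 0)])
            for c in np.unique(cluster)]
    return np.median(vals)

rng = np.random.default_rng(0)
x_t, x_c = rng.normal(0.1, 1, 200), rng.normal(0.0, 1.1, 200)
print(pooled_asb(x_t, x_c), variance_ratio(x_t, x_c))
```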
  • Item
    A Framework for the Pre-Calibration of Automatically Generated Items
    (2018) Sweet, Shauna Jayne; Hancock, Gregory R; Harring, Jeffrey R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This paper presents a new conceptual framework and corresponding psychometric model designed for the pre-calibration of automatically generated items. The model uses a multilevel framework with a combination of crossed fixed and random effects to capture key components of the generative process, and it is intended to be broadly applicable across research efforts and contexts. Unique among models proposed in the automatic item generation (AIG) literature, it incorporates specific mean and variance parameters to support direct assessment of the quality of the item generation process. The utility of the framework is demonstrated through an empirical analysis of response data from an online administration of automatically generated items intended to assess young students' mathematics fluency. Limitations in the application of the proposed framework are explored through targeted simulation studies, and future directions for research are discussed.
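    As a rough illustration of the underlying idea, the following Python sketch (my construction, not the proposed model) treats each generated item's difficulty as a family mean plus an item-level deviation whose variance indexes the quality of the generation process; estimation here is by crude proportion-correct logits rather than the crossed fixed- and random-effects model described above.

```python
# A minimal sketch, assuming Rasch responses and item families with known structure.
import numpy as np

rng = np.random.default_rng(11)
n_fam, items_per_fam, n_persons = 8, 30, 2000
fam_mu = rng.normal(0, 1, n_fam)               # family difficulty means
sigma_fam = 0.25                               # generation "noise" (quality)
b = fam_mu[:, None] + rng.normal(0, sigma_fam, (n_fam, items_per_fam))

theta = rng.normal(0, 1, n_persons)            # person abilities
# Rasch responses: person x (family, item)
p = 1 / (1 + np.exp(-(theta[:, None, None] - b[None, :, :])))
X = (rng.random(p.shape) < p).astype(int)

# Crude item difficulty estimates from proportions correct (attenuated, but
# enough to show family means and within-family spread being recovered)
pc = X.mean(axis=0).clip(0.01, 0.99)
b_hat = -np.log(pc / (1 - pc))
print(np.round(b_hat.mean(axis=1) - fam_mu, 2))      # family mean recovery
print(b_hat.std(axis=1, ddof=1).mean(), sigma_fam)   # within-family spread
```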
  • Item
    Estimating the Longitudinal Complier Average Causal Effect Using the Latent Growth Model: A Simulation Study
    (2018) Liu, Huili; Hancock, Gregory; Stapleton, Laura; Human Development; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    When noncompliance occurs in longitudinal experiments, the randomization on which causal inferences rely is compromised. In such cases, the longitudinal Complier Average Causal Effect (CACE) is often estimated. The Latent Growth Model (LGM) is well suited to estimating longitudinal trajectories and can be readily adapted for estimating the longitudinal CACE. Two popular CACE approaches, the standard instrumental variable (IV) approach and the Mixture Model Based (MMB) approach, are both applicable within the LGM framework. The Standard IV approach is simple to specify and has a low computational burden, but it has been criticized for ignoring the distributions of subgroups and producing biased estimates. The MMB approach can not only estimate the CACE but also answer research questions about the distributions of subpopulations; however, it may yield unstable results under unfavorable conditions, especially when the estimation model is complicated. Previous studies laid out a theoretical background for applying LGMs to longitudinal CACE estimation using both approaches. However, 1) little was known about the factors that might influence longitudinal CACE estimation, 2) the three-compliance-class scenario had not been thoroughly investigated, and 3) it remained unclear how and to what extent the Standard IV approach would perform better or worse than the MMB approach in longitudinal CACE estimation. The present study used an intensive simulation design to investigate the performance of the Standard IV and MMB approaches while manipulating six factors relevant to most experimental designs: sample size, compliance composition, effect size, reliability of measurements, mean distances, and the noncomplier-complier Level 2 covariance ratio. Performance was evaluated on four criteria: estimation success rate, estimation bias, power, and Type I error rate. Based on the results, suggestions regarding experimental design are provided for researchers and practitioners.
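    For a single endpoint, the Standard IV logic reduces to the Wald estimator; the following Python sketch (an illustration under assumed one-sided noncompliance, not the latent growth formulation studied here) divides the intent-to-treat effect by the estimated compliance rate.

```python
# A minimal sketch of the Wald/IV estimate of CACE, assuming one-sided noncompliance.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
complier = rng.random(n) < 0.7                 # latent compliance status
z = rng.random(n) < 0.5                        # randomized assignment
d = z & complier                               # treatment actually received
y = 1.0 * d + rng.normal(0, 1, n)              # true CACE = 1.0

itt = y[z].mean() - y[~z].mean()               # intent-to-treat effect
compliance = d[z].mean() - d[~z].mean()        # first stage: compliance rate
print(itt / compliance)                        # Wald/IV estimate of the CACE
```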
  • Item
    Modeling the Speed-Accuracy-Difficulty Interaction in Joint Modeling of Responses and Response Time
    (2018) Liao, Dandan; Jiao, Hong; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With the rapid development of information technology, computer-based tests have become increasingly popular in large-scale assessments. Among the auxiliary data collected during the test-taking process, response times (RTs) are one of the most important and commonly utilized sources of information. A common assumption in the joint modeling of RTs and item responses is that the two are conditionally independent given a person's speed and ability, and that a person maintains constant speed and ability throughout the test (e.g., Thissen, 1983; van der Linden, 2007). However, researchers have been investigating more complex scenarios in which this conditional independence assumption is violated in various ways (e.g., De Boeck, Chen, & Davison, 2017; Meng, Tao, & Chang, 2015; Ranger & Ortner, 2012b). Empirical evidence suggests that the direction of conditional dependence differs among items in a systematic way (Bolsinova, Tijmstra, & Molenaar, 2017). For difficult items, correct responses are associated with longer RTs; for easier items, correct responses are usually associated with shorter RTs (Bolsinova, De Boeck, & Tijmstra, 2017; Goldhammer, Naumann, & Greiff, 2015; Partchev & De Boeck, 2012). This pattern indicates that item difficulty affects the direction of the conditional dependence between item responses and RTs. However, such an interaction has not been explicitly explored in the joint modeling of RTs and response accuracy. In the present study, several approaches to jointly modeling RT and response accuracy are proposed to account for the conditional dependence between responses and RTs arising from the interaction among speed, accuracy, and item difficulty. Three simulation studies compare the proposed models with van der Linden's (2007) hierarchical model, which does not account for this conditional dependence, with respect to model fit and parameter recovery. The consequences of ignoring the conditional dependence between RTs and item responses for parameter estimation are explored. Further, empirical data analyses are conducted to investigate potential violations of the conditional independence assumption between item responses and RTs and to obtain a more fundamental understanding of examinees' test-taking behaviors.
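    The following Python sketch (my construction, with illustrative parameter values) reproduces the empirical pattern described above: a residual dependence term whose sign tracks item difficulty makes correct responses slower on hard items and faster on easy ones.

```python
# A minimal sketch, assuming a 2PL-like response model, a van der Linden-style
# lognormal RT model, and a difficulty-moderated residual dependence term.
import numpy as np

rng = np.random.default_rng(13)
n_persons, n_items = 5000, 20
theta = rng.normal(0, 1, n_persons)            # ability
tau = rng.normal(0, 0.3, n_persons)            # speed (higher = faster)
b = np.linspace(-2, 2, n_items)                # item difficulties
beta = rng.normal(4, 0.2, n_items)             # time intensities

# Residual dependence whose sign follows difficulty: positive for hard items,
# negative for easy ones
delta = 0.4 * b
eps = rng.normal(0, 0.3, (n_persons, n_items))
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :] + delta[None, :] * eps)))
X = (rng.random(p.shape) < p).astype(int)
logRT = beta[None, :] - tau[:, None] + eps

# Check: correct responses take longer on the hardest item, shorter on the easiest
for j in (0, n_items - 1):
    diff = logRT[X[:, j] == 1, j].mean() - logRT[X[:, j] == 0, j].mean()
    print(f"item b={b[j]:+.1f}: logRT(correct) - logRT(incorrect) = {diff:+.3f}")
```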
  • Item
    An Evaluation of Clustering Algorithms for Modeling Game-Based Assessment Work Processes
    (2017) Fossey, William Austin; Stapleton, Laura; Sweet, Tracy; Human Development; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Game-based assessments (GBAs) use game design elements to make assessments more engaging for students and to capture response data about work processes. GBA response data are often too complex for researchers to plan for every potential response pattern, so some have turned to exploratory cluster analysis to classify students' work processes. This paper identifies the design elements specific to GBAs and investigates how well the k-means, self-organizing map (SOM), and robust clustering using links (ROCK) algorithms group response patterns in prototypical GBA response data. Results from a simulation study are discussed, and a tutorial is provided with recommendations on general considerations and best practices for analyzing GBA data with clustering algorithms.
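    As a small illustration of the clustering step, the following Python sketch (my construction) represents each student's work process as an action-frequency vector and recovers groups with k-means from scikit-learn; SOM and ROCK require packages outside scikit-learn and are omitted here.

```python
# A minimal sketch, assuming three prototypical work-process profiles.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(21)
n_per, n_actions = 150, 12
# Three prototypical strategy profiles (rows) over action frequencies
profiles = rng.random((3, n_actions)) * 10
X = np.vstack([prof + rng.normal(0, 1.5, (n_per, n_actions)) for prof in profiles])
truth = np.repeat(np.arange(3), n_per)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(silhouette_score(X, km.labels_))
# Cross-tab of recovered clusters against true strategies
for k in range(3):
    print(k, np.bincount(truth[km.labels_ == k], minlength=3))
```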
  • Item
    A Proposed Index to Detect Relative Item Performance When the Focal Group Sample Size Is Small
    (2017) Hansen, Kari; Stapleton, Laura M; Jiao, Hong; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    When developing educational assessments, ensuring that the test is fair to all groups of examinees is an essential part of the process. The primary statistical method for identifying potential bias in assessments is differential item functioning (DIF) analysis, where DIF refers to differences in performance on a specific test item between two groups that overlap in their ability distributions. This overlap requirement, however, may not be feasible when the focal group sample size is small. A new index, relative item performance (RI), is proposed to address small focal group sample sizes without requiring overlapping ability distributions. The index is calculated as the effect size of the difference in item difficulty estimates between the two groups. A simulation study compared the proposed method with two Mantel-Haenszel-based comparison methods, the Mantel-Haenszel test with score group widths (MH1) and Differential Item Pair Functioning (MH2), in terms of Type I error rates and power. The following factors were manipulated: the sample size of the focal group, the mean of the ability distribution, the amount of DIF, the number of items on the assessment, and the number of items with different item difficulties. For all three methods, the main factors affecting Type I error rates were the amount of item contamination, the size of the DIF, the ability mean for the focal group, and the item parameters. Sample size and the number of items had no effect on Type I error rates for any method. Because the overall Type I error rate for the RI method was much lower than that of the MH1 and MH2 methods and was not controlled across the simulation factors, power was evaluated only for the MH1 and MH2 methods; their median power was .203 and .181, respectively. It is recommended that the MH1 and MH2 methods be used only when the sample size is larger than 100 and in conjunction with expert and cognitive review of the items on the assessment.
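    For reference, the Mantel-Haenszel statistic underlying the comparison methods pools a reference/focal odds ratio across total-score strata; the following Python sketch (my implementation, with assumed simulated data) computes the common odds ratio for a single studied item.

```python
# A minimal sketch of the Mantel-Haenszel common odds ratio, assuming
# quantile-based score strata and a small focal group.
import numpy as np

def mh_odds_ratio(item, total, focal, n_strata=5):
    """Mantel-Haenszel common odds ratio for one item (1 = no DIF)."""
    qs = np.quantile(total, np.linspace(0, 1, n_strata + 1)[1:-1])
    strata = np.digitize(total, np.unique(qs))
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = ((item == 1) & ~focal & m).sum()   # reference correct
        b = ((item == 0) & ~focal & m).sum()   # reference incorrect
        c = ((item == 1) & focal & m).sum()    # focal correct
        d = ((item == 0) & focal & m).sum()    # focal incorrect
        num += a * d / m.sum()
        den += b * c / m.sum()
    return num / den

rng = np.random.default_rng(17)
n = 1500
focal = rng.random(n) < 0.1                    # small focal group
theta = rng.normal(0, 1, n)
b_item = np.where(focal, 0.5, 0.0)             # item is harder for the focal group
item = (rng.random(n) < 1 / (1 + np.exp(-(theta - b_item)))).astype(int)
total = item + rng.binomial(30, 1 / (1 + np.exp(-theta)))  # rest of the test
print(mh_odds_ratio(item, total, focal))       # > 1 favors the reference group
```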
  • Item
    Performance of Propensity Score Methods in the Presence of Heterogeneous Treatment Effects
    (2016) Stepien, Kathleen Maria; Stapleton, Laura M; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Estimating an average treatment effect assumes that individuals and groups are homogeneous in their responses to a treatment or intervention. However, treatment effects are often heterogeneous. Selecting the most effective treatment, generalizing causal effect estimates to a population, and identifying subgroups for which a treatment is effective or harmful all motivate the study of heterogeneous treatment effects. In observational studies, treatment effects are often estimated using propensity score methods. This dissertation adds to the literature on analyzing heterogeneous treatment effects with propensity score methods. Three propensity score methods were compared using Monte Carlo simulation: a single propensity score with exact matching on subgroup, matching using group propensity scores, and multinomial propensity scores (MNPS) using generalized boosted modeling (GBM). The methods were evaluated under various group distributions, sample sizes, effect sizes, and selection models. An empirical analysis using data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) demonstrates the methods studied. Simulation results showed that estimating group propensity scores provided the smallest mean squared error (MSE), that MNPS performance was comparable to GBM, and that including the group indicator in the propensity score model improved treatment effect estimates regardless of whether group membership influenced selection. In addition, subclassification performed poorly when one group was more prevalent in the extremes of the propensity score distribution.
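    The group propensity score approach is simple to sketch; the following Python code (my construction, with an assumed selection model and illustrative effect sizes) estimates a separate logistic propensity model within each subgroup and performs 1:1 nearest-neighbor matching on the score within that subgroup.

```python
# A minimal sketch of group-specific propensity scores with nearest-neighbor
# matching (with replacement), assuming group-dependent selection and effects.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 2000
group = rng.random(n) < 0.4                    # subgroup indicator
X = rng.normal(0, 1, (n, 3))                   # covariates
# Selection depends on covariates and on group membership
treat = rng.random(n) < 1 / (1 + np.exp(-(X @ [0.8, -0.5, 0.3] + group)))
y = (2.0 * treat + np.where(group, 1.0, 0.0) * treat
     + X @ [1.0, 1.0, 1.0] + rng.normal(0, 1, n))

effects = []
for g in (0, 1):
    m = group == g
    ps = LogisticRegression().fit(X[m], treat[m]).predict_proba(X[m])[:, 1]
    t_idx, c_idx = np.where(treat[m])[0], np.where(~treat[m])[0]
    # 1:1 nearest-neighbor match on the propensity score for each treated unit
    match = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]
    effects.append(y[m][t_idx].mean() - y[m][match].mean())
print(effects)   # true subgroup effects: 2.0 (group 0) and 3.0 (group 1)
```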