UMD Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/3
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date, so there may be up to a 4-month delay before a given thesis or dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results (8 items)
Item: A Mean-Parameterized Conway–Maxwell–Poisson Multilevel Item Response Theory Model for Multivariate Count Response Data (2024)
Strazzeri, Marian Mullin; Yang, Ji Seung; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Multivariate count data arise frequently in the process of measuring a latent construct in human development, psychology, medicine, education, and the social sciences. Some examples include the number of different types of mistakes a student makes when reading a passage of text, or the number of nausea, vomiting, diarrhea, and/or dysphagia episodes a patient experiences in a given day. These response data are often sampled from multiple sources and/or in multiple stages, yielding a multilevel data structure with lower-level sampling units (e.g., individuals, such as students or patients) nested within higher-level sampling units or clusters (e.g., schools, clinical trial sites, studies). Motivated by real data, a new Item Response Theory (IRT) model is developed for the integrative analysis of multivariate count data. The proposed mean-parameterized Conway–Maxwell–Poisson Multilevel IRT (CMPmu-MLIRT) model differs from currently available models in its ability to yield sound inferences when applied to multilevel, multivariate count data, where exposure (the length of time, space, or number of trials over which events are recorded) may vary across individuals, and items may provide different amounts of information about an individual's level of the latent construct being measured (e.g., level of expressive language development, math ability, disease severity). Estimation feasibility is demonstrated through a Monte Carlo simulation study evaluating parameter recovery across various salient conditions. Mean parameter estimates are shown to be well aligned with true parameter values when a sufficient number of items (e.g., 10) are used, while recovery of dispersion parameters may be challenging when as few as 5 items are used. In a second Monte Carlo simulation study, to demonstrate the need for the proposed CMPmu-MLIRT model over currently available alternatives, the impact of CMPmu-MLIRT model misspecification is evaluated with respect to model parameter estimates and corresponding standard errors. Treating an exposure that varies across individuals as though it were fixed is shown to notably overestimate item intercept and slope estimates and, when substantial variability in the latent construct exists among clusters, to underestimate that between-cluster variance. Misspecifying the number of levels (i.e., fitting a single-level model to multilevel data) is shown to overestimate item slopes, especially when substantial variability in the latent construct exists among clusters, and to compound the overestimation of item slopes when a varying exposure is also misspecified as being fixed. Misspecifying the conditional item response distributions as Poisson for underdispersed items and negative binomial for overdispersed items is shown to bias estimates of between-cluster variability in the latent construct. Lastly, the applicability of the proposed CMPmu-MLIRT model to empirical data was demonstrated in the integrative data analysis of oral language samples.
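
The central ingredient here is the mean-parameterized Conway–Maxwell–Poisson distribution, which keeps a Poisson-like mean while a dispersion parameter captures over- or underdispersion. The Python sketch below illustrates the idea only: the log-link predictor with an exposure offset, the parameter names, and the bisection solver are illustrative assumptions, not the dissertation's exact specification.

```python
import numpy as np

def cmp_pmf(mu, nu, max_y=200):
    """Mean-parameterized Conway-Maxwell-Poisson pmf on 0..max_y.

    Numerically finds the rate parameter lambda whose CMP distribution
    (with dispersion nu) has mean mu, then returns the normalized
    probabilities. nu < 1 gives overdispersion, nu > 1 underdispersion,
    and nu = 1 recovers the Poisson.
    """
    y = np.arange(max_y + 1)
    log_fact = np.cumsum(np.concatenate(([0.0], np.log(np.arange(1, max_y + 1)))))

    def mean_given(lam):
        logp = y * np.log(lam) - nu * log_fact
        p = np.exp(logp - logp.max())
        p /= p.sum()
        return p @ y, p

    # simple bisection on lambda (log scale) so that E[Y] = mu
    lo, hi = 1e-8, 1e6
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        m, p = mean_given(mid)
        if m < mu:
            lo = mid
        else:
            hi = mid
    return p

# Illustrative item response function: log mean = log(exposure) + intercept + slope * theta
theta, exposure, intercept, slope, nu = 0.5, 2.0, -0.2, 0.8, 1.4
mu = exposure * np.exp(intercept + slope * theta)
probs = cmp_pmf(mu, nu)
print(probs[:6].round(3), "expected count:", round(probs @ np.arange(probs.size), 2))
```
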
Item: Characterizing the Adventitious Model Error as a Random Effect in Item-Response-Theory Models (2023)
Xu, Shuangshuang; Liu, Yang; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

When drawing conclusions from statistical inferences, researchers are usually concerned about two types of errors: sampling error and model error. Sampling error is caused by the discrepancy between the observed sample and the population from which the sample is drawn (i.e., the operational population). Model error refers to the discrepancy between the fitted model and the data-generating mechanism. Most item response theory (IRT) models assume that the model is correctly specified in the population of interest; as a result, only sampling errors are characterized, not model errors. Model error can be treated as either fixed or random. The framework proposed in this study treats the model error as a random effect (i.e., an adventitious error) and provides an alternative explanation for model errors in IRT models that originate from unknown sources. A random, ideally small amount of discrepancy between the operational population and the fitted model is characterized using a Dirichlet-Multinomial framework, in which a concentration/dispersion parameter measures the amount of adventitious error between the operational population probabilities and the fitted model. In general, the study aims to: 1) build a Dirichlet-Multinomial framework for IRT models; 2) establish asymptotic results for estimating model parameters when the operational population probability is assumed known or unknown; 3) conduct numerical studies to investigate parameter recovery and the relationship between the concentration/dispersion parameter in the proposed framework and the Root Mean Square Error of Approximation (RMSEA); 4) correct bias in parameter estimates of the Dirichlet-Multinomial framework using asymptotic approximation methods; and 5) quantify the amount of model error in the framework and decide whether the model should be retained or rejected.
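
To make the adventitious-error idea concrete, the following sketch simulates the kind of Dirichlet-Multinomial setup described above: response-pattern probabilities for the operational population are drawn from a Dirichlet distribution centered on the model-implied probabilities, with a concentration parameter governing how far the population may drift from the fitted model. The pattern probabilities, the concentration value, and the three-item test are made-up illustrative numbers.

```python
import numpy as np

rng = np.random.default_rng(2023)

# Model-implied probabilities for the 2^3 = 8 response patterns of a
# hypothetical 3-item test (values are made up for illustration).
pi_model = np.array([0.20, 0.10, 0.08, 0.12, 0.10, 0.15, 0.10, 0.15])
alpha = 500.0        # concentration: larger -> operational population closer to the model
n_examinees = 1000

# Operational-population pattern probabilities are a Dirichlet draw centered
# on the fitted model; the concentration parameter controls how much
# "adventitious" discrepancy is allowed.
p_population = rng.dirichlet(alpha * pi_model)

# Observed pattern counts are multinomial given the operational population.
counts = rng.multinomial(n_examinees, p_population)

print("model-implied:", pi_model)
print("operational  :", p_population.round(3))
print("observed prop:", (counts / n_examinees).round(3))
```
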
Item: Multilevel Regression Discontinuity Models with Latent Variables (2020)
Morell, Monica; Yang, Ji Seung; Liu, Yang; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Regression discontinuity (RD) designs allow estimation of a local average treatment effect (LATE) when assignment of an individual to treatment is determined by their location on a running variable relative to a cutoff value. The design is especially useful in education settings, where ethical concerns can forestall the use of randomization. Applications of RD in education research typically share two characteristics that can make the conventional RD model inappropriate: 1) the use of latent constructs, and 2) the hierarchical structure of the data. The running variables often used in education research represent latent constructs (e.g., math ability), which are measured by observed indicators such as categorical item responses. While a latent variable model that accounts for the relationships among item responses and the latent construct is the preferred approach, conventional RD analyses continue to use observed scores, which can result in invalid or less informative conclusions. The current study proposes a multilevel latent RD model that accounts for the prevalence of clustered data and latent constructs in education research, allows the LATE to be generalized to individuals further from the cutoff, and allows researchers to quantify the heterogeneity in the treatment effect due to measurement error in the observed running variable. Models are derived for two of the most commonly used multilevel RD designs. Because of the complex and high-dimensional nature of the proposed models, they are estimated in one stage using full-information likelihood via the Metropolis-Hastings Robbins-Monro algorithm. The results of two simulation studies, under varying sample size and test length conditions, indicate that the models perform well when using the full sample with at least moderate-length assessments. A proposed model is used to examine the effects of receiving an English language learner designation on science achievement using the Early Childhood Longitudinal Study. Implications of the results of these studies and future directions for the research are discussed.
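
A rough sense of the design can be conveyed with a small data-generating sketch: students nested in schools, a latent running variable measured by dichotomous items, sharp assignment based on the observed score, and an outcome that depends on both the latent construct and treatment. Everything below (the 2PL measurement model, the parameter values, the median cutoff) is an illustrative assumption, not the model actually proposed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(42)
n_schools, n_per, n_items, late = 50, 40, 10, 0.3

# Multilevel latent running variable: student ability = school effect + student deviation
school_eff = rng.normal(scale=0.4, size=n_schools)
theta = (school_eff[:, None] + rng.normal(size=(n_schools, n_per))).ravel()

# The running variable is latent and is measured by categorical item responses
# (here, dichotomous 2PL items); assignment then uses the observed sum score.
a = rng.uniform(0.8, 1.6, size=n_items)      # discriminations (illustrative)
b = rng.normal(size=n_items)                 # difficulties (illustrative)
p_correct = 1.0 / (1.0 + np.exp(-(a * (theta[:, None] - b))))
responses = rng.binomial(1, p_correct)
observed_score = responses.sum(axis=1)

# Sharp assignment: students whose observed score falls below the cutoff
# (the sample median here, purely for illustration) receive the intervention.
score_cutoff = np.median(observed_score)
treated = (observed_score < score_cutoff).astype(float)

# Outcome depends on the latent construct and on treatment.
y = 0.6 * theta + late * treated + rng.normal(scale=0.5, size=theta.size)
print("share treated:", treated.mean().round(2),
      " outcome means (treated, control):",
      y[treated == 1].mean().round(2), y[treated == 0].mean().round(2))
```
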
Item: SEEKING CULTURAL FAIRNESS IN A MEASURE OF RELATIONAL REASONING (2016)
Dumas, Denis George; Alexander, Patricia A; Human Development; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Relational reasoning, or the ability to identify meaningful patterns within any stream of information, is a fundamental cognitive ability associated with academic success across a variety of domains of learning and levels of schooling. However, the measurement of this construct has been historically problematic. For example, while the construct is typically described as multidimensional, encompassing the identification of multiple types of higher-order patterns, it is most often measured in terms of a single type of pattern: analogy. For that reason, the Test of Relational Reasoning (TORR) was conceived and developed to include three other types of patterns that appear to be meaningful in the educational context: anomaly, antinomy, and antithesis. Moreover, as a way to focus on fluid relational reasoning ability, the TORR was developed to include, except for the directions, entirely visuo-spatial stimuli, which were designed to be as novel as possible for the participant. By focusing on fluid intellectual processing, the TORR was also developed to be fairly administered to undergraduate students regardless of the gender, language, and ethnic groups they belong to. However, although some psychometric investigations of the TORR have been conducted, its actual fairness across those demographic groups had yet to be empirically demonstrated. Therefore, a systematic investigation of differential item functioning (DIF) across demographic groups on TORR items was conducted. A large sample (N = 1,379), representative of the University of Maryland on key demographic variables, was collected, and the resulting data were analyzed using a multi-group, multidimensional item response theory model comparison procedure. Using this procedure, no significant DIF was found on any of the TORR items across any of the demographic groups of interest. This null finding is interpreted as evidence of the cultural fairness of the TORR, and potential test-development choices that may have contributed to that cultural fairness are discussed. For example, the choices to make the TORR an untimed measure, to use novel stimuli, and to avoid stereotype threat in test administration may have contributed to its cultural fairness. Future steps for psychometric research on the TORR, and substantive research utilizing the TORR, are also presented and discussed.

Item: A MIXED-STRATEGIES RASCH TESTLET MODEL FOR LOW-STAKES TESTLET-BASED ASSESSMENTS (2013)
Chen, Ying-Fang; Jiao, Hong; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

In low-stakes assessments, a lack of test-taking motivation inevitably occurs because test scores have no consequential effect on test takers' academic records. A common occurrence is that some test takers are unmotivated and simply apply a random-guessing strategy rather than a solution strategy in taking a test. Testlet effects also arise because educational assessment items are frequently written in testlet units. A challenge for psychometric measurement is that conventional item response theory models do not sufficiently account for test-taking motivation heterogeneity and testlet effects. These sources of construct-irrelevant variance affect test validity, the accuracy of parameter estimates, and targeted inferences. This study proposes a measurement model for low-stakes assessments that can simultaneously account for test-taking motivation heterogeneity and testlet effects. The performance and effectiveness of the proposed model are evaluated through a simulation study, and its utility is demonstrated through an application to a real standardized low-stakes assessment dataset. Simulation results show that overlooking test-taking motivation heterogeneity and testlet effects adversely affected model-data fit and model parameter estimates. The proposed model improved model-data fit and classification accuracy and recovered model parameters well under test-taking motivation heterogeneity and testlet effects. In the real data application, the item response dataset, which was originally calibrated with the Rasch model, was fitted better by the proposed model, and both test-taking motivation heterogeneity and testlet effects were identified. Finally, a set of variables selected from the real dataset was used to explore potential factors that characterize the latent classes of test-taking motivation; in the science assessment, science proficiency was associated with test-taking motivation heterogeneity.
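
The core of the proposed model is a two-class mixture: examinees in the solution-strategy class respond according to a Rasch model with person-specific testlet effects, while examinees in the random-guessing class respond at a flat guessing rate. The sketch below shows one plausible form of that response function; the fixed guessing probability of .25 (as for four-option items), the parameter values, and the class assignment being known are illustrative assumptions rather than the dissertation's exact formulation.

```python
import numpy as np

def response_prob(theta, gamma, b, testlet_of_item, is_guesser, guess_p=0.25):
    """Probability of a correct response under a two-class (mixed-strategies)
    Rasch testlet model sketch.

    Solution-strategy class: Rasch model plus a person-specific testlet
    effect, P = logistic(theta + gamma[testlet] - b_item).
    Random-guessing class: a flat guessing probability for every item.
    """
    eta = theta + gamma[testlet_of_item] - b
    solution_p = 1.0 / (1.0 + np.exp(-eta))
    return np.where(is_guesser, guess_p, solution_p)

# Tiny illustration: 6 items in 2 testlets, one motivated and one unmotivated examinee
b = np.array([-1.0, 0.0, 1.0, -0.5, 0.5, 1.5])   # item difficulties (illustrative)
testlet_of_item = np.array([0, 0, 0, 1, 1, 1])
gamma = np.array([0.3, -0.2])                     # this person's testlet effects
print("solution strategy:", response_prob(0.5, gamma, b, testlet_of_item, False).round(3))
print("random guessing  :", response_prob(0.5, gamma, b, testlet_of_item, True).round(3))
```
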
Item: An Integrated Item Response Model for Evaluating Individual Students' Growth in Educational Achievement (2009)
Koran, Jennifer; Hancock, Gregory R.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Measuring continuous change or growth in individual students' academic abilities over time currently requires several statistical models or transformations to move from data representing a student's correct or incorrect responses on individual test items to inferences about the form and quantity of change in the student's underlying ability. This study proposed and investigated a single integrated model of underlying growth within an Item Response Theory framework as a potential alternative to this approach. A Monte Carlo investigation explored parameter recovery for marginal maximum likelihood estimates obtained via the Expectation-Maximization algorithm under variations of several conditions, including the form of the underlying growth trajectory, the amount of inter-individual variation in the rate(s) of growth, the sample size, the number of items at each time point, and the selection of items administered across time points. A real-data illustration with mathematics assessment data from the Early Childhood Longitudinal Study showed the practical use of this integrated model for measuring gains in academic achievement. Overall, this exploration of an integrated model approach contributed to a better understanding of the appropriate use of growth models to draw valid inferences about students' academic growth over time.
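
The integrated approach ties a growth model for the latent ability directly to the item-level measurement model, rather than estimating scores first and modeling growth second. The simulation sketch below generates data of that kind (a linear trajectory for each student, Rasch item responses at every wave); the trajectory form, the Rasch measurement model, and the parameter values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(11)
n_students, n_times, n_items = 500, 4, 20

# Individual growth trajectories for the latent ability:
# theta_it = intercept_i + slope_i * t  (linear growth, illustrative values)
intercepts = rng.normal(0.0, 1.0, size=n_students)
slopes = rng.normal(0.25, 0.10, size=n_students)   # inter-individual variation in growth rate
t = np.arange(n_times)
theta = intercepts[:, None] + slopes[:, None] * t    # (students, time points)

# Item responses at each time point follow a Rasch model; in an integrated
# approach the same measurement model links every wave to the growth model.
b = rng.normal(0.0, 1.0, size=n_items)               # item difficulties (illustrative)
p = 1.0 / (1.0 + np.exp(-(theta[:, :, None] - b)))   # (students, times, items)
responses = rng.binomial(1, p)

print("mean proportion correct by wave:", responses.mean(axis=(0, 2)).round(3))
```
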
Item: Mixed-format Test Equating: Effects of Test Dimensionality and Common-item Sets (2008-11-18)
Cao, Yi; Lissitz, Robert; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

The main purposes of this study were to systematically investigate the impact of representativeness and non-representativeness of common-item sets, in terms of statistical, content, and format specifications, in mixed-format tests equated using concurrent calibration with unidimensional IRT models, and to examine the robustness of concurrent calibration to various multidimensional test structures. To fulfill these purposes, a simulation study was conducted in which five factors were manipulated: test dimensionality structure, group ability distributions, and statistical, content, and format representativeness. The examinees' true and estimated expected total scores were computed, and bias, RMSE, and classification consistency indices over 100 replications were then compared. The major findings were as follows. First, considering all of the simulation conditions, the most notable and significant effects on the equating results were those due to the group ability distributions: the equivalent-groups condition always outperformed the nonequivalent-groups condition on the various evaluation indices. Second, regardless of the group ability differences, there were no statistically or practically significant interaction effects among the statistical, content, and format representativeness factors. Third, under the unidimensional test structure, the content and format representativeness factors showed little impact on the equating results, while the statistical representativeness factor significantly affected the performance of the concurrent calibration. Fourth, across the various levels of multidimensional test structure, the statistical representativeness factor showed more significant and systematic effects on the performance of the concurrent calibration than the content and format representativeness factors did. When the degree of multidimensionality due to multiple item formats increased, the format representativeness factor began to make significant differences, especially under the nonequivalent-groups condition; the content representativeness factor, however, showed minimal impact on the equating results regardless of the increase in the degree of multidimensionality due to different content areas. Fifth, concurrent calibration was not robust to violations of unidimensionality: its performance with unidimensional IRT models declined significantly as the degree of multidimensionality increased.

Item: Posterior predictive model checking for multidimensionality in item response theory and Bayesian networks (2006-04-26)
Levy, Roy; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

If data exhibit a dimensional structure more complex than what is assumed, key conditional independence assumptions of the hypothesized model do not hold. The current work pursues posterior predictive model checking, a flexible family of Bayesian model checking procedures, as a tool for criticizing models in light of inadequately modeled dimensional structure. Factors hypothesized to influence dimensionality and dimensionality assessment are couched in conditional covariance theory and conveyed via geometric representations of multidimensionality. These factors and their hypothesized effects motivate a simulation study that investigates posterior predictive model checking in the context of item response theory for dichotomous observables. A unidimensional model is fit to data that follow compensatory or conjunctive multidimensional item response models to assess the utility of conducting posterior predictive model checking. Discrepancy measures are formulated at the level of individual items and pairs of items. A second study draws on the results of the first and investigates the model checking techniques in the context of multidimensional Bayesian networks with inhibitory effects. Key findings include support for the hypothesized effects of the manipulated factors on dimensionality assessment and the superiority of certain discrepancy measures for conducting posterior predictive model checking on dimensionality assessment. The application of these techniques to models familiar to assessment, as well as models that have not yet become standard practice, speaks to the generality of the procedures and their potentially broad applicability.
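
The basic posterior predictive model checking recipe described in the abstract (generate replicated data from posterior draws of the fitted unidimensional model, compute a discrepancy measure on both the realized and the replicated data, and summarize with posterior predictive p-values) can be sketched as follows. The function assumes posterior draws of unidimensional 2PL parameters and abilities are already available from some MCMC run, and the item-pair covariance used here is just one plausible pairwise discrepancy, not necessarily the measures used in the study.

```python
import numpy as np

def ppmc_pairwise_cov(observed, post_theta, post_a, post_b, rng):
    """Posterior predictive check with an item-pair covariance discrepancy.

    observed   : (persons, items) 0/1 response matrix
    post_theta : (draws, persons) posterior draws of abilities
    post_a/b   : (draws, items) posterior draws of 2PL slopes / intercepts
    Returns, per item pair, the proportion of posterior draws in which the
    replicated covariance is at least the realized covariance (a PPP-value).
    """
    n_draws = post_theta.shape[0]
    n_items = observed.shape[1]
    exceed = np.zeros((n_items, n_items))
    realized = np.cov(observed, rowvar=False)
    for s in range(n_draws):
        # replicate a dataset from this posterior draw of the fitted model
        p = 1.0 / (1.0 + np.exp(-(post_a[s] * post_theta[s][:, None] + post_b[s])))
        rep = rng.binomial(1, p)
        exceed += (np.cov(rep, rowvar=False) >= realized)
    return exceed / n_draws   # values near 0 or 1 flag poorly captured item pairs

# toy call with made-up "posterior" draws, just to show the interface
rng = np.random.default_rng(0)
obs = rng.binomial(1, 0.5, size=(200, 6))
ppp = ppmc_pairwise_cov(obs, rng.normal(size=(50, 200)),
                        rng.uniform(0.8, 1.5, size=(50, 6)),
                        rng.normal(size=(50, 6)), rng)
print(ppp.round(2))
```
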