Human Development & Quantitative Methodology

Permanent URI for this community: http://hdl.handle.net/1903/2248

The departments within the College of Education were reorganized and renamed as of July 1, 2011. This department incorporates the former departments of Measurement, Statistics & Evaluation; Human Development; and the Institute for Child Study.

Search Results

Now showing 1 - 9 of 9
  • Item
    Accuracy and consistency in discovering dimensionality by correlation constraint analysis and common factor analysis
    (2009) Tractenberg, Rochelle Elaine; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    An important application of multivariate analysis is the estimation of the underlying dimensions of an instrument or set of variables. Estimation of dimensions is often pursued with the objective of finding the single factor or dimension to which each observed variable belongs or by which it is most strongly influenced. This can involve estimating the loadings of observed variables on a pre-specified number of factors, achieved by common factor analysis (CFA) of the covariance or correlational structure of the observed variables. Another method, correlation constraint analysis (CCA), operates on the determinants of all 2x2 submatrices of the covariance matrix of the variables. CCA software also determines whether partialling out the effects of any observed variable affects the observed correlations, making it the only exploratory method that can specifically rule out (or identify) observed variables as the cause of correlations among observed variables. CFA estimates the strengths of association between the observed variables and the factors hypothesized to underlie or cause their correlations; CCA does not estimate factor loadings but can uncover mathematical evidence for the causal relationships hypothesized between factors and observed variables. These are philosophically and analytically distinct methods for estimating the dimensionality of a set of variables, and each can be useful in understanding the simple structure in multivariate data. This dissertation studied the performance of these methods at uncovering the dimensionality of simulated data under conditions of varying sample size and model complexity, the presence of a weak factor, and correlated vs. independent factors. CCA was sensitive to these conditions (i.e., performed significantly worse when they were present), omitting more factors and omitting and mis-assigning more indicators. CFA was also found to be sensitive to all but one condition (whether factors were correlated) in terms of omitting factors; it was sensitive to all conditions in terms of omitting and mis-assigning indicators, and it also found extra factors depending on the number of factors in the population, the purity of the factors, and the presence of a weak factor. This is the first study of CCA in data with these specific features of complexity, which are common in multivariate data.
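
The 2x2-subdeterminant idea above is concrete enough to sketch in code. Below is a minimal illustration of the tetrad computation at the heart of CCA, not the CCA software studied in the dissertation; the tolerance, function name, and one-factor simulation are assumptions made purely for illustration.

```python
import numpy as np
from itertools import combinations

def near_vanishing_tetrads(cov, tol=0.05):
    """Enumerate the three tetrad differences (2x2 subdeterminants) for
    every quadruple of variables and flag those near zero -- the
    constraint a single common factor imposes on the quadruple."""
    flagged = []
    for i, j, k, l in combinations(range(cov.shape[0]), 4):
        tetrads = [
            cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l],
            cov[i, j] * cov[k, l] - cov[i, l] * cov[j, k],
            cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k],
        ]
        flagged += [((i, j, k, l), t) for t in tetrads if abs(t) < tol]
    return flagged

# Five indicators of one common factor: all tetrads vanish in the
# population, so most sample tetrads should be flagged as near zero.
rng = np.random.default_rng(0)
loadings = np.array([0.8, 0.7, 0.6, 0.75, 0.65])
factor = rng.normal(size=(5000, 1))
data = factor @ loadings[None, :] + rng.normal(scale=0.5, size=(5000, 5))
print(len(near_vanishing_tetrads(np.cov(data, rowvar=False))))
```

The reason a single factor makes every tetrad vanish in the population is that off-diagonal covariances factor as products of loadings, so sigma_ij * sigma_kl - sigma_il * sigma_jk = (li*lj)(lk*ll) - (li*ll)(lj*lk) = 0.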
  • Item
    Finite Mixture Model Specifications Accommodating Treatment Nonresponse in Experimental Research
    (2009) Wasko, John A.; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    For researchers exploring causal inferences with simple two-group experimental designs, results are confounded when common statistical methods are used, and those methods are unsuitable in cases of treatment nonresponse. In signal processing, researchers have successfully extracted multiple signals from data streams with Gaussian mixture models, and their use is well matched to researchers in this predicament. While the mathematics underpinning the models in either application is unchanged, there are stark differences. In signal processing, results are evaluated definitively by assessing whether the extracted signals are interpretable. Such obvious feedback is unavailable to researchers seeking causal inference, who instead rely on empirical evidence from inferential statements regarding mean differences, as in analysis of variance (ANOVA). Two-group experimental designs do provide an added benefit: the control group anchors the distributional properties of treatment nonrespondents' responses. Obtaining empirical evidence supporting treatment nonresponse, however, can be extremely challenging. First, if nonresponse exists, then basic population mean, ANOVA, or repeated measures tests cannot be used, because the identical-distribution property each method requires is violated. Second, the mixing parameter, or proportion of nonresponse, is bounded between 0 and 1, and so does not follow normal distribution theory, precluding inference by common methods. This dissertation introduces and evaluates the performance of an information-based methodology as a more extensible and informative alternative to statistical tests of population means while addressing treatment nonresponse. Gaussian distributions are not required under this methodology, which simultaneously provides empirical evidence, through model selection, regarding treatment nonresponse, equality of population means, and equality of variance hypotheses. The use of information criteria as an omnibus assessment of a set of mixture and non-mixture models within a maximum likelihood framework eliminates the need for a Neyman-Pearson framework of probabilistic inferences on individual parameter estimates. This dissertation assesses performance in recapturing population conditions with respect to hypothesis conclusions, parameter accuracy, and class membership. More complex extensions addressing multiple treatments, multiple responses within a treatment, a priori consideration of covariates, and multivariate responses within a latent framework are also introduced.
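
A rough sketch of the model-selection logic described above: fit one- and two-component Gaussian mixtures to a hypothetical treatment arm and compare information criteria rather than testing a single mean difference. This is not the dissertation's methodology (in particular, it omits anchoring the nonresponders' distribution with the control group), and all numbers are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical treatment arm: 30% nonresponders who behave like the
# control group, 70% responders shifted by the treatment effect.
treatment = np.concatenate([
    rng.normal(0.0, 1.0, 90),    # nonresponders (invented values)
    rng.normal(1.2, 1.0, 210),   # responders (invented values)
]).reshape(-1, 1)

# Compare a no-nonresponse (1-class) model against a nonresponse
# (2-class) model by information criterion instead of a mean test.
for k in (1, 2):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(treatment)
    print(f"{k} component(s): BIC = {gm.bic(treatment):.1f}")
```

The model with the lower BIC is preferred, so a winning two-component model is the empirical evidence for nonresponse, with the estimated mixing proportion and component means as byproducts.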
  • Item
    Testing for Differentially Functioning Indicators Using Mixtures of Confirmatory Factor Analysis Models
    (2009) Mann, Heather Marie; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Heterogeneity in measurement model parameters across known groups can be modeled and tested using multigroup confirmatory factor analysis (CFA). When it is not reasonable to assume that parameters are homogeneous for all observations in a manifest group, mixture CFA models are appropriate. Mixture CFA models can add theoretically important unmeasured characteristics to capture heterogeneity, and they have the potential to be used to test measurement invariance. The current study investigated the ability of mixture CFA models to identify differences in factor loadings across latent classes when there is no mean separation in either the latent or the measured variables. Using simulated data from models with known parameters, parameter recovery, classification accuracy, and the power of the likelihood-ratio test were evaluated as affected by model complexity, sample size, latent class proportions, magnitude of factor loading differences, percentage of noninvariant factor loadings, and pattern of noninvariant factor loadings. Results suggested that mixture CFA models may be a viable option for testing the invariance of measurement model parameters; however, in the absence of impact and of differences in measurement intercepts, larger sample sizes, more noninvariant factor loadings, and larger amounts of heterogeneity are needed to distinguish the latent classes and successfully estimate their parameters.
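
A minimal sketch of the kind of population such a simulation draws from, with illustrative values that are not the study's actual design: two latent classes share all latent and indicator means (no impact, no intercept differences) and differ only in two of four factor loadings, and the class labels are withheld from the analyst.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_factor_data(n, loadings, unique_sd=0.6):
    """One-factor CFA data: y = eta * loadings + unique error."""
    eta = rng.normal(size=(n, 1))
    return eta @ loadings[None, :] + rng.normal(scale=unique_sd, size=(n, len(loadings)))

# Two latent classes with identical latent and indicator means but
# partially noninvariant loadings (illustrative values); the stacked
# data carry no class labels, which is what mixture CFA must recover.
y_class1 = one_factor_data(350, np.array([0.8, 0.8, 0.8, 0.8]))
y_class2 = one_factor_data(150, np.array([0.8, 0.8, 0.4, 0.4]))
data = np.vstack([y_class1, y_class2])
print(data.shape)  # (500, 4)
```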
  • Item
    The Multidimensional Generalized Graded Unfolding Model for Assessment of Change across Repeated Measures
    (2008-05-13) Cui, Weiwei; Roberts, James S; Dayton, Chan M; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A multidimensional extension of the generalized graded unfolding model for repeated measures (GGUM-RM) is introduced and applied to analyze attitude change across time using responses collected with a Thurstone or Likert questionnaire. The model conceptualizes change across time as separate latent variables and provides direct estimates of both individual and group change while accounting for the dependency among the latent variables. The parameters and hyperparameters of GGUM-RM are estimated by a fully Bayesian method implemented in WinBUGS. The accuracy of the estimation procedure is demonstrated by a simulation study, and the application of GGUM-RM is illustrated by an analysis of attitude change toward abortion among college students.
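
For readers unfamiliar with unfolding models, the sketch below implements a single-item GGUM response function under the Roberts, Donoghue, and Laughlin (2000) parameterization as commonly stated; the repeated-measures extension and the WinBUGS estimation are not attempted here, and all parameter values are illustrative.

```python
import numpy as np

def ggum_probs(theta, alpha, delta, taus):
    """Category probabilities for one GGUM item (Roberts, Donoghue &
    Laughlin, 2000 parameterization): categories 0..C, M = 2C + 1,
    taus[0] fixed at 0."""
    taus = np.asarray(taus)
    C = len(taus) - 1
    M = 2 * C + 1
    tau_cum = np.cumsum(taus)
    terms = np.array([
        np.exp(alpha * (z * (theta - delta) - tau_cum[z])) +
        np.exp(alpha * ((M - z) * (theta - delta) - tau_cum[z]))
        for z in range(C + 1)
    ])
    return terms / terms.sum(axis=0)

# The unfolding signature: endorsement of the highest category peaks
# where theta is near the item location delta and falls off on both
# sides, unlike cumulative (monotone) IRT models.
theta = np.linspace(-3.0, 3.0, 7)
probs = ggum_probs(theta, alpha=1.2, delta=0.0, taus=[0.0, -1.5, -1.0, -0.5])
print(probs.round(2))
```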
  • Item
MULTIDIMENSIONALITY IN THE NAEP SCIENCE ASSESSMENT: SUBSTANTIVE PERSPECTIVES, PSYCHOMETRIC MODELS, AND TASK DESIGN
    (2008-03-05) Wei, Hua; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Educational assessments are characterized by the interplay among substantive theories, task design, and measurement models. Substantive theories define the nature of inferences to be made about students and the types of observations that lend support to the targeted inferences. Task design represents the schemes for the design of tasks and the extraction of evidence from student behaviors in the task situations. Measurement models are the tools by which observations of students' performances are synthesized to derive the targeted inferences. This dissertation elaborates on the interplay by specifying the entities that are involved and how they work in concert to produce an effective assessment and sound inferences. Developments in several areas are contributing to interest in more complex educational assessments: advances in cognitive psychology spark interest in more complex inferences about students' knowledge, advances in technology make it possible to collect richer performance data, and advances in statistical methods make fitting more complex models feasible. The question becomes how to construct and analyze assessments to take advantage of this potential. In particular, a framework is required for understanding how to think about selecting and reasoning through the multivariate measurement models that are now available. The idea is illustrated by explicating and analyzing the 1996 National Assessment of Educational Progress (NAEP) Science Assessment. Three measurement models, each of which reflects a particular perspective for thinking about the structure of the assessment, are used to model the item responses. Each model sheds light on a particular aspect of student proficiencies, addresses certain inferences for a particular purpose, and tells a significant story about the examinees and their learning of science. Each model highlights certain patterns at the expense of hiding other potentially interesting patterns that reside in the data. Model comparison is conducted in terms of conceptual significance and degree of fit. The two criteria are used in complement to check the coherence of the data with the substantive theories underlying the use of the models.
  • Item
    INVESTIGATING DIFFERENTIAL ITEM FUNCTION AMPLIFICATION AND CANCELLATION IN APPLICATION OF ITEM RESPONSE TESTLET MODELS
    (2007-05-24) Bao, Han; Dayton, Mitchell; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Many educational tests group items into testlets (Wainer & Kiely, 1987) or item bundles (Rosenbaum, 1988) marked by shared common stimulus materials, as a way of providing context rather than presenting only discrete multiple-choice items. In these cases one might doubt the usual assumption of standard item response theory that items are locally independent. Plausible causes of local dependence include test takers' different levels of the background knowledge needed to understand the common passage, the considerable mental processing required to read and understand the stimulus, and differences in persons' learning experiences. Such local dependence can be viewed as additional dimensions beyond the targeted latent traits. Furthermore, from a multidimensional differential item functioning (DIF) point of view, different distributions of the testlet dimensions across examinee subpopulations (race, gender, etc.) could be the cognitive cause of individual differences in test performance. When the testlet effect and the idiosyncratic features of individual items are both considered as causes of DIF, it becomes interesting to investigate the phenomena of DIF amplification and cancellation resulting from the interaction of these two factors. This dissertation presented a study, based on the multiple-group testlet item response theory model developed by Li et al. (2006), that examines in detail different situations of DIF amplification and cancellation at the item and testlet levels, using testlet characteristic curve procedures with signed/unsigned area indices and a logistic regression procedure. The testlet DIF model was estimated in a hierarchical Bayesian framework with the Markov chain Monte Carlo (MCMC) method implemented in the computer software WinBUGS. The simulation study investigated all possible conditions of DIF amplification and cancellation attributable to the person-testlet interaction effect and individual item characteristics. Real data analysis indicated the existence of testlet effects, with differences in the means and/or variances of the testlet distributions between manifest groups attributable to the different contexts or natures of the passages, as well as interactions of the testlet effects with manifest groups of examinees such as gender or ethnicity.
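
The signed/unsigned area logic behind amplification and cancellation can be sketched compactly. The following is an illustration of the general idea only, not the Li et al. (2006) multiple-group model: with crossing (non-uniform) DIF, the signed area between the reference- and focal-group curves nearly cancels while the unsigned area does not. The model form and parameter values are illustrative assumptions.

```python
import numpy as np

def testlet_2pl(theta, a, b, gamma=0.0):
    """2PL with a testlet effect gamma entering the logit alongside
    theta (in the spirit of Bradlow, Wainer & Wang, 1999)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b + gamma)))

theta = np.linspace(-4.0, 4.0, 801)
ref = testlet_2pl(theta, a=1.0, b=0.0)
foc = testlet_2pl(theta, a=1.6, b=0.0)  # crossing (non-uniform) DIF

signed = np.trapz(ref - foc, theta)            # cancels across the crossing point
unsigned = np.trapz(np.abs(ref - foc), theta)  # does not cancel
print(f"signed area = {signed:.3f}, unsigned area = {unsigned:.3f}")
```

Analogous cancellation can occur between item-level and testlet-level effects when they push the two groups' curves in opposite directions, which is why both signed and unsigned indices are informative.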
  • Item
    EFFECT OF CATEGORIZATION ON TYPE I ERROR AND POWER IN ORDINAL INDICATOR LATENT MEANS MODELS FOR BETWEEN-SUBJECTS DESIGNS
    (2006-07-28) Choi, Jaehwa; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Due to the superiority of latent means models (LMM) over the modeling of means on a single measured variable (ANOVA) or on a composite (MANOVA) in terms of power and effect size estimation, LMM is starting to be recognized as a powerful modeling technique. By conducting tests of group differences (e.g., a treatment effect) at the latent level, LMM makes it possible to account for the consequences of measurement error in the measured variable(s). LMM has been developed for both interval indicators (IILMM; Jöreskog & Goldberger, 1975; Muthén, 1989; Sörbom, 1974) and ordinal indicators (OILMM; Jöreskog, 2002). Recently, effect size estimates, post hoc power estimates, and a priori sample size determination for LMM have been developed for interval indicators (Hancock, 2001). Considering how frequently ordinal data are analyzed in the social and behavioral sciences, it seems most appropriate that these measures and methods be extended to LMM involving such data, the OILMM. Unlike the IILMM, however, OILMM power analysis involves various additional issues regarding the ordinal indicators. This research starts by illustrating various aspects of the OILMM: options for handling the ordinal variables' metric level, options for estimating the OILMM, and the nature of ordinal data (e.g., number of categories, categorization rules). This research also proposes a test statistic for OILMM power analysis parallel to the IILMM results of Hancock (2001). The main purpose of this research is to examine the effect of categorization (focusing on the options for handling ordinal indicators and the number of ordinal categories) on Type I error and power in the OILMM, based on the proposed measures and OILMM test statistic. A simulation study is conducted for the two-population between-subjects design case. A numerical study is also provided, using potentially useful statistics and indices, to help in understanding the consequences of categorization, especially when one treats ordinal data as if they had metric properties.
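
A small numerical illustration of why categorization matters (not the dissertation's simulation design): cutting a continuous indicator into k ordinal categories at equal-probability thresholds and treating the codes as metric attenuates the observed correlation, more severely with fewer categories. All values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
y = 0.6 * x + np.sqrt(1 - 0.6**2) * rng.normal(size=n)  # population r = 0.6

# Cut y into k ordinal categories at equal-probability thresholds and
# correlate the category codes with x as if they were metric scores.
for k in (2, 3, 5, 7):
    cuts = np.quantile(y, np.linspace(0, 1, k + 1)[1:-1])
    y_ordinal = np.digitize(y, cuts)
    print(f"{k} categories: observed r = {np.corrcoef(x, y_ordinal)[0, 1]:.3f}")
```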
  • Item
    Posterior predictive model checking for multidimensionality in item response theory and Bayesian networks
    (2006-04-26) Levy, Roy; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    If data exhibit a dimensional structure more complex than what is assumed, key conditional independence assumptions of the hypothesized model do not hold. The current work pursues posterior predictive model checking, a flexible family of Bayesian model checking procedures, as a tool for criticizing models in light of inadequately modeled dimensional structure. Factors hypothesized to influence dimensionality and dimensionality assessment are couched in conditional covariance theory and conveyed via geometric representations of multidimensionality. These factors and their hypothesized effects motivate a simulation study that investigates posterior predictive model checking in the context of item response theory for dichotomous observables. A unidimensional model is fit to data that follow compensatory or conjunctive multidimensional item response models to assess the utility of conducting posterior predictive model checking. Discrepancy measures are formulated at the level of individual items and pairs of items. A second study draws on the results of the first and investigates the model checking techniques in the context of multidimensional Bayesian networks with inhibitory effects. Key findings include support for the hypothesized effects of the manipulated factors on dimensionality assessment, and the superiority of certain discrepancy measures for conducting posterior predictive model checking of dimensionality. The application of these techniques both to models familiar in assessment and to models that have not yet become standard practice speaks to the generality of the procedures and their potentially broad applicability.
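
A bare-bones sketch of the posterior predictive model checking loop with an item-pair discrepancy, under loudly labeled assumptions: the "observed" data come from an invented compensatory two-dimensional model, and the posterior draws of the (misspecified) unidimensional model are stubbed with fixed values rather than sampled from a real posterior.

```python
import numpy as np

rng = np.random.default_rng(4)
n_persons, n_items = 1000, 6
b = np.linspace(-1.0, 1.0, n_items)

def bernoulli(p):
    return (rng.uniform(size=p.shape) < p).astype(int)

def pairwise_cov(x):
    """Item-pair covariances: a discrepancy sensitive to unmodeled dimensions."""
    c = np.cov(x, rowvar=False)
    return c[np.triu_indices_from(c, k=1)]

# "Observed" data from a compensatory two-dimensional model: items 0-2
# load on dimension 1, items 3-5 on dimension 2 (values invented).
th1, th2 = rng.normal(size=(2, n_persons))
load = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1.0]])
x_obs = bernoulli(1 / (1 + np.exp(-(np.outer(th1, load[0]) + np.outer(th2, load[1]) - b))))
realized = pairwise_cov(x_obs)

# PPMC loop: for each retained posterior draw of the unidimensional
# model, simulate a replicated dataset and compare the discrepancy.
# Posterior draws are stubbed with fixed values here for illustration.
exceed = np.zeros_like(realized)
n_draws = 200
for _ in range(n_draws):
    theta_rep = rng.normal(size=n_persons)  # stand-in for a posterior draw
    x_rep = bernoulli(1 / (1 + np.exp(-(theta_rep[:, None] - b[None, :]))))
    exceed += pairwise_cov(x_rep) >= realized
print((exceed / n_draws).round(2))  # posterior predictive p-values per item pair
```

Posterior predictive p-values near 0 or 1 flag item pairs whose dependence the unidimensional model misstates, which is how inadequately modeled dimensional structure surfaces in the check.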
  • Item
    PSYCHOMETRIC ANALYSES BASED ON EVIDENCE-CENTERED DESIGN AND COGNITIVE SCIENCE OF LEARNING TO EXPLORE STUDENTS' PROBLEM-SOLVING IN PHYSICS
    (2003-12-04) Huang, Chun-Wei; Mislevy, Robert J; Measurement, Statistics and Evaluation
    Most analyses of physics assessment tests have been done within the framework of classical test theory, in which only the number of correct answers is considered in the scoring. More sophisticated analyses have recently been developed by physics researchers to further study students' conceptions and misconceptions in physics learning and to improve physics instruction; however, they are not connected with the well-developed psychometric machinery. The goal of this dissertation is to use a formal psychometric model to study students' conceptual understanding in physics (in particular, Newtonian mechanics). The perspective is based on the evidence-centered design (ECD) framework, building on previous analyses of the cognitive processes of physics problem-solving and the task design of two physics tests, the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE), that are commonly used to measure students' conceptual understanding of force-motion relationships. Within the ECD framework, the little-known Andersen/Rasch (AR) multivariate IRT model, which can deal with mixtures of strategies within individuals, is introduced and discussed, including the issue of the model's identification. To demonstrate its usefulness, four data sets (one from the FCI and three from the FMCE) were analyzed with the AR model using a Markov chain Monte Carlo estimation procedure carried out with the BUGS computer program. Results from the first three data sets (whose questions assess students' understanding of force-motion relationships) indicate that most students are in a mixed-model state (i.e., in transition toward understanding Newtonian mechanics) after one semester of physics learning. In particular, they tend to believe, incorrectly, that there must be a force acting on an object to maintain its movement, one of the common misconceptions noted in the physics literature. Findings from the last data set (which deals with acceleration) indicate that although students improved their understanding of acceleration after one semester of instruction, they may still find it difficult to represent that understanding in terms of acceleration-time graphs. This is especially so when the object is slowing down or moving toward the left, in which case the sign of the acceleration in both task scenarios is negative.