MULTIDIMENSIONALITY IN THE NAEP SCIENCE ASSESSMENT: SUBSTANTIVE PERSPECTIVES, PSYCHOMETRIC MODELS, AND TASK DESIGN
Mislevy, Robert J
Educational assessments are characterized by the interplay among substantive theories, task design, and measurement models. Substantive theories define the nature of the inferences to be made about students and the types of observations that support those inferences. Task design comprises the schemes for designing tasks and for extracting evidence from student behavior in the task situations. Measurement models are the tools by which observations of students' performances are synthesized to derive the targeted inferences. This dissertation elaborates on this interplay by specifying the entities involved and how they work in concert to produce an effective assessment and sound inferences. Developments in several areas are contributing to interest in more complex educational assessments: advances in cognitive psychology spark interest in more complex inferences about students' knowledge, advances in technology make it possible to collect richer performance data, and advances in statistical methods make fitting more complex models feasible. The question becomes how to construct and analyze assessments to take advantage of this potential. In particular, a framework is needed for selecting and reasoning through the multivariate measurement models that are now available. The idea is illustrated by explicating and analyzing the 1996 National Assessment of Educational Progress (NAEP) Science Assessment. Three measurement models, each reflecting a particular perspective on the structure of the assessment, are used to model the item responses. Each model sheds light on a particular aspect of student proficiencies, addresses certain inferences for a particular purpose, and delivers a significant story about the examinees and their learning of science. Each model also highlights certain patterns at the expense of hiding other potentially interesting patterns in the data.
The models are compared in terms of conceptual significance and degree of fit. The two criteria are used as complements to check the coherence of the data with the substantive theories underlying the use of the models.