Developing and Measuring Latent Constructs in Text
Abstract
Constructs like inflation, populism, or paranoia are of fundamental concern to social science. Constructs are the vocabulary over which theory operates, and so a central activity is the development and measurement of latent constructs from observable data. Although the social sciences comprise fields with different epistemological norms, they share a concern for valid operationalizations that transparently map between data and measure. Economists at the US Bureau of Labor Statistics, for example, follow a hundred-page handbook to sample the egg prices that constitute the Consumer Price Index; clinical psychologists rely on suites of psychometric tests to diagnose schizophrenia.
In many fields, this observable data takes the form of language: as a social phenomenon, language data can encode many of the latent social constructs that people care about. Commensurate with both increasing sophistication in language technologies and amounts of available data, there has thus emerged a "text-as-data" paradigm aimed at "amplifying and augmenting" the analyses that compose research. At the same time, Natural Language Processing (NLP), the field from which analysis tools originate, has often remained separate from real-world problems and guiding theories, at least when it comes to social science. Instead, it focuses on atomized tasks under the assumption that progress on low-level language aspects will generalize to higher-level problems that involve overlapping elements.
This dissertation focuses on NLP methods and evaluations that facilitate the development and measurement of latent constructs from natural language, while remaining sensitive to social sciences' need for interpretability and validity.