Guided Probabilistic Topic Models for Agenda-setting and Framing

dc.contributor.advisorResnik, Philipen_US
dc.contributor.advisorBoyd-Graber, Jordanen_US
dc.contributor.authorNguyen, Viet Anen_US
dc.contributor.departmentComputer Scienceen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2015-06-26T05:34:37Z
dc.date.available2015-06-26T05:34:37Z
dc.date.issued2015en_US
dc.description.abstractProbabilistic topic models are powerful methods to uncover hidden thematic structures in text by projecting each document into a low dimensional space spanned by a set of topics. Given observed text data, topic models infer these hidden structures and use them for data summarization, exploratory analysis, and predictions, which have been applied to a broad range of disciplines. Politics and political conflicts are often captured in text. Traditional approaches to analyze text in political science and other related fields often require close reading and manual labeling, which is labor-intensive and hinders the use of large-scale collections of text. Recent work, both in computer science and political science, has used automated content analysis methods, especially topic models to substantially reduce the cost of analyzing text at large scale. In this thesis, we follow this approach and develop a series of new probabilistic topic models, guided by additional information associated with the text, to discover and analyze agenda-setting (i.e., what topics people talk about) and framing (i.e., how people talk about those topics), a central research problem in political science, communication, public policy and other related fields. We first focus on study agendas and agenda control behavior in political debates and other conversations. The model we introduce, Speaker Identity for Topic Segmentation (SITS), is able to discover what topics that are talked about during the debates, when these topics change, and a speaker-specific measure of agenda control. To make the analysis process more effective, we build Argviz, an interactive visualization which leverages SITS's outputs to allow users to quickly grasp the conversational topic dynamics, discover when the topic changes and by whom, and interactively visualize the conversation's details on demand. We then analyze policy agendas in a more general setting of political text. We present the Label to Hierarchy (L2H) model to learn a hierarchy of topics from multi-labeled data, in which each document is tagged with multiple labels. The model captures the dependencies among labels using an interpretable tree-structured hierarchy, which helps provide insights about the political attentions that policymakers focus on, and how these policy issues relate to each other. We then go beyond just agenda-setting and expand our focus to framing--the study of how agenda issues are talked about, which can be viewed as second-level agenda-setting. To capture this hierarchical views of agendas and frames, we introduce the Supervised Hierarchical Latent Dirichlet Allocation (SHLDA) model, which jointly captures a collection of documents, each is associated with a continuous response variable such as the ideological position of the document's author on a liberal-conservative spectrum. In the topic hierarchy discovered by SHLDA, higher-level nodes map to more general agenda issues while lower-level nodes map to issue-specific frames. Although qualitative analysis shows that the topic hierarchies learned by SHLDA indeed capture the hierarchical view of agenda-setting and framing motivating the work, interpreting the discovered hierarchy still incurs moderately high cost due to the complex and abstract nature of framing. Motivated by improving the hierarchy, we introduce Hierarchical Ideal Point Topic Model (HIPTM) which jointly models a collection of votes (e.g., congressional roll call votes) and both the text associated with the voters (e.g., members of Congress) and the items (e.g., congressional bills). Customized specifically for capturing the two-level view of agendas and frames, HIPTM learns a two-level hierarchy of topics, in which first-level nodes map to an interpretable policy issue and second-level nodes map to issue-specific frames. In addition, instead of using pre-computed response variable, HIPTM also jointly estimates the ideological positions of voters on multiple interpretable dimensions.en_US
dc.identifierhttps://doi.org/10.13016/M2H056
dc.identifier.urihttp://hdl.handle.net/1903/16600
dc.language.isoenen_US
dc.subject.pqcontrolledComputer scienceen_US
dc.subject.pquncontrolledAgenda settingen_US
dc.subject.pquncontrolledComputational Social Scienceen_US
dc.subject.pquncontrolledFramingen_US
dc.subject.pquncontrolledMachine Learningen_US
dc.subject.pquncontrolledNatural Language Processingen_US
dc.titleGuided Probabilistic Topic Models for Agenda-setting and Framingen_US
dc.typeDissertationen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Nguyen_umd_0117E_16056.pdf
Size:
4.09 MB
Format:
Adobe Portable Document Format