Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling

dc.contributor.advisor: Zajic, David
dc.contributor.author: Caljean, Brianna
dc.contributor.author: Calvert, Katherine
dc.contributor.author: Chang, Ashley
dc.contributor.author: Frank, Elliot
dc.contributor.author: Garay Jáuregui, Rosana
dc.contributor.author: Palo, Geoffrey
dc.contributor.author: Rinker, Ryan
dc.contributor.author: Weakly, Gareth
dc.contributor.author: Wolfrey, Nicolette
dc.contributor.author: Zhang, William
dc.date.accessioned: 2018-06-22T17:38:58Z
dc.date.available: 2018-06-22T17:38:58Z
dc.date.issued: 2018
dc.description.abstract: Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. [en_US]
dc.identifier: https://doi.org/10.13016/M2R49GC7C
dc.identifier.uri: http://hdl.handle.net/1903/20668
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: Gemstone Program, University of Maryland (College Park, Md)
dc.subject: Gemstone Team BIASES [en_US]
dc.title: Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: BIASES Thesis.pdf
Size: 1.46 MB
Format: Adobe Portable Document Format
Description: