Discovering Credible Events In Near Real Time From Social Media Streams

dc.contributor.advisor: Golbeck, Jennifer (en_US)
dc.contributor.author: Buntain, Cody (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2017-01-24T06:33:28Z
dc.date.available: 2017-01-24T06:33:28Z
dc.date.issued: 2016 (en_US)
dc.description.abstract: Recent reliance on social media platforms as major sources of news and information, both for journalists and the larger population and especially during times of crisis, motivates the need for better methods of identifying and tracking high-impact events in these social media streams. Social media's volume, velocity, and democratization of information (leading to limited quality controls) complicate rapid discovery of these events and one's ability to trust the content posted about these events. This dissertation addresses these complications in four stages, using Twitter as a model social platform. The first stage analyzes Twitter's response to major crises, specifically terrorist attacks in Western countries, showing these high-impact events do not significantly impact message or user volume. Instead, these events drive changes in Twitter's topic distribution, with conversation, retweets, and hashtags relevant to these events experiencing significant, rapid, and short-lived bursts in frequency. Furthermore, conversation participants tend to prefer information from local authorities, organizations, and media over national or international sources, with accounts for local police or local newspapers often emerging as central in the networks of interaction. Building on these results, the second stage in this dissertation presents and evaluates a set of features that capture these topical bursts associated with crises by modeling bursts in frequency for individual tokens in the Twitter stream. The resulting streaming algorithm is capable of discovering notable moments across a series of major sports competitions using Twitter's public stream without relying on domain- or language-specific information or models. Furthermore, results demonstrate models trained on sporting competition data perform well when transferred to earthquake identification.
This streaming algorithm is then extended in this dissertation's third stage to support real-time event tracking and summarization. This real-time algorithm leverages new distributed processing technology to operate at scale and is evaluated against a collection of other community-developed information retrieval systems, where it performs comparably. Further experiments also show this real-time burst detection algorithm can be integrated with these other information retrieval systems to increase overall performance. The final stage then investigates automated methods for evaluating credibility in social media streams by leveraging two existing data sets. These two data sets measure different types of credibility (veracity versus perception), and results show veracity is negatively correlated with the amount of disagreement in and length of a conversation, and perceptions of credibility are influenced by the number of links to other pages, shared media about the event, and the number of verified users participating in the discussion. Contributions made across these four stages are then usable in the relatively new fields of computational journalism and crisis informatics, which seek to improve news gathering and crisis response by leveraging new technologies and data sources like machine learning and social media. (en_US)
dc.identifier: https://doi.org/10.13016/M2QC2Q
dc.identifier.uri: http://hdl.handle.net/1903/18946
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pqcontrolled: Artificial intelligence (en_US)
dc.subject.pquncontrolled: accuracy (en_US)
dc.subject.pquncontrolled: credibility (en_US)
dc.subject.pquncontrolled: event detection (en_US)
dc.subject.pquncontrolled: machine learning (en_US)
dc.subject.pquncontrolled: twitter (en_US)
dc.title: Discovering Credible Events In Near Real Time From Social Media Streams (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle
Name: Buntain_umd_0117E_17592.pdf
Size: 3.48 MB
Format: Adobe Portable Document Format