Characterization and Detection of Malicious Behavior on the Web
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
Web platforms enable unprecedented speed and ease in transmission of knowledge, and allow users to communicate and shape opinions. However, the safety, usability and reliability of these platforms is compromised by the prevalence of online malicious behavior -- for example 40% of users have experienced online harassment. This is present in the form of malicious users, such as trolls, sockpuppets and vandals, and misinformation, such as hoaxes and fraudulent reviews. This thesis presents research spanning two aspects of malicious behavior: characterization of their behavioral properties, and development of algorithms and models for detecting them.
We characterize the behavior of malicious users and misinformation in terms of their activity, temporal frequency of actions, network connections to other entities, linguistic properties of how they write, and community feedback received from others. We find several striking characteristics of malicious behavior that are very distinct from those of benign behavior. For instance, we find that vandals and fraudulent reviewers are faster in their actions compared to benign editors and reviewers, respectively. Hoax articles are long pieces of plain text that are less coherent and created by more recent editors, compared to non-hoax articles. We find that sockpuppets are created that vary in their deceptiveness (i.e., whether they pretend to be different users) and their supportiveness (i.e., if they support arguments of other sockpuppets controlled by the same user).
We create a suite of feature based and graph based algorithms to efficiently detect malicious from benign behavior. We first create the first vandal early warning system that accurately predicts vandals using very few edits. Next, based on the properties of Wikipedia articles, we develop a supervised machine learning classifier to predict whether an article is a hoax, and another that predicts whether a pair of accounts belongs to the same user, both with very high accuracy. We develop a graph-based decluttering algorithm that iteratively removes suspicious edges that malicious users use to masquerade as benign users, which outperforms existing graph algorithms to detect trolls. And finally, we develop an efficient graph-based algorithm to assess the fairness of all reviewers, reliability of all ratings, and goodness of all products, simultaneously, in a rating network, and incorporate penalties for suspicious behavior.
Overall, in this thesis, we develop a suite of five models and algorithms to accurately identify and predict several distinct types of malicious behavior -- namely, vandals, hoaxes, sockpuppets, trolls and fraudulent reviewers -- in multiple web platforms.
The analysis leading to the algorithms develops an interpretable understanding of malicious behavior on the web.