AUTOMATIC FEATURE ENGINEERING FOR DISCOVERING AND EXPLAINING MALICIOUS BEHAVIORS

dc.contributor.advisorDumitras, Tudoren_US
dc.contributor.authorZhu, Ziyunen_US
dc.contributor.departmentElectrical Engineeringen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2019-06-20T05:36:24Z
dc.date.available2019-06-20T05:36:24Z
dc.date.issued2019en_US
dc.description.abstractA key task of cybersecurity is to discover and explain malicious behaviors of malware. The understanding of malicious behaviors helps us further develop good features and apply machine learning techniques to detect various attacks. The effectiveness of machine learning techniques primarily depends on the manual feature engineering process, based on human knowledge and intuition. However, given the adversaries’ efforts to evade detection and the growing volume of publications on malicious behaviors, the feature engineering process likely draws from a fraction of the relevant knowledge. Therefore, it is necessary and important to design an automated system to engineer features for discovering malicious behaviors and detecting attacks. First, we describe a knowledge-based feature engineering technique for malware detection. It mines documents written in natural language (e.g. scientific literature), and represents and queries the knowledge about malware in a way that mirrors the human feature engineering process. We implement the idea in a system called FeatureSmith, which generates a feature set for detecting Android malware. We train a classifier using these features on a large data set of benign and malicious apps. This classifier achieves comparable performance to a state-of-the-art Android malware detector that relies on manually engineered features. In addition, FeatureSmith is able to suggest informative features that are absent from the manually engineered set and to link the features generated to abstract concepts that describe malware behaviors. Second, we propose a data-driven feature engineering technique called ReasonSmith, which explains machine learning models by ranking features based on their global importance. Instead of interpreting how neural networks make decisions for one specific sample, ReasonSmith captures general importance in terms of the whole data set. In addition, ReasonSmith allows us to efficiently identify data biases and artifacts, by comparing feature rankings over time. We further summarize the common data biases and artifacts for malware detection problems at the level of API calls. Third, we study malware detection from a global view, and explore automatic feature engineering problem in analyzing campaigns that include a series of actions. We implement a system ChainSmith to bridge large-scale field measurement and manual campaign report by extracting and categorizing IOCs (indicators of compromise) from security blogs. The semantic roles of IOCs allow us to link qualitative data (e.g. security blogs) to quantitative measurements, which brings new insights to malware campaigns. In particular, we study the effectiveness of different persuasion techniques used on enticing user to download the payloads. We find that the campaign usually starts from social engineering and “missing codec” ruse is a common persuasion technique that generates the most suspicious downloads each day.en_US
dc.identifierhttps://doi.org/10.13016/iris-0jtw
dc.identifier.urihttp://hdl.handle.net/1903/22000
dc.language.isoenen_US
dc.subject.pqcontrolledElectrical engineeringen_US
dc.subject.pquncontrolledfeature engineeringen_US
dc.subject.pquncontrolledmachine learningen_US
dc.subject.pquncontrolledmalware campaignen_US
dc.subject.pquncontrolledmalware detectionen_US
dc.titleAUTOMATIC FEATURE ENGINEERING FOR DISCOVERING AND EXPLAINING MALICIOUS BEHAVIORSen_US
dc.typeDissertationen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Zhu_umd_0117E_19908.pdf
Size:
2.63 MB
Format:
Adobe Portable Document Format