AUTOMATIC FEATURE ENGINEERING FOR DISCOVERING AND EXPLAINING MALICIOUS BEHAVIORS

Zhu, Ziyun

AUTOMATIC FEATURE ENGINEERING FOR DISCOVERING AND EXPLAINING MALICIOUS BEHAVIORS

dc.contributor.advisor	Dumitras, Tudor	en_US
dc.contributor.author	Zhu, Ziyun	en_US
dc.contributor.department	Electrical Engineering	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2019-06-20T05:36:24Z
dc.date.available	2019-06-20T05:36:24Z
dc.date.issued	2019	en_US
dc.description.abstract	A key task of cybersecurity is to discover and explain malicious behaviors of malware. The understanding of malicious behaviors helps us further develop good features and apply machine learning techniques to detect various attacks. The effectiveness of machine learning techniques primarily depends on the manual feature engineering process, based on human knowledge and intuition. However, given the adversaries’ efforts to evade detection and the growing volume of publications on malicious behaviors, the feature engineering process likely draws from a fraction of the relevant knowledge. Therefore, it is necessary and important to design an automated system to engineer features for discovering malicious behaviors and detecting attacks. First, we describe a knowledge-based feature engineering technique for malware detection. It mines documents written in natural language (e.g. scientific literature), and represents and queries the knowledge about malware in a way that mirrors the human feature engineering process. We implement the idea in a system called FeatureSmith, which generates a feature set for detecting Android malware. We train a classifier using these features on a large data set of benign and malicious apps. This classifier achieves comparable performance to a state-of-the-art Android malware detector that relies on manually engineered features. In addition, FeatureSmith is able to suggest informative features that are absent from the manually engineered set and to link the features generated to abstract concepts that describe malware behaviors. Second, we propose a data-driven feature engineering technique called ReasonSmith, which explains machine learning models by ranking features based on their global importance. Instead of interpreting how neural networks make decisions for one specific sample, ReasonSmith captures general importance in terms of the whole data set. In addition, ReasonSmith allows us to efficiently identify data biases and artifacts, by comparing feature rankings over time. We further summarize the common data biases and artifacts for malware detection problems at the level of API calls. Third, we study malware detection from a global view, and explore automatic feature engineering problem in analyzing campaigns that include a series of actions. We implement a system ChainSmith to bridge large-scale field measurement and manual campaign report by extracting and categorizing IOCs (indicators of compromise) from security blogs. The semantic roles of IOCs allow us to link qualitative data (e.g. security blogs) to quantitative measurements, which brings new insights to malware campaigns. In particular, we study the effectiveness of different persuasion techniques used on enticing user to download the payloads. We find that the campaign usually starts from social engineering and “missing codec” ruse is a common persuasion technique that generates the most suspicious downloads each day.	en_US
dc.identifier	https://doi.org/10.13016/iris-0jtw
dc.identifier.uri	http://hdl.handle.net/1903/22000
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Electrical engineering	en_US
dc.subject.pquncontrolled	feature engineering	en_US
dc.subject.pquncontrolled	machine learning	en_US
dc.subject.pquncontrolled	malware campaign	en_US
dc.subject.pquncontrolled	malware detection	en_US
dc.title	AUTOMATIC FEATURE ENGINEERING FOR DISCOVERING AND EXPLAINING MALICIOUS BEHAVIORS	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Zhu_umd_0117E_19908.pdf
Size:: 2.63 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Electrical & Computer Engineering Theses and Dissertations