Improving the Usability of Static Analysis Tools Using Machine Learning

Koc, Ugur

Improving the Usability of Static Analysis Tools Using Machine Learning

dc.contributor.advisor	Porter, Adam A.	en_US
dc.contributor.advisor	Foster, Jeffrey S.	en_US
dc.contributor.author	Koc, Ugur	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2020-02-01T06:42:48Z
dc.date.available	2020-02-01T06:42:48Z
dc.date.issued	2019	en_US
dc.description.abstract	Static analysis can be useful for developers to detect critical security flaws and bugs in software. However, due to challenges such as scalability and undecidability, static analysis tools often have performance and precision issues that reduce their usability and thus limit their wide adoption. In this dissertation, we present machine learning-based approaches to improve the adoption of static analysis tools by addressing two usability challenges: false positive error reports and proper tool configuration. First, false positives are one of the main reasons developers give for not using static analysis tools. To address this issue, we developed a novel machine learning approach for learning directly from program code to classify the analysis results as true or false positives. The approach has two steps: (1) data preparation that transforms source code into certain input formats for processing by sophisticated machine learning techniques; and (2) using the sophisticated machine learning techniques to discover code structures that cause false positive error reports and to learn false positive classification models. To evaluate the effectiveness and efficiency of this approach, we conducted a systematic, comparative empirical study of four families of machine learning algorithms, namely hand-engineered features, bag of words, recurrent neural networks, and graph neural networks, for classifying false positives. In this study, we considered two application scenarios using multiple ground-truth program sets. Overall, the results suggest that recurrent neural networks outperformed the other algorithms, although interesting tradeoffs are present among all techniques. Our observations also provide insight into the future research needed to speed the adoption of machine learning approaches in practice. Second, many static program verification tools come with configuration options that present tradeoffs between performance, precision, and soundness to allow users to customize the tools for their needs. However, understanding the impact of these options and correctly tuning the configurations is a challenging task, requiring domain expertise and extensive experimentation. To address this issue, we developed an automatic approach, auto-tune, to configure verification tools for given target programs. The key idea of auto-tune is to leverage a meta-heuristic search algorithm to probabilistically scan the configuration space using machine learning models both as a fitness function and as an incorrect result filter. auto-tune is tool- and language-agnostic, making it applicable to any off-the-shelf configurable verification tool. To evaluate the effectiveness and efficiency of auto-tune, we applied it to four popular program verification tools for C and Java and conducted experiments under two use-case scenarios. Overall, the results suggest that running verification tools using auto-tune produces results that are comparable to configurations manually-tuned by experts, and in some cases improve upon them with reasonable precision.	en_US
dc.identifier	https://doi.org/10.13016/bqo5-xlnp
dc.identifier.uri	http://hdl.handle.net/1903/25464
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pquncontrolled	False Positives	en_US
dc.subject.pquncontrolled	Machine Learning	en_US
dc.subject.pquncontrolled	Program Analysis	en_US
dc.subject.pquncontrolled	Program Verification	en_US
dc.subject.pquncontrolled	Software Engineering	en_US
dc.subject.pquncontrolled	Static Analysis	en_US
dc.title	Improving the Usability of Static Analysis Tools Using Machine Learning	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Koc_umd_0117E_20465.pdf
Size:: 1.93 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations