Continuous, Effort-Aware Prediction of Software Security Defects

Stuckman, Jeffrey Charles

dc.contributor.advisor	Purtilo, James	en_US
dc.contributor.author	Stuckman, Jeffrey Charles	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2015-09-18T06:03:19Z
dc.date.available	2015-09-18T06:03:19Z
dc.date.issued	2015	en_US
dc.description.abstract	Software security defects are coding flaws which allow for a system's security to be compromised. Due to the potential severity of these defects, it is important to discover them quickly; therefore, they are a good focus for software quality improvement efforts such as code inspection. Our research focuses on vulnerability prediction models, which use machine learning to identify code that has an elevated likelihood of containing these defects. In particular, we study continuous prediction models, which repeatedly search for vulnerable code over a period of time, rather than being used at just one particular moment. To empirically evaluate the prediction methodologies that we define, we collected a fine-grained dataset of vulnerabilities in PHP applications. We then defined and implemented a method for defining families of features, or metrics, which characterize both the change in code over time and the state of the code at a given moment, enabling a systematic and fair comparison of continuous and traditional prediction models. We also introduce a methodology for effort-sensitive learning, which optimizes to minimize the expected cost of inspecting the code that is ultimately flagged by the model. Our results show that the security defects in our dataset were long-lived, with a median lifetime of 871 days. Continuous prediction more readily discriminated vulnerable from non-vulnerable code than traditional static prediction did, and prediction was more efficient when changes were broken apart by file than when they were aggregated together. However, high code churn negated some of the efficiency gains of continuous predictors in simulations, and the optimal prediction method in a given scenario depended on making a tradeoff between speed of detection and cost savings. As an additional contribution, we have released the fine-grained defect dataset -- the first of its kind -- to the public, in order to encourage future work in this field.	en_US
dc.identifier	https://doi.org/10.13016/M2KW7J
dc.identifier.uri	http://hdl.handle.net/1903/17113
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pquncontrolled	defect prediction	en_US
dc.subject.pquncontrolled	machine learning	en_US
dc.subject.pquncontrolled	measurement	en_US
dc.subject.pquncontrolled	security vulnerabilities	en_US
dc.subject.pquncontrolled	software engineering	en_US
dc.subject.pquncontrolled	vulnerability prediction	en_US
dc.title	Continuous, Effort-Aware Prediction of Software Security Defects	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Stuckman_umd_0117E_16567.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations