Continuous, Effort-Aware Prediction of Software Security Defects

dc.contributor.advisor: Purtilo, James [en_US]
dc.contributor.author: Stuckman, Jeffrey Charles [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2015-09-18T06:03:19Z
dc.date.available: 2015-09-18T06:03:19Z
dc.date.issued: 2015 [en_US]
dc.description.abstract: Software security defects are coding flaws that allow a system's security to be compromised. Due to the potential severity of these defects, it is important to discover them quickly; therefore, they are a good focus for software quality improvement efforts such as code inspection. Our research focuses on vulnerability prediction models, which use machine learning to identify code that has an elevated likelihood of containing these defects. In particular, we study continuous prediction models, which repeatedly search for vulnerable code over a period of time, rather than being used at just one particular moment. To empirically evaluate the prediction methodologies that we define, we collected a fine-grained dataset of vulnerabilities in PHP applications. We then devised and implemented a method for defining families of features, or metrics, which characterize both the change in code over time and the state of the code at a given moment, enabling a systematic and fair comparison of continuous and traditional prediction models. We also introduce a methodology for effort-sensitive learning, which optimizes to minimize the expected cost of inspecting the code that is ultimately flagged by the model. Our results show that the security defects in our dataset were long-lived, with a median lifetime of 871 days. Continuous prediction more readily discriminated vulnerable from non-vulnerable code than traditional static prediction did, and prediction was more efficient when changes were broken apart by file than when they were aggregated together. However, high code churn negated some of the efficiency gains of continuous predictors in simulations, and the optimal prediction method in a given scenario depended on making a tradeoff between speed of detection and cost savings. As an additional contribution, we have released the fine-grained defect dataset, the first of its kind, to the public in order to encourage future work in this field. [en_US]
dc.identifier: https://doi.org/10.13016/M2KW7J
dc.identifier.uri: http://hdl.handle.net/1903/17113
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: defect prediction [en_US]
dc.subject.pquncontrolled: machine learning [en_US]
dc.subject.pquncontrolled: measurement [en_US]
dc.subject.pquncontrolled: security vulnerabilities [en_US]
dc.subject.pquncontrolled: software engineering [en_US]
dc.subject.pquncontrolled: vulnerability prediction [en_US]
dc.title: Continuous, Effort-Aware Prediction of Software Security Defects [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Stuckman_umd_0117E_16567.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format