Using Historical Data From Source Code Revision Histories to Detect Source Code Properties

umi-umd-3682.pdf (1.04 MB)

In this dissertation, we describe several techniques for using historical data mined from the source code revision histories of software projects to determine important properties of the source code. These properties are then used to improve the results of various bug-finding techniques and to provide documentation to the developer. We describe a method for mining source code revision histories, in this case CVS repositories, to extract information that is fed into a static source code bug finder to improve the results it generates. We apply this technique to the CVS repositories of two widely used open source software projects, Apache httpd and Wine. We show how source code revision history can be used to reduce false positives from a static source code checker that identifies the misuse of values returned from a function call. We then discuss a method of mining source code revision histories to learn project-specific idioms. Specifically, we show how source code revision history can be used to identify patterns of calling sequences that describe how functions in the software should be used in relation to each other. With this data, we are able to find bugs in the source code, document API usage, and identify refactoring events. In short, this dissertation shows that it is possible to automatically determine meaningful properties of the source code by studying the source code changes cataloged in the software revision history.
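
To illustrate the idea of using revision history to reduce false positives from a return-value checker, here is a minimal sketch. It is not the dissertation's implementation. It assumes the history has already been mined into one set per commit, naming the functions whose return value gained a check in that commit. The names `count_check_additions` and `rank_warnings`, and the data layout, are hypothetical.

```python
from collections import Counter


def count_check_additions(history):
    """Count, per function, the commits that added a return-value check.

    `history` is an iterable of sets (assumed, pre-mined input); each set
    holds the names of functions whose return value gained a check in
    that commit.
    """
    counts = Counter()
    for commit in history:
        counts.update(set(commit))  # count each function once per commit
    return counts


def rank_warnings(warnings, counts):
    """Order checker warnings so historically fixed callees come first.

    `warnings` is a list of (location, callee) pairs from a hypothetical
    static checker. Callees never seen in the history count as 0, and the
    sort is stable, so ties keep their original order.
    """
    return sorted(warnings, key=lambda w: -counts[w[1]])


# Toy, made-up data purely for illustration.
history = [{"malloc"}, {"malloc", "open"}, {"open"}, {"malloc"}]
warnings = [("a.c:10", "printf"), ("b.c:20", "open"), ("c.c:5", "malloc")]
ranked = rank_warnings(warnings, count_check_additions(history))
```

In this sketch, warnings about functions whose return values developers repeatedly started checking are ranked ahead of warnings about functions that were never fixed this way, which is one plausible way historical data could push likely false positives toward the bottom of a report.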