Show simple item record

dc.contributor.advisor: Porter, Adam [en_US]
dc.contributor.author: Song, Charles [en_US]
dc.date.accessioned: 2012-02-17T07:14:06Z
dc.date.available: 2012-02-17T07:14:06Z
dc.date.issued: 2011 [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/12397
dc.description.abstract: Many modern software systems are highly configurable. While a high degree of configurability has many benefits, such as extensibility, reusability and portability, it also has its costs. In the worst case, the full configuration space of a system is the exponentially large combination of all possible option settings, and every configuration can potentially produce unique behavior in the software system. This software configuration space explosion problem therefore adds combinatorial complexity to many already difficult software engineering tasks. To date, much of the research in this area has tackled this problem using black-box techniques, such as combinatorial interaction testing (CIT). Although these techniques are promising in systematizing the testing and analysis of configurable systems, they ignore a system's internal structure, and we think that is a huge missed opportunity. We hypothesize that systems are often structured such that their effective configuration spaces -- the sets of configurations needed to achieve a specific goal -- are much smaller than their full configuration spaces. If we can efficiently identify or approximate the effective configuration spaces, then we can use that information to greatly improve various software engineering tasks. To understand the effective configuration spaces of software systems, we used symbolic evaluation, a white-box analysis, to capture all executions a system can take under any configuration. The symbolic evaluation results confirmed that the effective configuration spaces are in fact the composition of many small, self-contained groupings of options. We also developed analysis techniques to succinctly characterize how configurations interact with a system's internal structures.
We showed that while the majority of a system's interactions are relatively low strength, some important high-strength interactions do exist, and that existing approaches such as CIT are highly unlikely to generate them in practice. Results from our in-depth investigations serve as the foundation for developing new approaches to efficiently discovering effective configuration spaces. We proposed a new algorithm called interaction tree discovery (iTree) that aims to identify sets of configurations that are smaller than those generated by CIT, while also including important high-strength interactions missed by practical applications of CIT. On each iteration of iTree, we first test the system using a low-strength covering array, and then apply machine learning techniques to discover new interactions that are potentially responsible for any new coverage seen. By repeating this process, iTree builds up a set of configurations likely to contain key high-strength interactions. We evaluated iTree, and our results strongly suggest that iTree can identify high-coverage sets of configurations more effectively than traditional CIT or random sampling. We next developed the interaction learning approach, which estimates the configuration interactions underlying the effective configuration space by building classification models for iTree execution results. This approach is lightweight, yet produces accurate estimates of the interactions, which makes leveraging effective configuration spaces practical for many software engineering tasks. Using this approach, we were able to approximate the effective configuration space of the ~1M-LOC MySQL, something that is infeasible using existing techniques, at very low cost. [en_US]
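The abstract's observation that a low-strength covering array can cover every low-strength interaction while still missing high-strength ones can be illustrated with a small Python sketch. This is not code from the dissertation: the option names `a`, `b`, `c`, the helper functions `full_space_size` and `t_way_coverage`, and the hand-written 2-way covering array are all assumptions made for illustration only.

```python
from itertools import combinations, product


def full_space_size(domains):
    """Number of configurations in the full space (product of domain sizes)."""
    size = 1
    for values in domains.values():
        size *= len(values)
    return size


def t_way_coverage(configs, domains, t):
    """Fraction of all t-way option-setting combinations hit by `configs`."""
    names = sorted(domains)
    # Every combination of settings for every group of t options.
    required = set()
    for group in combinations(names, t):
        for values in product(*(domains[n] for n in group)):
            required.add(tuple(zip(group, values)))
    # Combinations that actually occur in the sampled configurations.
    covered = set()
    for cfg in configs:
        for group in combinations(names, t):
            covered.add(tuple((n, cfg[n]) for n in group))
    return len(covered & required) / len(required)


# Hypothetical system with three boolean options.
domains = {"a": (0, 1), "b": (0, 1), "c": (0, 1)}

# A hand-written 2-way (pairwise) covering array: 4 of the 8 configurations.
pairwise_sample = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]

print(full_space_size(domains))                      # size of the full space
print(t_way_coverage(pairwise_sample, domains, 2))   # all pairs covered
print(t_way_coverage(pairwise_sample, domains, 3))   # half of the triples covered
```

In this toy setting the sample covers every pair of option settings, yet the 3-way interaction `a=1, b=1, c=1` never appears in it, so behavior guarded by that interaction would go untested. Real systems have far more options, which is where the gap between low-strength coverage and high-strength interactions becomes significant.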
dc.title: Understanding, Discovering and Leveraging a Software System's Effective Configuration Space [en_US]
dc.type: Dissertation [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.contributor.department: Computer Science [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: Empirical Study [en_US]
dc.subject.pquncontrolled: Machine Learning [en_US]
dc.subject.pquncontrolled: Software Configuration [en_US]
dc.subject.pquncontrolled: Software Testing [en_US]
dc.subject.pquncontrolled: Symbolic Evaluation [en_US]

