Deep Analysis of Binary Code to Recover Program Structure

dc.contributor.advisor: Barua, Rajeev (en_US)
dc.contributor.author: ElWazeer, Khaled (en_US)
dc.contributor.department: Electrical Engineering (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2014-06-26T05:35:26Z
dc.date.available: 2014-06-26T05:35:26Z
dc.date.issued: 2014 (en_US)
dc.description.abstract: Reverse engineering of binary executable code is attracting growing interest in the research community. Organizations as diverse as anti-virus companies, security consultants, code forensics consultants, law-enforcement agencies and national security agencies routinely try to understand binary code. Engineers also often need to debug, optimize or instrument binary code during software development. In this dissertation, we present novel techniques that extend existing binary analysis and rewriting tools so that they are more scalable, handle a larger set of stripped binaries, produce better and more understandable output, and ensure that the intermediate representation (IR) recovered from binaries is correct, so that any modified or rewritten binary compiled from this representation works correctly. In the first part of the dissertation, we present techniques to recover accurate function boundaries from stripped executables. Unlike current techniques, ours ensure complete coverage of live executable code, high-quality recovered code, and functional behavior for most application binaries. We use static and dynamic techniques to remove as much spurious code as possible in a safe manner that does not hurt code coverage or IR correctness. Next, we present static techniques to recover correct prototypes for the recovered functions. The recovered prototypes include the complete set of arguments and return values. Our techniques ensure correct behavior of rewritten binaries for both internal and external functions. Finally, we present scalable and precise techniques to recover local variables for every recovered function, as well as global and heap variables. Different techniques are presented for floating point stack allocated variables and memory allocated variables. Data type recovery techniques are presented to declare meaningful data types for the detected variables.
Our data type recovery techniques can recover integer, pointer, structural and recursive data types. We discuss the correctness of the recovered representation. All proposed methods are evaluated on SecondWrite, a binary rewriting framework developed by our research group. An important evaluation metric is whether the IR, together with the recovered information, can be recompiled and run to produce the same output as the original executable. Another metric is analysis time. Further metrics are proposed to measure the quality of the recovered IR relative to IR produced with source code information available. (en_US)
dc.identifier.uri: http://hdl.handle.net/1903/15449
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer engineering (en_US)
dc.subject.pquncontrolled: Binary Analysis (en_US)
dc.subject.pquncontrolled: Binary Rewriting (en_US)
dc.subject.pquncontrolled: Binary Understanding (en_US)
dc.subject.pquncontrolled: Reverse Engineering (en_US)
dc.subject.pquncontrolled: Type Recovery (en_US)
dc.subject.pquncontrolled: Variable Recovery (en_US)
dc.title: Deep Analysis of Binary Code to Recover Program Structure (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: ElWazeer_umd_0117E_15040.pdf
Size: 1.97 MB
Format: Adobe Portable Document Format