VLIW Instruction Scheduling for Reduced Code Size

dc.contributor.advisorBarua, Rajeev Ken_US
dc.contributor.authorHaga, Steve Wayneen_US
dc.contributor.departmentElectrical Engineeringen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2006-02-04T06:43:59Z
dc.date.available2006-02-04T06:43:59Z
dc.date.issued2005-12-05en_US
dc.description.abstractCode size is important to the cost of embedded systems. Although VLIW architectures are popular for embedded systems, they impose constraints on instruction placement that make it difficult to find a compact schedule. Existing VLIW instruction scheduling methods primarily target run-time but not code size. The usual approach has two components. First, methods such as trace scheduling provide a mechanism to correctly move instructions across basic blocks. Second, the instructions within a trace are scheduled, perhaps moving instructions across blocks. Because run-time is the only consideration, this approach increases code size by inserting compensation code. Methods such as superblocking increase the size even further by duplicating code. We present a compiler method for instruction scheduling that, for the first time, uses the power of across-block scheduling methods such as trace scheduling to reduce code size as well as run-time. For a certain class of VLIWs, we show that trace scheduling, previously synonymous with increased code size, can in fact reduce it. Our within-trace scheduler uses a cost-model driven, back-tracking approach. Starting with an optimal, exponential-time algorithm, branch-and-bound techniques and non-optimal heuristics reduce the compile time to within a factor of 2 of the original, on average. The code size for our benchmarks is reduced by 16.3% versus the best existing across-block scheduler, while being within 0.8% of its run-time, on a 6-wide VLIW. For a 3-wide VLIW, code size improves by 14.7%, with the same 0.8% run-time cost. Thus, the code size improvements are fairly stable across VLIW widths. We further explore the impact of our techniques on machines with predication support or small I-cache sizes. In the process, we present a novel predication analysis of general applicability. If predication is present, the code size improves to 16.6%. In addition, for machines with small I-caches, the reduced code size of our approach tends to yield better cache hit rates. We find that, although this effect is modest, the performance improvement more than offsets the run-time costs of our method. Therefore, on machines with small I-caches, our code size improvements are achievable at no run-time cost.en_US
dc.format.extent1330613 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/1903/3077
dc.language.isoen_US
dc.subject.pqcontrolledEngineering, Electronics and Electricalen_US
dc.subject.pqcontrolledComputer Scienceen_US
dc.subject.pquncontrolledVLIWen_US
dc.subject.pquncontrolledinstruction schedulingen_US
dc.subject.pquncontrolledcode sizeen_US
dc.subject.pquncontrolledcode compressionen_US
dc.subject.pquncontrolledNOPsen_US
dc.titleVLIW Instruction Scheduling for Reduced Code Sizeen_US
dc.typeDissertationen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
umi-umd-2885.pdf
Size:
1.27 MB
Format:
Adobe Portable Document Format