VLIW Instruction Scheduling for Reduced Code Size
dc.contributor.advisor | Barua, Rajeev K | en_US |
dc.contributor.author | Haga, Steve Wayne | en_US |
dc.contributor.department | Electrical Engineering | en_US |
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
dc.date.accessioned | 2006-02-04T06:43:59Z | |
dc.date.available | 2006-02-04T06:43:59Z | |
dc.date.issued | 2005-12-05 | en_US |
dc.description.abstract | Code size is important to the cost of embedded systems. Although VLIW architectures are popular for embedded systems, they impose constraints on instruction placement that make it difficult to find a compact schedule. Existing VLIW instruction scheduling methods primarily target run-time but not code size. The usual approach has two components. First, methods such as trace scheduling provide a mechanism to correctly move instructions across basic blocks. Second, the instructions within a trace are scheduled, perhaps moving instructions across blocks. Because run-time is the only consideration, this approach increases code size by inserting compensation code. Methods such as superblocking increase the size even further by duplicating code. We present a compiler method for instruction scheduling that, for the first time, uses the power of across-block scheduling methods such as trace scheduling to reduce code size as well as run-time. For a certain class of VLIWs, we show that trace scheduling, previously synonymous with increased code size, can in fact reduce it. Our within-trace scheduler uses a cost-model driven, back-tracking approach. Starting with an optimal, exponential-time algorithm, branch-and-bound techniques and non-optimal heuristics reduce the compile time to within a factor of 2 of the original, on average. The code size for our benchmarks is reduced by 16.3% versus the best existing across-block scheduler, while being within 0.8% of its run-time, on a 6-wide VLIW. For a 3-wide VLIW, code size improves by 14.7%, with the same 0.8% run-time cost. Thus, the code size improvements are fairly stable across VLIW widths. We further explore the impact of our techniques on machines with predication support or small I-cache sizes. In the process, we present a novel predication analysis of general applicability. If predication is present, the code size improves to 16.6%. In addition, for machines with small I-caches, the reduced code size of our approach tends to yield better cache hit rates. We find that, although this effect is modest, the performance improvement more than offsets the run-time costs of our method. Therefore, on machines with small I-caches, our code size improvements are achievable at no run-time cost. | en_US |
dc.format.extent | 1330613 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1903/3077 | |
dc.language.iso | en_US | |
dc.subject.pqcontrolled | Engineering, Electronics and Electrical | en_US |
dc.subject.pqcontrolled | Computer Science | en_US |
dc.subject.pquncontrolled | VLIW | en_US |
dc.subject.pquncontrolled | instruction scheduling | en_US |
dc.subject.pquncontrolled | code size | en_US |
dc.subject.pquncontrolled | code compression | en_US |
dc.subject.pquncontrolled | NOPs | en_US |
dc.title | VLIW Instruction Scheduling for Reduced Code Size | en_US |
dc.type | Dissertation | en_US |
Files
Original bundle
1 - 1 of 1