Compiler-Decided Dynamic Memory Allocation for Scratch-Pad Based Embedded Systems

Files

umi-umd-3630.pdf (1.38 MB)
No. of downloads: 2343

Date

2006-07-27

Abstract

In this research we propose a highly predictable, low-overhead, yet dynamic memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast, compiler-managed SRAM that replaces the hardware-managed cache. It is motivated by its better real-time guarantees compared with a cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme.

Scratch-pad allocation methods are primarily of two types. First, software-caching schemes emulate the workings of a hardware cache in software: instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags, and, just like hardware caches, deliver poor real-time guarantees. A second category of algorithms partitions variables at compile time between the two memory banks, the scratch-pad and main memory. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior.
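To make the cost of the first category concrete, the following is a minimal sketch of a direct-mapped software cache, written as a small Python model rather than as real embedded code. The class name, the line size, and the number of lines are illustrative assumptions, not values from the thesis. It shows that a tag check is paid on every load, and that whether a given load hits depends on the access history.

```python
# Toy model of a software-managed cache. All constants are assumptions
# chosen for illustration; they are not taken from the thesis.
LINE_SIZE = 16   # bytes per software-cache line (assumed)
NUM_LINES = 4    # number of lines kept in the scratch-pad (assumed)


class SoftwareCache:
    """Hypothetical direct-mapped software cache with an explicit tag store."""

    def __init__(self):
        self.tags = [None] * NUM_LINES  # the tag store itself occupies SRAM
        self.tag_checks = 0             # one check per load, hit or miss
        self.misses = 0

    def load(self, address):
        """Model the check inserted before a load; return True on a hit."""
        line = address // LINE_SIZE
        index = line % NUM_LINES
        self.tag_checks += 1
        if self.tags[index] != line:
            # Miss: the line would be fetched from main memory here.
            self.misses += 1
            self.tags[index] = line
            return False
        return True


cache = SoftwareCache()
# Addresses 0 and 64 map to the same line index, so they evict each other.
hits = [cache.load(address) for address in (0, 4, 64, 0)]
print(hits)                             # [False, True, False, False]
print(cache.tag_checks, cache.misses)   # 4 3
```

In this model the final load of address 0 misses only because address 64 was touched in between, which is the kind of history-dependent timing that makes real-time guarantees hard to state.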

We propose a dynamic allocation methodology for global and stack data and program code that (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no run-time checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch-pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to an existing static allocation scheme, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, averaged over our benchmarks. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct-mapped cache shows that our method performs roughly as well as a cached architecture in runtime and energy while delivering better real-time guarantees.
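The difference between static allocation and copying data in at fixed program points can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the greedy accesses-per-byte heuristic, the latencies, the copy cost, and all function and variable names are hypothetical and are not the allocation algorithm of the thesis. Each region of the program is described by how often it accesses each variable, and the scratch-pad contents are changed only at region entry.

```python
# Toy cost model. Latencies, copy cost, sizes, and access counts are
# illustrative assumptions, not measurements from the thesis.
SRAM_LATENCY = 1         # cycles per scratch-pad access (assumed)
DRAM_LATENCY = 10        # cycles per main-memory access (assumed)
COPY_COST_PER_BYTE = 1   # cycles to move one byte between memories (assumed)


def pick_resident(accesses, sizes, capacity):
    """Greedily choose variables by accesses per byte until SRAM is full."""
    chosen, used = set(), 0
    ranked = sorted(accesses, key=lambda v: accesses[v] / sizes[v], reverse=True)
    for var in ranked:
        if accesses[var] > 0 and used + sizes[var] <= capacity:
            chosen.add(var)
            used += sizes[var]
    return chosen


def access_cost(accesses, resident):
    """Cycles spent on accesses, given which variables sit in the scratch-pad."""
    return sum(count * (SRAM_LATENCY if var in resident else DRAM_LATENCY)
               for var, count in accesses.items())


def static_cost(regions, sizes, capacity):
    """One placement for the whole program, fixed before it starts."""
    totals = {}
    for accesses in regions:
        for var, count in accesses.items():
            totals[var] = totals.get(var, 0) + count
    resident = pick_resident(totals, sizes, capacity)
    return sum(access_cost(accesses, resident) for accesses in regions)


def dynamic_cost(regions, sizes, capacity):
    """Re-decide the placement at each region entry and pay for the copies."""
    resident, total = set(), 0
    for accesses in regions:
        wanted = pick_resident(accesses, sizes, capacity)
        moved = (wanted - resident) | (resident - wanted)  # copy in + write back
        total += COPY_COST_PER_BYTE * sum(sizes[var] for var in moved)
        resident = wanted
        total += access_cost(accesses, resident)
    return total


sizes = {"a": 64, "b": 64}                               # bytes per variable
regions = [{"a": 1000, "b": 10}, {"a": 10, "b": 900}]    # accesses per region
print(static_cost(regions, sizes, capacity=64))    # 10110
print(dynamic_cost(regions, sizes, capacity=64))   # 2292
```

In this made-up workload the hot variable changes between regions, so a single static placement serves only one of them well, while the dynamic variant pays a small, fixed copy cost at each region entry. Because the placement inside a region is known ahead of time, every access in the model has a latency that can be determined without running the program.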
