Compiler-Decided Dynamic Memory Allocation for Scratch-Pad Based Embedded Systems
Abstract
In this research we propose a highly predictable, low-overhead, yet
dynamic memory allocation strategy for embedded systems with
scratch-pad memory. A scratch-pad is a fast, compiler-managed SRAM
that replaces the hardware-managed cache. It is motivated by its
better real-time guarantees versus caches and by its significantly
lower overheads in energy consumption, area, and overall runtime,
even with a simple allocation scheme.
Existing scratch-pad allocation methods are primarily of two types.
First, software-caching schemes emulate the workings of a hardware
cache in software: instructions are inserted before each load or
store to check the software-maintained cache tags, as in the sketch
below. Such methods incur large overheads in runtime, code size,
energy consumption, and SRAM space for tags, and they deliver poor
real-time guarantees, just like hardware caches.
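
The following is a minimal sketch, not taken from any particular
software-caching system, of the kind of check such schemes insert
before every memory access; the names (spm_tag, spm_data, spm_fill)
and the direct-mapped organization are illustrative assumptions:

    #include <stdint.h>

    #define SPM_LINES  64    /* assumed number of scratch-pad lines */
    #define LINE_BYTES 32    /* assumed line size in bytes          */

    /* Tags consume SRAM space of their own, one overhead cited above. */
    static uint32_t spm_tag[SPM_LINES];
    static uint8_t  spm_data[SPM_LINES][LINE_BYTES];

    static void spm_fill(uint32_t line, uint32_t tag)
    {
        /* In a real system this copies LINE_BYTES from DRAM; stubbed here. */
        (void)line; (void)tag;
    }

    /* Inserted before each load: the run-time tag check that makes
       every access slower and its latency unpredictable. */
    static inline uint8_t checked_load(uint32_t addr)
    {
        uint32_t line = (addr / LINE_BYTES) % SPM_LINES;
        uint32_t tag  = addr / (LINE_BYTES * SPM_LINES);
        if (spm_tag[line] != tag) {      /* miss: fetch from slow memory */
            spm_fill(line, tag);
            spm_tag[line] = tag;
        }
        return spm_data[line][addr % LINE_BYTES];
    }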
A second category of algorithms statically partitions variables at
compile time between the scratch-pad and main memory. However, such
static allocation schemes do not account for dynamic program
behavior: a variable that is heavily accessed in only one phase of
the program occupies scratch-pad space for the entire run.
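
For illustration only, assuming a GCC-style toolchain and a linker
script that maps a ".scratchpad" output section onto the SRAM, a
static partition amounts to a one-time placement decision like this:

    /* Fixed for the whole run, regardless of program phase. */
    int hot_coeffs[128] __attribute__((section(".scratchpad"))); /* SRAM */
    int big_table[8192];                    /* left in DRAM by default */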
We propose a dynamic allocation methodology for global and stack
data and program code that (i) accounts for changing program
requirements at runtime, (ii) has no software-caching tags, (iii)
requires no run-time checks, (iv) has extremely low overheads, and
(v) yields 100% predictable memory access times. In this method,
data that is about to be accessed frequently is copied into the
scratch-pad using compiler-inserted code at fixed and infrequent
points in the program; earlier data is evicted if necessary, as the
sketch below illustrates.
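
The following is a minimal sketch of the kind of code the compiler
inserts at such a fixed program point; the names (spm, coeffs,
before_hot_loop) and the layout are illustrative assumptions, not
the thesis's actual runtime interface:

    #include <string.h>

    static char spm[2048];   /* stands in for the scratch-pad SRAM bank */

    int coeffs[256];         /* global whose home location is DRAM */

    /* Compiler-inserted at an infrequent, fixed point before a hot region. */
    void before_hot_loop(void)
    {
        /* Eviction: if the previous occupant of this SRAM region may be
           live and was modified, the compiler also emits a copy-back to
           its DRAM home here (omitted in this sketch). */

        /* Copy the soon-to-be-frequent data into the scratch-pad. */
        memcpy(spm, coeffs, sizeof coeffs);

        /* The compiler rewrites subsequent references to coeffs in this
           region to read from spm directly, so every access is a plain
           SRAM reference: no tags, no run-time checks, and a fully
           predictable latency. */
    }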
When compared to an existing static allocation scheme, results show
that our scheme reduces runtime by up to 39.8% and energy by up to
31.3%, on average, for our benchmarks. The actual gain depends on
the SRAM size, but our results show that close to the maximum
benefit in runtime and energy is achieved for a substantial range of
small SRAM sizes commonly found in embedded systems. A comparison
with a direct-mapped cache shows that our method performs roughly as
well as a cached architecture in runtime and energy while delivering
better real-time benefits.