Automated Techniques for Designing Embedded Signal Processors on Distributed Platforms

Thumbnail Image
Files KB)
No. of downloads: 308
CS-TR-3944.pdf(390.27 KB)
No. of downloads: 997
Publication or External Link
Kang, Dong-In
Gerber, Richard
Golubchik, Leana
In this paper, we present a performance-based technique to help synthesize high-bandwidth radar processors on commodity platforms. This problem is innately complex, for a number of reasons. Contemporary radars are very compute-intensive: they have high pulse rates, and they sample a large amount of range readings at each pulse. Indeed, modern radar processors can require CPU loads of in high-gigaflop to tera-flop ranges, performance which is only achieved by exploiting the radar's inherent data parallelism. Next-generation radars are slated to operate on scalable clusters of commodity systems. Throughput is only one problem. Since radars are usually embedded within larger real-time applications, they also must adhere to latency (or deadline) constraints. Building an embedded radar processor on a network of workstations (or a NOW) involves partitioning load in a balanced fashion, accounting for stochastic effects injected on all software-based systems, synthesizing runtime parameters for the on-line schedulers and drivers, and meeting the latency and throughput constraints. In this paper, we show how performance analysis can be used as an effective tool in the design loop; specifically, our method uses analytic approximation techniques to help synthesize efficient designs for radar processing systems. In our method, the signal-processor's topology is represented via a simple flow-graph abstraction, and the per-unit load requirements are modeled stochastically, to account for second-order effects like cache memory behavior, DMA interference, pipeline stalls, etc. Our design algorithm accepts the following inputs: (a)~the system topology, including the thread-to-CPU mapping, where multi-threading is assumed to be used; (b) the per-task load models; and (c) the required pulse rate and latency constraints. As output, it produces the proportion of load to allocate to each task, set at manageable time resolutions for the local schedulers; an optimal service interval over which all load proportions should be guaranteed; an optimal sampling frequency; and some reconfiguration schemes to accommodate single-node failures. Internally, the design algorithms use analytic approximations to quickly estimate output rates and propagation delays for candidate solutions. When the system is synthesized, its results are checked via a simulation model, which removes many of the analytic approximations. We show how our system synthesizes a real-time synthetic aperture radar, under a variety of loading conditions. Also cross-referenced as UMIACS TR # 98-57