Designing Practical Efficient Algorithms for Symmetric Multiprocessors
Abstract
Symmetric multiprocessors (SMPs) dominate the high-end server market and are
currently the primary candidate for constructing large scale multiprocessor
systems. Yet, the design of efficient parallel algorithms for this platform
currently poses several challenges. In this paper, we present a computational
model for designing efficient algorithms for symmetric multiprocessors. We
then use this model to create efficient solutions to two widely different
types of problems - linked list prefix computations and generalized sorting.
Our novel algorithm for prefix computations builds upon the sparse ruling set
approach of Reid-Miller and Blelloch. Besides being somewhat simpler and
requiring nearly half the number of memory accesses, we can bound our
complexity with high probability instead of merely on average. Our algorithm
for generalized sorting is a modification of our algorithm for sorting by
regular sampling on distributed memory architectures. The algorithm is a
stable sort which appears to be asymptotically faster than any of the
published algorithms for SMPs. Both of our algorithms were implemented in C
using POSIX threads and run on three symmetric multiprocessors - the DEC
AlphaServer, the Silicon Graphics Power Challenge, and the HP-Convex Exemplar.
We ran our code for each algorithm using a variety of benchmarks which we
identified to examine the dependence of our algorithm on memory access
patterns. In spite of the fact that the processors must compete for access
to main memory, both algorithms still yielded scalable performance up to 16
processors, which was the largest platform available to us. For some
problems, our prefix computation algorithm actually matched or exceeded the
performance of the best sequential solution using only a single thread.
Similarly, our generalized sorting algorithm always beat the performance of
sequential merge sort by at least an order of magnitude, even with a single
thread. (Also cross-referenced as UMIACS-TR-98-44)