Browsing by Author "Tseng, Chau-Wen"
Now showing 1 - 8 of 8
Item BioBench: A Benchmark Suite of Bioinformatics Applications (2005-03)
Albayraktaroglu, Kursad; Jaleel, Aamer; Wu, Xue; Franklin, Manoj; Jacob, Bruce; Tseng, Chau-Wen; Yeung, Donald

Recent advances in bioinformatics and the significant increase in computational power available to researchers have made it possible to make better use of the vast amounts of genetic data collected over the last two decades. As the uses of genetic data expand to include drug discovery and the development of gene-based therapies, bioinformatics is destined to take its place at the forefront of scientific computing application domains. Despite the clear importance of this field, common bioinformatics applications and their implications for microarchitectural design have so far received scant attention from the computer architecture community. The availability of a common set of bioinformatics benchmarks could be the first step toward motivating further research in this crucial area. To this end, this paper presents BioBench, a benchmark suite that represents a diverse set of bioinformatics applications. The first version of BioBench includes applications from different application domains, with a particular emphasis on mature genomics applications. The applications in the benchmark are described briefly, and basic execution characteristics obtained on a real processor are presented. Compared to the SPEC INT and SPEC FP benchmarks, the applications in BioBench display a higher percentage of load/store instructions, almost negligible floating-point operation content, and higher IPC than either SPEC INT or SPEC FP applications. Our evaluation suggests that bioinformatics applications have distinctly different characteristics from the applications in both SPEC suites, and our findings indicate that bioinformatics workloads can benefit from architectural improvements to memory bandwidth and from techniques that exploit their high levels of ILP.
The entire BioBench suite and accompanying reference data will be made freely available to researchers.

Item Compiler Optimizations for Eliminating Cache Conflict Misses (1998-10-15)
Rivera, Gabriel; Tseng, Chau-Wen

Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor cache performance in scientific programs, particularly within loop nests. We present two compiler transformations to eliminate conflict misses: 1) modifying variable base addresses, and 2) padding inner array dimensions. Unlike compiler transformations that restructure the computation performed by the program, these two techniques modify its data layout. Using cache simulations of a selection of kernels and benchmark programs, we show that these compiler transformations can eliminate conflict misses for applications with regular memory access patterns. Cache miss rates for a 16K direct-mapped cache are reduced by 35% on average for each program. For some programs, execution times on a DEC Alpha can be improved by up to 60%. (Also cross-referenced as UMIACS-TR-97-59)

Item Efficient Machine-Independent Programming of High-Performance Multiprocessors (1998-10-15)
Tseng, Chau-Wen

Parallel computing is regarded by most computer scientists as the most likely approach for significantly improving the computing power available to scientists and engineers. Advances in programming languages and parallelizing compilers are making parallel computers easier to use by providing a high-level, portable programming model that protects software investment. However, experience has shown that simply finding parallelism is not always sufficient for obtaining good performance from today's multiprocessors.
The goal of this project is to develop the advanced compiler analysis of data and computation decompositions, thread placement, communication, synchronization, and memory system effects needed to take advantage of performance-critical elements in modern parallel architectures.

Item Enhancing Software DSM for Compiler-Parallelized Applications (1998-10-15)
Keleher, Pete; Tseng, Chau-Wen

Current parallelizing compilers for message-passing machines support only a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed-shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: 1) combine synchronization and parallelism information communication on parallel task invocation, 2) employ customized routines for evaluating reduction operations, and 3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good speedups on eight processors on an IBM SP-2 and a DEC Alpha cluster, usually matching or exceeding the speedups of equivalent HPF and message-passing versions of each program. Based on our experimental results, we point out areas where additional compiler analysis and software DSM improvements can be used to achieve good performance on a broader range of applications. (Also cross-referenced as UMIACS-TR-96-70)

Item ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps (2004-04-19)
Wu, Xue; Lee, Woei-Jyh (Adam); Gupta, Damayanti; Tseng, Chau-Wen

Expressed sequence tags (ESTs) are short transcribed nucleotide sequences that can be used to discover new genes and measure gene expression. Because individual ESTs are short and error-prone, they must first be clustered to be useful.
In this paper, we describe ESTmapper, a new tool for clustering EST sequences by efficiently mapping ESTs to the genome. Our mapping algorithm first builds an eager write-only top-down (WOTD) suffix tree for the genome, then searches for long common substrings between each EST and the genome to build matching regions: gapped local alignments between the EST and the genome that account for sequencing errors and splicing. Long matching regions are then used to map ESTs to the genome and to place ESTs into clusters based on location. Preliminary experimental evaluation shows that although ESTmapper requires a large amount of initial memory to store the genome suffix tree, it is quite precise and more efficient than previous techniques such as TGICL and PaCE when clustering large numbers of ESTs. (UMIACS-TR-2004-20)

Item Evaluating the Impact of Memory System Performance on Software Prefetching and Locality Optimizations (2000-08-07)
Aggarwal, Aneesh; Badawy, Abdel-Hameed A.; Tseng, Chau-Wen; Yeung, Donald

Software prefetching and locality optimizations are techniques for overcoming the gap between processor and memory speeds. Using the SimpleScalar simulator, we evaluate the impact of memory bandwidth and latency on the effectiveness of software prefetching and locality optimizations for three types of applications: regular scientific codes, irregular scientific codes, and pointer-based codes. We find that software prefetching hides memory costs but increases instruction count and requires greater memory bandwidth. Locality optimizations change the computation order and data layout at compile or run time to eliminate cache misses, reducing memory costs without requiring more memory bandwidth. Combining prefetching and locality optimizations can improve performance, but interactions can also nullify the benefits of prefetching. We propose several algorithms to better integrate software prefetching and locality optimizations.
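The locality-optimization idea summarized above, reordering computation so that data is reused while it is still cached, can be sketched with a toy cache model. All parameters here (line size, cache capacity, array shape, fully associative LRU replacement) are illustrative assumptions, not values from the paper:

```python
# Count cache-line misses for row-major versus column-major traversal
# of a 2-D array under a small fully associative LRU cache.

from collections import OrderedDict

LINE_ELEMS = 8      # elements per cache line (illustrative)
CAPACITY = 32       # lines the cache can hold (illustrative)

def lru_misses(addresses):
    """Count line misses for an element-index trace (LRU replacement)."""
    cache = OrderedDict()
    misses = 0
    for a in addresses:
        line = a // LINE_ELEMS
        if line in cache:
            cache.move_to_end(line)          # refresh recency on a hit
        else:
            misses += 1
            cache[line] = True
            if len(cache) > CAPACITY:
                cache.popitem(last=False)    # evict least recently used
    return misses

ROWS = COLS = 64    # rows stored contiguously (row-major layout)

row_order = [i * COLS + j for i in range(ROWS) for j in range(COLS)]
col_order = [i * COLS + j for j in range(COLS) for i in range(ROWS)]

print(lru_misses(row_order), lru_misses(col_order))
```

Row-major traversal touches each line's eight elements consecutively, so only the compulsory miss per line remains; column-major traversal strides through 64 distinct lines between reuses of any line, exceeding the 32-line capacity, so every access misses.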
(Also cross-referenced as UMIACS-TR-2000-57)

Item Improving Locality For Adaptive Irregular Scientific Codes (1999-09-25)
Han, Hwansoo; Tseng, Chau-Wen

An important class of scientific codes accesses memory in an irregular manner. Because irregular access patterns reduce temporal and spatial locality, they tend to underutilize caches, resulting in poor performance. Researchers have shown that consecutively packing data relative to traversal order can significantly reduce cache miss rates by increasing spatial locality. In this paper, we investigate techniques for using partitioning algorithms to improve locality in adaptive irregular codes. We develop parameters to guide both geometric (RCB) and graph partitioning (METIS) algorithms, and we develop a new graph partitioning algorithm based on hierarchical clustering (GPART) that achieves good locality with low overhead. We also examine the effectiveness of locality optimizations for adaptive codes, where connection patterns change dynamically at intervals during program execution. We use a simple cost model to guide locality optimizations when access patterns change. Experiments on irregular scientific codes for a variety of meshes show that our partitioning algorithms are effective for static and adaptive codes on both sequential and parallel machines. Improved locality also enhances the effectiveness of LocalWrite, a parallelization technique for irregular reductions based on the owner-computes rule. (Also cross-referenced as UMIACS-TR-99-41)

Item Software Support For Improving Locality in Advanced Scientific Codes (2000-08-07)
Tseng, Chau-Wen

Scientists today rely on powerful computers to perform simulations critical for research and development. Modern microprocessors provide high performance by exploiting data locality with carefully designed multi-level caches. Programs can achieve good performance only if they possess data locality, keeping most data in cache and avoiding accesses to memory.
Compiler transformations can improve locality and achieve large performance improvements, particularly for linear algebra codes. However, as scientific computations increase in complexity, they employ advanced features such as 3D arrays, sparse meshes, and pointer-based data structures that make it difficult to utilize caches well. This proposal aims to develop and evaluate software support for improving locality in advanced scientific applications on both sequential and parallel machines. The basic premise is that both compile-time analyses and sophisticated run-time systems are necessary. Run-time systems are needed because many programs are not statically analyzable. Compiler support is crucial both for inserting interfaces to the run-time system and for directly applying program transformations where possible. Cooperation between the compiler and the run-time system will be critical for advanced scientific codes. (Also cross-referenced as UMIACS-TR-2000-56)
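The padding transformation described in "Compiler Optimizations for Eliminating Cache Conflict Misses" above can be illustrated with a toy direct-mapped cache model. The cache geometry, array size, and pad amount below are illustrative assumptions, not parameters from the paper:

```python
# Toy direct-mapped cache model showing inter-array conflict misses and
# their elimination by padding (shifting one array's base address).

LINE = 16      # bytes per cache line (illustrative)
SETS = 64      # number of sets; direct-mapped, one line per set

def misses(addresses):
    """Count misses for a byte-address trace in a direct-mapped cache."""
    tags = [None] * SETS
    count = 0
    for a in addresses:
        line = a // LINE
        s = line % SETS            # set index
        t = line // SETS           # tag
        if tags[s] != t:
            tags[s] = t            # miss: fill the set with this line
            count += 1
    return count

def trace(base_a, base_b, n, elem=8):
    """Interleave accesses to a[i] and b[i] (8-byte elements)."""
    out = []
    for i in range(n):
        out.append(base_a + i * elem)
        out.append(base_b + i * elem)
    return out

N = 256
cache_bytes = LINE * SETS          # 1 KB toy cache
array_bytes = N * 8                # each array is 2 KB

# b placed a multiple of the cache size past a: a[i] and b[i] map to the
# same set, so each access evicts the other array's line (thrashing).
conflicting = misses(trace(0, array_bytes, N))

# Padding b's base by one cache line shifts its set mapping, leaving
# only the compulsory miss for each line's first touch.
padded = misses(trace(0, array_bytes + LINE, N))

print(conflicting, padded)
```

In the conflicting layout every one of the 2N accesses misses; with the one-line pad, only the first touch of each 16-byte line misses, so the miss count drops from 2N to N. A real compiler would apply the same idea by adjusting variable base addresses or padding array dimensions at compile time.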