Institute for Systems Research Technical Reports

Permanent URI for this collectionhttp://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Browse

Search Results

Now showing 1 - 10 of 26

Split Recursive Least Squares: Algorithms, Architectures, and Applications
(1994) Wu, A-Y.; Liu, K.J. Ray; ISR
In this paper, a new computationally efficient algorithm for recursive least-squares (RLS) filtering is presented. The proposed Split RLS algorithm can perform the approximated RLS with O(N) complexity for signals having no special data structure to be exploited, while avoiding the high computational complexity (O(N2)) required in the conventional RLS algorithms. Our performance analysis shows that the estimation bias will be small when the input data are less correlated. We also show that for highly correlated data, the orthogonal preprocessing scheme can be used to improve the performance of the Split RLS. Furthermore, the systolic implementation of our algorithm based on the QR- decomposition RLS (QRD-RLS) arrays as well as its application to multidimensional adaptive filtering is also discussed. The hardware complexity for the resulting array is only O(N) and the system latency can be reduced to O(log2 N). The simulation results show that the Split RLS outperforms the conventional RLS in the application of image restoration. A major advantage of the Split RLS is its superior tracking capability over the conventional RLS under non-stationary environments.
A Class of Square Root and Division Free Algorithms and Architectures for QRD-Based Adaptive Signal Processing
(1993) Frantzeskakis, Emmanuel N.; Liu, K.J. Ray; ISR
The least squares (LS) minimization problem constitutes the cores of many real-time signal processing problems, such as adaptive filtering, system identification and adaptive beamforming. Recently efficient implementations of the recursive least squares (RLS) algorithm and the constrained recursive least squares (CRLS) algorithm based on the numerically stable QR decomposition (QRD) have been of great interest. Several papers have proposed modifications to the rotation algorithm that circumvent the square root operations and minimize the number of divisions that are involved in the Givens rotation. It has also been shown that all the known square root free algorithms are instances of one parametric algorithm. Recently, a square root free and division free algorithm has been proposed [4].
In this paper, we propose a family of square root and division free algorithms and examine its relationship with the square root free parametric family. We choose a specific instance for each one of the two parametric algorithms and make a comparative study of the systolic structures based on these two instances, as well as the standard Givens rotation. We consider the architectures for both the optimal residual computation and the optimal weight vector extraction.
The dynamic range of the newly proposed algorithm for QRD-RLS optimal residual computation and the wordlength lower bounds that guarantee no overflow are presented. The numberical stability of the algorithm is also considered. A number of obscure points relevant to the realization of the QRD-RLS and QRD-CRLS algorithms are clarified. Some systolic structures that are described in this paper are very promising, since they require less computational complexity ( in various aspects) than the structures known to date and they make the VLSI implementation easier.
Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms
(1993) Liu, K.J. Ray; Chiu, Ching-Te; Kolagotla, Ravi K.; JaJa, Joseph F.; ISR
An optimal unified architecture that can efficiently compute the Discrete Cosine, Sine, Hartley, Fourier, Lapped Orthogonal, and Complex Lapped transforms for a continuous input data stream is proposed. This structure uses only half as many multipliers as the previous best known scheme [1]. The proposed architecture is regular, modular, and has only local interconnections in both data and control paths. There is no limitation on the transform size N and only 2N - 2 multipliers are needed for the DCT. The throughput of this scheme is one input sample per clock cycle. We provide a theoretical justification by showing that any discrete transform whose basis functions satisfy the Fundamental recurrence Formula has a second-order autoregressive structure in its filter realization. We also demonstrate that dual generation transform pairs share the same autoregressive structure. We extend these time-recursive concepts to multi- dimensional transforms. The resulting d-dimensional structures are fully- pipelined and consist of only d 1-D transform arrays and shift registers.
Time-Recursive Computation, Part II: Methodology, and Application on QMF Banks and ELT
(1993) Frantzeskakis, Emmanuel N.; Baras, John S.; Liu, K.J. Ray; ISR
Recent advances in ISDN have promoted applications such as video- phone, tele-conferencing and HDTV, that demand real-time processing of large volume audio, video and speech data. Being the only refuge for this intense computation, the VLSI technology favors modular and regular designs with local communication requirements. In this light, the framework for time-recursive computation, presented in part I [7] of this two-part paper, provides the background for designing efficient VLSI implementations, capable of accommodating high throughput requirements. In part II, we develop a routine that can be used for designing the time-recursive architecture of a given linear operator in a systematic manner. Three classes of QMF banks are used as design examples: the lossless QMF bank, the cosine modulated QMF bank and two Extended Lapped Transforms, one of them being the Modulated Lapped Transform (MLT). In addition to demonstrating the use of the design procedure, these examples provide novel results, interesting on their own right. In particular, the time-recursive architecture we propose for an N - point MLT, also known as Modified DCT or Time Domain Aliasing Cancellation (TDAC) transform, requires 2N + 3 multipliers, 3N + 3 adders and N - 1 rotation circuits.
Time-Recursive Computation and Real-Time Parallel Architectures, Part I: Framework
(1993) Frantzeskakis, Emmanuel N.; Baras, John S.; Liu, K.J. Ray; ISR
The time-recursive computation has been proved as a particularly useful tool in real-time data compression, in transform domain adaptive filtering and in spectrum analysis. Unlike the FFT based ones, the time-recursive architectures require only local communication. Also, they are modular and regular, thus they are very appropriate for VLSI implementation and they allow high degree of parallelism. In this two part paper, we establish an architectural frame work for parallel time-recursive computation. In part I, we consider a class of linear operators that consists of the discrete time, time invariant, compactly supported, but otherwise arbitrary kernel functions. We show that the structure of the realization of a given linear operator is dictated by the decomposition of the latter with respect to proper basis functions. An optimal way for carrying out this decomposition is demonstrated. The parametric forms of the basis functions are identified and their properties pertinent to the architecture design are studied. A library of architectural building modules capable of realizing these functions is developed. An analysis of the implementation complexity for the aforementioned modules is conducted. Based on this framework, an architecture design procedure is developed in Part II [12] that can be used for routinely obtaining the time-recursive architecture of a given linear operator.
Fast Algorithms for 2-D Circular Convolutions and Number Theoretic Transforms Based on Polynomial Transforms over Finite Rings
(1992) Ran, X.; Liu, K.J. Ray; ISR
In this paper, we develop new fast algorithms for 2-D integer circular convolutions and 2-D Number Theoretic Transforms (NTT). These new algorithms, which offers improved computational complexity, are constructed based on polynomial transforms over Zp; these transforms are Fourier-like transforms over Zp [x ] which is the integral domain of polynomial forms over Zp. Having defined such polynomial transforms over Zp, we prove several necessary and sufficient conditions for their existence. We then apply the existence conditions to recognize two applicable polynomial transforms over Zp: One is for p equal to Mersenne numbers and the other for Fermat numbers. Based on these two transforms, referred to as Mersenne Number Polynomial Transforms (MNPT) and Fermat Number Polynomial Transforms (FNPT), we develop fast algorithms for 2-D integer circular convolutions, 2-D Mersenne Number Transforms, and 2-D Fermat Number Transforms. As compared to the conventional row-column computation of 2-D NTT for 2-D integer circular convolutions and 2-D NTTs, the new algorithms give rise to reduced computational complexities by saving more than 25% or 42% in numbers of operations for multiplying 2i, i 1; these percentages of savings also grow with the size of the 2-D integer circular convolutions or the 2-D NTTs.
A Geometric Theorem for Adaptable Log-Periodic Broadband Antenna
(1992) Liu, K.J. Ray; ISR
This communication presents a new geometric theorem for application to log-periodic antenna. Based on the new geometric theorem, two log-periodic antennas can be generated together so that the directive gain control and beam-form scanning can be done for log-periodic antenna. We will also show that the conventional log-periodic antenna is a special case of the result presented in this paper.
Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms
(1992) Liu, K.J. Ray; Chiu, Ching-Te; Kolagotla, Ravi K.; JaJa, Joseph F.; ISR
An optimal unified architecture that can efficiently compute the Discrete Cosine, Sine, Hartley, Fourier, Lapped Orthogonal, and the Complex Lapped transforms for a continuous input data stream is proposed. This structure uses only half as many multipliers as the previous best known scheme [1]. This architecture is regular, modular, and has only local interconnections in both the data and control paths. There is no limitation on the transform size N and only 2N - 2 multipliers are needed for the DCT. The throughput of this scheme is one input sample per clock cycle. We provide a theoretical justification by showing that any discrete transform whose basis functions satisfy the Fundamental Recurrence Formula has a second-order autoregressive structure in its filter realization. We also demonstrate that dual generation transform pairs share the same autoregressive structure. We extend these time-recursive concepts to multi-dimensional transforms. The resulting multi-dimensional structure are fully- pipelined and consist of only d 1-D transform arrays and shift registers, where d is the dimension.
An ESPRIT Algorithm for Tracking Time-Varying Signals
(1992) Liu, K.J. Ray; O'Leary, D.P.; Stewart, G.W.; Wu, Y-J.; ISR
ESPRIT is a successful algorithm for determining the constant directions of arrival of a set of narrowband signals on an array of sensors. Unfortunately, its computational burden makes it unsuitable for real time processing of signals with time-varying directions of arrival. In this work we develop a new implementation of ESPRIT that has potential for real time processing. It is based on a rank-revealing URV decomposition, rather than the eigendecomposition or singular value decomposition used in previous ESPRIT algorithms. We demonstrate its performance on simulated data representing both constant and time-varying signals. We find that the URV-based ESPRIT algorithm (total least squares variant) is effective for time- varying directions-of-arrival using either rectangular or exponential windowing techniques to diminish the effects of old information.
VLSI Implementation of Real-Time Parallel DCT/DST Lattice Structures for Video
(1992) Chiu, Ching-Te; Kolagotla, Ravi K.; Liu, K.J. Ray; JaJa, Joseph F.; ISR
The alternate use [1] of the discrete cosine transform (DCT) and the discrete sine transform (DST) can achieve a higher data compression rate and less block effect in image processing. A parallel lattice structure that can dually generate the 1-D DCT and DST is proposed. We also develop a fully-pipelined 2-D DCT lattice architecture that consists of two 1-D DCT/DST arrays without transposition. Both architectures are ideally suited for VLSI implementation because they are modular, regular, and have only local interconnections. the VLSI implementation of the lattice module using the distributed arithmetic approach is described. This realization of the lattice module using 2 um CMOS technology can achieve an 80Mb/s data rate.

Institute for Systems Research Technical Reports

Browse

Filters

Settings

Sort By

Results per page

Search Results