Institute for Systems Research

Permanent URI for this communityhttp://hdl.handle.net/1903/4375

Browse

Search Results

Now showing 1 - 10 of 11
  • Thumbnail Image
    Item
    Split Recursive Least Squares: Algorithms, Architectures, and Applications
    (1994) Wu, A-Y.; Liu, K.J. Ray; ISR
    In this paper, a new computationally efficient algorithm for recursive least-squares (RLS) filtering is presented. The proposed Split RLS algorithm can perform the approximated RLS with O(N) complexity for signals having no special data structure to be exploited, while avoiding the high computational complexity (O(N2)) required in the conventional RLS algorithms. Our performance analysis shows that the estimation bias will be small when the input data are less correlated. We also show that for highly correlated data, the orthogonal preprocessing scheme can be used to improve the performance of the Split RLS. Furthermore, the systolic implementation of our algorithm based on the QR- decomposition RLS (QRD-RLS) arrays as well as its application to multidimensional adaptive filtering is also discussed. The hardware complexity for the resulting array is only O(N) and the system latency can be reduced to O(log2 N). The simulation results show that the Split RLS outperforms the conventional RLS in the application of image restoration. A major advantage of the Split RLS is its superior tracking capability over the conventional RLS under non-stationary environments.
  • Thumbnail Image
    Item
    An ESPRIT Algorithm for Tracking Time-Varying Signals
    (1992) Liu, K.J. Ray; O'Leary, D.P.; Stewart, G.W.; Wu, Y-J.; ISR
    ESPRIT is a successful algorithm for determining the constant directions of arrival of a set of narrowband signals on an array of sensors. Unfortunately, its computational burden makes it unsuitable for real time processing of signals with time-varying directions of arrival. In this work we develop a new implementation of ESPRIT that has potential for real time processing. It is based on a rank-revealing URV decomposition, rather than the eigendecomposition or singular value decomposition used in previous ESPRIT algorithms. We demonstrate its performance on simulated data representing both constant and time-varying signals. We find that the URV-based ESPRIT algorithm (total least squares variant) is effective for time- varying directions-of-arrival using either rectangular or exponential windowing techniques to diminish the effects of old information.
  • Thumbnail Image
    Item
    Real-Time Parallel and Fully-Pinelined Two-Dimensional DCT Lattice Structures with Application to HDTV Systems
    (1991) Chiu, Ching-Te; Liu, K.J. Ray; ISR
    The two-dimensional discrete cosine transform (2-D DCT) has been widely recognized as the most effective technique in image data compression. In this paper, we propose a new algorithm to compute the 2-D DCT from a frame-recursive point of view. Based on this approach, two real-time parallel lattice structures for successive frame and block 2-D DCT are developed. The systems is fully-pipelined with throughput rate N clock cycles for N x N successive input data frame. This is the fastest pipelined structure for the 2-D DCT known so far. Moreover, the 2-D DCT architecture is module, regular, and locally-connected and requires only two 1-D DCT blocks which can be extended directly from the 1-D DCT structure without transposition. Therefore, it is very suitable for VLSI implementation for the high speed HDTV systems. We also propose a parallel 2-D DCT architecture and a new scanning pattern for the HDTV system to achieve higher performance. The VLSI implementation of the 2-D DCT using distributed arithmetics to increase computational efficiency and reduce round off error is also discussed.
  • Thumbnail Image
    Item
    A Unified Approach for QRD-Based Recursive Least-Squres Estimation without Square Roots
    (1991) Hsieh, S.F.; Liu, K.J. Ray; Yao, K.; ISR
    The QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective towards adaptive signal processing in modern communications, radar, and sonar systems implementable with various modern parallel and systolic array architectures. The planar (Givens) and hyperbolic rotations are the most commonly used methods in performing the QRD up/downdating. But the generic formula for these rotations require explicit square-root (sqrt) computations, which constitute the computational bottleneck and are quite undesirable from the practical VLSI circuit design point of view. There has been more than ten sqrt-free algorithms known so far. In this paper, we provide a unified systematic approach for the sqrt-free QRD-based RLS estimation problem. By properly choosing two parameters, and v, all existing known sqrt-free methods fall in the category of our unified approach. The proposed method not only can generalize all currently known sqrt-free QRD algorithms, but also new sqrt-free algorithms as long as the parameters and v are properly chosen. The unified treatment is also extended to the QRD-based RLS problems for optimal residual acquisition without sqrt operations, and the systolic array implementation.
  • Thumbnail Image
    Item
    Fast Orthogonalization Algorithm and Parallel Implementation for AR Spectral Estimation Based on Forward-Backward Linear Prediction
    (1991) Liu, K.J. Ray; Hsieh, S.F.; ISR
    High-resolution spectral estimation is an important subject in many applications of modern signal processing. The fundamental problem in applying various high-resolution spectral estimation algorithms is the computational complexity. Recently, the truncated QR methods have been shown to be comparable to the SVD- based methods for the sinusoidal frequency estimation based on the forward-backward linear prediction (FBLP) model. However, without exploiting the special structure of the FBLP matrix, the QR decomposition (QRD) of the FBLP matrix has the computational complexity on the order of n cubic for a 2m x n FBLP matrix. Here we propose a fast algorithm to perform the QRD of the FBLP matrix. It is based on exploiting the special Toeplitz-Hankel form of the FBLP matrix. The computational complexity is then reduced to the order of n square. The fast algorithm can also be easily implemented onto a linear systolic array. The number of time steps required is further reduced to 2m + 5n - 4 by using the parallel implementation. The geometric transformation, which improves the numerical stability, for the downdating of the Cholesky factors is also considered.
  • Thumbnail Image
    Item
    Dual-State Systolic Architectures for Adaptive Filtering Using Up/Downdating RLS
    (1991) Hsieh, S.F.; Liu, K.J. Ray; Yao, K.; ISR
    We propose a dual-state systolic structure to perform joint up/down-dating operations encountered in windowed recursive least squares (RLS) estimation problems. It is derived by successively performing Givens rotations for updating and hyperbolic rotations for down-dating. Due to the data independency, a series of Givens and hyperbolic rotations can be interleaved and parallel processing can be achieved by alternatively performing updating and downdating both in time and space. This flip-flop nature of up/down-dating characterizes the feature of dual-state systolic triarray. To further reduce the complexity and increase the throughput rate, Cordic cells can be used to mimic the operations of rowbroadcasting and only one control bit is required along each row of processors. Efficient implementation to obtain optimal residuals and a transformation of the hyperbolic rotation to an algebraically equivalent orthogonal operation to provide a more stable implementation are also considered. This systolic architecture is very promising in VLSI implementation of the sliding-window recursive least squares estimations.
  • Thumbnail Image
    Item
    Unified Parallel Lattice Structures for Time-Recursive Discrete Cosine/Sine/Hartley Transforms
    (1991) Liu, K.J. Ray; Chiu, Ching-Te; ISR
    The problems of unified efficient computations of the discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), and their inverse transforms are considered. In particular, a new scheme employing the time- recursive approach to compute these transforms is presented. Using such approach, unified parallel lattice structures that can dually generate the DCT and DST simultaneously as well as the DHT are developed. These structures can obtain the transformed data for sequential input time recursively and the total number of multipliers required is a linear function of the transform size N. Furthermore, there is no any constraint on N. The resulting architectures are regular, module, and without global communication so that it is very suitable for VLSI implementation for high-speed applications such as ISDN network and HDTV system. It is also shown in this paper that the DCT, DST, DHT and their inverse transforms share an almost identical lattice structure. The lattice structures can also be formulated into pre-lattice and post-lattice realizations. Two methods, the SISO and double- lattice approaches, are developed to reduce the number of multipliers in the parallel lattice structure by 2N and N respectively. The trade-off between time and area for the block data processing is also considered.
  • Thumbnail Image
    Item
    Systolic Implementations of Up/Down-dating Cholesky Factorization Using Vectorized Gram-Schmidt Pseudo Orthogonalization
    (1991) Hsieh, S.F.; Liu, K.J. Ray; Yao, K.; ISR
    We propose a new class of hyperbolic Gram-Schmidt methods to simultaneously update and downdate the Cholesky factor of a sample covariance matrix efficiently with applications to sliding window recursive least squares (RLS) filtering problems. Several vectorized versions of this Gram-Schmidt approach are introduced, which include conventional column-updating, modified row/column- updating, and square-root-free methods. Comparisons to the existing known methods, such as Householder transformation and Givens rotation, are also given. Upon further reformulating these algorithms, a systolic triarray structure is proposed to facilitate VLSI implementations.
  • Thumbnail Image
    Item
    Multi-phase Systolic Algorithms for Spectral Decomposition
    (1991) Liu, K.J. Ray; Yao, K.; ISR
    In this paper, we propose two multi-phase systolic algorithms to solve the spectral decomposition problem based on the QR algorithm. The spectral decomposition is one of the most computationally intensive modern signal processing operations. While the QR algorithm is well known to be an effective method to solve the eigenvalue problem, there is still no single systolic array architecture that can compute the unitary Q matrix readily and perform the QR algorithm efficiently. Previous methods using the QR algorithm had communication problems among different architectures. In this paper, two arrays, a triangular and a rectangular, are presented to implement the multi-phase algorithms. Details on these multi-phase operations of the QR algorithm as well as architectural consequences and performance evaluation are discussed in the paper. Efficient fault-tolerant schemes for these multi-phase operations are also considered.
  • Thumbnail Image
    Item
    Real-Time Algorithm-Based Fault-Tolerance for QRD Recursive Least-Squares Systolic Array: A Graceful Degradation Approach
    (1991) Liu, K.J. Ray; Yao, K.; ISR
    In this paper, we propose a new algorithm-based fault-tolerant method derived from the inherent nature of the QR lease-squares systolic algorithm. Since the residuals of different desired responses can be computed simultaneously, an artificial desired response can be designed to detect an error produced by a faulty processor. We show that if the artificial desired response is designed as some proper combinations of the input data, the output residual of the system will be zero if there is no fault. However, any occurring fault in the system will cause the residual to be non-zero and the fault can be detected in realtime. Once the fault has been detected, the system enters into the fault diagnosis phase from the concurrent error detection phase. Two methods, the flushing fault location and the checksum encoding methods, can be used to diagnose the location of the faulty row. When the faulty row is determined, this row and the associated column with the same boundary cell are eliminated by a reconfiguration operation. Then the system degrades in a graceful manner which is generally acceptable for many least-squares applications. Those eliminated processors enter into a self-checking phase, and when the transient fault condition is removed, a reconfiguration is performed to resume the normal full order operation. The analysis of error propagation and recovery latency is also considered in this paper.