Institute for Systems Research
Permanent URI for this communityhttp://hdl.handle.net/1903/4375
Browse
Search Results
Item VLSI Design of High-Speed Time-Recursive 2-D DCT/IDCT Processor for Video Applications(1994) Srinivasan, V.; Liu, K.J. Ray; ISRIn this paper we present a full-customer VLSI design of high- speed 2-D DCT/IDCT Processor based on the new class of time- recursive algorithms and architectures which has never been implemented to prove its performance. We show that the VLSI implementation of this class of DCT/IDCT algorithms can easily meet the high-speed requirements of HDTV due to its modularity, regularity, local connectivity, and scalability. Our design of the 8 x 8 DCT/IDCT can operate at 50 MHz with a 400 Mbps throughput based on a very conservative estimate under 1.2 CMOS technology. In comparison to the existing designs, our approach offers many advantages that can be further explored for even higher performance.Item Split Recursive Least Squares: Algorithms, Architectures, and Applications(1994) Wu, A-Y.; Liu, K.J. Ray; ISRIn this paper, a new computationally efficient algorithm for recursive least-squares (RLS) filtering is presented. The proposed Split RLS algorithm can perform the approximated RLS with O(N) complexity for signals having no special data structure to be exploited, while avoiding the high computational complexity (O(N2)) required in the conventional RLS algorithms. Our performance analysis shows that the estimation bias will be small when the input data are less correlated. We also show that for highly correlated data, the orthogonal preprocessing scheme can be used to improve the performance of the Split RLS. Furthermore, the systolic implementation of our algorithm based on the QR- decomposition RLS (QRD-RLS) arrays as well as its application to multidimensional adaptive filtering is also discussed. The hardware complexity for the resulting array is only O(N) and the system latency can be reduced to O(log2 N). The simulation results show that the Split RLS outperforms the conventional RLS in the application of image restoration. A major advantage of the Split RLS is its superior tracking capability over the conventional RLS under non-stationary environments.Item A Class of Square Root and Division Free Algorithms and Architectures for QRD-Based Adaptive Signal Processing(1993) Frantzeskakis, Emmanuel N.; Liu, K.J. Ray; ISRThe least squares (LS) minimization problem constitutes the cores of many real-time signal processing problems, such as adaptive filtering, system identification and adaptive beamforming. Recently efficient implementations of the recursive least squares (RLS) algorithm and the constrained recursive least squares (CRLS) algorithm based on the numerically stable QR decomposition (QRD) have been of great interest. Several papers have proposed modifications to the rotation algorithm that circumvent the square root operations and minimize the number of divisions that are involved in the Givens rotation. It has also been shown that all the known square root free algorithms are instances of one parametric algorithm. Recently, a square root free and division free algorithm has been proposed [4].In this paper, we propose a family of square root and division free algorithms and examine its relationship with the square root free parametric family. We choose a specific instance for each one of the two parametric algorithms and make a comparative study of the systolic structures based on these two instances, as well as the standard Givens rotation. We consider the architectures for both the optimal residual computation and the optimal weight vector extraction.
The dynamic range of the newly proposed algorithm for QRD-RLS optimal residual computation and the wordlength lower bounds that guarantee no overflow are presented. The numberical stability of the algorithm is also considered. A number of obscure points relevant to the realization of the QRD-RLS and QRD-CRLS algorithms are clarified. Some systolic structures that are described in this paper are very promising, since they require less computational complexity ( in various aspects) than the structures known to date and they make the VLSI implementation easier.
Item Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms(1993) Liu, K.J. Ray; Chiu, Ching-Te; Kolagotla, Ravi K.; JaJa, Joseph F.; ISRAn optimal unified architecture that can efficiently compute the Discrete Cosine, Sine, Hartley, Fourier, Lapped Orthogonal, and Complex Lapped transforms for a continuous input data stream is proposed. This structure uses only half as many multipliers as the previous best known scheme [1]. The proposed architecture is regular, modular, and has only local interconnections in both data and control paths. There is no limitation on the transform size N and only 2N - 2 multipliers are needed for the DCT. The throughput of this scheme is one input sample per clock cycle. We provide a theoretical justification by showing that any discrete transform whose basis functions satisfy the Fundamental recurrence Formula has a second-order autoregressive structure in its filter realization. We also demonstrate that dual generation transform pairs share the same autoregressive structure. We extend these time-recursive concepts to multi- dimensional transforms. The resulting d-dimensional structures are fully- pipelined and consist of only d 1-D transform arrays and shift registers.Item Time-Recursive Computation, Part II: Methodology, and Application on QMF Banks and ELT(1993) Frantzeskakis, Emmanuel N.; Baras, John S.; Liu, K.J. Ray; ISRRecent advances in ISDN have promoted applications such as video- phone, tele-conferencing and HDTV, that demand real-time processing of large volume audio, video and speech data. Being the only refuge for this intense computation, the VLSI technology favors modular and regular designs with local communication requirements. In this light, the framework for time-recursive computation, presented in part I [7] of this two-part paper, provides the background for designing efficient VLSI implementations, capable of accommodating high throughput requirements. In part II, we develop a routine that can be used for designing the time-recursive architecture of a given linear operator in a systematic manner. Three classes of QMF banks are used as design examples: the lossless QMF bank, the cosine modulated QMF bank and two Extended Lapped Transforms, one of them being the Modulated Lapped Transform (MLT). In addition to demonstrating the use of the design procedure, these examples provide novel results, interesting on their own right. In particular, the time-recursive architecture we propose for an N - point MLT, also known as Modified DCT or Time Domain Aliasing Cancellation (TDAC) transform, requires 2N + 3 multipliers, 3N + 3 adders and N - 1 rotation circuits.Item Time-Recursive Computation and Real-Time Parallel Architectures, Part I: Framework(1993) Frantzeskakis, Emmanuel N.; Baras, John S.; Liu, K.J. Ray; ISRThe time-recursive computation has been proved as a particularly useful tool in real-time data compression, in transform domain adaptive filtering and in spectrum analysis. Unlike the FFT based ones, the time-recursive architectures require only local communication. Also, they are modular and regular, thus they are very appropriate for VLSI implementation and they allow high degree of parallelism. In this two part paper, we establish an architectural frame work for parallel time-recursive computation. In part I, we consider a class of linear operators that consists of the discrete time, time invariant, compactly supported, but otherwise arbitrary kernel functions. We show that the structure of the realization of a given linear operator is dictated by the decomposition of the latter with respect to proper basis functions. An optimal way for carrying out this decomposition is demonstrated. The parametric forms of the basis functions are identified and their properties pertinent to the architecture design are studied. A library of architectural building modules capable of realizing these functions is developed. An analysis of the implementation complexity for the aforementioned modules is conducted. Based on this framework, an architecture design procedure is developed in Part II [12] that can be used for routinely obtaining the time-recursive architecture of a given linear operator.Item Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms(1992) Liu, K.J. Ray; Chiu, Ching-Te; Kolagotla, Ravi K.; JaJa, Joseph F.; ISRAn optimal unified architecture that can efficiently compute the Discrete Cosine, Sine, Hartley, Fourier, Lapped Orthogonal, and the Complex Lapped transforms for a continuous input data stream is proposed. This structure uses only half as many multipliers as the previous best known scheme [1]. This architecture is regular, modular, and has only local interconnections in both the data and control paths. There is no limitation on the transform size N and only 2N - 2 multipliers are needed for the DCT. The throughput of this scheme is one input sample per clock cycle. We provide a theoretical justification by showing that any discrete transform whose basis functions satisfy the Fundamental Recurrence Formula has a second-order autoregressive structure in its filter realization. We also demonstrate that dual generation transform pairs share the same autoregressive structure. We extend these time-recursive concepts to multi-dimensional transforms. The resulting multi-dimensional structure are fully- pipelined and consist of only d 1-D transform arrays and shift registers, where d is the dimension.Item VLSI Implementation of Real-Time Parallel DCT/DST Lattice Structures for Video(1992) Chiu, Ching-Te; Kolagotla, Ravi K.; Liu, K.J. Ray; JaJa, Joseph F.; ISRThe alternate use [1] of the discrete cosine transform (DCT) and the discrete sine transform (DST) can achieve a higher data compression rate and less block effect in image processing. A parallel lattice structure that can dually generate the 1-D DCT and DST is proposed. We also develop a fully-pipelined 2-D DCT lattice architecture that consists of two 1-D DCT/DST arrays without transposition. Both architectures are ideally suited for VLSI implementation because they are modular, regular, and have only local interconnections. the VLSI implementation of the lattice module using the distributed arithmetic approach is described. This realization of the lattice module using 2 um CMOS technology can achieve an 80Mb/s data rate.Item Novel Parallel Architectures for Short-Time Fourier Transform(1992) Liu, K.J. Ray; ISRIn this paper, novel parallel architectures for short-time Fourier transform based on adaptive time-recursive processing is proposed for efficient VLSI implementation. Only N - 1 multipliers and N + 1 adders are required. The proposed approach can be easily extended to multi-dimensional cases without the transpose operation. Various properties of the proposed architectures are also presented.Item A Fully Parallel and Pipelined Systolic Array for MVDR Beamforming(1991) Tang, C.F.T.; Liu, K.J. Ray; ISRA fully-pipelined systolic array for computing the minimum variance distortionless response (MVDR) was first proposed by McWhirter and Shepherd. The fundamental concept is to fit the MVDR beamforming to the non-contrainted recursive least-squares (RLS) minimization. Until now, their systolic array processor is well-recognized as the most efficient design for MVDR beamforming. In this paper, we first point out the mistake by relating the MVRD beamforming and RLS minimization and then propose a new algorithm for the MVDR beamforming. Moreover, a fully parallel and pipelined systolic array for the newly proposed algorithm is presented and the square-root free implementation is also considered.
- «
- 1 (current)
- 2
- 3
- »