Institute for Systems Research Technical Reports

Permanent URI for this collection: http://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Search Results

  • Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms
    (1993) Liu, K.J. Ray; Chiu, Ching-Te; Kolagotla, Ravi K.; JaJa, Joseph F.; ISR
    An optimal unified architecture that can efficiently compute the Discrete Cosine, Sine, Hartley, Fourier, Lapped Orthogonal, and Complex Lapped transforms for a continuous input data stream is proposed. This structure uses only half as many multipliers as the previous best known scheme [1]. The proposed architecture is regular, modular, and has only local interconnections in both data and control paths. There is no limitation on the transform size N, and only 2N - 2 multipliers are needed for the DCT. The throughput of this scheme is one input sample per clock cycle. We provide a theoretical justification by showing that any discrete transform whose basis functions satisfy the Fundamental Recurrence Formula has a second-order autoregressive structure in its filter realization. We also demonstrate that dual generation transform pairs share the same autoregressive structure. We extend these time-recursive concepts to multidimensional transforms. The resulting d-dimensional structures are fully pipelined and consist of only d 1-D transform arrays and shift registers.
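
The second-order autoregressive structure mentioned above can be illustrated with a small software sketch. It assumes the unnormalized DCT-II of the length-N window ending at time t, C(k,t) = Σ_n x(t-N+1+n) cos(θ_k(n + 1/2)) with θ_k = πk/N, and zero input before t = 0. Under those assumptions each coefficient obeys C(k,t) = 2cos(θ_k) C(k,t-1) - C(k,t-2) + cos(θ_k/2)[d(t) - d(t-1)], where d(t) = (-1)^k x(t) - x(t-N). This derivation and the function names are our own illustration, not the report's circuit, and the sketch makes no attempt to reproduce the 2N - 2 multiplier count.

```python
import math


def dct_window_direct(window, k):
    """Unnormalized DCT-II coefficient k of one window, summed directly."""
    n_len = len(window)
    theta = math.pi * k / n_len
    return sum(x * math.cos(theta * (n + 0.5)) for n, x in enumerate(window))


def dct_iir_stream(samples, n_len, k):
    """Return C(k, t) for the length-n_len window ending at every sample t.

    Sketch of a second-order (autoregressive) recursion:
        C(t) = 2 cos(theta) C(t-1) - C(t-2) + cos(theta/2) (d(t) - d(t-1)),
        d(t) = (-1)^k x(t) - x(t - n_len).
    Assumes x(t) = 0 for t < 0, so all state starts at zero.
    """
    theta = math.pi * k / n_len
    two_cos = 2.0 * math.cos(theta)   # feedback coefficient (autoregressive part)
    gain = math.cos(theta / 2.0)      # feed-forward gain on the input difference
    sign = -1.0 if k % 2 else 1.0     # (-1)^k
    history = [0.0] * n_len           # holds x(t - n_len) ... x(t - 1)
    c_prev = c_prev2 = d_prev = 0.0
    out = []
    for x in samples:
        d = sign * x - history[0]     # newest sample in, oldest sample out
        c_now = two_cos * c_prev - c_prev2 + gain * (d - d_prev)
        out.append(c_now)
        history = history[1:] + [x]
        c_prev2, c_prev, d_prev = c_prev, c_now, d
    return out
```

Replacing the gain cos(θ_k/2) with -sin(θ_k/2) yields the sine counterpart Σ_n x(t-N+1+n) sin(θ_k(n + 1/2)) with identical feedback coefficients, which is one way to see a dual-generation pair sharing the same autoregressive part. Both poles lie on the unit circle, so rounding errors are not damped out; a long-running finite-precision version would need careful wordlength design.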
  • VLSI Algorithms and Architectures for Time-Recursive Discrete Sinusoidal Transforms with Applications to Real-Time Video Communications
    (1992) Chiu, Ching-Te; Liu, K.J. Ray; ISR
    In this dissertation, we address the problem of developing efficient VLSI algorithms and architectures for discrete sinusoidal transforms in real-time video communication systems. The major difficulty is that the resulting architectures must process a huge amount of data at very high speed for real-time video while meeting the requirements of VLSI architectures: regularity, modularity, and locality. In traditional FFT-based algorithms, the serial data are buffered and then transformed using the FFT scheme.

    We propose a "time-recursive" approach to computing transforms that merges the buffering and transform operations into a single unit. The transformed data are updated according to a recursive formula whenever a new datum arrives, so the waiting time is completely eliminated. We propose unified lattice and IIR architectures for time-recursive transforms. The resulting architectures are regular and modular, have only local interconnections, and are therefore better suited for VLSI implementation. There is no limitation on the transform size N, and the numbers of multipliers required for computing the DCT with the lattice and IIR structures are 6N - 8 and 2N - 2, respectively. In the case of dual generation of the DCT and DST by the IIR structure, only 1.5N multipliers are required per transform on average. The throughput of this scheme is one input sample per clock cycle.

    We also apply the time-recursive approach to multidimensional separable transforms. The resulting d-dimensional structures are fully pipelined and consist of only d 1-D transform arrays and shift registers for computing a d-D DXT. The delay caused by the transpositions in conventional d-D transforms is eliminated in our approach. It is shown that the architectures are optimal in the sense that the number of multipliers used is minimal and both speed and area are asymptotically optimal.

    The VLSI implementation of the lattice module based on distributed arithmetic is also described. The chip can generate the DCT and DST simultaneously. It has been fabricated in 2 μm double-metal CMOS technology and tested to be fully functional, with a throughput rate of 14.5 MHz and a data processing rate of 116 Mb/s.
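
The lattice idea behind dual DCT/DST generation can be sketched as a per-sample plane rotation of a coefficient pair. The sketch assumes unnormalized coefficients of the length-N window ending at time t, C(k,t) = Σ_n x(t-N+1+n) cos(θ_k(n + 1/2)) and S(k,t) = Σ_n x(t-N+1+n) sin(θ_k(n + 1/2)) with θ_k = πk/N, and zero input before t = 0. Shifting the window by one sample then gives C(k,t+1) = cos θ_k C(k,t) + sin θ_k S(k,t) + cos(θ_k/2) d(t+1) and S(k,t+1) = -sin θ_k C(k,t) + cos θ_k S(k,t) - sin(θ_k/2) d(t+1), with d(t) = (-1)^k x(t) - x(t-N). The sine indexing (k rather than k + 1) and the function names are our choices for illustration; the dissertation's normalization and lattice module may differ.

```python
import math


def direct_cos_sin(window, k):
    """Directly summed cosine and sine coefficients of one window (reference)."""
    theta = math.pi * k / len(window)
    c_ref = sum(x * math.cos(theta * (n + 0.5)) for n, x in enumerate(window))
    s_ref = sum(x * math.sin(theta * (n + 0.5)) for n, x in enumerate(window))
    return c_ref, s_ref


def dct_dst_lattice_stream(samples, n_len, k):
    """Return lists (cs, ss) of C(k, t) and S(k, t) for every sample t.

    Each step rotates (C, S) by theta = pi*k/n_len and injects
    d(t) = (-1)^k x(t) - x(t - n_len); x(t) = 0 is assumed for t < 0.
    """
    theta = math.pi * k / n_len
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cos_h, sin_h = math.cos(theta / 2.0), math.sin(theta / 2.0)
    sign = -1.0 if k % 2 else 1.0     # (-1)^k
    history = [0.0] * n_len           # holds x(t - n_len) ... x(t - 1)
    c_val = s_val = 0.0
    cs, ss = [], []
    for x in samples:
        d = sign * x - history[0]
        # Plane rotation plus input injection; both outputs use the old (C, S).
        c_val, s_val = (cos_t * c_val + sin_t * s_val + cos_h * d,
                        -sin_t * c_val + cos_t * s_val - sin_h * d)
        cs.append(c_val)
        ss.append(s_val)
        history = history[1:] + [x]
    return cs, ss
```

Because the feedback is an orthogonal rotation, one update yields both the cosine and sine coefficients at the cost of a single shared stage, which is the sense in which such a lattice "dually generates" the pair.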

  • Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms
    (1992) Liu, K.J. Ray; Chiu, Ching-Te; Kolagotla, Ravi K.; JaJa, Joseph F.; ISR
    An optimal unified architecture that can efficiently compute the Discrete Cosine, Sine, Hartley, Fourier, Lapped Orthogonal, and Complex Lapped transforms for a continuous input data stream is proposed. This structure uses only half as many multipliers as the previous best known scheme [1]. This architecture is regular, modular, and has only local interconnections in both the data and control paths. There is no limitation on the transform size N, and only 2N - 2 multipliers are needed for the DCT. The throughput of this scheme is one input sample per clock cycle. We provide a theoretical justification by showing that any discrete transform whose basis functions satisfy the Fundamental Recurrence Formula has a second-order autoregressive structure in its filter realization. We also demonstrate that dual generation transform pairs share the same autoregressive structure. We extend these time-recursive concepts to multidimensional transforms. The resulting multidimensional structures are fully pipelined and consist of only d 1-D transform arrays and shift registers, where d is the dimension.
  • VLSI Implementation of Real-Time Parallel DCT/DST Lattice Structures for Video
    (1992) Chiu, Ching-Te; Kolagotla, Ravi K.; Liu, K.J. Ray; JaJa, Joseph F.; ISR
    The alternate use [1] of the discrete cosine transform (DCT) and the discrete sine transform (DST) can achieve a higher data compression rate and less blocking effect in image processing. A parallel lattice structure that can dually generate the 1-D DCT and DST is proposed. We also develop a fully pipelined 2-D DCT lattice architecture that consists of two 1-D DCT/DST arrays without transposition. Both architectures are ideally suited for VLSI implementation because they are modular, regular, and have only local interconnections. The VLSI implementation of the lattice module using the distributed arithmetic approach is described. This realization of the lattice module in 2 μm CMOS technology can achieve an 80 Mb/s data rate.
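
Distributed arithmetic, as used for the lattice module here, replaces the multipliers of a fixed-coefficient inner product with a lookup table indexed by one bit from each input, plus shift-and-add accumulation over bit positions. The sketch below assumes B-bit two's-complement integer inputs and integer coefficients; the function names are hypothetical, and it does not model the chip's table organization, pipelining, or wordlengths.

```python
def build_da_table(coeffs):
    """Entry for bit pattern p is the sum of coeffs[i] whose bit i is set in p."""
    return [sum(c for i, c in enumerate(coeffs) if (pattern >> i) & 1)
            for pattern in range(1 << len(coeffs))]


def da_inner_product(coeffs, inputs, bits):
    """sum(coeffs[i] * inputs[i]) using only table lookups, shifts, and adds.

    Inputs are bits-wide two's-complement integers; the most significant bit
    plane carries weight -2^(bits-1), all other planes +2^j.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(x < lo or x > hi for x in inputs):
        raise ValueError("input does not fit in the given two's-complement width")
    table = build_da_table(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in inputs]        # two's-complement bit patterns
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one table address.
        pattern = 0
        for i, w in enumerate(words):
            pattern |= ((w >> j) & 1) << i
        weight = 1 << j
        if j == bits - 1:
            acc -= table[pattern] * weight    # sign bit plane
        else:
            acc += table[pattern] * weight
    return acc
```

For example, `da_inner_product([3, -5, 7, 2], [12, -7, 0, -128], 8)` returns -185, the same as the ordinary inner product. For N inputs the table has 2^N entries and the loop takes B steps, one per bit plane.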
  • Real-Time Parallel and Fully-Pipelined Two-Dimensional DCT Lattice Structures with Application to HDTV Systems
    (1991) Chiu, Ching-Te; Liu, K.J. Ray; ISR
    The two-dimensional discrete cosine transform (2-D DCT) has been widely recognized as the most effective technique in image data compression. In this paper, we propose a new algorithm to compute the 2-D DCT from a frame-recursive point of view. Based on this approach, two real-time parallel lattice structures for successive frame and block 2-D DCT are developed. The system is fully pipelined, with a throughput rate of N clock cycles per N x N input data frame. This is the fastest pipelined structure for the 2-D DCT known so far. Moreover, the 2-D DCT architecture is modular, regular, and locally connected, and it requires only two 1-D DCT blocks, which can be extended directly from the 1-D DCT structure without transposition. Therefore, it is very suitable for VLSI implementation in high-speed HDTV systems. We also propose a parallel 2-D DCT architecture and a new scanning pattern for the HDTV system to achieve higher performance. The VLSI implementation of the 2-D DCT using distributed arithmetic to increase computational efficiency and reduce round-off error is also discussed.
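
Building a 2-D DCT from two 1-D DCT blocks relies on separability: the 2-D DCT of a block equals a 1-D DCT along every row followed by a 1-D DCT along every column. The sketch below checks that identity numerically with unnormalized DCT-II sums. It is a plain software illustration, not the frame-recursive lattice algorithm; the `zip` transposes are only re-indexing here, whereas the proposed hardware is designed to avoid an explicit transposition stage. Function names are hypothetical.

```python
import math


def dct_1d(vec):
    """Unnormalized 1-D DCT-II of a list."""
    n_len = len(vec)
    return [sum(x * math.cos(math.pi * k * (n + 0.5) / n_len) for n, x in enumerate(vec))
            for k in range(n_len)]


def dct_2d_direct(block):
    """Unnormalized 2-D DCT-II by the full double sum (reference)."""
    rows, cols = len(block), len(block[0])
    return [[sum(block[m][n]
                 * math.cos(math.pi * p * (m + 0.5) / rows)
                 * math.cos(math.pi * q * (n + 0.5) / cols)
                 for m in range(rows) for n in range(cols))
             for q in range(cols)]
            for p in range(rows)]


def dct_2d_separable(block):
    """2-D DCT as 1-D DCTs along rows, then 1-D DCTs along columns."""
    row_pass = [dct_1d(row) for row in block]          # first 1-D stage
    columns = [list(col) for col in zip(*row_pass)]    # re-index by column
    col_pass = [dct_1d(col) for col in columns]        # second 1-D stage
    return [list(r) for r in zip(*col_pass)]           # back to row-major order
```

The direct form costs on the order of N^4 operations for an N x N block, while the separable form costs on the order of N^3, which is part of why row-column structures are attractive.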
  • Unified Parallel Lattice Structures for Time-Recursive Discrete Cosine/Sine/Hartley Transforms
    (1991) Liu, K.J. Ray; Chiu, Ching-Te; ISR
    The problem of unified, efficient computation of the discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), and their inverse transforms is considered. In particular, a new scheme employing the time-recursive approach to compute these transforms is presented. Using this approach, unified parallel lattice structures that can dually generate the DCT and DST simultaneously, as well as the DHT, are developed. These structures obtain the transformed data for sequential input time-recursively, and the total number of multipliers required is a linear function of the transform size N. Furthermore, there is no constraint on N. The resulting architectures are regular, modular, and free of global communication, so they are very suitable for VLSI implementation in high-speed applications such as ISDN networks and HDTV systems. It is also shown in this paper that the DCT, DST, DHT, and their inverse transforms share an almost identical lattice structure. The lattice structures can also be formulated into pre-lattice and post-lattice realizations. Two methods, the SISO and double-lattice approaches, are developed to reduce the number of multipliers in the parallel lattice structure by 2N and N, respectively. The trade-off between time and area for block data processing is also considered.
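
The claim that these transforms share an almost identical lattice can be illustrated for the DHT. Assume an unnormalized sliding window H(k,t) = Σ_n x(t-N+1+n) cas(φ_k n), with cas(a) = cos(a) + sin(a), φ_k = 2πk/N, and zero input before t = 0. Splitting H into a cosine part C and a sine part S, one window shift gives C(k,t+1) = cos φ_k C(k,t) + sin φ_k S(k,t) + cos φ_k e(t+1) and S(k,t+1) = -sin φ_k C(k,t) + cos φ_k S(k,t) - sin φ_k e(t+1), where e(t) = x(t) - x(t-N): the same rotate-and-inject form as a DCT/DST lattice, with a different angle and input gains. The derivation and function names are our own sketch, not the report's specific lattice.

```python
import math


def dht_window_direct(window, k):
    """Unnormalized DHT coefficient k of one window, summed directly."""
    phi = 2.0 * math.pi * k / len(window)
    return sum(x * (math.cos(phi * n) + math.sin(phi * n)) for n, x in enumerate(window))


def dht_sliding(samples, n_len, k):
    """Return H(k, t) for the length-n_len window ending at every sample t.

    (C, S) is rotated by phi = 2*pi*k/n_len each step and the input
    difference e(t) = x(t) - x(t - n_len) is injected; x(t) = 0 for t < 0.
    """
    phi = 2.0 * math.pi * k / n_len
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    history = [0.0] * n_len           # holds x(t - n_len) ... x(t - 1)
    c_val = s_val = 0.0
    out = []
    for x in samples:
        e = x - history[0]            # sample entering minus sample leaving
        c_val, s_val = (cos_p * c_val + sin_p * s_val + cos_p * e,
                        -sin_p * c_val + cos_p * s_val - sin_p * e)
        out.append(c_val + s_val)     # cas = cos + sin
        history = history[1:] + [x]
    return out
```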
  • Dynamic Range, Stability, and Fault-tolerant Capability of Finite-precision RLS Systolic Array Based on Givens Rotations
    (1990) Liu, K.J. Ray; Hsieh, S.F.; Yao, K.; Chiu, Ching-Te; ISR
    The QRD RLS algorithm is generally recognized as having good numerical properties under finite-precision implementation. It is also very suitable for VLSI implementation, since it can be easily mapped onto a systolic array. However, it is still unclear how to obtain the dynamic range of the algorithm so that a wordlength can be chosen to ensure correct operation. In this paper, we first propose a quasi-steady-state model, based on the observation that the rotation parameters generated by the boundary cells eventually reach a quasi-steady state, regardless of the input data statistics, if the forgetting factor λ is close to one. With this model, we obtain upper bounds on the dynamic range of the processing cells. The wordlength can then be chosen from these upper bounds to prevent overflow and ensure correct operation of the QRD RLS algorithm. Next, we reconsider the stability problem under quantization effects with a more general analysis and obtain tighter bounds than those given in a previous work [13]. Finally, two fault-tolerance problems that arise under finite-precision implementation, missing error detection and the false alarm effect, are considered. A detailed analysis of preventing missing error detection under a false-alarm-free condition is presented.
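
The Givens-rotation update performed by a QRD RLS triangular array can be sketched in a few lines of floating-point code. The sketch assumes the exponential weighting is applied as √λ to the stored upper-triangular factor R, so that after each new data row u the invariant RᵀR = λ R_oldᵀR_old + u uᵀ holds; systolic formulations differ in where λ is applied, and the function name is hypothetical. The "boundary cell" step computes the rotation parameters (c, s) discussed in the abstract, and the "internal cell" step applies them along the row. No quantization is modeled.

```python
import math


def qrd_update(r_mat, row, lam):
    """Fold one data row into the upper-triangular p x p factor r_mat, in place.

    Afterwards R^T R = lam * R_old^T R_old + row row^T (floating point).
    """
    p = len(r_mat)
    u = list(row)
    scale = math.sqrt(lam)            # assumed weighting: sqrt(lam) on R
    for i in range(p):
        for j in range(i, p):
            r_mat[i][j] *= scale
    for i in range(p):
        # Boundary cell: rotation that annihilates u[i] against r_mat[i][i].
        r, x = r_mat[i][i], u[i]
        rho = math.hypot(r, x)
        c, s = (1.0, 0.0) if rho == 0.0 else (r / rho, x / rho)
        r_mat[i][i] = rho
        u[i] = 0.0
        # Internal cells: apply the same (c, s) along row i.
        for j in range(i + 1, p):
            r_ij, u_j = r_mat[i][j], u[j]
            r_mat[i][j] = c * r_ij + s * u_j
            u[j] = -s * r_ij + c * u_j
    return r_mat
```

With this weighting, the first boundary-cell value satisfies r_11^2 = Σ_i λ^(n-i) u_i1^2 after n rows, so it stays below max|u_i1| / √(1 - λ) for λ < 1. This is a simple instance of the kind of dynamic-range bound that determines the wordlength needed to avoid overflow.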