Real-Time Algorithm-Based Fault-Tolerance for QRD Recursive Least-Squares Systolic Array: A Graceful Degradation Approach
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
In this paper, we propose a new algorithm-based fault-tolerant method derived from the inherent nature of the QR lease-squares systolic algorithm. Since the residuals of different desired responses can be computed simultaneously, an artificial desired response can be designed to detect an error produced by a faulty processor. We show that if the artificial desired response is designed as some proper combinations of the input data, the output residual of the system will be zero if there is no fault. However, any occurring fault in the system will cause the residual to be non-zero and the fault can be detected in realtime. Once the fault has been detected, the system enters into the fault diagnosis phase from the concurrent error detection phase. Two methods, the flushing fault location and the checksum encoding methods, can be used to diagnose the location of the faulty row. When the faulty row is determined, this row and the associated column with the same boundary cell are eliminated by a reconfiguration operation. Then the system degrades in a graceful manner which is generally acceptable for many least-squares applications. Those eliminated processors enter into a self-checking phase, and when the transient fault condition is removed, a reconfiguration is performed to resume the normal full order operation. The analysis of error propagation and recovery latency is also considered in this paper.