The Reliability of Systems with Two Levels of Fault Tolerance: The Return of the "Birthday Surprise"
Abstract
This paper considers the reliability of systems that employ fault tolerance at two different hierarchical levels. Specifically, it assumes the system consists of a two-dimensional array of components. Each component is reliable as long as it has been afflicted by no more than t faults; when t + 1 faults occur in a particular component, the component ceases to be reliable. Furthermore, the system remains operative as long as no more than one component in any row is unreliable. By generalizing the techniques used to analyze the well-known "birthday surprise" problem of applied probability, we derive an approximation to the average number of faults needed until the system fails. Applications include random access memory systems with chip-level and board-level coding as well as fault-tolerant systolic arrays.
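The failure model described in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's analytical approximation. It assumes that each fault lands on a component chosen uniformly at random and independently of earlier faults, which the abstract does not state explicitly. The function names and the parameters `rows`, `cols`, `trials`, and `seed` are hypothetical and introduced only for illustration.

```python
import random


def faults_until_failure(rows, cols, t, rng):
    """Simulate one system and return the number of faults at failure.

    Assumption (not from the abstract): every fault hits a component
    chosen uniformly at random from the rows x cols array.
    """
    if cols < 2:
        # With one component per row, no row can ever contain two
        # unreliable components, so the simulation would never end.
        raise ValueError("cols must be at least 2")

    fault_counts = [[0] * cols for _ in range(rows)]
    unreliable_in_row = [0] * rows
    faults = 0
    while True:
        r = rng.randrange(rows)
        c = rng.randrange(cols)
        faults += 1
        fault_counts[r][c] += 1
        # A component becomes unreliable exactly at its (t + 1)-th fault.
        if fault_counts[r][c] == t + 1:
            unreliable_in_row[r] += 1
            # The system fails once any row has two unreliable components.
            if unreliable_in_row[r] > 1:
                return faults


def mean_faults_until_failure(rows, cols, t, trials=2000, seed=0):
    """Estimate the average number of faults until system failure."""
    rng = random.Random(seed)
    total = sum(faults_until_failure(rows, cols, t, rng) for _ in range(trials))
    return total / trials
```

Calling, for example, `mean_faults_until_failure(rows=8, cols=8, t=1)` gives an empirical estimate that could be compared against an analytical approximation. With t = 0 and a single row, the model reduces to the classical birthday-style question of how many faults occur before two distinct components in that row have each been hit.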